United States v. Mitchell
365 F.3d 215 (3d Cir. 2004)

BECKER, Circuit Judge.

This appeal by Byron Mitchell from a judgment in a criminal case raises important questions concerning the admissibility of latent fingerprint identification evidence under Fed.R.Evid. 702. We adjudicate on the basis of a voluminous record developed at a Daubert hearing, and explore in considerable detail the application of the various Daubert factors to the prosecution's expert testimony. We conclude that the testimony passes Daubert muster, and that there are "good grounds" for its admission. In a related matter, we must decide whether the District Court properly took judicial notice that "human friction ridges are unique and permanent throughout the area of the friction ridge skin, including small friction ridge areas, and that ... human friction ridge skin arrangements are unique and permanent." We conclude that the District Court erred in taking judicial notice, but that the error was harmless.

* * *

II. Facts and Procedural History

A. The Offense and Mitchell's First Trial and Appeal

This case began in 1991 when two men with handguns robbed an armored car employee of approximately $20,000 as he entered a check cashing agency at 29th Street and Girard Avenue in North Philadelphia. The robbers then got into a beige car driven by a third person, engaging in gunfire with the armored car employees as they fled. The beige car, which had been stolen about an hour beforehand, was abandoned by the robbers roughly a mile from the agency. The government sought to prove at trial that the robbers were William Robinson (a/k/a "Bookie") and Terrence Stewart (a/k/a "T"), and that the getaway driver was Mitchell. According to the government, the robbery had a fourth participant, Kim Chester, who knew of the plans, helped case the robbery site, and assisted the others in spending the proceeds of the robbery. Chester testified for the prosecution at Mitchell's trial as an uncharged accomplice. Both Robinson and Stewart died before trial, and thus Mitchell was the sole defendant.

Mitchell was charged with conspiracy to commit and commission of Hobbs Act robbery, 18 U.S.C. § 1951, and use of and carrying a firearm during a crime of violence, 18 U.S.C. § 924(c). In the first trial, at which Mitchell was convicted of all counts, the government introduced into evidence an anonymous note that had been left in the front seat of the abandoned beige car, apparently written by someone who observed the robbers exiting the beige car and getting into a different car. The note read, "Light green ZPJ-254. They changed cars; this is the other car." On appeal, we held the note to be inadmissible hearsay not subject to any exception in Fed. R. Evid. 803. United States v. Mitchell, 145 F.3d 572 (3d Cir. 1998). In view of the limited other evidence connecting Mitchell to the robbery--Chester's testimony was questionable, no robbery proceeds were ever linked to Mitchell, and the fingerprints recovered from the beige getaway car were identified as Mitchell's but in poor condition--we concluded that admission of the anonymous note was not harmless error. Id. at 579-80. Accordingly, we vacated Mitchell's conviction and remanded for a new trial. Id.

B. Latent Fingerprint Identification and the Daubert Hearing

Prior to the retrial, the District Court conducted a lengthy Daubert hearing on the admissibility under Fed. R. Evid. 702 of the government's expert testimony (and Mitchell's counter-experts) on the identification of fingerprints found on the gear shift lever and driver's side door of the beige getaway car. This hearing was to adjudicate a major attack mounted by Mitchell on the government's fingerprint evidence. As with any expert testimony, some background in the field and an introduction to the jargon are helpful, and so we discuss the field of latent fingerprint identification in general before turning to the particulars of the Daubert hearing.

1. The Field of Latent Fingerprint Identification

Criminals generally do not leave behind full fingerprints on clean, flat surfaces. Rather, they leave fragments that are often distorted or marred by artifacts, terms we explain in the margin.1 These "latent" prints--from the Latin lateo, "to lie hidden," because they are often not visible to the naked eye until dusted or otherwise revealed--are the typical grist for the fingerprint identification expert's mill. Testimony at the Daubert hearing suggested that the typical latent print is a fraction--perhaps 1/5th--of the size of a full fingerprint. A "full" fingerprint is familiar to anyone who has been fingerprinted for identification or law enforcement reasons: It is the print made by rolling the full surface of the fingertip onto a fingerprint card or electronic fingerprint capture device. (These prints are, for obvious reasons, also referred to as "rolled prints" or "full-rolled prints.") A full set of full-rolled fingerprints on a card--as would be taken during a police booking, for example--is known as a "ten-print card." Ten-print cards usually also have space at the bottom of the card for "flat impressions" or "plain impressions," where all four fingers of the hand are pressed at once onto the card without rolling.

1. In the jargon, artifacts are generally small amounts of dirt or grease that masquerade as parts of the ridge impressions seen in a fingerprint, while distortions are produced by smudging or too much pressure in making the print, which tends to flatten the ridges on the finger and obscure their detail.

Rolled prints and latent prints alike are subject to artifacts and distortions, though the problems with latent prints are more acute because they are smaller, are left more carelessly than full-rolled prints, and are left on surfaces that many other fingers have also touched. See Andre Moenssens et al., Scientific Evidence in Civil and Criminal Cases, § 8.08 at 514 (4th ed. 1995) ("Many latent impressions developed at crime scenes are badly blurred or smudged, or consist of partially superimposed impressions of different fingers."). Fingerprints are left by the depositing of oil upon contact between a surface and the friction ridges of fingers. The field uses the broader term "friction ridge" to designate skin surfaces with ridges evolutionarily adapted to produce increased friction (as compared to smooth skin) for gripping. Thus toeprint or handprint analysis is much the same as fingerprint analysis. The structure of friction ridges is described in the record before us at three levels of increasing detail, designated as Level 1, Level 2 and Level 3. Level 1 detail is visible with the naked eye; it is the familiar pattern of loops, arches, and whorls. Level 2 detail involves "ridge characteristics"--the patterns of islands, dots, and forks formed by the ridges as they begin and end and join and divide. The points where ridges terminate or bifurcate are often referred to as "Galton points," whose eponym, Sir Francis Galton, first developed a taxonomy for these points. The typical human fingerprint has somewhere between 75 and 175 such ridge characteristics. Level 3 detail focuses on microscopic variations in the ridges themselves, such as the slight meanders of the ridges (the "ridge path") and the locations of sweat pores. This is the level of detail most likely to be obscured by distortions.

The FBI—the agency that made the primary identification in this case—uses an identification method known as ACE-V, an acronym for "analysis, comparison, evaluation, and verification." The basic steps taken by an examiner under this protocol are first to winnow the field of candidate matching prints by using Level 1 detail to classify the latent print. Next, the examiner will analyze the latent print to identify Level 2 detail (i.e., Galton points and their spatial relationship to one another), along with any Level 3 detail that can be gleaned from the print. The examiner then compares this to the Level 2 and Level 3 detail of a candidate full-rolled print (sometimes taken from a database of fingerprints, sometimes taken from a suspect in custody), and evaluates whether there is sufficient similarity to declare a match. In the final step, the match is independently verified by another examiner, though there is some dispute about how truly independent this verification is.

The standards used by the FBI at the evaluation stage of the ACE-V protocol are somewhat less concrete than the numerical descriptions found in television police dramas that extol "twenty-point matches" and the like. An n-point match refers to a match between an unknown latent print and a known full print in which the examiner has identified n corresponding Galton points in the correct geometry relative to one another. A number of jurisdictions both outside the United States and within seem to rely on a system where a minimum number of corresponding points must be found before a match may be declared, irrespective of Level 3 detail. See, e.g., 2 Paul C. Giannelli & Edward Imwinkelried, Scientific Evidence § 16-7(A), at 768 (3d ed. 1999) ("In France, the required number [of points for a match] used most often is 24 while the number is 30 in Argentina and Brazil."). Such jurisdictions are said to use a "point system." On the other hand, Canada does not have a minimum point threshold for identification, and the United Kingdom recently eliminated a minimum point threshold. See United States v. Llera Plaza, 188 F.Supp.2d 549, 569-70 (E.D.Pa. 2002) (quoting Lord Lester of Herne Hill's colloquy with Lord Rooker). The alternative approach, which gained favor with the FBI in the late 1940s, is to use a combination of quantity and quality: If ridge characteristics are abundant, then the quality of Level 3 detail is unimportant; but a paucity of Galton points can be compensated for by high-quality Level 3 detail. While this has the advantage of allowing an examiner to find a match in situations where an examiner using a strict point-based standard would not find one, this flexibility comes at the price of substituting a degree of subjectivity for an objective numerical standard.

2. The Daubert Hearing

The District Court held a five-day hearing pursuant to Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), to rule on the admissibility of the government's and Mitchell's proposed expert testimony. The record of this marathon hearing alone comprises nearly one thousand pages of testimony and a similarly voluminous array of exhibits. The government called six witnesses (plus one rebuttal witness), and Mitchell, four. The District Court found all the offered expert witnesses to be qualified in their respective fields, and neither party raises a challenge to the qualifications, as such, of the witnesses. Rather, both sides' issues lie with the content of the testimony accepted by the District Court. We briefly describe the areas of testimony of each of the witnesses, starting with the government's witnesses.

a. The Government's Experts

Steven Meagher, an FBI special agent, testified at the hearing about Level 1, Level 2, and Level 3 detail (as described above), and other aspects of fingerprint identification. With regard to the FBI's practices, technology, and operations, he testified about the ACE-V protocol; that the FBI does not rely on a minimum "points" standard for matching fingerprints (and why it does not); and about the Automated Fingerprint Identification System ("AFIS") computer system (which automates some preliminary aspects of fingerprint matching). Meagher also described a survey (which we discuss, infra) of state fingerprint identification agencies that he prepared and circulated for the purpose of demonstrating that the fingerprint match in this case was, by wide consensus, correct. He also described an experiment (which we also discuss, infra) designed and run in cooperation with the contractor for the FBI's AFIS computer system, Lockheed Martin, that would search a portion of the AFIS database for identical fingerprints. Donald Zeisig, of Lockheed Martin, and Bruce Budowle, a statistician and population geneticist with the FBI,[*] were also involved in this experiment, and both testified at the Daubert hearing. Zeisig also testified in greater detail about the technical background of the AFIS computer system.

* Editor's note: Bruce Budowle is a chemist who was a dominant figure in developing the FBI's procedures for forensic DNA analysis and defending these procedures in court.

The government offered two witnesses focusing principally on the biological aspects of fingerprints. Dr. William Babler, of Marquette University, testified about the prenatal development of friction ridges, opining that unique arrangements of friction ridges develop in the womb within a matter of months after conception. He also testified to the medical community's accepted understanding of the anatomical and cellular bases for the permanence of friction ridge arrangements. Ed German, of the United States Army Criminal Investigation Laboratory, testified to the lack of similarity found between corresponding fingerprints of identical twins, a conclusion established by his own research on identical twins and confirmed by other studies of identical twins.

The government also offered David Ashbaugh, of the Royal Canadian Mounted Police, who testified broadly about the development, comparison, and identification of friction ridge skin and impressions. Like the other government witnesses who were examined on the matter (viz., Agent German, Agent Meagher, and Dr. Budowle) he responded that it was his opinion that friction ridge arrangements were unique (the "uniqueness proposition") and permanent (the "permanence proposition"), and that positive identifications can be made from fingerprints containing sufficient quantity and quality of ridge detail. Dr. Babler also opined that friction ridge arrangements are unique and permanent. These propositions were the foundation of the government's argument that latent fingerprint identification evidence satisfies Daubert.

The government conducted two experiments in anticipation of the Daubert hearing: (1) a survey of state fingerprint identification agencies asking them, inter alia, if they could match the latent prints in this case to Mitchell's ten-print card;[**] and (2) a search for identical fingerprints using data in the AFIS computer system. The specifics of these experiments bear on their relevance as expert evidence, and so we describe them in some detail.

** Editor's note: Is this survey an "experiment"?

For purposes of this case, Meagher created a survey packet that was sent out to the principal law enforcement agency of each of the fifty states, plus the District of Columbia, Canada's Royal Canadian Mounted Police, and the United Kingdom's Scotland Yard. The survey contained three parts: Part A involved questions about whether the agency currently accepts fingerprints as a means to individualize (i.e., make an identification), and about whether the agency regards fingerprints as unique and permanent. All fifty-three recipients responded in the affirmative to both queries. Part C inquired whether the agencies had ever found two individuals to have the same fingerprint; the response was, unanimously, no. Part C also revealed that, in the aggregate, the ten-print records of nearly 70 million individuals--or about 700 million fingerprints--have been examined during the course of the agencies' operations.

Part B of the survey was designed as a demonstration of the ACE-V identification protocol, and it used the latent fingerprints at issue in this case. Part B offered each agency photographs of the two latent prints and of Mitchell's ten-print card. Agencies were asked first to attempt to identify the ten-print card using their own computerized fingerprint database. It is common practice (for efficiency's sake) to "filter" the database in making an identification, by considering only the subset of records (by race, sex, date of birth, etc.) that are likely to result in a match. Meagher requested that agencies not filter their database for this test, to ensure that the prints were compared against the maximum possible number of print records. Of the forty-seven agencies that responded, the only match that was found was in Pennsylvania, where Mitchell's ten-print record was already on file.

In the second segment of Part B, agencies were asked to attempt to match the latent prints to their existing records. The only "hits" were made by the two agencies (Mississippi and South Dakota) that inputted the ten-print card supplied by Meagher into their system prior to running the search (and thus raised the likelihood of a match). Pennsylvania was unable to run this search because of equipment troubles, but represented that it undoubtedly would have made a match if its system were fully operative.

The third segment of Part B asked agencies to perform manual comparisons of the latent prints to the ten-print card provided to them. This survey was single-blind, i.e., while Meagher knew that the latent prints had been identified as Mitchell's, knew that the ten-print card was Mitchell's, and believed the latents could be matched to the ten-print card, none of the survey recipients was told any of this. Roughly two thirds of the agencies responded to this portion. Over three quarters of the responding agencies matched both prints consistently with the FBI's identification. Of those that did not match both prints, half matched only one print consistent with the FBI's identification, and half matched neither print. In followup communications, the FBI either convinced these non-identifying agencies that a match did exist and they so acknowledged (though it took the strong suggestion of annotated blown-up photographs of the prints), or otherwise established reasons for the non-identification (e.g., the examiner deemed the quality of the supplied photographs to be too poor to make an identification, and would have preferred an original; or the comparison was performed by an inexperienced examiner, and on review, a senior examiner was able to find a match).

A critical summary point is that no agency ever registered a "false" positive (i.e., a positive match that contradicted the FBI's result): In the first segment of Part B, no agency matched Mitchell's ten-print card to someone else's ten-print card; in the second segment, no agency matched the latent prints to anyone other than Mitchell; and in the third segment, no agency matched a latent print to any finger other than the one to which the FBI had matched the latent print.

The second experiment conducted by the government's experts was known as the "50/50" experiment. This was an empirical examination by computer of a subset of the FBI's fingerprint records to search for pairs of very similar fingerprints taken from different sources. Finding such a pair would undermine the uniqueness proposition that the government's other experts testified was well-established. The experiment data set was a set of fifty thousand prints (out of about 340 million in the FBI's AFIS computer system). Rather than select these fifty thousand prints at random, the experimenters (Agent Meagher, Mr. Zeisig, and Dr. Budowle) took them from the subset of prints that were from white males and exhibited a left-sloped whorl pattern at Level 1 detail. The experimenters also ensured that multiple prints from the same person were included in the set of fifty thousand. The effect of these restrictions was to bias, from the outset, the prints toward being more similar (and hence more likely to contain a matching pair).3

3. An analogy may illustrate this biasing effect: Consider a large multicolored pile of crayons produced by mixing several boxes of crayons. If one chooses a dozen "dark" crayons at random, one is more likely to find among those dozen crayons a pair of exactly the same color than one is to find such a pair if one selects a dozen crayons at random from the pile at large.
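Editor's note: The biasing effect described in footnote 3 can be sketched numerically. The short Python calculation below is only an illustration under stated assumptions: the color counts (64 colors in the whole pile, 16 of them "dark") are hypothetical, and each crayon is treated as an independent, uniform draw, which approximates a large mixed pile. Nothing in it comes from the record.

```python
from fractions import Fraction


def prob_some_pair(num_colors: int, draws: int) -> Fraction:
    """Probability that at least two of `draws` crayons share a color,
    assuming each draw is independent and uniform over `num_colors`."""
    if draws > num_colors:
        return Fraction(1)  # pigeonhole: a repeated color is certain
    p_all_distinct = Fraction(1)
    for i in range(draws):
        # The (i+1)-th crayon must avoid the i colors already drawn.
        p_all_distinct *= Fraction(num_colors - i, num_colors)
    return 1 - p_all_distinct


ALL_COLORS = 64   # hypothetical number of distinct colors in the pile
DARK_COLORS = 16  # hypothetical number of those colors that are "dark"
DRAWS = 12        # "a dozen" crayons, as in the footnote

p_pile = prob_some_pair(ALL_COLORS, DRAWS)
p_dark = prob_some_pair(DARK_COLORS, DRAWS)

print(f"a dozen from the whole pile: {float(p_pile):.3f}")
print(f"a dozen dark crayons only:   {float(p_dark):.3f}")
```

Under these assumptions, restricting the draw to the smaller "dark" subset raises the chance of finding a same-color pair, which is the direction of bias the footnote describes.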

In the first part of the test, a computer program--using the same algorithms as the FBI's AFIS computer system uses to match prints--attempted to match each of the fifty thousand prints against the full set of fifty thousand prints (hence the moniker "50/50"). Thus, a total of 50,000 x 50,000, or 2.5 billion, comparisons were performed. For each print, the best match was, by an enormous margin, itself.4 Based on statistical extrapolation from these results, the experimenters put the chances of a single full-rolled print matching another full-rolled print from anyone in the world other than the person who deposited the print at approximately one in ten to the eighty-sixth power (i.e., 1 chance in 1 followed by 86 zeroes), a very low probability indeed.

4. We note that the comparisons were run for each print against all 50,000 prints, not against the other 49,999 prints. Thus, every print was assured of having a tautologically perfect match (i.e., itself) that could serve as a baseline for statistical comparisons. This was done to quantify statistically how much better the perfect match was than all other comparisons. The cases in which a print was a strong match for a print other than itself were subsequently discovered to be the product of a double-entry in the database (i.e., a set of prints from the same person had been entered into the database twice). The experimenters testified that the system's ability to catch this unintentional duplication bolstered their confidence in its capabilities.
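Editor's note: The comparison count and the "self-match as baseline" design described in the text and in footnote 4 can be sketched as follows. This is a minimal illustration, not the AFIS matching algorithm: the 3-by-3 score table is invented, and the helper name `best_and_runner_up` is hypothetical.

```python
N_PRINTS = 50_000

# Every print is compared against all 50,000 prints, itself included.
total_comparisons = N_PRINTS * N_PRINTS
self_comparisons = N_PRINTS
cross_comparisons = total_comparisons - self_comparisons

print(f"total comparisons: {total_comparisons:,}")
print(f"self comparisons:  {self_comparisons:,}")
print(f"cross comparisons: {cross_comparisons:,}")


def best_and_runner_up(row):
    """Return (index of highest score, highest score, next-highest score)."""
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    return order[0], row[order[0]], row[order[1]]


# Invented scores for three prints; entry [i][j] is print i scored against
# print j, so the diagonal holds each print's comparison with itself.
toy_scores = [
    [100, 7, 3],
    [6, 100, 9],
    [2, 8, 100],
]

results = [best_and_runner_up(row) for row in toy_scores]
for i, (best, top, second) in enumerate(results):
    print(f"print {i}: best match is print {best}, margin {top - second}")
```

In this toy table each print's best match is itself, so the margin reported is the gap between the self-comparison and the strongest comparison against a different print, which is the kind of baseline footnote 4 describes.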

Apparently recognizing that analysis of full-rolled prints was not particularly germane to the question of the identification of latent partial prints, the government's witnesses conducted a second experiment. From each of the fifty thousand prints, they had the computer create a simulated latent print (referred to as a "pseudolatent print" or simply a "pseudolatent"), as might be recovered from a crime scene, by taking only about a fifth of the full-rolled print.5 They then ran a similar fifty thousand-by-fifty thousand comparison to see how strongly the pseudolatent prints matched full prints from which they had not been derived. With one exception which we identify in the margin, each pseudolatent was a strong match with the full print from which it had been derived, by a wide margin over any other full print.6 Statistical computations based on this experiment put the probability of a latent partial print matching the full print of anyone in the world other than the person who deposited the print at approximately one in ten to the sixteenth power (i.e., 1 in 10,000,000,000,000,000), also a very low probability.

5. The pseudolatents were 21.7% of the areal size of the full print, a figure which Meagher determined was the average size of a set of actual latent prints that he had previously used for testing.

6. Meagher explained that the sole exception was caused by a poorly created fingerprint card. On the card in question, the flat impression had strayed out of the region on the card designated for the flat impression, and had left part of a print in the box designated for one of the rolled impressions. Consequently, one of the boxes for a rolled print actually contained a rolled print, plus a fair-sized piece of a flat print of a different finger. As a result, the strong match found by computer was actually a match between the pseudolatent print and the stray portion of the flat print. As with the database error discovered in the first stage of the 50/50 experiment, the experimenters found this mistaken match to be evidence of the robustness of their computer system.

b. Mitchell's Experts

Mitchell's first witness at the Daubert hearing was Marilyn Peterman, an investigator with the Defender Association of Philadelphia who took statements from those fingerprint examiners at state agencies who had failed to match the latent prints to Mitchell's ten-print card in completing Part B of the FBI's survey.7 She described which agencies adhered to a point system, how many points they required to make an identification, and noted that the agencies that did not find a match generally reported that they had found an insufficient number of points of similarity between the latent print and the ten-print card. Ms. Peterman also reported on the varying levels of experience and accreditation of the examiners who performed the comparisons for the agencies.

7. It appears that, in the interest of efficiency, the parties consented to introducing hearsay from the examiners who completed the FBI survey--primarily through Agent Meagher for the government, and through Ms. Peterman for Mitchell.

The first of Mitchell's three major experts was Dr. David Stoney, the director of the McCrone Research Institute in Chicago, a not-for-profit organization engaged in teaching and research in the forensic sciences. Dr. Stoney was, in Mitchell's counsel's words at the Daubert hearing, offered as an expert "with respect to whether a fingerprint examiner's conclusion that a latent fingerprint came from a particular individual is a scientific determination." The nucleus of Dr. Stoney's opinion is summarized in a portion of his testimony at the hearing:

The determination that a fingerprint examiner ... makes when comparing a latent fingerprint with a known fingerprint, specifically the determination that there is sufficient basis for an absolute identification, is not a scientific determination.... It is a subjective determination without objective standards to it.

Now, by "subjective" I mean that it is one that is dependent on the individual's expertise, training, and the consensus of their agreement of other individuals in the field. By "not scientific" I mean that there is not an objective standard that has been tested; nor is there a subjective process that has been objectively tested. It is the essential feature of a scientific process that there be something to test, that when that something is tested, the test is capable of showing it to be false.

Dr. Stoney opined that the evaluation phase of the ACE-V protocol requires the examiner to make a binary determination: Either two prints match sufficiently to make an absolute identification, or they do not. This Dr. Stoney contrasted to certain other forensic disciplines in which intermediate determinations are expressed in probabilistic terms. Dr. Stoney further objected to any characterization of fingerprint identification as having a "zero error rate," explaining that "something with a zero error rate cannot be a science.... [I]f we start out saying fundamentally something can't be shown to be wrong, then it means that we can't test it. If we can't test it, ... there's no way to show that it is wrong."

Dr. Stoney also criticized the 50/50 experiment. He noted first the undisputed proposition that two impressions of the same friction ridges will not be identical--artifacts and distortions will invariably appear.8 In that experiment, a fingerprint was compared against itself and 49,999 other fingerprints taken from the FBI's database. Hence, Dr. Stoney explained, the simulated task modeled by the 50/50 experiment was that of matching Print 1 and (the identical) Print 1 of Finger A. In his submission, the task in real-world fingerprint identification is one of matching Print 1 and Print 2 of Finger A. Thus, Stoney reasoned, the 50/50 experiment as executed assessed how much better a match is found between Print 1 and (the identical) Print 1 of Finger A than between Print 1 of Finger A and Print 1 of Finger B. A more meaningful version of the 50/50 experiment, Dr. Stoney explained, would have asked how much better a match is found between Print 1 and Print 2 of Finger A than between Print 1 of Finger A and Print 1 of Finger B.9

8. This point also underpins Dr. Stoney's more general criticism of the discipline of latent fingerprint identification: Dr. Stoney agreed that human friction ridges are unique and permanent, including small areas, but suggested that this alone is unhelpful on the question whether prints are identifiable, because fingerprints are so subject to distortion and the forensic identification process is so flawed.

9. We note, however, that such an experiment was beyond the immediate capability of the government because its database, by design, does not have multiple prints from the same finger.
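Editor's note: Dr. Stoney's distinction between the task the 50/50 experiment modeled and the task he considered realistic can be illustrated with a toy model. Everything in the sketch below is an assumption made for illustration: the feature labels are invented, and the score (the Jaccard overlap of two sets of labels) is a stand-in that has nothing to do with the algorithms AFIS actually uses. It shows only the direction of the effect, not its size.

```python
def similarity(a: frozenset, b: frozenset) -> float:
    """Toy score: Jaccard overlap of two sets of feature labels."""
    return len(a & b) / len(a | b)


# Invented feature labels for one impression of Finger A.
finger_a_print_1 = frozenset({"m1", "m2", "m3", "m4", "m5", "m6"})

# A second impression of the same finger: two features are lost to
# distortion and one spurious artifact ("x1") appears.
finger_a_print_2 = frozenset({"m1", "m2", "m3", "m4", "x1"})

# An impression of a different finger that happens to share one feature.
finger_b_print_1 = frozenset({"m1", "n2", "n3", "n4", "n5", "n6"})

self_score = similarity(finger_a_print_1, finger_a_print_1)
same_finger_score = similarity(finger_a_print_1, finger_a_print_2)
other_finger_score = similarity(finger_a_print_1, finger_b_print_1)

# Margin the experiment measured: identical print versus a different finger.
tested_margin = self_score - other_finger_score
# Margin Dr. Stoney proposed: second impression versus a different finger.
realistic_margin = same_finger_score - other_finger_score

print(f"Print 1 vs. itself:            {self_score:.3f}")
print(f"Print 1 vs. Print 2, Finger A: {same_finger_score:.3f}")
print(f"Print 1 vs. Finger B:          {other_finger_score:.3f}")
print(f"tested margin:    {tested_margin:.3f}")
print(f"realistic margin: {realistic_margin:.3f}")
```

Because a print compared with itself always earns the maximum score under this toy measure, the tested margin cannot be smaller than the realistic one. How much smaller the realistic margin would be for actual fingerprints is an empirical question that this sketch does not address.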

Dr. Stoney further criticized the method used to create the pseudolatent prints in the second part of the experiment. Dr. Stoney explained that it was established in the literature that simple masking, and even computer-generated blurring, of full prints cannot adequately simulate real latent partial prints. Dr. Stoney's ultimate conclusion was that these experimental defects rendered the probabilities derived by the government experts meaningless.

The defense's second principal expert was James Starrs, a professor in the Department of Forensic Sciences and the law school at George Washington University. Prof. Starrs has had a long career at the intersection of law and forensic science; indeed, an article by Prof. Starrs was cited by the Supreme Court in Daubert. See Daubert, 509 U.S. at 591 (citing James E. Starrs, Frye v. United States Restructured and Revitalized: A Proposal to Amend Federal Evidence Rule 702, 26 Jurimetrics J. 249, 258 (1986)). Prof. Starrs was offered as an "exert [sic] in forensic science qualified to provide an opinion as to whether latent fingerprint examination meets the criteria of science." Like Dr. Stoney, Prof. Starrs testified that it was his opinion that "[the current practice of] fingerprint comparison and analysis is not predicated on a sound and adequate scientific basis for purposes of making an individualization to one person from a fragmentary print to the exclusion of all other persons in the world."

To support his conclusion, Prof. Starrs highlighted five aspects of fingerprint examination that in his opinion were inconsistent with a scientific discipline: (1) claims to "absolute certainty"; (2) "the failure to carry out controlled empirical-data-searching experimentation"; (3) a failure to engage in error-rate analysis; (4) the lack of uniformity, objectivity, systematization, and standards; (5) "a failure to show a due regard to a vigorous and uncompromising skepticism." In elaborating on each of these points, Prof. Starrs gave illustrations. For example, he briefly described a case of false identification; he described some of the subtle and non-systematized aspects of analyzing Galton points, and he criticized some aspects of the training of new fingerprint examiners. Prof. Starrs also explained that he viewed the government's testimony and experiments involving full-rolled prints as irrelevant to the question of latent partial print identification. However, under cross-examination Prof. Starrs was agnostic on whether the propositions he challenged as unproven might, in the end, be scientifically supportable.

Mitchell's final expert at the Daubert hearing was Simon Cole, a post-doctoral fellow at Rutgers University, with expertise in "science and technology studies with particular expertise regarding the fingerprint profession." Dr. Cole had no experience in latent print examination. From his research, Dr. Cole identified four explanations for the widespread acceptance of fingerprint identification evidence: First, from the earliest days of the discipline, fingerprint examiners have developed an "occupational norm of unanimity," i.e., examiners would not publicly disagree with one another about an identification. Second was the way in which the fingerprint examination community handled instances of known misidentification: such cases would, Dr. Cole explained, be blamed on practitioner incompetence or misconduct.10 Third was a simple lack of judicial scrutiny--a sort of snowball effect of string citations to cases and treatises approving fingerprint identification evidence. Fourth was a lack of an organized counter-expert group, a notable difference, Dr. Cole explained, between fingerprint identification and, say, psychiatric diagnosis. Dr. Cole also opined that fingerprint identification was not scientific because, inter alia, the fingerprint identification community had not engaged in studies that attempt to falsify the discipline's premises; did not engage in anonymous, critical (as opposed to positive) peer review; and did not recognize error rates.

10. Dr. Cole noted that both of these first two explanations were well illustrated by the FBI's survey: Agent Meagher followed up with each agency until a match was agreed to, or otherwise identified inexperienced examiners as the source of nonidentifications.

c. Mitchell's Exhibits

As part of the Daubert hearing, Mitchell also introduced several hundred pages of documentary exhibits, principally journal articles and other excerpts from the corpus of literature criticizing the practice and theory of latent fingerprint identification, authored by his experts and by others. Also introduced were the results of some fingerprint proficiency tests, which suggested that examiners were prone to both false negatives (i.e., declaring a nonidentification where an identification should have been made) and false positives (i.e., making an incorrect identification). Finally, the defense introduced a survey of jurors that found that 93% agreed with the statement "fingerprint identification is a science" and 85% agreed with the statement "fingerprints are the most reliable means of identifying a person."

d. The Government's Rebuttal Witness

To respond to defense testimony regarding the "occupational norm of unanimity" among fingerprint examiners, the government offered Pat Wertheim, a fingerprint examiner, as a rebuttal witness. Wertheim testified that he and David Grieve (who was present but did not testify) were involved as defense experts in a case of false identification in the United Kingdom. Based on their examination of the evidence in that case--which was both independent of the U.K. authorities and independent of each other--they testified, in opposition to the prosecution's expert, that the latent print in that case could not be matched to the defendant. The purpose of this testimony was to counter Dr. Cole's contentions about the occupational norm of unanimity within the discipline.

3. The District Court's Daubert and Judicial Notice Rulings

Two months after the Daubert hearing concluded, the District Court ruled from the bench on the admissibility of expert testimony at trial. In relevant part, the Court stated:

The matter presently pending before the Court is in reference to the defense motion to exclude the government's fingerprint identification evidence, and based on the Daubert hearing and also Kumho, this Court denies the defendant's motion. And pursuant thereto, this court is not going to make a determination as to the particular area of scientific knowledge and technical or specialized knowledge.

* * *

Further, pursuant to this Court's ruling, this Court finds that the government's fingerprint evidence is highly probative and substantially outweighs any danger of unfair prejudice to defendant.

* * *

We find that the government's expert witness--at this juncture it appears it's Duane Johnson [sic Wilbur Johnson?], an FBI latent fingerprint examiner who testified first in the previous trial, and those other latent experts that testified in the Daubert hearing--are capable of testifying in these proceedings, and in that regard, I am not going to limit the defense from calling latent fingerprint experts to testify as to the ability not to identify or make an identification from the fingerprints, and I am also going to allow the defense to call any latent fingerprint expert who indicates that fingerprints are not reliable sources of information.

Only for that limited purpose and I am going to exclude evidence as to whether or not [latent fingerprint identification is] scientific, technical, or whatever. It has no relevance before the jury here. The question is whether or not an identification can be made by examination of fingerprints--latent fingerprints.
App. 1029a-1031a (repunctuated for clarity).

As we understand the ruling, the District Court held that the government's expert witnesses and Mitchell's expert witnesses could testify, but with the caveat that the latter could not testify to the question whether latent fingerprint identification is a "science." This ruling forms at least the baseline of two of Mitchell's issues on appeal: the admission of government experts, and the restriction of his own experts. The Court again discussed the admissibility of the defense's expert witnesses in a colloquy with counsel immediately before jury voir dire, an exchange that we will discuss in greater detail, infra Part IV.

Immediately following its ruling on the admissibility of expert testimony, the District Court addressed what would become another ground of Mitchell's appeal. Again from the bench, the Court ruled:

This Court will take judicial notice that human friction ridges are unique and permanent throughout the area of the friction ridge skin, including small friction ridge areas, and further that human friction skin arrangements are unique and permanent, and if called upon, we will instruct the jury as so.
App. 1031a (repunctuated for clarity). The Court so instructed the jury. On appeal, Mitchell asserts that it was error for the District Court to take judicial notice of these matters.

C. Mitchell's Second Trial

* * *

The case against Mitchell rested on eleven lay witnesses and two experts. * * *

Agent Meagher returned to testify at trial about many of the matters brought out by the government at the Daubert hearing. He discussed the embryology of friction ridge skin, the fingerprints of identical twins, and the biological basis for the permanence of fingerprints. He described how latent prints are left and how they are processed by examiners, and the various conclusions that examiners can draw from a comparison of prints. During Meagher's testimony, the government invoked the Court's promise to take judicial notice of the uniqueness of small areas of friction ridge skin. The government also read a stipulation detailing some of the results of the survey that Meagher testified about at the Daubert hearing, and the prosecutor examined Meagher regarding the agencies that did not make a positive identification of the latent prints. Meagher then demonstrated to the jury in some detail his use of the ACE-V technique in matching the latent prints to Mitchell's ten-print card. He stated definitively that the fingerprints from the beige car matched Mitchell's ten-print card. Agent Johnson also stated definitively that he had matched the latent prints from the beige car to Mitchell's ten-print card, though he did not give an in-depth demonstration to the jury as Agent Meagher did.

2. Mitchell's Case and Cross-Examination of the Government's Experts

The entirety of Mitchell's case was the testimony of individuals at state agencies who examined or supervised the examination of the latent prints sent by Agent Meagher in the survey. Specifically, Mitchell called thirteen latent fingerprint experts from nine states, all of whom were initially unable to identify one or both of the latent prints as belonging to Mitchell.

Mitchell also cross-examined the government's experts, Agents Johnson and Meagher. Cross-examination of Johnson concentrated on questions about his presentation to the jury of the fingerprints he matched--Johnson's demonstrative exhibits identified only nine points of Level 2 similarity between the latent prints from the car and Mitchell's ten-print card, despite Johnson's and Meagher's claims of a greater number of similarities. Through cross-examining Agent Johnson, Mitchell also probed the existence and maintenance of minimum-point standards and other quality-control measures at the FBI in particular, and in the discipline more generally. Cross-examination of Agent Meagher ranged into more general considerations, most notably the limited studies performed specifically to establish an error rate for fingerprint identification, and the limited means for detecting errors in particular examinations. Meagher was also cross-examined on his highly suggestive follow-up communications to those state agencies that did not match Mitchell's prints in the survey.

* * *

On February 7, 2000, the jury returned a verdict of guilty on all counts. * * *

E. This Appeal

The District Court had jurisdiction over this case under 18 U.S.C. § 3231. Mitchell filed a timely appeal from the final judgment of conviction and sentence, and we have jurisdiction under 28 U.S.C. § 1291.

On appeal, Mitchell asserts that the District Court committed five errors. First, he challenges the District Court's ruling following the Daubert hearing that admitted the prosecution's expert testimony on fingerprint identification. Second, Mitchell claims that the District Court erred in precluding his experts from testifying at trial that fingerprint identification is not a science, and is otherwise unreliable. Third, Mitchell finds error in the District Court's decision to take judicial notice of the uniqueness of small areas of friction ridge skin. * * *

III. Admissibility of the Government's Expert Testimony

A. Standard of Review

[W]e reject Mitchell's proposed standard of review, and adhere to the usual precepts of abuse-of-discretion review over the District Court's decision to admit the government's expert testimony.

B. Standard for Admissibility under Rule 702

The pathmarking Supreme Court cases interpreting Fed.R.Evid. 702 are Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), and Kumho Tire Co. v. Carmichael, 526 U.S. 137 (1999). The version of Rule 702 in effect at the time of the Daubert hearing and the trial provided:14

If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise.

14. The rule was subsequently amended, effective December 1, 2000, to codify aspects of Daubert and its progeny. The Advisory Committee's note accompanying that amendment is a useful consolidation of commentary and precedent on the version of Rule 702 that applies in Mitchell's case, and so we will refer to it at points in our opinion.

Daubert identified the twin concerns of "reliability" (also described as "good grounds") and "helpfulness" (also described as "fit" or "relevance") as the "requirements embodied in Rule 702." Daubert was "limited to the scientific context because that [wa]s the nature of the expertise offered [t]here," but Kumho Tire extended Daubert's "general principles" to all of "the expert matters described in Rule 702." Thus "technical knowledge," under which heading the discipline of latent fingerprint examination and identification seems to fall, is generally subject to the same considerations as "scientific" expertise.

The "general principles" adverted to in Kumho Tire comprised not only the fundamental concerns of reliability and helpfulness, but also a method for assessing reliability. The Daubert Court articulated "general observations" to this end by offering a nonexclusive list of five factors that a district court might consider in deciding whether to admit evidence under Rule 702. The Advisory Committee summarized these factors:

The specific factors explicated by the Daubert Court are (1) whether the expert's technique or theory can be or has been tested--that is, whether the expert's theory can be challenged in some objective sense, or whether it is instead simply a subjective, conclusory approach that cannot reasonably be assessed for reliability; (2) whether the technique or theory has been subject to peer review and publication; (3) the known or potential rate of error of the technique or theory when applied; (4) the existence and maintenance of standards and controls; and (5) whether the technique or theory has been generally accepted in the scientific community.
Fed.R.Evid. 702 advisory committee's note.

Citing Kumho Tire, the Advisory Committee noted that "[o]ther factors may also be relevant," id., and indeed, courts have augmented this list. In Paoli II we drew on Daubert and our earlier decision in United States v. Downing, 753 F.2d 1224 (3d Cir.1985), to lay out an expanded list of factors:

(1) whether a method consists of a testable hypothesis; (2) whether the method has been subject to peer review; (3) the known or potential rate of error; (4) the existence and maintenance of standards controlling the technique's operation; (5) whether the method is generally accepted; (6) the relationship of the technique to methods which have been established to be reliable; (7) the qualifications of the expert witness testifying based on the methodology; and (8) the non-judicial uses to which the method has been put.
Paoli II, 35 F.3d at 742 n. 8.

These factors address only reliability, and not "helpfulness" or "fit." But the fit inquiry in the case of fingerprint identification is not a significant factor, because identity evidence is the archetypal relevant evidence in criminal cases. Thus, the analysis that follows only addresses the reliability prong of Daubert.

C. Application of Daubert Factors to Government's Expert Testimony

1. Testability

We first consider whether the premises on which fingerprint identification relies are testable—or, better yet, actually tested. "Testability" has also been described as "falsifiability." See, e.g., Daubert, 509 U.S. at 593 (citing Karl R. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge 37 (5th ed.1989)). A proposition is "falsifiable" if it is "capable of being proved false; defeasible." Webster's Third New International Dictionary 820 (unabridged ed.1966). Proving a statement false typically requires demonstrating a counterexample empirically—for instance, the hypothesis "all crows are black" is falsifiable (because an albino crow could be found tomorrow), but a clairvoyant's statement that he receives messages from dead relatives is not (because there is no way for the departed to deny this).

In this case, the relevant premises were posed as explicit questions to many of the government experts: (1) Are human friction ridge arrangements unique and permanent? and (2) Can a positive identification be made from fingerprints containing sufficient quantity and quality of detail? The government's experts responded in the affirmative. We must consider not whether we agree as a factual matter with their responses, see Paoli II, 35 F.3d at 744, but rather whether these hypotheses are testable (or tested). We conclude that they are.

Consider the first premise (which is really two hypotheses in one)--that human friction ridge arrangements are unique and permanent. The uniqueness proposition is testable because it would immediately be shown false upon the production of identical friction ridge arrangements taken from different fingers (either from different fingers on the same person, or from two different people). The uniqueness proposition has also been tested in several ways: First, the full-print matching portion of the FBI's 50/50 experiment tested it and found no true matches.16 Second, studies on identical twins (testified about by Agent German) showed unique fingerprints. While this is a small sample, there are independent and solid genetic grounds for believing that if identical friction ridge arrangements are to be found, they are most likely to be found in identical twins. Third, in the course of routine fingerprint examination, there are certainly opportunities to encounter identical fingerprints; as several witnesses testified, such a discovery would be very notable and word would spread quickly throughout the fingerprint examiner community. Yet no reports of non-unique friction ridge arrangements were introduced, and, indeed, the FBI survey sent to state agencies revealed that none had ever encountered two different persons with the same fingerprint.

16. The experiment had its limitations, though. First, the test sought to match fingerprints, not friction skin arrangements on actual fingers. Second, it was only a sample—50 thousand fingers tested, out of about 60 billion in the world. While this sample size seems quite large, and doubtless would be adequate in many if not most circumstances, we are unsure if it is adequate here. There is limited evidence on the record of why the government's experts chose a 50 thousand fingerprint set, and why they could confidently extrapolate from it. Indeed, there is some suggestion that purely practical technical concerns may have dominated this choice. See infra note 18.

The permanence component of the first hypothesis is also easily testable—simply take fingerprints from an individual at one time and compare them to the prints taken at another time. The Daubert hearing did not provide much evidence of actual testing of this hypothesis, however.

We turn next to the testability of the second hypothesis—that positive identification can be made from fingerprints containing sufficient quantity and quality of detail. Much of the debate in this case is masked by the word "sufficient." For example, a sufficiency standard of "100 points of matching Level 2 detail in an undistorted fingerprint lifted from a clean, smooth surface" would surely attract less objection than a sufficiency standard of "four points of matching Level 2 detail and passable quality." The actual standard employed by any given FBI examiner falls somewhere between these extremes, yet the FBI's reliance on an unspecified, subjective, sliding-scale mix of "quantity and quality of detail" makes meaningful testing elusive, for it is difficult to design an experiment to test a hypothesis with unspecified parameters. Two things rescue fingerprint identification from this apparent failure of testability: First, the examiner can testify to how much detail (quantitative and qualitative) was necessary for the particular identification at issue; and second, any testing directed toward falsifying the premise that a greater or equal amount of detail is sufficient to make an identification will serve as an attempt (albeit an imperfect one) to falsify the adequacy of the identification standard actually used.17

17. A concrete example may provide some clarity. In this case, Agent Meagher identified fourteen points of Level 2 detail (and unspecified supporting Level 3 detail, which we leave aside for simplicity) that matched Mitchell's right thumbprint to the latent print taken from the gearshift knob. Thus, for purposes of this particular identification, "sufficient quantity and quality of detail" really means "fourteen points of Level 2 detail." The hypothesis that "fourteen points of Level 2 detail is enough to make an identification" is falsifiable because one might be able to show that some latent print matches more than one full-rolled print under the "fourteen points of Level 2 detail" standard.
Actual testing (as opposed to mere testability) is harder to come by, probably because someone seeking to falsify this hypothesis has no a priori reason to choose 14 points instead of 13 or 15 as the standard. Nonetheless, any showing that a more stringent standard (e.g., a 20-point standard) is fallible necessarily implies that the 14-point standard is also fallible.

Just how much testing has been done to this end is unclear from the testimony at the Daubert hearing. On the one hand, it might be that examiners compare a latent print to a series of full-rolled prints until a match is found, and then terminate the process. If this protocol is used for routine examinations, those examinations will not tend to turn up multiple matches, because the examiner stops work after finding one match. In essence, the examiner has assumed the conclusion--that no other prints will match the latent, and therefore no further search is required. On the other hand, testimony at the Daubert hearing about the AFIS computer system suggests that the system tests a given latent print against its entire database (or a selected subset) of full-rolled prints, and returns a set of the best candidate matches. This protocol would tend to expose multiple full-rolled prints that match a given latent. Consequently, a lack of multiple matches from AFIS searches can constitute testing of the hypothesis that single positive identifications can be made from latent fingerprints. Whatever the case, no state agency claimed in response to the FBI survey that it had found a latent fingerprint that was "identified with two different fingers of the same person or even different persons." Joint Supp.App. at 55. This is perhaps the strongest support for the government on this point.

Modest support also comes from the second part of the government's 50/50 experiment, which matched simulated latent prints (pseudolatents) against the 50,000 full-rolled prints in the sample under examination. Setting aside spurious results due to mistakes in the FBI's database, the experiment found that each pseudolatent strongly matched one and only one full-rolled print. In other words, the experiment did not reveal any counterexample to the hypothesis that identifications can be made. Moreover, statistical computations extrapolating this to a much larger population of prints suggested that such duplicate matches would still be highly improbable.

Mitchell's experts, however, attacked the design of the 50/50 experiment, most effectively on the ground that pseudolatents are poor approximations of real latent prints.18 This lack of correspondence undermines the utility of the experiment because the issue for Daubert purposes is the testing of the hypothesis that positive identification can be made from actual latent fingerprints containing sufficient detail. As we recount above, Mitchell's experts (particularly Dr. Stoney) convincingly explained why the process used by the government experts to generate the pseudolatents for the 50/50 experiment renders them poor substitutes for actual latent prints. In brief, the failing flagged by Dr. Stoney is that actual prints are subject to distortions and artifacts that were not simulated by the pseudolatent generator. Arguably, the pseudolatents resembled actual latents only in that the former were similar in areal size to the latter. Dr. Stoney's contention rings true: Distorted, real-world latent prints should tend to be harder to match to full-rolled prints than should computer-generated simulated latents. Since the 50/50 experiment did not adequately model real-world conditions, we cannot say that it significantly supports the government's position.

18. They also contended that actual tests on a larger data set (i.e., more fingerprints) would have been preferable to statistical extrapolations. However, significantly larger data sets may be computationally intractable: The experiments conducted for this case took on the order of a day to run on the computer. But for larger sets of fingerprints, the number of comparisons goes up as the second power (i.e., the square) of the number of prints in the sample. Thus, a 1 million / 1 million experiment would take 20 x 20 = 400 times longer than a 50 thousand / 50 thousand experiment--or on the order of a year to complete, given the same computing power. An experiment with the FBI's full AFIS database would take millennia.

In sum, if directed, specific actual testing were the requirement of Daubert, we might be hesitant to find this factor weighing in favor of the government. There is some force to Budowle's point that "[n]o one would say any one test or any kind of thing [that] has been done in one hundred years proves uniqueness." App. 1013a. But his further point about a long history of implicit testing is equally forceful: "It's the culmination of all of the experiences that [demonstrate uniqueness]." App. 1013a. Moreover, testability--which assures the opponent of proffered evidence the possibility of meaningful cross-examination (should he or someone else undertake the testing)--is one of the factors announced by the Daubert Court as an indicium of reliability. In sum, the hypotheses that undergird the discipline of fingerprint identification are testable, if only to a lesser extent actually tested by experience, and so we find this factor to weigh in favor of admitting the evidence.
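
By way of illustration only, the scaling argument in note 18 can be sketched in a few lines of Python. This is a rough sketch and not a model of the FBI's software: it assumes, as the note does, that running time is proportional to the number of pairwise comparisons, and it reads "on the order of a day" as exactly one day for the 50 thousand / 50 thousand baseline. The function name is hypothetical.

```python
def relative_runtime(n_prints: int, baseline_prints: int = 50_000) -> float:
    """How many times longer an n-by-n experiment runs than the baseline,
    assuming run time grows with the square of the number of prints."""
    return (n_prints / baseline_prints) ** 2

factor = relative_runtime(1_000_000)
print(factor)  # 400.0, i.e., 20 x 20

# Assumption: the baseline run takes one day, so the factor is also a day count.
years = factor / 365
print(round(years, 1))  # about 1.1, i.e., "on the order of a year"
```

The same function could be applied to any larger sample size, but the record excerpted here does not state the size of the full AFIS database, so no figure for it is computed.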

2. Peer Review

The evidence at the Daubert hearing on peer review was not particularly extensive. Much of the testimony centered around the question whether the "verification" step in the ACE-V protocol--where a second examiner confirms the identification made by the first examiner--constitutes effective peer review. On the one hand, this could be viewed as stringent peer review, equivalent to the best sort used in, for example, the physical sciences, where peer review most often consists of anonymously reviewing a given experimenter's methods, data, and conclusions on paper. Sometimes the review takes the form of reproducing in full the results under review--that is, a second investigator repeats the entire course of experiments. Thus the verification step of ACE-V seems usually to be akin to this heightened form of peer review: The government's experts testified that verification often amounts to repeating the whole identification process de novo, though sometimes the verifying examiner will merely confirm the match found by the initial examiner. See App. 161a. Moreover, in this particular case, the survey of state law enforcement agencies constitutes verification many times over of the match of Mitchell's fingerprints.

Mitchell's experts, however, (Dr. Cole in particular) cast some doubt on the purity of the verification step. Backed by his research, Dr. Cole suggested that fingerprint examiners have developed an "occupational norm of unanimity" that strongly discourages the verifying examiner from challenging the identification made by the initial examiner. Moreover, Dr. Cole criticized peer review of latent fingerprint identification conclusions for not being anonymous. We also acknowledge that the cultural mystique attached to fingerprint identification may infect the peer review process. But the government's experts countered that they were aware of cases where the results of the verification step caused the initial examiner to withdraw his initial identification. Looking at the entire picture, the ACE-V verification step may not be peer review in its best form, but, on balance, the peer review factor does favor admission.

The peer review factor also encompasses publication, as the dissemination of a work tends to subject it to scrutiny in the same way that prepublication peer review does. See Daubert, 509 U.S. at 593-94. On the one hand, a significant fraction of the publications in the field concern articles on technique—for example, the best practices for preserving latent prints—and such materials say little about the field's reliability. On the other hand, there are articles—introduced both by the government and by Mitchell—that address more theoretical/foundational questions, such as an appropriate minimum point standard, the likelihood of two persons having identical friction ridge arrangements, and so on. Thus the publication facet of peer review is not a strong factor, and neither reinforces nor detracts from our conclusion that the peer review factor favors admission.

3. Error Rate

The parties have waged a considerable battle of experts over whether a known error rate exists for latent fingerprint identification. Assuming that such a rate has been soundly established, it is surely a low rate of error. But the existence of any error rate at all seems strongly disputed by some latent fingerprint examiners.

The question whether an error rate can be established on the existing data is subtler than the parties seem to acknowledge. Preliminarily, we must distinguish between two error rates: false positives and false negatives. In this context, false positives are incorrect affirmative identifications, and false negatives are incorrect findings of dissimilarity. A fair amount of the government's evidence--and also much of Mitchell's response--centers on the existence vel non of failed identifications. For example, the government stresses the large number of state agencies that confirmed its identifications, and Mitchell counters by pointing to the agencies that failed to identify the prints. But these observations go to the rate of false negatives: While a system of identification with a high false negative rate may be unsatisfactory as a matter of law enforcement policy, in the courtroom the rate of false negatives is immaterial to the Daubert admissibility of latent fingerprint identification offered to prove positive identification because it is not probative of the reliability of the testimony for the purpose for which it is offered (i.e., for its ability to effect a positive identification).19

19. Moreover, evidence of the false negative rate is often equivocal. While it might suggest a generally error-prone method, it is equally consistent with a very conservative method with a low false positive error rate. That is, a method may be designed to lower its false positive error rate by accepting a large number of false negatives out of an abundance of caution. One very familiar example of such a system is the criminal jury using the "beyond a reasonable doubt" standard: As the adage (attributed to Blackstone) says, "It is better that ten guilty escape [false negatives] than one innocent suffer [a false positive]." The same may be true for latent fingerprint identification--the examiners who declared they could not match the latent prints in the FBI's survey (the examiners responsible for the putative false negatives) may have done so because they would rather commit a likely false negative error than risk a small chance of a false positive identification.
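
The distinction drawn in note 19 can be made concrete with a short Python sketch. The counts below are invented purely for illustration and are not drawn from the record; the function names are likewise hypothetical. The sketch shows only that a cautious method can have a substantial false negative rate while its false positive rate remains at zero.

```python
def false_positive_rate(false_pos: int, true_neg: int) -> float:
    """Share of true non-matches wrongly declared to be identifications."""
    return false_pos / (false_pos + true_neg)

def false_negative_rate(false_neg: int, true_pos: int) -> float:
    """Share of true matches that the examiner failed to identify."""
    return false_neg / (false_neg + true_pos)

# Hypothetical counts for a very conservative examiner (NOT from the record).
conservative = {"true_pos": 80, "false_neg": 20, "true_neg": 1000, "false_pos": 0}

fpr = false_positive_rate(conservative["false_pos"], conservative["true_neg"])
fnr = false_negative_rate(conservative["false_neg"], conservative["true_pos"])
print(fpr)  # 0.0
print(fnr)  # 0.2
```

On these assumed numbers, one in five true matches goes unidentified, yet no non-match is ever declared an identification, which is the pattern the note describes.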

Thus we must focus on evidence that is probative of the rate of false positives. Perhaps the government's most powerful evidence is the fact that, in the course of the FBI survey of state agencies, no jurisdiction ever matched the latent prints from the gearshift knob and door handle to anyone other than Mitchell himself—despite searches run against (in the aggregate) nearly 70 million ten-print records. Assuming that every record had 10 fingerprints, and that the latents actually were left by Mitchell, the test of the two latent prints against these records implies something on the order of 1.4 billion comparisons resulting in no false positives. The government can also draw support from the very limited number of reports of false positive identifications throughout the many decades that the technique has been in use. Furthermore, the government's 50/50 experiment using pseudolatents, representing 2.5 billion comparisons, also did not register any false positives, though as we have noted, it had flaws.
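
The comparison counts in the preceding paragraph can be checked with simple arithmetic. The following Python sketch is illustrative only: the variable names are ours, the inputs are the round figures stated in the text, and the 50,000-by-50,000 shape of the pseudolatent experiment is inferred from the description of the 50/50 experiment.

```python
# Illustrative arithmetic only; inputs are the round figures stated in the text.
TEN_PRINT_RECORDS = 70_000_000  # "nearly 70 million" records, rounded up
FINGERS_PER_RECORD = 10         # assumption stated in the text
LATENT_PRINTS = 2               # gearshift knob and door handle

survey_comparisons = TEN_PRINT_RECORDS * FINGERS_PER_RECORD * LATENT_PRINTS

PSEUDOLATENTS = 50_000          # inferred from the 50/50 experiment description
FULL_ROLLED_PRINTS = 50_000
experiment_comparisons = PSEUDOLATENTS * FULL_ROLLED_PRINTS

print(f"{survey_comparisons:,}")      # 1,400,000,000
print(f"{experiment_comparisons:,}")  # 2,500,000,000
```

These totals count comparisons, not independent trials, so they bound the scale of the search rather than yield an error rate by themselves.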

Mitchell counters this evidence in two different ways, but neither of them fully refutes the government's evidence. First, he raises a legal challenge, claiming that the burden of proof under Fed.R.Evid. 104(a) is up-ended by effectively requiring him to come forward with examples of false positives. While Mitchell is correct that Rule 104(a) places the burden of proof on the proponent of the evidence (here, the government), see Bourjaily v. United States, 483 U.S. 171, 175 (1987), this does not mean that the burden is static, at least in terms of a burden of going forward. Particularly in a case like this, where what is sought to be proved is essentially a negative (i.e., the absence of false positives), it seems quite appropriate to us to use a burden-shifting framework. Such a framework was applied here: The government's experts--qualified as knowledgeable in matters pertaining to fingerprint identification--testified to their being unaware of significant false positive identifications. At that point, it becomes quite reasonable to shift the burden to the opponent of the evidence (here, Mitchell) to counter this claim with affirmative examples.

Mitchell's second attack on the government's evidence of error rates is factual. He presented evidence that fingerprint examiners sometimes make false positive identifications on proficiency examinations. This evidence is troubling, but we view it as evidence relating only to the competency of those practitioners, leaving undisturbed the government's evidence about the near-absence of false positive identifications.20

20. Mitchell's experts respond by denying the existence of a dichotomy between method error rate and practitioner error rate, asserting that both are part of a unitary inquiry. We reject this view as a legal conclusion inconsistent with Paoli II. Paoli II makes clear that error rates and the qualification of the expert are distinct inquiries. 35 F.3d at 742. The corollary to this, however, raises an issue for any given fingerprint expert: His testimony would be more likely to be admitted (because he would be more qualified) if he himself demonstrated a low rate of false positives in his own work and/or on his own proficiency tests. Cf. Calhoun v. Yamaha Motor Corp., 350 F.3d 316, 322 (3d Cir. 2003) (holding that the scope of an expert's testimony was properly circumscribed by the scope of his expertise).

As suggested above, known false positives have been attributed to malice or incompetence on the part of the examiner, and not to a deeper flaw in the method itself. Dr. Cole testified that this "circling the wagons" behavior is yet another occupational norm of a fingerprint identification community bent on preserving the unimpeachability of its methods. But even if every false positive identification signified a problem with the identification method itself (i.e., independent of the examiner), the overall error rate still appears to be microscopic.

We therefore accept that the error rate has been sufficiently identified to count this factor as strongly favoring admission of the evidence. The error rate has not been precisely quantified, but the various methods of estimating the error rate all suggest that it is very low. This follows from three pieces of evidence we identify above as favoring the government: (1) the absence of significant numbers of false positives in practice (despite the enormous incentive to discover them), (2) the absence of false positives in the FBI's state agency survey, and (3) the statistical computations based on the 50/50 experiment.

4. Maintenance of Standards

Closely related to the question of error rate is the maintenance of standards to guide the application of the method. This is lacking here in some measure. The FBI maintains that its flexibility to consider a mixture of Level 2 and Level 3 detail in making identifications renders its method superior to and more flexible than the minimum-points standards used in some states and various foreign jurisdictions. The tradeoff, though, is that the FBI's method lacks a significant yardstick of standard-based objectivity. In contrast, with a minimum-point standard there is at least some agreement about what constitutes a Galton point and what does not.

Some standards do remain: There are procedural standards (such as ACE-V) and terminological standards (such as the naming conventions for Galton points). But these are insubstantial in comparison to the elaborate and exhaustively refined standards found in many scientific and technical disciplines. As such, we find that this factor does not favor admitting the evidence.

5. General Acceptance

Prior to the adoption of the Federal Rules of Evidence, admission of expert testimony was governed by the Frye test, which required that the evidence must have gained "general acceptance in the particular field in which it belongs." Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923). Daubert held that Congress's adoption of Rule 702 legislatively overruled Frye, but at the same time acknowledged that " 'general acceptance' can yet have a bearing on the inquiry," id. at 594. Thus we consider as one factor in the Daubert analysis whether fingerprint identification is generally accepted within the forensic identification community. The answer is yes, as demonstrated by the results of the FBI's survey of state agencies. Mitchell's only argument with respect to this factor is that there is no scientific community that generally accepts fingerprint identification. But the scientific/nonscientific distinction is irrelevant after Kumho Tire, and accordingly we reject the argument. We also note that the Court of Appeals for the Fourth Circuit, in addressing the same question that we are considering here, relied heavily on general acceptance to support the admission of fingerprint identification evidence. See United States v. Crisp, 324 F.3d 261 (4th Cir. 2003). We likewise conclude that this factor weighs in favor of admitting the evidence.

6. Relationship to Established Reliable Techniques

Although the parties have not provided us with extensive analysis of the relationship of the principles and practice of latent fingerprint identification to " 'more established modes of ... analysis,' " Paoli II, 35 F.3d at 742 (quoting Downing, 753 F.2d at 1238-39), it seems to us that this is the best heading under which to consider the government's evidence from the fields of developmental embryology and anatomy. The testimony and documentary materials introduced on these topics during the Daubert hearing—especially through Dr. Babler—tended to establish biological bases for the uniqueness and permanence of areas of friction ridge skin. Since no question was raised about the soundness and reliability of the work in these specialties, we are comfortable that the reliability of these fields is well-established. Independent work in these fields bolsters the underlying premises of fingerprint identification, and so we find that this factor lends additional support to admitting the latent fingerprint identification evidence.

7. Degree to Which the Expert Testifying Is Qualified

[T]here were essentially no challenges to the qualifications of the government's experts (or of Mitchell's experts, for that matter), but the binary question whether an expert is or is not qualified to testify to a particular subject is analytically distinct, under Rule 702, from the more finely textured question whether a given expert's qualifications enhance the reliability of his testimony. See Schneider ex rel. Estate of Schneider v. Fried, 320 F.3d 396, 407 (3d Cir.2003) ("[The defendant's] argument appears to challenge the qualification of [the plaintiff's expert]; although we note that 'the degree to which the expert testifying is qualified' also implicates the reliability of the testimony." (quoting Paoli II, 35 F.3d at 742)).

The qualifications of Agents Meagher and Johnson matter the most, because they were the government's experts at trial. Both had estimable qualifications. The putative blemish on their qualifications, which we hint at above, see supra note 20, is that neither testified extensively about his own known error rate as a practitioner (as might be revealed, for example, by proficiency tests they had taken). While this is by no means fatal to the admissibility of the testimony, prosecutors would be well-advised to elicit testimony about their experts' personal proficiency, rather than relying on the discipline's good general reputation among lay jurors. Failing that, we are confident that defense counsel will use cross-examination to expose incompetent fingerprint examiners. In this case, Agent Meagher's uniquely strong qualifications and the confirmatory identifications from state agencies are a surrogate for testimony about Agent Meagher's and Agent Johnson's personal proficiency as examiners.21 Thus this factor supports admitting the government's evidence.

21. Mitchell's counsel came close to inquiring on voir dire about Agent Meagher's results on proficiency examinations administered internally by the FBI, but did not actually ask a specific question. App. 1456a-1457a. The government did ask Agent Johnson about his results on FBI proficiency examinations, but defense counsel objected and the Court sustained the objection on the ground that Johnson had already been qualified as an expert. App. 1652a-1653a. As our discussion in the text suggests, this question was proper—even desirable—and the District Court was wrong to sustain the objection.

8. Non-Judicial Uses

We have recognized that evidence of the non-judicial uses of the technique in question is relevant to the Daubert reliability inquiry. See Paoli II, 35 F.3d at 742. This is because non-judicial use of a technique can imply that third parties—i.e., persons other than the proponent of the expert testimony, for whom the testimony is typically self-serving—would vouch for the reliability of the expert's methods.22 The government offered some evidence of the non-judicial uses of fingerprint identification, particularly through Dr. Budowle. In analyzing this factor, the government relies on three categories of non-judicial uses of fingerprints: (1) the identification of arrested persons (e.g., checking an arrestee's record at the time of booking); (2) biometric identification as a security measure (e.g., authenticated access to a computer system) or for regulatory purposes (e.g., fingerprinting for driver licensing as an anticounterfeiting measure); and (3) identification of partial remains following disasters. While at first blush this seems like a factor strongly supporting admissibility, the bloom recedes upon close analysis.

22. Keeping this rationale in mind is helpful, because some non-judicial uses will support the required inference of third-party confidence better than others. For example, no one would argue that the commercial popularity of astrology for non-judicial use makes it fit for admission under Rule 702. This case may provide another example: As we discuss below, the government introduced evidence of the widespread commercial use of biometric identification technology based on fingerprints. It is possible that commercial adoption of the method signals acceptance of its reliability. But, as Mitchell's uncontradicted survey evidence showed, fingerprint identification enjoys a near-mythical reputation for reliability, and so the evidence of commercial adoption is equally consistent with uncritical acceptance of a method that consumers merely believe—but do not know—to be reliable.

Latent fingerprint identification works from fingerprints that are partial and subject to distortions. All the nonjudicial uses listed above either use full-rolled prints, or avoid the difficulties introduced by distortion—or both. Both differences are critical, as Mitchell's experts testified and as the government's experts acknowledged: It is significantly easier to match one clean full-rolled print to another than it is to match a somewhat distorted latent fragment to a full-rolled print.23 Thus, in the case of identification of arrestees, the booking officer will take a ten-print card with a full set of full-rolled prints, and if the prints do not come out cleanly, the officer has the opportunity to take a second set of impressions. Likewise, the security and regulatory uses of fingerprinting generally rely on clean, full-rolled prints.24 As for disaster-victim identification, the government's experts did testify that fragments of friction ridge skin have been used to make identifications, but even those identifications still differ from latent fingerprint identification because identification using actual skin eliminates the challenges introduced by distortions.25 Thus there is less here than meets the eye, and while this factor supports admitting the government's evidence, it does so only weakly.

23. The government's experts implicitly acknowledged this—even before the Daubert hearing—in the very design of the 50/50 experiment: The first stage of that experiment was the matching of full-rolled prints to full-rolled prints, but the ultimate aim of the experiment was to test pseudolatent prints against full-rolled prints to better simulate the more demanding exercise of latent fingerprint identification. Of course, as we have noted above, even this refined experiment used pseudolatents, and thus failed to capture the complexities of matching latent prints marred by distortions and artifacts.

24. Dr. Budowle testified that current commercial research and development seeks to use as little as 6% of the area of the full print to make an identification. This makes such a technique more akin to latent fingerprint identification, but it still differs in significant ways. First, the fraction of the print will be distortion-free, unlike actual latent prints. Second, the 6% portion is likely to be taken from a portion of the finger with a high areal density of Level 2 detail, a luxury that latent fingerprint examiners do not have.

25. We also understand the task in disaster-victim identification as being (merely) to individualize one victim out of at most a few thousand victims, while forensic criminal identification seeks to individualize the defendant out of a pool of millions of potential perpetrators. Accordingly, there seems to be less of a threat of a false positive in the context of disaster-victim identification than in forensic criminal identification.

D. Application to the Record of Core Daubert Principles

Although it is clear from the foregoing analysis of the Daubert factors that the government's fingerprint evidence passes muster, Mitchell contends that the government's inability to establish that its evidence is correct, and its failure to show that its evidence meets the standards required of "science," mean that the government's evidence must be excluded. Mitchell is wrong. This is established by Daubert itself, which requires no more than that the Court satisfy itself that "good grounds" exist for the expert's opinion.

Judge Selya has put it well:

Daubert does not require that a party who proffers expert testimony carry the burden of proving to the judge that the expert's assessment of the situation is correct. As long as an expert's scientific testimony rests upon "good grounds, based on what is known," it should be tested by the adversary process—competing expert testimony and active cross-examination—rather than excluded from jurors' scrutiny for fear that they will not grasp its complexities or satisfactorily weigh its inadequacies. In short, Daubert neither requires nor empowers trial courts to determine which of several competing scientific theories has the best provenance. It demands only that the proponent of the evidence show that the expert's conclusion has been arrived at in a scientifically sound and methodologically reliable fashion.
Ruiz-Troche v. Pepsi Cola Bottling Co., 161 F.3d 77, 85 (1st Cir.1998) (citations omitted) (quoting Daubert, 509 U.S. at 590) (citing Kannankeril v. Terminix Int'l, Inc., 128 F.3d 802, 806 (3d Cir.1997); Paoli II, 35 F.3d at 744), quoted in part in In re TMI Litigation, 193 F.3d at 692. Good grounds for admission plainly exist here.

To the extent that Mitchell's attack rests on his experts' claim that latent fingerprint examiners do not engage in "science," he does not heed the text of Rule 702 or the Supreme Court's teachings in Kumho Tire. Rule 702 "makes no relevant distinction between 'scientific' knowledge and 'technical' or 'other specialized' knowledge." The very holding of Kumho Tire is that those categories simply address what type of testimony is covered by the rule, and that, in addressing admissibility under Rule 702, the same factors generally apply to all categories of expert testimony. Kumho Tire explicitly rejected as unworkable and unnecessary any "distinction between 'scientific' knowledge and 'technical' or 'other specialized' knowledge." That a particular discipline is or is not "scientific" tells a court little about whether conclusions from that discipline are admissible under Rule 702; at best, there will be some overlap between the factors that bear on a field's status as "science" and Daubert's factors addressed to reliability. Reliability remains the polestar.

Mitchell seeks a significantly higher threshold of admissibility under Rule 702, and, consequently, a very different allocation of responsibility between judge and jury. Yet Rule 702 and Daubert put their faith in an adversary system designed to expose flawed expertise. Mitchell misconceives this balance struck by the framers of Rule 702 and the Daubert Court. As the Advisory Committee explained in the context of the December 1, 2000 amendment to Rule 702, "Daubert did not work a 'seachange over federal evidence law,' and 'the trial court's role as gatekeeper is not intended to serve as a replacement for the adversary system.' " Daubert itself emphasized the point: "Vigorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are the traditional and appropriate means of attacking shaky but admissible evidence." These trial practices and procedural devices like the directed verdict, "rather than wholesale exclusion under an uncompromising ... test, are the appropriate safeguards where the basis of scientific testimony meets the standards of Rule 702." We echoed this in Paoli II, where we noted "Rule 702 mandates a policy of liberal admissibility."

In this context, the court is often referred to as a "gatekeeper." This metaphor is particularly apt because it works two ways: On the one hand, the court must exclude some evidence as a gatekeeper, by "preventing opinion testimony that does not meet the requirements of qualification, reliability and fit from reaching the jury," Schneider, 320 F.3d at 404. But on the other hand, the court is only a gatekeeper, and a gatekeeper alone does not protect the castle; as we have explained, "[a] party confronted with an adverse expert witness who has sufficient, though perhaps not overwhelming, facts and assumptions as the basis for his opinion can highlight those weaknesses through effective cross-examination." Stecyk v. Bell Helicopter Textron, Inc., 295 F.3d 408, 414 (3d Cir.2002).

Indeed, as our discussion of the various Daubert factors suggests, many of them are guarantees that cross-examination and adversary testing will be possible: Testability ensures the basic possibility of meaningful cross-examination. Peer review and publication also provide raw material for the cross-examining attorney to confront the expert with. The existence of a known error rate may force an expert to admit to the limitations of his or her methods. The maintenance of standards provides an objective benchmark to confirm that the expert did indeed follow her method. And so on. Since these factors were well-satisfied in this case, it was with confidence that the baton was passed from the Court to the adversary system.

The principle that cross-examination and counter-experts play a central role in the Rule 702 regime has three important applications to this case. First is the core holding of United States v. Velasquez, 64 F.3d 844, 848-49 (3d Cir.1995): Experts with diametrically opposed opinions may nonetheless both have good grounds for their views, and a district court may not make winners and losers through its choice of which side's experts to admit, when all experts are qualified. Rather, the same standards of reliability and helpfulness should be applied to both sides, with a "'preference for admitting any evidence having some potential for assisting the trier of fact.'" We return to this in the next section, where we discuss the District Court's handling of Mitchell's experts.

Second, district courts will generally act within their discretion in excluding testimony of recalcitrant expert witnesses—those who will not discuss on cross-examination things like error rates or the relative subjectivity or objectivity of their methods. Testimony at the Daubert hearing indicated that some latent fingerprint examiners insist that there is no error rate associated with their activities or that the examination process is irreducibly subjective. This would be out-of-place under Rule 702. But we do not detect this sort of stonewalling on the record before us.

Third, this case does not announce a categorical rule that latent fingerprint identification evidence is admissible in this Circuit, though we trust that the foregoing discussion provides strong guidance. And as we explain in Velasquez, both Rule 702 and the Sixth Amendment's Confrontation Clause permit any criminal defendant to put the prosecution to its proof at trial. None of this, however, should be read to require extensive Daubert hearings in every case involving latent fingerprint evidence. The Supreme Court has emphasized that district courts "have the same kind of latitude in deciding how to test an expert's reliability" as they do in deciding "whether or not that expert's relevant testimony is reliable." Kumho Tire, 526 U.S. at 152. Thus a district court would not abuse its discretion by limiting, in a proper case, the scope of a Daubert hearing to novel challenges to the admissibility of latent fingerprint identification evidence—or even dispensing with the hearing altogether if no novel challenge was raised.

E. Conclusion on the Admissibility of the Government's Evidence

We conclude, on the record before us read in light of the basic Daubert principles, that most factors support (or at least do not disfavor) admitting the government's latent fingerprint identification evidence. There are good grounds for its admission. We therefore conclude that the District Court did not abuse its discretion in holding the government's evidence admissible.

* * *

V. The District Court's Declaration of Judicial Notice

We next turn to the question whether the District Court properly took judicial notice that "human friction ridges are unique and permanent throughout the area of the friction ridge skin, including small friction ridge areas, and that ... human friction ridge skin arrangements are unique and permanent." App. 1472a. "[A] court's decision whether to take judicial notice of certain facts is reviewed for abuse of discretion." In re NAHC, Inc. Sec. Litig., 306 F.3d 1314, 1323 (3d Cir.2002).

A. Appropriateness of Judicial Notice

Federal Rule of Evidence 201(b) specifies what matters are the proper subject of judicial notice:28

A judicially noticed fact must be one not subject to reasonable dispute in that it is either (1) generally known within the territorial jurisdiction of the trial court or (2) capable of accurate and ready determination by resort to sources whose accuracy cannot reasonably be questioned.

28. Rule 201 also provides that a party be "heard as to the propriety of taking judicial notice," Fed.R.Evid. 201(e); Mitchell was heard in the course of the Daubert hearing. Further, the Rule requires that "[i]n a criminal case, the court shall instruct the jury that it may, but is not required to, accept as conclusive any fact judicially noticed," Fed.R.Evid. 201(g), a caveat that the Court included in the jury instructions.

The actual phrasing offered by the government and adopted by the District Court is opaque; while we can comprehend the notion that friction ridge arrangements are permanent, we are unsure what it means to describe "arrangements," considered in the abstract, as "unique." On one level, this seems irrelevant: Since the issue at trial was latent fingerprints, it is difficult to see how general propositions about "arrangements" are related to any "fact that is of consequence to the determination of the action," Fed.R.Evid. 401. Moreover, "small friction ridge areas" seems problematic—what is "small"? (In light of the issues at trial, we imagine that it was a reference to areas the size of typical latent fingerprints.) Even without reference to the substantive standard in Rule 201(b), we wonder whether the very phrasing of the judicially noticed material signals that the District Court erred.

Vagueness and irrelevance aside, judicial notice of these matters clearly failed Rule 201(b). The Rule requires that the matter "not [be] subject to reasonable dispute." Yet much of Mitchell's presentation at the Daubert hearing was directed at disputing this very proposition;29 if the question merited such an extensive Daubert hearing, it surely was not suitable for resolution by judicial notice. Moreover, Rule 201 speaks in terms of "fact[s]." Here, the Court took judicial notice of a scientific conclusion—something which is subject to revision—not a "fact."30 One of the purposes of a Daubert hearing is to educate the Court as to the relevant expertise. That the Daubert hearing consumed five days before the Court could take judicial notice only further compels the conclusion that this "fact" was neither "generally known" nor "capable of ... ready determination."

29. One of Mitchell's own experts, Dr. Stoney, did agree, however, that small areas of friction ridge skin are unique.

30. The distinction implied by Rule 201(b)'s use of "fact" can be made clearer by the use of more polarized examples: Matters like "February 7, 1977 was a Monday" (a fact) are suitable for judicial notice, while propositions like "daily exercise reduces the likelihood of heart disease" (a scientific conclusion) are not.

The government's defense of the District Court's taking of judicial notice focuses on the large number of cases where courts have taken judicial notice of the uniqueness of fingerprints. None of the cases cited by the government is binding on this Court. More to the point, none of them concern judicial notice of the uniqueness and permanence of "small areas" of friction ridge skin—rather, the cases generally concern the uniqueness of full fingerprints, or the method of fingerprint identification. While we have doubts about the propriety of taking judicial notice even in those cases (one need only look at our Daubert analysis above to see that the matter is in dispute), for present purposes we need only note that the cases cited by the government are clearly distinguishable. Thus we conclude that it was error for the Court to take judicial notice as it did.

B. Harmless Error Analysis

Having concluded that it was error for the District Court to take judicial notice as it did, we must consider whether the error was harmless. Under our precedent, an error is harmless if " 'it is highly probable that the error did not contribute to the judgment.' " United States v. Davis, 183 F.3d 231, 255 (3d Cir. 1999). We conclude that the error was harmless.

* * *

VIII. Conclusion

The judgment of the District Court will be affirmed.