D.H. Kaye(1)

This article is published in the Arizona State Law Journal, Vol. 28, No. 4, 1997, pp. 1035-1077

Eight years ago, it was celebrated as "the single greatest advance in the 'search for truth,' and the goal of convicting the guilty and acquitting the innocent, since the advent of cross-examination."(2) Six years ago, it was endorsed by the U.S. Congress's Office of Technology Assessment.(3) Four years ago, it received a mixed blessing from the National Academy of Sciences.(4) Three years ago, the Arizona Supreme Court held that it was erroneously admitted into evidence in a brutal child sex-murder case.(5) Last year, it was the centerpiece of the notorious murder case against O.J. Simpson. This year, the National Academy issued a second report on it,(6) and the Arizona Supreme Court revisited it.(7)

The object of these conflicting opinions is, of course, DNA evidence. DNA, or deoxyribonucleic acid, is a complicated molecule. Scientists would love to learn the precise details of how the DNA of a single human being is arranged. These details determine the human genome--all the genetically inherited characteristics that a person possesses. Forensic scientists are satisfied, as they must be, with less. Commercial laboratories run by companies with names like Lifecodes and Cellmark Diagnostics, and crime laboratories operated by the FBI and the Arizona Department of Public Safety use the tricks of molecular biology to compare tiny, but extremely revealing, fragments of DNA from crime scenes to fragments derived from suspects to see whether they "match."

Before the 1993 opinion in State v. Bible,(8) the lower courts in Arizona were not quite sure what to do about DNA evidence.(9) After reading Bible, they remain adrift. Decisions bobbing every which way in the wake of Bible(10) are the flotsam of an opinion that could not hold water. This article describes some of the defects in that opinion. It shows that the court's analysis was flawed by the failure to come to grips with the scientific debate about the population genetics of DNA profiles. It then discusses the court's effort to reread Bible in State v. Johnson.(11)

The article shows that although Johnson is a step in the right direction, limitations in the court's understanding of the scientific issues render unconvincing Johnson's responses to some of the criticisms of DNA testing that swayed the Bible court. It proposes alternative reasoning that would support the result in Johnson, and it discusses procedures that might improve opinions in cases that turn on scientific evidence.

I. Bible and Its Aftermath in the Court of Appeals

In State v. Bible, the prosecution introduced evidence of a match to buttress an already powerful case. A nine-year-old girl bicycling to a ranch in Flagstaff disappeared, and her battered body was found hidden in the woods three weeks later. The day she disappeared, police saw Richard Lynn Bible driving a stolen car that matched the description of a car driving away from the vicinity in which the girl was last seen. A high speed car chase ensued. Bible fled from the car. Police tracked him with dogs. They found him hiding under a ledge and covered with twigs, leaves, and branches. Numerous items in the stolen car -- vodka bottles, cigars, hot chocolate, rubber bands, even metal from the steering column -- matched items later found strewn about the girl's decomposing corpse. Fibers of hair on the defendant's clothing and wallet and in the car matched the girl's hair. Hair fibers near the little girl's corpse matched the defendant's hair. Blood was spattered across defendant's boots, pants and shirt. Bloodstains on the shirt contained the same enzyme as the girl's blood -- an enzyme found in under 3% of the population. The pièce de résistance in this surfeit of circumstantial evidence came when Cellmark Diagnostics reported that DNA in the blood on Bible's shirt matched the girl's DNA and that this particular DNA was exquisitely rare in the Caucasian population, occurring with a relative frequency between 1/60 million and 1/14 billion.

This frequency is also the probability that a randomly selected person will have the DNA profile; consequently, it often is called the random match probability.(12) Cellmark's estimate was produced by what I shall call the simple product rule -- frequencies (found in a particular convenience sample of the Caucasian population) of the individual characteristics that composed the DNA profile, along with suitable coefficients, were multiplied together.(13)

Before trial, the defense objected to expert testimony about this estimate. The court held a pretrial hearing and concluded that Cellmark's methods of DNA identification were generally accepted in the scientific community. It admitted the laboratory findings and calculations as well as opposing testimony from a geneticist retained by the defense. This, the Arizona Supreme Court held, was error.(14) Adhering (with newfound reservations)(15) to the standard for admitting scientific evidence applied by the trial judge,(16) the supreme court relied on articles in Science magazine, news accounts, and cases in other jurisdictions to find "a lack of general acceptance of Cellmark's statistical probability calculations in the relevant scientific community of population geneticists."(17) In essence, the court concluded that the principles and methods employed by Cellmark to declare a DNA match were scientifically accepted,(18) but that Cellmark's procedure for calculating the frequency with which matching profiles are present in the general population was not accepted. Rather than decide what to do about this computational gap, the court simply told the lower courts not to admit the numbers. With Olympian detachment (or perhaps Delphic inspiration), the supreme court left it to the lower courts to figure out whether an expert could testify to the bare fact of a match, or could go further to offer an opinion about the significance of the match that did not rest on the controverted computational procedure.(19)

And the lower courts have certainly tried. The first post-Bible case to generate a court of appeals opinion, State v. Clark,(20) is a brief and apparently straightforward application of Bible's rejection of random match probabilities obtained with the simple product rule. A jury found Kevin James Clark guilty of attempted first-degree murder, kidnapping, first-degree burglary, and two counts of sexual assault, all stemming from a brutal attack on a woman in her Tempe apartment in 1990. Cellmark determined that the victim's DNA matched DNA found on a glove and a scarf seized from Clark's apartment and that Clark's DNA matched DNA found on the glove. According to the court of appeals, at Clark's trial, a "molecular geneticist" testified that "the chance that the DNA on the scarf was that of someone other than the victim [w]as one in 87 million; the chance that the DNA on the glove was that of someone other than the victim [w]as one in 6.9 million; and the chance that the other DNA on the glove was that of someone other than Clark [w]as one in 10 million."(21) Ignoring this mischaracterization of the evidence,(22) the court of appeals summarily held that "[u]nder . . . Bible, the trial judge erred by permitting the State to introduce Cellmark's calculations regarding the probability of a random match."(23)

State v. Bogan(24) is more interesting -- so much so that it has received national attention. One Sunday morning, a boy riding his dirt bike through a dry wash in the desert saw the nude body of a woman, lying face down in the brush near a cluster of palo verde trees. She had been strangled to death. A man in the vicinity volunteered that he had seen a white truck leave the area "pretty quick" at about 1:30 that morning. The police found a pager a few feet from the body. It was registered to Earl Bogan, but used primarily by his son, Mark, who drove a white pickup truck and lived about 18 minutes from the scene. In the bed of the truck, police found two seed pods from a palo verde tree. Still other evidence suggested that Mark Bogan was the culprit. Bogan maintained that a female hitchhiker had "swiped" his pager from the truck and run away. He denied having been in the area where the body was found.

An enterprising detective observed that one of the palo verde trees -- later designated as "PV-30" -- had a fresh abrasion on one of its lower branches. He contacted Dr. Timothy Helentjaris, a professor of molecular genetics at the University of Arizona, who compared DNA from the seed pods found in the truck with the DNA in seed pods from the palo verde trees at the crime scene. He also analyzed DNA from other palo verde seed pods collected at various sites around the county. He concluded that the seed pods found in the truck originated from PV-30.

On appeal from the resulting murder conviction, Bogan argued that the professor's testimony should have been excluded under Bible. During a Frye hearing, Dr. Helentjaris had testified that the odds of a random match were one in a million, but the trial judge ruled that estimate to be inadmissible. Instead of giving his calculation, Dr. Helentjaris testified at trial that the samples from the truck bed "matched completely with ... PV-30," that he felt "quite confident in concluding that these two samples ... most likely did come from [PV-30]," and that he was "quite comfortable" in concluding that PV-30's DNA would be distinguishable from that of "any tree that might be furnished" to him. Reasoning that forensic scientists routinely testify about "matches" in hair, fingerprints, and other items without giving statistics, and that there was no disagreement about the general acceptance of the laboratory techniques used to ascertain the DNA "match" here, the court of appeals affirmed the conviction.(25)

In contrast to Bogan, the court of appeals in State v. Hummert(26) overturned a conviction where the state's experts testified to an apparently unique match. At 3:30 a.m., a man accosted a nineteen-year-old woman as she was getting out of her car. With a handgun to her head, he forced her into the yard of a nearby house, where he raped, choked, and struck her when she resisted. He drove off in a red Honda CRX with a grey out-of-state license plate with black lettering on it, bearing the numbers 939. Steven Hummert owned just such a car, and there was evidence that he had contrived to create a false alibi. His pubic hair matched one of four hairs taken from the victim's underpants. Blood group testing was inconclusive, but DNA from a semen stain matched the DNA taken from Hummert.

The trial court excluded estimates of the probability of random match, but allowed testimony as to the match itself. An FBI examiner testified a match meant that "[e]ither you're brothers, identical twins, or that would be a very unique experience." Another expert testified that "one can, by carefully choosing particular parts of the DNA that vary a lot between people, uniquely identify every person with just a sample of each person's DNA."(27)

The court of appeals read Bible as permitting testimony that DNA analysis did not exclude a defendant, but it reasoned that here, "[i]n the absence of generally accepted population frequency statistics for determining the probability of a random match, the experts' opinions overstated the significance of the DNA test results." However, the opinion added that "[b]y informing the jury that a match over three probes was a 'rare' event or that a DNA match could 'uniquely identify' an individual, the experts effectively conveyed to the jury, at least implicitly, the random match probability statistics found inadmissible by the trial court." Inasmuch as there seems to be general scientific agreement that multiple single-locus VNTR matches are rare, the first part of this dictum is problematic.(28)

The court of appeals reiterated this dubious dictum in State v. Boles.(29) Timothy Roosevelt Boles was convicted of multiple counts of burglary, kidnapping, sexual assault, sexual abuse, sexual conduct with a minor, and child molestation, involving four victims in neighboring apartment complexes. The evidence against Boles included pubic hairs, sneaker prints, and DNA samples linking him to two victims. In this case, the court of appeals emphasized that "the state's experts offered opinions to the effect that it was highly unlikely that someone other than defendant was the source of both samples." The state's two experts and the defendant's expert all testified that they had never seen or heard of two unrelated individuals whose DNA profiles matched over five probes. Indeed, the state's principal expert went so far as to say that to find such a match would require a sample size equal to or greater than the world population. Once again, the court of appeals reversed the conviction, holding that testimony tantamount to uniqueness is not only inadmissible, but also so fundamental an error as to require reversal even without an objection to the testimony. The court reasoned that without a generally accepted method of computing the random match probability, Bible bars testimony suggesting that a profile is unique or even unusual.(30)

The final case in this little collection of criminal horrors is State v. Johnson.(31) A man attacked and sexually molested the owner of a restaurant in Sierra Vista as she entered the restaurant one morning. He then drove away in a small, red car. Two women in the town reported that Robert Johnson, who drove a small red car, was wearing clothing that morning that the victim had noticed on her attacker. DNA from Johnson was indistinguishable from DNA left by the victim's attacker. An analyst from the Arizona Department of Public Safety testified at trial that the probability of a random match was no greater than 1/312 million. The court of appeals found no error in the introduction of this estimate, and it affirmed Johnson's conviction. It distinguished Bible on the ground that Bible merely precludes estimates obtained with the simple product rule, while the estimate here was obtained with the "ceiling principle" then recommended for courtroom use by a committee of the National Academy of Sciences.(32)

One thing is clear -- Bible has left the lower courts adrift. The various Arizona Court of Appeals opinions hold that the following testimony regarding the significance of a "match" is inadmissible: simple product rule estimates; expert opinions that human DNA profiles are unique; and analysts' statements that they have never seen two matching DNA profiles of unrelated people. On the other hand, experts can report that tree DNA profiles are unique, and they can estimate profile frequencies with the ceiling method. Beyond this, the Court of Appeals has stated in dicta that even opinions that a human DNA profile is "rare" are inadmissible.

Given this chaotic state of affairs, one would hope that the Supreme Court would step in to clarify matters. And step in it has. The court granted review in almost all these cases,(33) and it has issued an opinion in Johnson. That opinion purports to "address[] . . . those [questions] left open by State v. Bible," but this is more a statement of the court's aspirations than of its accomplishments. I turn now to a detailed look at the analyses in Bible and Johnson to determine what they decide, what they leave open, and whether the court's reasoning supports its conclusions. My review reveals that the court has a ways to go before it finishes building a bridge over the troubled waters raised in Bible.

II. Bible: Mistakes and Unanswered Questions

In Bible, the Arizona Supreme Court joined the ranks of courts that expressed -- and acted on -- concerns about the simple product rule estimates of the probability of a random match.(34) Unlike most courts, however, it discerned two other apparently grievous flaws in the calculations -- the use of convenience rather than random samples, and an allegedly flawed database. I begin by explaining the product rule, the criticism of it, and the other concerns voiced in Bible. I show that even at the time, none of the criticisms justified exclusion of the evidence.

A. Bible's Treatment of the Simple Product Rule

The DNA analysis in Bible involved "VNTR loci." A locus is a location on a chromosome, and a chromosome is a bundle of DNA wrapped in proteins inside the nucleus of a cell. The genetic information in DNA results from the order in which certain components, known as base pairs, are arranged.(35) VNTR loci are extremely variable among individuals -- different people tend to have different base pair sequences at these points.(36) The differences give rise to variations in the lengths of DNA fragments found after laboratory workers apply a bacterial enzyme to "digest" the DNA. Technicians then apply a chemical (called a "probe") that binds only to the DNA fragments from the VNTR loci.(37) In the most commonly used method of detecting the VNTR "alleles" (fragments of different lengths), the probe molecules are radioactive, causing the alleles to show up as bands on a photographic film called an "autoradiogram." The positions of the bands on the autoradiogram reflect the lengths of the VNTR fragments. Figure 1 depicts some steps in the process of producing an autoradiogram of the alleles at a single locus.

Figure 1. Schematic portrayal of some major steps in single locus RFLP profiling. In step 1, all the "raw" DNA in the cells in the sample is extracted, including many copies of the two duplex strands of the DNA from the two homologous chromosomes per cell that the single-locus test probe will characterize. In step 2, the many duplex DNA strands are treated chemically or heated to separate the strands. In step 3, restriction enzymes "digest" the long, single strands into shorter fragments by cutting the strands at restriction sites. In step 4, the many single-stranded DNA fragments are separated by length on an electrophoretic gel, then transferred to a nylon membrane for ease in handling. In step 5, "probes" designed to bind to a specific base-pair sequence are added to the restriction fragments, marking those pairs with the target sequence. In step 6, the fragments to which the probes have become bound are photographed on an autoradiogram; the many other fragments, which do not contain the sequence to which the probe is sensitive, are not seen.

Most people have two distinct alleles at a given locus; one allele comes from the chromosome inherited from the father, and the other comes from the chromosome inherited from the mother. Such "heterozygotes" have two bands on an autoradiogram for a probe at a single locus. This is the situation depicted in Figure 1. Sometimes, however, the mother and father have the same alleles at a particular locus on the chromosomes that they pass on to their child. The resulting "homozygous" individual shows only one band -- because the fragments from both chromosomes are the same length.(38) The one or two bands at a single locus are known as a single-locus profile or genotype. The set of all bands seen at all loci is known as a multiple single-locus profile or, more simply, a multilocus profile or genotype.(39) Figure 2 shows an autoradiogram with the single-locus profiles of 12 individuals. One or two bands can be seen in each vertical strip of the picture. The twelve DNA samples are easily distinguished from one another. Some individuals have only one band; others have two; and in no case do all the bands from one individual line up with all the bands from another person.

Figure 2. Single locus VNTR profiles at one locus for 12 individuals. Source: FBI, 1988 (as reproduced in Office of Technology Assessment, Genetic Witness: Forensic Uses of DNA Tests 47 (1990))

The simple product rule estimates the expected frequency of genotypes in a population of individuals who choose their mates and reproduce independently of the alleles. Although population geneticists describe this situation as "random mating," these words are terms of art. People do not choose their mates by a lottery, but "random mating" merely means that the choices are uncorrelated with the specific alleles that make up the genotypes in question. We shall see that the court in Bible and especially Johnson did not appreciate the meaning of random mating, but we are getting ahead of the story.

The Bible court begins its description of the simple product rule as follows:

"Cellmark uses the 'product rule' -- sometimes called the 'multiplication rule' -- to make its random match [probability] determination. This rule is described as follows: 'Suppose, for example, that a pair of DNA [samples] match on two bands, and that one band reflects an allele found in ten percent of the population and the other an allele found in fifty percent of the population. Applying the product rule, an analyst would conclude that the probability of a coincidental match on both alleles is 0.10 x 0.50 = 0.05, or a five percent probability.' The 0.05 result in this example means that there was a one in twenty probability of a random match (leaving a nineteen in twenty chance that the samples came from the same person). The validity, and corresponding accuracy, of the product rule depends on the presence, or absence, of several factors."(40)

This describes the calculation of a single-locus genotype, but it misstates the product rule in two respects. First, the probability of a random match at this locus is not 5%, but 10%. Table 1 shows why.

     Table 1. Expected single-locus genotype proportions under
     random mating. Ten percent of the sperm in the gene pool 
     carry allele 1 (A1), and 50% carry allele 2 (A2). 
     Similarly, 10% of the eggs carry A1, and 50% carry A2. 
     With random mating, we expect 5% of the fertilized eggs to be (A1,A2) 
     and another 5% to be (A2,A1). Both configurations produce identical 
     autoradiograms -- a band for A1 and another band for A2. So the 
     expected proportion of heterozygotes A1A2 is 5% + 5% = 10%.

                       Allele 1 (10%)      Allele 2 (50%)
     Allele 1 (10%)    10% x 10% = 1%      10% x 50% = 5%
     Allele 2 (50%)    50% x 10% = 5%      50% x 50% = 25%

More generally, when the frequency of two alleles is $p_1$ and $p_2$, the single-locus genotype frequency for the corresponding heterozygotes in a randomly mating population is expected to be $2p_1p_2$. The single-locus genotype frequency for the corresponding homozygotes is expected to be $p_1^2$ and $p_2^2$ (1% and 25% in Table 1). These proportions are known as "Hardy-Weinberg equilibrium" proportions. Even if two populations with distinct allele frequencies are thrown together, within the limits of chance variation, random mating produces Hardy-Weinberg equilibrium in a single generation.
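The Hardy-Weinberg arithmetic behind Table 1 can be checked with a few lines of Python. This is only a minimal sketch: the function name `hardy_weinberg` is an illustrative assumption, and the two allele frequencies are those of the court's example (the other alleles at the locus, which make up the remaining 40%, are simply left out).

```python
from itertools import product


def hardy_weinberg(allele_freqs):
    """Expected genotype proportions under random mating.

    allele_freqs maps an allele name to its frequency in the gene pool.
    The result maps an unordered pair of alleles to its expected proportion.
    """
    genotypes = {}
    # Pair every sperm allele with every egg allele, as in Table 1.
    for (a, pa), (b, pb) in product(allele_freqs.items(), repeat=2):
        key = tuple(sorted((a, b)))  # (A1, A2) and (A2, A1) look identical
        genotypes[key] = genotypes.get(key, 0.0) + pa * pb
    return genotypes


# Frequencies from the court's example: A1 at 10%, A2 at 50%.
table1 = hardy_weinberg({"A1": 0.10, "A2": 0.50})
for genotype, proportion in sorted(table1.items()):
    print(genotype, round(proportion, 4))
```

The heterozygote entry sums the two off-diagonal cells of Table 1, which is why it comes to 10% rather than the 5% in the court's description.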

Second, even if the 5% were the Hardy-Weinberg A1A2 proportion, it would be wrong to say, as the court does, that "the 0.05 result in this example means that there was a one in twenty probability of a random match (leaving a nineteen in twenty chance that the samples came from the same person)." At best, the 1 in 20 figure merely means that 1/20th of the population will be A1A2, or, to put it another way, that there is a probability of 0.05 that a randomly selected person will be A1A2. The probability that two samples that have the same single-locus genotype A1A2 come from different people is not necessarily 0.05. If there are 100 plausible suspects, then 5 of them would be expected to have the incriminating A1A2 genotype. Knowing nothing else about the suspects, one might well think that the probability that the incriminating sample came from the one suspect who was tested first is 1/5 rather than 19/20.(41)
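The gap between a genotype frequency and the probability that a matching suspect is the source can be made concrete. The sketch below is illustrative only. The first function follows the rough reasoning in the text (one matching suspect out of the expected number of matching suspects). The second is an added refinement that rests on assumptions not stated in the text: every suspect is equally likely a priori to be the source, the true source always matches, and each other suspect matches independently with probability equal to the genotype frequency. Both function names are hypothetical.

```python
def rough_source_probability(n_suspects, genotype_freq):
    """Rough reasoning from the text: 1 / (expected number of matching suspects)."""
    expected_matches = n_suspects * genotype_freq
    return 1.0 / expected_matches


def bayes_source_probability(n_suspects, genotype_freq):
    """Posterior probability that a matching suspect is the source.

    Assumptions (not from the text): uniform prior over suspects, the true
    source always matches, and non-sources match independently with
    probability genotype_freq.
    """
    return 1.0 / (1.0 + (n_suspects - 1) * genotype_freq)


rough = rough_source_probability(100, 0.05)     # the text's 1/5
refined = bayes_source_probability(100, 0.05)   # 1 / 5.95
court_implied = 1.0 - 0.05                      # the "nineteen in twenty" reading

print(round(rough, 3), round(refined, 3), round(court_implied, 3))
```

Under either calculation, the probability that the first matching suspect is the source is far below the 19/20 implied by the court's parenthetical.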

The court's errors in describing the simple product rule as applied to a single locus are understandable -- few judges have mastered even these rudiments of population genetics.(42) But the errors make the simple product rule look more extreme in its power to incriminate a defendant than it really is. The court compounds these errors as it continues its exposition:

"As applied to this case, the individual frequencies -- the necessary components of the product rule (the 0.10 and 0.50 in the example quoted above) -- come from, and are based on frequencies in, Cellmark's database. That database apparently bases these frequencies on samples obtained from blood banks as well as paternity and forensic cases. These frequency figures -- vital components of the product rule -- are valid and accurate only if they come from a truly random sample, and the database for the frequency figures must be large enough to be statistically [reliable]. The nature of the product rule indicates that any errors, or shortcomings, in the database may have a profound and significant impact on the random match calculations.(43)"

This is an odd claim. The "nature of the product rule" is not a reason to think that small sampling errors will have "profound and significant effects." The court gives a contrived arithmetic example to prove its claim. Not only does the example use an incorrect form of the product rule for single-locus genotypes, but it also assumes, unrealistically, that all the errors must work in the same direction.(44)

Once the single-locus genotype frequencies are obtained, these estimates are multiplied across the loci to obtain the multiple single-locus frequency. When the frequency of a multilocus genotype in a population is the product of the frequencies of the single-locus genotypes, the population is said to be in linkage equilibrium.(45) The court, however, uses the phrase to refer to independence of alleles at a locus as well as independence across loci:

"The product rule also is based on the assumption that each band on the autorad represents a DNA segment that is independent of the other bands on the autorad. For this assumption to be valid, the DNA segments tested must be in linkage equilibrium -- i.e. "the probability of a match on each band is unaffected by the occurrence of a match on any other band." If this assumption of independence is not correct, the results of the product rule may be incorrect by a substantial margin.(46)"

To illustrate the margin of error, the court again resorts to an exaggerated and miscomputed example:

"As before, using eight frequency figures, and if the frequency rate used for each frequency figure is 0.1 (or 1 in 10), the probability of a random match still would be 0.1 x 0.1 x 0.1 x 0.1 x 0.1 x 0.1 x 0.1 x 0.1 or $0.1^8$ or 1 in 100,000,000. If, however, band "2" and band "3" are always present when band "1" is present, the actual probability of a random match would be 1.0 x 1.0 x 0.1 x 0.1 x 0.1 x 0.1 x 0.1 x 0.1 (or 1.0 x 1.0 x $0.1^6$) or 1 in 1,000,000.(47)"

A correct calculation for four heterozygous loci (which is one way to obtain eight alleles) with identical allele frequencies of 10% gives the probability of a random match $P = (2 \times 0.1 \times 0.1)^4$ = 16/100,000,000 if there is linkage equilibrium among the four loci. If two of the four loci were perfectly correlated, then P would be considerably larger: $P = (2 \times 0.1 \times 0.1)^3 \times 1$ = 8/1,000,000. But only one plausible hypothesis for thinking that there is any substantial departure from independence of forensic VNTR loci has ever been proposed -- population structure(48) -- and it is as likely to lead to underestimates of the random match probability as to overestimates.(49) Furthermore, even when Bible was decided, all the available studies of VNTR loci indicated that any deviations from linkage equilibrium were small.(50) Of course, this does not mean that there was no debate in the scientific community on the possible lack of independence of VNTR loci. But it is another reason to think that the court's exposition of the simple product rule gives a distorted picture of the difficulties with that rule.
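The two figures in the preceding paragraph can be reproduced directly. This is a minimal sketch of the arithmetic only, with hypothetical function names; it assumes, as the paragraph does, four heterozygous loci whose alleles each have a frequency of 10%.

```python
import math


def heterozygote_freq(p1, p2):
    """Hardy-Weinberg frequency of a heterozygous single-locus genotype."""
    return 2 * p1 * p2


def multilocus_freq(single_locus_freqs):
    """Simple product rule: multiply the single-locus genotype frequencies."""
    return math.prod(single_locus_freqs)


locus = heterozygote_freq(0.1, 0.1)  # 0.02 at each locus

# Linkage equilibrium among all four loci.
independent = multilocus_freq([locus] * 4)

# Two loci perfectly correlated: once one matches, the other matches with
# probability 1, so it contributes a factor of 1 instead of 0.02.
correlated = multilocus_freq([locus] * 3 + [1.0])

print(independent, correlated, correlated / independent)
```

On these numbers the perfectly correlated case differs from the independent case by a factor of 50, compared with the factor of 100 in the court's example.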

Indeed, the court continues in this vein, purporting to identify "a third relevant assumption upon which the product rule is based," namely, "a truly random mating population (where mating is random and the gene pool is evenly intermixed)."(51) At best, random mating is not an additional point, but a precondition for the independence of all alleles (and their combinations). If mating is correlated with genotypes, then the genotypes will not occur in the proportions expected under Hardy-Weinberg and linkage equilibrium.(52) Indeed, the possibility that mating patterns result in population structure provides the only substantial reason to doubt the product rule, precisely because major population structure undermines the blithe assumptions of equilibrium. Random mating therefore is not a "third assumption," but an aspect of the previous two assumptions of Hardy-Weinberg and linkage equilibrium.

Nevertheless, the court was on the right track in asking whether mating might be markedly non-random. Cellmark's use of the simple product rule to estimate the multiple single-locus frequency made it appropriate to ask whether the occurrences of alleles at each locus are independent events (Hardy-Weinberg equilibrium), and whether the loci are independent (linkage equilibrium). In these regards, the major criticism raised against the product rule, both at the trial and in the scientific literature, was that the equilibrium frequencies do not follow the simple model of a homogeneous population mating without regard to VNTR loci because the major racial populations are composed of ethnic subpopulations whose members tend to mate among themselves. Within each ethnic subpopulation, mating still can be random, but if, say, Italian-Americans have allele frequencies that are markedly different from the average for all whites, and if they only mate among themselves, then using the average frequencies for all whites in the simple product formula could understate or overstate a multiple single-locus profile frequency for the subpopulation of Italian-Americans.(53) Furthermore, and perhaps less obviously, using the population frequencies tends to overstate the multiple single-locus profile frequencies in the white population itself.(54)
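A toy calculation may help fix ideas about population structure. The sketch below is purely hypothetical: the two subpopulations "X" and "Y", their equal sizes, and their allele frequencies are invented for illustration and come from no database. It assumes random mating within each subpopulation and no mating between them, and it looks at a single heterozygous locus.

```python
def heterozygote_freq(p1, p2):
    """Hardy-Weinberg frequency of a heterozygous single-locus genotype."""
    return 2 * p1 * p2


# Hypothetical (A1, A2) allele frequencies in two equal-sized subpopulations.
subpops = {"X": (0.30, 0.20), "Y": (0.10, 0.25)}
weights = {"X": 0.5, "Y": 0.5}

# Allele frequencies a laboratory would see in the pooled population.
pooled_p1 = sum(weights[s] * subpops[s][0] for s in subpops)
pooled_p2 = sum(weights[s] * subpops[s][1] for s in subpops)

# Simple product rule applied to the pooled allele frequencies.
pooled_estimate = heterozygote_freq(pooled_p1, pooled_p2)

# Actual genotype frequency inside each subpopulation.
within = {s: heterozygote_freq(*subpops[s]) for s in subpops}

# Actual genotype frequency in the population as a whole.
population_true = sum(weights[s] * within[s] for s in subpops)

print(round(pooled_estimate, 4), within, round(population_true, 4))
```

With these invented numbers, the pooled estimate (9%) understates the frequency in subpopulation X (12%), overstates it in subpopulation Y (5%), and slightly overstates it in the population as a whole (8.5%). Other choices of frequencies can change the size and direction of each discrepancy.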

Therefore, if we want to know the frequency of an incriminating profile among Italian-Americans, the simple product rule applied to the white allele frequencies could be in error. One might presume that the extent of the error could be determined by looking to the variations across racial groups,(55) but, for a time, a few scientists insisted that variations from one ethnic group to another within a race were larger than variations from one race to another.(56) In light of this literature,(57) the court had reason to conclude that the simple product rule, used with broad population frequencies, was not fully accepted for estimating frequencies within subpopulations.

On the other hand, the court failed to recognize that there was much less explicit dissension over the ability of the rule to estimate profile frequencies within a general population.(58) Had the court appreciated this distinction in the population genetics literature, it could have upheld the use of the simple product rule in Bible, for that case did not involve any particular subpopulation like Italian-Americans.(59) The issue in Bible was whether the blood on Bible's shirt came from the girl who had been killed, or whether it had come from someone else with whom Bible had had contact. Unless Bible had limited his contact to members of one ethnic group, the more pertinent statistic to consider was the frequency of the DNA profile in a major racial group.

Nevertheless, the court had other reasons to reverse the admission of the product rule estimates than the arguments among population geneticists and statisticians about the extent and effect of population structure on estimates of profile frequencies within subpopulations. It had doubts about the procedures that Cellmark had used to assemble its Caucasian database, and it believed that the database itself was somehow flawed. Yet, the court's treatment of Cellmark's database is even more disturbing than its mildly confused explanation of the relationship between concepts in population genetics and the simple product rule.

B. Bible's Treatment of Cellmark's Database

As we have just seen, the scientific criticism of the simple product rule that justifiably concerned the Bible court is that population substructure vitiates the use of the simple product rule with allele frequencies derived from the population as a whole. I have suggested that the court did not appreciate the limited force of this criticism as applied to estimating the multiple single-locus profile frequency in the substructured population as a whole, as opposed to a particular subpopulation. In this respect, the Bible opinion is but one of many in the early 1990s that perceived a debate among population geneticists, but did not appreciate the subtleties of the debate.(60) But unlike most other courts, the Bible court did not squarely hold that simple product rule estimates were never admissible. Instead, its "legal analysis"(61) concluded that

"[T]he Cellmark method of deriving the random match probability figures is not generally accepted in the relevant scientific community. For Frye purposes, these probability calculations are flawed in three ways: (1) they are impermissibly based on the disputed assumption of linkage equilibrium; (2) the database relied on is of disputed statistical validity; and (3) the database relied on is not in Hardy-Weinberg equilibrium.(62)"

We already have discussed the first putative flaw. "The disputed assumption of linkage equilibrium" apparently refers to the debate over the importance of population structure in various situations. The court does not state whether this controversy, standing alone, would justify excluding product rule estimates, but the cases on which the Arizona court relies reach precisely that conclusion.(63)

The second "flaw" has received much less attention in the legal and scientific literature. By "disputed statistical validity," the court apparently means that Cellmark's database was not obtained by random sampling. No other reported case has excluded random-match probability estimates on this ground alone, but more than one statistician has suggested that he would not accept anything less than a true probability sample.(64) I shall assess the merits of this position in discussing Johnson.

Read literally, the third "flaw" -- that "the database . . . is not in Hardy-Weinberg equilibrium" -- is meaningless. Populations, not databases, are or are not in Hardy-Weinberg equilibrium. But one can count the alleles in a database, compute the single-locus genotype frequencies under the assumption of Hardy-Weinberg equilibrium, and compare the computed figures to the single-locus frequencies actually seen in the database.(65) In the absence of substantial inbreeding, substructure, or natural selection for certain genotypes,(66) the Hardy-Weinberg proportions are 2p1p2 (for heterozygotes) and p1^2 and p2^2 (for homozygotes).(67) Inbreeding or population structure generally decreases the proportion of heterozygotes and increases the proportion of homozygotes.(68) It is easy to see why. Relatives have more alleles in common than unrelated individuals. Therefore, the offspring of relatives have a greater chance of receiving the same alleles from both parents -- of being homozygotes -- and a smaller chance of inheriting dissimilar alleles -- of being heterozygotes.
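The comparison just described can be sketched in a few lines of Python. This is an illustration only: the genotype counts are invented, the allele labels are hypothetical, and the function names are not drawn from any laboratory's software.

```python
from collections import Counter


def allele_frequencies(genotype_counts):
    """Count alleles in a database of single-locus genotypes.

    genotype_counts maps a pair of allele labels, e.g. ("a1", "a2"),
    to the number of people in the database with that genotype.
    """
    allele_counts = Counter()
    for (first, second), n in genotype_counts.items():
        allele_counts[first] += n
        allele_counts[second] += n
    total = sum(allele_counts.values())
    return {allele: count / total for allele, count in allele_counts.items()}


def hardy_weinberg_expected(freqs, genotype):
    """Expected proportion: p squared for homozygotes, 2*p1*p2 otherwise."""
    first, second = genotype
    if first == second:
        return freqs[first] ** 2
    return 2 * freqs[first] * freqs[second]


# Invented counts for a two-allele locus in a database of 100 people.
database = {("a1", "a1"): 30, ("a1", "a2"): 40, ("a2", "a2"): 30}
n_people = sum(database.values())
freqs = allele_frequencies(database)

# For each genotype: (proportion observed, Hardy-Weinberg expectation).
comparison = {
    genotype: (count / n_people, hardy_weinberg_expected(freqs, genotype))
    for genotype, count in database.items()
}
for genotype, (observed, expected) in comparison.items():
    print(genotype, "observed", round(observed, 3), "expected", round(expected, 3))
```

In this invented database each allele has a frequency of one-half, so the Hardy-Weinberg figures are 0.25 for each homozygote and 0.50 for the heterozygote. The observed shares are 0.30 and 0.40, which is the pattern of excess homozygotes and deficient heterozygotes that the text associates with inbreeding or structure.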

An analysis of Cellmark's database showed an excess of homozygotes, and the Supreme Court insisted that "[s]tate expert Dr. [Lisa] Forman [of Cellmark] conceded that the 1988 Caucasian database used by Cellmark in this case was not in Hardy-Weinberg equilibrium."(69) However, an excess of homozygotes corresponds to a deficit of heterozygotes, which favors the defendant. The database is used only to estimate the allele frequencies, and these are combined to estimate genotype frequencies. Since the heterozygote frequencies as counted directly in the database are less than the estimated quantity 2p1p2, the product rule makes it appear as if all the heterozygous single-locus frequencies are more common than they really are.

On the other hand, if any of the loci tested in Bible were homozygous and if the Hardy-Weinberg proportions for homozygotes (p1^2 or p2^2) were used, but the population actually had an excess of homozygotes, the estimates would be on the low side, making the homozygous genotypes seem rarer than they are. This is not a difficulty, however, because forensic laboratories do not estimate homozygote VNTR frequencies as p1^2 or p2^2. Instead, they use the far larger number 2p1 or 2p2. They do so not because they suspect departures from Hardy-Weinberg equilibrium, but because a single band is not necessarily the result of homozygosity. A second allele might be present in the DNA, but it might be so small that it is pulled to the edge of the gel and is not detected. Or the band might be obscured for some other reason. To be safe, the laboratory assumes that there was a second band; not knowing its frequency in the database, it uses the highest conceivable value, 100%.(70) Consequently, the alleged departure from Hardy-Weinberg benefits the defendant, whether or not any loci are homozygous; that the Bible court would rely on it as a reason to exclude the product rule estimate seems perverse.
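A short numerical sketch may make the size of this allowance concrete. It compares the Hardy-Weinberg homozygote figure (the square of the allele frequency) with the figure produced by treating a single band as a heterozygote whose unseen allele has frequency 100%. The allele frequency is invented, and the function names are hypothetical.

```python
def homozygote_hw(p):
    """Hardy-Weinberg proportion for a true homozygote: p squared."""
    return p ** 2


def single_band_2p(p):
    """Single-band convention described in the text.

    Treats the pattern as a heterozygote whose undetected second allele
    is assigned frequency 1, so the figure is 2 * p * 1 = 2p.
    """
    return 2 * p * 1.0


p = 0.05  # hypothetical allele frequency
ratio = single_band_2p(p) / homozygote_hw(p)  # algebraically 2 / p

print(homozygote_hw(p))
print(single_band_2p(p))
print(ratio)
```

For an allele with a frequency of 5%, the 2p figure is forty times the Hardy-Weinberg homozygote figure, so the single-band genotype is reported as far more common than random mating alone would imply.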

Yet, the argument about the database in Bible cannot be dismissed quite this easily. A database that shows genotype proportions that depart greatly from the Hardy-Weinberg proportions could be a symptom of a deeper pathology. Perhaps the population is highly structured, and the multiplication across loci is invalid. The multilocus frequency then might be too low (prejudicing the defendant) or too high (prejudicing the state), but the error is not necessarily cured by the practice of using 2p for the genotype frequency of apparent single-locus homozygotes.(71)

Thus, checking databases for excess homozygosity was (and continues to be) a common practice. In the early days of DNA population analysis, there appeared to be a clear excess of homozygotes, but it did not take long for the explanation to emerge. As already noted, for technical reasons, not all heterozygotes can be detected. When the limitations of the laboratory method are taken into account, the excess homozygosity vanishes.(72)

And this is exactly what happened at the trial in Bible. Dr. Forman testified as follows:

"Q. Does the data base that was used in 1988 meet Hardy-Weinberg equilibrium?

"A. In the way that it was analyzed, with the gel system that was used, you could not say that it meets Hardy-Weinberg expectations.

"Q. Have you done further testing on the database?

"A. Certainly.

"Q. And what have the results of your further investigation been?

"A. The results of the further investigation show that the reason we appeared to have too many individuals with one band, too many homozygotes than would be predicted from this model, was because we were only looking at a small window. We were only looking at the top of the gel down to where one of these molecular weight markers had migrated 20 centimeters. In that case we were cutting off all bands that were smaller than the molecular weight marker band.

"When we left those bands on, when we ran the gel for a shorter period of time, . . . we found the bands that were missing were there. We had just cut them off in the data base, but they were still there.

. . .

"Q. What has your decision been in terms of how to evaluate that statistically?

"A. What we do is the most conservative approach and that is to say everybody has two bands, even if you can only see one, so instead of doing a p^2 . . . we treat everyone as though they are a heterozygote.

"Q. So in essence, does that drive your numbers down, make them --

"A. It makes the appearance of that DNA band seem more common. It is more generous to the person who has that band. . . ."(73)

Had the Supreme Court understood the meaning of Hardy-Weinberg equilibrium and had it realized that the population structure argument maintains that the population (not the database) is not in equilibrium, perhaps it would not have misconstrued this testimony as a concession that there was some unresolved problem with estimating the multilocus frequency from Cellmark's database. As one reads beyond the first question and answer, the testimony clearly makes two points that were supported by the scientific literature. First, it asserts that the database is consistent with a population in Hardy-Weinberg equilibrium; appearances to the contrary are artifacts due to small alleles running off the gel. Thus, the deviations in the database from Hardy-Weinberg expectations do not support the claim that the Caucasian population is so structured as to undermine estimates that assume Hardy-Weinberg or linkage equilibrium. Second, the testimony asserts that the use of this database instead of one in which all single-locus profiles were true homozygotes was generous to the defendant. Misunderstanding the testimony and the short-lived scientific debate about excess homozygosity, the Court mischaracterized Cellmark's database as "flawed" for "Frye purposes."
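The artifact that the testimony describes can be illustrated with a small calculation. The sketch below assumes a population that really is in Hardy-Weinberg equilibrium and asks what share of people would show a single band if some alleles were too small to stay on the gel. The allele frequencies are invented and the function name is hypothetical.

```python
def single_band_proportions(visible, off_gel):
    """Return (true, apparent) single-band proportions under Hardy-Weinberg.

    visible: frequencies of alleles that remain on the gel.
    off_gel: frequencies of alleles too small to be retained.
    People carrying two off-gel alleles show no band and are not counted.
    """
    q = sum(off_gel)
    true_homozygotes = sum(p ** 2 for p in visible)
    # A heterozygote with one off-gel allele shows only one band.
    hidden_heterozygotes = sum(2 * p * q for p in visible)
    return true_homozygotes, true_homozygotes + hidden_heterozygotes


visible = [0.3, 0.3, 0.3]  # hypothetical frequencies of detectable alleles
off_gel = [0.1]            # hypothetical frequency of an undetectable allele
true_share, apparent_share = single_band_proportions(visible, off_gel)
print(true_share, apparent_share)
```

With these invented figures, true homozygotes for the detectable alleles make up 27% of the population, but 45% of people would display a single band. The apparent excess arises entirely from the detection window, not from any departure from equilibrium in the population.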

C. Bible's Unanswered Questions

For several reasons -- some more convincing than others -- Bible holds that it was error to allow testimony about a random-match probability of 1 in 14 billion obtained by applying the simple product rule to allele frequencies estimated from a small, convenience sample that did not exhibit Hardy-Weinberg single-locus proportions. What might happen if the expert did not speak to the probability, but merely reported a match? If she described matches among unrelated people as unusual or impossible?(74) If she reported the allele frequencies, but made no effort to combine them into a genotype frequency? If she computed the random match probability from a database that exhibited Hardy-Weinberg proportions but still was not a random sample? If she used a different method from the simple product rule for estimating genotype frequencies? If she used other probes than those for VNTRs to analyze the DNA samples?

The list could continue, but it is plain that no single opinion could lay the issue of DNA evidence to rest. And, the Bible opinion was not crafted to do so. To the contrary, the Chief Justice declared:

"We take a cautious, conservative approach. Not knowing what records in other cases will show, what issues those cases will raise, or what new technology will bring, we neither write in stone nor go farther than we must. For the moment, and at least with respect to DNA evidence, we leave Frye untouched. We make no final judgment on how far, if at all, the court may go in allowing a party to inform the jury about the declaration of a match and its meaning in any specific case. We hold only that statistical probability evidence based on Cellmark's database is not based on generally accepted scientific theory and is not admissible."(75)

In this way, the court left itself -- and the lower courts -- a vast space in which to maneuver. As increasingly contradictory opinions issued from the court of appeals,(76) the pressure to return to the issues raised by DNA evidence became irresistible. In the summer of 1996, the court issued its second opinion on the subject.

III. Johnson: Missed Opportunities

In State v. Johnson, the supreme court answered two of the many questions left unresolved by Bible. First, it made it plain that Frye was alive and well in Arizona. In Bible, the Chief Justice had, for the first time,(77) recognized some of the limitations of Frye.(78) Like a lover's quarrel, these were all but forgotten in Johnson:

"We . . . find nothing in the arguments or briefs to persuade us that this case presents us with a reason to abandon Frye and follow Daubert. The federal courts have not yet had a fair opportunity to apply Daubert; thus, it is too early to properly evaluate it. We therefore conclude that for the present, and for the reasons stated in Bible, the Frye rule, which has been followed without causing significant problems since it was first adopted in 1962, remains the rule in Arizona."(79)

With these facile remarks, the court missed the opportunity to provide its first thoughtful examination of the merits of Frye.(80)

Second, Johnson clarifies one aspect of DNA evidence.(81) An expert can testify to the random match probability computed according to a modification of the product rule known as the "interim-ceiling" method.(82) In coming to this position, the court broke no new ground, but followed in the steps of the supreme courts of Colorado, Massachusetts, Minnesota, New Hampshire, New Mexico, Wyoming, and Washington.(83) It skirted the pitfall of thinking that the extreme distaste expressed by many population geneticists for the ceiling method reflects a rejection of the view that the method is extremely generous to defendants.(84) As the court recognized, much of the criticism is based on the belief that the method "produces excessively conservative results."(85)

But as with the superficial discussion of Frye, the court missed an opportunity, this time to correct the defects in Bible's treatment of basic concepts in population genetics and statistics. If anything, the court raised the level of confusion about the scientific issues, depriving its otherwise reasonable conclusions of much persuasive value. There are conceptual problems in the discussion of linkage equilibrium, Hardy-Weinberg equilibrium, and random sampling. Following a short description of the interim-ceiling method, I consider these in turn, and I attempt to offer more convincing rationales for the result in Johnson.

A. The Interim-Ceiling Method

The first NRC committee proposed the interim-ceiling method as a way to give an upper bound on the frequency of a genotype in any population or subpopulation.(86) Applied to a single racial group, like whites, the product rule estimates the frequency of the multilocus genotype as the product of the single-locus frequencies, and it estimates each single-locus frequency as 2p1p2 for heterozygotes (or as 2p for homozygotes), where p refers to frequencies estimated from the database for that race. The interim-ceiling method uses the same general formula, but different values of the frequencies. Instead of multiplying together the allele frequencies from any single, major racial database, the procedure picks, for each allele in the DNA profile, the largest value seen in any race.(87) If that value is under 10%, the procedure rounds it up to 10%. Those values are then multiplied according to the formulas that apply when there is Hardy-Weinberg and linkage equilibrium, that is, according to the product rule.(88) Thus, the ceiling method employs a mix-and-match, round-up, and multiply strategy. The result, it is widely (but not universally) believed, is an extremely conservative estimate of the profile frequency that more than compensates for the possibility of any population structure that might undermine the assumptions of Hardy-Weinberg and linkage equilibria in the major racial populations.(89)
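The mix-and-match, round-up, and multiply strategy can be sketched as follows. The sketch follows only the simplified description given above and ignores any refinements not stated there. The group names, allele frequencies, and function names are all hypothetical.

```python
def ceiling_allele_frequency(freqs_by_group, floor=0.10):
    """Largest frequency seen in any group, rounded up to the floor."""
    return max(max(freqs_by_group.values()), floor)


def locus_frequency(allele_freqs):
    """2*p1*p2 for two bands, 2*p for a single band."""
    if len(allele_freqs) == 2:
        return 2 * allele_freqs[0] * allele_freqs[1]
    return 2 * allele_freqs[0]


def interim_ceiling(profile, floor=0.10):
    """Multiply ceiling-based single-locus frequencies across loci."""
    result = 1.0
    for locus in profile:
        result *= locus_frequency(
            [ceiling_allele_frequency(allele, floor) for allele in locus]
        )
    return result


def simple_product(profile, group):
    """Same formulas, but using a single group's database frequencies."""
    result = 1.0
    for locus in profile:
        result *= locus_frequency([allele[group] for allele in locus])
    return result


# Invented frequencies for a two-locus profile in three hypothetical databases.
profile = [
    [  # locus 1: two bands
        {"group_a": 0.04, "group_b": 0.12, "group_c": 0.07},
        {"group_a": 0.03, "group_b": 0.05, "group_c": 0.08},
    ],
    [  # locus 2: one band
        {"group_a": 0.02, "group_b": 0.06, "group_c": 0.04},
    ],
]

ceiling_estimate = interim_ceiling(profile)
single_group_estimate = simple_product(profile, "group_a")
print(ceiling_estimate, single_group_estimate)
```

With these invented numbers the ceiling figure is about 0.0048, fifty times the figure of about 0.000096 that the simple product rule yields for group_a alone. The gap comes from two sources: borrowing the largest frequency from any database, and rounding every frequency below 10% up to 10%.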

B. Linkage Equilibrium

As explained in Part II, linkage equilibrium merely means that the frequency of a multiple single-locus genotype in a population is the product of the frequencies of the single-locus genotypes.(90) Consider three VNTR alleles, labeled 1, 2, and 3, and two loci, denoted A and B. There are three possible heterozygous single-locus genotypes at the first locus (A1A2, A1A3, and A2A3) and three possible homozygous genotypes (A1A1, A2A2, and A3A3). Whatever factors might cause the population to be in linkage equilibrium, if a proportion P12 (say, 1%) of the population has the single-locus genotype A1A2, and a proportion P13 (say, 2%) has the genotype B1B3, then the proportion that has the two-locus genotype A1A2 B1B3 is just P12 × P13, or 0.02%. In short, linkage equilibrium is a statement about the frequencies with which various combinations of genotypes occur in a population.(91)

As discussed in Part II, if population structure is severe and there are large differences across subpopulations with respect to the single-locus proportions P12, P13, and so on, then the multilocus genotype frequencies in the population will not be simple products. The Johnson opinion uses an example from the 1992 NRC Report to illustrate this phenomenon:

"Thus, by way of illustration only, linkage equilibrium assumes that whether a person inherits the allele for blue eyes is unrelated to whether that person inherits the allele for blond hair or fair skin. Of course, as the NRC report points out, these three traits tend to co-occur in Nordics. Therefore the actual frequency of these three traits occurring together (assuming each trait occurs one time in ten) is not simply a straight calculation under the product rule . . . . Instead, because of the co-occurrence of such observable, physical traits in certain sub-populations, the actual frequency in the total population of all three traits appearing in any one individual is probably considerably higher . . . ."(92)

Unfortunately, this example masks an important subtlety that leads the Johnson court astray. The blond-blue-fair folk presumably find these visible characteristics attractive and choose mates accordingly. The court seems to think that only visible characteristics can interfere with linkage equilibrium. The Chief Justice writes that "[t]his does not, however, necessarily invalidate the assumption of linkage equilibrium because the alleles chosen to create the DNA profile with the RFLP protocol are non-coding, that is, they are not responsible for producing any observable characteristic."(93) But it is a mistake to think that non-coding loci must be in equilibrium. Suppose that the blond-blue-fair folk are color-blind and cannot even distinguish shades of gray, but that they have a common religion that no other group shares and that they marry within their religion. Even though their hair, eye, and skin colors are no longer visible, the combination of blond-blue-fair will persist at a high frequency within this group. In short, the fact that VNTR alleles do not code for physical or behavioral characteristics does not respond adequately to the population structure argument -- a point that has been made very clearly in case after case.(94)

Johnson offers a second reason for thinking that linkage equilibrium is assured for VNTRs. It observes that "these alleles are known to be extremely variable from person to person, and scientific studies have not shown any statistical correlation between them."(95) The first of these points is helpful. The extreme variability of VNTRs among individuals suggests that it is unlikely that major differences in the proportions will arise or be maintained across the subpopulations.(96) If each allele occurs rarely in each subpopulation, then the fact that members of the subgroups tend to mate among themselves will make little difference. The product of allele frequencies that are rare in each subpopulation will be a very small number in each subpopulation and in the population as a whole. In other words, where the allele proportions are the same in each randomly mating subpopulation, population structure has no effect on linkage equilibrium. But the similarity of the single-locus proportions in all subpopulations is a conjecture that must be verified before it can be used to erase the lack of general acceptance about linkage equilibrium that the court found so distressing in Bible.
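The difference between the two situations can be sketched numerically. The example below assumes two equally large subpopulations, each of which is internally in linkage equilibrium, and compares the actual two-locus genotype frequency in the pooled population with the product-rule figure computed from pooled single-locus frequencies. All proportions are invented and the function names are hypothetical.

```python
def pooled_two_locus(weights, locus_a, locus_b):
    """Actual two-locus genotype frequency in the pooled population.

    Each subpopulation is assumed to be in linkage equilibrium internally,
    so its two-locus frequency is the product of its single-locus figures.
    """
    return sum(w * a * b for w, a, b in zip(weights, locus_a, locus_b))


def product_of_pooled(weights, locus_a, locus_b):
    """Product-rule figure computed from pooled single-locus frequencies."""
    pooled_a = sum(w * a for w, a in zip(weights, locus_a))
    pooled_b = sum(w * b for w, b in zip(weights, locus_b))
    return pooled_a * pooled_b


weights = [0.5, 0.5]  # two equally large hypothetical subpopulations

# Case 1: single-locus genotype proportions differ sharply across subgroups.
different = ([0.10, 0.01], [0.10, 0.01])
actual_different = pooled_two_locus(weights, *different)
product_different = product_of_pooled(weights, *different)

# Case 2: the proportions are the same in each subgroup.
same = ([0.05, 0.05], [0.05, 0.05])
actual_same = pooled_two_locus(weights, *same)
product_same = product_of_pooled(weights, *same)

print(actual_different, product_different)
print(actual_same, product_same)
```

In the first case the actual pooled frequency (about 0.005) exceeds the product-rule figure (about 0.003), so multiplication understates how common the combination is. In the second case the two figures coincide, which is why the similarity of the proportions across subpopulations is the empirical question that matters.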

The Johnson court evidently believed that empirical confirmation is now available -- that "scientific studies have not shown any statistical correlation between" VNTR loci. If this view is generally accepted, then there is general acceptance of the part of the product rule that involves multiplication across loci. It is shocking, however, that Johnson states that the 1992 NRC Report "makes [it] clear" that linkage equilibrium is generally accepted.(97) Three years ago, the Bible court cited the same report for the opposite assertion. The Bible opinion had it right. The 1992 NRC report could hardly have been clearer in refusing to pronounce linkage equilibrium generally accepted.(98) The committee squarely rejected the position that Johnson attributes to it.(99) It advocated the interim-ceiling method not because linkage equilibrium holds, but just in case it does not.(100) No other appellate court has suggested that the 1992 report endorsed the view that linkage equilibrium exists for VNTRs in the major racial groups.(101)

Even so, Johnson's shaky handling of linkage equilibrium does not vitiate the court's approval of ceiling estimates. The court may not have understood the fundamental fact that the ceiling method is a way to circumvent departures from linkage and Hardy-Weinberg equilibria, but it was correct in concluding that the method is generally accepted as a conservative procedure for estimating VNTR frequencies.(102) No finding of general acceptance of the proposition that VNTR loci are in linkage equilibrium is necessary to arrive at that result.

Why, then, did the court write that linkage equilibrium is no longer an obstacle to random-match-probability estimates? Its many errors in describing concepts in genetics suggest that it simply did not understand the population genetics issues, but a more devious interpretation is possible. Conceivably, the court wanted to undermine Bible and signal that henceforth simple product rule calculations will be admissible. To do that, it would have had to dispose of Bible's finding that linkage equilibrium was seriously disputed in the scientific community. And that is what it tried to do.

The pity of the Johnson opinion is that it would have been easy to retreat unscathed from Bible's position on linkage equilibrium. There now are ample studies confirming the view that forensic VNTR loci are largely uncorrelated,(103) and a new NRC report that recommends the simple product rule for cases like Johnson.(104) If that is what the court was after, its error merely lay in citing the wrong NRC report.

C. Hardy-Weinberg Equilibrium

Johnson's discussion of Hardy-Weinberg equilibrium occupies only three short paragraphs, but the terse discussion replicates many of the errors in the court's discussion of linkage equilibrium. Despite the clarity of the description in the 1992 NRC report, the court fails to perceive that the population structure argument undermines both the simple product rule's combination of allele frequencies at a given locus (the Hardy-Weinberg proportions) and its multiplication of these single-locus frequencies across loci (valid for linkage equilibrium).(105) The court does not seem to recognize that the point of the ceiling method is to forge ahead even in the absence of Hardy-Weinberg equilibrium. Instead, as with linkage equilibrium, the court makes the naive argument that simply because VNTR alleles do not code for "physical, racial, cultural, and behavioral characteristics,"(106) Hardy-Weinberg equilibrium is assured.(107)

Although the court's understanding of the conditions required for Hardy-Weinberg equilibrium is wrong,(108) can all be put right by the fact that "the DPS [Department of Public Safety] database [was] in Hardy-Weinberg equilibrium"?(109) Alas, probably not. Just as finding that a large database has a few statistically significant deviations from Hardy-Weinberg proportions may not reveal that the population from which the database is drawn is highly structured, so too the discovery that a small database(110) does not exhibit statistically significant deviations is no clear indication that the population is not highly structured.(111)

Fortunately, as explained in Part II, there are stronger arguments for using the Hardy-Weinberg proportions for heterozygotes and the 2p figure for homozygotes. In addition, many studies using a variety of analyses suggest that the actual departures from Hardy-Weinberg equilibrium in the major U.S. racial populations have little effect on the genotype frequency estimates.(112)

As with linkage equilibrium, then, the Johnson court reaches out to decide that Hardy-Weinberg equilibrium is present and offers unconvincing reasons for that view. But once again, the general acceptance of ceiling estimates does not turn on the existence of equilibrium; and even if it did, the conclusion that the racial population represented in the Arizona database is mating without regard to VNTRs -- and without regard to membership in subpopulations that are correlated with VNTRs -- can be defended.

D. Sample Size and Random Sampling

The final obstacle to affirming the conviction in Johnson has nothing to do with random mating, population structure, equilibria, and the other features of population genetics models. It revolves around the way the DPS built its database. That database is a sample of the population. For a sample to lead to accurate inferences about a population, it must be representative of that population. The best way to obtain representative samples is to select people according to some objective, chance process -- to draw a "probability sample."(113) Such a sample need not be huge, and the range of probable error in estimates of the population's numerical characteristics resulting from the luck of the draw in random sampling can be quantified.(114)
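How the luck of the draw can be quantified is easy to sketch for a single allele frequency. The example assumes a simple random sample in which each person contributes two independently drawn alleles, and it uses the ordinary normal approximation for a proportion. The sample size, the estimated frequency, and the function name are hypothetical.

```python
import math


def allele_frequency_interval(p_hat, n_people, z=1.96):
    """Approximate 95% interval for an allele frequency.

    Assumes a simple random sample of n_people, each contributing two
    independently drawn alleles, and uses the normal approximation.
    """
    n_alleles = 2 * n_people
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n_alleles)
    low = max(0.0, p_hat - z * standard_error)
    high = min(1.0, p_hat + z * standard_error)
    return low, high


# Hypothetical database of 200 people and an allele seen 5% of the time.
low, high = allele_frequency_interval(0.05, 200)
print(round(low, 3), round(high, 3))
```

Under these assumptions, an allele observed at 5% in a random sample of 200 people has a plausible range of roughly 3% to 7%. The calculation presupposes random sampling; it says nothing about the error in a sample chosen in some other way.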

The DPS database was not large, but Johnson pointed out that it was sufficient for reasonably accurate estimates of allele frequencies. Although some of the authorities the court cited for this proposition are not on point,(115) and the premise that there is some pre-ordained, minimum size is dubious,(116) there is ample authority for the view that a database of a couple hundred people is quite adequate.(117)

Likewise, the court's manner of disposing of the argument for random sampling is flawed, but not beyond repair. Like most forensic DNA databases, the DPS database was not acquired through probability sampling. It was "comprised of samples from blood banks,"(118) and blood banks do not select units at random from a well-defined sampling frame. They take whatever they can get, using incentives that are not calculated to produce a random cross-section of the community. Johnson, however, accepted the premise that probability sampling is essential, because "[a]s for randomness, the NRC report concludes that to be sufficiently random, the database need only consist of samples drawn at random from designated populations."(119) So far, so bad. The 1992 report never demanded probability sampling for the racial databases for the interim-ceiling calculations. On the contrary, it recommended "that estimates of population frequencies be based on existing data"(120) until random samples are collected from genetically homogenous ethnic groups across the world.(121)

Having misread the report as demanding random samples in all situations, the Johnson court imagined that they existed. "Randomness," the Chief Justice wrote, "is satisfied when there is linkage equilibrium and Hardy-Weinberg equilibrium."(122) Unlike Gertrude Stein's peach, however, there is randomness and there is randomness. Over time, a large, randomly mating, unstructured, closed population achieves Hardy-Weinberg equilibrium and approaches linkage equilibrium with respect to genes that do not mutate and that do not affect reproductive success.(123) While one can draw a random sample from a randomly-mating population, one also can draw a non-random sample from a randomly mating population. Similarly, one can use probability sampling methods to draw a random sample from a non-randomly mating population, or one can take a convenience sample from that population. Random mating is one thing; random sampling is another.

Yet, the court's discussion of the randomness in the population is not as foolish as all that. Consider the molecules buzzing about in a child's balloon. Presumably, about four-fifths are nitrogen, and about one-fifth are oxygen. It would be impossible to ascertain these proportions by identifying each molecule and picking a sample according to a table of random digits. Of course, one might try to collect molecules from randomly selected points in the balloon, but why bother? The molecules, careening and ricocheting, execute random walks. Nature, having randomized the molecules for us, has removed the need for random sampling. We do not bias the sample by taking molecules from one place as opposed to any other. Any convenience sample will do a good job of representing all the molecules in the balloon.

Thus, the Johnson court is on to a deep truth. If a closed population is randomly mating, and if alleles segregate according to Mendel's laws, then the equilibrium state of the pool of genes is much like the state of the molecules of air at thermal equilibrium. Any equally large subset of the population, drawn without regard to the genotypes of those sampled, will have the same chance of being representative of the entire gene pool. As long as VNTRs do not influence the likelihood of being a blood donor (and are not associated with any factors, like socio-economic status or religion, that might), a convenience sample from the blood bank is as good as a probability sample of voters picked for a Gallup poll. Consequently, population geneticists are comfortable with samples from sources like blood banks and genetic-counseling and disease-screening centers. They believe that "there is no reason to suspect that persons who contribute to blood banks . . . differ from a random sample of the population with respect to DNA markers."(124) In this sense, the "convenience samples are effectively random."(125)

If the existence of random mating is why the lack of random sampling is acceptable, then Johnson is much more radical than first appears. On the surface, the opinion merely holds that the ceiling method is generally accepted and can be used with the DPS database. But if there is random mating, then there is no reason to use the ceiling method, for that procedure was offered solely to account for departures from random mating.(126) Since the product rule always has been generally accepted in estimating a random match probability in a randomly mating population, if the premise of random mating supports non-random sampling, then it also supports the product rule. The logic of Johnson implies that nothing remains of Bible.(127)

On the other hand, the logic of Johnson, we have seen, is invalid. The absence of physical linkage between alleles or loci does nothing to undercut the argument that population structure precludes population models that use random mating to deduce genotype frequencies. Had the court understood this nasty fact, it might have proceeded in one of two ways. To keep its holding narrow, it could have upheld the ceiling method as an antidote to the possibility of non-random mating, and defended DPS's convenience sampling on the ground that empirical study shows that non-random sampling is adequate in this context.(128) Alternatively, it could have abandoned efforts to confine the holding to approving of the interim-ceiling method of calculation and followed other courts that have held that the simple product rule is generally accepted after all.(129) Either approach would have produced a sturdier opinion than the jury-rigged structure of Johnson.

E. Unanswered Questions

If one limits Johnson to its facts or takes the opinion at face value, the case establishes only that interim-ceiling estimates of VNTR-profile frequencies made from a database that is not obviously inconsistent with the assumption of Hardy-Weinberg equilibrium are admissible. All the other questions purposefully left open in Bible continue to plague the lower courts. Are qualitative rather than quantitative presentations of the DNA evidence allowed? Are estimates made with the simple product rule admissible? What of estimates made with modifications of the simple product rule that are more firmly grounded in the theory of structured populations than was the ceiling method?(130)

Of course, one would not expect a single case to answer all such questions, and caution is a passive virtue. But one might hope for guidance in the reasons a court offers for a wisely narrow holding. When we turn to use the reasoning that led the Johnson court to accept ceiling-method calculations to infer how the cluster of questions about DNA evidence should be answered, however, chaos follows. The reasoning is so infected with misstatements of the scientific principles that it neither supports the outcome in Johnson nor helps justify a particular outcome in any other case.

IV. Lessons from Bible and its Progeny

The Bible and Johnson opinions misuse scientific terms, misrepresent the simple product rule, and misread expert testimony. These criticisms would be academic if they had no impact on the court's reasoning. They would show that the court has not mastered the basics of population genetics, but that is hardly an impeachable offense. Regrettably, in this instance, the superficial rendition of the science makes a difference to the structure of the opinions and perhaps to the outcomes of the cases. For example, had the Bible court read and understood the testimony of Cellmark's experts about the reasons for the excess of homozygotes in the database (reasons that were consistent with the scientific literature on the point), it could not have held that one of these experts "conceded" or "admitted" that the database was "defective" and that this defect deprived the simple product rule computation of general acceptance in that case. Similarly, the Johnson court's misapprehension of the relationships between population structure, physical linkage, gene expression, random mating, Hardy-Weinberg equilibrium, and linkage equilibrium does more than result in an opinion that fails to use scientific terms correctly or precisely. It apparently leads the court to miss the very reason that the segment of the scientific community that favored the ceiling method gave for adopting it. As a result, the court ends up approving of that method on grounds that undercut the ceiling method's principal raison d'être.(131)

Unless judges are to become scientists -- a prospect that is unlikely to advance scientific progress or enhance the overall quality of judging -- can appellate courts confronted with complex scientific and statistical evidence fare any better? Are there procedures that promise to produce opinions that are scientifically as well as legally literate? I believe so.

One trivial approach would be to eschew opinions that try to explain scientific principles.(132) Instead, a court would confine itself to an opinion that merely announces the result and cites the scientific literature. An opinion that does not use scientific terminology cannot be accused of misusing that terminology. But such efforts to slip in under the scientific radar, so to speak, are unlikely to produce convincing opinions about controversial scientific evidence unless the court understands enough of the science to write a scientifically literate opinion in the first place.(133) A stealth bomber of an opinion is unlikely to get off the ground.

However elaborately or concisely a court chooses to write its opinions, it must understand what needs to be understood. Sometimes counsel do not provide adequate clarification, and a court might consider appointing a scientific panel to submit an amicus brief. Such a cry for help would be particularly appropriate when the pertinent scientific literature has yet to be digested and simplified by legal scholars. By the time Bible was decided, however, all the pieces were in place in the legal literature; before the Johnson opinion issued, they were polished and assembled in a comprehensive report by a second committee of the National Academy of Sciences. The Johnson court cited this literature, and the court evidently thought it understood the issues of population structure and random mating well enough to proceed without assistance.

Such confidence, however, was misplaced. Judges are experts in law, not science, and even with concise reference materials and judicial education programs, it may be unfair to expect them to know when they do not know what they need to know about science.(134) A solution might be to screen for scientific accuracy the final drafts of opinions on controversial scientific developments. For centuries, opinions have been screened for legal accuracy. Law clerks check citations (and serve as sounding boards for their judges' arguments). In appellate courts, judges review their colleagues' work and must be persuaded of its legal soundness before they will subscribe to an opinion. The process amounts to an internal peer review for legal analysis and writing that tends to extirpate gross errors about the law from opinions. Courts should consider instituting comparable prepublication review(135) by scientists of opinions that address scientific controversies or that include mathematical or statistical analyses that originate with the court.(136)

This proposal would not entail adding a scientist to the court's staff.(137) Many scientists are willing to review journal articles and grant proposals without compensation, and it seems likely that they would respond favorably to occasional requests from the judiciary. Neither is this a proposal for peer review as it is practiced by scientific journals. Indeed, it could be called a proposal for non-peer review. The reviewers would not be peers of the authors, but experts from another discipline, and the purpose obviously is not to help decide which opinions will be published. But the proposal for prepublication review is similar to peer review in bringing relevant expertise to bear and in having the potential to improve the content of what ultimately is published.

The major difficulty for the courts would be identifying suitable, independent experts to review presumably final drafts of opinions.(138) This task already faces trial judges who supplement the parties' efforts to provide expert testimony by appointing special masters(139) or experts who report to the court.(140) Professional societies and organizations, editors of leading journals, and distinguished senior scientists probably would be willing to name individuals who could provide constructive criticism of the technical portions of an opinion.(141) The parties also might suggest possible reviewers.(142) Carefully written instructions could help reviewers focus on possible errors or infelicities in the court's description of the science or scientific reasoning and steer them away from commenting on the evidence in the case.(143)

The layer of scientific review would have one untoward effect -- delay. Not only would a reviewer take some time to respond, but the parties should be given the opportunity to comment on any revisions that the court might decide to make as a result of the review. If the revisions are minor and do not affect the outcome, however, the judgment or order could issue before the opinion is published, and in all situations an expeditious schedule for review and objections could be imposed. In cases like State v. Johnson, where the opinion should be revised, a short delay is preferable to modifying an opinion after publication.(144)

In short, prepublication review of portions of opinions that discuss science is practicable. It would alert the courts to gross errors in their use of scientific principles or terminology. Cases like Bible and Johnson demonstrate all too vividly that such errors do occur. They can affect the reasoning, if not the outcomes, of opinions. When that happens, they can rob an opinion of the persuasive value that it should have in forging a consensus as to what the law should be. When scientific evidence is at issue, bad science makes bad law. Perhaps good scientists can contribute to better law.


1. Regents' Professor, Arizona State University College of Law, Tempe AZ 85287-7906. Parts of this paper were delivered at the Arizona Judicial College's Seminar on Supreme Court Developments in September 1996. I am grateful to James Crow for comments and corrections and to Marilyn Taylor for the trial transcript in State v. Bible, 858 P.2d 1152 (Ariz. 1993), cert. denied, 114 S.Ct. 1578 (1994). [BACK]

2. People v. Wesley, 533 N.Y.S. 2d 643, 644 (Sup. Ct. 1988), aff'd, 589 N.Y.S. 2d 197 (App. Div. 1992). [BACK]

3. Office of Technology Assessment, Genetic Witness: Forensic Uses of DNA Tests (1990). [BACK]

4. National Research Council Committee on DNA Technology in Forensic Science, DNA Technology in Forensic Science (1992) [hereinafter 1992 NRC Report]. The Council is the operating arm of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. [BACK]

5. State v. Bible, 858 P.2d 1152 (Ariz. 1993), cert. denied, 114 S.Ct. 1578 (1994). [BACK]

6. National Research Council Committee on DNA Forensic Science: An Update, The Evaluation of Forensic DNA Evidence 4-6 to 4-7 (1996) (pre-publication copy) [hereinafter 1996 NRC Report]. [BACK]

7. State v. Johnson, 922 P.2d 294 (Ariz. 1996). [BACK]

8. 858 P.2d 1152 (Ariz. 1993), cert. denied, 114 S.Ct. 1578 (1994). [BACK]

9. See, e.g., State v. DeSpain, No. 15589 (Super. Ct. Yuma Co. Feb. 12, 1991) (following substantial expert testimony and review of transcripts of similar testimony in other cases, the court found that "[t]here is not general acceptance within the relevant scientific community of the FBI procedures at this time to permit the test results to be offered in evidence"); State v. McComb, No. CR 90-06024 (Super. Ct. Maricopa Co. Mar. 29, 1991) (Memorandum of Decision and Order) (state's experts may give opinions based on Cellmark's single-locus VNTR tests, but they may not state "the probabilities of the alleged match within the eight banding patterns to have occurred by reasons of chance"). [BACK]

10. See infra Part I. [BACK]

11. 922 P.2d 294 (Ariz. 1996). [BACK]

12. For example, one expert testified in Bible that "the likelihood of a random individual sharing that same DNA banding pattern that was seen in the muscle tissue of Jennifer Wilson is approximately one in 14 billion." State v. Bible, No. 14105 (Ariz. Super. Ct. Mar. 28, 1990) (testimony of Lisa Forman, 13 Transcript 103). [BACK]

13. See infra Part I. [BACK]

14. In light of the other overwhelming circumstantial evidence incriminating Bible and the presence of a credible defense expert who contradicted Cellmark's personnel, the court deemed the error to be harmless and allowed the conviction and sentence to stand. [BACK]

15. The court noted "significant shortcomings" in the Frye test and quoted the criticism of Frye and the description of a competing approach given in 1 McCormick on Evidence § 203 (John W. Strong ed., 4th ed. 1994). 858 P.2d at 1181. The discussion is in contrast to Chief Justice Feldman's prior opinion in State ex rel. Collins v. Superior Court, 644 P.2d 1266 (Ariz. 1982). There, without so much as a nod to decades of opinions and scholarship proposing alternatives to the general acceptance standard, the Chief Justice wrote that "Frye has been in use for almost 60 years without the development of any alternative as a general test of reliability. No such alternative has been seriously suggested in the cases or in the literature, nor does any occur to this court." Id. at __. [BACK]

16. The "general acceptance" standard emerged in Frye v. United States, 293 F. 1013 (D.C. Cir. 1923). Despite its novelty, the Frye court presented it as if it were a settled feature of the law on expert witnesses. See 1 McCormick, supra note 14, at § 203. [BACK]

17. 858 P.2d at __. [BACK]

18. Id. at 1180, 1185. [BACK]

19. Given the court's holdings as to general acceptance, this disposition reflects a classical and wise judicial caution against deciding issues unnecessarily. Having indicated why the expert testimony was erroneously admitted, any definition of testimony that would have been admissible could only have constituted dicta. To this extent, the court is correct in characterizing its opinion as adopting a "careful and cautious approach." 858 P.2d at 1193. In State v. Superior Court, 718 P.2d 171 (Ariz. 1986), however, the author of the Bible opinion penned an opinion for the court about another forensic technique that was then more novel than DNA profiling and that remains far less securely validated. See Joseph R. Meany, Note, Horizontal Gaze Nystagmus: A Closer Look, 36 Jurimetrics J. 383 (1996). There the Chief Justice opined that the roadside test for intoxication (the horizontal gaze nystagmus, or HGN, test) was admissible at trial even though the opinion could have ended with its simple holding that HGN (like much other evidence that is not necessarily admissible at trial) can be considered in ascertaining probable cause to arrest. [BACK]

20. 887 P.2d 572 (Ariz. Ct. App. 1994). [BACK]

21. Id. at __. [BACK]

22. If this is what the geneticist said, then the geneticist committed what is known in the statistical literature as the transposition fallacy. See infra Part II. [BACK]

23. Id. Unlike Bible, however, the majority opinion of Judge Ruth McGregor holds that the error was not harmless. [BACK]

24. 905 P.2d 515 (Ariz. Ct. App. 1995), rev. granted. [BACK]

25. Unfortunately, the court did not explain how the record established that geneticists generally agree that Dr. Helentjaris's analysis was capable of uniquely identifying DNA from palo verde trees. [BACK]

26. 905 P.2d 493 (Ariz. Ct. App. 1994), rev. granted. [BACK]

27. This witness was Professor Mary-Claire King. In addition to implying that the three loci identified in Hummert are sufficient to "uniquely identify every person," Dr. King testified that "the match based on those three probes indicates to me that the DNA . . . came from the same individual." Transcript, Oct. 17, 1991, at 25. The first NRC committee, on which Dr. King had served, had concluded that "an expert should -- given with [sic] the relatively small number of loci used and the available population data -- avoid assertions in court that a particular genotype is unique in the population." 1992 NRC Report, supra note 3, at 92. [BACK]

28. See, e.g., D.H. Kaye, The Forensic Debut of the NRC's DNA Report: Population Structure, Ceiling Frequencies and the Need for Numbers, 96 Genetica 99 (1995), also published in slightly different form in 34 Jurimetrics J. 369 (1994). [BACK]

29. 905 P.2d 572 (Ariz. Ct. App. 1995), rev. granted. [BACK]

30. Judge Philip Toci wrote the opinions in Hummert and Boles. [BACK]

31. State v. Johnson, 905 P.2d 1002 (Ariz. Ct. App. 1995), aff'd, 922 P.2d 294 (Ariz. 1996). [BACK]

32. 1992 NRC Report, supra note 3. [BACK]

33. The exception is State v. Clark, 887 P.2d 572 (Ariz. Ct. App. 1994). [BACK]

34. See David H. Kaye, DNA Evidence: Probability, Population Genetics, and the Courts, 7 Harv. J.L. & Tech. 101 (1993). [BACK]

35. Four nucleotide bases (abbreviated A, T, G, and C) are located along the double-helical backbone of the DNA molecule. One base (such as A) is attached to one helix. Another base is attached to the other helix, and the two bases are weakly bonded together in between the two strands of DNA. An A always pairs with T, and G binds to C. The sequence of the nucleotide bases is what carries the genetic information in the DNA molecule. For example, the sequence ATT on one strand (or TAA on the other strand) "means" something different than GTT (or CAA). [BACK]

36. According to the court:

The basis for DNA identity testing is the well-accepted proposition that "except for identical twins each individual has a unique overall genetic code." Present technology, however, does not permit testing of the entire DNA sequence but only of discrete, very limited DNA segments. "Because 99.9% of the DNA sequence in any two people is identical," accurate analysis is vital to determine whether there is a match of the remaining 0.1 percent of the DNA sequence from the samples compared.

858 P.2d at 1179-80 (citations omitted). "Accurate analysis" is vital, of course, but not because scientists must make hundreds of precise measurements, any one of which could lead to false results. Forensic DNA testing involves loci that are many times more variable than the average. By examining only these "hypervariable" regions, forensic DNA typing is quite efficient. See, e.g., Alec J. Jeffreys et al., Hypervariable "Minisatellite" Regions in Human DNA, 314 Nature 67 (1985). [BACK]

37. The hypervariable regions arise where there are "tandem repeats" of shorter sequences of base pairs. The number of repeats varies widely in the population. "Restriction Fragment Length Polymorphism" (RFLP) testing can detect these "Variable Number Tandem Repeats" (VNTRs). [BACK]

38. The phenomenon of homozygosity in some single-locus genotypes turned out to be crucial in Bible, although more explanation is required to see why. See infra Part IIB. [BACK]

39. The multilocus profile results from the use of a series of single-locus probes. Another kind of multilocus profile results from the use of a multilocus probe. A.J. Jeffreys et al., Individual-Specific "Fingerprints" of Human DNA, 316 Nature 76 (1985) (letter). Random match probabilities for multilocus probes, although extremely small, are more difficult to ascertain. As a result, multilocus probes have not been employed in criminal cases in this country. [BACK]

40. 858 P.2d at 1185-86 (citation and footnote omitted).[BACK]

41. Oddly, in interpreting the miscalculated random match probability of 1/20 as the source probability, the court ignored its own admonition that:

Any argument that the random match probability constitutes a "guilt probability" is, of course, incorrect and misleading. Indeed, as Dr. Forman testified, the DNA random match probability "says nothing about guilt or innocence." The random match probability assesses the likelihood that DNA samples selected at random would match. Guilt probability is "[t]he probability that the suspect is guilty of the crime in question." Although the random match probability may factor into the guilt probability calculation, the opposite is not true. Nor are the formulae for determining the two different probabilities the same. This court has never condoned jury use of guilt probability calculations, nor do we in this case.

Id. at 1185 n. 18 (citations omitted). In State v. Johnson, the court again seemed willing to transform a random match probability into a probability of guilt. It stated, without referring to any evidence in the case except the DNA match, that "[t]he jury evidently believed that odds of one to 312 million established guilt beyond a reasonable doubt." 922 P.2d at 294. [BACK]

42. Indeed, the court cited the fact that DNA analysis is "a complex scientific field" as a reason to use the deferential Frye test of general acceptance rather than the potentially more demanding and probing Daubert standard of scientific soundness. Id. at 1183. [BACK]

43. Id. at 1186 (citations and footnotes omitted). [BACK]

44. Id. at 1186 n. 23:

A simplistic hypothetical illustrates this point. Using eight frequency figures, and assuming the frequency rate for each figure is 0.1 (or 1 in 10), the probability of a random match would be 0.1 x 0.1 x 0.1 x 0.1 x 0.1 x 0.1 x 0.1 x 0.1 or 0.1^8 or 1 in 100,000,000. Using the same formula, if the true and correct frequency rate for each frequency figure is 0.3 (or 3 in 10), the probability of a random match would be 0.3 x 0.3 x 0.3 x 0.3 x 0.3 x 0.3 x 0.3 x 0.3 or 0.3^8 or 1 in 15,242. On the other hand, and again using the same formula, if the true and correct frequency rate for each frequency figure is 0.05 (or 1 in 20), the probability of a random match would be 0.05 x 0.05 x 0.05 x 0.05 x 0.05 x 0.05 x 0.05 x 0.05 or 0.05^8 or 1 in 25,600,000,000.

For a single locus, there cannot be eight bands. The relevant formulas for a multiple single-locus genotype and the probability of the types of sampling errors depicted in this example are discussed below. [BACK]
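The arithmetic in the court's hypothetical is easy to check. The following is a minimal sketch in Python, offered only as an illustration; the function names `random_match_probability` and `one_in` are hypothetical labels introduced here, and the calculation simply multiplies eight equal frequencies as the quoted footnote does. It is not a model of how forensic laboratories actually compute multilocus genotype frequencies.

```python
from math import prod


def random_match_probability(frequencies):
    """Multiply the individual frequency figures together (the court's 'formula')."""
    return prod(frequencies)


def one_in(probability):
    """Express a probability as the N in '1 in N', rounded to the nearest integer."""
    return round(1 / probability)


# The three per-figure frequencies used in the court's hypothetical,
# each repeated eight times.
for frequency in (0.1, 0.3, 0.05):
    p = random_match_probability([frequency] * 8)
    print(f"frequency {frequency}: about 1 in {one_in(p):,}")
```

The point of the hypothetical survives the check: a threefold change in each of eight frequency figures moves the product by a factor of 3^8, or roughly 6,500.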

45. 1992 NRC Report, supra note 3, at 78. [BACK]

46. Id. at 1186 (citation and footnotes omitted). [BACK]

47. Id. at 1186 n. 25. [BACK]

48. The general population is composed of racial and ethnic subpopulations; under certain conditions, this population structure can produce correlations among the alleles that appear at the various loci. See infra Part IIIB. [BACK]

49. The court may have believed that the only threat to the assumption that loci are independent is linkage -- the tendency of the alleles at two or more loci to be inherited as a package rather than to assort independently as sex cells are formed. Indeed, the court relied on this faulty premise in State v. Johnson. See infra Part III. This would explain why the court thought that "truly random mating" was an assumption that had nothing to do with linkage equilibrium. See infra note 51.

If this is what the court meant by lack of linkage equilibrium, then this possible source of error in the product rule should have been dismissed as remote and not part of the scientific debate. There was never any serious dispute over the independent assortment of VNTRs used in forensic work, which are normally on entirely different chromosomes. Sex cells (eggs and sperm) differ from body cells in that they contain half of the full genetic complement of 23 pairs of chromosomes. One of each pair of chromosomes is enclosed in a sex cell, and no scientist has suggested that which one ends up there depends on VNTR alleles. The court itself recognized in a footnote that "[l]inkage disequilibrium may occur when DNA segments tested are in close physical proximity to each other. Testing DNA segments that are physically remote from each other may diminish linkage disequilibrium." Id. at 1186 n. 24 (citations omitted).  [BACK]

50. See, e.g., B. Devlin & Neil Risch, NRC Report on DNA Typing, 260 Science 1057 (1993) (letter arguing that there is less genetic variation within ethnic subpopulations than across races); B. Devlin & Neil Risch, Ethnic Differentiation at VNTR Loci, with Special Reference to Forensic Applications, 51 Am. J. Hum. Genetics 534 (1992) (estimates frequencies of true alleles in various populations and ethnic subgroups and concludes that VNTRs vary less among certain subgroups within ethnic groups than among racial groups); B. Devlin et al., Statistical Evaluation of DNA Fingerprinting: A Critique of the NRC's Report, 259 Science 748 (1993) (argues that substructure is not significant, that NRC Report ignores much of the scientific literature on this point, that NRC recommendations for analysis of ethnic subpopulations are flawed); Neil J. Risch & B. Devlin, On the Probability of Matching DNA Fingerprints, 255 Science 717 (1992) (number of matching loci in all pairs of people in FBI and Lifecodes databases for certain VNTRs is consistent with linkage equilibrium). [BACK]

51. 858 P.2d at 1186 (citation omitted). [BACK]

52. The court seems to think that "random mating" is pertinent only to Hardy-Weinberg equilibrium, and that "[a]s with the other assumptions, if there is no Hardy-Weinberg equilibrium, the product rule results may be incorrect by a substantial margin." Id. at 1186-87 (citations omitted). In reality, lack of random mating (in the technical sense of mating that is correlated with genotype) can cause departures both from Hardy-Weinberg and linkage equilibrium. As far as Hardy-Weinberg equilibrium goes, it has been shown that departures from Hardy-Weinberg equilibrium due to population substructure are more likely to overstate a single-locus genotype frequency than to understate it. That is, this possible error in the model that gives rise to the simple product rule inures, on average, to the benefit of the defendant. See Kaye, supra note 33, at 142. [BACK]

53. On average, the use of population-wide allele frequencies understates the genotype frequencies within defendant's subpopulation. See Dan E. Krane et al., Genetic Differences at Four DNA Typing Loci in Finnish, Italian, and Mixed Caucasian Populations, 89 Proc. Nat'l Acad. Sci. 10583 (1992); Stanley Sawyer et al., DNA Fingerprinting Loci Do Show Population Differences: Comments on Budowle et al., 59 Am. J. Hum. Genetics 272 (1996) (letter). [BACK]

54. See Kaye, supra note 33, at 142. [BACK]

55. On the problems in defining racial populations, compare C. Loring Brace, Region Does not Mean "Race"-- Reality Versus Convention in Forensic Anthropology, 40 J. Forensic Sci. 171 (1995), with Kenneth A.R. Kennedy, But Professor, Why Teach Race Identification if Races Don't Exist?, 40 J. Forensic Sci. 797 (1995). [BACK]

56. Compare R.C. Lewontin & Daniel L. Hartl, Population Genetics in Forensic DNA Typing, 254 Science 1745 (1991) ("there is, on average, one-third more genetic variation among Irish, Spanish, Italians, Slavs, Swedes, and other subpopulations than there is, on average, between Europeans, Asians, Africans, Amerindians, and Oceanians."), with Richard C. Lewontin, Discussion, 9 Stat. Sci. 259 (1994) ("all parties agree that differentiation among [major ethnic groups] is as large, if not larger than, the difference among tribes and national groups [within major ethnic groups]."). [BACK]

57. For a review of the literature on genetic differences across the globe, see Bernard Devlin & Kathryn Roeder, DNA Profiling: Statistics and Population Genetics, in Modern Scientific Evidence § 18-3.2.1 (David Faigman et al. eds., forthcoming 1997). [BACK]

58. See Kaye, supra note 33, at 146. [BACK]

59. For other cases that could have been disposed of similarly but were not, see id. at 142-47. [BACK]

60. See Kaye, supra note 33, at 142-47. [BACK]

61. 858 P.2d at 1187. [BACK]

62. Id. at 1188-89. [BACK]

63. See People v. Atoigue, DCA No. CR 91-95A (Guam Dist. Ct. App. Div. 1992); People v. Barney, 10 Cal. Rptr. 2d 731, 744 (Ct. App. 1992); People v. Wallace, 17 Cal. Rptr. 2d 721, 726-27 (Ct. App. 1993); Commonwealth v. Lanigan, 596 N.E.2d 311, 316 (Mass. 1992); Commonwealth v. Curnin, 565 N.E.2d 440, 444-45 (Mass. 1991); State v. Schwartz, 447 N.W.2d 422, 428-29 (Minn. 1989); State v. Vandebogart, 616 A.2d 483, 493-94 (N.H. 1992); State v. Cauthron, 846 P.2d 502, 514-17 (Wash. 1993). [BACK]

64. E.g., Seymour Geisser, Some Statistical Issues in Forensic DNA Typing 7-8, 10 (1996) (paper presented at the Third International Conference on Forensic Statistics, Edinburgh, July 1996). [BACK]

65. See, e.g., 1996 NRC Report, supra note 5, at 4-6, 4-7. [BACK]

66. Selection occurs when persons with different genotypes survive and reproduce at different rates. The VNTR and other loci traditionally used in forensic work are thought to be selectively neutral, or nearly so. [BACK]

67. See supra text accompanying Table 1. [BACK]

68. 1996 NRC Report, supra note 5, at 4-11. [BACK]

69. 858 P.2d at 1187. Another expert made no such concession. Id. [BACK]

70. No more than 100% of the population can have an undetected allele A2. The quantity 2p1p2 then becomes 2p1 × 1 = 2p1. [BACK]
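The bound in this note can be written as a one-line inequality. This is a sketch that assumes p_1 denotes the population frequency of the observed allele A1 and p_2 the frequency of the possibly undetected allele A2; those symbol meanings are inferred from the note rather than stated in it.

```latex
\[
  2\,p_1\,p_2 \;\le\; 2\,p_1 \cdot 1 \;=\; 2\,p_1 ,
  \qquad \text{since } 0 \le p_2 \le 1 .
\]
```

Because p_2 can never exceed 1, substituting 2p_1 for the single-locus frequency can only raise the estimate, never lower it.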

71. As indicated in Part IIB, when excess homozygosity exists, the simple product rule applied to allele frequencies in the general population typically benefits the defendant when the relevant population is the general population, but typically disadvantages the defendant when the appropriate reference group is defendant's subpopulation. Using 2p for an apparently homozygous locus results in a single-locus estimate that is always too large, but if the population is so highly structured as to produce a sample database with excess homozygosity, the multilocus estimates sometimes will be too small. [BACK]

72. 1996 NRC Report, supra note 5, at 4-17; Kaye, supra note 33, at 126 (describing the cases and the rejection in the scientific literature written before Bible of the view that reports of excess homozygosity indicate any serious problem). [BACK]

73. State v. Bible, No. 14105 (Ariz. Super. Ct. Mar. 28, 1990) (testimony of Lisa Forman, 13 Transcript 66-69). [BACK]

74. Consider the following exchange:

Q. Doctor, in the time that you have been looking at autorads for these probes, approximately how many autorads have you taken a look at?

A. In my lifetime as a scientist?

Q. Yes.

A. Thousands.

Q. Have you ever seen two DNA patterns that are exactly the same except for identical twins?

A. Even in inbred incest cases, I have never seen identical DNA banding patterns except for identical twins.

Q. Doctor, based on your review of the autorads in this case, and the notes, your understanding of population genetics, and the database that Cellmark uses, and all of your experience, do you have an opinion as to whether the blood that was found on the defendant's shirt is that of Jennifer Wilson's?

A. Yes, I do.

Q. What is that opinion?

A. The blood from the shirt and the DNA that we obtained from the muscle from the little girl identified as Jennifer Wilson are indistinguishable from one another.

Q. Thank you. I have no further questions for this witness, Your Honor.

Id. at 100-01. [BACK]

75. 858 P.2d at 1193. Even this restatement of the holding is overly broad. The court held only that the random match probability for a VNTR profile estimated via the simple product rule as applied to Cellmark's 1988 database for VNTRs is inadmissible. Cf. id. at 1180 n.15. ("Polymerase chain reaction technology was not used in this case. Thus, we do not consider any additional or differing issues surrounding that technology."); 1180 n.17 ("We are not presented with, and do not determine, the admissibility of DNA evidence when DNA testing is used to determine paternity. In paternity cases, different DNA testing technology apparently is used. [In fact, VNTR probes are often used in paternity testing. See, e.g., Jeffrey W. Morris & David W. Gjertson, The Scientific Status of Parentage Testing, in Modern Scientific Evidence § 19-2.0 (forthcoming 1997).] . . . Thus, the analysis in this case is limited to criminal cases in which RFLP technology is used and a match is declared."). [BACK]

76. See supra Part I. [BACK]

77. See supra note 14. [BACK]

78. 858 P.2d at 1181-82:

The Frye test, however, has significant shortcomings. New discoveries are not immediately accepted in the scientific community. Rigid application of the general acceptance test would forbid judicial use of a new discovery even though there may be direct experimental or clinical support for the principle. Furthermore, history shows that generally accepted scientific theory is not always correct.

Due in part to these concerns, a leading commentator writes that a "drumbeat of criticism ... provides the background music to the movement away from the general acceptance test." 1 McCormick on Evidence § 203, at 873. Although acknowledging Frye's worthwhile objectives, this commentator's further observations are worth repeating: [Frye's] objectives can be attained satisfactorily with less drastic constraints on the admissibility of scientific evidence. In particular, it has been suggested ... that courts look directly to reliability or validity rather than to the extent of acceptance, ... and that the traditional standards of relevancy and the need for expertise--and nothing more -- should govern. ... [This suggestion] avoids the difficult problems of defining when "scientific" evidence is subject to the general acceptance requirement and how general this acceptance must be, of discerning exactly what it is that must be accepted, and of determining the "particular field" to which the scientific evidence belongs and in which it must be accepted. [BACK]

79. Id. at 296. [BACK]

80. An analysis of the relative merits of Daubert, Frye, and other standards for screening scientific evidence would take us far afield. It may be worth noting, however, that (1) the Arizona Supreme Court did not find Frye attractive enough to adopt until nearly 40 years passed, (2) the court (like other courts that invoked it selectively for polygraph and not other cases) "adopted" it quite casually in State v. Valdez, 371 P.2d 894, 896-98 (Ariz. 1962), (3) the court's application of it to the "scientific community" of "highway safety professionals" in State v. Superior Court, 718 P.2d 171 (Ariz. 1986), illustrates one of the many problems with Frye, and (4) there are hundreds, if not thousands of opinions applying Daubert. At bottom, however, the problems that have surfaced with Frye in Arizona may be more closely related to the minimal scientific literacy shown in cases like Valdez and Superior Court than to the choice of any particular standard for passing on scientific evidence. See D.H. Kaye, Science in Evidence (forthcoming 1997). [BACK]

81. It also responds to the argument that the proponent of DNA evidence must state at the outset that the incriminating DNA profile was not seen in the laboratory's database of (some number of) DNA samples. It states that this presentation, although proposed by the NRC committee along with the interim-ceiling method, is not part of the ceiling method. 922 P.2d at 300. Although true as a matter of definition, this reasoning seems superficial. The second NRC committee reviewed the counting method and found it even less necessary and more problematic than the ceiling method. 1996 NRC Report, supra note 5, at 5-33. [BACK]

82. The first NRC committee proposed an "interim-ceiling" method (discussed below) as a stopgap pending the completion of studies to ascertain allele frequencies in a large number of ethnic subpopulations. The latter frequencies were to be used in full-blown ceiling calculations. See 1992 NRC Report, supra note 3. Both ceiling methods are described below. [BACK]

83. See 1996 NRC Report, supra note 5, at Appendix 6A (collecting cases). [BACK]

84. A few courts have reached a contrary conclusion. In State v. DeFroe, No. 92-1-03699-8 (Super. Ct. King County June 23, 1993), and State v. Hollis, No. 92-104603-9 (Super. Ct. King County June 23, 1993) (Findings of Fact and Conclusions of Law), the trial court ruled that even with ceiling frequencies, DNA evidence was not generally accepted and therefore inadmissible. After reviewing articles on the ceiling method and affidavits and testimony from many expert witnesses, the court found the interim-ceiling method to be "a statistical technique without scientific basis, contrived by compromise or pressure from the law enforcement community." [BACK]

85. But there are other grounds for complaint. Many scientists seem to object to the method because it seems downright klutzy. It uses some rules of thumb and builds on little or no population genetics theory. As one respected statistician put it in an affidavit filed in Hollis and DeFroe, the ceiling method is "data-driven, interest-ridden, pseudo-statistical, ad hoc methodology, to which no statistician or scientist should be a party." (This criticism was directed not just at the ceiling method, but also at various other attempts by both sides to adjust the uses of the ceiling method to their perceived goals.) [BACK]

86. See 1992 NRC Report, supra note 3, at 91-92. [BACK]

87. Actually, an even larger figure is used -- the upper 95% confidence limit on the allele frequency estimate for that race. This is intended to account for sampling error due to the limited size of the databases. Id. at 92. [BACK]
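
The adjustment just described can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name, the use of the normal approximation to the binomial, the multiplier of 1.96, and the example counts are all chosen here for illustration and are not drawn from any laboratory's actual procedure.

```python
import math

def upper_confidence_limit(count, n, z=1.96):
    """Upper limit of a normal-approximation confidence interval
    for a proportion estimated as count / n (illustrative assumption)."""
    p = count / n
    return p + z * math.sqrt(p * (1 - p) / n)

# Hypothetical example: an allele observed 20 times among 400 sampled
# chromosomes (200 individuals, two alleles each).
p_hat = 20 / 400
ucl = upper_confidence_limit(20, 400)
print(f"point estimate = {p_hat:.4f}, upper limit = {ucl:.4f}")
```

With these hypothetical counts the point estimate of 5% rises to roughly 7%, which shows how the upper limit builds in a margin for sampling error in a database of modest size.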

88. See supra Part II. [BACK]

89. See, e.g., 1996 NRC Report, supra note 5, at 5-30 ("sufficiently conservative to accommodate the presence of substructure . . . a lower limit on the size of the profile frequency"); 1992 NRC Report, supra note 3, at 91 ("conservative calculation"). [BACK]

90. The court asserts that linkage equilibrium means something quite different: "Linkage equilibrium refers to the principle of independent assortment, which states that the frequency of occurrence of alleles expressing different genetic traits will be determined independently of the frequency of the occurrence of other alleles in the sample. See Monroe W. Strickberger, Genetics 104-05 (3d ed., Macmillan Publishing Co., 1985)." 922 P.2d at 927. No such statement is to be found at those pages, which discuss how Mendel's classic experiments in crossing pea plants established that genetic traits did not blend in offspring, but were inherited as dominant and recessive traits.

Independent assortment of genes actually refers to the formation of gametes (sex cells, which contain one of each pair of chromosomes). Mendel's first law states that the two alleles of one locus segregate from each other into the gametes in a 50:50 ratio. Independent assortment (Mendel's second law) states that during gamete formation, the way that the alleles of one locus segregate is independent of the way the alleles of another locus segregate. Genes that are "linked" (by being close together on the same chromosome) do not assort independently. Linkage thus interferes with linkage equilibrium, but so do other things -- like population structure. Consequently, genes that assort independently need not exhibit linkage equilibrium in a population.

The use of the same term "linkage" in the two contexts is a notorious source of confusion to students of genetics, but most of the scientific literature on which the court purports to rely throughout its opinion is clear. See M. Krawczak & J. Schmidtke, DNA Fingerprinting 63-64 (1994) (distinguishing between "allelic association" and "linkage disequilibrium" as one possible cause of allelic association); 1992 NRC Report, supra note 3, at 78-79. [BACK]
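
The point that independently assorting genes need not exhibit linkage equilibrium in a structured population can be checked with simple arithmetic. The following Python sketch is a simplified illustration, not a model of any real database: it assumes two loci with one allele of interest at each, subpopulations that are each internally in linkage equilibrium, and hypothetical allele frequencies and mixing weights.

```python
def pooled_disequilibrium(subpops):
    """Return D = freq(A and B together) - freq(A) * freq(B) in the pooled
    population. `subpops` is a list of (weight, freq_A, freq_B); weights are
    assumed to sum to 1, and within each subpopulation the two alleles are
    assumed to be uncorrelated (linkage equilibrium)."""
    p_a = sum(w * a for w, a, b in subpops)
    p_b = sum(w * b for w, a, b in subpops)
    p_ab = sum(w * a * b for w, a, b in subpops)
    return p_ab - p_a * p_b

# Hypothetical subpopulations of equal size with very different frequencies.
structured = [(0.5, 0.9, 0.9), (0.5, 0.1, 0.1)]
# Same frequencies in both subpopulations: no structure.
homogeneous = [(0.5, 0.3, 0.6), (0.5, 0.3, 0.6)]

print(pooled_disequilibrium(structured))   # clearly nonzero
print(pooled_disequilibrium(homogeneous))  # zero, up to rounding
```

In the structured case the pooled population shows a positive association between the two alleles even though the loci assort independently within every subpopulation; when the subpopulations share the same frequencies, the association vanishes.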

91. It is not a statement about the "linkage" between loci. [BACK]

92. 922 P.2d at 297. [BACK]

93. Id. [BACK]

94. E.g., United States v. Yee, 134 F.R.D. 161 (N.D. Ohio 1991), aff'd sub nom. United States v. Bonds, 12 F.3d 540 (6th Cir. 1993). [BACK]

95. 922 P.2d at 297. [BACK]

96. For some reason, the defense expert in Bible opined that the hypervariability in VNTRs makes population structure more likely to be a problem. [BACK]

97. The Johnson court writes that "[t]hus, as the NRC report makes clear, the assumption of linkage equilibrium inherent in protocols such as RFLP is well-grounded and has been proved accurate for purposes of DNA profiles. . . . Accordingly, the assumption of linkage equilibrium for purposes of RFLP analysis and use in applying the product rule has been demonstrated to be generally accepted in the relevant scientific community." 922 P.2d at 297.

A smaller correction is that no "assumption of linkage equilibrium [is] inherent in protocols such as RFLP." RFLP analysis yields an RFLP genotype. Linkage equilibrium is an assumption that can be used in answering the distinct question of how rare or common an RFLP profile might be. [BACK]

98. The report defines linkage equilibrium as follows: "The frequency of a complete genotype is calculated by multiplying the genotype frequencies at all the loci. [T]his calculation assumes that there is no correlation between genotypes at different loci; the absence of such correlation is called linkage equilibrium." 1992 NRC Report, supra note 3, at 78. It then states that "the validity of the multiplication rule depends on the absence of population substructure, because . . . in this special case . . . the different alleles [are] uncorrelated with one another." Id. at 79. Next, it observes that "[t]he key question underlying the use of the multiplication rule is whether actual populations have significant substructure for the loci used for forensic typing." Id. This question, the committee reports, "has provoked considerable debate among population geneticists . . . ." Id. [BACK]

99. It appears that the Johnson court misunderstood a sentence in the 1992 report that reads "[p]airwise comparisons of alleles frequencies have not revealed any correlation across loci." 1992 NRC Report, supra note 3, at 77. As the opening and closing sentences of the paragraph make plain, however, this sentence refers to "blood-group frequencies," not VNTR loci. For those, the committee states, the "situation is substantially different." Id. [BACK]

100. The committee accepted the reservations of some population geneticists as serious enough to justify procedures that do not assume that alleles and loci are uncorrelated: "Although mindful of the controversy, the committee has chosen to assume for the sake of discussion that population substructure may exist and provide a method for estimating population [genotype] frequencies in a manner that adequately accounts for it." Id. at 80. The concerns over Hardy-Weinberg and linkage equilibria thus motivated the committee's presentation of the ceiling methods. Id. at 80. [BACK]

101. If anything, a few courts have made the opposite mistake of suggesting that the committee sided with the population geneticists who contended that population structure exists, is likely to be substantial, and makes the simple product rule scientifically unacceptable. See Kaye, supra note 27, at 100-01.

Johnson cites a second authority, M. Krawczak & J. Schmidtke, supra note 89, at 74, for the proposition that "the assumption of linkage equilibrium for purposes of RFLP analysis and use in applying the product rule has been . . . generally accepted in the relevant scientific community." 922 P.2d at 297. This monograph, written for a British "Medical Perspectives Series" by two German workers at the Institut für Humangenetik, Medizinische Hochschule, in Hannover, says nothing of the kind. It merely describes the 1991 Science paper by Lewontin and Hartl and the rejoinder by Chakraborty and Kidd. It refers to this exchange as "an intense debate" in which Lewontin and Hartl "seriously criticized the use of the multiplication rule." Krawczak & Schmidtke, supra, at 73. Ironically, Bible presented the same two papers, with their "radically conflicting views of statistical probability calculations" as proof of a "bitter dispute" and a "lack of general acceptance." 858 P.2d at 1187. In short, the discussion in Krawczak & Schmidtke gives no more support to the court's new-found faith in linkage equilibrium than does the 1992 NRC report. [BACK]

102. 1996 NRC Report, supra note 5, at 5-31 to 5-32. [BACK]

103. See id. at 4-20 to 4-26 (discussing studies); R. Chakraborty et al., Intraclass and Interclass Correlations of Allele Sizes Within and Between Loci in DNA Typing Data, 133 Genetics 411 (1993); D.W. Gjertson & J.W. Morris, Assessing Probability of Paternity and the Product Rule in DNA Systems, 96 Genetica 89 (1995); J.W. Morris & D.W. Gjertson, The Paternity Index, Population Heterogeneity, and the Product Rule, in 5 Advances in Forensic Haemogenetics 435 (W. Bar et al. eds., 1993); Bruce S. Weir, The Second National Research Council Report on Forensic DNA Evidence, 59 Am. J. Human Genetics 497, 499 (1996) (invited editorial) (although "[t]he staunchest critics of the product rule have stressed the difficulty of testing for independence between alleles over several loci," "two-locus tests have been found to behave satisfactorily, and . . . it is unlikely that there will be dependence at larger numbers of loci when there is no evidence for dependence at one or two loci"). [BACK]

104. See 1996 NRC Report, supra note 5 (Recommendation 4.1). [BACK]

105. See also M. Krawczak & J. Schmidtke, supra note 89, at 64-66; H. Eldon Sutton, An Introduction to Human Genetics 508-09 (4th ed. 19__) (discussing the effect of population stratification on Hardy-Weinberg equilibrium). [BACK]

106. 922 P.2d at 297. [BACK]

107. The premise that any genes might code for "cultural" characteristics is interesting, but not central to the present discussion. In full, the relevant paragraph reads:

Of course people who live in close geographic proximity to each other are more likely to choose each other as mates, and people often select mates on the basis of certain physical, racial, cultural, and behavioral characteristics. However, the alleles used in DNA profiling do not represent physical, racial, cultural, and behavioral characteristics and are therefore not the basis for the choice of mates. Accordingly, the alleles used for profiling remain in Hardy-Weinberg equilibrium.

(citation omitted).

When people choose mates because of particular, genetically-based traits, mating is said to be "assortative." The court assumes that non-assortative mating ensures Hardy-Weinberg equilibrium. It does not, for assortative mating is but one of many possible reasons for deviations from Hardy-Weinberg equilibrium. See, e.g., H. Eldon Sutton, supra note 104, at 508-12 (listing six "causes of nonrandom mating"). [BACK]

108. If non-coding DNA were sufficient to guarantee random mating and hence Hardy-Weinberg equilibrium, the NRC committee would not have proposed the ceiling method to compensate for possible departures from Hardy-Weinberg equilibrium. Again, however, the court's error does not mean that the ceiling-method estimates should be excluded. [BACK]

109. The full paragraph is this:

Our concern with Hardy-Weinberg equilibrium in Bible was not with the general acceptance of the scientific principle but instead was limited to Cellmark's admittedly defective database. Bible, 175 Ariz. at 585-86, 858 P.2d at 1160-61. Unlike the situation with Cellmark's database, Hogan testified to testing for and finding the DPS database in Hardy-Weinberg equilibrium. Nothing in the record refutes this testimony.

922 P.2d at 297. As explained in Part II, no one admitted that the database in Bible was defective. [BACK]

110. The DPS database had about 200 individuals per race. [BACK]

111. See 1996 NRC Report, supra note 5, at 4-17 ("the power of standard methods to detect a statistically significant deviation is very small"); 1992 NRC Report, supra note 3, at 81 ("even large and significant differences between subgroups will produce only slight deviations from Hardy-Weinberg expectations"). [BACK]
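
Why subgroup differences yield only slight departures from Hardy-Weinberg expectations can be seen in a small numerical sketch. The Python below rests on simplifying assumptions that are not taken from either report: a single locus with one allele of interest, two equally sized subpopulations that are each internally in Hardy-Weinberg equilibrium, and hypothetical allele frequencies of 5% and 15%.

```python
def heterozygote_frequencies(subpops):
    """Return (observed, expected) heterozygote frequencies for a pooled
    population. `subpops` is a list of (weight, allele_freq); weights are
    assumed to sum to 1, and each subpopulation is assumed to be in
    Hardy-Weinberg equilibrium on its own."""
    p_bar = sum(w * p for w, p in subpops)
    observed = sum(w * 2 * p * (1 - p) for w, p in subpops)
    expected = 2 * p_bar * (1 - p_bar)  # Hardy-Weinberg at the pooled frequency
    return observed, expected

# Hypothetical: the allele is three times as common in one subgroup.
observed, expected = heterozygote_frequencies([(0.5, 0.05), (0.5, 0.15)])
print(f"observed = {observed:.3f}, expected = {expected:.3f}")
```

Under these assumptions a threefold difference in allele frequency changes the heterozygote proportion only from 18% to 17.5%. In a database of about 200 individuals that is a gap of roughly one person (35 heterozygotes rather than 36), far too small for a standard test to detect.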

112. See 1996 NRC Report, supra note 5. [BACK]

113. See, e.g., Judith M. Tanur, Samples and Surveys, in Perspectives on Contemporary Statistics 55-70 (David C. Hoaglin & David S. Moore eds., 1992); Hans Zeisel, The Uniqueness of Survey Evidence, 45 Cornell L.Q. 322 (1960). [BACK]

114. See, e.g., 1 McCormick on Evidence, supra note 14, § 208; Vic Barnett, Sample Survey Principles and Methods (1994); William G. Cochran, Sampling Techniques (3d ed. 1977). [BACK]

115. The court mangled the 1992 NRC report in an effort to support the use of a database of 200 individuals. It observed that whereas "the recommended sample size for the NRC's 'non-modified' ceiling method . . . is [only] 100 for a given racial group," the "DPS database consisted of approximately 200 samples for each of four racial groups." 922 P.2d at 298. The NRC committee, however, was not referring to samples from "racial groups." It "strongly recommend[ed]" drawing "[r]andom samples of 100 persons . . . from each of 15-20 populations, each representing a group relatively homogeneous genetically." 1992 NRC Report, supra note 3, at 83. These 15-20 samples were to "span the range of ethnic groups" and the highest frequency of an allele seen in any of the 15-20 samples (rounded up, if need be, to 5%) was to be used in a ceiling calculation. Id. Because the objective was to choose the largest values from across 15-20 samples, and small values would be forced up to 5%, sampling error in each ethnic sample was not a major concern. Although Johnson's misreading of this part of the report is all too typical of the opinion, it does suggest another, more persuasive argument. The fact that the interim-ceiling method uses samples from the three major racial groups and rounds up to 10% also diminishes the importance of random sampling error, and hence reduces the need for large samples. [BACK]
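
The mechanics of taking the largest frequency across samples and rounding small values up to a floor can be sketched in a few lines of Python. This is a simplified illustration, not the committee's full procedure: the function names and all frequencies are hypothetical, every locus is assumed to be heterozygous (so each contributes 2pq), and the confidence-limit adjustment used in the interim version is left out.

```python
def ceiling_allele_frequency(sample_freqs, floor=0.05):
    """Largest frequency seen in any sample, but never less than the floor."""
    return max(max(sample_freqs), floor)

def ceiling_profile_frequency(loci, floor=0.05):
    """Multiply 2*p*q across loci using ceiling frequencies.
    `loci` is a list of (freqs_allele_1, freqs_allele_2), where each element
    lists that allele's frequency in every population sample (hypothetical)."""
    result = 1.0
    for freqs_1, freqs_2 in loci:
        p = ceiling_allele_frequency(freqs_1, floor)
        q = ceiling_allele_frequency(freqs_2, floor)
        result *= 2 * p * q
    return result

# Hypothetical two-locus profile with frequencies from three samples.
loci = [
    ([0.02, 0.04, 0.01], [0.08, 0.12, 0.03]),
    ([0.01, 0.03, 0.02], [0.06, 0.02, 0.04]),
]
with_5_percent_floor = ceiling_profile_frequency(loci, floor=0.05)
with_10_percent_floor = ceiling_profile_frequency(loci, floor=0.10)
print(with_5_percent_floor, with_10_percent_floor)
```

Because rare alleles are pushed up to the floor whatever their estimated frequency, the result is insensitive to sampling error in those estimates, and raising the floor from 5% to 10% makes the reported profile frequency larger still.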

116. Kaye, supra note 33, at 121 ("The appropriate reaction to the sample size concern is neither to reject the sample statistic out of hand nor to accept it without qualms, but to press for a range of estimates indicating the extent to which the calculation might vary from one small sample to another."). [BACK]

117. See, e.g., Ranajit Chakraborty, Sample Size Requirements for Addressing the Population Genetic Issues of Forensic Use of DNA Typing, 64 Hum. Biology 141, 156-57 (1992). The court apparently did not examine this study directly, for it cited it as having been cited in a military law review article. A work that the court did cite repeatedly in Johnson undercuts this claim. See M. Krawczak & J. Schmidtke, supra note 89, at 76 ("the NRC's recommended sample size of 100 individuals per population" is "difficult to justify" because "[i]n the case of a highly polymorphic locus with alleles that are rare all over the world, . . . a sample of 100 people may turn out to be too small."). [BACK]

118. 922 P.2d at 298. [BACK]

119. Id. (citing "NRC report at 77, 83"). [BACK]

120. 1992 NRC Report, supra note 3, at 91; cf. id. at 77 (stating that the allele frequencies used in the simple product rule are valid for "a sample that is truly random with reference to the genetic type"); id. at 83 (recommending "random samples of 100 persons . . . from each of 15-20 populations" for the final ceiling method). [BACK]

121. See supra note 114. [BACK]

122. 922 P.2d at 298. [BACK]

123. Even with mutation and selection, random mating leads to Hardy-Weinberg ratios in the zygote stage. There may be departures as the population ages if there is differential mortality. The effect of mutation and selection on multi-locus genotype proportions is more complicated. [BACK]

124. 1996 NRC Report, supra note 5, at 5-2. [BACK]

125. Id. Some research supports these expectations. See id.; Bernard Devlin & Neil Risch, Ethnic Differentiation at VNTR Loci, with Special Reference to Forensic Applications, 51 Am. J. Hum. Genetics 534, 545-46 (1992). Some statisticians are still skeptical. See supra note 63. [BACK]

126. See supra Part IIIB. But see Richard Lempert, DNA, Science and the Law: Two Cheers for the Ceiling Principle, 34 Jurimetrics J. 41 (1993) (defending the ceiling principle as an unintended but reasonable way to cope with the possibility that a close relative of the person whose DNA matches the evidence sample is the source of the DNA in that sample). [BACK]

127. Almost nothing -- the court's mistaken reading of the expert testimony about Cellmark's database remains. That reading is moot, however, because that database has fallen into desuetude. [BACK]

128. For an argument along these lines, see 1996 NRC Report, supra note 5, at 5-1 to 5-2. [BACK]

129. The Washington Supreme Court recently reversed its stance on the product rule. See State v. Copeland, 1996 WL 528846 (No. 62417-8, Sept. 19, 1996) (product-rule estimates generally accepted), overruling State v. Cauthron, 846 P.2d 502 (1993) (product-rule estimates not generally accepted). [BACK]

130. For an exposition of these methods, see, e.g., 1996 NRC Report, supra note 5, at chapter 5. In State v. McMillan, No. CR 93-10076 (Super. Ct., Maricopa County, Ariz. Oct. 16, 1995), the state argued that estimates obtained with these procedures, as well as the interim-ceiling estimates, should be admissible. Judge Reinstein, anticipating Johnson, ruled that the interim-ceiling method was generally accepted as being generous to defendants, but that until the second NRC committee issued its report, it would be premature to make a finding on the other methods that the committee was examining. Judge Reinstein also held that qualitative expert testimony on the infrequency of matching DNA profiles was admissible in addition to or in lieu of numerical statements. [BACK]

131. But see Lempert, supra note 125, for an independent justification (also ignored by the approach in the opinion). [BACK]

132. Cf. Panel on Statistical Assessments as Evidence in the Courts, The Evolving Role of Statistical Assessments in the Courts 15 (Stephen E. Fienberg ed., 1989) ("[I]n general, judges should not conduct analytical statistical studies on their own."). [BACK]

133. The Arizona Supreme Court's treatment of the scientific literature in certain cases other than Bible and Johnson paints a bleak picture. In State v. Valdez, 371 P.2d 894 (Ariz. 1962), the case that offhandedly adopted Frye, the court deemed polygraph evidence admissible upon a stipulation prior to the testing from both parties. It justified this liberalized rule by the observation that "polygraphic interrogation . . . has been considerably improved since Frye v. United States was decided in 1923." Id. at 900. The evidence of this improvement consisted of a "conservative estimate" derived from "experiments" that established that "5 per cent or less is the margin of error." Id. The accompanying footnote reads: "These statistics are taken from Dean Wicker's discussion of Inbau's experiments regarding accuracy of the polygraph. See 22 Tenn. L. Rev. at 713." Inspection of the Tennessee Law Review article reveals that the sole support for this "conservative estimate" comes from a 1953 article by an attorney describing the remarks in a 1948 book by another attorney who served also as director of a crime laboratory. There is no indication in the article of a single experiment. The 5% figure comes from the director's impression of "several thousand examinations" covering "a period of sixteen years." Wicker, The Polygraphic Truth Test and the Law of Evidence, 22 Tenn. L. Rev. 711, 713 (1953).

Of course, Valdez was decided before the Soviet Union's Sputnik shocked a complacent United States into efforts to improve science education, and the casual approach to the experimental method reflected in that case might be dated. In State v. Superior Court, 718 P.2d 171 (Ariz. 1986), the Arizona Supreme Court showed that it had the capacity to locate and cite a more substantial body of scientific literature. Unfortunately, its notion of what constituted scientific literature remained primitive. The opinion, written by Chief Justice Feldman, included two appendices intended to demonstrate general acceptance of the proposition that Horizontal Gaze Nystagmus is diagnostic of alcohol intoxication. The first appendix listed the literature cited by the state. The list consisted of seven articles or reports. The majority, four, never appeared in any scientific journal, but were published by the Department of Transportation, which presumably funded them. Another was a second-hand discussion in a looseleaf treatise for attorneys, 1 Richard Erwin, Defense of Drunk Driving Cases § 815A[3] (3d ed. 1985), asserting that "[a] strong correlation exists between the BAC [blood alcohol concentration] and the angle of onset of [gaze] nystagmus." Only two of the seven were refereed papers in respected journals, and neither claimed that measuring the angle of the onset of nystagmus was a reliable indicator of blood alcohol concentration. C. Rashbass, The Relationship Between Saccadic and Smooth Tracking Eye Movements, 159 J. Physiol. 326 (1961) (barbiturate drugs interfere with smooth tracking eye movement); J.M. Wilkinson et al., Alcohol and Human Eye Movement, 97 Brain 785 (1974) (oral dose of ethyl alcohol impaired smooth pursuit eye movement of all human subjects).

The court also undertook its own, unaided study of the scientific literature on horizontal gaze nystagmus and intoxication, which it summarized by listing in an appendix to its opinion the 22 papers it located. The citations there suggest that the court's study went no further than a review of abstracts from computerized databases. Indeed, one paper apparently has never been published, and only the abstract of a conference presentation is mentioned. Some of the papers have little to do with the validity or reliability of detecting intoxication by nystagmus. E.g., W.J. Oosterveld et al., Quantitative Effect of Linear Acceleration on Positional Alcohol Nystagmus, 45 Aerospace Med., July 1974, at 695 (G-loading brings about Positional Alcohol Nystagmus even when subject has not ingested alcohol; however when subjects ingested alcohol, no PAN was found when subjects were in supine position, even with G-force at 3). Others appear in unrefereed periodicals that can hardly be considered part of the scientific literature. E.g., Norris, The Correlation of Angle of Onset of Nystagmus With Blood Alcohol Level: Report of a Field Trial, Calif. Ass'n Criminalistics Newsletter, June 1985, at 21, 22; Seelmeyer, Nystagmus, A Valid DUI Test, Law and Order, July 1985, at 29 (horizontal gaze nystagmus test is used in "at least one law enforcement agency in each of the 50 states" and is "a legitimate method of establishing probable cause."). [BACK]

134. The Federal Judicial Center, with the support of the Carnegie Corporation of New York, has made major and sustained efforts at judicial education in science. See Joe S. Cecil et al., Preface, in Reference Manual on Scientific Evidence vii (Federal Judicial Center ed., 1994). The National Judicial College in Reno, Nevada, provides continuing judicial education courses on scientific evidence. These are useful efforts, but having lectured for both organizations, I doubt that they are nearly sufficient to solve the problem of scientific error in opinion-writing. Other efforts at judicial education also exist. See, e.g., Eliot Marshall, The Genome Project's Conscience, 274 Science 488 (1996) (reporting that a portion of the Department of Energy's budget for studies of the ethical, legal, and social implications of the Human Genome Project goes toward providing seminars on genetics for judges). [BACK]

135. The idea is neither original nor unprecedented. Chief Judge Jack Weinstein of the Eastern District of New York circulated a draft opinion to non-judicial personnel in United States v. Shonubi, 895 F. Supp. 460 (E.D.N.Y. 1995). [BACK]

136. Arguably, courts should not even attempt their own analyses of statistical data. See Panel on Statistical Assessments as Evidence in the Courts, supra note 131, at 15 (because "the modes of self-education available to judges are unlikely to give them full appreciation of various factors that must be considered in performing and interpreting statistical analyses," "in general, judges should not conduct analytical statistical studies on their own."). [BACK]

137. In contrast, in United States v. United Shoe Machinery Corp., 100 F. Supp. 295 (D. Mass. 1953), aff'd, 347 U.S. 521 (1954), Judge Charles Wyzanski appointed an economist as his technical adviser by hiring him as a law clerk. See Carl Kaysen, An Economist as the Judge's Law Clerk in Sherman Act Cases, 12 A.B.A. Antitrust Section L. Proceedings 43 (1958). Judge Wyzanski later said that it would have been preferable to appoint a special master who communicates with the parties jointly, prepares a written report for the judge and the parties, and is available for cross-examination by both sides. Charles Wyzanski, The Law of Change, 1968 N.M.Q. 5, 19-20. The use of experts to review aspects of judicial opinions is less intrusive, but avoiding ex parte communications remains desirable. [BACK]

138. Cf. Samuel R. Gross, Expert Evidence, 1991 Wisc. L. Rev. 1114, 1191 (describing factors that may have constrained the use of court-appointed experts). [BACK]

139. See, e.g., Margaret G. Farrell, Special Masters, in Reference Manual on Scientific Evidence 575 (Federal Judicial Center ed., 1994). [BACK]

140. For examples of court appointments, see, e.g., DePyper v. Navarro, No. 83-303467-NM, 1995 WL 788828 (Mich. Cir. Ct. Nov. 21, 1995) (four court-appointed experts testified to standards and methods for determining causation in Bendectin case); In re Swine Flu Immunization Products Liability Litigation, 495 F. Supp. 1185 (___) (panel of five medical experts); Joe S. Cecil & Thomas S. Willging, Accepting Daubert's Invitation: Defining a Role for Court-Appointed Experts in Assessing Scientific Validity, 43 Emory L.J. 995 (1994); Carl B. Rubin & Laura Ringenbach, The Use of Court Experts in Asbestos Litigation, 137 F.R.D. 35, 36 (1991). [BACK]

141. Cf. Marcia Angell, Science on Trial: The Clash of Medical Evidence and the Law in the Breast Implant Case 205 (1996) ("[r]eputable experts could be recommended to the courts [for appointment as independent witnesses] by established scientific organizations such as the National Academy of Sciences or the American Association for the Advancement of Sciences"); Eliot Marshall, New York Courts Seek 'Neutral' Experts, 272 Science 189 (1996) (reporting that the American Association for the Advancement of Science and Duke Law School's Center for Private Adjudication are "trying to develop information centers that would help the courts find reliable experts"). [BACK]

142. Cf., e.g., BIEC Int'l, Inc. v. Global Steel Servs. Ltd., 791 F. Supp. 489, 542 (E.D. Pa. 1992) (each party proposed 10 names of qualified persons to serve as a special master); Superior Beverage Co. v. Owens-Illinois, Inc., No. 83-C-512, 1987 WL 9901 (N.D. Ill. Jan. 30, 1987) ("the Court invited the parties jointly to submit a list of five experts acceptable to the parties from which the Court would then select the person to be appointed. The parties were, unfortunately, unable to reach agreement," and the court chose its expert after canvassing "many individuals within the judicial and academic communities"); cases cited, Joe S. Cecil & Thomas E. Willging, Court-Appointed Experts, in Reference Manual on Scientific Evidence 525, 546 n.54 (Federal Judicial Center ed., 1994). [BACK]

143. Cf. Cecil & Willging, supra, note 141, at 547-48 (describing methods of instructing court-appointed experts). [BACK]

144. For an instance of such a modification required by an inopportune choice of statistical terminology, see Brock v. Merrell Dow Pharmaceuticals, Inc., 884 F.2d 166 (5th Cir.), modifying 874 F.2d 307 (5th Cir. 1989), cert. denied, 494 U.S. 1046 (1990). [BACK]


updated 9/2/97