The Dynamics of Daubert:
Methodology, Conclusions, and Fit
in Statistical and Econometric Studies

D.H. Kaye*


This paper is published in the Virginia Law Review, Vol. 87, No. 8, Dec. 2001, pp. 1933-2018. © 2001 Virginia Law Review Association. The Sixth Circuit's opinion in the case discussed in Part II is available as a slip opinion filed May 15, 2002. This opinion, which appeared after this article was written, is discussed in D.H. Kaye, Adversarial Econometrics in United States Tobacco Co. v. Conwood Co., 43 Jurimetrics J. 343 (2003).

[----------1933----------]
INTRODUCTION 1933
I. STANDARDS FOR ADMITTING SCIENTIFIC EVIDENCE 1937
A. The Classical Period: Relevant Expertise 1938
B. The Modern Period: Heightened Scrutiny for Scientific Evidence 1943
1. General Acceptance: Frye 1945
2. Relevancy-Plus: The Road to Daubert 1956
3. Scientific Soundness: Daubert 1958
C. The Puzzles of Strict Scrutiny 1964
1. The Boundary Problem 1964
2. The Usurpation Problem and the Methodology-Conclusion Puzzle 1972
D. Looking Back at Statistical Evidence 1985
II. BRINGING DAUBERT AND KUMHO TO BEAR: CONWOOD CO. v. UNITED STATES TOBACCO CO. 1988
A. Conwood's Complaint: Monopolizing Moist Snuff 1988
B. Conwood's Resistance Theory 1989
C. Conwood's Data 1990
D. Regression Analysis to Show Causation 1992
1. The Regression Results 1992
2. The Causal Inference 1992
E. Regression Analysis to Estimate Damages 2002
1. Estimating Effect with a "Regression Rectangle" 2002
2. Applying Daubert to the "Regression Rectangle" 2006
3. "Internal" Criticisms of the Regression 2011
CONCLUSION 2013
APPENDIX 2015

[----------1934----------]

INTRODUCTION

    In Daubert v. Merrell Dow Pharmaceuticals, Inc., (1) the Supreme Court stated the obvious--trial judges have a "gatekeeping role" (2) when it comes to scientific evidence. The Court's conclusion--that the Federal Rules of Evidence dispense with the "general acceptance" standard that previously dominated the field--is less obvious. (3) Still, the "reliability" standard announced in Daubert was nothing new. Rather, this standard reiterates the law as it then stood in many jurisdictions. (4) The striking feature of both the reliability and the general acceptance standards is that the court must subject "scientific" evidence to heightened scrutiny. (5) This approach creates two broad problems: the "boundary problem" of identifying the type of evidence that warrants such careful screening (6) and the "usurpation problem" of keeping the trial judge from closing the gate on evidence that should be left for the jury to assess. (7)

    Being less revolutionary than one might think from the volumes that have been written about it, Daubert does little to resolve these perdurable puzzles and problems. The Supreme Court's more recent opinion in Kumho Tire Co. v. Carmichael (8) sidesteps the boundary problem by making the reliability standard applicable to all expert [----------1935----------] testimony (9) and demanding more "rigor" for all expert testimony. (10) The emphasis on intellectual rigor, however, has the potential to exacerbate the usurpation problem. (11) This threat is intensified by the Court's opinion in General Electric Co. v. Joiner, (12) which encourages the trial court to exclude testimony because it disagrees with the expert's conclusions as well as with the underlying scientific method. (13)

    This paper will propose at least partial solutions to the boundary and usurpation problems and will apply them to statistical and econometric proof. In addition, it will review the developments that have culminated in the modern use of sophisticated statistical equations and models to prove factual claims such as the presence of illegal discrimination, (14) racial polarization in voting, (15) the identity of criminals, (16) the existence of forgeries, (17) the causes of [-----------1936-----------] trends in sales or prices, (18) and the quantum of damages caused by illegal conduct. (19)

    Part I will show that before Daubert, the admissibility of complex statistical evidence usually was taken for granted, and arguments centered on the weight to be accorded to this evidence in particular cases. Today, pretrial motions challenging the admissibility of statistical studies have become commonplace. Federal courts now must fit this type of expertise into the framework for determining admissibility constructed in the Daubert-Joiner-Kumho trilogy and codified in the Federal Rules of Evidence. (20) Part I also will analyze the admissibility issue under other standards for screening scientific evidence. Some states that use a scientific validity standard à la Daubert might not follow Kumho Tire. These jurisdictions will have to determine whether statistics and economics are subject to any form of heightened scrutiny. Some states might resist Joiner's blurring of the distinction between methodology and conclusion. These jurisdictions will have to decide what aspects of statistical testimony constitute the methodology that must be scientifically valid. Finally, states that adhere to the standard of general scientific acceptance face [----------1937----------] comparable challenges in defining the subject matter of statistics and economics and the scope of this test for the admissibility of scientific evidence.

    After analyzing the leading cases on scientific evidence and discussing their effects on efforts to introduce statistical proof, this paper will consider these emerging issues in the context of an antitrust case in which an econometric analysis was introduced to show both causation and damages. By describing the arguments on a pending appeal, Part II illustrates the difficulty of distinguishing between statistical methodology and conclusion, but concludes that the distinction is viable and valuable. The discussion also reveals the extent to which the dictum in Kumho Tire concerning the need for rigor encourages arguments as to admissibility that, in an earlier era, would have been treated as affecting only the weight of expert evidence. Finally, the case shows how difficult it can be to explain to judges and juries serious methodological defects in statistical assessments.

    The paper will conclude that Daubert-like screening of complex statistical analyses is a salutary development, but that the task requires the elaboration of standards that attend to the distinction between a general methodology and a specific conclusion. Screening statistical proof demands some sophistication in evaluating the choice of a research design or statistical model, the variables included in a particular model, the procedures taken to verify the usefulness of the model for the data at hand, and the inferences or estimates that follow from the statistical analysis. The factors enumerated in Daubert work reasonably well with some of these aspects of the expert's work, but these factors are less well adapted to others. If the "intellectual rigor" standard of Kumho is used to fill the gap, it must be applied with some caution lest it become a subterfuge for excluding expert testimony that is less than ideal but still within the range of reasonable scientific debate.

I. STANDARDS FOR ADMITTING SCIENTIFIC EVIDENCE

    Statistics are part of science, and science is one type of expertise. To appreciate how the law of evidence pertains to statistical proof, we must consider the rules of evidence as they apply to experts in general and to scientific testimony in particular. With that necessary prolegomenon, we will be in a position to determine how [----------1938----------] these approaches to regulating scientific and expert testimony have been and should be applied to statistical and econometric proof.

A. The Classical Period: Relevant Expertise

    For centuries, the law did not distinguish one type of expert testimony from another. (21) On the surface, a uniform standard governed the admission of the testimony of all qualified experts. (22) The evidence had to be relevant and not too prejudicial or time-consuming, and it had to deal with matters not comprehensible to ordinary jurors without the assistance of an expert. A few jurisdictions continue in this tradition, (23) although the beyond-the-ken-of-the-jury [----------1939----------] standard (24) usually has been softened to require only that the expert's knowledge be helpful to the jury. (25)

    Although the relevance-expertise requirement applies to scientific and nonscientific expertise alike, it need not have the same impact on all types of expert testimony. Scientific evidence tends to be time-consuming and difficult to understand. (26) Courts fear that it comes cloaked in an aura of infallibility and that this leads jurors to give it more credence than it deserves. (27) Consequently, ad hoc [----------1940----------] balancing of probative value and its counterweights can operate to exclude scientific evidence, especially if the science is not well-established. (28)

    Perhaps the earliest reported instance of a statistical assessment admitted under this classical approach is Robinson v. Mandell. (29) On July 25, 1865, Sylvia Ann Howland died. An 1863 will left half the estate, worth more than two million dollars, to a number of individuals and institutions and provided that the other half was to be held in trust for Sylvia's niece, Hetty Robinson. Although Hetty had recently inherited more than one million dollars from her father, she sought her aunt's entire estate under an 1862 will that named her as the sole heir and that provided that no later will should be honored. The executor, Thomas Mandell, claimed that two of the three signatures on the earlier will were traced from an 1864 codicil to the 1863 will, and that even if the earlier will were genuine, the later one applied. (30)

    Both Oliver Wendell Holmes, Sr., Parkman Professor in the Harvard Medical School, and Louis Agassiz, another Harvard professor and one of the world's leading naturalists, examined the contested signatures under a microscope and testified for Robinson that they saw no evidence of tracing. (31) Mandell countered with testimony from Benjamin Peirce, Professor of Mathematics at Harvard, and his son, Charles Sanders Peirce. (32) The Peirces purported to demonstrate that the signatures were forgeries by contrasting the similarities between one of the disputed signatures and its counterpart in the 1864 codicil with the less extensive [----------1941-----------] similarities between the disputed signature and 42 others on documents written by Sylvia Ann Howland in her later years. C.S. Peirce examined every possible pair of signatures to see how many of the downstrokes in the words "Sylvia Ann Howland" coincided in position and length. (33) He found agreement in approximately one in every five downstrokes. Professor Peirce then testified to an "extraordinary" coincidence in the positions of the thirty downstrokes in the disputed signature and the 1864 signature. He described "complete coincidence of position" as "infallible evidence of design." (34) Being a professor of mathematics, Peirce was not content to rest on intuition alone. He insisted that "[t]he mathematical discussion of this subject has never, to my knowledge, been proposed, but it is not difficult; and a numerical expression applicable to this problem, the correctness of which would be instantly recognized by all the mathematicians of the world, can be readily obtained." (35) He reasoned that the probability of 30 matches in a given pair of authentic signatures was (1/5)^30, or "once in 2,666 millions of millions of millions." (36) "This number," he added, "far transcends human experience." (37) Because the case was decided in a century in which scientific and statistical studies received no more scrutiny than any other expert testimony, the admissibility of these calculations was not challenged, (38) and even the cross-examination of Peirce was largely ineffectual. (39) [----------1942----------]
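The product-rule arithmetic behind Peirce's argument can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of his work: it assumes, as the testimony did, that each of the 30 downstrokes in a pair of genuine signatures coincides independently with probability 1/5, and it assumes that the pairs were formed among the 42 genuine signatures. The constant names are hypothetical. Evaluated directly, (1/5)^30 is roughly 1 in 9.3 × 10^20; the sketch does not attempt to reproduce the exact figure quoted from the testimony.

```python
from fractions import Fraction
from math import comb

# Assumptions taken from the text: about 1 downstroke in 5 coincides,
# and a signature has 30 downstrokes. Independence is assumed, not shown.
P_MATCH = Fraction(1, 5)
N_STROKES = 30
N_GENUINE = 42  # genuine signatures used for comparison

# Every unordered pair of genuine signatures (assuming pairs were
# formed only among these 42), and the downstroke comparisons they yield.
pairs = comb(N_GENUINE, 2)
stroke_comparisons = pairs * N_STROKES

# Product rule: under independence, P(all 30 coincide) = (1/5)^30.
p_all = P_MATCH ** N_STROKES

print(f"pairs of genuine signatures: {pairs}")
print(f"downstroke comparisons: {stroke_comparisons:,}")
print(f"(1/5)^30 = 1 in {p_all.denominator:,}")
print(f"approximately 1 in {float(1 / p_all):.3e}")
```

The exact `Fraction` avoids any rounding in the 30-fold product; the weak link in the argument is not the arithmetic but the independence assumption that licenses multiplying.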

    In 1915, however, the New York Court of Appeals held in People v. Risley (40) that even under the relevance-expertise regime, another mathematician's testimony about an alleged forgery was inadmissible in a criminal case. An attorney was charged with fraud in the course of representing a corporate client in a civil matter. Apparently, he had removed a document that had been placed in evidence and typed in the words "the same" to make the meaning more favorable to his client. (41) An expert on typewriters testified that as typed on the document, the six distinct letters in the words "the same" exhibited eleven specific peculiarities. (42) For example, the "t" was not strictly vertical, but slanted, other letters were missing serifs, and so on. This expert reported that a typewriter removed from Risley's office produced characters with the same peculiarities. A second expert, described by the New York Court of Appeals as "a professor of mathematics in one of the universities of the state," testified that "by the application of the law of mathematical probabilities, the chance of such defects being produced by another typewriting machine was so small as to be practically a negative quantity." (43) [----------1943----------]

    Over a dissent, the New York Court of Appeals reversed this conviction. The majority questioned the assumption that merely because a letter could slant or not slant, the probability that it would slant is one-half. Observing that the mathematician had no particular knowledge about the frequency of defects in typewriters, the court dismissed his statement of the probability because it "was not based upon actual observed data, but was simply speculative . . . ." (44) In Robinson, Peirce had arrived at one-fifth for the probability that two downstrokes would match by studying genuine signatures. (45) In Risley, the mathematician had no such empirical foundation for using a value of one-half. Accordingly, the statistical evidence in Risley was inadmissible under general principles of relevancy. (46) As we shall see, even when the doctrinal basis for evaluating scientific testimony became more rigorous, the courts continued to apply the classical relevance-expertise standards to statistical evidence.

B. The Modern Period: Heightened Scrutiny for Scientific Evidence

    When a major category of evidence is thought to be unusually prejudicial, ad hoc balancing often crystallizes into more [----------1944----------] specialized rules. (47) For example, evidence of bad character generally is not admissible merely to show a general tendency to act wrongly. (48) Evidence of insurance is not admissible to suggest that the insured might behave carelessly. (49) In principle, the pattern of decisions under ad hoc balancing of probative value and prejudicial effect need not differ from the pattern under a specialized rule, but in practice, the presence of a specialized rule reinforces the recognition that the evidence poses special problems. To this extent, it ensures that the evidence receives heightened scrutiny, and it highlights the factors that go into this scrutiny. Furthermore, if the rule is not too amorphous, it channels discretion, producing a more uniform and predictable pattern of decisions. If all judges and counsel were perfect and effortlessly could discern the proper outcome of ad hoc balancing, then case-by-case balancing would be ideal. The reality is that unstructured, ad hoc balancing is difficult to do well, and it may be that a cruder but more easily applied rule will produce more consistent outcomes with less effort and little loss in accuracy across all cases. (50) This is a major argument for categorical rules as opposed to vague standards in many areas of law. (51)

    Given the pressures for specialized rules of relevance and the perception that scientific evidence poses special problems, it is hardly surprising that courts would come to supplement the relevance-expertise standard with more specific rules that attend to the special features of scientific evidence. (52) Two forms of additional [----------1945----------] scrutiny--general acceptance and scientific soundness--are dominant.

1. General Acceptance: Frye

    The general acceptance standard made its debut in the now celebrated case of Frye v. United States. (53) Alphonse Frye, a young black man in the District of Columbia, was charged with murder. He sought to introduce the testimony of a psychologist, William Moulton Marston, who had administered a systolic blood pressure test to Frye. According to Dr. Marston, the test revealed that Frye was truthful when he denied committing the murder. (54) Dr. Marston had developed this forerunner of the polygraph test for truthfulness, but it is not clear what he had done to establish its validity. (55)

    The testimony could have been excluded under the traditional relevance-expertise standard. Dr. Marston, who was a professor of psychology at Harvard College, (56) was qualified to give certain kinds of expert testimony, but if his opinion about Frye's veracity was based on a procedure that was not well studied, it could have been rejected as too speculative to be of much [-----------1946-----------] assistance to the jury. Indeed, the trial judge, in excluding the testimony, may have been following just this approach.

    In affirming the trial court's ruling, the United States Court of Appeals for the District of Columbia observed that "[j]ust when a scientific principle or discovery crosses the line between the experimental and the demonstrable stages is difficult to determine." (57) This observation is entirely consistent with the traditional approach. A conclusion drawn from a technique that still is "experimental" rather than "demonstrable" may be relevant, but it also may be too insecure to be sufficiently helpful to the jury.

    The innovation of Frye lies in how the Court of Appeals ascertained whether the technique was too speculative. The court was not content to rely solely on the assertion of the well-qualified expert who had experimented with systolic blood pressure as an indicator of truthfulness; neither was it prepared to inquire directly into whether his work was sufficient to establish the validity of the technique. Rather, it affirmed the exclusion of the evidence on the neoteric ground that other psychologists had yet to accept Marston's claim that he could verify honesty by measuring the speaker's blood pressure. Although no previous cases explicitly had held this general acceptance to be indispensable, the court boldly wrote:

    Somewhere in this twilight zone [between the experimental and the demonstrable] the evidential force of the principle must be recognized, and while the courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs. (58)

    The requirement of general acceptance, like any special trustworthiness test, tends to screen out evidence. The Frye court offered no reason for imposing this special requirement, but subsequent courts and commentators have filled the gap. As noted above, the rule can be understood as a crystallization of the ad hoc balancing that trial courts are expected to undertake. Ideally, it screens out evidence that is superficially impressive but not [----------1947----------] sufficiently probative because it is not scientifically valid. It does not ask--or even permit--the court to ascertain scientific validity for itself. Instead, the court defers to the scientific community, for the rule treats "general acceptance" as a surrogate for validity. By looking to the views of the scientific community, the rule avoids having the judge act like an independent scientist.

    Of course, demanding general acceptance as opposed to some lesser degree of support among scientists tends to increase the incidence of "false negatives" (rulings that exclude valid scientific evidence) over "false positives" (rulings that admit invalid scientific evidence). This conservative strategy (59) has been defended as an appropriate response to the risk that jurors are too credulous of scientific evidence. (60) Furthermore, waiting until a technique has been generally accepted ensures that it has been widely studied and thus assures that a pool of experts is available to both sides to verify that the technique has been applied properly.

    In practice, the objectives of a clear rule--uniformity and predictability--have not been achieved. Courts in different Frye jurisdictions have reached contradictory results with respect to the same types of scientific evidence, (61) and it is not obvious that the uniformity achieved under Frye is any greater than that which would be obtained with most other plausible rules or standards. Ambiguities as to the propositions that must be generally accepted, the fields in which they must be accepted, the extent to which they must be accepted, and the indicia and proof needed to show their acceptance have made Frye disappointingly ductile and frustratingly unpredictable. (62) [----------1948----------]

    Thus, the use of Frye in evaluating statistical assessments has been capricious. Traditionally, Frye simply was not perceived as a barrier to statistical testimony. (63) Starting in the 1970s, parties in employment discrimination cases brought under Title VII of the Civil Rights Act of 1964 began to make extensive use of statistical expertise. (64) Early cases involved simple comparisons of proportions, (65) but as "the floodgates . . . opened," (66) more complicated studies were introduced. (67) Courts discussed standard deviations, (68) correlation coefficients, (69) significance levels, (70) hypothesis tests, (71) Mantel-Haenszel tests, (72) scattergrams, (73) nonlinear regressions, (74) and [----------1949----------] reverse regressions. (75) These cases concerned issues such as whether a study that fails to show a disparity that is significant at the .05 level could create a prima facie case of disparate impact, (76) or whether a study that does show a significant difference in salaries but omits certain variables "must be considered unacceptable as evidence of discrimination." (77) The opinions and arguments in these cases, however, almost never questioned the admissibility of the evidence. They never suggested that the general acceptance standard or a heightened reliability standard might make the expert's testimony inadmissible. (78)
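To make the question of significance "at the .05 level" concrete, the following is a minimal sketch of one common test for a difference in selection rates: a two-sample comparison of proportions using the normal approximation and a pooled standard error. The applicant counts are hypothetical, the function name is illustrative, and this is only one of several tests the cases discuss. The z value it returns is the "number of standard deviations" by which the observed disparity departs from what equal selection rates would lead one to expect.

```python
import math


def two_proportion_z(sel_a, n_a, sel_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one selection rate.

    Uses the normal approximation with a pooled estimate of the common rate.
    """
    p_a = sel_a / n_a
    p_b = sel_b / n_b
    pooled = (sel_a + sel_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: 50 of 100 applicants in group A selected, 35 of 100 in group B.
z, p = two_proportion_z(50, 100, 35, 100)
print(f"z = {z:.2f} standard deviations, two-sided p = {p:.3f}")
print("significant at .05" if p < 0.05 else "not significant at .05")
```

With these made-up counts the disparity is a little over two standard deviations. Shrinking both samples while holding the rates fixed would push p above .05, which is why the cases ask whether a disparity lacking statistical significance can still support a prima facie case.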

    Likewise, epidemiological studies in civil cases were admitted with little scrutiny for many years. (79) In parentage proceedings, courts initially questioned the general acceptance of serological methods (80) and would not admit blood group typing to establish paternity. (81) As the number and power of genetic tests that [----------1950----------] could be applied to determine parentage grew, however, the traditional rule began to crumble under the weight of cases (82) and specialized statutes. (83) Laboratories usually accompanied their inclusionary findings with an impressive "probability of paternity"--a statistic that largely went unchallenged. Eventually, some courts restricted the practice, (84) but the doctrinal basis was not general acceptance. Rather, it was the normal weighing of probative value and prejudicial effect. (85)
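Laboratories commonly derived the "probability of paternity" by combining a likelihood ratio (the "paternity index") with a prior probability of paternity, often set by convention at one-half. The sketch below applies Bayes' rule in odds form to a hypothetical index of 99. It illustrates how heavily the reported figure depends on that prior; it does not reconstruct any particular laboratory's procedure.

```python
def probability_of_paternity(paternity_index, prior=0.5):
    """Posterior probability that the tested man is the father (Bayes' rule, odds form).

    paternity_index: likelihood ratio P(genetic findings | tested man is father)
                     / P(genetic findings | a random unrelated man is father).
    prior: probability of paternity assigned before the genetic evidence.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = paternity_index * prior_odds
    return posterior_odds / (1 + posterior_odds)


# Hypothetical paternity index of 99, evaluated under two different priors.
for prior in (0.5, 0.1):
    w = probability_of_paternity(99, prior)
    print(f"prior {prior:.1f} -> probability of paternity {w:.3f}")
```

The same genetic findings yield 0.990 with a prior of one-half and about 0.917 with a prior of one-tenth. Because the prior reflects the non-genetic evidence, a single "probability" presented to the trier of fact embeds a judgment that is not the laboratory's to make.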

    Similarly, "[n]ot so long ago, the courts refused to admit either survey or sampling evidence." (86) Public opinion was not established [----------1951----------] through systematic polls but through the testimony of representatives of the public itself--what the law called "public witnesses." (87) Thus, in Irvin v. State, (88) the Supreme Court of Florida refused to credit a public opinion survey of community sentiment. Two African-American men were convicted of raping a white woman, but the conviction was set aside after it became clear that the grand jury that returned the indictments had been selected in a discriminatory fashion. (89) A new grand jury promptly reindicted the men. The NAACP commissioned Elmo Roper, one of the pioneers of American public opinion research, to conduct what was probably the first large-scale survey of public prejudice in a venue. The trial court, however, excluded the research director's testimony and declined to change the venue. (90) The trial ended in a verdict of guilt and a sentence of death. The Florida Supreme Court upheld the exclusion of the survey as hearsay and insisted that although a survey might indicate consumer attitudes toward a product, the method was "useless" to "indicate an aroused public against a prospective defendant in a court of justice." (91) In upholding the refusal to change the venue, the court preferred to rely on "the friendliness of white people for the colored in the community" as [----------1952----------] indicated by the testimony of "numerous witnesses" and "the recent construction of an elaborate memorial to a colored soldier." (92)

    In categorically rejecting survey and sampling evidence in Irvin and other cases, courts rarely have mentioned Frye or any special standards for scientific evidence. Likewise, the later opinions admitting survey results did not maintain that Frye was satisfied because social scientists accepted scientific sampling methods to ascertain opinions. To be sure, modern courts are far more hospitable to survey evidence, (93) but the transformation has been traced to other developments. (94)

    In criminal cases, the courts have been skeptical of efforts to assign numerical probabilities to events, and often rightly so, but once again, the usual principles of relevance rather than the special test of general acceptance have been the vehicle for their expression. (95) Consider what may be the most famous modern case of statistical testimony introduced to establish a defendant's guilt. In People v. Collins, (96) the Supreme Court of California overturned a conviction because of a contrived (but unchallenged) attempt to show that certain traits of a couple apparently fleeing the scene of a robbery were so uncommon as to be practically conclusive of guilt. Malcolm Collins and his common-law wife Janet had been charged with robbing a woman in an alley in the San Pedro area of Los Angeles. Malcolm was a black man who at one time had worn a beard and mustache and owned a yellow Lincoln; Janet was a Caucasian woman with blond hair that she wore in a pony tail. There was no outright confession and no definitive identification of this couple, but a blond woman with her hair in a pony tail was [----------1953----------] seen running from the scene of the robbery and entering a yellow car driven by a bearded and mustached black man. (97)

    As in Risley, the prosecutor called a college mathematics instructor to the stand and had him assume various values for the frequencies of characteristics like beards, mustaches, interracial couples and yellow cars. The mathematician then multiplied these assumed values to conclude that the joint probability of all these characteristics in a randomly selected couple would be about 1/12,000,000. (98)

    The California Supreme Court reversed the resulting conviction. The opinion, which even sported a mathematical appendix, found at least three errors in the probability testimony: (1) the lack of any evidentiary foundation for the probabilities used by the mathematician; (2) the lack of a foundation for the independence of the events whose probabilities were multiplied together; and (3) the possibility that the jurors were distracted and confused by the mathematical proof. (99) In Collins and other "no-evidence" cases, (100) "the computations have little basis in fact and are presented in the guise of expert analysis . . . ." (101) Such calculations are excluded, not [----------1954-----------] because the probability model is not generally accepted among statisticians, but "under the principle that their prejudicial impact clearly outweighs their probative value." (102) Although California was (and remains to this day) a devotee of Frye, (103) the Collins opinion contains nary a word about Frye, general acceptance, or the way that statisticians usually would estimate the probability of an event like a randomly generated couple sharing all the pertinent traits attributed to the suspects. The opinion is a relevancy opinion, pure and simple. (104)
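Both the prosecutor's multiplication and the point of the court's appendix can be sketched briefly. The six frequencies below are the values the witness was asked to assume, as they are commonly reported from the Collins opinion; they had no evidentiary foundation, and multiplying them presupposes independence. The second calculation uses a simple binomial model, an assumption of this sketch, to ask: if each of n couples in the area independently has the traits with probability p, and at least one such couple exists, how likely is it that there is another? The population figure n = 12,000,000 is illustrative only, and the names are hypothetical.

```python
from fractions import Fraction
from math import exp, log1p

# Frequencies the witness was asked to assume (as commonly reported from
# the Collins opinion). None rested on data.
frequencies = {
    "partly yellow automobile": Fraction(1, 10),
    "man with mustache": Fraction(1, 4),
    "woman with ponytail": Fraction(1, 10),
    "woman with blond hair": Fraction(1, 3),
    "black man with beard": Fraction(1, 10),
    "interracial couple in car": Fraction(1, 1000),
}

# Naive product rule: valid only if the traits are independent, which is
# doubtful (beards and mustaches, for instance, plausibly occur together).
p_joint = Fraction(1)
for freq in frequencies.values():
    p_joint *= freq


def prob_another_couple(p, n):
    """P(at least two matching couples | at least one) when each of n
    couples matches independently with probability p (binomial model)."""
    none = exp(n * log1p(-p))                       # (1 - p)^n
    exactly_one = n * p * exp((n - 1) * log1p(-p))  # n p (1 - p)^(n - 1)
    at_least_one = 1 - none
    return (at_least_one - exactly_one) / at_least_one


print(f"joint probability = {p_joint} (1 in {p_joint.denominator:,})")
# Illustrative assumption: 12,000,000 couples in the relevant population.
print(f"P(another matching couple) = {prob_another_couple(float(p_joint), 12_000_000):.3f}")
```

Under these assumptions the product is exactly 1/12,000,000, yet the chance that some other couple also fits the description comes out at roughly 0.42. Even taken at face value, a tiny random-match probability does not show that the defendants were the only couple who could have been seen. `log1p` keeps the computation of (1 - p)^n accurate when p is this small.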

    In this regard, Collins could not be more different from other opinions of the same court with regard to computations of probabilities of other physical traits attributed to suspects on the basis of biological trace evidence rather than the reports of witnesses. In People v. Venegas, (105) a woman was raped in her hotel room. Police sent vaginal swabs and swatches of a bedspread containing two semen stains, along with blood samples from the victim and the defendant, to an FBI laboratory. The FBI reported that defendant's DNA profile matched the DNA profiles from the swabs and one of the stains, and the FBI added that "the probability of selecting an unrelated individual at random from the Hispanic population with a profile that also matched the samples was approximately 1 in 31,000." (106) After a hearing on the general acceptance of the procedure for arriving at this figure, the trial court admitted testimony that "the probability of another [randomly selected] person having the DNA profile found in defendant's blood sample was 1 in 65,000." (107) Both the state court [----------1955----------] of appeals and supreme court agreed that the method for arriving at the probability had to be generally accepted in the scientific community. Ultimately, the California Supreme Court held that the number came from a computational procedure that was not generally accepted because of an inconsistency between the statistical criterion used in declaring a match and the one used in estimating the frequencies of matching alleles. (108) Likewise, in People v. Soto, (109) the California Supreme Court looked to general acceptance in "the relevant scientific community of population geneticists" to conclude that "statistical calculations" for DNA types using "the unmodified product rule" met the Frye standard for admissibility. (110)

    The court never explained its shift from relevancy in Collins to general acceptance in Soto and Venegas. One possible explanation is that the probability computations in the DNA cases could not be dismissed as utterly devoid of an empirical foundation or a theory that might justify the independence assumption. Forensic scientists had compiled some data as to the frequencies of the various alleles that comprise the more complex genotypes, and geneticists had some experience and an ample theoretical framework to draw on in inferring genotype frequencies. Although some defendants vainly argued that Collins precluded any multiplication of probabilities, (111) the DNA computations simply could not be dismissed as manifestly erroneous and hence irrelevant. (112) Consequently, a further argument, such as the lack of general acceptance of the probability calculations, was necessary if defendants were to block the evidence. Nevertheless, DNA cases stand out as the only instance in [----------1956----------] which courts in Frye jurisdictions have responded to criminal "probability evidence" with a Frye analysis. (113)
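The "unmodified product rule" approved in Soto can be sketched roughly as follows. At each locus, the frequency of the genotype is estimated from allele frequencies under Hardy-Weinberg proportions (p^2 for a homozygote, 2pq for a heterozygote), and the per-locus figures are multiplied on the assumption that the loci are independent. The allele frequencies are hypothetical, and the sketch omits the binning of measured fragment sizes, the choice of reference database, and the "modified" adjustments at issue in the litigation.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency at one locus.

    p: frequency of the first allele.
    q: frequency of the second allele, or None if the genotype is homozygous.
    """
    return p * p if q is None else 2 * p * q


# Hypothetical allele frequencies for a three-locus profile (illustration only).
profile = [
    (0.10, 0.05),  # heterozygous locus: 2pq = 0.01
    (0.20, None),  # homozygous locus: p^2 = 0.04
    (0.08, 0.25),  # heterozygous locus: 2pq = 0.04
]

locus_freqs = [genotype_frequency(p, q) for p, q in profile]

# Unmodified product rule: multiply across loci, assuming independence.
profile_freq = prod(locus_freqs)
print(f"profile frequency = {profile_freq:.2e} (about 1 in {round(1 / profile_freq):,})")
```

Unlike the Collins figures, each factor here is tied to an allele-frequency estimate and a population-genetic model. That is why the debate in the DNA cases turned on whether the model and the databases were generally accepted rather than on whether multiplication was permissible at all.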

2. Relevancy-Plus: The Road to Daubert

    The general acceptance standard never was popular with evidence scholars, (114) and by the 1970s and 1980s, more and more courts abandoned it in favor of various substitutes. (115) For example, in United States v. Williams, (116) the government recorded telephone conversations initiated by an undercover police officer offering to buy heroin. At trial, it introduced a spectrographic analysis to prove that the voice on the recording was the defendant's. In upholding the admission of this testimony, the Court of Appeals for the Second Circuit refused to apply general acceptance as a "universal litmus test for the general admissibility of all 'scientific' evidence." (117) Instead, the court recited the usual features of relevancy (118) and concentrated on "reliability." (119) It concluded that the technique possessed the requisite reliability to warrant admission in light of the extent of its acceptance in the scientific community and "the potential rate of error." (120)

    Some years later, in United States v. Downing, (121) the Court of Appeals for the Third Circuit expounded at length on this notion [----------1957----------] that the admissibility of scientific evidence requires "a quantum of reliability beyond that required to meet a standard of bare logical relevance" and explained that this condition can be fulfilled even when "the principles underlying the evidence have not become 'generally accepted' in the field to which they belong." (122) The defendant, who was convicted of fraud on the basis of eyewitness identifications, was precluded from presenting a psychologist to testify to experiments on the sources of eyewitness error. The court of appeals remanded to permit the district court to reconsider its ruling in light of the criteria for ascertaining admissibility articulated in this repudiation of Frye. Under Downing, "reliability" is "a critical element of admissibility," (123) and the "reliability inquiry" (124) can probe the "degree of acceptance within [the scientific] community," (125) the "existence of a specialized literature dealing with the technique," (126) and "the rate of error." (127) In addition, Downing called on the district court to inquire into "another aspect of relevancy"--"fit," that is, "whether expert testimony proffered in the case is sufficiently tied to the facts of the case that it will aid the jury in resolving a factual dispute." (128)

    As Williams and Downing indicate, (129) the major emergent alternative to Frye looked to the relevance of the proposed scientific testimony but demanded something more--relevance plus a certain extra trustworthiness, accuracy, or fit beyond that needed to admit nonscientific testimony. (130) Statistical evidence, however, was [----------1958----------] rarely held to this standard. The "relevancy-plus" jurisdictions, like the Frye jurisdictions, either admitted statistical studies with little comment or excluded them as too flawed to satisfy the more general balancing standard of Federal Rule of Evidence 403. (131) With the Supreme Court's opinion in Daubert, however, this situation would change. The courts would not necessarily demand more of statistics, but the doctrinal machinery for processing scientific evidence no longer would remain idle or overlooked when statistical studies were offered.

3. Scientific Soundness: Daubert

    After many years of refusing to examine the issue of the admissibility of scientific evidence, (132) the Supreme Court granted certiorari in Daubert to consider whether the general acceptance standard survived the enactment of the Federal Rules of Evidence. In Daubert, two young children born with deformed limbs and their parents sought damages against the manufacturer of Bendectin, a prescription drug taken by the boys' mothers to treat nausea and vomiting during pregnancy. The plaintiffs' case foundered when they were unable to point to any published epidemiological studies concluding that Bendectin causes limb reduction defects. The district court granted the defendant's motion for summary judgment on the ground that the plaintiffs had failed to establish a genuine issue of material fact regarding causation. As summarized by the Court of Appeals for the Ninth Circuit:

Plaintiffs' evidence of causation consisted primarily of expert opinion based on in vitro and in vivo animal tests, chemical structure analyses and the reanalysis of epidemiological studies. Among the contrary evidence proffered by Merrell Dow was [----------1959----------] the affidavit of a physician and epidemiologist who reviewed all of the available literature on the subject, which included more than 30 published studies involving over 130,000 patients, and concluded that no published epidemiological study had demonstrated a statistically significant association between Bendectin and birth defects. Plaintiffs do not challenge this summary of the published record. (133)

    The trial court in Daubert excluded all four categories of the plaintiffs' evidence--so-called structure-activity studies, (134) in vitro or animal cell experiments, (135) in vivo or live animal research, (136) and reanalysis of the epidemiological data. (137) These rulings on admissibility were based on two lines of reasoning. First, the district and circuit courts held that absent scientific understanding of the cause of the birth defects in question, causation may only be shown through epidemiological evidence. (138) Second, both courts refused to allow the recalculated epidemiological data offered by plaintiffs' experts because, unlike the studies "rejected by [the plaintiffs' experts, which] had been published in peer-reviewed scientific [----------1960----------] journals," the plaintiffs' experts had "neither published [their] recalculations nor offered them for review." (139)

    The Supreme Court unanimously held that the lower courts had applied the wrong standard for the admissibility of scientific evidence. In an opinion by Justice Harry A. Blackmun, the Court proclaimed that the "austere [general acceptance] standard, absent from and incompatible with the Federal Rules of Evidence, should not be applied in federal trials." (140) In reaching this conclusion, the Court made no effort to analyze the substance or merits of the general acceptance standard, but relied instead on the fact that neither the wording nor the drafting history of the rules of evidence evinced "any clear indication that Rule 702 or the Rules as a whole were intended to incorporate a 'general acceptance' standard." (141)

    Having jettisoned general acceptance as "the exclusive test for admitting expert scientific testimony," (142) the Court adopted the richer and more flexible (143) "relevancy-plus" standard already employed in many jurisdictions. (144) It announced that as the gatekeeper [----------1961----------] of evidence, "the trial judge must ensure that any and all scientific testimony or evidence admitted is not only relevant, but reliable." (145) This "evidentiary reliability," as the Court put it, presumes "scientific knowledge" (146)--the proffered testimony must be "ground[ed] in the methods and procedures of science." (147) In a further elaboration the Court suggested that this "reliability" determination "entails a preliminary assessment of whether the reasoning or methodology underlying the testimony is scientifically valid and . . . properly can be applied to the facts in issue." (148) This, in turn, depends on such things as "whether it can be (and has been) tested," "whether the theory or technique has been subjected to peer review and publication," "the known or potential rate of error," and the "degree of acceptance within [a relevant scientific] community." (149)

    Moreover, the Court suggested, a showing of scientific validity is not enough, for "Rule 702's 'helpfulness' standard requires a valid scientific connection to the pertinent inquiry as a precondition to admissibility." (150) Drawing directly on Downing, the Court observed that "whether expert testimony proffered in the case is sufficiently tied to the facts of the case . . . has been aptly described by Judge Becker as one of 'fit.'" (151) As a logical matter, however, the fit requirement is superfluous. "Purpose" is already built into the definition of "validity." For example, the LSAT has been shown to [----------1962----------] be valid for the purpose of predicting grades in the first year of law school. (152) It is not valid for predicting monetary success as a lawyer. (153) But even if "fit" is implicit in scientific validity, the discussion in Daubert is an important reminder that "scientific validity for one purpose is not necessarily scientific validity for other, unrelated purposes." (154)

    The impact of Daubert far exceeds its substance. The opinion adds little to the relevancy-plus standard developed in the decades preceding it. (155) Nevertheless, lower courts were stunned. One district court exclaimed that "[t]he rules governing the admissibility of expert testimony have recently undergone dramatic change." (156) On the remand in Daubert itself, Judge Alex Kozinski spoke of the "New World" (157) that the court faced. (158) Invoking the metaphor of "gatekeeping"--hardly a new concept in the law of [----------1963----------] evidence (159)--courts began to re-examine seemingly settled results as to the admissibility of many forms of scientific testimony. (160) Some scientific evidence was admitted more readily, (161) but much was reviewed with a newfound skepticism and a sense of disquiet. In particular, pretrial motions to exclude statistical testimony became commonplace. (162) Along with the shift in focus from weight to [----------1964----------] admissibility came a series of problems involving the structure, reach, and appellate review of the heightened scrutiny of scientific, expert testimony--and two more Supreme Court opinions on these issues.

C. The Puzzles of Strict Scrutiny

1. The Boundary Problem

    If scientific evidence must clear a hurdle that does not block the path of other expert testimony, the problem of demarcating boundaries arises. What evidence counts as "scientific" for the purpose of Frye, Daubert, or any other such standard? Advocates have implored courts to apply heightened scrutiny to a myriad of claims. Some items, such as agglutination or electrophoresis of blood, or the spectrographic analysis of voices, seem indisputably "scientific." Courts have not hesitated to apply the special standards to testimony about such technologies. (163) Other testimony, such as the opinion of a psychiatrist that a person's will is overborne by a compulsion to gamble, (164) seems less easy to classify. In these borderline cases, courts have reached apparently conflicting results; few opinions have provided clear or comprehensive explanations of how the line was drawn. (165) [----------1965----------]

    Statistical evidence, it seems, is such a borderline case. For instance, in State v. Louis Trauth Dairy, Inc., (166) a federal district court noted that econometrics and statistics are simply methods applied to produce knowledge in substantive disciplines. On that basis, the court concluded that "[n]either economics or statistics seems to completely qualify as 'scientific knowledge'" for purposes of Daubert. (167) In the textualist style of Daubert, this opinion seeks to resolve the boundary problem by asking what scientists (rather than statisticians) "know." Statistical reasoning, however, is crucial to most scientific inquiry--indeed, some would say that it is the essence of all inductive scientific reasoning. It is required of (although not always mastered by) students of the "hard" as well as the softer sciences. Although statistical modeling is as much art as science, (168) statistical techniques and tests have well-defined mathematical properties described in an active research literature. In a word, it is not a misnomer to speak of "statistical science." (169) From this perspective, it would seem that the focus in Trauth Dairy on whether statistical expertise is a substantive, empirical science like physics, astronomy, or psychology, misses the mark.

    Yet, this conclusion may be too facile. What, one might well ask, are the unstated criteria being used to separate "science" from other knowledge? At first glance, philosophical studies of the nature and structure of scientific theories might seem to hold the key [----------1966----------] to this puzzle. Indeed, the Daubert Court started down this road (170) when it cited Sir Karl Popper's criteria for distinguishing science from metaphysics. (171) Nevertheless, the basis for drawing a line between expert scientific evidence and other expert testimony is not to be found in abstract definitions of "science." The writings of David Hume, Immanuel Kant, A.J. Ayer, Sir Karl Popper, Thomas Kuhn, and many other philosophers or historians provide brilliant insights into the nature of scientific knowledge, but they do not speak directly to the legal issues. (172) However enriching the philosophical literature on the nature and aims of science might be, it is unlikely to be of great assistance in deciding when a special test for scientific evidence should be applied. The reason, as Justice Holmes once remarked, is that "[a] word is . . . the skin of a living thought." (173) Words are the visible surface of rules that are designed to achieve certain goals. Abstract definitions may or may not fit these goals. (174)

    Thus, a functional inquiry, rather than a review of the philosophical literature, the encyclopedia, or the dictionary, is required. The rules of evidence, whether derived from the common law or a code, are designed to perform certain functions, and the raison d'etre of a special hurdle for scientific evidence is that this particular evidence poses special problems. When these problems are not present, heightened scrutiny is not justified and may well be [----------1967----------] counterproductive, unnecessarily consuming resources and possibly resulting in unwarranted exclusion of probative evidence.

    The major arguments for and against heightened scrutiny of scientific evidence were rehearsed earlier. (175) The principal problem is not that it is difficult for lay factfinders to assess an expert's reasoning or conclusions without possessing the underlying expertise. That much is true of all expert testimony. If there is a rationale for a special rule for scientific experts, it must be something special about science that justifies stricter scrutiny. Three features of scientific expert testimony provide this rationale: (1) science is generally more difficult to understand than other areas of expertise; (176) (2) science is not only relatively impenetrable, but it is more impressive, posing a special danger that jurors will give too much weight to evidence that carries the trappings of scientific truth; (177) and (3) until a period of rigorous testing passes, few scientists will be available to testify to the limitations or risks of errors in a scientific analysis. As a result, the usual safeguards of the trial process--cross-examination and opposing testimony--may be unavailable or ineffective.

    With these reasons for an especially demanding screening of scientific evidence in mind, the boundary problem becomes tractable. The court should consider whether these three concerns are [----------1968----------] present in sufficient degree to warrant heightened scrutiny. Under this approach to the boundary problem, mathematical modeling of physical or biological processes such as the flow of water (178) or the survival of wildlife, (179) applications of mathematical equations that yield computer enhancement of images, (180) or statistical or econometric modeling of many types of data (181) might seem to qualify for heightened scrutiny. (182) Although these methods do not involve sophisticated laboratory instruments, they can be inscrutable and impressive to the uninitiated. It is not easy to shrug off a "best fit" or a "maximum likelihood estimate." Indeed, as we have seen, the California Supreme Court once was so moved by a trivial and inadequately countered bit of mathematics as to brand mathematics "a veritable sorcerer in our computerized society . . . ." (183)

    Nevertheless, it is not clear that Frye or Daubert (or some variant) should be applied to particular forms of mathematical and statistical modeling. Unlike a new chemical test or a novel physical theory or instrument, which might require significant time and experimental effort to probe, the adequacy, limits, or untested assumptions of most mathematical and statistical models can be assessed fairly readily by other experts. Consequently, effective opposing testimony is generally available (if the economics of the [----------1969----------] case warrant it). It is unlikely that jurors will be overwhelmed with one side's set of equations when the other side can produce another set of equations or results. Indeed, triers of fact sometimes seem as ready to embrace fallacious criticisms of models as to recognize valid objections to them. Thus, condition (3) does not hold, and the import of condition (2) is unclear in this context.

    In the end, however, it is condition (1) that should be decisive--statistical studies should not be exempt from careful scrutiny under standards like general acceptance or scientific soundness. As with Gresham's Law, bad statistical proof drives out (or at least devalues) the good. (184) The perception that statistics can prove anything and the typical aversion to mathematics make it all too easy for quite dubious statistical analyses to appear the equal of far sounder assessments. (185) If the end result of a liberal policy of admissibility is the proverbial battle of the experts with jurors no better able to decide the case when the fighting ceases, then the cost of the campaign is a dead-weight loss. (186) For these reasons, complex statistical testimony warrants some level of heightened scrutiny.

    In the discussion that follows, I consider how the scrutiny required under Daubert and Frye should be applied to such studies. In federal jurisdictions, however, the Supreme Court's decision in [----------1970----------] Kumho Tire Co. v. Carmichael (187) relieves the pressure to define a clear boundary between science and nonscience. There, the Court wrote that all expert testimony must meet the "reliability" standard announced in Daubert but that not all the factors used to ascertain scientific validity might apply, or they might apply differently to other areas of expertise. Kumho arose in response to a fatal automobile accident caused by a tire failure. The district court excluded an engineer's testimony that a manufacturing defect led to a separation between the tire tread and an internal structure known as a steel-belted carcass, causing a blowout. That court applied the standard for scientific evidence described in Daubert to find that the engineer's analysis of his "visual inspection" of the tire lacked a sound "scientific basis." (188) The Court of Appeals for the Eleventh Circuit reversed the resulting summary judgment on the theory that "'a Daubert analysis' applies only where an expert relies 'on the application of scientific principles,' rather than on skill- or experience-based observation." (189)

    In an opinion written by Justice Stephen G. Breyer, the Supreme Court reversed the court of appeals and held that the district court's exclusion of the engineer's analysis was not an abuse of discretion. (190) Every Justice agreed that Federal Rule 702 means that a witness testifying as an expert must present expert "knowledge" (191) rather than speculation and that "where such testimony's factual basis, data, principles, methods, or their application are called sufficiently into question, . . . the trial judge must determine whether the testimony has 'a reliable basis in the knowledge and experience [----------1971----------] of [the relevant] discipline.'" (192) Finally, the Court wrote that in making the determination that the expert was providing specialized knowledge that was sound enough to assist the trier of fact, the trial judge "may consider [the] more specific factors [enumerated] in Daubert." (193)

    In short, Kumho extends Daubert's call for "'evidentiary reliability'" and "'a valid . . . connection to the pertinent inquiry as a precondition to admissibility'" (194) to all expert testimony, but it discerns no universal solvent for ascertaining the validity of putative expert knowledge. (195) Some assurance of validity is required even from "experts in drug terms, handwriting analysis, criminal modus operandi, land valuation, agricultural practices, railroad procedures, attorney's fee valuation, and others," (196) but in such situations the details of Daubert may not apply, (197) and it is unclear what Kumho demands. (198) When it comes to engineering analysis that "rests upon scientific foundations," (199) however, Kumho strongly suggests that the central considerations articulated in Daubert--the extent to which a theory or technique has been tested and subjected to critical scientific inquiry--are vital. (200) [----------1972----------]

    The same principle should govern the use of statistical methods. The statistical theory or technique should be one that has been subjected to sufficient study to establish its validity as applied to a class of problems that includes the one being investigated in the litigation. (201) Whether such a method is being applied properly to the problem at hand is a separate question that the Supreme Court, regrettably, has conflated with the issue of the validity of the method itself. (202) I turn now to that topic.

2. The Usurpation Problem and the Methodology-Conclusion Puzzle

    Before Daubert, it was clear that the elevated scrutiny reserved for scientific evidence applied to the methodology that an expert employed rather than the conclusions that the expert reached by applying that methodology to specific facts. When heightened scrutiny is confined to methodology, the usurpation problem is manageable. Jurors are free to accept or reject particular conclusions as long as they are derived with an acceptable methodology and not otherwise subject to exclusion. (203) In Frye v. United [----------1973----------] States, (204) for example, the Court of Appeals spoke of "testimony deduced from a well-recognized scientific principle or discovery" (205) and the need to ensure that "the thing from which the deduction is made [has been] sufficiently established to have gained general acceptance in the particular field in which it belongs." (206) The court upheld the exclusion of the psychologist's testimony not because of doubts about how well he conducted the test on the defendant, but because "the systolic blood pressure deception test has not yet gained such standing and scientific recognition . . . ." (207) If the expert's reasoning were recast in syllogistic form, (208) it might proceed along the following lines:

Major Premise P1: All subjects whose systolic blood pressure remains constant as they answer questions about their alleged participation in crimes are answering truthfully.

Minor Premise P2: The systolic blood pressure of Alphonse Frye, who was accused of a crime, remained constant as he asserted his innocence in answering questions about the murder.

Conclusion C: Frye was telling the truth when he denied committing the murder.

    Only the major premise P1 is subject to general acceptance "among physiological and psychological authorities." (209) The minor premise P2, which is specific to the case, is more like the testimony of any other witness about his or her observations. It is not an expression of esoteric scientific reasoning, and it would make little sense to ask whether the scientific community generally accepts a case-specific proposition such as the particular blood pressure [----------1974----------] readings taken from a single individual. Ordinary procedures like cross-examination can test whether the witness is speaking truthfully when he testifies that the defendant's blood pressure did not rise. (210)

    In some respects, this dichotomy between the major and minor premise is oversimplified to bring out the methodology-conclusion distinction as sharply as possible. (211) The complications, however, do not affect the basic point. Indeed, they help enucleate the principle that underlies the distinction between conclusion and methodology. Among other things, a full analysis would recognize that, in addition to deducing C (that Frye was telling the truth), Marston deduced the minor premise P2 from another logical argument about the sphygmograph used to chart Frye's blood pressure. That argument might have as its major premise a claim P1' that the instrument Marston used was capable of recording systolic blood pressure accurately. The minor premise P2' of the supplemental argument would relate to the measurements that Marston made on Frye himself. The general acceptance test would apply to this additional major premise P1' about the ability of the instrument to measure blood pressure, but not to the case-specific minor premise P2' about the sphygmogram obtained in this particular case. The latter could be tested by having an opposing expert explain how the recording could have misrepresented the true blood pressure curve or by cross-examination to this effect. By definition, case-specific facts are not subject to "general acceptance" but must be determined on a case-by-case basis.

    The basic point, then, is that whenever an expert's chain of reasoning includes general propositions that cut across cases and that [----------1975----------] are purportedly scientific, these claims--and only these claims--should be subject to special scrutiny. The crucial distinction, in other words, is between the case-specific facts asserted in minor premises and the trans-case facts asserted in major premises. (212) The former are "adjudicative facts," while the latter are "legislative facts." (213) Screening for general acceptance prevents the jury from relying on a legislative fact--the validity of a scientific theory--when the fact is not generally accepted in the relevant community of experts.

    Daubert works no change in the principle, clearly established under Frye, that the heightened scrutiny pertains strictly to methodology. (214) Instead, Daubert simply substitutes for the pure [----------1976----------] general-acceptance test a richer set of criteria with which to scrutinize methodology. Under both Daubert and Frye, "[t]he focus, of course, must be solely on principles and methodology, not on the conclusions that they generate." (215)

    In Daubert itself, this focus became quite blurred. The excluded testimony was the experts' opinion that Bendectin was a human teratogen. Was the underlying "methodology" the unpublished reanalysis of data from a published epidemiological study, as the Ninth Circuit had thought? Was it the undisclosed statistical procedure used in this reanalysis to discern a statistically significant association between exposure to Bendectin and limb reduction defects? Was it inferring teratogenicity in humans in the absence of consistent and statistically significant epidemiological findings? Or, is it possible that the experts' opinion was itself a "methodology" that required a preliminary showing of soundness? The Supreme Court's discussion of scientific soundness was so abstract and unconnected to the evidence in the case that its opinion provides no answer. On remand, the Ninth Circuit also gave no answer, and it refused to let the district court venture into this thicket. Rather, it upheld the summary judgment on the ground that even if general causation could be proved, the admissible evidence could not support a conclusion that the plaintiffs' injuries were attributable to the drug. (216)

    The analysis offered above provides some answers to this inquiry. In Frye, the case-specific conclusion was that the defendant was telling the truth when he denied being the murderer (C). [----------1977----------] In Daubert, the analogous case-specific conclusion is that Bendectin caused plaintiffs' injuries (C''). These are adjudicative facts in the two cases. The methodology-conclusion distinction focuses attention at the stage of admissibility on the legislative facts--the scientifically established, trans-case premises used in reaching the case-specific conclusions. In Daubert, these premises include the proposition that Bendectin is a teratogen--that it can (and sometimes does) cause limb reduction defects (P1''). Thus, if a single expert had been offered to prove C'' (specific causation), the gatekeeping role would have required elevated scrutiny of the underlying scientific premise P1'' (general causation).

    The plaintiffs divided up the reasoning from the various premises to the case-specific conclusion C'' among several experts. One group was willing to attest to general causation, and a different group to specific causation. This division of expert labor can make no difference in applying the methodology-conclusion distinction. Daubert is simply a case in which one expert's testimony ends at the methodological level of the major premise, and another expert's testimony employs that premise to reach the case-specific conclusion. (217) It is comparable to having one expert testify that a sudden systolic pressure spike is indicative of deception, and another report that because he found no such spike, defendant was not deceptive. Under Frye, the first expert's "conclusion" about the physiological correlate of deception would have to be generally accepted. The second expert's case-specific observations would not have to run this gauntlet. Under Daubert, the only difference [----------1978----------] is that the first expert's "conclusion" would have to be adequately validated by reference to general acceptance and other factors.

    Recognizing that the labels "methodology" and "conclusion" can be confusing and lacking a well-articulated standard for using these terms, courts in recent years have shied away from them. General Electric Co. v. Joiner (218) is the most prominent example. Robert Joiner was an electrician who worked for nearly twenty years for a city water and light department in Georgia. His work brought him into contact with polychlorinated biphenyls (PCBs) in electrical transformers. In 1991, at the age of thirty-seven, he was diagnosed with lung cancer. (219) Joiner and his wife sued three manufacturers of PCBs on theories of strict liability, negligence, and fraud. (220) A former cigarette smoker, Joiner alleged that tobacco smoke acted as an initiator of his cancer and that the PCBs acted as a promoter, transforming the initiated cells into malignant growths. (221) Defendants moved for summary judgment. (222) They argued that "plaintiffs . . . cannot present credible, admissible scientific evidence that . . . small cell lung cancer in humans can be caused or promoted by PCBs," (223) and they maintained that PCBs do not cause cancer unless other chemicals--namely, furans or dioxins--are present. Plaintiffs' experts pointed to studies of PCBs to dispute this claim, (224) and they suggested that there were reasons to think that Joiner had been exposed to PCBs, furans, and dioxins. Defendants argued further, however, that the available evidence indicated that Joiner had no significant exposure to any of these three types of chemicals. (225)

    The district court granted the defendants' motion for summary judgment. It found that although there was a genuine dispute as to [----------1979----------] whether Joiner was exposed to PCBs, the potentially admissible evidence failed to show that he was exposed to furans or dioxins. (226) Furthermore, the court found that the epidemiological and animal studies on which plaintiffs' experts relied were too weak to justify the conclusion that PCBs can promote cancers. Finding this major premise scientifically unsound, the district court ruled the expert testimony that rested on it to be inadmissible.

    A divided panel of the Eleventh Circuit reversed. Two judges concluded that the district court "improperly assessed the admissibility of the proffered scientific expert testimony and overlooked evidence establishing disputed issues of fact." (227) In particular, the court held that there was a disputed issue of fact as to whether Joiner was exposed to furans and dioxins, and that the district court erred in finding the claims that PCBs promote cancers to be too speculative to be admissible.

    The Supreme Court granted certiorari to review the "particularly stringent standard of review" (228) that the court of appeals purported to apply to the district court's ruling that plaintiffs' experts' opinions were inadmissible under Daubert. The Supreme Court unanimously agreed that the district court's ruling on admissibility was reversible only for an abuse of discretion, (229) and all but one Justice (230) agreed that the district court's ruling excluding the experts' opinions about the effects of PCBs was within its discretion. (231) The portion of the majority opinion upholding the [----------1980----------] evidentiary ruling reviewed the research literature on whether PCBs promote cancers and concluded that the district court did not err in finding that the experts could not establish this major premise in a scientifically sound manner. (232)

    This disposition required the Court to confront the argument "that because the District Court's disagreement was with the conclusion that the experts drew from the studies, the District Court committed legal error and was properly reversed by the Court of Appeals." (233) According to Justice John Paul Stevens:

    The reliability ruling was more complex and arguably is not faithful to the statement in Daubert that "[t]he focus, of course, must be solely on principles and methodology, not on the conclusions that they generate." Joiner's experts used a "weight of the evidence" methodology to assess whether Joiner's exposure to transformer fluids promoted his lung cancer. They did not suggest that any one study provided adequate support for their conclusions, but instead relied on all the studies taken together (along with their interviews of Joiner and their review of his medical records). The District Court, however, examined the studies one by one and concluded that none was sufficient to show a link between PCB's and lung cancer. The focus of the opinion was on the separate studies and the conclusions of the experts, not on the experts' methodology.

    Unlike the District Court, the Court of Appeals expressly decided that a "weight of the evidence" methodology was scientifically acceptable. (234)

    Rather than analyze the methodology-conclusion distinction, the majority threw up its hands: [----------1981----------]

Respondent points to Daubert's language that the "focus, of course, must be solely on principles and methodology, not on the conclusions that they generate." . . . But conclusions and methodology are not entirely distinct from one another. Trained experts commonly extrapolate from existing data. But nothing in either Daubert or the Federal Rules of Evidence requires a district court to admit opinion evidence which is connected to existing data only by the ipse dixit of the expert. A court may conclude that there is simply too great an analytical gap between the data and the opinion proffered. (235)

    This abandonment of the focus on methodology prompted Justice Stevens to retort:

Daubert quite clearly forbids trial judges to assess the validity or strength of an expert's scientific conclusions, which is a matter for the jury. Because I am persuaded that the difference between methodology and conclusions is just as categorical as the distinction between means and ends, I do not think the statement that "conclusions and methodology are not entirely distinct from one another," either is accurate or helps us answer the difficult admissibility question presented by this record. (236)

    As Justice Stevens maintained, the distinction between methodology and conclusion is viable, (237) but the classification serves legal rather than scientific purposes and must be applied accordingly. The labels function to prevent excessive scrutiny of case-specific, minor premises and case-specific conclusions. The trans-case, major premise that PCBs promote cancers in human beings should be shown to be sufficiently well established by the methods of science to justify its use in an expert chain of reasoning. (238) The majority's [----------1982----------] demand that the expert not leap to a conclusion about the carcinogenicity of PCBs (239) is consistent with this specificity analysis.

    Following Joiner, however, the Supreme Court has continued to blur the methodology-conclusion distinction. In Kumho Tire Co., Ltd. v. Carmichael, the Court observed that "[t]he objective of [Daubert] is to . . . make certain that an expert, whether basing testimony upon professional studies or personal experience, employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field." (240) Various lower federal courts had drawn the same lesson from Daubert, and several have cited a departure from the level of professional care normally observed outside of litigation as a reason to exclude statistical testimony. (241) Because the Kumho opinion deals with experts of all stripes, including those who rely on skill that is not reducible to any articulated methodology, the search for some substitute for the "scientific methodology" standard sketched in Daubert (242) is understandable and important.

    Kumho's quasi-malpractice standard is useful in this connection, but the demand for ordinary rigor should not excuse the failure of an entire field of putative experts to apply truly rigorous standards in developing their field. Neither should it result in the exclusion of expert testimony just because a judge believes that a more rigorous analysis would have led to different conclusions. A demand for "rigor" is easy to apply to all facets of expert testimony-- [----------1983----------] conclusions as well as methods. It could tempt courts to exclude legitimately debatable testimony that they find unpersuasive even though it is based on generally accepted and valid methods. To be sure, there will be cases in which an expert has been so sloppy in applying these methods that the testimony would not be sufficiently probative under Federal Rule 403, but the stricter scrutiny reserved for trans-case scientific reasoning should not be applied under the rubric of rigor to case-specific conclusions. (243)

    In sum, the specificity standard for distinguishing methodology from conclusion for the purpose of applying heightened scrutiny is superior to the Joiner Court's apparent willingness to allow the category of methodology to bleed into the category of conclusions. It is also superior to any tendency to read into Kumho a requirement that case-specific conclusions be subjected to the careful scrutiny that is properly reserved for scientific methods. Nonetheless, the specificity standard is not always trivial to apply. In particular, problems can arise in screening statistical evidence, which typically involves methods that are accepted at a very general level and that are sound as applied to certain types of data but not others. For example, whether an expert has used an acceptable formula for estimating the frequency of a genotype in the population plainly is a methodological issue. It involves a trans-case, major premise. Equally plainly, whether the same expert has done the arithmetic correctly is a case-specific question not subject to heightened scrutiny under Frye or Daubert. But consider State v. Garcia, (244) in which:

[A] trial court in Arizona admitted testimony about likelihood ratios in a rape case involving two assailants. . . . [A]nalysis of the semen stain on the victim's blouse indicated that sperm from two males were present. According to the court of appeals, a population geneticist "provided the jury with likelihood ratios (broken down by population subgroups such as [----------1984----------] Caucasians, African Americans, and the like) for three distinct scenarios involving the sources of the DNA mixture found in the stain: (1) victim, defendant and unknown versus victim and two unknowns; (2) victim, defendant and unknown versus defendant and two unknowns; and (3) victim, defendant and one unknown versus three unknowns." (245)

    The trial court admitted this testimony following a Frye hearing at which the state's expert testified to general acceptance. The defendant was convicted. On appeal, he argued that the state had not proved that the specific formulas used to calculate the likelihood ratios had been generally accepted. The court of appeals affirmed the conviction, reasoning that both the concept of the likelihood ratio and the specific formulas were generally accepted, as indicated by publications in the scientific literature.

    In a petition for review, Garcia suggested that although the use of the likelihood ratio has support in the literature, the particular formulas were not previously published. There is no general formula, however, for computing a likelihood ratio. The formula depends on the specific hypotheses being compared. The likelihood ratio for a mixture with two possible men is different from that for a mixture with three, or four, and so on. The same approach produces the appropriate expression in each situation, and arriving at the correct expression is like solving word problems in high school algebra. Everyone agrees that the problems should be solved with formulas derived according to the rules of algebra, but different word problems require different formulas. The use of algebra is generally accepted, but a student can make a mistake applying those rules.

    In Garcia, the use of likelihood ratios is generally accepted as scientifically valid, but an expert can make a mistake in algebraically representing the pertinent conditional probabilities or in working out the algebra that yields the likelihood ratio for a [----------1985----------] particular problem. Is this a concern about the case-specific, minor premise (so that Frye would not apply) or a trans-case, major premise (that must be generally accepted)? Because the formulas used in Garcia easily could be employed in other cases involving a mixture of DNA from one female and two males, they fall into the latter category. There would be little difficulty admitting them under Daubert, for the derivation of the formulas is a straightforward algebraic exercise that can be verified by any number of experts familiar with probability theory. (246) Affidavits from a few such experts should be enough to demonstrate the requisite reliability. Under Frye, it is more difficult to introduce even an obviously valid result that has yet to be scrutinized fully by the relevant portion of the scientific community, but an advocate can build a record of acceptance even in this situation. (247) In any event, the added difficulty of satisfying Frye is not a reason to depart from the specificity standard for the methodology-conclusion classification. If anything, it is a reason to replace Frye with a more direct inquiry into scientific validity.
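The algebra-problem analogy can be illustrated with a short Python sketch of one textbook way to build the expression at a single genetic locus. It is an illustration under stated assumptions, not the formulas used in Garcia: it assumes that every allele of every contributor appears in the stain (no allelic drop-out), that genotypes occur in Hardy-Weinberg proportions, and that peak heights are ignored. Under those assumptions, the probability of seeing exactly the observed set of alleles, given the alleles of the known contributors and some number of unknown contributors, follows from inclusion-exclusion, and the likelihood ratio for any pair of scenarios is the ratio of two such probabilities. The function names, allele labels, and frequencies are hypothetical.

```python
from itertools import combinations


def prob_alleles(mixture, known, n_unknown, freq):
    """P(exactly the alleles in `mixture` are observed | known alleles, n_unknown unknowns).

    Assumes no drop-out and Hardy-Weinberg proportions; ignores peak heights.
    `known` is the set of alleles carried by the known contributors."""
    mixture, known = set(mixture), set(known)
    if not known <= mixture:
        return 0.0  # a known contributor carries an allele absent from the stain
    unexplained = sorted(mixture - known)  # alleles the unknowns must supply
    n_draws = 2 * n_unknown                 # each unknown person carries two alleles
    total = 0.0
    # Inclusion-exclusion: every unknown allele lies in the mixture, and each
    # unexplained allele is drawn at least once.
    for k in range(len(unexplained) + 1):
        for missing in combinations(unexplained, k):
            p_in = sum(freq[a] for a in mixture if a not in missing)
            total += (-1) ** k * p_in ** n_draws
    return total


def likelihood_ratio(mixture, freq, known_1, unknown_1, known_2, unknown_2):
    """P(evidence | scenario 1) / P(evidence | scenario 2) at one locus."""
    return (prob_alleles(mixture, known_1, unknown_1, freq)
            / prob_alleles(mixture, known_2, unknown_2, freq))


# Hypothetical locus: four alleles in the stain, with invented frequencies.
freq = {"12": 0.10, "14": 0.20, "15": 0.15, "17": 0.05}
stain = {"12", "14", "15", "17"}
victim, defendant = {"12", "14"}, {"15", "17"}

# A scenario pair like Garcia's (3): victim, defendant, and one unknown
# versus three unknowns.
lr = likelihood_ratio(stain, freq, victim | defendant, 1, set(), 3)
print(round(lr, 1))
```

Changing the scenario changes only the arguments (which contributors are treated as known, and how many are unknown), which is the sense in which one procedure yields a different formula for each problem.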

D. Looking Back at Statistical Evidence

    The prior sections reveal that until recently, statistical evidence either was admitted as a matter of course, excluded as irrelevant because it was obviously baseless, or questioned on extremely [----------1986----------] dubious grounds. In each of these instances, the strict scrutiny standards for scientific evidence were not applied to statistical proof. As late as 1994, it could be said that although "a particular study may use a method that is . . . so poorly executed that it should be inadmissible[,] . . . [m]ore often . . . the battle over statistical evidence concerns weight or sufficiency rather than admissibility." (248) Indeed, the 1997 edition of McCormick on Evidence does not even address the subtleties of applying the special standards for scientific evidence to statistical analyses, for it suggests that the admissibility of statistical assessments rarely is in doubt. (249) With the explosion of employment discrimination claims brought under Title VII of the Civil Rights Act of 1964 in the 1970s and 1980s, and through the efforts of economists (250) and statisticians in a broad spectrum of cases, courts became exposed to--and came to expect (251)--more sophisticated and potentially more useful statistical models. (252) To be sure, there was no shortage of argument among experts and counsel about the persuasiveness of specific statistical analyses. (253) Many courts experienced considerable difficulty [----------1987----------] penetrating these arguments, (254) and some jurisdictions searched for bright-line rules that would reveal which statistical convention or procedure had to be used to produce a prima facie case. (255) The admissibility of the studies, however, rarely was questioned. (256)

    This situation changed as commentators and advocates brought concerns about "junk science" to the forefront of the judicial consciousness. Although Daubert was but a variation on the theme of earlier cases, the allusion to "gatekeeping" struck a responsive chord, (257) encouraging federal district courts to be bolder in excluding scientific evidence and prompting state courts to reconsider their rules and to look more carefully at proffers of scientific testimony. Today, "Daubert motions" to exclude statistical studies or conclusions have migrated from the realm of epidemiology in which Daubert was grounded to many substantive fields and types of statistical proof. To identify the special issues that arise with [----------1988----------] statistical expert testimony and to illustrate how these issues should be approached, the remainder of this article examines a study of damages in a major antitrust case.

II. BRINGING DAUBERT AND KUMHO TO BEAR: CONWOOD CO. v. UNITED STATES TOBACCO CO. (258)

A. Conwood's Complaint: Monopolizing Moist Snuff

    Snuff is a smokeless tobacco product (259) that is placed in small amounts between the cheek and the gums. The major producer of moist snuff is United States Tobacco Company, Inc. (USTC), (260) followed by Conwood Company, L.P. (261) In 1998, Conwood filed a complaint in the United States District Court for the Western District of Kentucky alleging that USTC monopolized the moist snuff [----------1989----------] market in the U.S. in violation of Section 2 of the Sherman Act. (262) Conwood's theory, as developed at a four-week trial, was that:

In 1990, UST began an orchestrated campaign to choke off the distribution of rivals' products. Disdaining competition on the merits--which UST feared would erode its market share and profit margin--UST used its power to exclude competitors' display racks, advertising, and products. UST's representatives tossed as many as 20,000 Conwood [sales] racks [in retail stores] into dumpsters each month. (263)

    USTC denied engaging in systematic, exclusionary conduct of this (or any other) sort. It moved to exclude econometric testimony designed to prove that USTC's allegedly illegal conduct gravely suppressed Conwood's sales of its brands of snuff, and it sought summary judgment. The district court denied these motions. At trial, USTC cross-examined Conwood's expert and presented its own expert, who dismissed the damages study as worthless, (264) but produced no evidence of its own as to the amount of damages.

    After deliberating for under four hours, a jury awarded Conwood $350 million in damages. (265) Trebling this figure, (266) the district court entered judgment of $1.05 billion. (267) USTC's appeal to the Court of Appeals for the Sixth Circuit is pending.

B. Conwood's Resistance Theory

    In establishing damages, Conwood relied heavily on an analysis prepared by Dr. Richard Leftwich, a professor of accounting and finance. (268) As presented, the study appears to be a paradigm of [----------1990----------] objective, scientific inquiry. It began with a "test [of] a hypothesis about the effect of USTC's behavior on Conwood's performance." (269) "The hypothesis was that USTC's anticompetitive behavior had a greater impact on Conwood's market performance in cases where Conwood had a relatively low market share in . . . 1990." (270) We can call this a "resistance theory." Stated more fully, this theory posits that (1) USTC engaged in anticompetitive conduct to roughly the same degree in every state; (2) the conduct had little or no effect on Conwood's sales in states where Conwood was resistant to these practices--where it had a large market share in 1990; and (3) the conduct had a greater effect on Conwood's sales in states where Conwood was susceptible--where it had a small market presence in 1990.

C. Conwood's Data

    To test this resistance theory, the expert compiled a table (reproduced in the Appendix as Table A1) showing Conwood's percentage of moist snuff sales in each state in 1990 and 1997. (271) For example, in 1990 Conwood sold 14% (by weight) of all moist snuff in Vermont; by 1997, Conwood's share had risen four percentage points, to 18%. In the District of Columbia, the share rose 10.3 points, from 7.2% to 17.6%. (272) Leftwich referred to all these figures as "raw data." (273)

    The figures in Table A1, however, are not those used in the expert's first report. The original data came from an accounting [----------1991----------] firm's report on the pounds of moist snuff sold annually in the various states. (274) These numbers were not recorded correctly for the initial analysis, and the resulting data-entry errors inflated the estimated damages by 245 million dollars. (275) Such errors are flaws in execution that should be evaluated under Federal Rule 403; they do not affect the validity of the statistical methodology. Under Kumho, it also could be argued that they bespeak a lack of rigor that precludes the expert from testifying. (276) Data-entry errors are common in academic research, however, and once the expert has corrected the major errors, even if belatedly, exclusion on this ground does not seem justified. A corrected analysis may well rest on a valid and (ultimately) reasonably implemented approach.

    The state-by-state data can be presented more perspicuously in graphical form. Figure 1 is a scatter diagram that plots the 1990 market share (the horizontal distance on the X-axis) against the subsequent growth (the height on the Y-axis). Each state thus appears as a point in the graph. [----------1992----------]

Figure 1.
Scattergram for Conwood's Market Share Data

D. Regression Analysis to Show Causation

1. The Regression Results

    Leftwich then testified that he could learn little by looking at the situation in particular states, for these results were just "anecdotes" or "stories." (277) "[A]s a professional economist," (278) he was obliged to undertake "systematic analyses" and "empirical data analysis." (279) Therefore, he used "a standard economic method . . . called regression analysis" to test "the prediction of the original hypothesis that Conwood's performance in low market share states should have been . . . hampered more than it was in high market share states." (280) The "standard economic method" revealed that "there was a highly reliable relationship between Conwood's growth in the [----------1993----------] period [from 1990 to] 1997 and its market share in 1990." (281) That is, "the results were highly reliable, or statistically significant in . . . that there was more than a 95% chance that these results were, in fact, reflective of systematic patterns in the data." (282)

    These characterizations of statistical significance and the nature of the relationship (283) are misleading at best, (284) but they result from a flawed attempt to translate technical terms into lay language, (285) and not necessarily from a failure to use sound statistical methods. As such, even though they do not reflect the "intellectual rigor" with which knowledgeable experts would be expected to present their results outside of litigation, they have no trans-case implications. Moreover, they can be fuel for effective impeachment. Therefore, these errors in the presentation of the statistical analysis should not preclude all testimony about the analysis.

    The type of regression performed in Conwood is known as "simple linear regression." The idea is to relate subsequent growth to initial market share with a straight line through the cloud of data points in Figure 1. The equation for a straight line that has a slope β and a height α where it intersects the Y-axis is

Y = α + βX. (1)

[----------1994----------] The Greek letters α and β stand for numbers that define a straight line, (286) and the regression procedure simply finds the particular numbers that define the one line that best fits the data. (287)
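In the usual least-squares sense, the line that "best fits" the data is the one that minimizes the sum of the squared vertical distances between the data points and the line. A minimal Python sketch of that computation follows; the five data points are invented for illustration and are not the Table A1 figures.

```python
def least_squares_line(x, y):
    """Return (alpha, beta) minimizing the sum of (y - alpha - beta * x) ** 2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx                # slope
    alpha = mean_y - beta * mean_x  # the fitted line passes through the point of means
    return alpha, beta


# Hypothetical 1990 shares (X) and 1990-1997 gains (Y), in percentage points.
shares_1990 = [5.0, 10.0, 15.0, 20.0, 25.0]
gains = [1.0, 3.0, 2.5, 5.0, 6.5]
alpha, beta = least_squares_line(shares_1990, gains)
print(alpha, beta)  # approximately -0.3 and 0.26
```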

    Because factors apart from Conwood's shares in 1990 affect Conwood's market share in 1997, we would not expect the growth within each state in the 1990-1997 period to be given exactly by this simple equation. Due to the many variables not captured in equation (1), in some states the growth will be greater, and in others, it will be less. If the effects of all the unobserved factors merely combined to produce random fluctuations from the straight-line relationship, we could just add an "error term" to the equation to account for these disturbances. Conwood's expert therefore posited the following statistical model:

Y = α + βX + ε, (2)

where α is the growth expected in a state in which Conwood had no sales in 1990 (the Y-intercept), β is the constant increase in growth for a unit increase in initial market share, and ε is a random fluctuation from the values of Y expected on the basis of α and β alone. In other words, the error term ε represents "noise" that distorts the deterministic relationship of equation (1). Furthermore, the expert assumed that the level of the noise (from all the factors that actually determined sales but are omitted from equation (1)) was the same in every state and that it was what engineers call "white noise." (288) [----------1995----------]

    For the market share data of Table A1, the best estimate of the intercept is 0.85, and the best estimate of the slope is 0.22. That the estimated slope is 0.22 means that, on average, across all states, every additional percentage point in the share of the 1990 market is associated with additional growth of about two-tenths (0.22) of a percentage point by 1997. (289) If there were no association at all (β = 0), and if other assumptions that Dr. Leftwich apparently did not verify held, then the chance that the estimated slope would be at least as far from the expected value of zero as 0.22 would have been about 0.01. The regression line Y = 0.85 + 0.22X is shown in Figure 2, which superimposes this straight line on the scattergram. Although the actual values show considerable dispersion about the estimated regression line, there is a modest correlation between Conwood's 1990 market share in a state and its subsequent share gain in that state. (290) [----------1996----------]
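A figure like 0.01 comes from a standard test of the hypothesis that β = 0: the estimated slope is divided by its estimated standard error, and the resulting t statistic is converted into the probability of obtaining a slope at least that far from zero if the hypothesis were true. The Python sketch below is a minimal version of that test. To stay within the standard library, it approximates the t distribution with the normal distribution, which is reasonable with several dozen observations but is an assumption of the sketch; like the textbook test, it also presupposes the independent, equal-variance errors described above.

```python
import math


def slope_test(x, y):
    """Estimate the slope of y on x and test the hypothesis beta = 0.

    Returns (slope, standard error, t statistic, approximate two-sided p-value).
    The p-value uses a normal approximation to the t distribution."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # estimated noise variance
    se = math.sqrt(sigma2 / sxx)
    t = beta / se
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))     # P(|Z| >= |t|), Z standard normal
    return beta, se, t, p_two_sided


# Hypothetical data with a strong linear trend.
beta, se, t, p = slope_test([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```

On this approximation, a two-sided p-value of about 0.01 corresponds to a t statistic of about 2.6 in absolute value.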

Figure 2.
The Regression Line

2. The Causal Inference

a. Applicability of Daubert

    Even if there is a weak but real relationship between initial market share and subsequent growth, does it prove that "anticompetitive behavior hampered Conwood's growth more in the non-toehold states than in the toehold states," (291) as Conwood's expert suggested? Or is the relationship, as USTC suggested on appeal, an exercise in searching for a pattern in noisy data and reading into that pattern something that is not there? (292) [----------1997----------]

    The inference that the differences in Conwood's growth should be attributed to USTC's illegal acts requires a leap of faith, for the regression model contains no variable that measures these acts. One must step outside the regression framework to draw the desired conclusion, and this methodological step is difficult to justify under Daubert. The issue here is not just whether, under the facts of a specific case, certain assumptions in a statistical model are reasonable. (293) The method in question requires inferring that illegal conduct caused injury to a competitor simply by positing some kind of resistance to illegal conduct that cannot be measured directly but, by hypothesis, is reflected in some pattern in the competitor's sales history after the conduct began. This logic could be used in any antitrust case. Being a general, seemingly scientific theory or procedure, the resistance theory should be subject to the full scrutiny that Daubert establishes for scientific evidence.

    It is difficult to see how the resistance theory can survive this scrutiny. It has never been published or examined by other economists. (294) As a procedure for discerning illegal conduct, the [----------1998----------] resistance method could have an enormous error rate. The method is essentially circular. For example, an unscrupulous analyst intent on finding causation and damages could hypothesize that Conwood's marketing efforts are more susceptible to USTC's conduct in the mountain states of Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, and Wyoming. (295) The analyst then could "confirm" this hypothesis with a critical ratio test on the data in Table A1, for Conwood's mean gain in market share in the mountain states is one-quarter of its gain in the states outside this mountain region. (296)
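The circularity can be made concrete. A "critical ratio" is the difference between two group means divided by the estimated standard error of that difference; values beyond about 2 are conventionally treated as statistically significant. In the Python sketch below, whose figures are invented rather than taken from Table A1, the "susceptible" group is chosen after looking at the data, and the test duly "confirms" the split.

```python
import math
from statistics import mean, variance


def critical_ratio(group_a, group_b):
    """Difference in means divided by its (unpooled) standard error."""
    diff = mean(group_b) - mean(group_a)
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return diff / se


# Hypothetical share gains (percentage points) for a "susceptible" group
# picked after inspecting the data, and for all remaining states.
susceptible = [0.5, 1.0, 1.5, 0.0, 2.0, 1.0, 0.5, 1.5]
others = [4.0, 5.5, 3.0, 6.0, 4.5, 5.0, 3.5, 6.5, 4.0, 5.0]
cr = critical_ratio(susceptible, others)
print(round(cr, 2))
```

A large ratio shows only that the two groups' gains differ. It cannot show that the groups differ in "susceptibility" to the defendant's conduct, because the grouping was supplied by the analyst's hypothesis, and the test cannot distinguish that hypothesis from any other that sorts the states the same way.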

b. Implications of the Threat of Confounding

    A further obstacle to inferring causation is the threat of confounding. Confounding refers to the action of an unobserved variable that is correlated with both the dependent and the independent variables. (297) Without data on potential confounders, it is impossible to disentangle the effect of the measured variable from the potentially confounding ones. In the Conwood case, it is easy to suggest possible confounding variables. Perhaps personal income among snuff users has grown more in states in which Conwood had [----------1999----------] small shares in 1990, and USTC's brands appeal more to relatively affluent users. Population migration across state lines might be at work. Regional differences in consumer attitudes might lead to more growth in states in some regions than in others. Advertising restrictions and the conduct of other competitors also vary across states.

    The well-known fact that correlation is not causation, (298) however, is not itself a reason to exclude an observational study offered to prove causation. (299) The validity of an inference of causation depends on how well the study succeeds in "controlling" for plausible confounders and the extent to which its conclusions have been replicated in other populations. (300) The most secure procedure for controlling for lurking variables is a randomized, controlled experiment. (301) Of course, that is not possible in most econometric research, and it was not possible in Conwood. With adequate data, however, a statistical analyst can determine whether another variable might account for the pattern. The analyst could "control" for income, for instance, by examining whether Conwood's share growth in those states where snuff users experienced similar income growth was related to Conwood's initial market share. Another approach would be to modify equation (2) by adding a variable for personal income growth among snuff users. If we call [----------2000----------] this variable Z and use the Greek letter gamma (γ) to denote the change in market share (Y) associated with a unit change in Z (for a fixed value of the starting market share, X), then equation (2) becomes

Y = α + βX + γZ + ε. (3)

If Z is correlated with X and also affects Y in the manner hypothesized above, then the estimated value of β should decline (relative to equation (2)), making it harder to attribute a change in market share growth (Y) to a unit change in initial market share (X).
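The effect of adding a control variable can be seen in a small simulation. In the Python sketch below, which is purely hypothetical, the growth Y depends only on the income variable Z, and Z is lower where the initial share X is higher, mirroring the income scenario described earlier (income grew more where Conwood's 1990 share was small, and that growth hurt Conwood). The simple regression of equation (2) then shows a spurious positive slope, while the two-variable regression of equation (3) pulls the estimate of β back toward zero. The least-squares formulas for two centered regressors are standard; all numbers used to generate the data are invented.

```python
import random


def centered(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def slopes_with_and_without_z(x, y, z):
    """Return beta from Y on X alone, and (beta, gamma) from Y on X and Z."""
    xc, yc, zc = centered(x), centered(y), centered(z)
    sxx, szz, sxz = dot(xc, xc), dot(zc, zc), dot(xc, zc)
    sxy, szy = dot(xc, yc), dot(zc, yc)
    beta_simple = sxy / sxx                          # equation (2)
    det = sxx * szz - sxz ** 2
    beta_adjusted = (szz * sxy - sxz * szy) / det    # equation (3)
    gamma = (sxx * szy - sxz * sxy) / det
    return beta_simple, beta_adjusted, gamma


rng = random.Random(7)
x = [rng.uniform(2, 30) for _ in range(49)]          # hypothetical 1990 shares
z = [10 - 0.3 * xi + rng.gauss(0, 1) for xi in x]    # income growth, lower where X is high
y = [8 - 0.5 * zi + rng.gauss(0, 0.5) for zi in z]   # growth driven by Z alone
print(slopes_with_and_without_z(x, y, z))
```

Because X has no effect in the simulated data, the positive simple slope is entirely an artifact of the omitted variable; with real data, of course, the analyst does not know the truth in advance and can only check the variables for which data exist.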

    Conwood's expert examined some possibly confounding variables with a multiple regression model similar to equation (3). He did not report whether they were correlated with 1990 market share (X), but stated that "I tested all the plausible explanations that I had data that enabled me to test" (302) and that "[m]y tests showed that plausible alternative explanations were inconsistent with the patterns I found in the data." (303) If the expert actually employed reasonable procedures to eliminate all plausible rival hypotheses, then the resistance-regression procedure should not be excluded simply because the initial regression left open the possibility of confounding variables.

c. Resistance Versus Momentum

    One "plausible explanation" that the expert purported to eliminate was not a confounding variable, but rather went to the core of the resistance theory. Instead of attributing the change in market share to the hypothetical "resistance" to USTC's conduct in some states but not others, one might well suppose that there would be more growth, on average, in states where Conwood was better established, if only because its products, for any number of reasons, were selling better in those states than in others. In other words, the regression depicts the effects of "momentum" as readily as "resistance."

    Conwood's expert purported to refute the momentum interpretation of the regression of 1990-1997 growth on 1990 shares by means [----------2001----------] of a regression of 1984-1990 growth on 1984 shares. (304) This regression did not reveal any statistically significant association. Having already found a statistically significant association in the post-1990 period, he concluded that the only thing that could explain the change from "not significant" before 1990 to "significant" after 1990 was differential resistance to illegal conduct.

    This reasoning is fallacious. The momentum theory asserts that with or without resistance, initial market share (X) tends to predict subsequent market growth (Y). A large change in the predictive value of the initial market share between the earlier and later periods would undercut the theory that only momentum is at work in both periods (as opposed to momentum alone in the earlier period and momentum plus resistance to illegal conduct in the later period). At first glance, the change from the impact of initial market share in the pre-1990 period to its impact in the post-1990 period looks substantial. The pre-1990 estimate of the slope is -0.13, but the post-1990 estimate is 0.22. Both these numbers, however, are estimates of the unknown slope β in equation (2). The true value of β in the pre-1990 period could be higher, and the true value in the post-1990 period could be lower. Before concluding that the difference in the two periods should be attributed to resistance to illegal conduct after 1990, one must reject the hypothesis that β is actually the same in both periods (and that the observed difference is attributable to chance). (305) Yet Conwood's expert never tested this hypothesis. Had he done so, he would have found that the uncertainty in the difference between the estimated slopes for the two periods is too large to permit the conclusion that the difference is statistically significant. (306) [----------2002----------]
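The missing test is simple to state. A common approximation treats the two periods' slope estimates as independent, so that the standard error of their difference is the square root of the sum of their squared standard errors; the difference divided by that standard error is then compared with the usual cutoff of about 1.96. The Python sketch below computes this z statistic. The slopes are the -0.13 and 0.22 reported in the text, but the standard errors are placeholders: the post-1990 value is chosen only to be roughly consistent with the p-value of about 0.01 reported earlier, and the pre-1990 value is invented. The independence assumption is itself questionable, because the same states appear in both regressions.

```python
import math


def slope_difference_z(b_early, se_early, b_late, se_late):
    """z statistic and two-sided normal p-value for H0: the two true slopes are equal.

    Treats the two estimates as independent (an approximation)."""
    se_diff = math.sqrt(se_early ** 2 + se_late ** 2)
    z = (b_late - b_early) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Slopes from the text; standard errors are hypothetical placeholders.
z, p = slope_difference_z(b_early=-0.13, se_early=0.20, b_late=0.22, se_late=0.085)
print(round(z, 2), round(p, 3))
```

With these placeholder values the difference is only about 1.6 standard errors, with a two-sided p-value of roughly 0.1, even though one slope taken alone is "significant" and the other is not. The actual conclusion depends on the actual standard errors.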

    Although the point may seem a fine one, the failure of Conwood's expert to test for the significance of the difference in the estimates of the slope is a methodological flaw that affects the validity of his effort to refute the momentum theory. To make the same point with other language from Daubert, one can observe that the use of two separate tests for significance rather than a single test of the difference between the two estimates does not "fit" the problem of eliminating the rival momentum theory as an explanation for the pattern in the 1990-1997 period.

E. Regression Analysis to Estimate Damages

1. Estimating Effect with a "Regression Rectangle"

    If the resistance-regression proof of causation is vulnerable to assault under Daubert, the use of the regression analysis to estimate damages is open to mayhem. Conwood's expert treated USTC as the cause--indeed, the sole cause--of Conwood's lower growth in most states. As explained in the preceding section, he purported to verify this treatment by a statistical regression model that assumed that the market share in 1997 is equal to one constant plus a second constant multiplied by the market share that Conwood had in 1990. (307) This regression did not take into account any variables to show the effect of USTC's alleged anticompetitive practices. It did not adequately consider whether the pattern or trend in market share growth changed before and after the time that the practices that were supposed to have depressed Conwood's growth were instituted. [----------2003----------]

    In computing damages, the expert inexplicably modified the actual market shares in a way that was supposed to account for the extent of USTC's "bad acts". (308) As in the causation analysis, the 1990-97 growth (as adjusted) was regressed on 1990 shares, yielding the straight line Y = 1.8 + 0.31X, which is plotted in Figure 3.

    Thus far, there has been no analysis of damages--just another regression showing a weak correlation between two variables. To arrive at a figure for damages, Conwood's expert divided the forty-nine states into two groups. The high-share, supposedly resistant group consisted of three states in which Conwood had more than 20% of the market in 1990. Although the law allows substantial latitude in estimating damages once liability has been established, the expert offered no economic theory or data to justify this cut-off point. Nevertheless, he assumed that these states were unaffected by USTC's anticompetitive practices and that Conwood therefore suffered no damages in them.

    The low-share, susceptible group consisted of the other forty-six states. Leftwich assumed that had there been no anticompetitive practices, Conwood's 1997 market share in every one of these forty-six low-share states would have risen from its 1990 level by the same amount. But he did not use the actual experience of the high-share states (those in which Conwood had more than 20% of the market) to deduce this amount. (309) Instead, he used the regression of 1997 on 1990 shares to predict that if Conwood started with 20% of a state's market in 1990, it would have 28.1% of the market in 1997. If Conwood's share in a low-share state had gone up less than 8.1 percentage points, he boosted its [----------2004----------] gain to 8.1; if Conwood's share had gone up more than 8.1 percentage points, he reduced the gain to 8.1 points. Thus, he gave every one of the low-share states a market gain of 8.1 points--an amount that exceeded Conwood's actual performance in two of the three high-share states that supposedly were unaffected by USTC's practices.
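The arithmetic behind the 8.1-point benchmark and the adjustment rule can be written out as follows. The symbols are introduced here only for illustration and do not come from the expert's report: X is Conwood's 1990 share, Ŷ is the fitted 1990-97 gain from the line Y = 1.8 + 0.31X, g_i is the actual gain in low-share state i, and Δ_i is the number of points added to (or, if negative, subtracted from) that state. With the rounded coefficients, the fitted gain at X = 20 comes to 8.0 rather than 8.1; the discrepancy presumably reflects rounding of the reported coefficients, which is an assumption.

```latex
\hat{Y}(20) = 1.8 + 0.31 \times 20 = 8.0 \quad (\text{reported as } 8.1),
\qquad
\Delta_i = 8.1 - g_i \quad \text{for each low-share state } i .
```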

Figure 3.
Conwood's Estimate of How Much More of the Market It Would Have Gained

    Figure 3 is a picture of this augmentation of market shares. The points in the rectangle are states in which Conwood supposedly would have gained more market share in the absence of USTC's acts. The lengths of the vertical lines drawn from these points up to the horizontal line Y = 8.1 are the increases in market share growth that Conwood's expert awarded Conwood in these states. The points above the rectangle are the states in which Conwood outperformed the gain expected of a state in which Conwood had 20% [----------2005----------] of the market in 1990. The lengths of the vertical lines drawn from these points down to the horizontal line Y = 8.1 are the decreases in market share growth imposed on these states. The net adjustment is the difference between the sum of the lengths of the first set of lines (in the rectangle) and the sum of the lengths of the second set (above the rectangle). This difference translated into $488 million of estimated damages. (310)
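The netting of vertical lines in Figure 3 can be sketched in a few lines of Python. The function names and the five state gains below are hypothetical, invented only to show the mechanics. The sketch stops at share points, because converting points into dollars would require sales and margin figures that are not given here.

```python
def rectangle_adjustments(gains, target=8.1):
    """Per-state adjustments under the 'regression rectangle' rule.

    gains: actual 1990-97 share gains (percentage points) for the
    low-share states. Every state is moved to the same target gain.
    """
    return [target - g for g in gains]

def net_adjustment(gains, target=8.1):
    """Return (total boosts, total cuts, net) in percentage points."""
    adj = rectangle_adjustments(gains, target)
    boosts = sum(a for a in adj if a > 0)   # lines drawn up to Y = target
    cuts = -sum(a for a in adj if a < 0)    # lines drawn down to Y = target
    return boosts, cuts, boosts - cuts

# Hypothetical gains for five low-share states (not data from the case).
hypothetical_gains = [2.0, 5.5, 9.0, -1.0, 12.1]
boosts, cuts, net = net_adjustment(hypothetical_gains)
print(f"boosts = {boosts:.1f}, cuts = {cuts:.1f}, net = {net:.1f} points")
```

Whatever the data, the net equals the number of low-share states times 8.1 minus their total actual gain (here 5 × 8.1 − 27.6 = 12.9 points). In effect, the rule credits every low-share state with the gain predicted for a state starting at the 20% cut-off.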

Figure 4.
Conwood's Estimate of the Market Shares It Would Have Gained Without "Bad Acts" by USTC

[----------2006----------]

    The expert's picture of what Conwood's growth would have looked like in the absence of the "bad acts" from 1990 to 1997 is shown in Figure 4. The forty-six states in or above the rectangle now all have the same share gain of 8.1 percentage points. (311)

2. Applying Daubert to the "Regression Rectangle"

    Skepticism of expert testimony is one thing; exclusion of that testimony is another. To apply the validity requirement of Daubert to this procedure for estimating damages, a court must ask whether the methodology is sound. Conwood argued to the district court that "regression analysis and other such economic models are accepted and tested methods for proving damages," (312) and the district court was satisfied with this rejoinder. In a brief opinion that recited the sources of the data and the fact that Conwood's expert applied various regressions to establish his resistance theory, the district court concluded that "Leftwich's testimony satisfies Daubert. His methodologies are generally acceptable. Defendant's expert also used them. . . . The credibility of the expert and his opinions is an issue for the jury." (313)

    By "methodologies," the court apparently meant the statistical procedure of regression. The court relied exclusively on another district court opinion in Ohio v. Louis Trauth Dairy, Inc., (314) a price-fixing case that it characterized as holding that Daubert was satisfied because "the experts all were . . . economists or statisticians [who] conducted econometric and regression analyses that were testable, generally acceptable, and reproducible." (315) As I explain [----------2007----------] below, however, this analysis of the regressions in Conwood is far too cursory.

a. Daubert's Four Factors

    The problem with the district court's conclusion and reasoning is that the analyst did more than use linear regression to predict the value of a dependent variable. Regression is an apodictically valid tool for measuring how changes in one variable are associated with changes in other variables. The mathematics that generates a regression equation is sound, but whether a "regression rectangle" validly estimates damages involves additional considerations. The most important of these considerations is the underlying premise that resistance to anticompetitive conduct is linearly correlated with market share and is the explanation for the positive slope of the regression line. That theory is difficult to square with the kinds of factors listed in Daubert. (316) First, the resistance theory holds that the effects of anticompetitive conduct can be measured simply by identifying states in which market share grew less and attributing the entire difference to the challenged conduct. This theory appears to have been invented for use in the Conwood case and has never been tested--in that case or any other. This fact counts heavily against admissibility. (317) Second, "the theory or technique has [not] been subjected to peer review and publication." (318) The economic literature is devoid of any discussion of the resistance theory and the "regression rectangle" as a means of detecting and measuring harms from anticompetitive conduct. Third, because the theory and method have yet to be tested, the risks and magnitude of the errors that they yield are unknown. Finally, as indicated by the lack of published or other critical discourse about the approach, general acceptance in the scientific community is lacking. The record in [----------2008----------] Conwood contains no evidence that economists, statisticians, accountants, or finance professors accept the resistance theory and "regression rectangle" estimates of damages. The procedure bears no resemblance to the commonly accepted "before-and-after" and "yardstick" approaches that use meaningful control groups to separate the effects of anticompetitive conduct from other factors. (319)

    In short, to argue that the regression study in Conwood satisfies Daubert simply because it uses least-squares regression is tantamount to claiming that Ptolemy's theory that the sun revolves around the earth is valid and generally accepted because these movements can be described by geometry. A meaningful application of Daubert requires verification of all the major premises of the analytical method, not just those at the highest level of abstraction. (320) It is therefore appropriate to observe that no other expert in any antitrust case has used "rectangular regression" to infer causation or to estimate damages.

    A narrower but still severe methodological flaw in the regression analysis is the use of "adjusted" market shares. (321) Using shares that were already adjusted to reflect the effect of illegal conduct to deduce the effect of that same conduct is extremely puzzling. Indeed, the resulting numbers are so lacking in probative value as to be excludable under the balancing test of Rule 403. Unadjusted shares produce much lower damage estimates--$155-238 million (322) instead of $313-488 million. (323) It therefore seems that the prejudicial effect of adjustment in this case substantially outweighs whatever minimal probative value the adjustment could have.

    Yet another disturbing feature of the analysis is the use of the difference of 8.1 percentage points between the predicted share of 28.1% in 1997 and the arbitrary 1990 starting share of 20%. The [----------2009----------] share growth also can be expressed as the ratio of 28.1 to 20, or 1.405. Since this is the growth factor in a putatively resistant state, why not assume that but for the allegedly unlawful acts, the low-share, susceptible states would have grown by the same factor of 1.405? This multiplicative adjustment would raise an initial 1% share to only about 1.4%, instead of the 9.1% (1% plus 8.1 points) produced by the additive adjustment.
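The contrast between the two assumptions can be sketched numerically using only the 20% and 28.1% figures given in the text. The constant and function names are hypothetical, and neither function is offered as the correct but-for model; the point is only that the two rules agree at the 20% cut-off and diverge sharply for small initial shares.

```python
PREDICTED_1997 = 28.1  # predicted 1997 share for a state starting at 20%
START_1990 = 20.0      # the 1990 cut-off share

ADDITIVE_GAIN = PREDICTED_1997 - START_1990   # 8.1 percentage points
GROWTH_FACTOR = PREDICTED_1997 / START_1990   # 1.405

def additive(share_1990):
    """But-for 1997 share if every state gains the same number of points."""
    return share_1990 + ADDITIVE_GAIN

def multiplicative(share_1990):
    """But-for 1997 share if every state grows by the same factor."""
    return share_1990 * GROWTH_FACTOR

for s in (1.0, 5.0, 10.0, 20.0):
    print(f"1990 share {s:5.1f}%: additive {additive(s):5.2f}%, "
          f"multiplicative {multiplicative(s):5.2f}%")
```

Because the additive rule hands a 1% state the same 8.1-point gain as a 20% state, it implies that small shares would have grown roughly ninefold (1% to 9.1%), whereas the multiplicative rule implies the same 40.5% proportional growth everywhere.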

    The assumption of additive growth implicit in the regression-rectangle damages estimate is a general feature of the method rather than a case-specific fact. Consequently, it is appropriate to apply Daubert and to demand a showing that the assumption is valid. In Conwood, no one offered any theoretical or empirical reason to expect that growth would be additive rather than multiplicative. (324) Even if the resistance theory were better established, the additive method of adjustment has not been validated, and it could be quite unreliable.

b. Daubert's Fit and Joiner's Nexus

    Reliance on mathematics or statistics is not enough to satisfy Daubert. If such reliance were sufficient, then the plaintiffs' experts in both Daubert and General Electric Co. v. Joiner would have been allowed to testify without further ado, for the experts in both these cases relied on statistical studies or analyses. In Conwood, the use of regression to estimate damages can be dismissed because it does not fit the problem, (325) but the "fit" analysis adds nothing to the analysis of the validity of "rectangular regression" as a method for estimating damages. (326) It is merely another way to say that although regression is a valid procedure for looking at the association between variables and for predicting the value of a dependent variable, the interpretation of differences in market share growth as the result of "resistance" to illegal conduct has no logical or scientific basis.

    In Joiner the Court wrote that heightened scrutiny encompasses not only the abstract methodology, but also the use of that methodology to reach specific conclusions. As discussed in Part I.C.2, in examining an expert's opinion based on standard statistical methods in epidemiology, the Court held that the opinion failed to satisfy Daubert because it was "connected to existing data only by the ipse dixit of the expert." (327) In the end, there was "simply too [----------2011----------] great an analytical gap between the data and the opinion proffered." (328) The phrase "ipse dixit of the expert," much like the "gatekeeping" metaphor of Daubert itself, has great rhetorical force but little analytical precision. Here, it is easy to dismiss the "rectangular regression" as an ipse dixit that cannot bridge the "gap between the data and the opinion proffered," but the justification for these characterizations lies entirely in the preceding analysis of the putative validity of this novel procedure for estimating damages.

3. "Internal" Criticisms of the Regression

    The major criticisms of the regression study in Conwood are "external" to the study. The problems with the "resistance theory" and the "rectangular regression" are present whether the regressions are performed impeccably or erroneously. These problems undermine the major premise that "resistance" can be presumed to be the explanation for variations in market share growth. Such external criticisms clearly affect the validity of this regression-based method for establishing damages. (329) [----------2012----------]

    Other criticisms are internal to the study, and whether such criticisms should determine admissibility under Daubert's heightened scrutiny for reliability is more debatable. For instance, the data set contains "outliers"--states that unduly influence the regression results. (330) The amicus brief hammers hard at this point:

    The difference between a finding by Dr. Leftwich of several hundred million dollars of damages and a finding of no damages is the inclusion in his model of a single anomalous data point, the data for Washington, DC ("DC").

    Any reasonable statistical analysis would identify the DC point as one that does not fit the model. . . .

    The question of whether the DC data point should be given the same weight as other data points is not an academic quibble. A fundamental step in producing a sound econometric analysis is to look for aberrant data that is [sic] either erroneous, highly variant, or does not fit the specified model. Any number of diagnostics would have identified the DC data point as an outlier that should either have been excluded from Dr. Leftwich's regressions or given less weight than other data. This is not an issue over which reasonable economists would differ.

    . . . Dr. Leftwich's failure is not a subtle statistical mistake. This kind of failure to examine the impact of such an outlier would not be acceptable in an undergraduate econometrics class, let a