
Prove It with Figures:
Empirical Methods in Law and Litigation
by Hans Zeisel & David Kaye
with a foreword by Jack B. Weinstein
Springer-Verlag, 1997. A Chinese translation is in progress at Zhongguo Renmin Daxue Press.

REVIEWS

International Statistical Institute: "The book deserves, and even needs, to be brought to the attention of law students and the judiciary."

Jurimetrics Journal: "[A]n excellent primer for the legal professional who has little previous acquaintance with statistics and empirical methods."

Public Opinion Quarterly: "Zeisel and Kaye have come close to writing a handbook for general social science research."

Journal of the American Statistical Association: "[T]his book will prove to be a valuable resource."

Zentralblatt MATH: "[T]errific reading for anyone interested in the use of statistical methods in society -- lawyer or not."

Contents

Dedication, Foreword, Preface, Acknowledgments, List of Figures, List of Tables

1 The Search for Causes: An Overview
2 The Controlled Randomized Experiment
3 Inferring Causes from Observational Studies
4 Epidemiological Studies
5 Summing Up: Replication and Triangulation
6 Coincidence and Significance
7 Sampling
8 Content Analysis
9 Surveys and Change of Venue
10 Trademark Surveys: Genericness
11 Trademark Surveys: Confusion
12 The Jury: Composition and Selection
13 DNA Profiling: Probabilities and Proof

Notes, Glossary, List of Cases, Index

Chapter 1.
The Search for Causes: An Overview

Among the many questions that are central to legal proceedings, the question of whether one thing caused another is the most frequent. It occurs in civil and criminal litigation. Does capital punishment deter crimes? Does a food additive cause cancer? Does a headache tablet work as advertised? Would additional information in a securities prospectus have discouraged potential investors from purchases that proved to be unwise? Does the similarity in the names of two products lead consumers to buy one because of their familiarity with the other, well-known and respected brand? The list is endless.

At least some such questions can be addressed by collecting and analyzing data rather than relying solely on seat-of-the-pants judgments.(1) Pertinent research already may exist. If so, it becomes the task of the lawyer and appropriate experts to take this research "off the shelf" and explain it to the court. In Brown v. Board of Education, 347 U.S. 483 (1954), and related cases attacking racial segregation in elementary education, for example, the courts noted experiments that purportedly showed the harms of racially segregated schools on young children.(2) More recent examples include psychological studies pinpointing conditions under which eyewitnesses tend to err in identifying criminals,(3) investigations into the effects of drugs and other chemicals in animals and humans,(4) studies of how sex stereotyping affects perceptions of women in the workplace,(5) and the conditions that promote such stereotyping and hostile work environments.(6)

Even if no pre-existing studies are available, a "case-specific" one may be devised, as when a psychologist simulates the conditions of a particular eyewitness's identification to see whether comparable identifications tend to be correct.(7) Likewise, an organization investigating racial discrimination in the rental housing market may send several "testers" who, it is hoped, differ only in their race, to rent a property.(8) In product liability cases, valuable information as to the cause of a product's failure may be gleaned from pretrial experiments(9) or data on trends in sales and accidents. An example occurred in a consolidated trial of 800 cases involving 1,100 children born with deformed limbs. The plaintiffs alleged that the children's mothers had taken Bendectin for relief of nausea during pregnancy, and that this drug produced the birth defects.(10) The drug's manufacturer introduced charts prepared for that case, which showed birth defect rates remaining stable or increasing during a time when Bendectin sales dropped markedly. U.S. District Judge Carl Rubin, who presided over the trial, described this presentation as "the most telling single piece of evidence I have ever seen after 23 years on the federal bench."(11) Again, the list of such possible empirical analyses is bounded only by the imagination, good sense, and financial resources available to counsel.(12)

When is such research scientifically convincing? It might seem a simple matter to address the questions of cause-and-effect listed above. If we want to learn something about the relative effectiveness of a headache remedy, we invite a hundred persons to try it when they have a headache, and then find out how many headaches disappeared or got better. Alas, it is not this easy. To identify the remedy as the cause, we must exclude the possibility that other factors brought about the relief. Even if we found that the patients given the remedy usually improved, this finding would not establish that the remedy is beneficial. Many headaches have a way of disappearing by themselves after a while. Others may disappear because the sufferer was given something -- a phenomenon known as the "placebo effect." Without some "control group" that was not subject to these possibilities or some method that "controls" for their effects, who can say what the true cause is?(13)

The controlled randomized experiment is the ideal procedure for eliminating such rival hypotheses, and administrative agencies and courts have demanded this form of proof in appropriate circumstances.(14) In its simplest form, the structure of such an experiment is this: Assemble the participants in the experiment. Assign these subjects to two groups randomly, that is, by some lottery process. Without their knowing in which half they are, the subjects in one group receive the headache tablet, the others receive a look-alike placebo. (If the question is how the experimental headache remedy compares with another one, the control group should receive the other treatment.) Whatever difference appears in headache relief between the treatment and the control group then must be the result either of the particular remedy or of some difference in the groups of headache sufferers.
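The assignment step in this design -- splitting subjects into treatment and control groups by a lottery process -- can be sketched in a few lines of Python. This is a hypothetical illustration; the function name, subject labels, and group sizes are invented for the example:

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group
    by a lottery process, as in the simplest controlled experiment."""
    rng = random.Random(seed)
    shuffled = subjects[:]   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)    # the "lottery"
    half = len(shuffled) // 2
    # One half receives the headache tablet, the other a look-alike placebo.
    return shuffled[:half], shuffled[half:]

treatment, control = randomize([f"subject-{i}" for i in range(100)], seed=1)
```

The essential point the sketch captures is that no human judgment enters the assignment: once the lottery runs, neither the experimenter's hopes nor any impression of a subject's prognosis can influence which group a subject joins.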

The beauty of random assignment is that it prevents any biases of the experimenter from producing a difference between the groups. The medical researcher who develops the remedy naturally hopes or expects that it will work; this researcher may pick people for the treatment group who look more likely to recover anyway, or the investigator may pick people for the control group who appear least likely to improve. The bias in assignment need not be conscious; subliminal influences may lead to groups that are different in some way that might be related to the outcome. If this happens, then any difference in the recovery rates or times due to the remedy will be confused with the difference in the composition of the groups. Thus, a review of 250 clinical trials of medical treatments found that about half did not conceal adequately the allocation of patients to treatment and control groups -- and those trials yielded estimates of effectiveness that were substantially higher than the completely "blind" trials.(15) Strictly random assignment removes the possibility that a difference in outcomes merely reflects a difference in the groups that resulted from the experimenters' skill in sorting people according to their likelihood of recovery or similar factors.

For the benefits of random assignment to be realized fully, neither the researchers administering the cure and the placebo nor the subjects in the control and the treatment groups can know who is in what group and which tablet is which. A medical experiment that adheres to this requirement is said to be "double-blind."(16)

Another important advantage of random assignment is that it permits us to compute the probability of large differences in the outcomes resulting from the luck of the draw when, in reality, the treatment has no effect above and beyond the placebo. If every subject has an equal probability of being in either group, then it is unlikely that all of the subjects with better prognoses and shorter headaches will end up in one group as opposed to the other. But if the outcomes do not come from an experiment where the members of the treatment and control groups have been assigned randomly, then there is no easy way to quantify the likelihood that differences in the characteristics of the groups rather than the treatment itself would produce a difference in outcomes.
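One common way to quantify "the luck of the draw" under random assignment is a permutation test: reshuffle the group labels many times and count how often chance alone produces a difference in outcomes at least as large as the one observed. A minimal sketch, with invented headache durations (in minutes) standing in for real outcome data:

```python
import random

def permutation_p(treated, control, n_iter=10_000, seed=0):
    """Estimate the probability that random assignment alone would yield a
    difference in group means at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(treated) / len(treated) - sum(control) / len(control))
    pooled = treated + control
    k = len(treated)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # re-run the lottery under the "no effect" hypothesis
        diff = abs(sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k))
        if diff >= observed:
            extreme += 1
    return extreme / n_iter

# Hypothetical headache durations (minutes): treated vs. placebo group.
p = permutation_p([60, 55, 80, 62, 70], [110, 120, 95, 105, 115])
```

A small value of `p` means the observed difference would rarely arise from the lottery alone. As the text notes, this calculation is only justified when the groups really were formed by random assignment; applied to groups formed any other way, it answers the wrong question.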

For a variety of reasons discussed in the next chapter, only medical treatments are, as a rule, tested in randomized experiments. More often, we, and the courts, must answer the causal question from other data. A control group sometimes can be created after the fact, and the outcomes in that group contrasted with those in a treatment group, but if the control group was not created by random assignment, we should be suspicious. We must explore the differences between treated and control groups that may have existed prior to the treatment.

At this point, two insights are important. First, outcome figures from a treatment group, without a control group, tell us very little, and may indeed mislead us. Comparison with a control group is essential. Second, if the control group was obtained through random assignment prior to the time of treatment, a difference in the outcomes between treatment and control groups may be accepted, within the limits of statistical error, as the true measure of the treatment effect. But if the control group was created in any other way, we must be suspicious of possible differences in the groups that existed prior to the treatment and that may have been related to the outcomes. Statistical "adjustments" or "controls" can help deal with such differences in observational studies, but the evidence of causation cannot be as direct as that obtained from a well-run, randomized experiment.

We discuss, in Chapter 2, the power and limitations of the randomized, controlled experiment and the reasoning that can help eliminate the possibility that an observed difference in outcomes is attributable to the luck of the draw in randomly assigning subjects to each group. In Chapter 3, we discuss observational data on causal connections and the analytical challenges they pose. Chapter 4 continues this story, with epidemiological studies that are seeing increased use in product liability and other litigation. Chapter 5 considers the importance of combining information from a variety of studies in order to overcome the inevitable weaknesses and limitations of individual research efforts. Chapter 6 describes some probabilities and statistics used to assess whether an observed difference between two groups is too large to ascribe to chance. These chapters raise general points of methodology, always in the context of concrete examples. Among the examples considered are experiments in criminology, studies of procedural reforms in the legal system, experiments on the effects of jury instructions, analyses of the deterrent effect of capital punishment, and data on the health effects of silicone breast implants, dioxins, electromagnetic fields, tobacco smoke, and Bendectin.

From there, we turn to chapters on the applications of the methodologies to specific areas of litigation or to important legal institutions. We describe the basic principles of scientific sampling and content analysis in Chapters 7 and 8. Applications to motions for a change in the venue of a trial, to trademark surveys, and to jury selection are considered in Chapters 9 through 12. Finally, Chapter 13 examines a form of what forensic scientists call "associative evidence" that links a suspect to a crime. We discuss selected legal and statistical issues that have dominated the introduction of DNA profiling into the legal system. The chapter provides a starting point for studying concepts from the theory of probability that are important in evaluating many types of evidence from the forensic sciences.


1. Exhortations for more and better research into human behavior and the legal system were once seen as radical. See generally John H. Schlegel, American Legal Realism and Empirical Social Science (1995). Today, they are commonplace. For a concise overview of studies of judicial reliance on social science evidence, see Shari S. Diamond & Jonathan D. Casper, Empirical Evidence and the Death Penalty: Past and Future, 50 J. Soc. Issues 177 (1994).

2. Unfortunately, the research was not capable of bearing the weight that the Supreme Court seemed to place upon it. See Harry Kalven Jr., The Quest for the Middle Range: Empirical Inquiry and Legal Policy, in Law in a Changing America 56, 65-66 (Geoffrey C. Hazard Jr. ed., 1968); Wallace D. Loh, In Quest of Brown's Promise: Social Research and Social Values in School Desegregation, 58 Wash. L. Rev. 129 (1982) (book review).

3. E.g., State v. Chapple, 660 P.2d 1208, 1224 (Ariz. 1983) (reversing a conviction for excluding testimony about these studies). For citations to the case law and scientific literature, see, e.g., Modern Scientific Evidence (David Faigman et al. eds., 1997); Brian L. Cutler & Steven D. Penrod, Mistaken Identity: The Eyewitness, Psychology, and the Law (1995); 1 McCormick on Evidence 206(A) (John W. Strong ed., 4th ed. 1992); Kipling D. Williams et al., Eyewitness Identification and Testimony, in Handbook of Psychology and Law 141 (D.K. Kagehiro & W.S. Lauter eds., 1992).

4. See Raynor v. Merrell Pharmaceuticals, Inc., 1997 WL 18170 (D.C. Cir. Jan. 21, 1997); infra Chapter 4.

5. The testimony of a social psychologist about stereotyping played a limited, and controversial, role in Price Waterhouse v. Hopkins, 490 U.S. 228 (1989). Compare Gerald V. Barrett & Scott B. Morris, The American Psychological Association's Amicus Curiae Brief in Price Waterhouse v. Hopkins: The Values of Science Versus the Values of the Law, 17 Law & Hum. Behav. 201 (1993), with Susan T. Fiske et al., What Constitutes a Scientific Review? A Majority Retort to Barrett and Morris, 17 Law & Hum. Behav. 217 (1993). But see Allan J. Tompkins & Jeffrey E. Pfeifer, Modern Social-Scientific Theories and Data Concerning Discrimination: Implications for Using Social Science Evidence in the Courts, in Handbook of Psychology and Law 385, 399 (D.K. Kagehiro & W.S. Lauter eds., 1992) (implying that no controversy exists among psychologists).

6. Robinson v. Jacksonville Shipyards, Inc., 760 F. Supp. 1486 (M.D. Fla. 1991); Jenson v. Eveleth Taconite Co., 824 F. Supp. 847 (D. Minn. 1993). But cf. Johnson v. Los Angeles County Fire Dep't, 865 F. Supp. 1430, 1441 (C.D. Cal. 1994) (excluding expert testimony that attempted to extrapolate from a study of the effects of sexually explicit and degrading films to pinups from Playboy magazine).

7. Willem A. Wagenaar, The Proper Seat: A Bayesian Discussion of the Position of Expert Witnesses, 12 Law & Hum. Behav. 499, 501-04 (1988) (describing the difficulty of presenting the results of such an experiment to a court in the Netherlands).

8. E.g., United States v. Youritan Construction Co., 370 F. Supp. 643 (N.D. Cal. 1973), aff'd in part, 509 F.2d 623 (9th Cir. 1975); cf. Ian Ayres, Fair Driving: Gender and Race Discrimination in Retail Car Negotiations, 104 Harv. L. Rev. 817 (1991).

9. Nanda v. Ford Motor Co., 509 F.2d 213, 223 (7th Cir. 1974) (striking car with a ram to see whether the impact would dislodge the fuel pipe).

10. In re Bendectin Litigation, 857 F.2d 290 (6th Cir. 1988), cert. denied, 488 U.S. 1006 (1989).

11. Michael D. Green, Bendectin and Birth Defects: The Challenges of Mass Toxic Substances Litigation 231 (1996). Considering the many variables that influence the incidence of birth defects and the fact that Bendectin, if weakly teratogenic, might produce a relatively small number of cases, the demonstration may have had more impact than it merited. Id. Such limitations on observational studies are discussed in Chapter 3, and other studies of Bendectin are noted in Chapter 4.

12. For a review of the admissibility of the results of pretrial experiments, see 1 McCormick on Evidence, supra note 3, at 202.

13. Even the interpretation of controlled experiments to investigate the relative effectiveness of two analgesics is not always simple, as manufacturers accused of deceptive or false advertising have discovered. E.g., McNeil-P.P.C. v. Bristol-Myers Squibb Co., 755 F. Supp. 1206 (S.D.N.Y. 1990), aff'd, 938 F.2d 1544 (2d Cir. 1991) ("crossover" study purportedly demonstrating therapeutic superiority of Excedrin over Tylenol found to be tainted by "carryover" effect). For a discussion of the benefits and dangers of crossover (also called "within subjects") studies, as opposed to the randomly selected control groups discussed in Chapter 3, see Thomas A. Louis et al., Crossover and Self-Controlled Designs in Clinical Research, in Medical Uses of Statistics (John C. Bailar III & Frederick Mosteller eds., 2d ed. 1992).

14. E.g., Sterling Drug, Inc. v. FTC, 741 F.2d 1146, 1153 (9th Cir. 1984) ("it is the consensus of experts with experience in comparing analgesic efficacy who testified in this proceeding that at this time well-controlled clinical tests are necessary to establish the comparative superiority of one brand of aspirin over others.").

15. The odds ratio (defined in Chapter 4) was larger by 30% to 40%, on average, in trials in which the allocation sequence had been inadequately concealed than in trials in which authors reported adequate allocation concealment. K.F. Schulz et al., Empirical Evidence of Bias: Dimensions of Methodological Quality Associated with Estimates of Treatment Effects in Controlled Trials, 273 J.A.M.A. 408 (1995).

16. For explanations of the designs of clinical experiments, see, e.g., Bailar & Mosteller, supra note 13; Curtis L. Meinert, Clinical Trials: Design, Conduct, and Analysis (1986); L.M. Friedman et al., Fundamentals of Clinical Trials (1985). The story of the acceptance of the need for clinical trials in medicine is told in J. Rosser Matthews, Quantification and the Quest for Medical Certainty (1995). Reasons that participants in medical research sometimes seek to break the code allocating patients to treatment and control groups are discussed in Kenneth F. Schulz, Subverting Randomization in Controlled Trials, 274 J.A.M.A. 1456 (1995).