Quantifying Probative Value
© 1986 by David H. Kaye.
This article appeared in the Boston University Law Review, vol. 66, May/July 1986, pp. 761-766.
Attorneys, statisticians, psychologists, political scientists and philosophers have written at length about efforts to describe the burden of persuasion in numerical terms.1 Less attention has been paid to quantifying the probative value of courtroom evidence.2 Professor Richard Friedman's paper, A Close Look at Probative Value,3 reveals the richness of this heretofore unexplored territory.4 Friedman describes three conceivable measures [--762--]
A litigant proposes to introduce an item of evidence E to help prove his story S.5 Does E have sufficient probative value to justify its admission? In part, the answer depends on how much probative value E has.6 Likelihood theory suggests many closely related ways of expressing this quantity.7 Suppose that S is the government's story that John conspired to sell drugs, and that E is incontrovertible evidence that John was in debt at the time he allegedly joined the conspiracy. Let SC be John's story that he never agreed to any narcotics deal. Then we may speak of Pr(E|S), the probability of E on the condition that S occurred, and Pr(E|SC), the probability of E on the condition that SC occurred. That is, we think of the evidence E as analogous to the outcome of an experiment and consider the probability of its being observed under competing hypotheses. Since these probabilities pertain to unique events, they are personal probabilities.
Likelihood is distinct from probability. Instead of estimating the probabilities of E given S and SC, we seek a measure of the degree to which E supports S. Calling this measure the likelihood of S, we write it as L(S;E), and define it to be proportional to the probability of E given S. In other words, L(S;E) = aPr(E|S), where a is an arbitrary, positive constant.8 The likelihood of SC is L(SC;E) = aPr(E|SC). If S is more likely to be true than SC, then its likelihood is larger than that for SC.9
One measure of how much more likely S is than SC is the likelihood ratio LR = L(S;E) / L(SC;E) = Pr(E|S) / Pr(E|SC). If S and SC are equally likely, then LR = 1. The evidence supports each story to the same degree. If LR > 1, then the evidence is more probative of S than of SC. Presumably, the plaintiff will introduce such evidence when it is available. If LR < 1, then E supports SC more than S. Defendant would be expected to produce such [--763--]
PV = LR        (1a)
This definition has a simple multiplicative property. If E1 and E2 denote the introduction of two items of evidence that are conditionally independent given each story, then PV(E1 ∩ E2) = PV(E1)PV(E2).
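To see the multiplicative property at work, here is a minimal Python sketch. All probabilities are hypothetical, and the multiplication step assumes the two items are conditionally independent given each story:

```python
import math

def likelihood_ratio(p_e_given_s, p_e_given_sc):
    """LR = Pr(E|S) / Pr(E|SC): how much more probable the evidence
    is under story S than under the rival story SC."""
    return p_e_given_s / p_e_given_sc

# Hypothetical personal probabilities for two items of evidence.
lr1 = likelihood_ratio(0.5, 0.25)    # E1: LR = 2.0
lr2 = likelihood_ratio(0.75, 0.25)   # E2: LR = 3.0

# Under definition (1a), and assuming E1 and E2 are conditionally
# independent given each story, PV(E1 and E2) = PV(E1) * PV(E2).
combined_pv = lr1 * lr2
print(combined_pv)   # 6.0
```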
Other measures of PV come to mind.10 The most prevalent defines PV as the logarithm of the likelihood ratio:
PV = log LR        (1b)
With this definition, PV is additive: PV(E1 ∩ E2) = PV(E1) + PV(E2).11
One advantage of (1b) over (1a) is that it measures PV on the same scale for plaintiffs and defendants. Plaintiffs' relevant evidence will have log-likelihood ratios (log LRs) from 0+ to +∞. Defendants' relevant evidence will have log LRs from 0- to -∞. In contrast, the likelihood ratio itself allows plaintiffs' relevant evidence to have PVs from 1+ to +∞, but it compresses the PVs for defendants' relevant evidence into the interval (0, 1).12
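A short numeric sketch, using hypothetical likelihood ratios and base-10 logarithms, illustrates the symmetry point: equally forceful plaintiff's and defendant's items sit at mirror-image positions on the log scale, but not on the raw LR scale.

```python
import math

# Hypothetical: a plaintiff's item with LR = 1000, and a defendant's
# item of equal force the other way, LR = 1/1000.
lr_plaintiff = 1000.0
lr_defendant = 1.0 / 1000.0

# On the raw LR scale the two values look wildly asymmetric:
# 1000 versus 0.001, the latter squeezed into (0, 1).
# On the log scale (base 10) they are mirror images about zero.
print(math.log10(lr_plaintiff))   # 3.0
print(math.log10(lr_defendant))   # -3.0
```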
Expressions like (1a) and (1b) do not refer to the prior odds in favor of a story. They try to capture an idea of "intrinsic" probative value of evidence. In (1a) or (1b), PV has the same value whether the evidence is introduced in support of a story that already has been shown to be highly likely or in support of an initially implausible story. PV is simply a function of the evidence itself, and the order in which the evidence is introduced has no effect.13
Likelihood plays an important role in classical and in Bayesian statistics. The Bayesian believes that rational partial beliefs conform to the probability [--764--]
Pr(S|E) / Pr(SC|E) = LR Pr(S) / Pr(SC).
Letting w1 stand for the posterior odds Pr(S|E) / Pr(SC|E), and w0 stand for the prior odds Pr(S) / Pr(SC), we have the more compact formula w1 = LR w0. In words, the posterior odds are given by the product of the likelihood ratio and the prior odds. As usual, using logarithmic units changes the multiplicative property to an additive one: log w1 = log LR + log w0. The posterior log-odds are the prior log-odds plus the probative value. Evidence that is more likely under S than SC raises the log-odds, while evidence that is more likely under SC lowers the log-odds. Thus, Bayesians typically are content with (1b).
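The odds form of Bayes' rule described above can be sketched in a few lines of Python (all numbers hypothetical):

```python
import math

def update_odds(prior_odds, lr):
    """Bayes' rule in odds form: posterior odds = LR * prior odds."""
    return lr * prior_odds

prior_odds = 2.0    # hypothetical prior odds Pr(S)/Pr(SC)
lr = 5.0            # likelihood ratio of the new item of evidence
posterior_odds = update_odds(prior_odds, lr)
print(posterior_odds)   # 10.0

# In logarithmic units the update is additive:
# log w1 = log LR + log w0.
assert math.isclose(math.log10(posterior_odds),
                    math.log10(lr) + math.log10(prior_odds))
```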
Again, there are variations. One might think that a good explanation of evidence E is a story S such that Pr(E|S) is much greater than Pr(E), but Pr(S) is not too small.14 This suggests that we might quantify probative value as15
PV = log Pr(E|S) - log Pr(E) + (1/2) log Pr(S)        (2)
The difference between the first two terms measures how much S increases the probability of E. If the prior probability of S is very small, then the third term will be a large negative number, which will decrease PV.
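A numerical sketch of equation (2), with invented probabilities, shows the third term at work: a story that greatly raises the probability of E can still come out with negative PV if its prior probability is tiny.

```python
import math

def pv_equation_2(p_e_given_s, p_e, p_s):
    """PV = log Pr(E|S) - log Pr(E) + (1/2) log Pr(S), per equation (2).
    Base-10 logarithms, matching the article's later examples."""
    return (math.log10(p_e_given_s) - math.log10(p_e)
            + 0.5 * math.log10(p_s))

# Hypothetical numbers: the story raises the probability of E from
# 0.01 to 0.9, but the story's own prior probability varies.
print(pv_equation_2(0.9, 0.01, 0.1))    # positive: a plausible story
print(pv_equation_2(0.9, 0.01, 1e-6))   # negative: a far-fetched story
```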
For the Bayesian, then, probative value may be defined, as it was in likelihood theory, as a simple function of evidence E. In this way, the Bayesian separates the process of probability revision into two components -- the "intrinsic" probative value of the evidence, and the prior odds of the story (that come from other background information). Alternatively, and unlike the pure likelihood theorist, the Bayesian can adopt a measure of probative value, such as (2), that refers not merely to E but also to the starting point -- the prior probability or odds. Friedman pursues the latter course.
Choosing the Right Expression
Having indicated the impressive variety of possible measures of probative value, let us turn to the specific measure that Friedman adopts. He defines probative value as [--765--]
PV = Pr(S|E) - Pr(S)        (3a)
This definition, he insists, follows from two criteria -- simplicity and fidelity to rule 401.16
I am not so sure. For brevity, let p0 represent the prior probability Pr(S), and let p1 represent the posterior probability Pr(S|E). Then we can rewrite (3a) as
PV = p1 - p0 = w1 / (1 + w1) - w0 / (1 + w0)
Since w1 = LR w0, we conclude that
PV = w0 [LR / (1 + LR w0) - 1 / (1 + w0)]        (3b)
Equation (3b) restates Friedman's choice of PV in terms of its Bayesian components -- the prior odds w0 and the likelihood ratio LR. Seen in this conceptually explicit form, one may wonder whether (3a) is quite as simple as Friedman maintains.
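One can check the algebra numerically. The following sketch confirms that Friedman's definition (3a) and its Bayesian restatement (3b) coincide for a range of hypothetical priors and likelihood ratios:

```python
import math

def pv_3a(p0, lr):
    """Equation (3a): PV = p1 - p0, with p1 from odds updating."""
    w0 = p0 / (1.0 - p0)       # prior odds
    w1 = lr * w0               # posterior odds
    p1 = w1 / (1.0 + w1)       # posterior probability
    return p1 - p0

def pv_3b(p0, lr):
    """Equation (3b): PV = w0 * [LR/(1 + LR*w0) - 1/(1 + w0)]."""
    w0 = p0 / (1.0 - p0)
    return w0 * (lr / (1.0 + lr * w0) - 1.0 / (1.0 + w0))

# The two forms agree for any (hypothetical) prior and likelihood ratio.
for p0 in (0.1, 0.5, 0.9):
    for lr in (0.01, 1.0, 100.0):
        assert math.isclose(pv_3a(p0, lr), pv_3b(p0, lr), abs_tol=1e-12)
```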
Likewise the phrasing of Rule 401 hardly seems dispositive. Federal Rule of Evidence 401 states when E is relevant -- not how relevant it may be. Equations (1a) and (1b) supply equally suitable mathematical representations of this definition of relevance. For instance, evidence is relevant as that term is used in Rule 401 if its log-likelihood ratio is not zero.
In addition to the inability of these two criteria to single out (3a), the formula itself seems problematic. One of Friedman's arguments against other measures of PV is that they are insensitive to the prior probability.17 He suggests that the probative value of evidence ought to be small when the probability of the story without the evidence is close to one. The implicit assumption is that the measure of PV should justify the exclusion of evidence in these circumstances. This argument, however, is a double-edged sword. Suppose that E1 has a log LR of -3. From the likelihood perspective, E1 is moderately powerful evidence for the defendant. It supports SC a thousand times more18 than it supports S. If the defendant has two other items of evidence E2 and E3, each with the same log LR, this body of evidence could take the odds from a seemingly overwhelming billion for the plaintiff down to the indecisive posterior odds of one. Our measure of PV ought to recognize that such evidence is admissible absent strong counterweights. Yet, under (3a), it is not clear that this highly probative evidence would be admitted. As soon as the defendant offers E1, the plaintiff will point out that PV is barely distinguishable from zero.19 [--766--]
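The arithmetic of this example (spelled out in note 19) can be reproduced in a short sketch: under (3a), the first defense item registers a PV barely distinguishable from zero, even though three such items together reduce the odds from a billion to roughly even.

```python
import math

def pv_3a(p0, lr):
    """Friedman-style PV = Pr(S|E) - Pr(S), computed via odds."""
    w0 = p0 / (1.0 - p0)
    w1 = lr * w0
    return w1 / (1.0 + w1) - p0

# Prior odds of a billion to one for the plaintiff (log10 odds = 9).
w0 = 10.0 ** 9
p0 = w0 / (1.0 + w0)

lr = 10.0 ** -3      # one defense item: log10 LR = -3

# Under (3a), E1 alone has probative value barely distinguishable
# from zero (on the order of -1e-6) ...
print(pv_3a(p0, lr))

# ... yet three such items together take the odds from a billion
# down to roughly even (posterior odds near 1).
print(w0 * lr ** 3)
```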
The search for an interpretation of PV should be guided by the uses to which such an expression will be put. Without a quantitative measure of prejudicial effect and the other counterweights of Rule 403, I do not see how a mathematical expression of PV could find direct forensic application. The only purpose that I can see is heuristic. A suitable formulation for PV may clarify our thinking about what it means to say that evidence is very probative, slightly probative, and so on. These rough quantifications are useful in performing the balancing required under Rule 403.
In making Rule 403 arguments, however, there is more than one way to proceed. On the one hand, we can seek a scalar measure of probative value, such as Friedman's. This will entail blending prior odds with the likelihood ratio in an expression such as (2) or (3b). On the other hand, we can continue to think of LR (and its ilk) as a measure of "intrinsic" probative value, and w0 as a statement of where we stand before considering the proffered evidence. Thus, we could argue that E has great probative value in the abstract (large, positive log LR), but little probative value in context, because the preceding evidence and background information already makes S extremely probable (large, positive log w0). It is not clear that having a single number to measure this contextual probative value is more analytically helpful than having a pair of numbers.20 Nor is it clear that a scalar PV fits more neatly into the phrasing of Rule 403.
For these reasons, the exclusive pursuit of a unique representation for PV may be a mistake. There may be more than one right answer to the question of measuring the probative value of evidence. If so, then in analyzing rules of evidence with mathematical machinery that includes a quantitative representation of PV, we should seek robust results -- results that do not depend on which member of the family of plausible expressions for PV we build into the analysis.21
* Regents' Professor and Fellow, Center for the Study of Law, Science, and Technology, Arizona State University. Laurence Winer made helpful comments on a draft of this paper.
1. E.g., Brook, The Use of Statistical Evidence of Identification in Civil Litigation: Well-Worn Hypotheticals, Real Cases, and Controversy, 29 St. Louis U. L.J. 293 (1985); Milanich, Decision Theory and Standards of Proof, 5 Law & Hum. Behav. 87 (1981).
2. The relative paucity of legal writing on this topic is in contrast to the ample literature on the role of evidence in scientific theories, much of it involving functions that describe the weight of such evidence. E.g., P. Horwich, Probability and Evidence (1982); The Concept of Evidence (P. Achinstein ed. 1983). In the legal realm, the most influential work is Professor Richard Lempert's penetrating exposition of the likelihood ratio as a measure of probative value. See Lempert, Modeling Relevance, 75 Mich. L. Rev. 1021 (1977). The most thorough and sophisticated treatment of the likelihood ratio in this context is in the work of Professors David Schum and Ann Martin. See Schum & Martin, Formal and Empirical Research on Cascaded Inference in Jurisprudence, 17 Law & Soc'y Rev. 105 (1982).
3. Friedman, A Close Look at Probative Value, 66 B.U. L. Rev. 733 (1986).
4. One may ask whether the effort to quantify is a theoretical divertissement with no practical value. To be sure, the concern is theoretical, but as Alfred North Whitehead once remarked, "[i]t is no paradox to say that in our most theoretical moods we may be nearest to our most practical applications," quoted in I. Good, Probability and the Weighing of Evidence 31 (1950). Explicating a quantitative measure in the context of a clear theory of probative value can be of assistance in thinking about evidence even if the quantification is not an explicit part of courtroom practice. See Lempert, supra note 2. The effort is also essential to devising an intelligent rule governing the forensic presentation of statistical analyses. See Kaye, Is Proof of Statistical Significance Relevant? 61 Wash. L. Rev. 1333 (1986); Kaye, Hypothesis Testing in the Courtroom, in Contributions to the Theory and Applications of Statistics 331 (A. Gelfand ed., 1987).
5. In Kaye, Do We Need a Calculus of Weight to Understand Proof Beyond a Reasonable Doubt?, 66 B.U. L. Rev. 675, 679-61 (1986), I distinguish between the jth item of evidence Ej and the event Ej of its introduction in court. In this comment, I shall simply write E, leaving it to the context to indicate the applicable meaning.
6. McCormick on Evidence § 185 (E. Cleary 3d ed. 1984).
7. For a previous description of likelihood theory in a legal context, see Kaye, Book Review, 80 Mich. L. Rev. 833 (1982). A leading philosophically oriented treatment is A. Edwards, Likelihood (1972).
8. Notice that unlike probabilities, likelihoods need not lie between zero and one.
9. Notice the underlying assumption that S is more likely to be true if the evidence E is more likely to arise under S than under SC. That L(S;E) > L(SC;E) does not imply that S is true. The evidence E can appear even if S is false. It is simply less likely to appear under SC than under S.
10. Kemeny and Oppenheim, Degree of Factual Support, 19 Phil. Sci. 307 (1952), argue for PV = (Pr(E|S) - Pr(E|-S)) / (Pr(E|S) + Pr(E|-S)). This is the hyperbolic tangent of half the ln(LR). See I. Good, Good Thinking 160 (1982).
11. This interpretation of PV also may be motivated from the perspective of information theory. See id. at 220-22; V. Barnett, Comparative Statistical Inference 200-301 (2d ed. 1982). Good calls log LR the weight of evidence, and has referred to it in at least 33 publications. I. Good, supra note 10 at 159.
12. The problem here is psychological, since there are as many real numbers between zero and one as there are real numbers above one. To make LR more symmetric, however, one could redefine LR by inverting it when considering defendants' evidence.
13. This explication of PV does not itself capture the intuitive notion of surprising evidence. For a Bayesian account of surprise, see P. Horwich, supra note 2, at 100-04.
14. Cf. Diaconis, Theories of Data Analysis: From Magical Thinking Through Classical Statistics, in Exploring Data Tables, Trends and Shapes 1, 27 (D. Hoaglin, F. Mosteller & J. Tukey eds. 1985).
15. Good, The Philosophy of Exploratory Data Analysis, 50 Phil. Sci. 283 (1983). Good gives an argument for the coefficient of 1/2 on the last term.
16. Friedman, supra note 3, at 733, 738.
17. E.g., id. at 733, 741-45.
18. In this example, I am using a base 10 logarithm.
19. Prior log-odds of 9 correspond to a p0 of nearly one. E1 adds log LR = -3 to these log-odds, resulting in log w1 = 6. The corresponding p1 is still very close to one. Hence, PV = p1 - p0 is approximately zero. To escape this embarrassment, one might argue that E1 should be admitted subject to "connecting up" with E2 and E3. This, however, is not an accepted application of the "connecting up" doctrine. Rather, such admission usually involves the promise to prove an antecedent fact that is needed to show that the evidence is material. See McCormick on Evidence, supra note 6, § 58.