Quantifying Probative Value
D.H. Kaye^{*}

© 1986 by David H. Kaye. This article appeared in the Boston
University Law Review, vol. 66, May 1986 / July 1986, pp. 761-766.

Attorneys, statisticians, psychologists, political
scientists and philosophers have written at length about efforts to describe the
burden of persuasion in numerical terms.^{1} Less
attention has been paid to quantifying the probative value of courtroom
evidence.^{2} Professor Richard Friedman's paper,
*A Close Look at Probative Value*,^{3} reveals the
richness of this largely unexplored territory.^{4} Friedman
describes three conceivable measures of probative value, and he concludes that only one of these is suitable for legal analysis. It seems worthwhile to note the broad range of plausible alternatives.

*Likelihood Theory*

A litigant proposes to introduce an item of evidence E to help
prove his story S.^{5} Does E have sufficient probative
value to justify its admission? In part, the answer depends on how much probative value E
has.^{6} Likelihood theory suggests many closely related ways
of expressing this quantity.^{7} Suppose that S is the government's
story that John conspired to sell drugs, and that E is incontrovertible evidence that John
was in debt at the time he allegedly joined the conspiracy. Let S^{C} be John's story
that he never agreed to any narcotics deal. Then we may speak of Pr(E|S), the
probability of E on the condition that S occurred, and Pr(E|S^{C}), the probability
of E on the condition that S^{C} occurred. That is, we think of the evidence E
as analogous to the outcome of an experiment and consider the probability of its being
observed under competing hypotheses. Since these probabilities pertain to unique events,
they are personal probabilities.

Likelihood is distinct from probability. Instead of
estimating the probabilities of E given S and S^{C}, we seek a measure of the degree
to which E supports S. Calling this measure the likelihood of S, we write it as L(S;E),
and define it to be proportional to the probability of E given S. In other words,
L(S;E) = aPr(E|S), where
a is an arbitrary, positive
constant.^{8} The likelihood of S^{C} is L(S^{C};E)
= aPr(E|S^{C}). If S is more
likely to be true than S^{C}, then its likelihood is larger
than that for S^{C}.^{9}

One measure of how much more likely S is than S^{C}
is the likelihood ratio LR = L(S;E) / L(S^{C};E) = Pr(E|S) / Pr(E|S^{C}). If
S and S^{C} are equally likely, then LR = 1. The evidence supports each story
to the same degree. If LR > 1, then the evidence is more probative of S than of S^{C}.
Presumably, the plaintiff will introduce such evidence when it is available. If LR
< 1, then E supports S^{C} more than S. Defendant would be expected to produce such
evidence. We may adopt the magnitude of LR as an expression of probative value:

PV = LR    (1a)

This definition has a simple multiplicative property. If E_{1} and E_{2}
denote the introduction of two items of evidence that are conditionally
independent given each story, then PV(E_{1} ∩ E_{2}) = PV(E_{1})PV(E_{2}).

Other measures of PV come to
mind.^{10} The most prevalent defines PV as the
logarithm of the likelihood ratio:

PV = log LR    (1b)

With this definition, PV is additive: PV(E_{1} ∩
E_{2}) = PV(E_{1}) + PV(E_{2}).^{11}
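Both properties can be checked numerically. The sketch below uses invented probabilities of observing each item of evidence under the competing stories, and assumes the two items are conditionally independent given each story (the assumption underlying the product rule):

```python
import math

# Hypothetical, invented probabilities of observing each item of
# evidence under the competing stories S and S^C.
pE1_S, pE1_Sc = 0.6, 0.2   # Pr(E1|S), Pr(E1|S^C)
pE2_S, pE2_Sc = 0.9, 0.3   # Pr(E2|S), Pr(E2|S^C)

LR1 = pE1_S / pE1_Sc       # PV of E1 under definition (1a)
LR2 = pE2_S / pE2_Sc       # PV of E2 under definition (1a)

# If E1 and E2 are conditionally independent given each story,
# Pr(E1 & E2 | S) = Pr(E1|S) * Pr(E2|S), so the joint LR factors:
LR_joint = (pE1_S * pE2_S) / (pE1_Sc * pE2_Sc)
assert math.isclose(LR_joint, LR1 * LR2)   # (1a) is multiplicative

# On the log scale of (1b), the same fact becomes additive:
assert math.isclose(math.log10(LR_joint),
                    math.log10(LR1) + math.log10(LR2))
```

Without conditional independence, Pr(E_{1} ∩ E_{2}|S) need not factor, and neither property holds in general.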

One advantage of (1b) over (1a)
is that it measures PV on the same scale for plaintiffs and defendants. Plaintiffs'
relevant evidence will have log-likelihood ratios (log LRs) from 0+ to ∞.
Defendants' relevant evidence will have log-LRs from 0- to -∞. In contrast,
the likelihood ratio itself allows plaintiffs' relevant evidence to have PVs
from 1+ to ∞, but it compresses the PVs for defendants' relevant evidence
into the interval (0, 1).^{12}

Expressions like (1a) and (1b) do not refer to
the prior odds in favor of a story. They try to capture an idea of "intrinsic"
probative value of evidence. In (1a) or (1b), PV has the same value whether
the evidence is introduced in support of a story that already has been shown to
be highly likely or in support of an initially implausible story. PV is simply a
function of the evidence itself, and the order in which the evidence is
introduced has no effect.^{13}

*Bayesian Inference*

Likelihood plays an important role in classical and in Bayesian statistics. The Bayesian believes that rational partial beliefs conform to the probability calculus. Probability changes after contact with evidence via Bayes's Theorem:

Pr(S|E) / Pr(S^{C}|E) = LR Pr(S) / Pr(S^{C}).

Letting w_{1} stand for the posterior odds Pr(S|E) / Pr(S^{C}|E),
and w_{0} stand for the prior odds Pr(S) / Pr(S^{C}),
we have the more compact formula w_{1} = LR w_{0}.
In words, the posterior odds are given by the product of the
likelihood ratio and the prior odds. As usual, using logarithmic units changes
the multiplicative property to an additive one: log w_{1} = log LR + log w_{0}.
The posterior log-odds are the prior log-odds plus the probative value. Evidence
that is more likely under S than S^{C} raises the log-odds, while evidence
that is more likely under S^{C} lowers the log-odds. Thus, Bayesians
typically are content with (1b).
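The odds form of Bayes's Theorem is easy to verify with a toy calculation; the prior odds and likelihood ratio below are purely illustrative:

```python
import math

prior_odds = 2.0    # w0 = Pr(S)/Pr(S^C), invented for illustration
LR = 5.0            # likelihood ratio of the evidence E, also invented

posterior_odds = LR * prior_odds    # w1 = LR * w0

# Log form: posterior log-odds = prior log-odds + probative value (1b)
assert math.isclose(math.log10(posterior_odds),
                    math.log10(LR) + math.log10(prior_odds))

# Converting odds back to a probability: Pr(S|E) = w1 / (1 + w1)
posterior_prob = posterior_odds / (1 + posterior_odds)
assert math.isclose(posterior_prob, 10 / 11)
```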

Again, there are variations. One might think that a good
explanation of evidence E is a story S such that Pr(E|S) is much greater than Pr(E),
but Pr(S) is not too small.^{14} This suggests that
we might quantify probative value as^{15}

PV = log Pr(E|S) - log Pr(E) + (1/2) log Pr(S)    (2)

The difference between the first two terms measures how much S increases the probability of E. If the prior probability of S is very small, then the third term will be a large negative number, which will decrease PV.
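A small numerical sketch, with invented probabilities, shows how the third term of (2) penalizes an initially implausible story:

```python
import math

def pv_eq2(pE_given_S, pE, pS):
    """Equation (2): PV = log Pr(E|S) - log Pr(E) + (1/2) log Pr(S)."""
    return math.log10(pE_given_S) - math.log10(pE) + 0.5 * math.log10(pS)

# Invented numbers: the story explains the evidence equally well in
# both cases, but one prior is plausible and the other far-fetched.
pE_given_S, pE_given_Sc = 0.8, 0.1

def total_pE(pS):
    # Law of total probability: Pr(E) = Pr(E|S)Pr(S) + Pr(E|S^C)Pr(S^C)
    return pE_given_S * pS + pE_given_Sc * (1 - pS)

pv_plausible  = pv_eq2(pE_given_S, total_pE(0.4),   0.4)    # Pr(S) = 0.4
pv_farfetched = pv_eq2(pE_given_S, total_pE(0.001), 0.001)  # Pr(S) = 0.001

# The (1/2) log Pr(S) term drags down the far-fetched story's PV.
assert pv_farfetched < pv_plausible
```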

For the Bayesian, then, probative value may be defined, as it was in likelihood theory, as a simple function of evidence E. In this way, the Bayesian separates the process of probability revision into two components -- the "intrinsic" probative value of the evidence, and the prior odds of the story (that come from other background information). Alternatively, and unlike the pure likelihood theorist, the Bayesian can adopt a measure of probative value, such as (2), that refers not merely to E but also to the starting point -- the prior probability or odds. Friedman pursues the latter course.

*Choosing the Right Expression*

Having indicated the impressive variety of possible measures of probative value, let us turn to the specific measure that Friedman adopts. He defines probative value as

PV = Pr(S|E) - Pr(S)    (3a)

This definition, he insists, follows from two criteria -- simplicity and
fidelity to rule 401.^{16}

I am not so sure. For brevity, let p_{0} represent
the prior probability Pr(S), and let p_{1} represent the posterior probability
Pr(S|E). Then we can rewrite (3a) as

PV = p_{1} - p_{0} =
w_{1} / (1 + w_{1}) - w_{0} / (1 + w_{0})

Since w_{1} = LR w_{0}, we conclude that

PV = w_{0} [LR / (1 + LR w_{0}) - 1 / (1 + w_{0})]    (3b)

Equation (3b) restates Friedman's choice of PV in terms of its Bayesian
components -- the prior odds w_{0} and the likelihood ratio LR. Seen in this
conceptually explicit form, one may wonder whether (3a) is quite as simple as
Friedman maintains.
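That (3b) is just (3a) rewritten in terms of w_{0} and LR can be confirmed by spot-checking the algebra numerically; the priors and likelihood ratios below are arbitrary:

```python
import math

def pv_3a(p0, LR):
    """(3a): PV = Pr(S|E) - Pr(S), computed via odds and Bayes's Theorem."""
    w0 = p0 / (1 - p0)          # prior odds
    w1 = LR * w0                # posterior odds
    p1 = w1 / (1 + w1)          # posterior probability
    return p1 - p0

def pv_3b(p0, LR):
    """(3b): PV = w0 [ LR/(1 + LR*w0) - 1/(1 + w0) ]."""
    w0 = p0 / (1 - p0)
    return w0 * (LR / (1 + LR * w0) - 1 / (1 + w0))

# Arbitrary spot checks: the two forms agree for any prior and LR.
for p0 in (0.1, 0.5, 0.9):
    for LR in (0.2, 1.0, 5.0):
        assert math.isclose(pv_3a(p0, LR), pv_3b(p0, LR))
```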

Likewise, the phrasing of Rule 401 hardly seems dispositive. Federal Rule of Evidence 401 states when E is relevant -- not how relevant it may be. Equations (1a) and (1b) supply equally suitable mathematical representations of this definition of relevance. For instance, evidence is relevant as that term is used in Rule 401 if its log-likelihood ratio is not zero.

In addition to the inability of these two criteria
to single out (3a), the formula itself seems problematic. One of Friedman's
arguments against other measures of PV is that they are insensitive to the prior
probability.^{17} He suggests that the probative
value of evidence ought to be small when the probability of the story without
the evidence is close to one. The implicit assumption is that the measure of PV
should justify the exclusion of evidence in these circumstances. This argument,
however, is a double-edged sword. Suppose that E_{1} has a log LR of -3. From the
likelihood perspective, E_{1} is moderately powerful evidence for the defendant. It
supports S^{C} a thousand times more^{18} than
it supports S. If the defendant has two other items of evidence E_{2} and E_{3},
each with the same log LR, this body of evidence could take the odds from a
seemingly overwhelming billion for the plaintiff down to the indecisive
posterior odds of one. Our measure of PV ought to recognize that such evidence
is admissible absent strong counterweights. Yet, under (3a), it is not clear
that this highly probative evidence would be admitted. As soon as the defendant
offers E_{1}, the plaintiff will point out that PV is barely distinguishable from
zero.^{19}
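The arithmetic of this example is easy to reproduce (base 10 logarithms throughout, as in note 18):

```python
log_w0 = 9          # prior log-odds: a "billion to one" for the plaintiff
log_LR = -3         # each item of defense evidence has log LR = -3

# Probative value of E1 alone under Friedman's (3a) = p1 - p0:
w0 = 10.0 ** log_w0
w1 = 10.0 ** (log_w0 + log_LR)   # posterior odds after E1 alone
p0 = w0 / (1 + w0)
p1 = w1 / (1 + w1)
print(p1 - p0)      # roughly -1e-6: "barely distinguishable from zero"

# Yet three such items together leave indecisive posterior odds of one:
assert log_w0 + 3 * log_LR == 0  # posterior odds = 10**0 = 1
```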

The search for an interpretation of PV should be guided by the uses to which such an expression will be put. Without a quantitative measure of prejudicial effect and the other counterweights of Rule 403, I do not see how a mathematical expression of PV could find direct forensic application. The only purpose that I can see is heuristic. A suitable formulation for PV may clarify our thinking about what it means to say that evidence is very probative, slightly probative, and so on. These rough quantifications are useful in performing the balancing required under Rule 403.

In making Rule 403 arguments, however, there is more
than one way to proceed. On the one hand, we can seek a scalar measure of
probative value, such as Friedman's. This will entail blending prior odds with
the likelihood ratio in an expression such as (2) or (3b). On the other hand, we
can continue to think of LR (and its ilk) as a measure of "intrinsic"
probative value, and w_{0} as a statement of where we stand before considering
the proffered evidence. Thus, we could argue that E has great probative value in
the abstract (large, positive log LR), but little probative value in context,
because the preceding evidence and background information already make S
extremely probable (large, positive log w_{0}). It is not clear that having a
single number to measure this contextual probative value is more analytically
helpful than having a pair of numbers.^{20} Nor
is it clear that a scalar PV fits more neatly into the phrasing of Rule 403.

For these reasons, the exclusive pursuit of a unique
representation for PV may be a mistake. There may be more than one right answer
to the question of measuring the probative value of evidence. If so, then in
analyzing rules of evidence with mathematical machinery that includes a
quantitative representation of PV, we should seek robust results -- results that
do not depend on which member of the family of plausible expressions for PV we
build into the analysis.^{21}

**NOTES**

* Regents' Professor and Fellow, Center for the Study of Law, Science, and Technology, Arizona State University. Laurence Winer made helpful comments on a draft of this paper.

1. *E.g.*, Brook, *The Use of Statistical Evidence of
Identification in Civil Litigation: Well-Worn Hypotheticals, Real Cases, and Controversy*,
29 St. Louis U. L.J. 293 (1985); Milanich, *Decision Theory and Standards of Proof*,
5 Law & Hum. Behav. 87 (1981).

2. The relative paucity of legal writing on this topic is in contrast
to the ample literature on the role of evidence in scientific theories, much of it
involving functions that describe the weight of such evidence. *E.g.*, P. Horwich,
Probability and Evidence (1982); The Concept of Evidence (P. Achinstein ed.
1983). In the legal realm, the most influential work is Professor Richard Lempert's
penetrating exposition of the likelihood ratio as a measure of probative value. *See *
Lempert, *Modeling Relevance*, 75 Mich. L. Rev. 1021 (1977). The most thorough and
sophisticated treatment of the likelihood ratio in this context is in the work of Professors
David Schum and Ann Martin. *See *Schum & Martin, *Formal and Empirical Research
on Cascaded Inference in Jurisprudence*, 17 Law & Soc'y Rev. 105 (1982).

3. Friedman, *A Close Look at Probative Value*, 66 B.U. L.
Rev. 733 (1986).

4. One may ask whether the effort to quantify is a theoretical
divertissement with no practical value. To be sure, the concern is theoretical, but
as Alfred North Whitehead once remarked, "[i]t is no paradox to say that in our
most theoretical moods we may be nearest to our most practical
applications," *quoted in* I. Good, Probability and the Weighing of
Evidence 31 (1950). Explicating a quantitative measure in the context of a clear
theory of probative value can be of assistance in thinking about evidence even
if the quantification is not an explicit part of courtroom practice. *See*
Lempert, *supra* note 2. The effort is also essential to devising an
intelligent rule governing the forensic presentation of statistical analyses. *See*
Kaye, *Is Proof of Statistical Significance Relevant?* 61 Wash. L. Rev.
1333 (1986); Kaye, *Hypothesis Testing in the Courtroom*, *in* Contributions
to the Theory and Applications of Statistics 331 (A. Gelfand ed., 1987).

5. In Kaye, *Do We Need a Calculus of Weight to Understand
*Proof Beyond a Reasonable Doubt?*, 66 B.U. L. Rev. 675, 679-81 (1986), I
distinguish between the jth item of evidence E_{j} and the event *E*_{j} of
its introduction in court. In this comment, I shall simply write E, leaving it
to the context to indicate the applicable meaning.

6. McCormick on Evidence § 185 (E. Cleary 3d ed. 1984).

7. For a previous description of likelihood theory in a legal context, see Kaye, Book Review, 80 Mich. L. Rev. 833 (1982). A leading philosophically oriented treatment is A. Edwards, Likelihood (1972).

8. Notice that unlike probabilities, likelihoods need not lie between zero and one.

9. Notice the underlying assumption that S is more likely
to be true if the evidence E is more likely to arise under S than under
S^{C}. That L(S;E) > L(S^{C};E) does not imply that S is true. The
evidence E can appear even if S is false. It is simply less likely to appear
under S^{C} than under S.

10. Kemeny and Oppenheim, *Degree of Factual Support*,
19 Phil. Sci. 307 (1952), argue for PV = (Pr(E|S) - Pr(E|S^{C})) / (Pr(E|S) + Pr(E|S^{C})).
This is the hyperbolic tangent of half the ln(LR). *See* I. Good, Good Thinking 160
(1982).
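Dividing the numerator and denominator of this measure by Pr(E|S^{C}) makes the relationship to ln(LR) explicit:

```latex
\frac{\Pr(E\mid S)-\Pr(E\mid S^{C})}{\Pr(E\mid S)+\Pr(E\mid S^{C})}
  = \frac{LR-1}{LR+1}
  = \left.\frac{e^{u}-e^{-u}}{e^{u}+e^{-u}}\right|_{u=\frac{1}{2}\ln LR}
  = \tanh\!\left(\tfrac{1}{2}\ln LR\right)
```

(With u = ½ ln LR, e^{u} = LR^{1/2}, so the middle fraction reduces to (LR - 1)/(LR + 1).)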

11. This interpretation of PV also may be motivated from
the perspective of information theory. *See id*. at 220-22; V. Barnett,
Comparative Statistical Inference 200-301 (2d ed. 1982). Good calls log LR the
weight of evidence, and has referred to it in at least 33 publications. I. Good,
*supra* note 10, at 159.

12. The problem here is psychological, since there are as many real numbers between zero and one as there are real numbers above one. To make LR more symmetric, however, one could redefine LR by inverting it when considering defendants' evidence.

13. This explication of PV does not itself capture the
intuitive notion of surprising evidence. For a Bayesian account of surprise, see
P. Horwich, *supra* note 2, at 100-04.

14. *Cf. *Diaconis, *Theories of Data Analysis: From
Magical Thinking Through Classical Statistics*, *in* Exploring Data Tables,
Trends and Shapes 1, 27 (D. Hoaglin, F. Mosteller & J. Tukey eds. 1985).

15. Good, *The Philosophy of Exploratory Data Analysis*,
50 Phil. Sci. 283 (1983). Good gives an argument for the coefficient of 1/2 on
the last term.

16. Friedman, *supra* note 3, at 733, 738.

17. *E.g., id.* at 733, 741-45.

18. In this example, I am using a base 10 logarithm.

19. Prior log-odds of 9 correspond to a p_{0} of nearly one.
E_{1} adds log LR = -3 to these odds, resulting in log w_{1} = 6. The
corresponding p_{1} is still very close to one. Hence,
PV = p_{1} - p_{0} is
approximately zero. To escape this embarrassment, one might argue that E_{1}
should be admitted subject to "connecting up" with E_{2} and E_{3}.
This, however, is not an accepted application of the "connecting up"
doctrine. Rather, such admission usually involves the promise to prove an
antecedent fact that is needed to show that the evidence is material. *See*
McCormick on Evidence *supra* note 6, § 58.

20. *Id.* § 185, at 546-47 n.35.