Allen in Wonderland:
Words in and Out of Context
©1999 D.H. Kaye
This paper supplements my comment Bayes, Burdens, and Base Rates, 4 Int'l J. Evid. & Proof 260 (2000) [hereinafter Kaye III], by replying to accusations made by Professor Ronald Allen in his comment, Clarifying the Burden of Persuasion and Bayesian Decision Rules: A Response to Professor Kaye, 4 Int'l J. Evid. & Proof 246 (2000) [hereinafter Allen III]. These accusations are not of sufficient intellectual importance to justify including them in my published response, but they are serious in that they amount to charges of academic misconduct. This paper therefore examines each count of Professor Allen's complaint.
Count I. Quoting out of context to create a false impression (in Kaye I)
According to Professor Allen, in Statistical Decision Theory and the Burdens of Persuasion: Completeness, Generality, and Utility, 1 Int'l J. Evid. & Proof 313 (1997) [hereinafter Kaye I], I quoted him out of context to distort the plain and obvious meaning of his writing:
In detailing my sins, Prof. Kaye interestingly neglects the very next sentence in my original paper. Following the grandiose claims, I said: "If no deserving plaintiffs go to trial, the preponderance standard can only result in errors against defendants, a point, generalized, that makes the fashioning of algorithmic proofs about burdens of persuasion difficult." I thought it plain that this passage makes a point about the actual operation of the legal system rather than a point about mathematics, as it posits a counterfactual, and counterfactuals usually are employed to explicate some feature of the real world. By leaving this sentence out, Kaye converts what in context is obviously a true statement about reality into an erroneous statement about mathematics.
Allen III, at 248 (footnotes omitted).
To answer the charge of quoting out of context requires a recapitulation of what Professor Allen wrote and how I interpreted it. The exercise reveals that the omitted sentences did not change or clarify the meaning of the quoted material in any way.
In Rationality, Algorithms, and Juridical Proof: A Preliminary Inquiry, 1 Int'l J. Evid. & Proof 254 (1997) [hereinafter Allen I], Professor Allen wrote the following:
Evidence has also experienced the demise of legal theorems. The best example is the various proofs that employing the civil burden of persuasion of a preponderance of the evidence will minimize or optimize errors. These are all false as general proofs (although not as special cases), and all for the same reasons. They neglected base rates and the accuracy of probability assessments of liability, and virtually any relationship at all can exist between the subjective assessments of liability and the truth of factual assertions at trial. If for example, no deserving plaintiffs go to trial, the preponderance standard can only result in errors against defendants, a point, generalized, that makes the fashioning of algorithmic proofs about burdens of persuasion difficult.
Id. at 254-55. This is the full discussion. Nothing has been omitted. It struck me (and still strikes me) as reflecting a very confused understanding of the "legal theorems." On the best translation I can give it, this passage asserts three things:
In Kaye I, I pointed out that contrary to the first assertion in this passage, "there are no such proofs" that "the optimal decision rule minimizes the actual number of errors." Id. at 314. Professor Allen complains that I distorted his meaning by not quoting the "counterfactual" sentence about the third point -- even though that point is offered merely as an "example" of the more general statement that I quoted in full. Yet, I clearly recognized that "this passage makes a point about the actual operation of the legal system," as Professor Allen now emphasizes. I wrote: "Allen, it seems, would prefer a rule that minimizes the actual frequency of errors (or that produces a particular mix of errors of one kind as opposed to another)." Id. My parenthetical about the mix of errors was responsive to the portion of the passage -- "If for example, no deserving plaintiffs go to trial, the preponderance standard can only result in errors against defendants . . . " -- that I stand accused of ignoring.
Furthermore, nothing in this awkward sentence about some generalization of "no deserving plaintiffs" and "algorithmic proofs" alters the apparent meaning of the prior sentences. Professor Allen asks us to believe that he was not writing about the content of the "mathematics," but only about "some feature of the real world." The reality is that he was writing about both. He first referred to mathematical proofs "that employing the civil burden of persuasion of a preponderance of the evidence will minimize or optimize errors." Allen I at 254. Then he stated that these proofs are "generally false" because they neglected base rates and objective probabilities. I took him to task for suggesting that the proofs proved (or were intended to prove) something about the total number or the mix of actual errors. And, I insisted that "[t]he proofs remain true for all possible base rates" because "the optimal decision rule minimizes the expectation of some function of the losses . . . not the actual number of errors." Kaye I at 314.
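The distinction between minimizing expected loss and minimizing actual errors can be written out in a few lines. What follows is a standard textbook sketch, not a reproduction of the proofs in Kaye I or Kaye II. The symbols are introduced here only for illustration: p_i is the factfinder's probability that the plaintiff's claim is true in case i, L_P is the loss from an erroneous verdict for the plaintiff, and L_D is the loss from an erroneous verdict for the defendant.

```latex
\begin{align*}
\mathbb{E}[\text{loss} \mid \text{verdict for plaintiff}] &= (1 - p_i)\,L_P \\
\mathbb{E}[\text{loss} \mid \text{verdict for defendant}] &= p_i\,L_D \\
\text{verdict for plaintiff} \iff (1 - p_i)\,L_P < p_i\,L_D
  &\iff p_i > \frac{L_P}{L_P + L_D} \\
L_P = L_D = 1 \;\Rightarrow\; \text{verdict for plaintiff}
  &\iff p_i > \tfrac{1}{2} \\
\min \, \mathbb{E}[\text{total errors}] &= \sum_{i=1}^{n} \min(p_i,\, 1 - p_i)
\end{align*}
```

The base rate P appears nowhere in these expressions. The rule minimizes each term, and therefore the sum, whatever P may be. Whether it also minimizes the actual number of errors depends on how well the p_i track the truth, which is a separate, empirical question.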
It is unfortunate that Professor Allen has taken so long to explain that these remarks rested on a misunderstanding of what he meant to say. His first response, Reasoning and Its Foundation: Some Responses, 1 Int'l J. Evid. & Proof 343 (1997) [hereinafter Allen II], referred to no such misunderstanding. Had he conceded or acknowledged that the preponderance standard minimizes the expected total number of errors regardless of the value of the base rate P, it would not have been necessary for me to demonstrate the point so laboriously in Clarifying the Burden of Persuasion: What Bayesian Decision Rules Do and Do Not Do, 3 Int'l J. Evid. & Proof 1 (1999) [hereinafter Kaye II]. These papers take us to the second count in the indictment that Professor Allen has issued.
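The claim that the preponderance rule minimizes the expected number of errors whatever the value of P can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions, not a reproduction of any proof in Kaye II. The helper names (expected_errors, actual_errors, simulate_cases) are hypothetical, and the Beta-distributed assessments are invented for the example. For each of three base rates, it checks that no threshold on a grid beats .5 on expected errors computed from the factfinder's own probabilities. It also shows that the threshold with the fewest actual errors shifts with the base rate.

```python
import random

def expected_errors(probs, threshold=0.5):
    # Expected number of errors as the factfinder computes it: a verdict for the
    # plaintiff (p > threshold) is wrong with probability 1 - p; a verdict for
    # the defendant is wrong with probability p. No base rate appears here.
    return sum((1 - p) if p > threshold else p for p in probs)

def actual_errors(cases, threshold=0.5):
    # Count verdicts that disagree with the simulated truth.
    return sum((p > threshold) != deserving for deserving, p in cases)

def simulate_cases(base_rate, n, rng):
    # Hypothetical world: a fraction base_rate of plaintiffs deserve to win, and
    # the factfinder's probability is Beta-distributed, skewed toward the truth.
    # These distributions are illustrative assumptions, not data.
    cases = []
    for _ in range(n):
        deserving = rng.random() < base_rate
        p = rng.betavariate(4, 2) if deserving else rng.betavariate(2, 4)
        cases.append((deserving, p))
    return cases

rng = random.Random(1)
thresholds = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
results = {}
for base_rate in (0.1, 0.5, 0.9):
    cases = simulate_cases(base_rate, 2000, rng)
    probs = [p for _, p in cases]
    # The .5 rule is never beaten on expected errors, whatever the base rate...
    optimal = all(expected_errors(probs, 0.5) <= expected_errors(probs, t)
                  for t in thresholds)
    # ...but the threshold with the fewest actual errors can move with it.
    fewest_actual = min(thresholds, key=lambda t: actual_errors(cases, t))
    results[base_rate] = (optimal, fewest_actual)
    print(base_rate, optimal, fewest_actual)
```

The separation is deliberate. expected_errors takes only the factfinder's probabilities, so the base rate cannot affect it. actual_errors needs the simulated truth, and that is where the base rate enters.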
Count II. Quoting out of context to create a false impression (in Kaye II)
Professor Allen alleges a second instance of quoting out of context, this one in the fiftieth footnote in Kaye II. Having the luxury of space, I shall quote him, once again, in full. Then I provide the entirety of the pertinent text from Allen II, including the words that the indictment deletes.
Perhaps the passage [in Allen I] was ambiguous. The ambiguity was soon clarified, although again in a passage from which Prof. Kaye has excised the critical part. In responding to Prof. Kaye's initial criticisms, I gave various examples of regrettable and even bizarre consequences that could occur from the perspective of the system as a whole based upon the expected utilities of the fact finders. In his polemic, Prof. Kaye quotes the first two of the following sentences, but excises the next three, which obscures the meaning of the first two:
. . . I could show how lowering the standard of proof in criminal cases . . no matter what the relative disutility of erroneous verdicts for defendants and the state could reduce . . . 'expected losses.' I could also construct a world having the opposite effect. The reason I can do this is because the legal system has no interest in a fact finder's subjective expected utility. Rather its . . . concern is the operation of the system as a whole. Thus, it is perfectly understandable that the legal system (that is, those of us who construct it) may disagree with a fact finder's assessment of probabilities, and take action, system-wide to bring the implications of such assessments in line with our own.
Here I am plainly distinguishing between decision making by fact finders at trial, and the appraisal of that decision making by "those of us who construct" the legal system, and assert explicitly that "the legal system has no interest in a fact finder's subjective expected utility." Prof. Kaye neglects this point in his critique, and focuses instead solely on the decision by, and the expected utility of, the fact finder. All of his purported examples that show my mathematical mistakes are similarly examples of decision by the primary decision maker, not by someone else appraising that decision, which leads to Prof. Kaye's category mistake. My assertion was about the empirical reality of trials, not mathematics stripped from its relevant empirical context; it was about the actual operation of the legal system, not the internal logic of any algorithm. Transposing my comments about the real world into the mathematical world that Prof. Kaye wishes to discuss does indeed transmute them into "sheer fantasy," as he alleges, but the transmutation comes from Prof. Kaye taking statements out of the context that defines them and gives them meaning.
Allen III, at 248-49 (footnote omitted).
To decide whether the charge of misrepresentation is sound, we must ask what Professor Allen was saying about "the mathematical world," what he was saying about "the real world," and what he proposed about their intersection. The full discussion of the "legal theorems" in Allen II follows. The portions that were reproduced in Kaye II are in black; the sentences that I omitted and that supposedly "define[] them and give[] them meaning" are in green; the portions that neither Allen nor I chose to reproduce are in purple:
Prof. Kaye asserts that no matter what the base rate, his theory of expected losses applies equally well, and that it has nothing to do with the number of errors, so long as ‘every erroneous for a plaintiff entails the same loss as every erroneous verdict for a defendant. If this were true, it would be astonishing. On the basis of very little substantive knowledge -- all you know is a little algebra and that 'every erroneous verdict for a plaintiff entails the same loss as every erroneous verdict for a defendant' -- a general decision making algorithm appears that will maximize your expected utility, and it has nothing to do with error minimization, as I in a burst of silliness, suggested. Really? Compare two worlds, one in which there are 100 errors and one in which there are 101. In which world, in Prof. Kaye's terms, would we have a greater expected loss? Remember that we know nothing about the actual distribution of errors or their size, because Prof. Kaye's world is one largely devoid of substantive knowledge. Obviously we would have greater expected loss in the world with 101 errors. But this means that to reduce expected errors, you have to reduce errors, which is exactly the point that Kaye so severely criticizes me for suggesting. Even more remarkably, after roundly criticizing me for making such a silly point, Kaye buries away in a footnote exactly the same point: For a loss function that gives equal weight to errors favoring plaintiffs and defendants, the expected loss is proportional to the expected number of errors'. 'Directly proportional' would be more accurate, but there is no reason to quibble over words.
Even more remarkable is Prof. Kaye's assertion that his 'proofs remain true for all possible base rates.' Remember what the proof is -- it is a proof that a certain rule, preponderance of the evidence, will minimize expected losses. I asserted it is true in only a limited number of situations. He says 'The proofs remains true for all possible base rates.' We have already established that we use words differently, so perhaps I misunderstand what 'true' means. Let me be clear why I think this is false. Consider a world in which no deserving defendants go to trial, and for some deserving plaintiffs the fact finder assesses the likelihood of their case [sic] to be .5 or less. All such cases are errors, offset by no competing errors for the defendant. In this world, is the .5 rule 'optimal'? Obviously not. Lowering the standard can only reduce the total number of errors and thus the total expected loss (although one would have to worry about secondary consequences). Thus, the assumptions underlying Kaye's proof turn out to be quite rigorous. The base rates and the assignments of probabilities have to be in particular relationships in order for any rule to minimize expected losses. In the infinite number of worlds in which these relationships do not hold, expected losses will not be minimized.
I can play this game with virtually any burden of persuasion and utility function. For example, I could show how lowering the standard of proof in criminal cases (yes, 'lowering'), no matter what the relative disutility of erroneous verdicts for defendants and the state, could reduce (yes, 'reduce') 'expected losses.' I could also construct a world having the opposite effect. The reason I can do this is because the legal system has no interest in a fact finder's subjective expected utility. Rather its (if I may reify it) concern is the operation of the system as a whole. Thus, it is perfectly understandable that the legal system (that is, those of us who construct it) may disagree with a fact finder's assessment of probabilities, and take action, system-wide to bring the implications of such assessments in line with our own. This was the (I thought obvious) point I was making in my somewhat casual introduction that Kaye chooses to examine with such care. In applying the algorithms of Kaye to the real world, we quickly see that they are of limited utility. There are an enormous number of incentives operating on litigants in such a way that one could readily believe that the base rates of deserving plaintiffs and defendants are incommensurate and that fact finders' appraisals of probability are skewed one way or another, and do not result in nice, normal curves.
There is much confusion in this discussion, but the only question here is whether the three green sentences transform the meaning of passages like the following from statements about a well-known statistical theory into statements about "the real world":
Prof. Kaye asserts that no matter what the base rate, his theory of expected losses applies equally well, and that it has nothing to do with the number of errors, so long as 'every erroneous for a plaintiff entails the same loss as every erroneous verdict for a defendant. If this were true, it would be astonishing. [A] general decision making algorithm appears that will maximize your expected utility, and it has nothing to do with error minimization, as I in a burst of silliness, suggested. Really?
Even more remarkable is Prof. Kaye's assertion that his 'proofs remain true for all possible base rates.' Remember what the proof is -- it is a proof that a certain rule, preponderance of the evidence, will minimize expected losses. I asserted it is true in only a limited number of situations. He says 'The proofs remains true for all possible base rates.'
If Professor Allen was speaking of what he generously called my "proofs" and my "theory of expected losses," then he was speaking of mathematics. The algebra involved plainly shows that expected loss is minimized for all values of the base rate P. Neither Allen I nor Allen II evinces the slightest awareness of that mathematical result. Instead, as discussed at length in Kaye II, these papers (and these particular sentences) display great concern for actual error rates, which are affected by the value of P and have no necessary relationship to minimizing expected errors. This concern is appropriate, but it does not affect the meaning of sentences like these:
[T]he assumptions underlying Kaye's proof turn out to be quite rigorous. The base rates and the assignments of probabilities have to be in particular relationships in order for any rule to minimize expected losses. In the infinite number of worlds in which these relationships do not hold, expected losses will not be minimized.
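Allen's own hypothetical in Allen II is a world in which every plaintiff who reaches trial deserves to win, yet some receive probabilities of .5 or less. It can be run through a few lines of Python to separate the two quantities at issue. This is an illustrative sketch only. The ten probabilities are invented for the example, and the function names are hypothetical. In that world, lowering the threshold reduces actual errors, as Allen says. The expected number of errors, computed from the factfinder's own probabilities, is still lowest at .5.

```python
def expected_errors(probs, threshold=0.5):
    # Errors expected by a factfinder who holds these probabilities.
    return sum((1 - p) if p > threshold else p for p in probs)

def actual_errors_all_plaintiffs_deserving(probs, threshold=0.5):
    # In the hypothetical world every plaintiff deserves to win, so every
    # verdict for the defendant (p <= threshold) is an actual error.
    return sum(p <= threshold for p in probs)

# Hypothetical probabilities for ten deserving plaintiffs (illustrative only).
probs = [0.3, 0.45, 0.5, 0.55, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]

for t in (0.5, 0.4, 0.25):
    print(t,
          actual_errors_all_plaintiffs_deserving(probs, t),
          round(expected_errors(probs, t), 2))
```

Actual errors fall from 3 to 1 to 0 as the threshold drops. Expected errors rise from 2.9 to 3.0 to 3.4. The gap arises because the factfinder's probabilities do not match the frequencies in this world. That is a fact about the inputs, not a failure of the theorem.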
Allen III offers a rationalization of this misinterpretation of "Kaye's proof." Professor Allen maintains that when he wrote that "the legal system has no interest in a fact finder's subjective expected utility," he was making it clear that he never denied that a Bayesian decision rule maximizes the decisionmaker's expected utility. Apparently, the obvious meaning that would have emerged if only Kaye II had included the three green sentences in footnote 50 is that all the talk in Allen I and Allen II of "proofs" being "generally false" and "neglecting base rates" merely meant that even though the Bayesian decision rule always maximizes the expected utility of a factfinder in a given case, it does not necessarily maximize the expected utility of "someone else appraising that decision." Professor Allen once sneered at the idea that "a general decision making algorithm appears that will maximize your expected utility." Allen II at 346 (emphasis added). He now insists that he did not mean "you" as "the decisionmaker," but "you" as "someone else." Lewis Carroll's Humpty Dumpty, who could make words mean whatever he chose, would appreciate this exegesis. And, it may be true. Still, I must resist the charge that I took the sentence quoted in footnote 50 out of context. The sentence follows extended quotations from Allen II, and Professor Allen's position as to the way the rule might operate in "the real world" is presented clearly and fairly in Kaye II.
Count III. Using "literal truths" to mislead
The final charge is that I grossly misrepresented Professor Allen's writing (although I was not speaking of him in particular) when I wrote the following:
One difficulty with the remarks of many jurisprudential skeptics is that they neither offer nor defend any specific competing interpretations.
and
Claims that the axioms are peculiarly inapposite to legal fact finding typically do not consider the axioms themselves and the justifications that have been offered for them.
Professor Allen takes these remarks as an attack on his writing style and as an attempt to distort by means of "literally true" but functionally false statements:
I fear this is further evidence that my writing style is exceedingly opaque, for these were just the two main points of my original paper. Admittedly, I employed John Earman's discursive rendition of Savage's postulates rather than the postulates themselves. Employing Earman's descriptions rather than Savage's (or someone else's) "postulates" leaves Prof. Kaye's lament literally true (I did not use anyone's formal statement of their postulates) although so misleading as to be false (I used logical equivalents of their postulates expressed in understandable English). The risk, of course, is of a serious misimpression in anyone not fully conversant in the vocabulary of the literally true statement.
Allen III, at 253 (footnotes omitted).
Inasmuch as my statements were not directed specifically at Professor Allen, and my observations pertained merely to "many" skeptics and a "typical" feature of their arguments, I could observe that even if what he says of his paper is true, it does not negate my characterization of the literature. However, that would be the kind of "literal" defense of truth that Professor Allen seems to find odious, so I shall consider whether Allen I possesses the features that I noted.
I think it does. As far as I can tell, Allen I contains no "specific competing interpretations" of how decisions should be made under conditions of risk. Allen III proposes that "human reason and judgment operating upon the vast amount of information obtained throughout the years by any sentient human being, employing a large array of tools, including utilities, beliefs, and arithmetic" will do the trick. That may be, but I would not call this "a specific competing interpretation."
It also remains difficult to discern which axiom or axioms he rejects. Is it connectedness, transitivity, independence, or normality? Perhaps it is all of them. He explains that he uses "John Earman's discursive rendition of Savage's postulates," which are "logical equivalents . . . expressed in understandable English." But Allen I merely cited to two pages of Earman's book, and these two pages make no effort to restate the postulates of SEU theory, discursively or otherwise. See John Earman, Bayes or Bust 56–57 (1992). Rather, they discuss the implications of the fact that "actual inductive agents . . . lack the logical and computational powers required to meet the Bayesian norms." The only "norm" mentioned is "probability axiom (A2)." Id. at 56. This is a reference to the axiom of probability theory that holds, as Earman puts it, that P(A) = 1 if "A is valid in the sense that A is true in all models of all possible worlds." Id. at 36. Does Allen mean to claim that the probability of a tautology should not be taken to be one?
More likely, he just follows Earman (and others) in pointing out that "by their very nature these [actual] agents fall short of the logical omniscience that requires recognition of all logical truths in the domain of Pr." Id. at 56. In short, like many others who have questioned the applicability of Bayesian decision theory, Allen does not indicate which axioms are unacceptable. He argues, correctly, that it is too hard for jurors to be perfect Bayesians. That is an argument that I discussed. We may disagree as to the implications of this fact for the use of the theory in the legal domain, but this does not undermine my observation that although "[i]t is easy enough to find disagreement about the plausibility of certain axioms in the philosophical literature," "[w]hat is less obvious is what feature of adjudicative factfinding makes any particular postulate less plausible in law than in other fields." Kaye II, at 20 n. 54.
Summation
The words that Professor Allen has used to describe certain mathematical analyses of the properties of decision rules such as the preponderance standard invite misunderstanding. Contrary to what Professor Allen has written or implied, there are no mathematical proofs that purport to show that the preponderance standard minimizes actual errors, or that it minimizes expected errors where the expectation is computed by someone other than the decisionmaker. Moreover, in the same year that Professor Allen mischaracterized the "legal theorems" as efforts to establish that the total number of actual errors is minimized, he wrote that there is an "algebraic proof" that the preponderance standard gives an equal mix of actual errors. Not only does this misapprehend the proof (see D.H. Kaye, The Error of Equal Errors (1999)), but it is inconsistent with the characterizations of the preponderance standard in Allen I, Allen II, and Allen III. This inconstancy makes it difficult to discern what Professor Allen really thinks about the mathematics of the "legal theorems" that he finds wanting. His latest position seems to be that the preponderance standard minimizes expected loss (as computed by the factfinder) regardless of base rates and the quality of the factfinder's personal probabilities, but that this is ultimately of little interest to the legal system.
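A three-case arithmetic check shows why the preponderance rule need not produce an equal mix of errors, even in expectation. The probabilities below are hypothetical, and expected_error_mix is an illustrative helper, not a function from any of the papers discussed. With assessments of .75, .75, and .25, the factfinder expects 0.5 wrongful verdicts for plaintiffs but only 0.25 wrongful verdicts for defendants.

```python
def expected_error_mix(probs, threshold=0.5):
    # Expected wrongful verdicts for plaintiffs (p > threshold, wrong with
    # probability 1 - p) and for defendants (p <= threshold, wrong with
    # probability p), computed from the factfinder's probabilities.
    for_plaintiff = sum(1 - p for p in probs if p > threshold)
    for_defendant = sum(p for p in probs if p <= threshold)
    return for_plaintiff, for_defendant

# Hypothetical assessments in three cases (illustrative only).
print(expected_error_mix([0.75, 0.75, 0.25]))  # prints (0.5, 0.25)
```

The mix of actual errors depends further on base rates and on how well the probabilities track the truth, so it can be even more lopsided.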
Kaye I and Kaye II criticize Allen I and Allen II for their descriptions of the decision-theoretic analysis of the preponderance standard. Allen III argues that the criticism is unfair because it quotes Professor Allen out of context and uses "literal truths" to mislead its readers. I have tried to reassure those readers that the criticisms were fair responses to the reasonable constructions of Professor Allen's words. It may be that Professor Allen meant something quite different by these words. As Alice replied to Humpty Dumpty, the question is whether words can be made to mean so many different things. In the end, however, none of this bickering about Professor Allen's choice of phrases and sentences has much bearing on whether Bayesian decision theory advances our understanding of the burdens of persuasion. On that issue, the reader must examine a thick and difficult literature that grows out of a few remarkably simple and powerful ideas.