Taking advantage of Teachers' Day, I take up the provocation of my great master
in Bayesian statistics (from him I learned that the adjective is pleonastic) and
subjective probability (the objective part is another story), Carlos Alberto de
Bragança Pereira, and continue our discussions on randomization here on the
ABE-L list. In the attached article,

- Decoupling, Sparsity, Randomization and Objective Bayesian Inference.
  Cybernetics And Human Knowing. Vol. 15, no. 2, pp. 49-68

I try to answer some of my master's jabs.

Best regards to all,
---
Julio Stern

> Date: Fri, 16 Oct 2009 02:00:53 -0300
> From: cpereira@ime.usp.br
> To: abe-l@ime.usp.br
> Subject: [ABE-L]: Provocation
>
> Taking advantage of Teachers' Day, and considering that ESAMP is coming up, I
> am posting here a lecture by De Finetti on randomization.
> Enjoy!
>
> --------------------------------------------------------------------------
>
> Induction and Sample Randomization
> Lecture XIII (Friday 27 April, 1979)
>
> Exchangeability and Convergence to the Observed Frequency
>
> I would like to discuss the relation between the concepts of random experiment
> and exchangeable experiment. After all, there is only a lexical difference
> between the two notions, which can be summarized as follows: the expression
> “equally probable events with unknown but constant probability,” used by the
> objectivists, does not make any sense from the subjectivist point of view,
> simply because there is no such thing as an unknown probability (probability
> being that which a certain person assigns at a certain time).
>
> However, what is typical of these cases is exchangeability: those cases in
> which one speaks of independent events with unknown but constant probability
> are, in fact, all cases of exchangeability. Yet behind this terminological
> difference lies a conceptual difference concerning the problem of inductive
> inference.
> The objectivists do not answer this question satisfactorily and, in
> fact, they almost completely neglect it. Their argument goes as follows: since,
> in the long run, frequency coincides with probability, in order to determine the
> probability it is sufficient to observe a fairly large number of experiments.
> From the subjectivist point of view, this argument is unacceptable. Indeed, for
> us subjectivists, probability cannot be determined empirically; it is
> evaluated by each person, at any instant, on the basis of his or her own
> experience. Probability, in fact, changes with every new experience.
>
> Suppose we are drawing from an urn containing white and black balls in unknown
> proportions. Suppose, however, that we know that the percentage of white balls
> is one of the following: 30%, 50%, 70%, 80%. I shall call the four possible
> hypotheses about the percentage of white balls H1, H2, H3 and H4, respectively.
>
> Suppose that an initial probability is assigned to each of the hypotheses H_i.
> As we continue the draws, those probabilities change according to Bayes’
> theorem. In fact, the probability of the hypothesis that is closest to the
> observed frequency increases. And it is probable that the sequence obtained
> will be such that, in the long run, the probability of one of the hypotheses
> H_i gets very close to 1. The probability assigned to a single draw would then,
> very probably, be very close to the observed frequency. However, we must always
> bear in mind the influence of the initial probabilities assigned to the
> hypotheses H_i.[1]
>
> ALPHA: However, the subjective differences are always tempered by this
> convergence. Therefore, the Bayesian method, provided that the condition of
> exchangeability is satisfied, is in some sense a self-corrective method (to use
> Reichenbach’s term).[2]
>
> DE FINETTI: Yes, it is. Who uses this term?
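The updating over H1 to H4 described in the lecture can be sketched in a few lines of Python. This is a minimal illustration, assuming draws with replacement (so the draws are exchangeable and only the counts of white and black balls matter); the two observers' priors and the 70% observed frequency are hypothetical, chosen only to show how differences in initial opinion shrink as the number of draws grows.

```python
import math

# The four hypotheses in the lecture: proportion of white balls in the urn.
HYPOTHESES = {"H1": 0.30, "H2": 0.50, "H3": 0.70, "H4": 0.80}


def posterior(prior, n_white, n_black):
    """Posterior P(H_i | data) after n_white white and n_black black draws.

    Draws are assumed to be with replacement (exchangeable), so the counts are
    all that matter. Log space avoids underflow for long sequences.
    """
    log_post = {
        h: math.log(prior[h]) + n_white * math.log(p) + n_black * math.log(1 - p)
        for h, p in HYPOTHESES.items()
    }
    top = max(log_post.values())
    weights = {h: math.exp(v - top) for h, v in log_post.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def predictive_white(post):
    """Probability that the next draw is white: sum over i of P(H_i | data) * p_i."""
    return sum(post[h] * p for h, p in HYPOTHESES.items())


# Hypothetical initial opinions of two observers.
prior_a = {"H1": 0.25, "H2": 0.25, "H3": 0.25, "H4": 0.25}
prior_b = {"H1": 0.10, "H2": 0.10, "H3": 0.10, "H4": 0.70}

for n in (10, 100, 1000):
    n_white = 7 * n // 10  # hypothetical data: observed frequency of white = 70%
    post_a = posterior(prior_a, n_white, n - n_white)
    post_b = posterior(prior_b, n_white, n - n_white)
    print(f"n={n}: P_A(H3)={post_a['H3']:.3f}  P_B(H3)={post_b['H3']:.3f}  "
          f"P_A(next white)={predictive_white(post_a):.3f}")
```

With 100 draws the two hypothetical observers still disagree noticeably about H3; with 1000 draws both put nearly all their probability on it, and the predictive probability of white approaches the observed frequency. Working in log space matters here: a product such as 0.3^700 underflows to zero in double precision.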
>
> ALPHA: Reichenbach who, however, referred to the estimation of frequencies
> rather than subjective probabilities. According to him, an estimation rule is
> self-corrective when the limit of the difference between the estimate obtained
> with that rule and the observed frequency is 0.
>
> BETA: Hence, the subjective probability of one of the hypotheses converges to
> the value 1 as the number of experiments grows.
>
> DE FINETTI: Yes, provided that it is borne in mind that all this does not hold
> necessarily but depends on the premises (exchangeability).[3]
>
> BETA: Let us suppose that there are three urns: the first one containing only
> black balls, the second one only white balls and the third one half white and
> half black.
>
> DE FINETTI: This is a very simple case. In fact, as soon as two balls of
> distinct colors were drawn, it would be known with certainty which urn was
> being used for the draws. If, on the other hand, only white or only black balls
> were drawn, then, as the number of draws grows, the probability that the draws
> are being made from the first or the second urn would rapidly increase.
>
> BETA: At the beginning, the probability reflects the personal state of mind of
> whoever makes the evaluation. But then, as new draws are carried out,
> differences among people’s opinions tend to disappear. Therefore, the growth of
> knowledge leads opinions to converge.
>
> DE FINETTI: Yes, the differences in the initial opinions have no other
> consequence than delaying the preponderance of the observed frequency over the
> initial opinion itself.
>
> Bayesian Statistics and Sample Randomization
>
> ALPHA: Let us now tackle the problem of the methods of random selection of
> statistical samples. Savage, in this booklet, which you might be familiar with ...
>
> DE FINETTI: What is the title?
>
> ALPHA: The Foundations of Statistical Inference (Barnard and Cox, 1962).[4]
> It is a short summary of the course that Savage taught for the International
> Mathematical Summer Centre in Italy (Savage, 1959). Immediately after that
> course, as explained in the book, Savage went to London.
>
> DE FINETTI: OK, I understand: it is the report that Savage presented at the
> conference in London.
>
> ALPHA: As Savage writes: “the problem of analyzing the idea of randomization is
> more acute and at present more baffling, for subjectivists than for objectivists,
> more baffling because an ideal subjectivist would not need randomization at all”
> (Savage, 1962, p. 34). Perhaps Savage meant that the subjectivist, who should
> not neglect any piece of information, would have no reason to resort to
> randomization, since randomization actually excludes some of the available
> information. But, Savage continues, “[t]he need for randomization
> presumably lies in the imperfection of actual people and, perhaps, in the fact
> that more than one person is ordinarily concerned with an investigation.”
> (ibid.) This sentence suggests a new argument supporting the rationality of
> randomizing statistical samples: thanks to randomization, the likelihood can be
> computed more inter-subjectively. In fact, the Bayesian method produces
> convergence when the likelihood is the same for everyone.[5] But if the draws
> are not randomized, then the likelihood varies, in general, from person to
> person, and this might preclude convergence. What is your opinion about this
> justification of the use of randomization in the formation of statistical
> samples?
>
> DE FINETTI: I seem to agree with this. But I should think more carefully about it.
>
> ALPHA: Savage adds: “the imperfections of real people with respect to subjective
> probability are vagueness and temptation to self-deception ... and randomization
> properly employed may perhaps alleviate both these defects.” (ibid.)
> Do you believe that Savage’s analysis is correct, or do you believe that there
> could be other reasons that make the randomization of samples rational? It
> seems to me that the practice of randomization could be justified by the need
> for inter-subjectivity in science. A scientific community, in fact, accepts a
> result when the majority of its members recognize its value. Is it possible to
> use randomization to facilitate agreement among the judgments of many people?
>
> DE FINETTI: The problem of sample randomization has a mixed character, as it
> is not purely probabilistic in nature. Randomization is a measure that guards
> us against the instinctive tendency, often followed bona fide, to fiddle with
> the results. This can happen in many ways. For instance, a researcher may
> exclude some anomalous data point, thinking that it might be the result of a
> typo or of a faulty measurement. This would be legitimate if it turned out, for
> instance, that a certain individual’s height is 170 meters: it would be
> reasonable to assume that the real value of the height is 170 centimeters. But
> in other cases there could be a tendency to alter the real data because they
> are considered unreliable. Or there could be a tendency to round off. If many
> people in a sample turn out to have a height of exactly 170 centimeters and
> very few people a height of 169 or 171 centimeters, then it would be natural to
> suspect that the data have been rounded off. Randomization is a procedure that
> guards the data against some forms of manipulation and, in particular, against
> a biased selection of the sample.
>
> ALPHA: An observation that has just occurred to me is the following. The
> randomization of the sample makes it easier to determine the state of
> information.
> Taking into account all the information that one possesses would be
> a lot more complicated if the choice were not random. When the sampling is
> random, the influence of many relevant pieces of information present in each
> individual's state of information is eliminated.
>
> DE FINETTI: These considerations, too, must be made cautiously. Suppose, for
> instance, that although the selection has been done correctly from the point
> of view of precautions (re-shuffling, etc.), the sample turns out to be
> decidedly skewed toward heights that are clearly too large. The suspicion
> could then arise that this might be due to a systematic tendency to choose tall
> people. In any case, the random selection of statistical samples is a very
> complicated problem, and I have never managed to find a completely
> satisfactory solution to it.
>
> ALPHA: The problem consists in this: strictly speaking, one should try to
> maximize the quantity of empirical information, whereas with random
> selection one intentionally deprives oneself of some information that could
> turn out to be relevant. If it were known that one individual satisfies some
> relevant property, this information should also be taken into account rather
> than neglected because that individual does not belong to the randomly selected
> sample.[6]
>
> Editor’s Notes
> 1. For precise details see Chapter 8.
> 2. “The inductive procedure, therefore, has the character of a method of trial
> and error so devised that, for sequences having a limit of the frequency, it
> will automatically lead to success in a finite number of steps. It may be
> called a self corrective method (or an asymptotic method)” (Reichenbach, 1949,
> p. 446). Reichenbach points out (ibid., note 1) that C. S. Peirce had already
> stressed in 1878, without however explaining the reason for it, the “constant
> tendency of the inductive process to correct itself” (Hartshorne and Weiss,
> 1960, vol. 2, p. 456).
> 3. Important observation, often neglected: Bayes’s theorem alone is not
> sufficient to guarantee convergence.
> 4. The book contains a contribution by Savage (1962).
> 5. To put the matter in more de Finettian terms, all random samples are
> exchangeable and all stratified random samples are partially exchangeable.
> 6. The problem of random samples has been addressed by many Bayesian authors.
> See, for example: Stone (1969); Rubin (1978); Swijtink (1982); Kadane and
> Seidenfeld (1990); Spiegelhalter, Freedman, and Parmar (1994); Papineau (1994);
> Berry and Kadane (1997); Frangakis, Rubin, and Zhou (2002); Kyburg and Teng
> (2002); Berry (2004); Localio, Berlin, and Have (2005); Worrall (2007).
>
> Carlos Alberto de Bragança Pereira <cpereira@ime.usp.br>
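BETA's three-urn case from the lecture can be checked exactly with rational arithmetic. The sketch below assumes draws with replacement and, as a hypothetical default, equal initial probabilities of 1/3 for the three urns; the urn labels are illustrative names, not taken from the lecture.

```python
from fractions import Fraction

# Hypothetical labels for BETA's three urns, with the probability that one
# draw (with replacement) yields a white ball.
URNS = {"all_black": Fraction(0), "all_white": Fraction(1), "half": Fraction(1, 2)}


def update(prior, draw):
    """Apply Bayes' theorem after one draw, 'W' (white) or 'B' (black)."""
    unnorm = {}
    for urn, p_white in URNS.items():
        likelihood = p_white if draw == "W" else 1 - p_white
        unnorm[urn] = prior[urn] * likelihood
    total = sum(unnorm.values())  # > 0 as long as the "half" urn has positive prior
    return {urn: w / total for urn, w in unnorm.items()}


def posterior(draws, prior=None):
    """Posterior over the three urns after a string of draws such as 'WWB'."""
    post = prior if prior is not None else {urn: Fraction(1, 3) for urn in URNS}
    for draw in draws:
        post = update(post, draw)
    return post


# Two balls of distinct colors: the mixed urn becomes certain.
print(posterior("WB"))
# Only white balls: P(all_white) = 2**n / (2**n + 1) under the uniform prior.
for n in (1, 5, 10):
    print(n, posterior("W" * n)["all_white"])
```

Exact fractions make both of de Finetti's remarks easy to see: a single pair of distinct colors settles the question with certainty, while a run of one color only drives the corresponding urn's probability toward 1 without ever reaching it.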
Attachment:
jmschk3.pdf
Description: Adobe PDF document