In the CollegeConfidential discussion of my blog post The Difficulty With Data, CC poster mihcal1 made the following compelling comment:
> So basically, it’s a perfect setup for the Illusion of Validity.
>
> Why is MIT’s admissions process better than random? Say you weeded out the un-qualified (the fewer-than-half of applicants insufficiently prepared to do the work at MIT) and then threw dice to stochastically select among the remaining candidates. Would this produce a lesser class?
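Just to make the thought experiment concrete, here is mihcal1’s dice-throw written out as a procedure. This is a minimal, purely illustrative sketch; the `is_qualified` check and the class size are hypothetical stand-ins, not anything our office actually runs:

```python
# Purely illustrative: mihcal1's "weed out, then throw dice" proposal as code.
# `is_qualified` and `class_size` are hypothetical stand-ins.
import random

def lottery_admit(applicants, is_qualified, class_size, seed=None):
    """Filter to the qualified pool, then admit a class uniformly at random."""
    rng = random.Random(seed)
    qualified_pool = [a for a in applicants if is_qualified(a)]
    return rng.sample(qualified_pool, min(class_size, len(qualified_pool)))
```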
The link in mihcal1’s post takes you to an article from The New York Times Magazine by Daniel Kahneman. Kahneman is a pioneer of behavioral economics and the psychology of decision making. He is one of my favorite social scientists, and his work laid the foundation for much of the social science research I love.
In his article, Kahneman describes his time working as a psychologist for the Israeli Army. He and his colleagues were tasked, among other things, with putting officer candidates through a series of challenges (an application, as it were) to test their leadership potential. They would watch the candidates as they completed the challenges, and then predict how well each one would do at officer-training school.
According to Kahneman:

> …as it turned out, despite our certainty about the potential of individual candidates, our forecasts were largely useless. The evidence was overwhelming. Every few months we had a feedback session in which we could compare our evaluations of future cadets with the judgments of their commanders at the officer-training school. The story was always the same: our ability to predict performance at the school was negligible. Our forecasts were better than blind guesses, but not by much.
>
> …I thought that what was happening to us was remarkable. The statistical evidence of our failure should have shaken our confidence in our judgments of particular candidates, but it did not. It should also have caused us to moderate our predictions, but it did not. We knew as a general fact that our predictions were little better than random guesses, but we continued to feel and act as if each particular prediction was valid. I was reminded of visual illusions, which remain compelling even when you know that what you see is false. I was so struck by the analogy that I coined a term for our experience: the illusion of validity.
Why, asked mihcal1, were we, as admissions officers, so sure that we were right in our decisions? What made us think our decisions would be better than random guesses? And how can we know?

This is a very good question to ask, and a very difficult one to answer.
Part of the reason it is so difficult to answer is the problem I discussed in the last post, which is basically this: what makes our decisions “better”? How do we know if one applicant is “better” than another?
We could cherry-pick any number of metrics that would make the case in our favor. For example, over the last decade or so, our average applicant SAT score has gone up, and our rate of admission has gone down. You might interpret this to say that we are admitting smarter students, and that we are doing a good job of recruiting applications too, so hey, we’re doing a pretty good job!
Of course, I think those are terrible metrics by which to measure an applicant or an admissions process. What matters isn’t raw SAT score, or how many people we can convince to apply. What matters is bringing in smart students who feel at home here. Who love the community they are in. Who believe in the things that we do here at MIT and who will go out and make the world a better place.
As it turns out, those things are much, much harder to measure.
Does this mean that our process is no better than random? That all we are doing is admissions shamanism, huddling in rooms behind closed doors before pronouncing our wisdom unto the world, when really we’re just guessing like all the rest?

I don’t think so. And there are a few reasons why.
One reason is to remember a fundamental limitation of social science: it is situation-dependent, and thus most usefully and reliably deployed to falsify specific hypotheses rather than to draw conclusions across contexts.
For example, Kahneman cites research into decades of data demonstrating that most stock pickers and fund managers do essentially no better than random guessing would predict. This is a comparatively easy question for the social scientific method. Hypothesis: variance in skill explains differences in performance between investment managers. Test: do stock pickers routinely perform better than random chance would predict? Result: mostly, no; the hypothesis is false, or at least badly shaken.
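To make that kind of test concrete, here is a minimal sketch, my own illustration rather than anything from the research Kahneman cites, of asking whether a manager who beats a benchmark in some number of years is doing better than coin-flipping would predict:

```python
# Illustrative only: a toy "skill vs. chance" test of the kind described above.
# The numbers are made up; this is not the analysis Kahneman cites.
from math import comb

def chance_of_at_least(wins: int, years: int, p: float = 0.5) -> float:
    """Probability of beating the benchmark in at least `wins` of `years` years
    if every year were just a coin flip with success probability `p`."""
    return sum(comb(years, k) * (p ** k) * ((1 - p) ** (years - k))
               for k in range(wins, years + 1))

# Beating the market in 7 years out of 10 sounds like skill, but...
print(round(chance_of_at_least(7, 10), 3))  # ~0.172: easily consistent with pure luck
```

If managers cleared that bar far more often than the arithmetic says chance allows, skill would be the better explanation; in the data Kahneman describes, they mostly don’t.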
But it’s not clear that an admissions process is anything like picking stocks, so it’s also not clear that you can run the same sort of test and have it still be meaningful. Carrying an insight so loosely across contexts is a dangerous venture indeed. And this is actually a problem with the Army example too. Who’s to say that the psychologists weren’t “better” at picking officers than their future commanders were? Without measuring the judgments of the commanders, how could we know? And how would we measure it?
Obviously Kahneman thinks that some people (Israeli Army commanders) are better at picking some things (future officers) than other people (inexperienced psychologists). So let’s approach this from another angle: what circumstances, according to Kahneman, might make you think that an expert is actually an expert; that a professional is actually good at their job, and not merely reproducing the random?
Quoth Kahneman:

> True intuitive expertise is learned from prolonged experience with good feedback on mistakes. You are probably an expert in guessing your spouse’s mood from one word on the telephone; chess players find a strong move in a single glance at a complex position; and true legends of instant diagnoses are common among physicians. To know whether you can trust a particular intuitive judgment, there are two questions you should ask: Is the environment in which the judgment is made sufficiently regular to enable predictions from the available evidence? The answer is yes for diagnosticians, no for stock pickers. Do the professionals have an adequate opportunity to learn the cues and the regularities? The answer here depends on the professionals’ experience and on the quality and speed with which they discover their mistakes… Many of the professionals we encounter easily pass both tests, and their off-the-cuff judgments deserve to be taken seriously.

In other words, if you have a lot of experience, and if you have good, quick feedback on mistakes, then your intuition is likely to be better than random chance.
This, I think, characterizes our admissions office. In any given admissions committee, decades of admissions experience are bent towards examining a single applicant and all of the data we have about them. In fact, I must admit I laughed a little at Kahneman’s observation that “true legends of instant diagnoses are common among physicians”, because McGreggor Crowley, who directs our admissions process, is a physician, and if there is anybody who is legendary for his ability to “diagnose” an applicant, it’s him.
And we have good, rapid feedback too. We meet most of the students we admit soon afterward at CPW. We then spend four (or more) years living with them. They work in our offices. We advise them academically. We become friends (and frenemies) as the years go on. So we live and learn from our successes and our misjudgments.
(Of course, so much of what happens here is beyond our control anyway. We select for the potential to do awesome things, but it’s the students, not us, who will do them. So if there is any cognitive bias at work here, it’s probably not the illusion of validity but a fundamental attribution error.)
Finally, there is the point that David made in his last blog post, which is essentially that there are many types of admissions processes, and that what matters is not so much whether they are “fair” as whether they “work”, which is to say whether they produce the sort of community that you aspire to be a part of.
I think there is a lot of truth in that. Fundamentally, an admissions process is measured not by what it is but by what it does, which is of course to constitute a community. That doesn’t mean we aren’t reflective or analytical about the way we do things; in fact, we employ two statisticians specifically to run the data and tell us how to do things better!
But it does mean that the real standard that matters is whether the students, the faculty, and the world think that MIT students are awesome people who do awesome things, and that our students feel at home here. By this standard, I think our process does a very, very good job.

And that, my friends, is no illusion.