MIT reveals data discrediting the Revealed Preferences rankings

<p>MIT admissions posted, in passing, the following data on its blog site. The posting was last year but just percolated into CC through a comment the other day on the MIT board.</p>

<p>
[quote]
going by last year's mutual [Caltech and MIT] admits, it's about 19% to Caltech, 64% to MIT, and 17% to a different college altogether. The response rate to our study was 92% (1390 of 1508 admits responding) so it should be fairly accurate.

Posted by: Ben on March 15, 2006 03:56 PM
[/quote]
</p>

<p>Wow!</p>

<p>The Revealed Preferences paper, often cited here by Harvard cheerleaders, ranked universities by cross-admit decisions of a sample of a few thousand elite students. It had Caltech second among all universities, ahead of MIT (#4), with an estimated probability above 80 percent that the Caltech > MIT ranking (if not the specific numerical weights leading to that ranking) is correct. The sole rationale for the rankings is that they are derived from cross-admit data, so this looks pretty bad.</p>

<p>I will post some more technical comments later on what else those numbers reveal about Revealed Preferences, but to get a 3:1 advantage so wrong at the very top of the rankings (which were supposed to be the most stable, or the only stable, part of the list) seems rather fatal.</p>
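<p>For concreteness, the "3:1 advantage" follows directly from the quoted percentages. A quick back-of-the-envelope check in Python, using only the numbers from Ben's comment:</p>

<p>
[code]
# Back-of-the-envelope check using the percentages quoted above
# (19% to Caltech, 64% to MIT, 17% elsewhere).

caltech_share = 0.19
mit_share = 0.64

# Head-to-head split among the mutual admits who picked one of the two:
head_to_head_caltech = caltech_share / (caltech_share + mit_share)  # ~0.23
head_to_head_mit = mit_share / (caltech_share + mit_share)          # ~0.77

print(f"Caltech wins {head_to_head_caltech:.0%} of the two-way matchups")
print(f"MIT wins {head_to_head_mit:.0%} of the two-way matchups")
print(f"MIT : Caltech ratio is about {mit_share / caltech_share:.1f} to 1")  # ~3.4
[/code]
</p>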

<p>It was fun while it lasted.</p>

<p>I'd also be very interested in reading similar posts by the Bens of other colleges. Your comments just remind me how easily the methodology adopted by the authors of the Revealed Preferences paper can go wrong.</p>

<p>Unfortunately you don't understand the methodology used by the authors of that paper. It's more complicated than that.</p>

<p>Exactly what misunderstanding of the methodology are you talking about? If you understand the mathematics that they use, the implications of the Caltech-MIT data point are quite striking.</p>

<p>
[quote]
I'd be also very interested in reading similar posts by the Bens of other colleges.

[/quote]
</p>

<p>Here is a linked article on Yale admissions showing that the Revealed Preferences ranking also mispredicts Yale vs Stanford. The study gives an 84 percent probability that Yale ranks higher, and about a 60 percent victory rate for Yale. The article, which appeared in November 2000, reveals that in the same admissions cycle as the Revealed Preferences study cohort (high school class of 2000, which applied before the dot-com bust), Stanford beat Yale in cross-admits.</p>

<p>That is as damaging to the credibility of the RP ranking as the large Caltech vs MIT discrepancy. While Caltech may be a special case with a small number of applications, HYPS are the schools that get the most applications (hundreds from the Revealed Preferences cohort), they are directly comparable with highly overlapping applicant pools, they produce the most cross-admit battles in the study, and they are the group of schools for which there is the best chance of reaching a stable ranking. If the rankings already break down in this range, they are unstable everywhere, just as one would expect from the mathematics.</p>

<p><a href="http://www.yalealumnimagazine.com/issues/00_11/admissions.html">http://www.yalealumnimagazine.com/issues/00_11/admissions.html</a></p>

<p>
[quote]

After all the pitches have been made, Yale manages to convince two-thirds of its admitted students to attend. What happens to the other third? The majority are lost to a handful of other colleges, most often Harvard, Princeton, MIT, and Stanford. Harvard has long won the majority of students admitted to both Yale and Harvard, but Yale has traditionally won the competition for "common admits" with other colleges. Recently, however, Yale has begun to lose more Stanford common admits than it wins, particularly students from the Western states. "The dot-com world is a big part of the draw for Stanford," concedes Shaw.

[/quote]
</p>

<p>posterX, why don't you educate me on why one can't call a methodology flawed when its conclusions contradict the facts? I don't claim to be fluent in the math. I do admit, though, that Ben of MIT might be citing only their 2006 data.</p>

<p>BTW r u somehow related to the authors of the paper? (kidding)</p>

<p>MIT2011, see the posting above for data on applicants in the same year as the RP study (and apparently also the late 1990s), for Yale vs Stanford. It will be fun to see if the RP fanboys can defend the "methodology" when most of them don't understand the methods to begin with.</p>

<p>I'm very familiar with the methodology, but no, not related to any of the authors. My point is only that your comments about Caltech versus MIT have no relation to the methods of the study you are attacking. You are just complaining, and trying to make a mountain out of a few grains of sand, or more likely, just trying to prop up MIT.</p>

<p>Unfortunately you are wrong on both counts, re "make a mountain out of a few grains of sand, or more likely, just trying to prop up MIT". All I know is that the RP ranking fits neither Y/S nor M/C. Shall I say the RP theory is yet to be confirmed by real data? Am I attacking? lol. As for propping up MIT, why would I? Do you have to prop up any of HYPS?</p>

<p>And you know what, I think you'd be good at politics because you always try to figure out others' agenda or motivation.</p>

<p>
[quote]
I'm very familiar with the methodology ... your comments about Caltech versus MIT have no relation to the methods of the study you are attacking.

[/quote]
</p>

<p>As it happens, I also am very familiar with the methods used. Care to explain your comments on how the data on Caltech, its #2 ranking, and its cross-admit battles with MIT are but "a few grains of sand", unrelated to the merits of the study?</p>

<p>This thread is sort of pointless.</p>

<p>Considering all of the disclaimers that are readily stated in the article,</p>

<p>
[quote]
We reemphasize that we use the College Admissions Project data to construct an example of a revealed preference ranking. If we had had much greater resources, we would have surveyed a more fully representative sample of students in the United States.

[/quote]

[quote]
Among the top ten colleges, we generally enjoy confidence of about 75 percent that a college is ranked higher than the college listed one or two below it.

[/quote]

[quote]
Because the students are drawn from schools that send several students to selective colleges each year, the students in the sample are probably slightly better informed than the typical high aptitude applicant.

[/quote]
</p>

<p>your stating that the results may not reflect reality is hardly shocking. As the authors loosely quantify it, I should not be shocked if up to 25 percent of the rankings are slightly off!</p>

<p>Though, of course, you do have a point that people shouldn't use the Revealed Preferences data to support their arguments, but you didn't need to find any external data to make that assertion; the authors of the paper readily pointed this out for you and everyone else.</p>

<p>As an aside,</p>

<p>
[quote]
it's about 19% to Caltech, 64% to MIT
[/quote]
</p>

<p>is a somewhat deceptive statistic because of the vast disparity in student body size. Of course MIT is going to attract more accepted students; it is a much larger institution. The fact of the matter, though, is that this doesn't (significantly) affect the quality of, say, the Caltech class, so it's a pretty irrelevant statistic.</p>

<p>Let me show you what I mean. Say there are 300 cross-admits between Caltech and MIT, which is a pretty reasonable assumption considering the overlap between the two schools' interests. </p>

<p>Now, Caltech will get 57 of these students, and MIT will get 192, while the rest head off to Harvard (I kid, but only slightly).</p>

<p>At first this looks terrible for Caltech, but consider that Caltech only enrolls 200 students per year compared to MIT's enrollment of 1000. Caltech actually ends up with a higher percentage of common admits in its class (57/200 = 28.5%) than MIT does (192/1000 = 19.2%). Really, as long as Caltech gets 1 common-admit student for every 5 that MIT gets, Caltech is breaking even in terms of getting the students both it and MIT want, which I think few people actually understand.</p>
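<p>Here is the same arithmetic spelled out in Python, for anyone who wants to check it. The 300 cross-admits and the round class sizes of 200 and 1000 are the hypothetical figures from this example, not real data:</p>

<p>
[code]
# Hypothetical figures from the example above (not real admissions data).
cross_admits = 300
caltech_wins = round(0.19 * cross_admits)   # 57
mit_wins = round(0.64 * cross_admits)       # 192

caltech_class = 200
mit_class = 1000

print(f"Caltech: {caltech_wins / caltech_class:.1%} of its class are common admits")  # 28.5%
print(f"MIT:     {mit_wins / mit_class:.1%} of its class are common admits")          # 19.2%

# Break-even: Caltech keeps pace with MIT's class share as long as it wins
# at least 1 common admit for every 5 that MIT wins (200 / 1000).
[/code]
</p>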

<p>Hence, if you're really looking for a flaw in the revealed preferences methodology, this would be it: revealed preferences don't measure the resulting quality of the student body so much as run a popularity contest among high-school seniors.</p>

<p>
[quote]
Considering all of the disclaimers that are readily stated in the article,

[/quote]
</p>

<p>I don't think it's nearly as modest as you suggest, nor is it understood in any disclaimed way by the RP fans who repost the link on these boards. Anyway, I would like to hear posterX's (or anyone else's) views about the technical issues.</p>

<p>
[quote]
Caltech actually ends up with a higher percentage of its class (57/200 = 28.5 %) of common admits than MIT does (192/1000 = 19.2 %).

[/quote]
</p>

<p>That's evidence of Caltech having a stronger student body (a higher proportion of students able to gain admission at both schools), but that does not change the interpretation of the cross-admit tournament. For whatever reasons, MIT is more desirable to most of the dual admits and thus wins at a rate of 64 to 19. </p>

<p>It is similar with MIT and Harvard: MIT has stronger students overall, and so probably a higher proportion of dual admits, but Harvard has more ability to matriculate those dual admits. At least, I have seen claims that it does in other threads, but those may have been just a reference to the Revealed Preference ranking. Direct data on Harvard vs MIT would be interesting.</p>

<p>PosterX is right in saying the methodologies are different. </p>

<ol>
<li><p>The ranked preferences study is a survey of people who are (hopefully) randomly selected. In this way, they are meant to reflect general preferences in the age group deciding between colleges.</p></li>
<li><p>Cross-admit statistics reflect the choices of people who are actually admitted to multiple colleges. </p></li>
</ol>

<p>The assumption of the ranked preference authors is that, on the whole, group 1 will act more or less like group 2. However, as the discussion shows, the two groups can and do act quite differently. </p>

<p>There are lots of ways to interpret these differences. The more positive view is that the ranked preference study is a good way to understand how different colleges are ranked generally by people similar to those making the actual decisions. A negative view is to dismiss the ranked preferences study due to evidence that it is not useful in predicting crossmatch statistics.</p>

<p>As I see it, the ranked preference study has some value for estimating general views, but there is a need for someone to examine how accurate it is against actual crossmatch choices (though that would mean the colleges would have to agree to disclose the data, which might be hard to get them to do). I am sceptical about the estimate of 75% predictive value. Even if that is close for the group of people surveyed as a whole, it may be quite wrong for the people crossmatched at specific colleges.</p>

<p>Is there an inherent assumption in the RP paper that all 'choosers' are equally weighted?</p>

<p>Where do the IMO'ers, RSI alums, and Siemens finalists like to go... is that going to give a different result than other types of 'choosers'?</p>

<p>I'm not going to pretend to have read the revealed preferences methodology, but I figure that to rank such an extensive list of colleges, listing one school above another should work something like this:</p>

<p>% of cross-admits to school A and school X won by school A</p>

<p>vs.</p>

<p>% of cross-admits to school B and school X won by school B</p>

<p>It is impossible to use head-to-head matchups between schools to formulate such a ranking, due to natural matriculation advantages.</p>

<p>Examples: </p>

<ol>
<li><p>Cross-admits to (insert Ivy League college) and MIT would likely have the type of math background required to be admitted to MIT, and would therefore be more likely to pursue hard sciences and attend MIT.</p></li>
<li><p>Stanford focuses much of its recruitment effort in California (40%+ of Stanford admits are California residents). The large number of in-state admits ensures an advantage against East Coast schools in the cross-admit battle. Against Caltech, however, Stanford may lose the cross-admit battle, even if Caltech (assuming it doesn't pursue the same California-focused recruitment strategy) loses the cross-admit battle to the same East Coast schools Stanford beats.</p></li>
</ol>
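<p>To make the common-opponent comparison above concrete, here is a tiny Python sketch. All of the counts below are invented purely for illustration; they are not real cross-admit numbers:</p>

<p>
[code]
# Sketch of a common-opponent comparison: rank school A above school B if
# A's win rate against a common third school X exceeds B's win rate
# against that same school X.  Counts are made up for illustration.

def win_rate(wins, total_cross_admits):
    """Fraction of cross-admits with the common school X that a school wins."""
    return wins / total_cross_admits

# Hypothetical: out of 100 students admitted to both A and X, A enrolls 60;
# out of 80 students admitted to both B and X, B enrolls 36.
rate_a_vs_x = win_rate(60, 100)   # 0.60
rate_b_vs_x = win_rate(36, 80)    # 0.45

if rate_a_vs_x > rate_b_vs_x:
    print("Rank A above B: stronger showing against the common school X")
else:
    print("Rank B above A")
[/code]
</p>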

<p>siserune,</p>

<p>As for the article in the Yale Alumni Magazine, it was written in 2000. The dot-com buzz has since died down, which could account for a change in preferences.</p>

<p>Even among elite universities, from year to year different schools are "fashionable" and cross-admit victories in one year do not necessarily indicate a long term trend.</p>

<p>
[quote]
The assumption of the ranked preference authors is that, on the whole, group 1 will act more or less like group 2.

[/quote]
</p>

<p>That isn't the problem revealed by the MIT and Yale/Stanford data. If group 1 (students in general) and group 2 (cross-admits) behave differently, this is just a semantic issue of how to describe the results of the study. The problem is that the study is unreliable, and apparently wrong in many ways, in "revealing" the preferences of its study cohort (group 2, as you put it). </p>

<p>re: the dot-com boom influencing the numbers, the Yale/Stanford numbers were given for the same year of students as those sampled in the Revealed Preferences study -- the cohort that graduated in spring 2000 and applied to college in late 1999.</p>

<p>
[quote]
MIT reveals data discrediting the Revealed Preferences rankings

[/quote]
</p>

<p>Uh, I hate to tell you this, but I believe you have misunderstood how the RP data was constructed. </p>

<p>
[quote]
The Revealed Preferences paper, often cited here by Harvard cheerleaders, ranked universities by cross-admit decisions of a sample of a few thousand elite students

[/quote]
</p>

<p>Right there is the first and most crucial error, an error that has been made by numerous people here on this board. The RP paper NEVER says that they ranked universities by cross-admit decisions. Not at all. Instead, the RP paper models what their sample of students would have done if they were forced to choose between 2 schools. Actual choices have nothing to do with the paper. For example, most people won't even apply to both schools of a particular pair, simply because nobody goes around applying to the entire set of 100 or so schools that were measured in the ranking. In fact, the very act of choosing not to even apply to a particular school is, by itself, a key aspect of revealed preference. {The very act of not applying to a particular school reveals one of 2 possible pieces of information, #1, that you don't think that you can get into that school, or #2, that you wouldn't go to that school even if you did get in. It is #2 that carries information on revealed preference, and the way to separate #1 and #2 is to see whether you applied to schools that are just as selective as the one in question, and if you did, you can rule out #1.} And even if you do apply to both schools of a particular pair, that's not to say that you will actually get into both. </p>

<p>In fact, if the reporting of cross-admit data were all the RP study did, then frankly, it would not be a very interesting study. The entire "value-add" of the paper is in the model - in the attempt to model what people WOULD have chosen if forced to pick between 2 options. It says nothing about whether those choices actually came to fruition. </p>
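<p>To illustrate the difference, think of it this way (this is a deliberately simplified, Bradley-Terry-style toy, not the paper's actual hierarchical model, and the scores are invented): each school gets a latent desirability score, and the modeled chance of picking one school over another in a forced two-way choice is logistic in the score difference.</p>

<p>
[code]
import math

# Toy latent-desirability model (invented scores, purely illustrative).
# The RP paper fits a far richer model from survey data; this only shows
# the general shape of "modeled pairwise choice".
scores = {"Harvard": 2.0, "Caltech": 1.6, "MIT": 1.4, "Yale": 1.5}

def p_choose(i, j):
    """Modeled probability of choosing school i over school j in a forced two-way choice."""
    return 1.0 / (1.0 + math.exp(-(scores[i] - scores[j])))

print(f"P(Caltech over MIT)  = {p_choose('Caltech', 'MIT'):.2f}")
print(f"P(Harvard over Yale) = {p_choose('Harvard', 'Yale'):.2f}")
[/code]
</p>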

<p>
[quote]
It had Caltech second among all universities, ahead of MIT (#4), with an estimated probability above 80 percent that the Caltech > MIT ranking (if not the specific numerical weights leading to that ranking) is correct. The sole rationale for the rankings is that they are derived from cross-admit data, so this looks pretty bad

[/quote]
</p>

<p>I would hardly say that this looks "pretty bad" in the least. You can invoke a very simple explanation - that a lot of people who would choose Caltech don't even get into MIT (but would still have chosen Caltech even if they did get into MIT). The fact that they didn't get into both schools means that they aren't counted as cross-admits. Another explanation is that plenty of other students didn't even apply to either MIT or Caltech but if forced to pick between one or the other, the model indicates that they are more likely to pick Caltech. </p>

<p>
[quote]
I will post some more technical comments later on what else those numbers reveal about Revealed Preferences, but to get a 3:1 advantage so wrong at the very top of the rankings (which were supposed to be the most stable, or the only stable, part of the list) seems rather fatal.

[/quote]
</p>

<p>Like I said, it's not wrong or fatal, it's just a different methodology. Cross-admit data looks at actual cross-admit information, but completely misses information about those students who don't get admitted to both schools, or who don't even apply to both schools (which comprise most students). In other words, cross-admit data inherently samples on the dependent variable and hence is inherently highly biased.</p>

<p>
[quote]
That isn't the problem revealed by the MIT and Yale/Stanford data. If group 1 (students in general) and group 2 (cross-admits) behave differently, this is just a semantic issue of how to describe the results of the study. The problem is that the study is unreliable, and apparently wrong in many ways, in "revealing" the preferences of its study cohort (group 2, as you put it).

[/quote]
</p>

<p>Uh, no, this is not just a 'semantic issue of how to describe the results'. It is the very heart of the study itself. The study is NOT unreliable, unless you have actually found a flaw in the methodology. </p>

<p>Look, nobody is saying that the RP study is a perfect study. Not even the authors claim it to be so. But we at least have to understand exactly what it means.</p>

<p>Replying to sakky:</p>

<p>
[quote]
The RP paper NEVER says that they ranked universities by cross-admit decisions.

[/quote]
</p>

<p>The RP paper says it many times over, since it's obviously true: they take a list of cross-admit decisions (i.e. for each student in the sample, specify the list of schools that accepted the student and which of those was selected), and from that information alone, produce a ranking. That is what it means to "rank universities by cross-admit decisions".</p>

<p>
[quote]
Instead, the RP paper models what their sample of students would have done if they were forced to choose between 2 schools.

[/quote]
</p>

<p>No. The RP paper models what those students might have chosen from ANY GIVEN MENU of schools (not just 2) out of the full set of about 110. This is an important point, because it constrains the possible models that can be used. </p>
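<p>To be concrete about what "any given menu" means (again with invented scores, and simplified relative to the actual specification), the modeled matriculation probabilities over an admitted set take a softmax form, which reduces to the familiar pairwise form when the menu has only two schools:</p>

<p>
[code]
import math

# Invented desirability scores, illustrative only.
scores = {"Harvard": 2.0, "Caltech": 1.6, "MIT": 1.4, "Yale": 1.5}

def menu_probabilities(admitted):
    """Modeled matriculation probabilities over an arbitrary admitted set (softmax)."""
    weights = {s: math.exp(scores[s]) for s in admitted}
    total = sum(weights.values())
    return {s: round(w / total, 2) for s, w in weights.items()}

print(menu_probabilities(["Caltech", "MIT"]))           # two-school menu
print(menu_probabilities(["Harvard", "MIT", "Yale"]))   # three-school menu
[/code]
</p>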

<p>
[quote]
Actual choices have nothing to do with the paper.

[/quote]
</p>

<p>Actual choices are the absolute CRUX of the paper. What differentiates RP from all the other rankings is that it makes predictions (be they good or bad) about an observable that exists outside the model: the cross-admit rates. Otherwise the whole thing would be pointless. They could have used any of a million different summary statistics based on the oh-so-unmanipulable cross-admit data, and it would have been no better than the US News rankings, where somebody publishes a list according to some arbitrary formula. The RP claim to fame is that it measures a very intuitive and appealing notion of quality: relative desirability as demonstrated by (i.e. statistically predictive of) cross-admit decisions. </p>

<p>
[quote]
For example, most people won't even apply to both schools of a particular pair, simply because nobody goes around applying to the entire set of 100 or so schools that were measured in the ranking.

[/quote]
</p>

<p>Unfortunately the amount of missing data (the sparsity of the matrix of cross-admit results) is a problem for the particular model that they use. In this application it is not a "standard social science model" as you have often claimed. </p>

<p>
[quote]
In fact, the very act of choosing not to even apply to a particular school is, by itself, a key aspect of revealed preference.

[/quote]
</p>

<p>Maybe you didn't read the paper. Where people apply is not a form of revealed preference that they attempt to model, and the RP ranking rewards specialty schools that are negatively "preferred" by a majority who would never apply there (BYU, Caltech, and others). In the other direction, a school that is "preferred" enough to be a favorite safety school will suffer in the rankings.</p>

<p>
[quote]
I would hardly say that this looks "pretty bad" in the least.
You can invoke a very simple explanation - that a lot of people who would choose Caltech don't even get into MIT (but would still have chosen Caltech even if they did get into MIT). The fact that they didn't get into both schools means that they aren't counted as cross-admits.

[/quote]
</p>

<p>That "explanation" is reversed (it would lower Caltech's rating and you are trying to explain an inflated rating). More importantly, any explanation based on what different data might have shown is a concession that the model is unstable -- the rankings are not a reflection of reality but of accidents in the data, because the method of ranking is sensitive to accidents. That is precisely what one expects for the RP model with this amount of data. </p>

<p>
[quote]

[quote]
That isn't the problem revealed by the MIT and Yale/Stanford data. If group 1 (students in general) and group 2 (cross-admits) behave differently, this is just a semantic issue of how to describe the results of the study. The problem is that the study is unreliable, and apparently wrong in many ways, in "revealing" the preferences of its study cohort (group 2, as you put it).

[/quote]

Uh, no, this is not just a 'semantic issue of how to describe the results'. It is the very heart of the study itself.

[/quote]
</p>

<p>If the sample population (group 2) doesn't typify the population they want to measure (group 1), the results can be qualified as being the preferences of "applicants with attributes X, Y, and Z" rather than "what applicants prefer". That is a semantic difference, and it doesn't impinge upon whether the study accurately described its data, i.e. whether it was effective as a model of group 2's cross-admit choices. </p>

<p>However, you are right in one way that is bad for the RP study: if the results would be substantially different for the intended population (group 1) compared to the sample population (group 2) it indicates sensitivity to the sample, which means the results are not reliable.</p>

<p>
[quote]
The RP paper says it many times over, since it's obviously true: they take a list of cross-admit decisions (i.e. for each student in the sample, specify the list of schools that accepted the student and which of those was selected), and from that information alone, produce a ranking. That is what it means to "rank universities by cross-admit decisions".

[/quote]
</p>

<p>No, that is completely false. The RP study NEVER relies on cross-admit data. The RP study uses information about admissions decisions (but NOT cross-admit information) as raw data. It then MODELS the data to fill the missing gaps, notably the missing information regarding schools that a student would have wanted to go to (but didn't get admitted to) or schools that the student didn't even apply to in the first place. Hence, the modeled data is BETTER than the cross-admit data as long as the model holds, because cross-admit data, by definition, does not include missing data. </p>

<p>
[quote]
The RP paper models what those students might have chosen from ANY GIVEN MENU of schools (not just 2) out of the full set of about 110. This is an important point, because it constrains the possible models that can be used.

[/quote]
</p>

<p>I was talking about an interpretation of the RP study for the purposes of THIS thread. Since you were the one who was talking about cross-admits, which by definition means a comparison of only 2 schools, I was interpreting the RP information accordingly. </p>

<p>
[quote]
Actual choices are the absolute CRUX of the paper

[/quote]
</p>

<p>Uh, wrong. There is a world of difference between somebody preferring, say, Harvard and actually having the CHOICE of Harvard. Just because you don't have the actual choice of a particular school doesn't mean that you don't want it. </p>

<p>
[quote]
Unfortunately the amount of missing data (the sparsity of the matrix of cross-admit results) is a problem for the particular model that they use. In this application it is not a "standard social science model" as you have often claimed.

[/quote]
</p>

<p>It is no more problematic than the chess-rating models referenced in Glickman (1999, 2001). </p>

<p>Nevertheless, neither I nor the authors claim that the RP study is comprehensive or complete. I simply claim that it is better than the other available ranking systems out there and is also better than raw cross-admit data. For example, which mainstream social science model does the USNews ranking adhere to? Or Gourman? </p>

<p>
[quote]
Maybe you didn't read the paper. Where people apply is not a form of revealed preference that they attempt to model, and the RP ranking rewards specialty schools that are negatively "preferred" by a majority who would never apply there (BYU, Caltech, and others). In the other direction, a school that is "preferred" enough to be a favorite safety school will suffer in the rankings.

[/quote]
</p>

<p>Perhaps you didn't read the paper. Specifically, you may not have read section 7, in which the authors explicitly discuss the notion of self-selection and perform an RP study that measures only those students who are interested in technical subjects, and find that Caltech STILL outranks MIT. </p>

<p>
[quote]
That "explanation" is reversed (it would lower Caltech's rating and you are trying to explain an inflated rating). More importantly, any explanation based on what different data might have shown is a concession that the model is unstable -- the rankings are not a reflection of reality but of accidents in the data, because the method of ranking is sensitive to accidents. That is precisely what one expects for the RP model with this amount of data.

[/quote]
</p>

<p>Again, nobody, not even the authors, is contending that the study is complete. That's not the point. The point is that the study's data is MORE complete than actual cross-admit data precisely because cross-admit data by definition has large gaps of information (again, those who don't get into both schools or don't even apply to both schools). </p>

<p>Now, I agree with you that Caltech's ranking in the RP probably is inflated relative to the entire set of schools, and in particular, may well be inflated relative to those schools that have little overlap with Caltech (i.e. more humanities-oriented schools). But that's not the point that we're discussing. The point we are discussing is Caltech's RP ranking relative to MIT, both of which are obviously technically oriented schools. </p>

<p>
[quote]
If the sample population (group 2) doesn't typify the population they want to measure (group 1), the results can be qualified as being the preferences of "applicants with attributes X,Y,and Z" rather than "what applicants prefer".
That is a semantic difference, and doesn't impinge upon whether the study accurately described its data, i.e. was it effective as a model of group 2's cross-admit choices.

[/quote]
</p>

<p>See above. Again, nobody is saying that Caltech's RP ranking isn't inflated relative to the entire set of schools. It probably is. The authors say so explicitly. </p>

<p>What counts for the purposes of this discussion is whether Caltech's RP ranking relative to MIT is inflated. </p>

<p>
[quote]
However, you are right in one way that is bad for the RP study: if the results would be substantially different for the intended population (group 1) compared to the sample population (group 2) it indicates sensitivity to the sample, which means the results are not reliable.

[/quote]
</p>

<p>Again, nobody is arguing that the results are 100% reliable. Of course they are not. No ranking's results are. A few changes here and there in the USNews methodology can also result in wild swings in rankings. </p>

<p>Again, the real value of the RP is not that it is a fundamentally perfect study. Rather, it is more grounded in theory than any of the other studies out there. What exactly is the theoretical justification and workup for the methodology in USNews? Or Shanghai Jiao Tong? Or THES? Or any of the other rankings? Whatever else you might say about RP, I would hardly say that it is worse than any of those other rankings.</p>