<p>Sounds like you are beginning to agree with me that it's an unrepresentative sample, and that one really can't tell from it the percentage of 1600 (or 1500 or 1400) scorers admitted to a given institution for the year under consideration.</p>
<p>What is "Tufts syndrome"?</p>
<p>Clearly the sample is unrepresentative; the co-author has already admitted that only those in the top 10% of their classes are represented. That is not in dispute.</p>
<p>Again, I quote verbatim the passage from the book that describes these numbers:
[quote]
To refine our figures, we estimated the chances of admissions for a hypothetical male applicant who is approximately average within the survey in terms of activities and high school attended, and who has no other distinguishing characteristics. We assess the prospects of this average student four times, giving him four different sets of test scores varying from 650 to 800 on each of five SAT tests (the SAT-1 verbal and math, and three SAT-2 subject tests).
[/quote]
Let me emphasize two words here: estimated and hypothetical. My apparently limited knowledge of the English language tells me that hypothetical applicants are not actual ones, and that projections based on regression models are not equivalent to actual survey data. I'm not sure where you get the idea that these figures reflect the percentages of actual 1600-scorers in a survey that got in, because the plain text of the book clearly says the opposite.</p>
<p>The percentage of admissions among actual 1600 scorers in the survey is the statistic drawn from a sample that is used to estimate the parameter of the population, i.e. the true rate of admissions among all 1600 scorers. Similarly, the statistic can be used to simulate the hypothetical admission chances of persons based on the known variables. Does that explain it?</p>
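<p>To put that in concrete terms, here is a quick sketch in Python (the counts are invented for illustration; they are not from the survey):</p>
[code]
import math

# Invented counts, purely for illustration: suppose the survey happened
# to contain 120 applicants scoring 1600, of whom 66 were admitted.
n, admitted = 120, 66

p_hat = admitted / n  # the STATISTIC: the sample admit rate

# 95% confidence interval for the PARAMETER: the true admit rate
# among all 1600 scorers in the population.
se = math.sqrt(p_hat * (1 - p_hat) / n)
print(f"statistic: {p_hat:.2f}")
print(f"95% CI for the parameter: ({p_hat - 1.96*se:.2f}, {p_hat + 1.96*se:.2f})")
[/code]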
<p>
[quote]
The percentage of admissions among actual 1600 scorers in the survey is the statistic drawn from a sample that is used to estimate the parameter of the population, i.e. the true rate of admissions among all 1600 scorers. Similarly, the statistic can be used to simulate the hypothetical admission chances of persons based on the known variables. Does that explain it?
[/quote]
No, and this appears to be the misconception. You and Byerly seem to think that these numbers were based more or less directly on survey data - where there's a sample of 1600 scorers, you find their admission rates and use those to approximate the rate for the entire population. That is standard statistical practice, but it's not what happened.</p>
<p>Instead, the authors used a multivariable regression model to estimate the chances of a hypothetical male applicant with a certain set of qualifications. This is very, very different.</p>
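<p>For anyone who wants to see the difference concretely, here is a rough sketch in Python of the kind of procedure involved - not the authors' actual model, and the data below are synthetic stand-ins for the real survey records:</p>
[code]
# A rough sketch (NOT the authors' actual model): how one might use
# logistic regression to estimate a hypothetical applicant's chances.
# The synthetic data below stand in for the real survey records.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
survey = pd.DataFrame({
    "sat": rng.integers(1200, 1601, n),
    "applied_early": rng.integers(0, 2, n),
})
# Synthetic outcome: admission odds rise with SAT and with applying early.
logit = -14 + 0.009 * survey["sat"] + 0.8 * survey["applied_early"]
survey["admitted"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = smf.logit("admitted ~ sat + applied_early", data=survey).fit(disp=False)

# The 'hypothetical average applicant', assessed at several score levels.
hypothetical = pd.DataFrame({"sat": [1300, 1400, 1500, 1600],
                             "applied_early": 0})
print(model.predict(hypothetical))  # estimated admit probabilities
[/code]
<p>The point is that the probabilities at the bottom come from the fitted model, not from counting up the actual high scorers in the sample who got in.</p>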
<p>Oh, I was under the impression that the estimates for the hypothetical applicant is based on the survey data of 1600 scorers alone. If what you say of the table is true then your claim and the co-author's claim are at odds with respect to the very text of the book that deals with that particular table. That sounds like a very easy issue to resolve.</p>
<p>
[quote]
That sounds like a very easy issue to resolve.
[/quote]
lol, I know!</p>
<p>Statistics are estimates by definition.</p>
<p>Not so. Census data reporting the number of white families living in Greensboro, NC are statistics, but not estimates.</p>
<p>Understand that the authors took steps to eliminate the admit-rate biases that arise when individuals are legacies, recruited athletes, URMs, etc. This was key to their thesis that "equivalent" early applicants were admitted at a far higher rate than RD applicants.</p>
<p>Right, and this was the point of the numbers we're arguing about. They were hypothetical figures dealing with the admissions chances of a "completely average" male given a few sets of test scores. The idea was to prove that EA vs RD made a big difference for "normal" applicants.</p>
<p>The problem is that they did not have a large database of perfectly average applicants at their disposal, so they had to simulate one instead. This simulation, based on a regression model, is inevitably going to be a bit fluky, which is the problem with using the numbers in this context on CC. Plus, the numbers aren't even supposed to represent the chances for the whole pool of 1600-scorers, like you initially claimed.</p>
<p>You are still confused. That's not the situation at all. And you are also confused about what I "claimed."</p>
<p>
[quote]
Not so. Census data reporting the number of white families living in Greensboro, NC are statistics, but not estimates.
[/quote]
</p>
<p>If the census data cover every single person in the population, then the figure is a parameter, not a statistic. If the census reaches only a portion of the population, then the figure is a statistic used to estimate the parameter of the population.</p>
<p>And if the Democrats had had their way, the last federal census WOULD have been based largely on estimates. The idea was to conjure up some number of homeless people, illegal immigrants, elderly shut-ins, etc. in primarily Northern Democratic urban areas, add them according to a formula, and thus: </p>
<p>(1) preserve Congressional Districts which would otherwise be lost to the South, and </p>
<p>(2) pump up their population-based federal aid distributions.</p>
<p>Okay, as long as we're throwing in political references, I might as well try an elaborate analogy to illustrate what's going on here.</p>
<p>Let's say that we were trying to determine the impact that some variable - say, being a regular churchgoer vs. not being one - has on some outcome - say, voting for Bush in the 2004 election. The former is analogous to early vs. regular action, while the latter is analogous to "acceptance."</p>
<p>Now, clearly there is some relationship between being a churchgoer and voting for Bush, as we can see from basic exit polls; a higher percentage of the devoutly religious voted Republican. Similarly, we can see right away that applying early action has some relationship with being accepted - a higher percentage of early applicants are admitted. But these are merely correlations - to show that there is probably a causative relationship we'd have to dig deeper, which is exactly what the authors did.</p>
<p>So what's the next step? Well, if we follow the example of The Early Admissions Game, we create a hypothetical voter who is "average" within the data pool. We might end up with a white, married, middle-aged female. For good measure, we'll endow this imaginary voter with several possible incomes - say, $30,000, $60,000, $90,000, and $120,000 (income correlates with voting, just as SAT score does with acceptance). Finally, we'll use a regression model to estimate the probability that this white, married, middle-aged female will vote for Bush at each income level. We'll do this separately for the "regular churchgoers" and the "less religious" within our data set, to try to establish that religious devotion does indeed affect our "average" individual's vote.</p>
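<p>For the statistically inclined, the analogy might look something like this in Python - entirely made-up data, and reduced to two predictors for brevity:</p>
[code]
# The analogy in code (entirely made-up data; this mirrors the book's
# method in miniature, not any real exit poll).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4000
polls = pd.DataFrame({
    "income": rng.integers(20_000, 150_001, n),
    "churchgoer": rng.integers(0, 2, n),
})
# Synthetic outcome: churchgoing and income both tilt the vote toward Bush.
logit = -1.2 + 0.9 * polls["churchgoer"] + 0.00001 * polls["income"]
polls["voted_bush"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# The 'average' hypothetical voter, assessed at four income levels.
profile = pd.DataFrame({"income": [30_000, 60_000, 90_000, 120_000]})

# Fit the model separately for churchgoers and the less religious, then
# predict the hypothetical voter's probability of voting for Bush.
for group, subset in polls.groupby("churchgoer"):
    model = smf.logit("voted_bush ~ income", data=subset).fit(disp=False)
    label = "churchgoer" if group else "less religious"
    print(label, model.predict(profile).round(3).tolist())
[/code]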
<p>We'll probably come up with some interesting results. But what you have done is analogous to taking the numbers we estimated on voting vs. income for white middle-aged married women and saying that they represent the relationship between income and voting in general. That's not a valid inference.</p>
<p>Plus, the data are not credible for churches of some denominations.</p>
<p>lol... well, thanks for trying to extend the analogy, but I was thinking of exit polls (where pollsters question voters as they leave the polling stations), like the surveys used for the book. Maybe some people fudge a bit in exit polls, but probably not much, since there's no rational reason to lie (and the same could be said for the surveys).</p>
<p>Randomperson's analogy is an excellent one, and he/she clearly understands analytics. We used three different tools to analyze data.</p>
<p>(1) We got access to 4-7 years' worth of data from 14 of the 20 most selective schools in the country. This dataset included every possible variable for which you would want to control - including the admission officers' own ratings. The dataset had close to 1,000,000 records. The data showed that at almost every school, early applicants were admitted at significantly higher rates - even when controlling for all other key variables. Given the massive size of this database, the confidence intervals are very tight - even for relatively small SAT bands at each school.</p>
<p>(2) We used the same data to evaluate white males in the top 10% of their class. We evaluated the admit rates in ED and Regular Decision at each school for each SAT band. Again, applying early was a significant advantage in each SAT band at each school. And again, the confidence intervals were tight.</p>
<p>(3) We followed several thousand applicants across the application process. While we had enough records for the schools referenced in the book, I agree with Randomperson that we did not have sufficient numbers in each SAT band for each school to have narrow confidence intervals around the statistics.</p>
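<p>To see why sample size drives the width of those intervals, here is a quick back-of-the-envelope calculation (the 25% admit rate is just an illustrative figure):</p>
[code]
import math

def ci_half_width(p, n):
    """Half-width of a 95% normal-approximation confidence interval
    for an admit rate p estimated from a sample of size n."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# Illustrative only: the same 25% admit rate estimated from samples of
# very different sizes - datasets (1)/(2) versus the per-band slices of (3).
for n in (1_000_000, 10_000, 200):
    print(f"n = {n:>9,}: 95% CI = +/- {ci_half_width(0.25, n):.3f}")
[/code]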
<p>But doesn't your model predict, for Regular Decision applicants with an SAT of 1400, a one-tenth-of-one-percent (0.1%) chance of admission to Harvard and a thirty-five-plus-percent (35.4%) chance of admission to MIT?</p>
<p>It would seem those confidence intervals would have to be <em>very</em> wide indeed. No?</p>
<p>Leon_02138, yes, the numbers seem pretty fluky, but we have to consider how they were created. College admissions, where dozens of variables interact in indescribably complex ways, cannot be well approximated by even the most sophisticated regression analysis.</p>
<p>So if regression analysis has so many flaws in cases like these, why do we use it? Frankly, there's no better alternative. The ideal statistical practice is to conduct a controlled experiment, but that's out of the question here (what would we do? fake thousands of applications to test the admissions officers?). We're left only able to collect and analyze data, and although there are ridiculously complicated nonlinear relationships between different variables, our mathematics can only do so much. Thus the weird results, like the 0.1% vs. 35.4% gulf you mentioned. The main point of the book - that early applicants have an edge over regular ones at most schools - was proven quite conclusively.</p>
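<p>To illustrate how a gulf like that can arise, consider two toy logistic models whose coefficients differ only modestly - the numbers below are invented, not estimates from the book:</p>
[code]
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))

# Two toy fitted models whose coefficients differ only modestly can imply
# radically different probabilities for the same 1400-SAT regular applicant.
# These intercepts/slopes are invented, not estimates from the book.
for school, intercept, slope in [("School A", -30.0, 0.01650),
                                 ("School B", -12.0, 0.00814)]:
    p = logistic(intercept + slope * 1400)
    print(f"{school}: predicted admit probability = {p:.1%}")
[/code]
<p>Small differences in fitted coefficients get blown up at the extremes of the logistic curve, which is exactly where a 1400 RD applicant to a school like Harvard sits.</p>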
<p>And to Mr. Fairbanks - thanks again for writing the book and coming on the thread to help clear things up. I really enjoyed reading The Early Admissions Game. It admirably demonstrated that the standard admissions spiel about early action ("higher early admissions rates are due only to the incredible strength of the early pool") doesn't hold up statistically, and I hope that more admissions officers will consider adjusting their practices.</p>