Perfect score + GPA applicants are actually pretty rare even at Harvard

Hello - I’ve had a chance to read through the Arcidiacono report. This is quite interesting. The most informative piece is Table B.7.2 on page 136. I was most interested in this because it details which factors are most predictive of admission to Harvard. Although Harvard would have you believe its decisions are holistic, everything can be reduced to a number – in this case an odds ratio, which is a simple way to measure the effect of different variables on the chances of admission. Logistic regression is a powerful way to hold other factors constant and to examine the effect of individual variables on the outcome of interest (in this case, admission to Harvard). It allows us to tease out which factors are most influential in the admissions process.

http://samv91khoyt2i553a2t1s05i-wpengine.netdna-ssl.com/wp-content/uploads/2018/06/Doc-415-1-Arcidiacono-Expert-Report.pdf

Arcidiacono fits 6 different logistic regression models to estimate the odds of admission based on various factors. Each model takes different variables into account. The 6th model is the most comprehensive, and also the most informative. Arcidiacono’s models are presented in Table B.7.2, but in a confusing way: the logit coefficients are given along with standard errors. I would have preferred odds ratios along with 95% CIs instead.
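(For reference, here is a minimal sketch of that conversion. The coefficient and standard error values below are made up for illustration, not taken from Table B.7.2.)

```python
# Minimal sketch: converting a logit coefficient and its standard error
# into an odds ratio with a 95% CI. Numbers are made up, not from the report.
import math

coef = 1.28   # hypothetical logit coefficient
se = 0.05     # hypothetical standard error

odds_ratio = math.exp(coef)
ci_low = math.exp(coef - 1.96 * se)
ci_high = math.exp(coef + 1.96 * se)

print(f"OR = {odds_ratio:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
# OR = 3.60, 95% CI [3.26, 3.97]
```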

I took the data from B.7.2 and converted the coefficients into odds ratios by exponentiating them (odds ratio = e^coefficient), then rank-ordered them from high to low. Arcidiacono also includes interaction terms, which I didn’t include for brevity’s sake. Here are the factors, ranked from highest to lowest odds of admission:



Arcidiacono Report: Logit Coefficients and Odds Ratios of Admission by Factor

Factor                 Coefficient   Odds Ratio
Athlete                7.85          2563.17
African American       2.66          14.28
Dean's Interest        2.32          10.20
Legacy                 1.84          6.30
Faculty child          1.70          5.50
Hispanic               1.42          4.13
Early Decision         1.28          3.60
Disadvantaged          1.08          2.95
Double legacy          0.63          1.88
Fee Waiver             0.52          1.69
Academic Index         0.41          1.51
Applied for FA         0.16          1.17
Gender: Female         0.13          1.14
Major: Comp Sci        0.11          1.12
Major: Physical Sci    0.05          1.05
Major: Humanities      0.03          1.03
First Generation       0.02          1.02
Major: Math            0.02          1.02
Major: Unspecified    -0.01          0.99
Major: Engineering    -0.02          0.99
Major: Biology        -0.08          0.93
Asian American        -0.27          0.76


So how to interpret this? Let’s take an example. The fitted model suggests that, holding the other factors equal, an early decision applicant has 3.6 times the odds of being admitted (i.e., odds that are 260% higher). Accordingly, odds ratios below 1.0 indicate a lower chance of being admitted. Based on this, there is no better way to get admitted than to be a recruited athlete (odds ratio of 2563). A distant second is being African American (odds ratio of 14.28), followed in third place by being on the Dean’s Special Interest list.
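(One caveat on reading these: an odds ratio multiplies the odds, not the probability. A quick sketch, assuming a purely hypothetical 5% baseline admit rate, not a figure from the report:)

```python
# Sketch: what an odds ratio of 3.6 does to a hypothetical 5% baseline
# admit rate. The 5% figure is assumed for illustration only.
baseline_prob = 0.05
odds_ratio = 3.6

baseline_odds = baseline_prob / (1 - baseline_prob)    # ~0.053
new_odds = baseline_odds * odds_ratio                   # ~0.189
new_prob = new_odds / (1 + new_odds)                    # ~0.159

print(f"baseline {baseline_prob:.1%} -> with OR 3.6: {new_prob:.1%}")
# baseline 5.0% -> with OR 3.6: 15.9%
```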

But there are other important pieces of information here. In particular, the choice of major listed on the application has little influence on admission chances. The advantage given to legacy applicants is about the same as being a child of faculty/staff. However, being on the Dean’s Interest list has a huge impact on admission (odds ratio of 10.2). Items that I normally would have thought to have a huge impact are very low on the list, especially the academic index. Having a high AI increases the odds of admission only marginally (OR 1.51), and the same goes for first-generation status (OR 1.02). I guess having a high AI is a given for Harvard, but the admissions people would have you believe that the high school transcript and test scores are the two most important factors. That is clearly not the case.

The following controllable factors appear to increase the odds of admission: applying early decision, applying for a fee waiver, applying for financial aid, and listing a computer science major. These uncontrollable items also increase the odds of admission: female gender, disadvantaged status, race (African American or Hispanic), Dean’s Interest list, legacy, faculty child, and double legacy. These factors have a negative impact on admission odds: unspecified major, engineering major, biology major, and Asian American.

I hope I’m interpreting these tables correctly. If I made a mistake in this interpretation, please feel free to point it out.

I’m going to read the Card report later this weekend and will provide my comments then.

Useful ideas, @sgopal2. I’m looking forward to your thoughts on the Card reports.

To my mind, Card’s use of facially biased variables like the Personal Rating as a control does not pass the laugh test.

As well, his inclusion of special admit categories such as recruited athletes and development candidates in his models looks like an effort to goal-seek. No one thinks race plays much of an independent role for applicants who already have powerful hooks like recruited athlete, Dean/Director list, or legacy, so including those specially hooked students will of necessity weaken the measured significance of race in the portion of the applicant pool where race would be expected to make a difference.

@sgopal2
Thanks for the data analysis. A few questions:
I thought Harvard does not have early decision, only early action? And wouldn’t most athletes/legacies be applying in the early action round? How come double legacy appears less “helpful” (lower odds of getting admitted) than legacy alone?

You folks seem to be rebuilding the dictator based on a piece of his nose. As seen by the plaintiff’s rep. Lol.

I get the curiosity. But be aware you’re looking for pieces that fit, not examining firsthand. You can’t speak from authority, only guess at patterns after the fact. How many here have seen more than an app or two? How many understand the institutional priorities?

A “piece of the nose” that explains the majority of variance in admission decisions.

Harvard’s Dean of Admissions can speak from authority, has seen more than an app or two, and understands institutional priorities. Nevertheless, as shown in the lawsuit, when he wanted to learn how low-income status influenced admissions decisions, he asked the OIR for a statistical analysis. And the Harvard OIR provided a logit regression model of admissions, with coefficient magnitudes similar to those quoted in the post above.

Harvard’s expert in the lawsuit also agrees with this approach, saying, “I agree with Prof. Arcidiacono’s general approach, … The use of a multivariate logit model makes sense. Multivariate regression analysis is a widely accepted and common statistical technique in both academia and litigation…” He goes on to build his own, similar logit model of admission decisions, which came to similar conclusions for the most part (the Asian American coefficient differed).

It’s standard practice to analyze how well admissions decisions meet institutional or other goals via statistical regression analysis, rather than just take someone’s word for it or assume admissions cannot be modeled when they include holistic criteria.
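For anyone curious what such a model looks like in code, here is a minimal sketch of a multivariate logit fit using statsmodels. The file name and column names are hypothetical placeholders, not Harvard’s actual data.

```python
# Sketch of a multivariate logit admissions model, in the spirit of the
# OIR/Arcidiacono/Card analyses. "applicants.csv" and the column names
# are hypothetical placeholders, not the actual Harvard dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("applicants.csv")

model = smf.logit(
    "admitted ~ academic_index + legacy + athlete + early_applicant + female",
    data=df,
).fit()

print(model.summary())        # coefficients and standard errors
print(np.exp(model.params))   # the same coefficients expressed as odds ratios
```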

On using numbers, one thing that has struck me about the Harvard admissions process is that it does not make any attempt to quantify or “rate” the personal essay. This differs from Duke’s process, for instance, in which ratings are in fact assigned to essays. (We learned about this process from Arcidiacono’s earlier study on the time progression of GPA differences and major-switching behavior; except for rating the essay, it is otherwise broadly similar to Harvard’s.)

From the Harvard documents, it is clear that unobservable aspects of the personal essay factor into both the Personal and Overall Ratings, yet the testimony in the depos is that there are no hard and fast rules, even for junior adcoms (unfortunately, we cannot see - yet - what is contained in the interview and “reading” procedures that do exist).

Notably, we see little difference or obvious evidence of bias by race/ethnicity in the alumni interviews, LoRs, school counselor reports, and evaluation of ECs, which also presumably contain subjective and largely unobservable judgments. However, we do see large differences in the Personal Rating, and enormous ones in the Overall Rating. It’s not clear to me how many different adcoms are involved at the Overall Rating stage (the “final reader”), but if I were plaintiff’s counsel I’d start sharpening up my cross-examination knives right now.

I think there are quite a few people here who have had direct experience with HYP, both as students themselves and as alumni and parents. I wouldn’t worry too much that people don’t understand the institutional priorities.

That being said, I am surprised at just how important development candidates are for an institution as wealthy as Harvard - perhaps I shouldn’t have been, given the performance of HMC since Meyer left.

Also, the extent of the legacy preference is a surprise. Legacies who were academically merely in the top half (roughly) of the applicant pool, meaning an Academic Rating from 2- through 2+, had a blended admit rate of 55% over the classes of 2009-2016. (I doubt there were more than a few legacies who were Academic 1s, as there are so few generally, so really we are talking about the 2s.) This is an extraordinary preference considering that this is just a single academic measure, and it is higher than even the raw admit rates for perfect-score/perfect-GPA candidates. See p. 4 of the exhibit here: http://samv91khoyt2i553a2t1s05i-wpengine.netdna-ssl.com/wp-content/uploads/2018/06/Doc-421-112-May-1-2013-Memorandum.pdf

This isn’t about students, applicants, and families, though, what they experience once through the gates. It’s about the gatekeepers. And as I continue to say, what you see in an applicant you know is not the presentation that student makes in his or her app. This is more than getting a middling LoR. It’s the content and thinking. The app is the vehicle, not simply the transcript and an EC list. Bright kids can make a lot of mistakes. Even before you’re at the point when institutional needs predominate, the high stats kids who seem to have it all going for them, to family and friends, can make mistakes.

Regardless of comments that 80% could do the work, only about 10% of those stand out. (So of about 16k past the first cut, maybe 1600. Call it more, if you wish. But final decisions are not made across that whole 80%.)

It’s not about national awards, how few test takers achieve perfection, who has the most AP, who ran what fundraiser. Nor is it about (CC advice to) just do what interests you. And not to write an essay someone could pick up and know it’s you. Etc.

Read a lot of chance threads and you have an example of how so many focus on stats and some good sounding ECs. They miss the rest. If you want a tippy top, you need to think on their level. That’s more than high school Top Dawg.

Merit is not just stats. Or a list of ECs. So complaints that it’s not about “pure meritocracy” still miss the larger point. It’s not about stats-based merit alone. Nor just a seemingly impressive EC list. (Research is not a tip, e.g. It is good, but not a defining point. What does matter is the fact that a kid did stretch, took on some form of challenge, and did more than just a few HS-based activities.) And you can see from chance threads how many kids miss this.

Teacher LoRs reflect the high school classroom. Alums don’t see apps and generally aren’t privy to the decision factors. Each of these offers its own legit view. But the full app is key. That’s by the student’s hand and mind. Nor do you need a separate essay rating to see the whole of what a kid is presenting.

The top performing Asian American kids are great. Usually activated. But just “being” great is not enough. We’ve all seen the Common App and a range of supplements, right? This is much more than come as you are. Each question adds to form an overall presentation.

Lots of strawmen here. The more numbers that come out the more straw gets stuffed.

Can’t wait for the Harvard trial. I’m laying in my popcorn supplies early.

My advice would be for any kid yet to apply (or their parents) to truly research the college, what it says and shows, and not get stuck on what looks good in the one high school. Unfortunately, many top performers don’t. CC keeps telling them to look at the CDS info about…stats. And to only do the activities they want (lest it be seen as padding or their interests too broad). They get sidetracked by the number of vol hours (often easy things or the random hours of NHS), and they miss the larger perspective that lots of kids program apps, start a blog, tutor peers, and that founding a club, holding a fundraiser, etc., is not an “it.” Some don’t even do the ECs that matter to their prospective majors.

And the kicker is that so many cannot answer the persistent “Why Us?” question. (Some colleges do not ask it directly, but look for answers in other sections of the full app.)

That’s not straw.

@makemesmart: Professor Arcidiacono obtained a dataset from Harvard that covers the admissions cycles for the graduating classes of 2014-2019. He merged this with data from the College Board.

Early Action/Early Decision: Arcidiacono uses the terms interchangeably in his report. But you’re right, Harvard has EA, not ED. The meaning is that early applicants have 3.6 times the odds of acceptance vs. non-early applicants.

Athletes in the early round: Arcidiacono created two distinct datasets: baseline and expanded. The baseline dataset excludes the special groups (EA, athletes, legacies, Dean’s Interest, faculty kids, etc.). The datasets are limited to domestic applicants only. The outcome of interest was simply acceptance vs. rejection; he didn’t look at EA vs. RD. Waitlist/defer decisions are also notably absent from the model.
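Roughly speaking, the baseline sample is the expanded sample with the special groups dropped. A sketch of how one might construct it (column names are hypothetical, not from the actual data):

```python
# Sketch: carving a "baseline" sample out of the full domestic pool by
# dropping specially hooked applicants. Column names are hypothetical.
import pandas as pd

df = pd.read_csv("applicants.csv")            # hypothetical applicant-level data
domestic = df[df["domestic"] == 1]

hooks = ["athlete", "legacy", "deans_interest", "faculty_child"]  # abbreviated list
expanded = domestic
baseline = domestic[~domestic[hooks].any(axis=1)]

# outcome here is a simple admit/reject flag; waitlist/defer outcomes would
# need their own handling if they were included
y_expanded = expanded["admitted"]
y_baseline = baseline["admitted"]
```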

Double legacy vs. legacy: For some reason Arcidiacono fitted both of these variables separately into his models. As you rightfully point out, all double legacies are by definition also (single) legacies, so the two indicators overlap heavily – “multicollinearity” in statistical jargon. He probably should have excluded double legacies from the model, or coded them differently, because the two variables are so highly correlated with each other. So the double-legacy results are somewhat misleading and probably shouldn’t be interpreted in that way. I’d need to see more of the model diagnostics (QQ plot, residuals, etc.) to know whether the double-legacy estimates are reliable, so I can’t say for certain. Those details are unfortunately not in the report.
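If one wanted to check how much the overlap actually matters, a quick collinearity check might look like this (hypothetical column names again):

```python
# Sketch: checking how strongly the legacy and double-legacy dummies
# overlap, via pairwise correlation and variance inflation factors.
# Column names are hypothetical placeholders.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

df = pd.read_csv("applicants.csv")
X = add_constant(df[["legacy", "double_legacy", "academic_index"]])

print(df["legacy"].corr(df["double_legacy"]))
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))
```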

Of course, they can be well rounded, by any respectable definition.
Let’s leave it at that.

@sgopal2
Card criticized Arcidiacono for excluding otherwise-unhooked early action candidates from the baseline dataset in Arcidiacono’s initial report. This fair criticism was accepted, and Arcidiacono included unhooked early action candidates in the revised baseline dataset in the rebuttal report. This seems a sensible modification.

About double legacy, I thought it was modeled akin to an interaction term where double legacy was estimated as an additional coefficient over single legacy status alone, but I might be misremembering that.

@satchelsf: yes I’ve seen a bit of the back and forth between Card and Arcidiacono. I haven’t read Card’s report yet, but will write back here.

The double-legacy and single-legacy factors were included in all 6 of Arcidiacono’s models as independent variables. I agree that it would have made more sense to include double legacy as an interaction term rather than a separate main effect. Not sure why he did this. But the details provided in the report are not complete, so perhaps there is another explanation.
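For what it’s worth, here is a sketch of the two codings being discussed – double legacy entered alongside a nested legacy dummy versus mutually exclusive legacy categories. The data and column names are hypothetical, not from the report.

```python
# Sketch: two ways to code legacy status, and how the double-legacy
# coefficient's interpretation changes. Data and names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("applicants.csv")

# (a) Nested coding: every double legacy also has legacy = 1, so the
#     double_legacy coefficient reads as the *additional* effect on top
#     of single-legacy status (an interaction-style interpretation).
m_nested = smf.logit(
    "admitted ~ legacy + double_legacy + academic_index", data=df
).fit()

# (b) Mutually exclusive coding: single_legacy and double_legacy never
#     overlap, so each coefficient is measured against non-legacies.
df["single_legacy"] = ((df["legacy"] == 1) & (df["double_legacy"] == 0)).astype(int)
m_exclusive = smf.logit(
    "admitted ~ single_legacy + double_legacy + academic_index", data=df
).fit()

print(m_nested.params, m_exclusive.params, sep="\n")
```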

MODERATOR’S NOTE: Please remember that there is only ONE thread on CC in which race in admissions can be discussed. I had to delete several posts and am closing this thread.