"Race" in College Admission FAQ & Discussion 9

<p>

</p>

<p>Sounds like blackmail to me.</p>

<p>

</p>

<p>I am glad to see you make a distinction between a racist and an EAGS. I think of him more as a chauvinist. He seems to be quite full of himself, have a rather low opinion of humanities and social sciences, and some of his students as well. I remember he once said that he felt like crying reading some of their stuff, and this is aside from physics. Pretty harsh.</p>

<p>He did acknowledge to a poster that you were correct but did not say anything further on your analysis. I was waiting for the other shoe to drop but, disappointingly, nothing happened.</p>

<p>score = a*talent + b*prep + c*luck; a, b, c > 0</p>

<p>I wonder where simple tragedies, such as waking up with a terrible headache, fever, achy breaky heart, test apprehension, mild distractedness, writer’s block, etc., fall in this equation. If I were to classify them as luck, they would contribute negative luck.</p>

<p>As a parent, I have seen or heard of all of the above.</p>

<p>

</p>

<p>Of course your math is basic. As I “showed,” </p>

<p>750 = 20*talent + 30*prep + 10*luck</p>

<p>if “prep” is higher than that of a non-Asian applicant, as expected if the observation belongs to an Asian application, holding “luck” constant, “talent” has to go down. That is very basic math. I have never once said that was wrong. It is your INTERPRETATION and your POLICY PRESCRIPTION that are all hilariously mistaken. Yes, holding constant a score, talent and prep are perfectly negatively correlated: when one goes up, the other goes down. (The presence or absence of luck is irrelevant, as you well know.)</p>

<p>But this in no way suggests that unconditionally, talent and prep / Asian have any relationship, much less a negative one. All it’s saying is that one doesn’t have to be extremely talented to achieve a 750; one “simply” has to prep a lot. (More on this later.) Equivalently, one doesn’t have to prep a lot to achieve a 750 if one is extremely talented.</p>
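<p>As a rough illustration of the arithmetic in this post, here is a minimal sketch. It assumes a hypothetical population in which every (talent, prep) pair on a 0 to 30 grid is equally likely, and it drops the luck term. The grid and the function names are assumptions made for the example; only the coefficients 20 and 30 come from the equation above.</p>

```python
from itertools import product

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def score(talent, prep):
    # Coefficients from the post; the luck term is held at zero (assumption).
    return 20 * talent + 30 * prep

# Hypothetical population: every (talent, prep) pair equally likely,
# so talent and prep are unrelated by construction.
people = list(product(range(31), repeat=2))
r_population = pearson([t for t, p in people], [p for t, p in people])

# Keep only the people whose score is exactly 750.
fixed = [(t, p) for t, p in people if score(t, p) == 750]
r_fixed_score = pearson([t for t, p in fixed], [p for t, p in fixed])

print(r_population, r_fixed_score, len(fixed))
```

<p>In this toy population the correlation is zero overall and minus one among the 750 scorers, which is the “one goes up, the other goes down” pattern both posters agree on. The sketch says nothing about how that pattern should be interpreted.</p>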

<p>Again, going back to <a href="http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html">Kevin Murphy’s web page</a>, we see that another name for Berkson’s bias / fallacy / paradox is “explaining away.” Quoting Murphy,</p>

<p>

</p>

<p>Talent and prep / Asian indeed “compete” to “explain” the observed score. Thus, they become conditionally dependent (perfectly negatively correlated) given an observed score, even though in the population, we have no reason to assume any correlation between talent and Asian, despite sis’s many attempts to try to make me look like an “East Asian genetic supremacist.”</p>

<p>So what does this all mean? There is no relationship between Asian and talent. There only appears to be one when a score is fixed. In all the examples sis selectively gave, the two variables were KNOWN to be uncorrelated (truth). Yet, in certain SETTINGS, two variables known to be uncorrelated ACQUIRED a correlation. Does that mean they actually were correlated to begin with? No.</p>

<p>

</p>

<p>Oh, so you’re finally admitting, however grudgingly, that your whole “psychic powers” charade yesterday evening was total nonsense.</p>

<p>

</p>

<p>Thanks, sis!</p>

<p>

</p>

<p>Wow, what a slick bluff! I’d love to see your poker face in the flesh.</p>

<p>I mean, seriously, let’s step back and read this part of the paragraph in greater detail. sis is essentially threatening that if I don’t back off, I’ll embarrass myself and give him another opportunity to use one of his lame “defrocked” putdowns. Sorry, sis, you’re going to have to do better than an empty threat. The way you mentioned those two examples from Murphy and Pearl is ludicrous:</p>

<p>

</p>

<p>Let’s take a look at what sis is really doing here. Two examples “[from college admission] to illustrate the effect of conditioning” appear to be appropriate given that “the matter at hand” indeed relates to college admission. But as I have already mentioned, and as anyone can verify since I have hyperlinked to both Murphy and Pearl, the authors gave those examples to illustrate Berkson’s bias / fallacy / paradox. So somehow, Berkson’s bias / fallacy / paradox is “NOT a good way to explain the matter at hand” regarding college admissions, but Murphy and Pearl both saw fit to use examples from college admissions to describe Berkson’s bias / fallacy / paradox.</p>

<p>I don’t even know what you’re threatening to do, given that you admitted that “Berkson’s paradox is analogous (or a special case) [to your “meritocratic discounting” sham]…” So I’m calling your bluff. Go ahead, sis. Tell us why your special case of Berkson’s paradox involves a correlation that is not spurious.</p>

<p>“(The presence or absence of luck is irrelevant, as you well know.)”</p>

<p>The argument is the same if you consider additional variables or not, as long as the overall result of the academic factors and other endogenous choices, tends to be higher (all else equal) for Asians. For instance, you could break “preparation” into several sub-variables such as preparation level in each grade. It isn’t necessary that Asians are higher on every single factor, e.g., they could prepare less in 12th grade as a vacation from 11 previous years of preparation that are more intensive than other groups, as long as the total level of preparation ultimately attained tends to be higher. </p>

<p>The exception to this is if there are score-raising factors that are strong in other groups and not Asians. English fluency and years in USA come to mind, and a correct evaluation of immigrants’ verbal scores would take those into account with a score bonus of some sort (and in theory, East Asian immigrants should get more than immigrants whose native languages are related to English). It is known that something like this is really the case in admission, and the Espenshade studies all found positive effects on admission from immigrant status; the media accounts about “discrimination” did not report this, although most of the Asian applicants in his study were in the immigrant categories that (according to the regressions) received these effects.</p>

<p>

</p>

<p>Who said anything about the unconditional distribution? The college admissions problem is to evaluate an applicant given the score. This means conditional on the score (what else did you think it could possibly mean?). And in the conditional distribution of the variables one is interested in, the relationships that you call “spurious” do hold, statistically. You agree that this inverse relationship between, say, the ASIAN and Talent variables in our running example, although spurious for the general population, is true for the conditional distribution. </p>

<p>The part you need to explain is why it is “wrong” or “paradoxical” to use the conditional distribution for the purpose of a performance-predicting admission. You are in effect advocating that such an admission entirely or sporadically disregard an applicant’s scores during the evaluation process, which is clearly suboptimal. Admissions is not running a sociological study on racial patterns in the general population, it is trying to predict outcomes for individuals who supply their score information.</p>

<p>I’m half Mongolian and half Persian. Should I put down asian and white on my application?</p>

<p>

</p>

<p>It is true for the conditional distribution, and that ought to tell you that something is amiss. The whole point of Berkson’s 1946 Biometrics Bulletin article was to caution against findings of relationships between risk factors and diseases that arise because of statistics; that is, there is no relationship unconditionally, but statistics will make it appear that a relationship exists, even without errors in the statistical computations.</p>

<p>In the general population, there’s no relationship between being Asian and talent. But yes, when you hold constant the SAT score and assume it is a function of talent and effort, which are independent variables both in the statistical sense and in the sense of being manipulated, then since Asians on average have higher values of effort than non-Asians, they will necessarily have lower values of talent.</p>

<p>A perfect negative correlation between effort and talent emerged only because of simple arithmetic; when effort goes up, talent has to go down. That doesn’t mean there is a relationship between the two.</p>

<p>Actually, being Asian has nothing to do with the discussion. If a college has any reason to suspect that a candidate, Asian or not, has a high value for effort, then the estimate of talent given a score of 750 and our assumed relationship must be low. The implication of this is really ridiculous when you think about it; for any given SAT score, people who work more are likely to be less talented. Does anyone actually believe that?</p>

<p>Again, to paraphrase <a href="http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html">Murphy</a>, the problem is that talent and effort “compete” to “explain” the given SAT score. They become conditionally dependent (perfectly negatively correlated) holding the score constant even though there is no relationship unconditionally.</p>

<p>

</p>

<p>Why? Don’t you consider it much more amiss that your reasoning leads to Fabrizio’s Paradox, which says that in judging an applicant’s potential given his test score, it is optimal for an admissions office to disregard the score (i.e., to use the distribution not conditioned on score)? </p>

<p>If it is correct to utilize the conditional distribution, then for purposes of predicting applicants’ outcomes after admission, the negative relationships between Talent and ASIAN, Effort, etc. hold, and the only thing “amiss” was your intuition.</p>

<p>That is exactly what happened (and happened, and …). You found it incomprehensible that a predictively valid, systematic, but negative relationship could exist between the variables Talent and ASIAN (or Effort, or Academic Preparation) in the evaluation of individual applicants given Score and race. This convinced you that something is amiss. But you cannot explain specifically anything that is actually wrong with the argument by direct reference to that argument alone, such as stating why it is invalid to consider a “mechanical relationship” (i.e., to assume that Score could in principle be represented by mathematics and statistical modeling), or why the distribution conditioned on the given values of Score and ASIAN is not the right one. Those two are your only surviving attempts at a criticism.</p>

<p>The present state of your two counterarguments, after a dozen requests for clarification, is the claim that “everyone can see” mechanical relations are un-kosher (QED!); and that my argument resembles some fallacious arguments that are analyzed in medical statistics papers about diseases and hospitals. </p>

<p>If the resemblance is as clear as you say (e.g., you say I am incompetent for not perceiving it) why are you unable to construct a specific correspondence between the terms in my argument like (Score, Talent, Effort, ASIAN) and functionally similar elements of the fallacies such as (Disease A, disease B, hospital, doctor) that would expose the error? An isomorphism of errors, so to speak. Throwing around terms like Berkson’s Fallacy or the Swiss Cheese Effect or McSnorgle’s Trichotomy is impressive, I admit, but how do you justify the assertion that my reasoning is equivalent to using the “hospital patients” distribution to reason about the “general population”, or vice versa? In fact, it is equivalent to using the hospital distribution to reason about diseases in the hospital, and the general population distribution to reason about diseases at large, which is exactly how it should be done for purposes of estimation and prediction.</p>

<p>

</p>

<p>Again, you miss the whole point of Berkson’s example. He assumed that there was no relationship between hypertension and skin cancer, that is, properly implemented studies should not find a link between the two. But he showed that a relationship could easily be found in seemingly scientifically controlled case-control studies as long as hospitalization rates among the three groups to be studied differed. That relationship arose purely because of statistics; by assumption, it did not exist in the general population, but statistically, it had to exist in the study.</p>
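<p>A minimal sketch of the hospital effect described in this post. All the numbers are assumptions chosen for easy arithmetic, not Berkson’s figures, and the names are hypothetical: two conditions A and B are independent in the population, each can lead to hospitalization on its own, and people are also hospitalized for unrelated reasons.</p>

```python
from fractions import Fraction
from itertools import product

P_A = Fraction(1, 10)      # assumed prevalence of condition A
P_B = Fraction(1, 10)      # assumed prevalence of condition B, independent of A
H_OTHER = Fraction(1, 20)  # assumed chance of hospitalization for unrelated reasons
H_A = Fraction(1, 2)       # assumed chance that A by itself leads to hospitalization
H_B = Fraction(1, 2)       # assumed chance that B by itself leads to hospitalization

def p_population(a, b):
    """Joint probability of (A=a, B=b) in the general population."""
    return (P_A if a else 1 - P_A) * (P_B if b else 1 - P_B)

def p_hospitalized(a, b):
    """Chance of being hospitalized given which conditions are present."""
    stay_out = (1 - H_OTHER) * ((1 - H_A) if a else 1) * ((1 - H_B) if b else 1)
    return 1 - stay_out

def odds_ratio(weight):
    """Odds ratio between A and B for a 2x2 table of (unnormalized) weights."""
    w = {(a, b): weight(a, b) for a, b in product((0, 1), repeat=2)}
    return (w[1, 1] * w[0, 0]) / (w[1, 0] * w[0, 1])

or_population = odds_ratio(p_population)
or_hospital = odds_ratio(lambda a, b: p_population(a, b) * p_hospitalized(a, b))

print(or_population, or_hospital)
```

<p>With these assumed rates the odds ratio is 1 in the population and 61/441 among hospital patients: an association appears inside the hospital that is absent outside it.</p>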

<p>It’s the same thing here. There is no relationship between being Asian and talent in the general population. But given our assumed relationship between the SAT score and talent / effort, we can easily show that since on average, Asians prep more (i.e. have higher values of effort than non-Asians), they will necessarily have lower values of talent (than non-Asians), for any given SAT score.</p>

<p>Arithmetically, that has to be true. But what are the implications, beyond using Asians polemically? Under your policy prescription of “meritocratic discounting,” if a college has any reason to suspect that applicant i with SAT score 750 has a high value of effort, then given our assumed relationship, he must have a low value of talent, regardless of racial classification. For example, if we believe that wealthy students can put in more effort than non-wealthy students due to being able to afford test prep both in nominal costs and opportunity costs (i.e. they do not have to work part-time jobs to support their families), then given our assumed relationship, colleges should estimate the talent of wealthy applicants as lower than that of non-wealthy applicants, for any given SAT score.</p>

<p>The correlation must hold true for any given SAT score and given our assumed relationship between the score and talent / effort. But, again, does anyone actually believe that? </p>

<p>

</p>

<p>Let’s go back to <a href="http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html">Murphy’s example</a>. He let C denote the event that someone is admitted to college, which is true if someone is either brainy (B) or sporty (S). In the general population, B and S are (statistically) independent. If we restrict our attention to a population of college students (C is true), then it can easily be shown that being brainy is negatively correlated with being sporty and vice versa. Why? Either property alone is sufficient to explain the evidence on C; the two “compete” to “explain” C.</p>

<p>Now, let C denote the event that someone has an SAT score of 750. To simplify the correspondence, let E and T be binary variables where a value of 1 indicates high effort or high talent, respectively, while a value of 0 indicates low effort or low talent, respectively. C is true if someone is either high effort (E) or high talent (T). In the general population, E and T are (statistically) independent. If we restrict our attention to a population of students who have earned an SAT score of 750 (C is true), then again, we can show that having high effort makes an applicant less likely to have high talent and vice versa. Why? Either property alone is sufficient to explain the evidence on C; the two “compete” to “explain” C.</p>
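<p>The binary setup in this post can be checked by direct enumeration. The sketch below assumes P(E) = P(T) = 1/2, which the post does not specify, and uses hypothetical helper names; C is taken to be true exactly when E or T is.</p>

```python
from fractions import Fraction
from itertools import product

P_E = Fraction(1, 2)   # assumed P(high effort); not given in the thread
P_T = Fraction(1, 2)   # assumed P(high talent); not given in the thread

def prob(e, t):
    """Joint probability of (E=e, T=t) when E and T are independent."""
    return (P_E if e else 1 - P_E) * (P_T if t else 1 - P_T)

def p_talent(condition):
    """P(T=1 | condition), enumerating the four (E, T) combinations."""
    kept = [(e, t) for e, t in product((0, 1), repeat=2) if condition(e, t)]
    total = sum(prob(e, t) for e, t in kept)
    return sum(prob(e, t) for e, t in kept if t) / total

p_t_population = p_talent(lambda e, t: True)                   # no conditioning
p_t_given_c = p_talent(lambda e, t: e or t)                    # C is true
p_t_given_c_and_e = p_talent(lambda e, t: (e or t) and e)      # C and high effort
p_t_given_c_no_e = p_talent(lambda e, t: (e or t) and not e)   # C and low effort

print(p_t_population, p_t_given_c, p_t_given_c_and_e, p_t_given_c_no_e)
```

<p>Under these assumptions the probability of high talent is 1/2 in the population, 2/3 among those with C, 1/2 among those with C and high effort, and 1 among those with C and low effort. Learning E changes the estimate of T only once C is known.</p>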

<p>Again, when we restrict our attention to situations where C is true, two independent variables BECOME dependent because of statistics. They “compete” to “explain” C, and so if one goes up, the other goes down. That’s all. To argue for “meritocratic discounting” based on a “one goes up, the other goes down” relationship that arises because we have restricted our attention to certain events is quite absurd.</p>

<p>

</p>

<p>The point was clear enough: learning statistics in 30 minutes didn’t work. The backup plan is to voluminously restate the undisputed material and keep stalling on the central question:</p>

<p>Why. Is. The. Conditional. Distribution. The. Wrong. One?</p>

<p>Equivalently: how does disregarding applicants’ test scores optimize admissions decision-making?</p>

<p>Your argument is that using the score (i.e., calculating in the conditional distribution) produces results that are surprising, paradoxical, not true in the “general population”, and not to your liking. That’s all true, but none of those statements is an explanation of why prediction, evaluation, and decision-making that disregards the known information, such as an applicant’s SAT score, is superior to using the known information, which means using the conditional distribution.</p>

<p>Note: that the conditional distribution gives wrong answers for some questions about the general population is interesting (as reflected in words like “paradox”) but the admissions problem does not involve those questions. It asks only how to estimate, or reason about, an applicant’s unobserved characteristics from the observed ones. If given an applicant’s score there is a relationship of the kind we discussed between a variable you can observe (being Asian) and one you cannot (Talent, Effort, Academic Preparation, etc.) then adding the knowledge that the applicant is Asian should raise the estimate of Effort/Preparation and lower the estimate of Talent.</p>

<p>

</p>

<p>The admissions problem takes place “in the hospital”, not the general population. Which is to say that you misunderstood the analogy between the problem we are talking about and the one you E-Z-learned from skimming a web page. This is one of the reasons I recommended to set it aside and reason about the admissions problem directly.</p>

<p>

</p>

<p>It is correct that “Asian” is the “race” designation for a Mongolian person in the usual case, and “white” is the race designation for a Persian person in the usual case. So you would have a basis for marking both “Asian” and “white” on the “choose one or more” optional form question about race. You also have the choice of not marking anything at all. That is up to you, by federal law. If you have some reason to think that this background is unusual and might add diversity to the next class at some college you desire, you can give more details about your background in the student-generated parts of the application form, such as your answers to short answer questions or in your college application essays. I have no idea how unusual a background this actually is, and how colleges might differ in considering such background as an admission factor or not. </p>

<p>Good luck in your applications.</p>

<p>fabrizio: “Now, let C denote the event that someone has an SAT score of 750. [etc]”</p>

<p>The patient (an applicant) is admitted to the hospital (the population of 750 scorers, or scores of 750+). The patient is hospitalized due to disease T (talent) or E (effort), or both. An admissions officer walks through the hospital and sees a patient. He is not told whether the patient has disease T or E.</p>

<p>The admissions problem is to estimate the probability that the patient suffers from T. </p>

<p>Coming a bit closer to the patient, the admissions officer sees that he is from ethnic group A. It is known that the incidence of disease E is substantially higher in group A. </p>

<p>The admission officer should raise his estimate of E and lower his estimate of T given that A is true. Given that A is not true, he should lower his estimate of E and raise the estimate of T.</p>
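<p>A minimal sketch of the update described in this fable, under assumed numbers that do not come from the thread: disease T has the same 1/2 incidence in every group and is independent of E, while E has incidence 4/5 in group A and 2/5 outside it. A patient is in the hospital exactly when E or T is present. The function names are hypothetical.</p>

```python
from fractions import Fraction

P_T = Fraction(1, 2)                                   # assumed P(T), same in both groups
P_E = {"A": Fraction(4, 5), "not A": Fraction(2, 5)}   # assumed P(E | group)

def p_in_hospital(group):
    """P(E or T | group), with E and T independent within each group."""
    return 1 - (1 - P_E[group]) * (1 - P_T)

def p_t_given_hospital(group):
    """P(T | in hospital, group); having T already implies being in the hospital."""
    return P_T / p_in_hospital(group)

def p_e_given_hospital(group):
    """P(E | in hospital, group); having E already implies being in the hospital."""
    return P_E[group] / p_in_hospital(group)

for g in ("A", "not A"):
    print(g, p_e_given_hospital(g), p_t_given_hospital(g))
```

<p>With these assumed incidences, a hospitalized patient from group A has E with probability 8/9 and T with probability 5/9; outside group A the figures are 4/7 and 5/7. The direction of the revision matches the post. The sizes depend entirely on the assumed incidences.</p>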

<p>This situation occurs in medicine in screening tests. A patient gets a positive result on a test for the presence of some disease. This could happen for two mutually exclusive reasons: having the disease, and getting a false positive on the test. The doctor (now playing the role of admissions officer in our fable) wants to determine whether the patient really has the disease or was a false positive. The test has different rates of effectiveness in different subpopulations, as is the case for some genetic screens. Given the patient’s race or ethnicity, one’s estimate of the chance that the test result is correct should be revised. Because there is a relationship – in fact a perfect 100% negative correlation – between the “has disease” and “false test result” outcomes, this means that given an applicant’s (I mean, patient’s) race one should also revise the estimate of how likely it is that they have the disease (or talent), given the score (I mean, test result).</p>
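<p>The screening-test version can be written as a short Bayes’ rule calculation. The prevalence, sensitivity, and false-positive rates below are assumptions picked for illustration, and the subpopulation labels X and Y are hypothetical.</p>

```python
from fractions import Fraction

PREVALENCE = Fraction(1, 100)    # assumed P(disease), same in both subpopulations
SENSITIVITY = Fraction(9, 10)    # assumed P(positive | disease)
FALSE_POSITIVE_RATE = {          # assumed P(positive | no disease), by subpopulation
    "X": Fraction(1, 20),
    "Y": Fraction(1, 10),
}

def p_disease_given_positive(group):
    """Bayes' rule: P(disease | positive test, group)."""
    true_pos = SENSITIVITY * PREVALENCE
    false_pos = FALSE_POSITIVE_RATE[group] * (1 - PREVALENCE)
    return true_pos / (true_pos + false_pos)

for g in ("X", "Y"):
    print(g, p_disease_given_positive(g))
```

<p>Here the same positive result implies a 2/13 chance of disease in subpopulation X and a 1/12 chance in Y, because false positives are assumed to be twice as common in Y.</p>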

<p>

</p>

<p>Because. It. Introduces. Selection. Bias. Your finding is totally spurious; it is purely a result of two factors “competing” to “explain” the outcome (i.e. an SAT score of 750).</p>

<p>Given our assumed relationship between score, talent, and effort, sis wants everyone to believe that Asians should be seen as less talented than non-Asians for any SAT score because holding constant the SAT score, high values of effort must be accompanied by low values of talent, or else the equality is violated.</p>

<p>So in the general population, Asians are neither more talented nor less talented than non-Asians, but among high schoolers who have taken the SAT, Asians are less talented than non-Asians. Seems quite paradoxical, right? How can this be? Easy. Talent and effort “compete” to “explain” the given SAT score. When one goes up, the other goes down, or else the equality is broken.</p>

<p>That’s all there is to it: when one goes up, the other goes down. There is no actual relationship; there only appears to be one because we’re holding the score constant. Thus, the observed negative correlation between Asian and talent given a score is spurious.</p>

<p>

</p>

<p>E and T “compete” to “explain” the score. Having E makes it less likely to have T, and vice versa, even though E and T are statistically independent. That’s all there is to it. Two independent variables became conditionally dependent upon observing a fixed score. That in no way suggests that there’s any actual relationship between the two variables.</p>

<p>

</p>

<p>Please explain, in terms of the stylized example with binary variables E,T,A, and a numerical variable Score, which of the following probability distributions is the best one for reasoning about the probabilities of E or T or both, for a given candidate with Score=750 and Race=Asian. We do want to make a prediction/estimate about T and/or the pair (E,T) for the candidate, either in the form of a single guess as to the value, or a set of probabilities for the possible values (2 possibilities in the case of T, or 3 in the case of the (E,T) pair). We do not care whether the method used involves “true” or “spurious” relationships or some other technique, only about prediction accuracy. The available distributions are:</p>

<ol>
<li><p>The joint distribution of (E,T,A,Score), i.e., the “general population” distribution. Equivalently, the joint distribution of (E,T) derived from this.</p></li>
<li><p>The joint distribution of (E,T,A) conditional on Score=750, i.e., take account of the candidate’s score but not his race. Equivalently, the joint distribution of (E,T) derived from this.</p></li>
<li><p>The joint distribution of (E,T) conditional on Score=750 and A (Asian) being true.</p></li>
</ol>

<p>Which one will lead to the most accurate predictions and decisions? You can just write “1”, “2”, or “3” if that saves time.</p>
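<p>One way to make the question concrete is to compute, inside a fully specified toy model, the expected squared error of each option when predicting T for a candidate with Score=750 and A true. The sketch below assumes that Score=750 means “E or T”, that T is independent of A and E, and that P(A)=1/5, P(E|A)=4/5, P(E|not A)=2/5, P(T)=1/2. None of these numbers come from the thread, and the result holds only inside this assumed model.</p>

```python
from fractions import Fraction
from itertools import product

P_A = Fraction(1, 5)                           # assumed share of group A
P_E = {1: Fraction(4, 5), 0: Fraction(2, 5)}   # assumed P(E=1 | A=a)
P_T = Fraction(1, 2)                           # assumed P(T=1), independent of A and E

def joint(a, e, t):
    """Joint probability of (A=a, E=e, T=t) in the assumed population."""
    pa = P_A if a else 1 - P_A
    pe = P_E[a] if e else 1 - P_E[a]
    pt = P_T if t else 1 - P_T
    return pa * pe * pt

def p_t(condition):
    """P(T=1 | condition) by enumerating all (A, E, T) cells."""
    cells = [c for c in product((0, 1), repeat=3) if condition(*c)]
    return sum(joint(*c) for c in cells if c[2]) / sum(joint(*c) for c in cells)

def has_750(a, e, t):
    return bool(e or t)

estimates = {
    1: p_t(lambda a, e, t: True),                          # general population
    2: p_t(has_750),                                       # conditional on score only
    3: p_t(lambda a, e, t: has_750(a, e, t) and a == 1),   # conditional on score and A
}

# Under the model, T for a (Score=750, A) candidate is Bernoulli with this probability.
truth = estimates[3]

def expected_squared_error(q):
    """Expected (T - q)^2 over candidates with Score=750 and A true."""
    return truth * (1 - q) ** 2 + (1 - truth) * q ** 2

errors = {k: expected_squared_error(q) for k, q in estimates.items()}
print(estimates, errors)
```

<p>In this toy model the three estimates of P(T) are 1/2, 25/37 and 5/9, with expected squared errors 1/4, 3220/12321 and 20/81, so option 3 has the smallest error. That is a statement about prediction accuracy inside the assumed model only, not about whether the model describes real applicants.</p>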

<p>

</p>

<p>What is a “finding”? We are asking only what method best predicts T and/or (E,T), given the information at hand. We do not care whatsoever about sociological findings as to who is more talented, or whether there is a neat causal relationship as is the case with A and E. If there is a negative relationship between A(sian) and T(alent) it can be marked down to an accident of statistics rather than a “fact about reality”. This does not answer the question of whether the accident should be used or ignored for purposes of accurate prediction, when looked at from the purely statistical point of view excluding legal, practical or other non-statistical considerations.</p>

<p>

</p>

<p>Do you understand that most statistical methods in the real world are based on “non-actual” relationships? That physics and engineering take account of “fictitious forces”, such as the Coriolis force? The terms “spurious” and “reality” are relevant to interpretation of the models (e.g., do we make inferences about Asian talent or discrimination from the sign of an ASIAN coefficient in a statistical model) but not to the selection of which model is optimal or correct.</p>

<p>

</p>

<p>I rest my case…</p>

<p>

</p>

<p>…or not. Look, siserune, if you’re admitting (or at least accepting the possibility) that the negative relationship is nothing more than an “accident of statistics,” what reason would you have to believe that such an accident in any way aids “accurate prediction”? Would your predictions not be inaccurate as a result of relying upon things that appear to be true but in reality are not?</p>

<p>

</p>

<p>There’s the mistake. Two years of it.</p>

<p>There is no connection between what is “causally correct” (or straightforward, intuitively reasonable and so forth) and what is “scientifically correct” or “predictively accurate”, that is, optimal for actions, decisions, modeling and understanding of the world. It is relatively rare that every part of a model have a clear causal interpretation. </p>

<p>For example, if you construct a statistical model to predict the price of houses in terms of variables that all positively contribute to price, such as number of bedrooms and bathrooms and closets, then if there are enough variables you will probably find that one of the coefficients (let’s say, number of closets) is negative, so that increasing number of closets reduces the projected value. This is a “spurious negative relationship” since adding more closets, all else equal, increases the value of a house. If you react to that by throwing out the model, or removing closets from the model, then you are overruling the objectively selected model, which was chosen so as to optimize some sort of prediction or performance criterion, with an inferior one that impoverishes your understanding. It will literally impoverish you if you are buying or selling houses and using the wrong model to appraise their value.</p>
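<p>The post does not say which mechanism produces the negative closet coefficient. One possibility, sketched here with made-up data, is an unrecorded variable: each closet is assumed to add 5 to the price directly, but in this hypothetical market houses with more closets have lower unrecorded quality. The data, coefficients, and function name are all assumptions for illustration.</p>

```python
from fractions import Fraction
from itertools import product

# Hypothetical market: price rises with bedrooms, closets, and quality,
# but quality (not recorded by the appraiser) falls as closets increase.
houses = []
for bedrooms, closets in product((2, 3, 4), (1, 2, 3, 4)):
    quality = Fraction(5) - Fraction(closets, 2)
    price = 50 * bedrooms + 5 * closets + 40 * quality
    houses.append((bedrooms, closets, price))

def ols_two_predictors(rows):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    n = len(rows)
    m1 = Fraction(sum(r[0] for r in rows), n)
    m2 = Fraction(sum(r[1] for r in rows), n)
    my = sum(r[2] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    s2y = sum((r[1] - m2) * (r[2] - my) for r in rows)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

intercept, coef_bedrooms, coef_closets = ols_two_predictors(houses)
print(intercept, coef_bedrooms, coef_closets)
```

<p>The fitted closet coefficient is -15 even though the assumed direct effect of a closet is +5, and the fitted model still reproduces every price in this data set exactly. Whether real appraisal models behave this way is a separate empirical question.</p>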

<p>Now, if you would please choose door number 1,2 or 3 we can determine whether there is any further dispute about discounting of Asian scores in a “pure performance-predicting admission”. The correct answer is #3, the conditional distribution. Do you agree?</p>

<p>

</p>

<p>Well, see, sis, there’s a few problems here. First, you are now talking about something different. Back when you were making the case for your “meritocratic discounting” sham, you insisted that the goal was to predict the value of an independent variable (talent) given the value of the dependent variable (SAT score) and an assessment of the other independent variable (effort). With your housing example, the goal is to predict the value of the dependent variable (price) given a set of independent variables (e.g. bedrooms, bathrooms, and closets). Since we are no longer holding the dependent variable constant, there’s no reason to believe that if bedrooms goes up, then bathrooms has to go down, holding closets constant (cf. given an SAT score of 750, holding luck constant, if effort is high, then talent has to be low to maintain equality).</p>

<p>Second, you have no case to argue that the coefficient on closets (or on any of the other variables) is negative. At least with Asians and effort, you could cite Steinberg et al. (1997) to argue that Asians “exert more effort” (excuse me, “spend more time”) than non-Asians. Here, you simply assumed that the coefficient is negative with nothing to back it up.</p>

<p>

</p>

<p>No.</p>

<p>

</p>

<p>The words “probably will be negative” meant that (1) this happens frequently in practice, and (2) there are underlying mathematical reasons why the probability of having a negative coefficient in the regression rapidly approaches 1 as the number of variables gets larger, no matter whether the influence of each one is positive. It is also very easy to (3) artificially construct simple examples of regressions where the signs are “spurious”, and (4) we have already discussed two regressions from Espenshade’s papers where coefficients are “spurious” in the sense that their direct causal interpretation is known to be wrong. </p>

<p>So the question is not whether such examples can occur, but how your pseudoscientific dichotomy between spurious and genuine correlations would apply to them: statistical models, innocently designed to best fit the data, that show “spurious negative relationships”. Do you</p>

<p>A. throw out the model,<br>
B. accept the model as correct for predictive purposes,<br>
C. remove the variables with negative coefficients, or<br>
D. keep the model, but only if you like the result?</p>

<p>Actually, let’s consider Espenshade’s results in more detail. In his book he displayed a linear regression predicting college students’ class rank, and the coefficient ASIAN was negative (indicating underperformance relative to credentials and socioeconomics). This negative relation is “spurious” since changing one’s race or the language of one’s surname should have no impact on ability to perform academically, and Asians are not known to carry a grade reduction gene. If a university were to replicate Espenshade’s results on its own data, with Asian being a negative predictor of academic performance, then when predicting applicants’ academic performance is it your contention that the model makes worse predictions than the same regression run without the ASIAN variable? Or that one should just drop the Asian coefficient from Espenshade’s model and keep the others, or that the model should be treated as bogus, or what?</p>

<p>As we also discussed earlier, Espenshade’s regressions for admissions probability showed that it was not increasing in the number of AP exams. The direct causal interpretation of this is false, as having more academic credentials is always better, or at least no worse (and it correlates with other factors that should also become more favorable as you ratchet up the number of AP’s). Do you agree that this is a “spurious negative relationship” between more AP exams and admission probability? If so, does it mean the model is invalid for making predictions of admission chances?</p>