Is Any Month's SAT Curve Easier? A Mini-Study and an Answer

<p>Introduction</p>

<p>There are many theories about which month’s SAT curve is the “easiest.” One of the most common is that the May test has the kindest curve because that is when the majority of slackers who have put off the test finally take the SAT, dragging the curve down. However, this theory has never been proven. In fact, there is relatively little evidence that any sitting has an easier, or for that matter harsher, curve.</p>

<p>Interestingly, those who believe that certain curves will be easier forget an important point: the test is curved to account for difficulty. Kinder curves accompany harder tests, while harsher curves accompany easier tests, which makes each test essentially equivalent to the others. Nonetheless, it is still worth asking the question: Are any SAT curves easier?</p>

<p>This Mini-Study</p>

<p>This mini-study seeks to address the question of SAT curve difficulty. Thanks to data gathered by erikthered, it is possible to view the curves of 20 past SATs. This mini-study uses statistics to analyze whether any of the curves is actually easier (or harsher) than the others.</p>

<p>Throughout this mini-study I employ a bit of statistical jargon. If you haven’t taken stats, don’t fret; just skip down to the conclusion where I explain the results in plain English.</p>

<p>WARNING!!</p>

<p>I must warn against drawing any definitive conclusions!!! I do draw my own personal conclusions at the end, but I do not guarantee them to be 100% correct. This data is likely insufficient to warrant adjusting one’s testing schedule. This mini-study was done merely out of curiosity. I do not believe it is prudent to use it to decide which test to take in hopes of scoring an easier curve. As I said earlier, the tests are curved to adjust for difficulty. Also, there is the chance that I made a mistake somewhere along the line.</p>

<p>Methodology</p>

<p>The data came from erikthered’s PDF, which lists 20 past curves: 7 from October, 6 from May, 6 from January, and 1 from March. Because there was insufficient data for March, only May, January, and October were compared. The curves were also compared by section, giving nine month/section combinations.</p>

<p>To measure the difficulty of a curve, I first took the average of the scaled scores for the top 15 raw scores of each sitting (similar to what erikthered did). A higher average indicates a kinder curve (and a harder test), while a lower average indicates a harsher curve (and an easier test). Each month’s data was then combined and averaged. Two-tailed hypothesis testing was used to compare the three months. Positive t-scores indicate a kinder curve and negative t-scores a harsher curve. Two-tailed alpha was set to 0.1, so the critical cutoff was roughly 1.943 standard deviations.</p>
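<p>To make the method concrete, here is a minimal sketch in Python. The curve numbers are made up for illustration (the real data came from erikthered’s PDF), and <code>t_stat</code> is a hypothetical helper implementing a standard two-sample t statistic, not the author’s exact code.</p>

```python
from statistics import mean, stdev

# Hypothetical scaled scores for the top 15 raw scores of two sittings
# (made-up numbers; real values came from erikthered's curve tables).
may_curve = [800, 790, 770, 760, 750, 740, 730, 720, 710, 700,
             690, 680, 670, 660, 650]
october_curve = [800, 780, 760, 750, 740, 730, 720, 710, 700, 690,
                 680, 670, 660, 650, 640]

# Step 1: summarize each sitting by the mean of its top-15 scaled scores.
# A higher mean suggests a kinder curve (more scaled points per raw score).
may_avg = mean(may_curve)
oct_avg = mean(october_curve)

# Step 2 (per month, across sittings): a two-sample t statistic comparing
# the per-sitting averages. With only ~6-7 sittings per month, the degrees
# of freedom are small, hence the ~1.943 critical cutoff at alpha = 0.1.
def t_stat(sample_a, sample_b):
    na, nb = len(sample_a), len(sample_b)
    va, vb = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    return (mean(sample_a) - mean(sample_b)) / ((va / na + vb / nb) ** 0.5)
```

With these made-up numbers the May curve averages higher, i.e. looks kinder, but the t statistic is what decides whether such a difference is significant.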

<hr>

<p>Statistics Box</p>

<p>A common theme that runs through this mini-study is the z-score/t-score. Simply put, the z-score is the number of standard deviations a data point lies from the average. Standard deviation is a measure of how widely spread the data is: roughly, the average distance of a data point from the center.</p>

<p>Another number that is used is the p-value. The p-value is the probability that a result occurs because of natural variability alone. If a p-value is very high (for example, 50%), that indicates there is a 50% chance the effect was due to natural variability.</p>

<p>Alpha is the maximum acceptable p-value for the result to be considered “statistically significant.” Significance indicates that the result is likely not due to chance and that there is something going on. Statistically significant should not be confused with the common definition of significant, which normally means “a lot.” Results (as you shall soon see) can be statistically significant without actually being significant.</p>
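<p>For those who want the definitions above in runnable form, here is a small sketch (standard formulas, not anything specific to this study) computing a z-score and its two-tailed p-value via the normal distribution:</p>

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    # Number of standard deviations that x lies from the mean mu.
    return (x - mu) / sigma

def two_tailed_p(z):
    # P(|Z| >= |z|) under a standard normal, using the error function
    # (the normal CDF is 0.5 * (1 + erf(z / sqrt(2)))).
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# At alpha = 0.1, the two-tailed normal cutoff is about 1.645:
# two_tailed_p(1.645) is just under 0.1, so |z| >= 1.645 is "significant".
```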

<p>The second test broke each curve down into its top 15 scaled/raw scores. Each of the 15 highest scaled scores for each sitting was compared to the average scaled score, across all the curves, for the identical raw score. Thus, for each SAT sitting I generated 15 data points giving how far above or below average the scaled score was for the same raw score. To be honest, I’m not sure this is allowed in stats. But I went ahead anyway!</p>

<p>Once again, I performed a two-tailed hypothesis test with alpha = 0.1. Because of the large sample size, 1.645 became the critical cutoff. The resulting z-score signified how many standard deviations the curve’s scaled score was from the average scaled score for the corresponding raw score. Each month had between 90 and 105 data points.</p>
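<p>A sketch of the second test, with entirely made-up deviations, shows why roughly 100 data points per month makes even small average deviations clear the 1.645 cutoff:</p>

```python
from statistics import mean, stdev

# Hypothetical: for one month, each value is a scaled score minus the
# all-sitting average at the same raw score (~15 points x ~6 sittings).
deviations = [3, -2, 5, 0, 1, -4, 2, 6, -1, 0] * 9  # 90 made-up points

# z-score of the month's mean deviation. With n around 90-105, the
# standard error (stdev / sqrt(n)) shrinks, so a mean deviation of only
# a point or two can still exceed the 1.645 critical value.
n = len(deviations)
z = mean(deviations) / (stdev(deviations) / n ** 0.5)
```

Here the mean deviation is just 1 scaled point, yet the z-score is well past 1.645: statistically significant, but practically tiny, exactly the pattern discussed in the results below.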

<p>Results</p>

<p>The tables below show the results. Positive z-scores indicate a kinder curve and negative z-scores a harsher one. The absolute value of a z-score indicates the significance of the result, with higher absolute values indicating greater significance. I also averaged the three subjects to get each month’s overall z-score. The first table is for the first test, where each curve was compared as one unit; the second is for the second test, where the curves were broken down into smaller parts. These tables DO NOT show how far the curves actually were above or below average, only the z-scores.</p>

<table>
<tr><th>Full</th><th>January</th><th>May</th><th>October</th></tr>
<tr><td>Math</td><td>0.2685</td><td>0.710</td><td>-1.115</td></tr>
<tr><td>Crit. Read</td><td>-1.025</td><td>-0.812</td><td>1.788</td></tr>
<tr><td>Writing</td><td>-0.928</td><td>-0.148</td><td>0.352</td></tr>
<tr><td>AVG</td><td>-0.561</td><td>-0.083</td><td>0.342</td></tr>
</table>

<table>
<tr><th>Broken</th><th>January</th><th>May</th><th>October</th></tr>
<tr><td>Math</td><td>1.351</td><td>2.770</td><td>-3.816</td></tr>
<tr><td>Crit. Read</td><td>-3.511</td><td>-2.803</td><td>5.845</td></tr>
<tr><td>Writing</td><td>-2.648</td><td>0.269</td><td>2.202</td></tr>
<tr><td>AVG</td><td>-1.603</td><td>0.0787</td><td>1.411</td></tr>
</table>

<p>The following tables replace each number with Not Significant, High (significantly kinder), or Low (significantly harsher).</p>

<table>
<tr><th>Full</th><th>January</th><th>May</th><th>October</th></tr>
<tr><td>Math</td><td>Not Sig.</td><td>Not Sig.</td><td>Not Sig.</td></tr>
<tr><td>Crit. Read</td><td>Not Sig.</td><td>Not Sig.</td><td>Not Sig.</td></tr>
<tr><td>Writing</td><td>Not Sig.</td><td>Not Sig.</td><td>Not Sig.</td></tr>
</table>

<table>
<tr><th>Broken</th><th>January</th><th>May</th><th>October</th></tr>
<tr><td>Math</td><td>Not Sig.</td><td>High</td><td>Low</td></tr>
<tr><td>Crit. Read</td><td>Low</td><td>Low</td><td>High</td></tr>
<tr><td>Writing</td><td>Low</td><td>Not Sig.</td><td>High</td></tr>
</table>

<p>Discussion</p>

<p>As you can see from the tables, the results are rather contradictory. Although for several month/section combinations one test found the curve significantly harsher or kinder, in no instance did both tests agree that the curve was actually different. The fact that the two tests never both indicated significance for any month/section combo suggests that any “significance” must be taken with caution.</p>

<p>When the curves were taken as a whole (all three sections), none of the months were significantly different from the average. When averaged out through all three sections, none of the months is more than .56 standard deviations from the average in the first test. This indicates that there is a very high probability (close to 29%) that this result was due to natural variability. Remember, the chance that it is due to natural variability must be less than 10% for the result to be considered significant.</p>

<p>For the broken tests, the results were a bit different. Several month/section combos displayed high enough significance to be different from the mean. Some even reached up to 5.8-sigma confidence (enough to prove the existence of a new particle). However, although the differences were statistically significant, the actual differences were minuscule. At best, the October critical reading curve was above average (indicating an easier curve) by a mere 5.09 points; differences of just 2 or 3 points were much more common. The likely reason the results were so statistically significant is that the method generated roughly a hundred data points per month, considerably shrinking the standard error.</p>

<p>When the broken tests were averaged out, none of the months was statistically significantly different from the others. The one that came closest was the January sitting. (The reason it is not significant at alpha = 0.1 is that a two-tailed test was used; had it been one-tailed, the p-value of 0.0545 would have been significant at alpha = 0.1.)</p>
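<p>The one-tailed versus two-tailed distinction for January can be checked directly. This sketch uses the standard normal CDF and the January average z-score of -1.603 from the table above:</p>

```python
from math import erf, sqrt

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

# January's averaged z-score was about -1.603 (harsher than average).
one_tail = normal_cdf(-1.603)   # probability of a result this low or lower
two_tail = 2 * one_tail         # probability of a result this extreme either way

# one_tail comes out near 0.0545 (under alpha = 0.1), but two_tail is
# roughly double that (over 0.1), so the two-tailed test misses the cutoff.
```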

<p>Conclusion</p>

<p>The primary conclusions that can be drawn are as follows. Keep in mind that these are merely my conclusions and are not guaranteed to be correct:</p>

<p>[ul]
[li]In no case did both statistical tests indicate significance for any scenario. This suggests that there may be no month/section combination that is sure to be significantly different from the average curve.[/li][li]In the broken test, many of the section/month combos actually were significant. However,** the effect was very small; typically less than 5 points with differences of 2-3 being more common.[/li][li]Variation was enormous.** Although this was not shown, the variance in the broken tests was quite large with differences between the average scaled score for a given raw score and the actual scaled score being as much as 26 points. [/li][li]Thus, even if you were to try and use these results to pick an appropriate date, you could get unlucky and very likely swing the opposite direction.[/ul]</p>[/li]
<p>To put it sweetly and simply for those of you who are getting tired of slogging through AP stats jargon:</p>

<p>There were no month/section combos that were significantly different from the average in both statistical tests. In the ones that were significant in one test, the effect was so small that it would not warrant changing testing date.</p>

<p>Once again, keep in mind that although the curves may be (very slightly) different from month to month, the tests are created so that the difficulty of the test and the curve cancel out so that each sitting is roughly the same difficulty.</p>

<p>Some of you may have noticed something that I did incorrectly. Because each month represented roughly one third of the data, I violated the 10% condition, which says that no more than 10% of the population may be sampled. However, the reason for this condition is that sampling more than 10% makes the sample stop behaving normally and instead resemble the population. But because the population was itself roughly normal, I did not think this would be a major issue.</p>

<p>If there is anything that I did incorrectly throughout my mini-study, please tell me and I will try and fix it. I typed this up in a single day so I didn’t spend a lot of time developing a perfect method.</p>

<p>Finally, thanks to erikthered/fignewton for posting all of the past SAT curves. This mini-study would not have been possible without you.</p>

<hr>

<p>Questions? Comments? Concerns? Compliments? Rants? Rages? Post away!</p>

<p>ok, can someone summarize this?</p>

<p>what’s the best time to take the test?</p>


<p>Nice little investigation you did there. And I could actually follow the math after taking AP Stats now!</p>

<p>Do you have a link to the last 20 scoring tables, that would be very useful info to have, thanks.</p>

<p>Uh, this is from a year ago guys :stuck_out_tongue:
I doubt he’s still responding.</p>

<p>Dang, I forgot that I did this. Glad you guys enjoyed it. And carlucill, if you just google search “erikthered SAT curves” it’s the first link that pops up.</p>

<p>Unfortunately, cheerioswithmilk, you reveal in your thread title and in your opening lines that you don’t know what you’re talking about.</p>

<p>Look up the word curve*–you should fairly quickly realize that the SAT doesn’t have a curve.</p>

<p>Do some basic research into “equating”**, and you will see it has nothing to do with grading on a “curve.” In fact, equating is starkly different. Grading on a curve means that one student’s score is affected by the performance of other students. But College Board clearly states, “Equating also ensures that a student’s score does not depend on how well others did on the same edition of the test.” (emphasis added)</p>

<p>If you understand equating, then you also understand that it doesn’t matter if the score conversion appears easier on one date than another. An easier score conversion doesn’t mean that a student does better on that test–the whole point of equating is to make sure that students perform comparably on all tests, despite differences in content difficulty.</p>

<p>*Try here: <a href="http://en.wikipedia.org/wiki/Grading_on_a_curve">Grading on a curve - Wikipedia</a><br>
**Try here: The SAT – SAT Suite | College Board</p>

<p>As others have pointed out, “the curve” has become a colloquial expression for the chart that converts raw score to scaled score. </p>

<p>But I still don’t see the point of this analysis. Here is why:</p>

<p>Suppose that this analysis revealed that every test had the exact same scale except for two exceptions. Say that -2 = 760 every time in both reading and math. But for some reason, January was always -2=740 and March was -2=800. This would be the kind of stark pattern you were hoping to find when you began the analysis of all those old curves, so let’s pretend you found it. What would you do with this information?</p>

<ol>
<li><p>You could decide to take March because the “curve” is so lenient. But wait – it’s lenient for a reason. The equating sections have revealed that the test was truly hard (and it wasn’t just that the students were weak – they did well on the equating questions). To benefit from the “lenient” curve, you will need to conquer the harder questions that led to the lenient curve in the first place.</p></li>
<li><p>You could decide to take January figuring that the curve was harsh because the actual questions were easier. The students did well, even though they did not do unusually well on the equating questions. So the actual test must have been easier that day. Who wouldn’t want to take an easier test? Oh wait – the curve will be harsher.</p></li>
</ol>

<p>So pick your poison. Either way, your score depends on what YOU do, not what the “curve” was. And besides, as Cheerios has found, there is no pattern anyway. So just go prepare for the test and then take the one that suits your schedule.</p>

<p>“As others have pointed out, ‘the curve’ has become a colloquial expression for the chart that converts raw score to scaled score.”</p>

<p>Who has pointed that out?</p>

<p>Perhaps you need to read more carefully, because the very first paragraph of cheerioswithmilk’s original post makes clear that he (or she) is talking about an actual curve–in which a student’s ultimate score is affected by his or her performance relative to other test takers. He (or she) is not using some nonsensical shorthand for the score conversion. See here:</p>

<p>“One of the most common theory is that the May test has the kindest curve because it is when the majority of slackers who have put off taking the test take the SAT, hence bringing down the curve. However, this theory has never been proven.”</p>

<p>To repeat, “curve” already has an established meaning where grading is concerned – a meaning with which all students are already familiar. To continue the inaccurate use of the term “curve” in the context of SAT scoring, even when you know full well how misleading it is, is to help perpetuate a basic misunderstanding that leads to so many bad choices and wastes so many people’s time.</p>

<p>pckeller, I have known real, actual students who have forgone the opportunity to take the SAT on what–in hindsight, at least–appeared to be the opportune date, simply because of some superstitious belief that that was when “all the seniors” took it or “all the smart kids” took it.</p>

<p>Do you really see no harm in perpetuating this kind of foolishness simply because you take some inexplicable comfort in referring to something (an emphatically uncurved grading scheme) as its opposite (a “curve”)?</p>

<p>Not sure how you can read my post and say that I am helping to perpetuate this foolishness. It was my intention to show that trying to choose which day to take the SAT based on what you think the score conversion chart will be like is useless even if you COULD predict what the chart will be like – and you can’t.</p>

<p>Maybe I didn’t say it clearly enough or maybe you didn’t read it clearly enough. Maybe both. In any case, I’ll say it again: just prepare for the test and take it whenever it suits your schedule. [I don’t see how you can parse that sentence so that it perpetuates the foolishness you are worried about!]</p>

<p>As for the colloquial use of the term “curve” here at CC, it was mentioned in another thread. So I was trying not to take credit for a point someone else had made. </p>

<p>I have another suggestion about this, but I’ll put it in its own thread.</p>