So what was the historically easy SAT administration date? Comments?

fignewton · August 9, 2009, 2:29pm

^Xiggi, I don’t see how those two statements are contradictory: a comparison is made not between two students taking the same test, but between a group of students taking a new test with a group that took a previous test. I describe the scaling procedure below as clearly as I can; it is consistent with both statements. Regardless, scaling doesn’t work by parsing a sentence on a web site; rather, it is a logical, mathematical procedure well described in the PDF I linked to above and described below. I’m not a language person, so let’s not debate semantics. If you have a procedure in mind which makes math sense, and which is supported by both the CB web site and technical docs, please describe it in detail.

A standardized test simply doesn’t work unless you have a basis of comparison by which you can separate two situations: a) variations in test difficulty, and b) variations in the ability of the test-takers. Imagine a case where two SATs happened to be identical in difficulty and the first group of test takers was much stronger than the second. The distribution of raw scores would go down from the first test to the second. Now imagine a case where the first SAT was easier than the second, but the two groups of test takers were of the same ability level. Again the raw scores would go down. In the first case, the scaled scores should on average be lower for the second group: they are weaker students after all. The curve should therefore stay the same for both tests. In the second case, the scaled scores should on average be the same: the two groups of students have the same ability and it would be unfair not to help the second group with a more generous curve.

In general both the test difficulty and the test takers’ ability level can vary, and neither is known before the test date. How do you determine ability level separately from test difficulty? By comparing the new bunch of test takers with a previous bunch that had the same equating sections. The difference in the two distributions of raw scores on these sections (which have identical questions) tells the CB “OK, this new bunch of students is lower in ability than the previous bunch, since they didn’t do as well on these identical questions”. Or, it could say “OK, this new bunch of students is right in line in ability with the previous bunch, because they did just as well on these repeated questions.” Or, it could say: “OK, this new bunch of students is higher in ability than the previous bunch, since they did better on these identical questions”.

At this point the CB can correct for these varying ability levels before determining whether the test was easier, harder, or just right. For example, in the case of two groups with the same abilities, the two distributions of raw scores on the scored sections can be compared directly. If the second set of scores is lower on average than previously, for example, it is because the new test was a little harder than before and the curve should be more generous. In the case of the second group being lower in ability, the second distribution of raw scores on the scored sections would have to be shifted up (simplifying nasty math here) before a comparison to the scores of the first group can be made. If these two distributions are now the same, for example, then the test difficulty was just right and the curve should be the same as it was before.
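To make the two-step comparison concrete, here is a minimal Python sketch. It is not the CB's actual procedure (the white paper describes far more involved math); it only compares group means. The function name `difficulty_shift`, the `slope` parameter, and all the sample scores are made up for illustration.

```python
from statistics import mean

def difficulty_shift(old_anchor, old_scored, new_anchor, new_scored, slope=1.0):
    """Estimate how many raw points harder the new scored sections are.

    slope is an assumed conversion from equating-section points to
    scored-section points (hypothetical; a real procedure would estimate it).
    """
    # Step 1: ability gap between the groups, measured on identical questions.
    ability_gap = mean(new_anchor) - mean(old_anchor)
    # Step 2: the scored-section mean the new group would be expected to earn
    # if the new test were exactly as hard as the old one.
    expected_new_mean = mean(old_scored) + slope * ability_gap
    # Step 3: whatever shortfall remains is attributed to test difficulty.
    return expected_new_mean - mean(new_scored)

# Invented raw scores for the previous group.
old_anchor = [8, 10, 12]
old_scored = [28, 30, 32]

# Case 1: weaker new group, same test difficulty.
# They score lower on the equating section AND on the scored sections.
weaker_group = difficulty_shift(old_anchor, old_scored, [6, 8, 10], [26, 28, 30])

# Case 2: same ability, harder new test.
# Equating-section scores match, but scored-section scores drop.
harder_test = difficulty_shift(old_anchor, old_scored, [8, 10, 12], [26, 28, 30])

print(weaker_group)  # no adjustment: the curve stays the same
print(harder_test)   # positive: the curve should be more generous
```

In both cases the scored-section raw scores drop by the same two points; only the equating-section results tell the two situations apart.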

All of this leads to several effects: 1) curves aren’t known ahead of time and take after-the-fact number crunching to determine; 2) the curves will vary from one test to another, and the variation in any particular month is random; 3) percentiles are not fixed and the final score distribution is not adjusted to fit a particular bell curve (the % of people scoring 750 and above, e.g., will vary from one test to another, as will the average score); 4) your score does not depend on how well others did on the test.

pckeller · August 9, 2009, 2:55pm

Xiggi and Fig - I have a question for you about this post. (It is just curiosity about the theory – I don’t think any of this is important for the test taker, where so many more important and more controllable factors affect their score.)

If you are right that on a given day, the test taking cohort’s performance on the equating section is what they use to determine whether the group was a stronger or weaker group, how is it also true that other people’s scores don’t affect mine? For example, suppose I was –

Never mind. Just thought it through…OK, so here is what you have to do to game the system: you have to figure out what day of the year will have students who are likely to do really well on the experimental sections while bombing the sections that count. That’s the day to take the test. Shouldn’t be too hard to figure out…

Anyway, thank you both for the interesting posts.

xiggi · August 9, 2009, 3:21pm

Fig, inasmuch as I understand your theory and the white paper you quoted, I believe that items 1 and 4 in your conclusion cannot coexist.

1) curves aren’t known ahead of time and take after-the-fact number crunching to determine 4) your score does not depend on how well others did on the test.

If the scaling and the curves require number crunching after the fact and rely on ANY part of the performance of the students taking THAT particular test, it CANNOT be said “your score does not depend on how well others did on the test” because it intimates that the scores of the entire cohort are determined by an analysis of that same cohort against historical tests.

Simply stated, only one of the statements “1) curves aren’t known ahead of time and take after-the-fact number crunching to determine and 4) your score does not depend on how well others did on the test” can be true. After all, why would there be a need to crunch numbers of the last test if the performance of students would not affect anyone taking the same test? This is different from saying that the performance of a group on the experimental sections WILL affect the FUTURE test takers!

Please note that, for all I KNOW, your reliance on the white paper might very well lead to the ONLY truth. After all, the writers are not exactly outsiders offering idle speculation. As I wrote earlier, I prefer to rely on the clear and concise statement posted on the TCB site because it reflects a number of prior statements by the organization on the issue of equating.

And, again, I do not think that this “problem” has any relevance to any students planning to prepare or take the SAT tests.

aluminum_boat · August 9, 2009, 3:36pm

The only argument anyone can make for a good curve is with subject tests.
June ones have a harsh curve because everyone remembers stuff from AP and honors classes.
October on the other hand… Is easier curve-wise.

I think

pckeller · August 9, 2009, 4:15pm

Here is how you can reconcile the two seemingly conflicting statements.

They can make the statement “Your score is not affected by the scores of other test takers” while still using everyone’s scores to set the curve because, as Fig argues, they are using your scores TWICE. “Not affected” is not the same as “not used”.

If you use the results of the experimental sections to compare the group to other groups and then use the results of the group to set the curve, I think those effects cancel each other out in terms of your score being affected by the scores of the other test-takers. That’s what I meant when I made my tongue-in-cheek comment that you should find test-takers who score high in the experimental (so that the group seems smart) and then low on the rest (so that the test seems worthy of an easier curve). The fact that such a test-taking cohort is not likely to exist enables the SAT to claim that your score is not affected by your cohort.

fignewton · August 10, 2009, 5:36pm

This isn’t easy stuff, and I’m not pretending to comprehend the details (“item response theory” among other things). I think pckeller is starting to get it; I’m pretty sure I wouldn’t have understood this stuff when I took the SAT. None of these details matter to an SAT taker unless they come to an incorrect conclusion. Unfortunately, to understand equating/scaling even at a basic level requires reading the white paper, perhaps several times, or some other detailed description of the process. A few sentences on the CB web site (none of which I disagree with) about the consequences of the process could lead to different ideas about the process itself, unless you read the technical docs (specifically, see the second column on page 2 of the white paper).

Let’s take an example which I assume is what people are worried about: student X (an average math student, say, who robotically gets 500 or so every time) takes a new SAT when it just so happens a large group of strong math students take it, unlike the last time X took it. I have heard many people, on CC as well, say that this will cause X’s score to be lower than if those good math students hadn’t taken the new test. If this happened on a non-standardized test, say in high school, scores might very well be adjusted (curved) to fit a bell curve centered on a C, and X might now get a D instead of a C: those darn math students have pushed X’s score down! But the SAT doesn’t work that way: the test is not scored to fit a bell curve centered on 500 or to allow only a certain number of people to score above 750. In the above case, X will still get a 500 and there will simply be an unusually large number of high scores. The CB statement is saying exactly that of course: X’s score (or any other person’s score on that test) does not depend on, and is not affected by, those high scores by the good math group.

Does the above follow from the white paper and the process I’ve detailed? Yes. Without the equating sections, all the CB would know was that a lot of people had high raw scores (say, 50-54 on the math) on the new test. They wouldn’t be able to tell whether the test was too easy, or whether a large bunch of good math students took the test, since both would result in a lot of high raw scores compared to previous tests. And if they assume “new test was too easy”, they would have to use a harsh curve, which pushes X’s score down. That would be unfair and not in agreement with CB’s web site!

But the equating sections and scaling process allow the CB to avoid this problem. The raw scores on the equating questions (which are the same as on the previous test) will be on average higher on the new test (since the good math students will get good raw scores on the equating section as well), and the CB says (after the new test administration): “OK, we had a large bunch of good math students taking the test. If the new test is identical in difficulty to the previous test, we will see the same proportion of high raw scores on the scored math sections”. If the scored part of the test is the same difficulty as the previous test, the CB is not fooled by the unusually large number of high raw scores on the scored math sections, assigns a typical curve, and X gets a 500 as usual, and there are lots of 700s, 800s. No bell curve here!

The CB can’t assume that the new test is the same difficulty, but now they can figure that part out too since they can separate test difficulty from quality of the test takers. Let’s say the test were too hard: X would get a raw score of 20, say, instead of 27 and even the good math students would look more normal; perhaps the distribution of raw scores looks just like the previous test (which had no large math bunch). The CB might otherwise assign a typical curve since the distribution of raw scores is the same, and X’s score would be a 400 (say). But with the equating section results, the CB now knows the quality of the test takers is high and expects a distribution of new raw scores that is on average better than before. Since it isn’t, they know “Oops, test was way too hard”, assign a very generous curve which gives X’s raw score of 20 a scaled score of 500.
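The numbers in this example can be traced through a short Python sketch. Everything in it is an assumption for illustration: the straight-line `typical_curve` is simply drawn through the two points mentioned above (raw 20 gives 400, raw 27 gives 500), and the 7-point adjustment stands in for whatever the equating step would actually conclude about the harder test. Real curves are not linear.

```python
def typical_curve(raw):
    """Hypothetical raw-to-scaled curve through (20 -> 400) and (27 -> 500)."""
    scaled = 400 + (raw - 20) * 100 / 7
    # Reported scores come in steps of 10.
    return int(round(scaled / 10) * 10)

def equated_score(raw, difficulty_shift):
    """Apply the typical curve after crediting the test's extra difficulty.

    difficulty_shift is the number of raw points by which the equating
    step judged the new test harder (assumed known here).
    """
    return typical_curve(raw + difficulty_shift)

# Test of normal difficulty, strong cohort: no adjustment, X's raw 27 -> 500.
x_normal = equated_score(27, 0)

# Test that was too hard: X only manages raw 20, but the equating sections
# reveal the difficulty, so a generous curve restores the 500.
x_hard = equated_score(20, 7)

# What X would get if the CB wrongly applied the typical curve anyway.
x_unadjusted = typical_curve(20)

print(x_normal, x_hard, x_unadjusted)
```

Note that nothing in `equated_score` looks at any other individual's raw score; the cohort only enters through the single difficulty adjustment.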

Are lots of comparisons being made between raw scores on this test and the previous one? Yes. The equating sections are compared to determine the ability level of the test takers. Using that information, the raw scores on the scored sections of the new test vs. the previous one are then compared to determine test difficulty. Is a comparison being made between X’s raw score on this test and the raw scores of the large math bunch on the same test? In a distributional sense, yes. Figuring out averages, standard deviations, skews and so forth implies that these same-test comparisons are being made. But, in the end, X’s score doesn’t depend on, and isn’t affected by, the existence of the large math bunch.

Whew! Too much typing. I agree, Xiggi, what really matters is that we aren’t at odds about the stuff that’s important for test takers to know. Namely, tests are of random difficulty (the difficulty doesn’t matter anyway thanks to the curve), and your score isn’t affected by other people’s scores on that test.

112358 · August 10, 2009, 7:01pm

Summary of this thread: It doesn’t matter when you take the SAT.

Bigb14 · August 10, 2009, 7:03pm

^Well said.

println · August 10, 2009, 9:03pm

What do you guys think about these statistics: http://www.erikthered.com/tutor/SAT-Released-Test-Curves.pdf

And the graphs on that page say that from left to right = easiest to hardest. Isn’t it the other way around?

pckeller · August 10, 2009, 9:15pm

Statistically, I don’t believe in easy SATs or hard SATs. But that does not mean I am completely without superstition/mistrust. I usually recommend that whenever possible, take the SAT on a date when QAS is available, even if you don’t plan on ordering it. My demented logic is that since the test will eventually be made public, they can’t do anything screwy with it. (But exactly what screwiness I am worrying about I can’t actually say. So this is really as pointless as worrying about the curve. But still…)

orange_peel · August 10, 2009, 9:34pm

So rather than read 5000 words of this, have you guys basically come to the conclusion: There is NO easier test date. Right?

1253729 · August 10, 2009, 10:23pm

Yes, there is NO easier test date. It’s a standardized test. A 750 in October will be just as hard to get as a 750 in March and a 200 in November will be just as easy to get as a 200 in June.

QuantMech · August 11, 2009, 12:14am

I read the CB material the way that fignewton does, and I posted something similar to pckeller’s comments on an earlier thread when the curve issue came up. I agree, what you would really like is to have a group of fellow testers who do really well on the equating section and then (comparatively) bomb the part that counts! I went beyond this to guess that students who had extensive test prep courses might have that pattern–after all, the equating questions must have appeared on the SAT before, so the prep companies should be on to them, even if they can’t use them verbatim. So, if prepped students preferred some particular dates . . . The SAT-Released-Test-Curves data look fairly random as to easier/harder months (at a quick glance), but they don’t cover all administrations during the year.
