Just as with the inaccurate percentiles that were released for the PSAT, the SAT percentiles for the March administration of the new exam are derived from a “study sample,” not from the actual scores of test-takers. If you click on some of the information bubbles on the score reports, you find these statements:
User group percentiles are derived via research study samples of U.S. college bound students in 11th and 12th grades, weighted to represent students who typically take the SAT last as 11th or 12th graders.
Nationally representative percentiles are derived via research study samples of U.S. students in 11th and 12th grades, weighted to represent all 11th and 12th grade U.S. students, regardless of whether they typically take the SAT.
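To make the difference between the two definitions concrete, here is a rough sketch of how one study sample can yield two different percentiles depending on the weights applied. The scores and weights are invented for illustration, the function name is hypothetical, and "percentile" is taken here to mean the weighted share of the sample scoring at or below a given score. This is not the College Board's actual procedure.

```python
def weighted_percentile_rank(score, sample_scores, weights):
    """Percent of the weighted sample scoring at or below `score`."""
    total = sum(weights)
    at_or_below = sum(w for s, w in zip(sample_scores, weights) if s <= score)
    return 100.0 * at_or_below / total


# Hypothetical study sample: five students' total scores (made-up numbers).
scores = [900, 1000, 1100, 1200, 1300]

# Assumed weights: "user group" weighting leans toward students who
# typically take the SAT; "national" weighting counts all students,
# which gives lower scorers more influence.
user_weights = [0.5, 1.0, 1.5, 1.5, 1.0]
national_weights = [2.0, 1.5, 1.0, 0.5, 0.25]

user_pct = weighted_percentile_rank(1100, scores, user_weights)
national_pct = weighted_percentile_rank(1100, scores, national_weights)

print(f"User group percentile for 1100: {user_pct:.1f}")
print(f"National percentile for 1100:   {national_pct:.1f}")
```

With these made-up weights the same 1100 lands at a higher percentile under the national weighting than under the user group weighting, which is the direction you would expect when non-test-takers are included.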
I understand that College Board was in the practice of using the previous year’s test percentiles to norm the current year’s tests so that slight differences could be compensated for, and no one would have an advantage or disadvantage from choosing a particular test date. So I understand that, since this is a new test, they don’t have previous years’ tests to use, and are using these study group results to provide percentiles for this purpose. Nevertheless, I somehow have the feeling that the real percentiles of the actual test takers would have been a better choice.
That’s correct. I understand why the College Board tries to norm across a series of tests when they have the history to show that the test scores should generally be similar, but when they use a single sample study it has the opposite effect. As I said, they did that with the October PSAT and the published percentiles were significantly off-base.
@Agentninetynine I’ll have to see if I can “Get Smart” enough to answer this. I’m not sure, but I think it has to do with differences between groups of testers. There might be certain test dates that attract more able test takers (or less able ones), so the percentiles for one particular test date group might differ from another and need to be normalized. I’m not sure, but maybe it also makes things fair if one version of the test is found to be slightly easier or harder than another in the same year.
My son was actually in this test sample. Kids in his high school were paid $50 to take the new SAT in Dec 2015. The only requirement was that they had already taken the old SAT in Nov or Dec. So he actually ended up taking both SATs 5 days apart. His math score went from 780 to 750 and his reading score rose from 720 to 730. He did much better in the now optional writing section. The kids knew they wouldn’t get their scores until March, so I wonder if the seniors who took the test really did their best, because it wouldn’t count for admissions for them. This could account for some of the inaccuracies in the percentiles.
My daughter was also in the test sample. It sounds like the same deal as @robincorn.
Within 3 weeks, DD took the old SAT, new SAT, and as it happens, the ACT.
DD did best on the ACT with a 28.
Reading score went from 510 reading/640 writing to 700, which is more in line with the ACT (31, 29 on ACT). The 510 on the “old” reading was an outlier compared to any standardized test DD has taken in the past.
Math score went from 570 to 580 - this is in line with the ACT.
Essay went from 10 out of 12 to 8/8/8.
$50 went to the mall stores.
Personal sampling is far different from statistical sampling. In research, the sample group is identified randomly because random sampling most often results in a group that matches the entire population of testees for whom the test is designed. Statistically speaking, researchers can calculate how well the sample matches the whole population. Results from random samples are far better than trying to be too cute by selecting specific members of the population who match population characteristics. Through statistics, researchers can determine the size of the sample needed to result in solid information about the population.
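As a rough illustration of the sample-size point, the sketch below uses the standard textbook formula for estimating a proportion from a simple random sample, n = z² · p(1 − p) / e². The function name is hypothetical, and the 95% confidence level, 3-point margin of error, and worst-case p = 0.5 are assumptions chosen for the example, not figures from any College Board study.

```python
import math


def sample_size_for_proportion(margin_of_error, z=1.96, p=0.5):
    """Smallest simple random sample needed to estimate a proportion.

    margin_of_error: desired half-width of the confidence interval (0.03 = 3 points)
    z: z-score for the confidence level (1.96 is roughly 95%)
    p: assumed true proportion; 0.5 is the most conservative choice
    """
    n = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    return math.ceil(n)


# Assumed targets: 95% confidence, +/- 3 percentage points.
needed = sample_size_for_proportion(0.03)
print(needed)
```

Note that this formula assumes a truly random sample. It says nothing about a sample of paid volunteers, which is where the complaints earlier in the thread come from.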
Using a sample allows researchers to recognize small variations between testees on different test dates, declines or increases in test scores over time, and when tests need to be re-normed because current scores are outdated because of changes in the population. For example, IQ tests are re-normed when children have learned more information and are intellectually more sophisticated than past populations. Samples are used to set interpretive data such as scores, percentile ranks, the normal curve, mean (average), standard deviation (spread of scores around the mean) and so on, and to verify the validity (whether the test measures what it says it measures) and reliability (consistency of scores).
College Board could not use prior norms because of a significant change in the test: moving from paper-based to computer-based exams. Instead, CB researchers determined the consistency in scores from one format to another. If the rank order of scores on the paper-and-pencil version had differed significantly from the computer version, CB would effectively have developed a new admissions test that needed further research before marketing.
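One common way to check whether two versions of a test preserve rank order is Spearman's rank correlation. The sketch below is only an illustration of that idea with made-up scores for five hypothetical students; it is not a description of what CB researchers actually ran. A value near 1 means students are ordered almost identically on both versions.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for the same five students on two versions of a test.
old_version = [510, 570, 640, 720, 780]
new_version = [530, 580, 700, 690, 750]

rho = spearman(old_version, new_version)
print(f"Spearman rho: {rho:.2f}")
```

In this toy data only two students swap places between versions, so the rank order is mostly preserved.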
You may be confident about the norms between the old and new versions, meaning that the change in format did not result in modification of scores. For example, if you score at the 90th percentile on the paper version, you should earn a comparable (not exact) percentile rank on the computer version. Statistical sampling is very different from our usual sampling, such as tasting various spoonfuls of ice cream to decide which flavor we prefer at Baskin Robbins.
Thank you @hebegebe for the explanation of why the tests are curved. I suspected as much but wasn’t sure.
And thanks to @BunnyBlue and @zannah for your insights.
If you object to this, you may want to read about the item development and selection for the SATs and ACTs. It would be an eye opener. People’s objections are typically about steps that are pretty far into the process. Look at the test development!