HA, HA, A Google Ngram of Amherst, Williams and Wesleyan (The Little Three)

<p>Substitute your own case-sensitive, comma-separated phrases:
Google Ngram Viewer
[thanks to an anonymous <em>Wesleying</em> shoutbox poster]</p>
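<p>For anyone who would rather generate these links than edit them by hand, here is a minimal sketch. It assumes the URL layout seen in the links posted in this thread (the <code>content</code>, <code>year_start</code>, <code>year_end</code>, <code>corpus</code> and <code>smoothing</code> parameters); the function name <code>ngram_url</code> is made up for illustration, and nothing here checks that the endpoint still answers.</p>

```python
from urllib.parse import urlencode

# Base URL and parameter names are copied from the links in this thread;
# this sketch does not check whether the endpoint still responds.
BASE_URL = "http://ngrams.googlelabs.com/graph"


def ngram_url(phrases, year_start=1800, year_end=2000, corpus=0, smoothing=3):
    """Build an Ngram Viewer link for a list of case-sensitive phrases.

    Hypothetical helper: the defaults mirror the values used in the thread.
    """
    params = {
        "content": ",".join(phrases),  # comma-separated, case preserved
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    # urlencode turns spaces into '+' and commas into '%2C'
    return BASE_URL + "?" + urlencode(params)


print(ngram_url(["Wesleyan University", "Amherst College", "Williams College"]))
```

<p>Because the phrases are case sensitive, "Wesleyan University" and "wesleyan university" would be separate queries.</p>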

<p>That’s pretty interesting. I wonder why there’s such a big difference.</p>

<p>So is this saying how many books in each time period mentioned the schools?</p>

<p>^Yes.
There are some stray references to <em>other</em> Wesleyan Universities, but not as many as you might think. The real culprit seems to be the imprint of the Wesleyan University Press backlist after about 1960. :p</p>

<p>With HYP (Harvard, Yale, Princeton) added in:</p>

<p><a href="http://ngrams.googlelabs.com/graph?content=Wesleyan+University,Amherst+College,Williams+College,+Harvard+University,+Princeton+University,+Yale+University&year_start=1800&year_end=2000&corpus=0&smoothing=3">Google Ngram Viewer</a></p>

<p>Google finds ten Wesleyan Universities!</p>

<p>Dakota Wesleyan University
Illinois Wesleyan University
Indiana Wesleyan University
Kansas Wesleyan University
Nebraska Wesleyan University
Ohio Wesleyan University
Oklahoma Wesleyan University
Southern Wesleyan University
Texas Wesleyan University
Wesleyan University, Connecticut</p>

<p>There are also 9 Wesleyan Colleges.</p>

<p><a href="http://ngrams.googlelabs.com/graph?content=Wesleyan+University,Amherst+College,Williams+College,Harvard+University,Princeton+University,Yale+University,Jesus&year_start=1800&year_end=2000&corpus=0&smoothing=3">Google Ngram Viewer</a></p>

<p>With Jesus added in :)</p>

<p>I like how Princeton has overtaken Yale in the last 20 years.</p>

<p>Well…Jesus is the winner</p>

<p>With “Brown University” (and minus Jesus):
<a href="http://ngrams.googlelabs.com/graph?content=Wesleyan+University%2CAmherst+College%2CWilliams+College%2CHarvard+University%2CPrinceton+University%2CYale+University%2C+Brown+University&year_start=1800&year_end=2000&corpus=0&smoothing=3">Google Ngram Viewer</a></p>

<p>Well, there is ONE that can beat Jesus.
<a href="http://ngrams.googlelabs.com/graph?content=Wesleyan+University%2CAmherst+College%2CWilliams+College%2CHarvard+University%2CPrinceton+University%2CYale+University%2CJesus%2CGod&year_start=1800&year_end=2000&corpus=0&smoothing=3">Google Ngram Viewer</a></p>

<p>And yeah, there are (relatively) a lot of Us with Wesleyan in the name.</p>

<p><a href="http://ngrams.googlelabs.com/graph?content=God,United+States&year_start=1800&year_end=2000&corpus=0&smoothing=3">Google Ngram Viewer</a></p>

<p>I’m actually baffled. Of course, God came out on top. But around 1920, 1940, and 1970, the good ol’ USA was on top.</p>

<p>With Wes Press + “Swarthmore College”:
<a href="http://ngrams.googlelabs.com/graph?content=Wesleyan+University+Press%2CAmherst+College%2CWilliams+College%2C+Swarthmore+College&year_start=1800&year_end=2000&corpus=0&smoothing=3">http://ngrams.googlelabs.com/graph?content=Wesleyan+University+Press%2CAmherst+College%2CWilliams+College%2C+Swarthmore+College&year_start=1800&year_end=2000&corpus=0&smoothing=3</a></p>


<p>Not exactly. I believe it is showing how many times the queried words/phrases occurred during each time period in the selected collection of books. I believe it is counting occurrences, not books: every occurrence within each book, that is.</p>
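<p>To make the occurrences-versus-books distinction concrete, here is a small sketch on an invented three-book corpus. The texts and the helper name <code>count_by_year</code> are made up for illustration, and this does not model whatever normalization the viewer itself applies.</p>

```python
# Toy corpus: each entry is (publication year, text of one "book").
# These strings are invented for illustration, not taken from Google's data.
books = [
    (1960, "Wesleyan University Press. Printed for Wesleyan University."),
    (1960, "A history of Amherst College."),
    (1961, "He left Wesleyan University for Amherst College."),
]


def count_by_year(phrase, corpus):
    """Return {year: (occurrences, books_containing_phrase)}."""
    totals = {}
    for year, text in corpus:
        hits = text.count(phrase)  # case-sensitive, non-overlapping matches
        occ, vols = totals.get(year, (0, 0))
        totals[year] = (occ + hits, vols + (1 if hits else 0))
    return totals


print(count_by_year("Wesleyan University", books))
# {1960: (2, 1), 1961: (1, 1)}
```

<p>In 1960 the phrase occurs twice but in only one book, which is the gap between the two ways of counting.</p>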

<p>Possible Issues</p>

<p>false positives (some of the “Wesleyan” hits might not be the Wesleyan you mean)
false negatives (some occurrences you expect may be missed due to spelling, usage, or orthographic variation)
coverage of the collections (the books chosen to comprise the collection may not be appropriate for what you are trying to analyze; for example, if the “English” collection consisted of nothing but church hymnals, or nothing but automobile repair manuals, you probably would not get good coverage of school names. We can see from the link that there is some coverage of school names … but how well does it represent what’s been on the minds of people we care about?)</p>

<p>You can experiment with expressions that you expect would not have entered the language before a certain date/period. Example: “under represented minority”, “hippie”, “bra burning” … “Stanford University”. In most such cases I’ve tried, the curve seems to take off about where I’d expect … although I also seem to be getting some patterns we might not expect if the samples were randomly spread across the entire 200-year period.</p>

<p>Actually, the other schools with Wesleyan in their names account for very little… Wesleyan University really is more popular:</p>

<p><a href="http://ngrams.googlelabs.com/graph?content=Wesleyan+University,Ohio+Wesleyan+University,Indiana+Wesleyan+University,Illinois+Wesleyan+University,Nebraska+Wesleyan+University,Southern+Wesleyan+University,Kansas+Wesleyan+University,Oklahoma+Wesleyan+University&year_start=1793&year_end=2008&corpus=0&smoothing=3">Google Ngram Viewer</a></p>

<p>I think Johnwesley’s astute observation about “Wes Press” is that Wesleyan University has its own publishing house that publishes books and periodicals. References to Williams, Amherst, et al. are actual references to those colleges, but the words “Wesleyan University” appear in every book and periodical that Wesleyan University Press publishes. Wes Press started in 1957; you’ll notice that that’s when the huge spike in literary references to Wesleyan University began.</p>


<p>Well, yes and no. If it were just a matter of twelve or thirteen books being published a year, it wouldn’t add up to very much. However, as Tk21769 points out @post#13, it’s probably the <em>references</em> to the books and articles by <em>other</em> books and articles that act as a multiplier. In essence, any reference to Wes Press is a reference to a signature part of Wesleyan’s academic output. :)</p>

<p>Keep in mind that we are running these n-gram queries against a specific “corpus”. How many Wesleyan University Press publications are likely to be included in the Google “English” corpus (collection)? </p>

<p>I suspect that when you get your hits back on “Wesleyan University Press”, chances are, you are getting a lot of hits on bibliographic citations … but perhaps not so many on the colophon pages of actual Wes Press publications.</p>

<p>In other words, in building its text collections, it’s unlikely that Google would pile on every single Wes Press publication even if they were all available in digital form. References to Wes Press just happen to show up in other books. </p>

<p>You have to be aware that for each hit on any string of N word-elements, it is possible that the hit is actually an unanticipated substring of N+M word-elements … and that, in such a context, it is really a false positive for your purposes. But as N gets larger, the probability of its being a false positive tends to decrease.</p>
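<p>Here is a rough sketch of that substring effect on an invented sentence. The word-level tokenizer, the helper names, and the two "longer name" checks ("Ohio" before, "Press" after) are assumptions chosen only to illustrate the point; they are not how Google builds its counts.</p>

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented sentence for illustration only.
text = "Ohio Wesleyan University and Wesleyan University Press cite Wesleyan University"
tokens = text.split()

query = ("Wesleyan", "University")
# Start positions of every 2-gram equal to the query.
positions = [i for i, g in enumerate(ngrams(tokens, 2)) if g == query]


def is_suspect(i):
    """A hit is suspect if a neighboring word extends it into a longer name."""
    before = tokens[i - 1] if i > 0 else ""
    after = tokens[i + 2] if i + 2 < len(tokens) else ""
    return before == "Ohio" or after == "Press"


suspect = [i for i in positions if is_suspect(i)]
print(len(positions), len(suspect))
# 3 2
```

<p>Two of the three 2-gram hits are really pieces of a 3-gram, whereas querying the full 3-gram "Ohio Wesleyan University" would have far fewer ways to match by accident.</p>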

<p>^^I think you’re right. If you look at the upper left-hand side of the Google Books search results for “Wesleyan University Press” you’ll see a hit count of 400,000. They can’t possibly all be from the title page of Wes Press publications :p <a href="http://ngrams.googlelabs.com/graph?content=Wesleyan+University+Press%2CAmherst+College%2CWilliams+College%2C+Swarthmore+College&year_start=1800&year_end=2000&corpus=0&smoothing=3">http://ngrams.googlelabs.com/graph?content=Wesleyan+University+Press%2CAmherst+College%2CWilliams+College%2C+Swarthmore+College&year_start=1800&year_end=2000&corpus=0&smoothing=3</a></p>