The Economist asked OpenAI’s ChatGPT and Google’s LaMDA 10 reading questions from the SAT and 10 math questions from the AMC (American Mathematics Competitions). ChatGPT answered 9 SAT reading questions and 3 AMC math questions correctly, while LaMDA answered 7 SAT reading questions and 5 AMC math questions correctly:
With the help of an engineer at Google, we asked ChatGPT, based on an OpenAI model called GPT-3.5, and Google’s yet-to-be launched chatbot, built upon one called LaMDA, a broad array of questions. These included ten problems from an American mathematics competition (“Find the number of ordered pairs of prime numbers that sum to 60”), and ten reading questions from the SAT, an American school-leavers’ exam (“Read the passage and determine which choice best describes what happens in it”). To spice things up, we also asked each model for some dating advice (“Given the following conversation from a dating app, what is the best way to ask someone out on a first date?”).
Neither AI was clearly superior. Google’s was slightly better at maths, answering five questions correctly, compared with three for ChatGPT. Their dating advice was uneven: fed some actual exchanges in a dating app each gave specific suggestions on one occasion, and generic platitudes such as “be open minded” and “communicate effectively” on another. ChatGPT, meanwhile, answered nine SAT questions correctly compared with seven for its Google rival. It also appeared more responsive to our feedback and got a few questions right on a second try. Another test by Riley Goodside of Scale AI, an AI startup, suggests Anthropic’s chatbot, Claude, might perform better than ChatGPT at realistic-sounding conversation, though it performs worse at generating computer code.
The race of the AI labs heats up | The Economist
Paywalled for me. I’m interested in what “AMC math questions” means here. If it’s the first 10 questions of an AMC 10, I would actually expect better.
The Economist article doesn’t say. But from the example it gives, the math questions were probably from AMC 12.
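For a sense of the difficulty, the sample question quoted in the article ("Find the number of ordered pairs of prime numbers that sum to 60") can be checked by brute force. Here is a minimal Python sketch; the function names are my own, not anything from the article, and it assumes "ordered" means (7, 53) and (53, 7) count separately.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check; fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_ordered_prime_pairs(total: int) -> int:
    """Count ordered pairs (p, q) of primes with p + q == total."""
    # Each choice of p fixes q = total - p, so iterating over p
    # counts (p, q) and (q, p) as two distinct ordered pairs.
    return sum(
        1
        for p in range(2, total - 1)
        if is_prime(p) and is_prime(total - p)
    )


print(count_ordered_prime_pairs(60))  # 12
```

The six unordered pairs are 7+53, 13+47, 17+43, 19+41, 23+37 and 29+31, which gives 12 ordered pairs.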