Analyzing Wordle (may contain spoilers)

After seeing the other thread, I started playing Wordle recently. I enjoy the game and analyzed parts of it using a modified version of a simulator I found on GitHub. This post summarizes some of the results and contains spoilers that could make the game less enjoyable for some.

STARTING WORD
Among all 5 letter words allowed on Wordle, the most commonly used letters are:

  1. s
  2. e
  3. a
  4. o
  5. r
  6. i
  7. l
  8. t
  9. n
  10. u

The list above uses all valid Wordle words. However, the NYT Wordle answers are not a random selection of all available words. For example, they do not use plurals of 4 letter words that end in ‘s’, or words with similar ‘ed’ or ‘ing’ type extensions. They also favor more commonly known words and follow various other patterns. If I restrict the 5 letter words to NYT-type Wordle words, placing a strong weight on recent actual answers, then the list changes to:

  1. e
  2. a
  3. r
  4. o
  5. t
  6. l
  7. s
  8. i
  9. c
  10. n

In general, choosing any word with 5 unique letters from the frequently used letters above will do well. Assuming answers are limited to NYT-type Wordle words, I found the lowest average number of guesses with the following starting words:

  1. ROATE
  2. PRATE
  3. SLATE
  4. TRACE
  5. ORATE
  6. CRATE
  7. REAST
  8. CARET
  9. STARE
  10. TRAPE

There are clearly some patterns. All 10 of the words include the letters ‘e’, ‘a’, and ‘t’. 9 of the 10 also include the letter ‘r’. The remaining letter is more variable. There are also patterns in letter placement. For example, 9 of the 10 have the A in the middle position. The top ranked word ROATE makes sense. It uses the 5 most common letters listed above, in the most logical order. ORATE and OATER also use these letters, but in a clearly less favorable order, since fewer words begin with O than with R. However, the 2nd word PRATE may be less obvious. The starting P is slightly beyond the top 10. I suspect PRATE does well partly because of letter positioning and partly because the P is especially useful for some specific troublesome NYT words that would take 4+ guesses without knowledge of the P.
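The pattern counts above are easy to check. This is only a minimal Python sketch over the 10 words listed in this post; the helper names are made up for illustration and are not from the simulator.

```python
# The 10 starting words listed above.
TOP10 = ["ROATE", "PRATE", "SLATE", "TRACE", "ORATE",
         "CRATE", "REAST", "CARET", "STARE", "TRAPE"]


def words_containing(words, letter):
    """Number of words that contain the letter anywhere."""
    return sum(letter in word for word in words)


def words_with_letter_at(words, letter, index):
    """Number of words with the letter at a zero-based position."""
    return sum(word[index] == letter for word in words)


for letter in "EATR":
    print(letter, "appears in", words_containing(TOP10, letter), "of 10")
print("A in the middle position:", words_with_letter_at(TOP10, "A", 2), "of 10")
```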

The difference between the top ranked word and the 10th ranked word above is a negligible 0.03 in average number of guesses with optimal strategy. With such slight differences between starting words, the lowest average depends on minor details of strategy and allowed words, which helps explain why different people who have done a starting word analysis have come to different conclusions. Some of the top ranked starting words from others who have done this type of simulation include ROATE, TRACE, REAST, CRATE, CRANE, SALET, and TRAPE (others only seem to rank TRAPE top for hard mode). Making good later guesses is more critical than finding the optimal starting word.

LATER GUESSES / STRATEGY
As noted above, strategy for later guesses depends on what set of solutions you are using. I am assuming the NYT-type word list mentioned above, meaning that solutions can only come from this list. I am also optimizing for the lowest average number of guesses. If the goal is instead the highest chance of a 2 guess solution or the lowest chance of a >4 guess outcome, the strategy differs.

This more important part of the strategy is also the more complicated one, and it is difficult to describe or optimize, which is part of what makes the game interesting. There is no simple solution. In the future I plan to compare some non-optimal strategies that are more straightforward to describe and use in the real world. Some possible optimal 2nd words for ROATE are below. While these 2nd words produce the lowest average number of guesses, many are not obvious.

For example, if you only get an ‘a’, I expect most people would choose a 2nd word that used an ‘a’ and some common letters besides the ones in ROATE. I expect very few would consider using LYSIN as a 2nd word, even if they had a large vocabulary. LYSIN is a good choice because the most commonly used letters in the remaining solutions that have an ‘a’ and no ‘ROTE’ are ‘l’, ‘y’, ‘n’, ‘i’, and ‘s’. LYSIN hits all 5 of these desired letters. This is slightly better than trying to also include the ‘a’ with INLAY or LAYIN.

If nothing hits, 2nd word is SLIMY

If only r and wrong position, 2nd word is SCULK
If only o and wrong position, 2nd word is SNOOL
If only a and wrong position, 2nd word is LYSIN
If only t and wrong position, 2nd word is SHUNT
If only e and wrong position, 2nd word is FIELD

@Data10: I love your analysis and logic.

I’ve been using TEARS as my first guess and, depending on results, WOULD as my second guess. That takes care of all the vowels except Y and I, and I almost always get the word in four tries, and not infrequently in three tries.

The other thing to consider is what I call “consonant pairs” – e.g., the combo of TH or PH or CH, or combos at the end such as NT or RT or ST.

The thing that trips me up – and, I know from reading our threads, other people as well – is when there are several words that could fit what I’ve figured out. For example, if I am sure of

_LO_E

then the answer could be:
CLONE
GLOVE
GLOBE
BLOKE
ELOPE

and undoubtedly some others.

Thanks @Data10, that analysis is interesting! I personally find that ruling in/out consonants is more valuable to me than vowels - thoughts? I agree with @VeryHappy, once I know a letter or two I definitely look for consonant pairs. I also agree the hardest is when multiple words can fit and/or when words have 2 double letters. Given your info, I may change my approach - my first word is always SMART (and ‘m’ is not on either list) and, if no letters from SMART are in, I go to BLEND (no ‘b’ or ‘d’ on either list), so I’m missing ‘c’ and ‘n’ from your second list. Always love your statistical analyses - thanks!

OK - stick with the NYT y’all.
I downloaded a Wordle app - and then deleted it.
It is not the same; it told me the double letters in the word I guessed were both right, but they couldn’t both have been in the word. And after hints, only one of the double letters was in the word even though both showed as a yes. SO . . . that was confusing and it is now GONE. Sticking with NYT.

It’s all about letter combinations. The easier ones are two letter combos…but sometimes you need a three letter one: rts, spr, str, nts, and so on.

Some of the top 10 starting words from the first post use the same 5 letters as TEARS, but in a different order, since NYT doesn’t include solutions that are plurals ending in S. If you are instead playing Wordle on a platform that randomly chooses from any possible 5-letter words including plurals, then TEARS is probably a better choice than the words from the first post.

The way I usually handle this type of situation is to find a word that distinguishes between the possible solutions, rather than try to guess the solution. For example, rather than guessing a word with the listed correct L, O, and E, I might instead guess BINGE. BINGE should determine which of the 5 listed solutions is correct, so I am guaranteed to get the result in 2 guesses (assuming the solution is one of the listed words).

if ‘n’ hits, solution is CLONE
if ‘g’ and not ‘b’, solution is GLOVE
if ‘g’ and ‘b’, solution is GLOBE
if ‘b’ and not ‘g’, solution is BLOKE
if no hits, solution is ELOPE

However, guessing GLOBE is also a good option… 1/5 chance of guess in 1, 3/5 chance of guess in 2, and 1/5 chance of guess in 3. It has the same average of 2 guesses as BINGE, but there is greater variance.
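For anyone who wants to check the BINGE vs GLOBE comparison, here is a minimal Python sketch. It assumes the five listed words are the only possible solutions, and that after the first guess you simply try the words that share a feedback pattern one by one. The function names are made up for illustration and are not from the simulator.

```python
from collections import defaultdict

# The five candidates listed in the thread for the pattern _LO_E.
CANDIDATES = ["CLONE", "GLOVE", "GLOBE", "BLOKE", "ELOPE"]


def feedback(guess, answer):
    """Wordle-style feedback: G = right spot, Y = wrong spot, - = not present."""
    result = ["-"] * len(guess)
    unmatched = defaultdict(int)
    # First pass: mark greens and tally answer letters that were not matched.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: mark yellows, using up unmatched answer letters.
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def guess_counts(first_guess, candidates):
    """Guesses needed per candidate, assuming (for illustration) that words
    sharing a feedback pattern are then tried one by one."""
    groups = defaultdict(list)
    for word in candidates:
        groups[feedback(first_guess, word)].append(word)
    counts = {}
    for words in groups.values():
        extra = 1
        for word in words:
            if word == first_guess:
                counts[word] = 1  # solved by the first guess itself
            else:
                counts[word] = 1 + extra
                extra += 1
    return counts


for first in ("BINGE", "GLOBE"):
    counts = guess_counts(first, CANDIDATES)
    average = sum(counts.values()) / len(counts)
    print(first, counts, "average:", average, "worst:", max(counts.values()))
```

BINGE splits the five words into five different feedback patterns, while GLOBE leaves CLONE and ELOPE with the same pattern, which is where the possible 3rd guess comes from.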

I know all this is true, but it seems to me you are optimizing for getting it in three, and ensuring you will never get it in two.

I think of it more as ruling out letters that are likely to be among the solutions. These are often commonly used letters, which may be vowels or may be consonants. Using a real example, suppose someone played today’s NYT Wordle with a first guess of ROATE. The hits would be:

?O?T?

There are dozens of possible 5 letter words that meet these criteria, but most of them are not likely NYT Wordle solutions, such as plurals or obscure words. The more likely options include the 9 words listed below. In this example, a ‘u’ vowel or 2nd ‘o’ could help, but an ‘i’ would not (‘a’ and ‘e’ were ruled out). The most frequently used unknown letters in these remaining words are ‘h’, ‘y’, ‘u’, ‘m’, and ‘s’. ‘o’ and ‘t’ are also useful in different positions. Some of the listed possible words use these letters, so MOUTH is a good guess. Or one could rule out more options by guessing something like MUSHY. Both of these include a mix of a ‘u’ + consonants.

booth
lofty
motto
mouth
month
sooty
south
tooth
youth
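The letter counts for this list can be reproduced with a short Python sketch. It assumes the 9 words above are the only candidates and counts each letter once per word, looking only at the positions not already fixed by the green O and T. The function name is made up for illustration.

```python
from collections import Counter

# The nine likely answers listed above for the pattern ?O?T?.
CANDIDATES = ["booth", "lofty", "motto", "mouth", "month",
              "sooty", "south", "tooth", "youth"]
KNOWN_POSITIONS = {1, 3}  # zero-based positions of the green O and T


def unknown_letter_counts(words, known_positions):
    """Count how many words contain each letter in a not-yet-known position."""
    counts = Counter()
    for word in words:
        letters = {ch for i, ch in enumerate(word) if i not in known_positions}
        counts.update(letters)  # each letter is counted once per word
    return counts


counts = unknown_letter_counts(CANDIDATES, KNOWN_POSITIONS)
for letter, n in counts.most_common():
    print(letter, n)
```

On this list ‘h’ shows up in 6 of the 9 words, and a 2nd ‘o’ in 4 of them.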

Yes, I tend to play it safe and get it in 3 rather than go for 2 and risk 4+ guesses. The optimal strategy depends on whether your goal is lowest average number of guesses, best chance of 2, or least chance of 4+. My natural style is to avoid risking the especially poor outcome. I sometimes find myself erring in this direction in other games, unless I work through the odds.

Oh, I totally get it. I am very risk averse IRL, but much less so for games. I play a card game we call “Screw your neighbor” (but is called “Oh hell” by most of the rest of the world) with Caltech friends, who sometimes take it way too seriously! I keep saying that one of these days I am going to review probability and figure out the odds for some of the very common situations, but I haven’t gotten around to it.

So is everyone here not playing in hard mode?

Initially I had not realized there was a hard vs non-hard mode but started out playing it the hard mode way. Once someone pointed out the option, I tried playing in non-hard mode but couldn’t shake the hard mode approach.

What is the difference between hard mode and regular?

If a letter you guessed in your first round of play is correct, you must use that letter in all subsequent guesses. So…if you have a green T in the first position and a yellow A in the second position, your next round word must start with a T and include an A.

Editing to add that you cannot use any incorrect letters in subsequent rounds. Using the example above, round two could not include the three incorrect letter choices from round one.

Thanks! I guess I play the same way as @CT1417 - once there are known letters, my next guess will include them and, if in the correct spot, then only in that spot. So I would only consider MOUTH, not MUSHY. I do find it frustrating if I have to “waste” an extra guess but more if I pick HEARD, for example, and it was HEART and I had both in my head but just guessed wrong. Thanks - I find the discussion interesting. We are big game players and I find it interesting to see strategy and risk aversion in gaming!

I’ve always played in hard mode also. I guess I never considered the strategy of blowing a guess in order to help solve when the puzzle is one of those letter combos that could easily have multiple (wrong) choices.

I also did not realize the NYT puzzle does not use plurals. I don’t know that I’ve ever guessed a word that was in plural form, but at least I know now not to.

Great analysis, Data10! Seems like Data10 is not playing in “hard” mode. I think it’s a bit of a misnomer since it’s “harder” in terms of adding a restriction, but also saves careless people like me from forgetting about a certain yellow or green letter. The statistical analysis is fascinating but my love for the game definitely includes the dopamine hit aspect (go for a lucky two instead of a safe road to three…) and I like thinking about words more than I like thinking about statistics! Grateful there are people like Data10 to bring the objectivity and evidence-based technique; I save that for my day job and live my fantasy life on Wordle (I’m not normally a gambler…)

This is exactly what I did! ROATE and MUSHY. Got it in 3 guesses.

My strategy is the same - be cautious and get it right in 3 rather than 4 or more guesses. I consider getting it in 2 as a matter of luck more than anything else, and that doesn’t excite me as much as strategy.

The 2nd guess is the most important to me. I use it to strategize and eliminate as many words as possible with the intention of getting it right in 3. I’ll go for 2 only if I know in my first row that there are only 2 answers possible; then I’ll take a guess.

Nice discussion here!

I don’t have hard mode turned on, but I basically play in hard mode.

I didn’t have it turned on for the first couple of weeks of January b/c I didn’t realize there was another way to play.