After seeing the other thread, I started playing Wordle recently. I enjoy the game and analyzed parts of it using a modified version of a simulator I found on GitHub. This post summarizes some of the results and contains spoilers that could make the game less enjoyable for some.
STARTING WORD
Among all 5 letter words allowed on Wordle, the most commonly used letters are:
- s
- e
- a
- o
- r
- i
- l
- t
- n
- u
The list above uses all valid Wordle words. However, the NYT Wordle words are not a random selection of all valid words. For example, they do not use plurals of 4 letter words that end in ‘s’, or words formed with similar ‘ed’ or ‘ing’ extensions. They also favor more commonly used, well-known words and follow various other patterns. If I restrict the 5 letter words to NYT-type Wordle words, placing a strong weight on recent actual Wordle words, then the list changes to:
- e
- a
- r
- o
- t
- l
- s
- i
- c
- n
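For anyone who wants to reproduce this kind of letter ranking, here is a minimal Python sketch. It is not the simulator mentioned above. The function name, the sample words, and the weighting scheme are all made up for illustration, and I am assuming each letter is counted once per word rather than once per occurrence.

```python
from collections import Counter

def letter_frequency(words, weights=None):
    """Count how many words contain each letter, optionally weighted per word.

    `weights` is a hypothetical dict mapping word -> weight, e.g. to give
    recent answers more influence. Words missing from it get weight 1.0.
    """
    counts = Counter()
    for word in words:
        w = 1.0 if weights is None else weights.get(word, 1.0)
        # set() counts a letter once per word, so "eerie" adds a single 'e'
        for letter in set(word.lower()):
            counts[letter] += w
    return counts

# Tiny made-up sample, NOT the real Wordle list
sample = ["slate", "crane", "trace", "pious", "eerie"]
freq = letter_frequency(sample)
print(freq.most_common(5))
```

Passing a real word list in place of `sample`, and a `weights` dict that favors past answers, would give a ranking in the spirit of the two lists above.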
In general, choosing any word with 5 unique letters drawn from the frequently used letters above will do well. I found the lowest average number of guesses with the following starting words, assuming answers are limited to NYT-type Wordle words.
- ROATE
- PRATE
- SLATE
- TRACE
- ORATE
- CRATE
- REAST
- CARET
- STARE
- TRAPE
There are clearly some patterns. All 10 of the words include the letters ‘e’, ‘a’, and ‘t’. 9 of the 10 also include the letter ‘r’. The remaining letter is more variable. There are also patterns in letter placement. For example, 9 of the 10 have the A in the middle position. The top ranked word ROATE makes sense. It uses the 5 most common letters listed above, in the most logical order. ORATE and OATER also use these letters, but in a clearly less optimal order, since fewer words begin with O than with R. However, the 2nd word PRATE may be less obvious. The starting P falls slightly outside the top 10 letters. I suspect PRATE does well partly because of letter positioning and partly because the P is especially useful for some specific troublesome NYT words that would take 4+ guesses without knowledge of the P.
The difference between the top ranked word and the 10th ranked word above is only 0.03 in average number of guesses with optimal strategy, which is negligible. With such slight differences, the word with the lowest average varies depending on minor details in strategy and allowed words, which helps explain why different people who have done a starting word analysis have come to different conclusions. Top ranked starting words from others who have done this type of simulation include ROATE, TRACE, REAST, CRATE, CRANE, SALET, and TRAPE (others only seem to rank TRAPE at the top for hard mode). More critical than the optimal starting word is making good later guesses.
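To give a feel for how a starting word comparison can be set up, here is a rough Python sketch. It is not the simulator mentioned above, and it does not compute the average number of guesses. Instead it uses a simpler proxy that I am swapping in for illustration: the average number of candidate solutions still possible after the first guess. The function names and the three-word list are made up.

```python
from collections import Counter

def feedback(guess, answer):
    """Return Wordle-style feedback: 'g' green, 'y' yellow, '-' gray."""
    result = ["-"] * len(guess)
    unmatched = Counter()
    # First pass: mark greens and collect answer letters not matched in place
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            unmatched[a] += 1
    # Second pass: mark yellows, consuming unmatched letters so that
    # repeated letters in the guess are not over-credited
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "y"
            unmatched[g] -= 1
    return "".join(result)

def expected_remaining(guess, solutions):
    """Average number of solutions left after `guess` (a proxy score only).

    Solutions are grouped by the feedback they would produce; a group of
    size n is hit with probability n / total and leaves n candidates.
    """
    buckets = Counter(feedback(guess, s) for s in solutions)
    return sum(n * n for n in buckets.values()) / len(solutions)

# Made-up toy list, NOT the NYT answer list
toy_solutions = ["slate", "crate", "trace"]
print(feedback("roate", "trace"))
print(expected_remaining("slate", toy_solutions))
```

A lower score means the guess splits the solutions into smaller groups. Ranking by this proxy will not necessarily match a ranking by true average guesses, which is one more way two analyses can disagree on the best starting word.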
LATER GUESSES / STRATEGY
As noted above, strategy for later guesses depends on what set of solutions you are using. I am assuming the NYT-type word list mentioned above, and that Wordle solutions come only from this list. I am also assuming the goal is the lowest average number of guesses. If the goal is the highest chance of a 2 guess solution or the lowest chance of a >4 guess outcome, the strategy differs.
This more important part of the strategy is also the part that is more complicated and harder to describe or optimize, which contributes to what makes the game interesting. There is no simple solution. In the future I plan to compare some non-optimal strategies that are more straightforward to describe, for real world usage. Some possible optimal 2nd words for ROATE are listed below. While these 2nd words produce the lowest average number of guesses, many are not obvious.
For example, if you only get an ‘a’, I expect most people would choose a 2nd word that used an ‘a’ and some common letters besides the ones in ROATE. I expect very few would consider using LYSIN as a 2nd word, even if they had a large vocabulary. LYSIN is a good choice because the most commonly used letters in the remaining solutions that have an ‘a’ and none of ‘r’, ‘o’, ‘t’, or ‘e’ are ‘l’, ‘y’, ‘n’, ‘i’, and ‘s’. LYSIN hits all 5 of these desired letters. This is slightly better than trying to also include the ‘a’ with INLAY or LAYIN.
- If nothing hits, 2nd word is SLIMY
- If only r and wrong position, 2nd word is SCULK
- If only o and wrong position, 2nd word is SNOOL
- If only a and wrong position, 2nd word is LYSIN
- If only t and wrong position, 2nd word is SHUNT
- If only e and wrong position, 2nd word is FIELD
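The LYSIN reasoning above (filter to the solutions still possible, then look for a guess covering their most common unknown letters) can be sketched in Python as follows. This is a simple letter-coverage heuristic that I am using for illustration, not the optimization behind the list above, which minimizes average guesses. The function names and the word list are made up.

```python
from collections import Counter

def remaining_after_lone_a(solutions):
    """Solutions consistent with ROATE showing only a yellow 'a'.

    The answer must contain 'a', but not in the middle position (index 2),
    and must contain none of r, o, t, e.
    """
    return [w for w in solutions
            if "a" in w and w[2] != "a" and not set(w) & set("rote")]

def coverage_score(guess, candidates, known=frozenset("a")):
    """Sum, over the guess's distinct unknown letters, of how many
    candidates contain that letter. Already-known letters score nothing."""
    freq = Counter()
    for w in candidates:
        freq.update(set(w) - known)
    return sum(freq[c] for c in set(guess) - known)

# Made-up toy list, NOT the NYT answer list
toy = ["daily", "sandy", "slain", "basil", "final", "crate", "manly"]
candidates = remaining_after_lone_a(toy)
for guess in ("lysin", "inlay"):
    print(guess, coverage_score(guess, candidates))
```

Because ‘a’ is already known to be in the answer, a guess that repeats it spends a slot without testing a new letter, which is why this heuristic prefers a word like LYSIN over INLAY. On a toy list this only illustrates the mechanics; it proves nothing about the real word list.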