Wrong Wordle
The idea
The problem presented in the video You should play Wordle wrong by Ellie Rasmussen goes as follows:
“In the game Wordle, if we were to assign a score of 10 points for each yellow letter and 20 points for each green letter received while playing, what is the lowest score we could receive?”
In the video, she codes a solution that uses a heuristic for solving the problem and reaches the conclusion that the lowest possible score is 10 points with only one yellow letter. But, as some commenters suggested, this heuristic does not always give the best solution; some commenters even reached full solutions for the problem.
So what did I do?
So then, what’s left for me to analyze?
Well, I like to be thorough, so I decided to code a dynamic programming solution that calculated the lowest possible score for all possible answers! And there are many. From the list of answers I found, there are 439 possible answers that can be solved getting no yellow or green letters.
But the coolest thing I found is this: there is actually one single answer that has 7 possible guesses where no letters are repeated even once! For the word CIVIC you could guess:
- WYNNS
- QAJAQ
- PHPHT
- KUDZU
- GRRRL
- EXEEM
- BOFFO
Yeah, I don’t know what any of those words mean either, but Wordle accepts them!
If anyone is interested in seeing all of the possible guess lists for each answer or checking the code that generated it, you can check the following GitHub repo:
Development Details
WARNINGOnly cool people will check this part
Now I will talk about the whole development process and all the cool stuff that went into it!
Source of the word list
There are multiple Wordle lists floating around, but the video didn’t specify which list was used.
The only answers list I could find was this one from cfreshman.
For the guess list, the most popular ones I saw were this one from cfreshman too and this one from dracos.
I decided to only go with the ones from cfreshman since they were from the same source, it made sense to me to use them together.
NOTEAfter doing the whole process, I realized some of the words that appear in the video do not appear on the guess list that I used. I suspected that the video might have used draco’s list. I checked and all of the words on cfreshman’s list appear on dracos’ list too, so my solutions are also valid in that group; it’s just that if I had used that list, I would have gotten even more solutions.
Reframing of the problem
Instead of trying to check for every possible sequence of words, I had to reframe the problem.
One interesting thing is that if we have the word BABAB and DODOD as guesses, this situation is the same as having guessed DADAD and BOBOB because we don’t really care about the words themselves, we only care about the letters they have used.
So, we can reframe the problem as follows:
Given that we have used the letters A, B, C… what is the maximum number of guesses we have left?
This problem turns recursive very nicely!
Let’s say that the solution for a set of letters is .
Now, we can calculate by trying to add one valid guess and seeing what happens. Let’s say is composed by the set of letters , then the largest answer that has in it will be .
So, we can calculate the best answer by trying each valid guess and keeping the largest one!
We can use Dynamic Programming for keeping the largest answer possible for each set of letters.
We have possible sets of letters and for each one we have to check every guess and there are possible guesses in my list.
Small cool optimization
For all possible sets of letters, we have to check for each guess if it’s valid, but I got a little clever for this!
If a set of letters had already used the letter ‘e’, it would be convenient to only have to check all of the guesses that don’t have the letter ‘e’ in them, as that list should be small because ‘e’ is the most common english letter.
So I precalculated all of the guesses available where no ‘e’ was present!
Nah, I went even harder, I precalculated all of the guesses available for every subset of the 10 most common letters of the English language.
I don’t really know how much faster it made it, but I felt cool doing it.
The average length of the group ended up being , but I’m not really sure if that translates into a complexity times smaller.
Coding
I coded it all in C++ because I find it easier to work with subsets in C++. I really should learn how to work with subsets in Python or Rust or something.
Conclusion
I was thinking if searching for the highest possible score in Wordle is interesting, but it actually isn’t.
The idea would be to almost guess correctly times and then guess correctly at the end, and this Horsle meme already did that:
It’s just doing that but without using the first word FORCE and replace it with HORSE at the end.
I just hope we all learned not to trust heuristics today.