969 words
5 minutes
Playing Wordle even worse

Wrong Wordle#

The idea#

The problem presented in the video You should play Wordle wrong by Ellie Rasmussen goes as follows:

“In the game Wordle, if we were to assign a score of 10 points for each yellow letter and 20 points for each green letter received while playing, what is the lowest score we could receive?”

In the video, she codes a solution that uses a heuristic for solving the problem and reaches the conclusion that the lowest possible score is 10 points with only one yellow letter. But, as some commenters suggested, this heuristic does not always give the best solution; some commenters even reached full solutions for the problem.

Comment from @danzibob9452 that reads \

So what did I do?#

So then, what’s left for me to analyze?

Well, I like to be thorough, so I decided to code a dynamic programming solution that calculated the lowest possible score for all possible answers! And there are many. From the list of answers I found, there are 439 possible answers that can be solved getting no yellow or green letters.

But the coolest thing I found is this: there is actually one single answer that has 7 possible guesses where no letters are repeated even once! For the word CIVIC you could guess:

  • WYNNS
  • QAJAQ
  • PHPHT
  • KUDZU
  • GRRRL
  • EXEEM
  • BOFFO

Yeah, I don’t know what any of those words mean either, but Wordle accepts them!

If anyone is interested in seeing all of the possible guess lists for each answer or checking the code that generated it, you can check the following GitHub repo:

dabaez
/
WorseWordle
Waiting for api.github.com...
00K
0K
0K
Waiting...

Development Details#

WARNING

Only cool people will check this part

Now I will talk about the whole development process and all the cool stuff that went into it!

Source of the word list#

There are multiple Wordle lists floating around, but the video didn’t specify which list was used.

The only answers list I could find was this one from cfreshman.

For the guess list, the most popular ones I saw were this one from cfreshman too and this one from dracos.

I decided to only go with the ones from cfreshman since they were from the same source, it made sense to me to use them together.

NOTE

After doing the whole process, I realized some of the words that appear in the video do not appear on the guess list that I used. I suspected that the video might have used draco’s list. I checked and all of the words on cfreshman’s list appear on dracos’ list too, so my solutions are also valid in that group; it’s just that if I had used that list, I would have gotten even more solutions.

Reframing of the problem#

Instead of trying to check for every possible sequence of words, I had to reframe the problem.

One interesting thing is that if we have the word BABAB and DODOD as guesses, this situation is the same as having guessed DADAD and BOBOB because we don’t really care about the words themselves, we only care about the letters they have used.

So, we can reframe the problem as follows:

Given that we have used the letters A, B, C… what is the maximum number of guesses we have left?

This problem turns recursive very nicely!

Let’s say that the solution for a set of letters XX is f(X)f(X).

Now, we can calculate f(X)f(X) by trying to add one valid guess gg and seeing what happens. Let’s say gg is composed by the set of letters GG, then the largest answer that has gg in it will be f(XG)+1f(X \cup G)+1.

So, we can calculate the best answer by trying each valid guess and keeping the largest one!

We can use Dynamic Programming for keeping the largest answer possible for each set of letters.

We have 2262^{26} possible sets of letters and for each one we have to check every guess and there are  104~10^4 possible guesses in my list.

Small cool optimization#

For all possible sets of letters, we have to check for each guess if it’s valid, but I got a little clever for this!

If a set of letters had already used the letter ‘e’, it would be convenient to only have to check all of the guesses that don’t have the letter ‘e’ in them, as that list should be small because ‘e’ is the most common english letter.

So I precalculated all of the guesses available where no ‘e’ was present!

Nah, I went even harder, I precalculated all of the guesses available for every subset of the 10 most common letters of the English language.

I don’t really know how much faster it made it, but I felt cool doing it.

The average length of the group ended up being  103~10^3, but I’m not really sure if that translates into a complexity 1010 times smaller.

Coding#

I coded it all in C++ because I find it easier to work with subsets in C++. I really should learn how to work with subsets in Python or Rust or something.

Conclusion#

I was thinking if searching for the highest possible score in Wordle is interesting, but it actually isn’t.

The idea would be to almost guess correctly 55 times and then guess correctly at the end, and this Horsle meme already did that:

\

It’s just doing that but without using the first word FORCE and replace it with HORSE at the end.

I just hope we all learned not to trust heuristics today.