My Exploration in Getting DeepSeek and o3-mini to Solve Sudoku
I had previously explored and shown that o1-preview was not able to solve Sudoku. Now that newer LLMs such as o3-mini and DeepSeek have emerged, I wanted to see whether this had changed.
The difficulty in solving Sudoku was that the solution tree quickly branches into numerous possible paths. Each iteration required searching for familiar patterns, analysing the cell contents, reviewing peer cells in the relevant rows, columns and blocks, and then determining the resulting grid. That is a lot of work over many steps, assuming a human-like thinking and reasoning approach rather than an algorithmic one, e.g. using bit-masking and back-tracking to comb the solution space.
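For contrast, the algorithmic approach mentioned above can be sketched in a few dozen lines. This is a minimal illustration rather than a tuned solver, and the names `PUZZLE`, `parse` and `solve` are illustrative only. It tracks the digits already used in each row, column and block as 9-bit masks, and back-tracks from the blank cell with the fewest remaining options. `PUZZLE` is the puzzle used in Part 1.

```python
PUZZLE = """
_ _ _ 1 _ _ _ _ _
_ _ 2 _ _ _ _ _ _
6 5 _ _ 9 _ _ _ _
_ _ 7 _ _ _ _ 9 _
8 _ _ _ 2 _ _ _ _
2 _ _ _ 8 _ _ 5 1
5 _ _ _ _ 7 _ 3 _
_ _ 9 3 _ _ _ 7 8
_ 7 3 _ 6 8 _ 2 5
"""


def parse(text):
    """Read rows of digits and '_' blanks into a 9x9 list; 0 marks a blank."""
    return [[0 if tok == "_" else int(tok) for tok in line.split()]
            for line in text.strip().splitlines()]


def solve(grid):
    """Fill `grid` in place by back-tracking; return True if a solution exists."""
    # Bit (d - 1) of a mask is set when digit d is already used in that unit.
    rows, cols, blocks = [0] * 9, [0] * 9, [0] * 9
    blanks = []
    for r in range(9):
        for c in range(9):
            d = grid[r][c]
            if d:
                bit = 1 << (d - 1)
                rows[r] |= bit
                cols[c] |= bit
                blocks[3 * (r // 3) + c // 3] |= bit
            else:
                blanks.append((r, c))

    def search():
        if not blanks:
            return True
        # Pick the blank with the fewest free digits to keep the tree small.
        best_i, best_free, best_n = 0, 0, 10
        for i, (r, c) in enumerate(blanks):
            free = ~(rows[r] | cols[c] | blocks[3 * (r // 3) + c // 3]) & 0x1FF
            n = bin(free).count("1")
            if n < best_n:
                best_i, best_free, best_n = i, free, n
        if best_n == 0:
            return False  # dead end: some blank has no legal digit
        r, c = blanks.pop(best_i)
        b = 3 * (r // 3) + c // 3
        free = best_free
        while free:
            bit = free & -free  # lowest set bit is the next digit to try
            free ^= bit
            grid[r][c] = bit.bit_length()
            rows[r] |= bit
            cols[c] |= bit
            blocks[b] |= bit
            if search():
                return True
            rows[r] ^= bit  # undo the placement and try the next digit
            cols[c] ^= bit
            blocks[b] ^= bit
        grid[r][c] = 0
        blanks.insert(best_i, (r, c))
        return False

    return search()


if __name__ == "__main__":
    grid = parse(PUZZLE)
    if solve(grid):
        for row in grid:
            print(*row)
```

The point of the sketch is only that a conventional program explores the same branching tree mechanically, without needing to recognise patterns or keep the board in view.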
This article is split into 2 parts. In Part 1 I present the same sudoku puzzle to DeepSeek and o3-mini to see their approach in solving. In Part 2, I explore the curious behaviour of information loss or "forgetfulness" - despite all the information available, the LLM did not "see" or chose to ignore or "forget" the given data.
Part 1 - Solving Sudoku: DeepSeek vs o3-mini
Prompt
I had used the following prompt previously:
Can you help me solve this sudoko puzzle, in which blanks are represented by _. The rows of _ and numbers correspond to the rows in the Sudoku puzzle.
_ _ _ 1 _ _ _ _ _
_ _ 2 _ _ _ _ _ _
6 5 _ _ 9 _ _ _ _
_ _ 7 _ _ _ _ 9 _
8 _ _ _ 2 _ _ _ _
2 _ _ _ 8 _ _ 5 1
5 _ _ _ _ 7 _ 3 _
_ _ 9 3 _ _ _ 7 8
_ 7 3 _ 6 8 _ 2 5
The solution I had worked out:
9 3 8 | 1 7 5 | 2 4 6
7 4 2 | 8 3 6 | 5 1 9
6 5 1 | 4 9 2 | 3 8 7
---------------------
3 6 7 | 5 4 1 | 8 9 2
8 1 5 | 7 2 9 | 4 6 3
2 9 4 | 6 8 3 | 7 5 1
---------------------
5 8 6 | 2 1 7 | 9 3 4
1 2 9 | 3 5 4 | 6 7 8
4 7 3 | 9 6 8 | 1 2 5
First Up: DeepSeek
When DeepSeek was first announced some days before the Chinese New Year in 2025, there was a lot of buzz and excitement about it being an LLM that managed to perform better than other popular LLMs, at a much cheaper cost.
After the prompt was submitted, the reasoning process started. DeepSeek tried to build the board, reviewed the rules, then examined the blanks and tried to fill in the numbers by reviewing each row and column.
Mid-way, it attempted brute force, filling in cells to see whether contradictions arose.
It was able to backtrack when it found conflicts.
Interestingly, it was reluctant to use advanced techniques; even when the search became too complex, it decided to press on.
Eventually, unable to resolve the contradictions that arose deep down the paths, it simply settled for its best solution so far.
After 802 seconds (about 13 minutes), it gave a few steps to the "solution" before presenting the final grid:
Reproducing the "solved" grid here:
3 9 1 | 4 5 2 | 7 8 6
7 4 2 | 6 3 1 | 5 9 8
6 5 8 | 7 9 3 | 2 1 4
------+------+------
9 6 7 | 5 4 8 | 3 1 2
8 3 4 | 1 2 6 | 9 5 7
2 8 6 | 7 9 3 | 4 5 1
------+------+------
5 8 2 | 9 1 7 | 4 3 6
1 2 9 | 3 4 5 | 6 7 8
4 7 3 | 9 6 8 | 1 2 5
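The grid was not a valid solution, as multiple blocks had duplicate digits. For example, "8" is duplicated in Block 4, "1" in Block 6, "2" in Block 7, "9" in Block 8, and "6" in Block 9. It also overwrote some of the given digits: R1C4 was given as 1 in the puzzle but appears as 4 in this grid.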
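A check like this is easy to automate. The sketch below (the names `parse_grid` and `find_duplicates` are illustrative, not part of the original experiment) parses the grid as printed above and reports every row, column and block that contains a repeated digit, with blocks numbered B1 to B9 from left to right, top to bottom.

```python
from collections import Counter

DEEPSEEK_GRID = """
3 9 1 | 4 5 2 | 7 8 6
7 4 2 | 6 3 1 | 5 9 8
6 5 8 | 7 9 3 | 2 1 4
------+------+------
9 6 7 | 5 4 8 | 3 1 2
8 3 4 | 1 2 6 | 9 5 7
2 8 6 | 7 9 3 | 4 5 1
------+------+------
5 8 2 | 9 1 7 | 4 3 6
1 2 9 | 3 4 5 | 6 7 8
4 7 3 | 9 6 8 | 1 2 5
"""


def parse_grid(text):
    """Turn the printed layout into a list of 9 rows of 9 ints."""
    rows = []
    for line in text.strip().splitlines():
        digits = [int(ch) for ch in line if ch.isdigit()]
        if digits:  # separator lines contain no digits and are skipped
            rows.append(digits)
    return rows


def find_duplicates(grid):
    """Return {unit: [repeated digits]} for each row, column and block."""
    units = {}
    for i in range(9):
        units[f"R{i + 1}"] = grid[i]
        units[f"C{i + 1}"] = [grid[r][i] for r in range(9)]
        br, bc = 3 * (i // 3), 3 * (i % 3)  # top-left corner of block i + 1
        units[f"B{i + 1}"] = [grid[r][c]
                              for r in range(br, br + 3)
                              for c in range(bc, bc + 3)]
    report = {}
    for name, cells in units.items():
        repeated = sorted(d for d, n in Counter(cells).items() if n > 1)
        if repeated:
            report[name] = repeated
    return report


if __name__ == "__main__":
    for unit, digits in find_duplicates(parse_grid(DEEPSEEK_GRID)).items():
        print(unit, digits)
```

Run on the grid above, the report includes the five block duplicates listed earlier, along with further repeats in other blocks and in several columns.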
Next Up: o3-mini
o3-mini has been recently released. Since it was available via the chat interface, I pasted the same prompt.
I loved the very bubbly and cheerful start to the analysis - "Let's solve this!":
While DeepSeek dumped out its train of thought as it happened, o3-mini-high hid the thinking behind summary "Blocks of Thoughts" (I coined this), each with a sub-heading.
After 13 minutes and 43 seconds, it suddenly stopped and didn't continue to provide a solution. I was not sure what happened - I guessed it could have hit a timeout.
I asked it to provide the final solution, but instead it decided to redo its analysis. After another 9 minutes 44 seconds, it decided it still did not have the final solution.
The "solution" had missing digits. As a last resort, its final note said it had tried to use an online solver, but that solution was not complete either.
Part 2 - Information Capture and Information Forgetfulness
By now it seemed pretty clear that Sudoku puzzles were not among the problems that LLMs could solve, or at least not within about 14 minutes. This led me to think deeper - could the LLMs have misinterpreted the data provided, or used it wrongly?
To investigate this, I needed a way to represent a Sudoku board. In a previous project, I had gotten o3-mini-high to propose a way to represent a Sudoku board in JSON by generating a function that creates the JSON and another function that fills in the candidate lists. The request covered the following:
1. Read a text file that contains a sudoku puzzle:
4 7 3 | 9 _ _ | 6 _ _
5 8 2 | 7 _ 6 | _ _ 9
6 9 1 | _ _ 3 | _ _ _
---------------------
_ _ _ | _ 9 1 | 2 _ 7
2 _ _ | 4 _ _ | _ _ _
_ 5 9 | _ 3 _ | 1 _ _
---------------------
_ _ 4 | _ 5 _ | _ _ _
_ 6 _ | _ _ _ | 7 _ _
_ 2 _ | 6 7 _ | 9 _ _
The function returns a JSON that captures the table. Use notations to reference the rows and columns, e.g. R1C1 is row 1 column 1, and B1 denotes the 3x3 block containing R1C1-R1C3, R2C1-R2C3 and R3C1-R3C3. A unit identifies the row, column or 3x3 block that a cell belongs to, and can be denoted by R1-R9, C1-C9 or B1-B9.
2. A function that takes in a JSON of a Sudoku puzzle, computes the pencilled candidate lists for each unsolved cell, and returns a JSON of the puzzle with the candidate lists.
...
The LLM proposed a JSON structure with the cell reference as the key, a value for the cell's digit, and a candidate list of the digits still possible for that cell.
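The program that o3-mini-high generated is not reproduced in this article. As a rough sketch of what the two requested functions could look like, assuming the text layout and the `RxCy` key format shown here (the names `parse_puzzle`, `peers` and `fill_candidates` are hypothetical, and candidates are found by simple elimination only):

```python
import json

PUZZLE_TEXT = """
4 7 3 | 9 _ _ | 6 _ _
5 8 2 | 7 _ 6 | _ _ 9
6 9 1 | _ _ 3 | _ _ _
---------------------
_ _ _ | _ 9 1 | 2 _ 7
2 _ _ | 4 _ _ | _ _ _
_ 5 9 | _ 3 _ | 1 _ _
---------------------
_ _ 4 | _ 5 _ | _ _ _
_ 6 _ | _ _ _ | 7 _ _
_ 2 _ | 6 7 _ | 9 _ _
"""


def parse_puzzle(text):
    """Build {"RxCy": {"value": int or None, "candidates": []}} from the text."""
    board = {}
    lines = [ln for ln in text.strip().splitlines()
             if not ln.strip().startswith("-")]  # drop block separator rows
    for r, line in enumerate(lines, start=1):
        tokens = [tok for tok in line.split() if tok != "|"]
        for c, tok in enumerate(tokens, start=1):
            value = None if tok == "_" else int(tok)
            board[f"R{r}C{c}"] = {"value": value, "candidates": []}
    return board


def peers(r, c):
    """Cell references sharing a row, column or 3x3 block with RrCc."""
    br, bc = 3 * ((r - 1) // 3), 3 * ((c - 1) // 3)
    cells = {(r, k) for k in range(1, 10)} | {(k, c) for k in range(1, 10)}
    cells |= {(br + i, bc + j) for i in range(1, 4) for j in range(1, 4)}
    cells.discard((r, c))
    return [f"R{i}C{j}" for i, j in sorted(cells)]


def fill_candidates(board):
    """Pencil in candidates: digits not already placed in any peer cell."""
    for r in range(1, 10):
        for c in range(1, 10):
            cell = board[f"R{r}C{c}"]
            if cell["value"] is None:
                used = {board[p]["value"] for p in peers(r, c)}
                cell["candidates"] = [d for d in range(1, 10) if d not in used]
    return board


if __name__ == "__main__":
    board = fill_candidates(parse_puzzle(PUZZLE_TEXT))
    print(json.dumps(board["R1C5"]))  # {"value": null, "candidates": [1, 2, 8]}
```

Keying every cell by its reference keeps lookups trivial for a program, at the cost of a long, flat blob of text for an LLM to read.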
The JSON generated by the Python program looked like the following:
{"R1C1": {"value": 4, "candidates": []}, "R1C2": {"value": 7, "candidates": []}, "R1C3": {"value": 3, "candidates": []}, "R1C4": {"value": 9, "candidates": []}, "R1C5": {"value": null, "candidates": [1, 2, 8]}, "R1C6": {"value": null, "candidates": [2, 5, 8]}, "R1C7": {"value": 6, "candidates": []}, "R1C8": {"value": null, "candidates": [1, 2, 5, 8]}, "R1C9": {"value": null, "candidates": [1, 2, 5, 8]}, "R2C1": {"value": 5, "candidates": []}, "R2C2": {"value": 8, "candidates": []}, "R2C3": {"value": 2, "candidates": []}, "R2C4": {"value": 7, "candidates": []}, "R2C5": {"value": null, "candidates": [1, 4]}, "R2C6": {"value": 6, "candidates": []}, "R2C7": {"value": null, "candidates": [3, 4]}, "R2C8": {"value": null, "candidates": [1, 3, 4]}, "R2C9": {"value": 9, "candidates": []}, "R3C1": {"value": 6, "candidates": []}, "R3C2": {"value": 9, "candidates": []}, "R3C3": {"value": 1, "candidates": []}, "R3C4": {"value": null, "candidates": [2, 5, 8]}, "R3C5": {"value": null, "candidates": [2, 4, 8]}, "R3C6": {"value": 3, "candidates": []}, "R3C7": {"value": null, "candidates": [4, 5, 8]}, "R3C8": {"value": null, "candidates": [2, 4, 5, 7, 8]}, "R3C9": {"value": null, "candidates": [2, 4, 5, 8]}, "R4C1": {"value": null, "candidates": [3, 8]}, "R4C2": {"value": null, "candidates": [3, 4]}, "R4C3": {"value": null, "candidates": [6, 8]}, "R4C4": {"value": null, "candidates": [5, 8]}, "R4C5": {"value": 9, "candidates": []}, "R4C6": {"value": 1, "candidates": []}, "R4C7": {"value": 2, "candidates": []}, "R4C8": {"value": null, "candidates": [3, 4, 5, 6, 8]}, "R4C9": {"value": 7, "candidates": []}, "R5C1": {"value": 2, "candidates": []}, "R5C2": {"value": null, "candidates": [1, 3]}, "R5C3": {"value": null, "candidates": [6, 7, 8]}, "R5C4": {"value": 4, "candidates": []}, "R5C5": {"value": null, "candidates": [6, 8]}, "R5C6": {"value": null, "candidates": [5, 7, 8]}, "R5C7": {"value": null, "candidates": [3, 5, 8]}, "R5C8": {"value": null, "candidates": [3, 5, 6, 8, 9]}, 
"R5C9": {"value": null, "candidates": [3, 5, 6, 8]}, "R6C1": {"value": null, "candidates": [7, 8]}, "R6C2": {"value": 5, "candidates": []}, "R6C3": {"value": 9, "candidates": []}, "R6C4": {"value": null, "candidates": [2, 8]}, "R6C5": {"value": 3, "candidates": []}, "R6C6": {"value": null, "candidates": [2, 7, 8]}, "R6C7": {"value": 1, "candidates": []}, "R6C8": {"value": null, "candidates": [4, 6, 8]}, "R6C9": {"value": null, "candidates": [4, 6, 8]}, "R7C1": {"value": null, "candidates": [1, 3, 7, 8, 9]}, "R7C2": {"value": null, "candidates": [1, 3]}, "R7C3": {"value": 4, "candidates": []}, "R7C4": {"value": null, "candidates": [1, 2, 3, 8]}, "R7C5": {"value": 5, "candidates": []}, "R7C6": {"value": null, "candidates": [2, 8, 9]}, "R7C7": {"value": null, "candidates": [3, 8]}, "R7C8": {"value": null, "candidates": [1, 2, 3, 6, 8]}, "R7C9": {"value": null, "candidates": [1, 2, 3, 6, 8]}, "R8C1": {"value": null, "candidates": [1, 3, 8, 9]}, "R8C2": {"value": 6, "candidates": []}, "R8C3": {"value": null, "candidates": [5, 8]}, "R8C4": {"value": null, "candidates": [1, 2, 3, 8]}, "R8C5": {"value": null, "candidates": [1, 2, 4, 8]}, "R8C6": {"value": null, "candidates": [2, 4, 8, 9]}, "R8C7": {"value": 7, "candidates": []}, "R8C8": {"value": null, "candidates": [1, 2, 3, 4, 5, 8]}, "R8C9": {"value": null, "candidates": [1, 2, 3, 4, 5, 8]}, "R9C1": {"value": null, "candidates": [1, 3, 8]}, "R9C2": {"value": 2, "candidates": []}, "R9C3": {"value": null, "candidates": [5, 8]}, "R9C4": {"value": 6, "candidates": []}, "R9C5": {"value": 7, "candidates": []}, "R9C6": {"value": null, "candidates": [4, 8]}, "R9C7": {"value": 9, "candidates": []}, "R9C8": {"value": null, "candidates": [1, 3, 4, 5, 8]}, "R9C9": {"value": null, "candidates": [1, 3, 4, 5, 8]}}
The JSON was meant to capture the Sudoku board from the text file shown earlier.
I wanted to know if the LLM was able to correctly pick out cells from a row, column and block, determine if a digit is assigned, determine unsolved cells and whether a digit is in any of the candidate lists. I felt these were basic operations that would be required to analyse a Sudoku board.
I created the following prompt:
Below is the current 9x9 Sudoku board represented as JSON string. Each cell is referenced by a cell reference \"RxCy\" which denotes Row x and Column y.
```
{"R1C1": {"value": 4, "candidates": []}, "R1C2": {"value": 7, "candidates": []}, "R1C3": {"value": 3, "candidates": []}, "R1C4": {"value": 9, "candidates": []}, "R1C5": {"value": null, "candidates": [1, 2, 8]}, "R1C6": {"value": null, "candidates": [2, 5, 8]}, "R1C7": {"value": 6, "candidates": []}, "R1C8": {"value": null, "candidates": [1, 2, 5, 8]}, "R1C9": {"value": null, "candidates": [1, 2, 5, 8]}, "R2C1": {"value": 5, "candidates": []}, "R2C2": {"value": 8, "candidates": []}, "R2C3": {"value": 2, "candidates": []}, "R2C4": {"value": 7, "candidates": []}, "R2C5": {"value": null, "candidates": [1, 4]}, "R2C6": {"value": 6, "candidates": []}, "R2C7": {"value": null, "candidates": [3, 4]}, "R2C8": {"value": null, "candidates": [1, 3, 4]}, "R2C9": {"value": 9, "candidates": []}, "R3C1": {"value": 6, "candidates": []}, "R3C2": {"value": 9, "candidates": []}, "R3C3": {"value": 1, "candidates": []}, "R3C4": {"value": null, "candidates": [2, 5, 8]}, "R3C5": {"value": null, "candidates": [2, 4, 8]}, "R3C6": {"value": 3, "candidates": []}, "R3C7": {"value": null, "candidates": [4, 5, 8]}, "R3C8": {"value": null, "candidates": [2, 4, 5, 7, 8]}, "R3C9": {"value": null, "candidates": [2, 4, 5, 8]}, "R4C1": {"value": null, "candidates": [3, 8]}, "R4C2": {"value": null, "candidates": [3, 4]}, "R4C3": {"value": null, "candidates": [6, 8]}, "R4C4": {"value": null, "candidates": [5, 8]}, "R4C5": {"value": 9, "candidates": []}, "R4C6": {"value": 1, "candidates": []}, "R4C7": {"value": 2, "candidates": []}, "R4C8": {"value": null, "candidates": [3, 4, 5, 6, 8]}, "R4C9": {"value": 7, "candidates": []}, "R5C1": {"value": 2, "candidates": []}, "R5C2": {"value": null, "candidates": [1, 3]}, "R5C3": {"value": null, "candidates": [6, 7, 8]}, "R5C4": {"value": 4, "candidates": []}, "R5C5": {"value": null, "candidates": [6, 8]}, "R5C6": {"value": null, "candidates": [5, 7, 8]}, "R5C7": {"value": null, "candidates": [3, 5, 8]}, "R5C8": {"value": null, "candidates": [3, 5, 6, 8, 9]}, 
"R5C9": {"value": null, "candidates": [3, 5, 6, 8]}, "R6C1": {"value": null, "candidates": [7, 8]}, "R6C2": {"value": 5, "candidates": []}, "R6C3": {"value": 9, "candidates": []}, "R6C4": {"value": null, "candidates": [2, 8]}, "R6C5": {"value": 3, "candidates": []}, "R6C6": {"value": null, "candidates": [2, 7, 8]}, "R6C7": {"value": 1, "candidates": []}, "R6C8": {"value": null, "candidates": [4, 6, 8]}, "R6C9": {"value": null, "candidates": [4, 6, 8]}, "R7C1": {"value": null, "candidates": [1, 3, 7, 8, 9]}, "R7C2": {"value": null, "candidates": [1, 3]}, "R7C3": {"value": 4, "candidates": []}, "R7C4": {"value": null, "candidates": [1, 2, 3, 8]}, "R7C5": {"value": 5, "candidates": []}, "R7C6": {"value": null, "candidates": [2, 8, 9]}, "R7C7": {"value": null, "candidates": [3, 8]}, "R7C8": {"value": null, "candidates": [1, 2, 3, 6, 8]}, "R7C9": {"value": null, "candidates": [1, 2, 3, 6, 8]}, "R8C1": {"value": null, "candidates": [1, 3, 8, 9]}, "R8C2": {"value": 6, "candidates": []}, "R8C3": {"value": null, "candidates": [5, 8]}, "R8C4": {"value": null, "candidates": [1, 2, 3, 8]}, "R8C5": {"value": null, "candidates": [1, 2, 4, 8]}, "R8C6": {"value": null, "candidates": [2, 4, 8, 9]}, "R8C7": {"value": 7, "candidates": []}, "R8C8": {"value": null, "candidates": [1, 2, 3, 4, 5, 8]}, "R8C9": {"value": null, "candidates": [1, 2, 3, 4, 5, 8]}, "R9C1": {"value": null, "candidates": [1, 3, 8]}, "R9C2": {"value": 2, "candidates": []}, "R9C3": {"value": null, "candidates": [5, 8]}, "R9C4": {"value": 6, "candidates": []}, "R9C5": {"value": 7, "candidates": []}, "R9C6": {"value": null, "candidates": [4, 8]}, "R9C7": {"value": 9, "candidates": []}, "R9C8": {"value": null, "candidates": [1, 3, 4, 5, 8]}, "R9C9": {"value": null, "candidates": [1, 3, 4, 5, 8]}}
```
A solved cell is represented by its digit under the key "value".
An unsolved cell has null for value but has a candidate list under the key "candidates", which contains a list of the possible digits that can be likely for this cell.
Answer the following questions.
1. Which cells are in Row 4?
2. Which cells are in Column 7?
3. Which cells are in Block 4?
4. How many unsolved cells are there in Block 3? What are the solved numbers in this block?
5. Which cells have "4" in their candidate list?
6. Is the previous JSON Sudoku table a well-formed table?
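Because each question has a deterministic answer, the expected responses can be computed directly and used as a marking key. The following is one possible sketch, not necessarily how the responses in this article were marked. `GRID` re-encodes the puzzle above as nine 9-character rows, all helper names are made up for illustration, and "well-formed" is interpreted (an assumption) as 81 cells with no solved digit repeated in any row, column or block.

```python
GRID = (
    "4739__6__"
    "5827_6__9"
    "691__3___"
    "____912_7"
    "2__4_____"
    "_59_3_1__"
    "__4_5____"
    "_6____7__"
    "_2_67_9__"
)


def units_of(ref):
    """Return (row, column, block) numbers, all 1-based, for a ref like 'R4C7'."""
    r, c = int(ref[1]), int(ref[3])
    return r, c, 3 * ((r - 1) // 3) + (c - 1) // 3 + 1


def build_board(grid):
    """Create the JSON-style dict, with candidates found by simple elimination."""
    refs = [f"R{r}C{c}" for r in range(1, 10) for c in range(1, 10)]
    board = {ref: {"value": None if ch == "_" else int(ch), "candidates": []}
             for ref, ch in zip(refs, grid)}
    for ref, cell in board.items():
        if cell["value"] is None:
            # Digits already placed in any cell sharing a row, column or block.
            used = {board[p]["value"] for p in refs
                    if any(a == b for a, b in zip(units_of(p), units_of(ref)))}
            cell["candidates"] = [d for d in range(1, 10) if d not in used]
    return board


def cells_in(board, kind, n):
    """Cells in row ('R'), column ('C') or block ('B') number n (Q1 to Q4)."""
    idx = "RCB".index(kind)
    return [ref for ref in board if units_of(ref)[idx] == n]


def with_candidate(board, digit):
    """Cells whose candidate list contains `digit` (Q5)."""
    return [ref for ref, cell in board.items() if digit in cell["candidates"]]


def is_well_formed(board):
    """Q6 under the assumption stated in the lead-in."""
    if len(board) != 81:
        return False
    for kind in "RCB":
        for n in range(1, 10):
            values = [board[ref]["value"] for ref in cells_in(board, kind, n)]
            solved = [v for v in values if v is not None]
            if len(values) != 9 or len(solved) != len(set(solved)):
                return False
    return True


if __name__ == "__main__":
    board = build_board(GRID)
    print("Q1:", cells_in(board, "R", 4))
    print("Q2:", cells_in(board, "C", 7))
    print("Q3:", cells_in(board, "B", 4))
    block3 = cells_in(board, "B", 3)
    unsolved = [ref for ref in block3 if board[ref]["value"] is None]
    solved = [board[ref]["value"] for ref in block3 if board[ref]["value"]]
    print("Q4:", len(unsolved), "unsolved; solved digits", solved)
    print("Q5:", with_candidate(board, 4))
    print("Q6:", is_well_formed(board))
```

A key like this also makes it straightforward to repeat the test on other boards, since only `GRID` needs to change.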
Next, I tried it on various models.
The following were the responses.
o3-mini-high
Score: 6/6
GPT-4o
Score: 3/6 (Q2, Q5, Q6 wrong)
o1
Score: 6/6
4o-mini
Score: 4/6 (Q4 and Q5 wrong)
DeepSeek
Score: 6/6
Review
It was interesting to see that despite such straightforward questions, not all the LLMs were able to get them right. In particular, only o3-mini-high, o1 and DeepSeek got all the questions right.
Why did some models miss data that was plainly in the prompt? This is hard to figure out, as the models are black boxes to us and we might never know what happened in the backend. Of course, one could argue for using RAG (retrieval-augmented generation) to store the data in a vector database for later retrieval - but this might not work well with a Sudoku board.
The results here were also neither exhaustive nor deep - a model that scored 6/6 could still fail on a different JSON Sudoku table.
The results implied that if the models were to be used programmatically, not all models would respond the same way with the same data. Certain models seemed more likely to give false results. It would be useful to verify that the models were able to capture and retrieve samples of the datasets we provided in the prompt, as the first step to check if the model was working correctly.
#sudoku #codegenerator #generativeAI #LLM #openai #deepseek