My Exploration in Getting DeepSeek and o3-mini to Solve Sudoku
Image generated by LinkedIn Microsoft Designer

I had previously shown that o1-preview was not able to solve Sudoku. Now that newer LLMs such as o3-mini and DeepSeek have emerged, I wanted to see whether they fare any better.

Sudoku is hard because the solution tree quickly branches into numerous possible paths. Each iteration requires searching for familiar patterns, analysing the cell contents, reviewing peer cells in the selected rows, columns and blocks, and then working out the resulting grid. That is a lot of work over many steps if we assume a human-like thinking and reasoning approach rather than an algorithmic one, e.g. using bit-masking and back-tracking to comb the solution space.
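For contrast, the algorithmic approach mentioned above fits in a few dozen lines. The following is a minimal sketch for illustration only: the names `solve`, `flip`, `free` and `search` are hypothetical, blanks are assumed to be stored as 0, and trying the blank cell with the fewest remaining candidates first is an assumption added to keep the search short.

```python
FULL = 0b111111111  # nine bits; bit k stands for digit k + 1


def solve(grid):
    """Fill a 9x9 grid in place (0 = blank). Returns True if a solution was found."""
    rows, cols, blocks = [0] * 9, [0] * 9, [0] * 9

    def flip(r, c, bit):
        # Toggle a digit's bit in the three units that contain cell (r, c).
        rows[r] ^= bit
        cols[c] ^= bit
        blocks[3 * (r // 3) + c // 3] ^= bit

    def free(r, c):
        # Bit mask of the digits not yet used in the cell's row, column or block.
        return FULL & ~(rows[r] | cols[c] | blocks[3 * (r // 3) + c // 3])

    # Record the given clues, rejecting a puzzle whose clues already clash.
    for r in range(9):
        for c in range(9):
            if grid[r][c]:
                bit = 1 << (grid[r][c] - 1)
                if not (free(r, c) & bit):
                    return False
                flip(r, c, bit)

    def search():
        # Pick the blank cell with the fewest candidates.
        best, best_mask, best_count = None, 0, 10
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    mask = free(r, c)
                    count = bin(mask).count("1")
                    if count == 0:
                        return False  # dead end: this cell has no candidate left
                    if count < best_count:
                        best, best_mask, best_count = (r, c), mask, count
        if best is None:
            return True  # no blanks left, so the grid is complete
        r, c = best
        mask = best_mask
        while mask:
            bit = mask & -mask            # lowest candidate still untried
            mask ^= bit
            grid[r][c] = bit.bit_length()
            flip(r, c, bit)
            if search():
                return True
            flip(r, c, bit)               # back-track: undo the placement
        grid[r][c] = 0
        return False

    return search()
```

Calling `solve(grid)` on a list of nine lists of integers fills the blanks in place. It is a plain depth-first search, so it says nothing about how a human would reason through the same puzzle.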

This article is split into 2 parts. In Part 1, I present the same Sudoku puzzle to DeepSeek and o3-mini to see how each approaches it. In Part 2, I explore the curious behaviour of information loss or "forgetfulness" - despite all the information being available, the LLM did not "see" the given data, or chose to ignore or "forget" it.

Part 1 - Solving Sudoku: DeepSeek vs o3-mini

Prompt

I had used the following prompt previously:

Can you help me solve this sudoko puzzle, in which blanks are represented by _. The rows of _ and numbers correspond to the rows in the Sudoku puzzle.

_ _ _ 1 _ _ _ _ _
_ _ 2 _ _ _ _ _ _
6 5 _ _ 9 _ _ _ _
_ _ 7 _ _ _ _ 9 _
8 _ _ _ 2 _ _ _ _ 
2 _ _ _ 8 _ _ 5 1
5 _ _ _ _ 7 _ 3 _ 
_ _ 9 3 _ _ _ 7 8
_ 7 3 _ 6 8 _ 2 5        

The solution I had worked out:

9 3 8 | 1 7 5 | 2 4 6
7 4 2 | 8 3 6 | 5 1 9
6 5 1 | 4 9 2 | 3 8 7
---------------------
3 6 7 | 5 4 1 | 8 9 2
8 1 5 | 7 2 9 | 4 6 3
2 9 4 | 6 8 3 | 7 5 1
---------------------
5 8 6 | 2 1 7 | 9 3 4
1 2 9 | 3 5 4 | 6 7 8
4 7 3 | 9 6 8 | 1 2 5        

First Up: DeepSeek

When DeepSeek was first announced some days before the Chinese New Year in 2025, there was a lot of buzz and excitement about it being an LLM that managed to perform better than other popular LLMs, at a much lower cost.

After the prompt was submitted, the reasoning process started. DeepSeek tried to build the board, reviewed the rules, then examined the blanks and tried to fill in the numbers by reviewing each row and column.

DeepSeek attempted to build the board and fill in the numbers.
DeepSeek looking at missing numbers in every row and column.

Mid-way, it switched to brute force, filling in cells and checking whether contradictions arose.

DeepSeek attempting to drop a digit into a cell and see if it worked.

It was able to backtrack when it found conflicts.

DeepSeek attempting to backtrack and recheck its steps when it found a conflict.

Interestingly, it did not want to use advanced techniques. In the end the puzzle became too complex for the simple approach, but it decided to press on anyway.

DeepSeek chose a path that did not use advanced techniques. Maybe it was regretting it.

Eventually, unable to resolve contradictions that lay too far down its chosen paths, it simply presented the best grid it had so far.

DeepSeek admitted some assumptions were still incorrect, but it was too late to turn back now.

After 802 seconds (about 13 minutes), it gave a few steps to the "solution" before presenting the final grid:

DeepSeek's solution after much thought.

Reproducing the "solved" grid here:

3 9 1 | 4 5 2 | 7 8 6
7 4 2 | 6 3 1 | 5 9 8
6 5 8 | 7 9 3 | 2 1 4
------+------+------
9 6 7 | 5 4 8 | 3 1 2
8 3 4 | 1 2 6 | 9 5 7
2 8 6 | 7 9 3 | 4 5 1
------+------+------
5 8 2 | 9 1 7 | 4 3 6
1 2 9 | 3 4 5 | 6 7 8
4 7 3 | 9 6 8 | 1 2 5        

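The grid was not a valid solution, as multiple blocks contained duplicate digits. For example, "8" is duplicated in Block 4, "1" in Block 6, "2" in Block 7, "9" in Block 8, and "6" in Block 9. It also altered given clues: R1C4 was given as 1 in the puzzle but appears as 4 in the "solved" grid.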
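A grid like this can be checked mechanically rather than by eye. The sketch below is one possible checker, with hypothetical helper names (`parse`, `units`, `duplicates`, `changed_givens`); it assumes the text layouts used in this article and numbers blocks left to right, top to bottom.

```python
from collections import Counter

PUZZLE = """
_ _ _ 1 _ _ _ _ _
_ _ 2 _ _ _ _ _ _
6 5 _ _ 9 _ _ _ _
_ _ 7 _ _ _ _ 9 _
8 _ _ _ 2 _ _ _ _
2 _ _ _ 8 _ _ 5 1
5 _ _ _ _ 7 _ 3 _
_ _ 9 3 _ _ _ 7 8
_ 7 3 _ 6 8 _ 2 5
"""

ATTEMPT = """
3 9 1 | 4 5 2 | 7 8 6
7 4 2 | 6 3 1 | 5 9 8
6 5 8 | 7 9 3 | 2 1 4
------+------+------
9 6 7 | 5 4 8 | 3 1 2
8 3 4 | 1 2 6 | 9 5 7
2 8 6 | 7 9 3 | 4 5 1
------+------+------
5 8 2 | 9 1 7 | 4 3 6
1 2 9 | 3 4 5 | 6 7 8
4 7 3 | 9 6 8 | 1 2 5
"""


def parse(text):
    """Turn the text layout into 9 rows of ints (0 = blank), skipping separator lines."""
    rows = []
    for line in text.strip().splitlines():
        tokens = [t for t in line.split() if t != "|"]
        if tokens and not tokens[0].startswith("-"):
            rows.append([0 if t == "_" else int(t) for t in tokens])
    return rows


def units(grid):
    """Yield (name, digits) for every row R1-R9, column C1-C9 and block B1-B9."""
    for i in range(9):
        yield f"R{i + 1}", grid[i]
        yield f"C{i + 1}", [grid[r][i] for r in range(9)]
        top, left = 3 * (i // 3), 3 * (i % 3)
        yield f"B{i + 1}", [grid[r][c] for r in range(top, top + 3)
                            for c in range(left, left + 3)]


def duplicates(grid):
    """Map each unit name to the sorted digits that appear in it more than once."""
    found = {}
    for name, digits in units(grid):
        repeated = sorted(d for d, n in Counter(digits).items() if d and n > 1)
        if repeated:
            found[name] = repeated
    return found


def changed_givens(puzzle, attempt):
    """Map each given clue that the attempt overwrote to (given, attempted)."""
    return {
        f"R{r + 1}C{c + 1}": (puzzle[r][c], attempt[r][c])
        for r in range(9) for c in range(9)
        if puzzle[r][c] and puzzle[r][c] != attempt[r][c]
    }


problems = duplicates(parse(ATTEMPT))
overwritten = changed_givens(parse(PUZZLE), parse(ATTEMPT))
```

Here `problems` lists every row, column and block with a repeated digit, and `overwritten` lists the clues from the original puzzle that no longer match.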

Next Up: o3-mini

o3-mini had recently been released. Since it was available via the chat interface, I pasted in the same prompt.

I loved the very bubbly and cheerful start to the analysis - "Let's solve this!":

Loving the cheery start of o3-mini's analysis.

While DeepSeek dumped out its train of thought as it happened, o3-mini-high hid its thinking behind summarised "Blocks of Thoughts" (a term I coined), each with a sub-heading.

o3-mini explaining the process in "blocks of thoughts" with a sub-heading.

After 13 minutes and 43 seconds, it suddenly stopped and didn't continue to provide a solution. I was not sure what happened - I guessed it could have hit a timeout.

o3-mini just stopped and didn't return an output solution.

I asked it to provide the final solution, but instead it decided to redo its analysis. After another 9 minutes 44 seconds, it decided it still did not have the final solution.

After redoing its analysis, o3-mini provided the final incomplete table.

The "solution" had missing digits. As a last resort, its final note said it had tried an online solver, but that solution was incomplete as well.

o3-mini resorted to an online solver but did not manage to give a complete solution.

Part 2 - Information Capture and Information Forgetfulness

By now it seemed pretty clear that Sudoku puzzles were not among the problems that LLMs could solve, or at least not within about 14 minutes. This led me to think deeper - could the LLMs have misinterpreted the data provided, or used it wrongly?

To investigate this, I needed a way to represent a Sudoku board. In a previous project, I had gotten o3-mini-high to propose a way to represent a Sudoku board in JSON by generating the function that creates the JSON and another function to fill in the candidate lists.

1. Read a text file that contains a sudoku puzzle:
4 7 3 | 9 _ _ | 6 _ _
5 8 2 | 7 _ 6 | _ _ 9
6 9 1 | _ _ 3 | _ _ _
---------------------
_ _ _ | _ 9 1 | 2 _ 7
2 _ _ | 4 _ _ | _ _ _
_ 5 9 | _ 3 _ | 1 _ _
---------------------
_ _ 4 | _ 5 _ | _ _ _
_ 6 _ | _ _ _ | 7 _ _
_ 2 _ | 6 7 _ | 9 _ _

The function returns a JSON that captures the table. Use notations to reference the rows and columns, e.g. R1C1 is row 1 column 1, and B1 denotes the 3x3 block containing R1C1-R1C3, R2C1-R2C3 and R3C1-R3C3. A unit identifies the row, column or 3x3 block that a cell belongs to, and can be denoted by R1-R9, C1-C9 or B1-B9.

2. A function that takes in a JSON of a Sudoku puzzle, computes the pencilled candidate lists for each unsolved cell, and returns a JSON of the puzzle with the candidate lists.

...        

The LLM proposed a JSON structure that uses the cell reference as the key and stores, for each cell, a value for its digit and a candidate list of the digits still possible for that cell.
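The generated functions themselves are only shown as a screenshot, so the following is a hedged reconstruction of the idea rather than the code o3-mini-high produced. The function names (`parse_board`, `build_cells`, `same_unit`, `fill_candidates`) are hypothetical, and candidates are assumed to come from plain elimination against the row, column and block.

```python
import json

BOARD = """
4 7 3 | 9 _ _ | 6 _ _
5 8 2 | 7 _ 6 | _ _ 9
6 9 1 | _ _ 3 | _ _ _
---------------------
_ _ _ | _ 9 1 | 2 _ 7
2 _ _ | 4 _ _ | _ _ _
_ 5 9 | _ 3 _ | 1 _ _
---------------------
_ _ 4 | _ 5 _ | _ _ _
_ 6 _ | _ _ _ | 7 _ _
_ 2 _ | 6 7 _ | 9 _ _
"""


def parse_board(text):
    """Read the text layout into 9 rows of ints, with 0 for a blank."""
    rows = []
    for line in text.strip().splitlines():
        if line.startswith("-"):
            continue  # separator between bands of blocks
        rows.append([0 if t == "_" else int(t) for t in line.split() if t != "|"])
    return rows


def build_cells(grid):
    """Create the {"RxCy": {"value": ..., "candidates": []}} mapping."""
    return {
        f"R{r + 1}C{c + 1}": {"value": grid[r][c] or None, "candidates": []}
        for r in range(9) for c in range(9)
    }


def same_unit(a, b):
    """True if cell references a and b share a row, a column or a 3x3 block."""
    ra, ca, rb, cb = int(a[1]), int(a[3]), int(b[1]), int(b[3])
    same_block = ((ra - 1) // 3, (ca - 1) // 3) == ((rb - 1) // 3, (cb - 1) // 3)
    return ra == rb or ca == cb or same_block


def fill_candidates(cells):
    """Pencil in, for each unsolved cell, the digits not used by any peer."""
    for key, cell in cells.items():
        if cell["value"] is None:
            used = {other["value"] for k, other in cells.items() if same_unit(key, k)}
            cell["candidates"] = [d for d in range(1, 10) if d not in used]
    return cells


cells = fill_candidates(build_cells(parse_board(BOARD)))
board_json = json.dumps(cells)
```

`json.dumps` turns Python's `None` into JSON `null`, which matches the `"value": null` entries for unsolved cells.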

o3-mini-high created some functions to generate JSON text for Sudoku boards in a previous project.

The JSON generated by the Python program looked like the following:

{"R1C1": {"value": 4, "candidates": []}, "R1C2": {"value": 7, "candidates": []}, "R1C3": {"value": 3, "candidates": []}, "R1C4": {"value": 9, "candidates": []}, "R1C5": {"value": null, "candidates": [1, 2, 8]}, "R1C6": {"value": null, "candidates": [2, 5, 8]}, "R1C7": {"value": 6, "candidates": []}, "R1C8": {"value": null, "candidates": [1, 2, 5, 8]}, "R1C9": {"value": null, "candidates": [1, 2, 5, 8]}, "R2C1": {"value": 5, "candidates": []}, "R2C2": {"value": 8, "candidates": []}, "R2C3": {"value": 2, "candidates": []}, "R2C4": {"value": 7, "candidates": []}, "R2C5": {"value": null, "candidates": [1, 4]}, "R2C6": {"value": 6, "candidates": []}, "R2C7": {"value": null, "candidates": [3, 4]}, "R2C8": {"value": null, "candidates": [1, 3, 4]}, "R2C9": {"value": 9, "candidates": []}, "R3C1": {"value": 6, "candidates": []}, "R3C2": {"value": 9, "candidates": []}, "R3C3": {"value": 1, "candidates": []}, "R3C4": {"value": null, "candidates": [2, 5, 8]}, "R3C5": {"value": null, "candidates": [2, 4, 8]}, "R3C6": {"value": 3, "candidates": []}, "R3C7": {"value": null, "candidates": [4, 5, 8]}, "R3C8": {"value": null, "candidates": [2, 4, 5, 7, 8]}, "R3C9": {"value": null, "candidates": [2, 4, 5, 8]}, "R4C1": {"value": null, "candidates": [3, 8]}, "R4C2": {"value": null, "candidates": [3, 4]}, "R4C3": {"value": null, "candidates": [6, 8]}, "R4C4": {"value": null, "candidates": [5, 8]}, "R4C5": {"value": 9, "candidates": []}, "R4C6": {"value": 1, "candidates": []}, "R4C7": {"value": 2, "candidates": []}, "R4C8": {"value": null, "candidates": [3, 4, 5, 6, 8]}, "R4C9": {"value": 7, "candidates": []}, "R5C1": {"value": 2, "candidates": []}, "R5C2": {"value": null, "candidates": [1, 3]}, "R5C3": {"value": null, "candidates": [6, 7, 8]}, "R5C4": {"value": 4, "candidates": []}, "R5C5": {"value": null, "candidates": [6, 8]}, "R5C6": {"value": null, "candidates": [5, 7, 8]}, "R5C7": {"value": null, "candidates": [3, 5, 8]}, "R5C8": {"value": null, "candidates": [3, 5, 6, 8, 9]}, 
"R5C9": {"value": null, "candidates": [3, 5, 6, 8]}, "R6C1": {"value": null, "candidates": [7, 8]}, "R6C2": {"value": 5, "candidates": []}, "R6C3": {"value": 9, "candidates": []}, "R6C4": {"value": null, "candidates": [2, 8]}, "R6C5": {"value": 3, "candidates": []}, "R6C6": {"value": null, "candidates": [2, 7, 8]}, "R6C7": {"value": 1, "candidates": []}, "R6C8": {"value": null, "candidates": [4, 6, 8]}, "R6C9": {"value": null, "candidates": [4, 6, 8]}, "R7C1": {"value": null, "candidates": [1, 3, 7, 8, 9]}, "R7C2": {"value": null, "candidates": [1, 3]}, "R7C3": {"value": 4, "candidates": []}, "R7C4": {"value": null, "candidates": [1, 2, 3, 8]}, "R7C5": {"value": 5, "candidates": []}, "R7C6": {"value": null, "candidates": [2, 8, 9]}, "R7C7": {"value": null, "candidates": [3, 8]}, "R7C8": {"value": null, "candidates": [1, 2, 3, 6, 8]}, "R7C9": {"value": null, "candidates": [1, 2, 3, 6, 8]}, "R8C1": {"value": null, "candidates": [1, 3, 8, 9]}, "R8C2": {"value": 6, "candidates": []}, "R8C3": {"value": null, "candidates": [5, 8]}, "R8C4": {"value": null, "candidates": [1, 2, 3, 8]}, "R8C5": {"value": null, "candidates": [1, 2, 4, 8]}, "R8C6": {"value": null, "candidates": [2, 4, 8, 9]}, "R8C7": {"value": 7, "candidates": []}, "R8C8": {"value": null, "candidates": [1, 2, 3, 4, 5, 8]}, "R8C9": {"value": null, "candidates": [1, 2, 3, 4, 5, 8]}, "R9C1": {"value": null, "candidates": [1, 3, 8]}, "R9C2": {"value": 2, "candidates": []}, "R9C3": {"value": null, "candidates": [5, 8]}, "R9C4": {"value": 6, "candidates": []}, "R9C5": {"value": 7, "candidates": []}, "R9C6": {"value": null, "candidates": [4, 8]}, "R9C7": {"value": 9, "candidates": []}, "R9C8": {"value": null, "candidates": [1, 3, 4, 5, 8]}, "R9C9": {"value": null, "candidates": [1, 3, 4, 5, 8]}}        

The JSON was meant to capture the following Sudoku board:

The Sudoku board I used to generate the JSON string representation.

I wanted to know if the LLM was able to correctly pick out cells from a row, column and block, determine if a digit is assigned, determine unsolved cells and whether a digit is in any of the candidate lists. I felt these were basic operations that would be required to analyse a Sudoku board.

I created the following prompt:

Below is the current 9x9 Sudoku board represented as JSON string. Each cell is referenced by a cell reference \"RxCy\" which denotes Row x and Column y.
```
{"R1C1": {"value": 4, "candidates": []}, "R1C2": {"value": 7, "candidates": []}, "R1C3": {"value": 3, "candidates": []}, "R1C4": {"value": 9, "candidates": []}, "R1C5": {"value": null, "candidates": [1, 2, 8]}, "R1C6": {"value": null, "candidates": [2, 5, 8]}, "R1C7": {"value": 6, "candidates": []}, "R1C8": {"value": null, "candidates": [1, 2, 5, 8]}, "R1C9": {"value": null, "candidates": [1, 2, 5, 8]}, "R2C1": {"value": 5, "candidates": []}, "R2C2": {"value": 8, "candidates": []}, "R2C3": {"value": 2, "candidates": []}, "R2C4": {"value": 7, "candidates": []}, "R2C5": {"value": null, "candidates": [1, 4]}, "R2C6": {"value": 6, "candidates": []}, "R2C7": {"value": null, "candidates": [3, 4]}, "R2C8": {"value": null, "candidates": [1, 3, 4]}, "R2C9": {"value": 9, "candidates": []}, "R3C1": {"value": 6, "candidates": []}, "R3C2": {"value": 9, "candidates": []}, "R3C3": {"value": 1, "candidates": []}, "R3C4": {"value": null, "candidates": [2, 5, 8]}, "R3C5": {"value": null, "candidates": [2, 4, 8]}, "R3C6": {"value": 3, "candidates": []}, "R3C7": {"value": null, "candidates": [4, 5, 8]}, "R3C8": {"value": null, "candidates": [2, 4, 5, 7, 8]}, "R3C9": {"value": null, "candidates": [2, 4, 5, 8]}, "R4C1": {"value": null, "candidates": [3, 8]}, "R4C2": {"value": null, "candidates": [3, 4]}, "R4C3": {"value": null, "candidates": [6, 8]}, "R4C4": {"value": null, "candidates": [5, 8]}, "R4C5": {"value": 9, "candidates": []}, "R4C6": {"value": 1, "candidates": []}, "R4C7": {"value": 2, "candidates": []}, "R4C8": {"value": null, "candidates": [3, 4, 5, 6, 8]}, "R4C9": {"value": 7, "candidates": []}, "R5C1": {"value": 2, "candidates": []}, "R5C2": {"value": null, "candidates": [1, 3]}, "R5C3": {"value": null, "candidates": [6, 7, 8]}, "R5C4": {"value": 4, "candidates": []}, "R5C5": {"value": null, "candidates": [6, 8]}, "R5C6": {"value": null, "candidates": [5, 7, 8]}, "R5C7": {"value": null, "candidates": [3, 5, 8]}, "R5C8": {"value": null, "candidates": [3, 5, 6, 8, 9]}, 
"R5C9": {"value": null, "candidates": [3, 5, 6, 8]}, "R6C1": {"value": null, "candidates": [7, 8]}, "R6C2": {"value": 5, "candidates": []}, "R6C3": {"value": 9, "candidates": []}, "R6C4": {"value": null, "candidates": [2, 8]}, "R6C5": {"value": 3, "candidates": []}, "R6C6": {"value": null, "candidates": [2, 7, 8]}, "R6C7": {"value": 1, "candidates": []}, "R6C8": {"value": null, "candidates": [4, 6, 8]}, "R6C9": {"value": null, "candidates": [4, 6, 8]}, "R7C1": {"value": null, "candidates": [1, 3, 7, 8, 9]}, "R7C2": {"value": null, "candidates": [1, 3]}, "R7C3": {"value": 4, "candidates": []}, "R7C4": {"value": null, "candidates": [1, 2, 3, 8]}, "R7C5": {"value": 5, "candidates": []}, "R7C6": {"value": null, "candidates": [2, 8, 9]}, "R7C7": {"value": null, "candidates": [3, 8]}, "R7C8": {"value": null, "candidates": [1, 2, 3, 6, 8]}, "R7C9": {"value": null, "candidates": [1, 2, 3, 6, 8]}, "R8C1": {"value": null, "candidates": [1, 3, 8, 9]}, "R8C2": {"value": 6, "candidates": []}, "R8C3": {"value": null, "candidates": [5, 8]}, "R8C4": {"value": null, "candidates": [1, 2, 3, 8]}, "R8C5": {"value": null, "candidates": [1, 2, 4, 8]}, "R8C6": {"value": null, "candidates": [2, 4, 8, 9]}, "R8C7": {"value": 7, "candidates": []}, "R8C8": {"value": null, "candidates": [1, 2, 3, 4, 5, 8]}, "R8C9": {"value": null, "candidates": [1, 2, 3, 4, 5, 8]}, "R9C1": {"value": null, "candidates": [1, 3, 8]}, "R9C2": {"value": 2, "candidates": []}, "R9C3": {"value": null, "candidates": [5, 8]}, "R9C4": {"value": 6, "candidates": []}, "R9C5": {"value": 7, "candidates": []}, "R9C6": {"value": null, "candidates": [4, 8]}, "R9C7": {"value": 9, "candidates": []}, "R9C8": {"value": null, "candidates": [1, 3, 4, 5, 8]}, "R9C9": {"value": null, "candidates": [1, 3, 4, 5, 8]}}
```

A solved cell is represented by its digit under the key "value".
An unsolved cell has null for value but has a candidate list under the key "candidates", which contains a list of the possible digits that can be likely for this cell.

Answer the following questions.
1. Which cells are in Row 4?
2. Which cells are in Column 7?
3. Which cells are in Block 4?
4. How many unsolved cells are there in Block 3? What are the solved numbers in this block?
5. Which cells have "4" in their candidate list?
6. Is the previous JSON Sudoku table a well-formed table?        
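The reference answers to these questions can be computed directly, which gives something objective to score the replies against. This is a sketch under stated assumptions: the board is rebuilt from its text form instead of being read from the JSON above, all function names are hypothetical, and "well-formed" is interpreted as 81 correctly named cells that each carry a valid `value` and `candidates` entry.

```python
BOARD = """
4 7 3 | 9 _ _ | 6 _ _
5 8 2 | 7 _ 6 | _ _ 9
6 9 1 | _ _ 3 | _ _ _
---------------------
_ _ _ | _ 9 1 | 2 _ 7
2 _ _ | 4 _ _ | _ _ _
_ 5 9 | _ 3 _ | 1 _ _
---------------------
_ _ 4 | _ 5 _ | _ _ _
_ 6 _ | _ _ _ | 7 _ _
_ 2 _ | 6 7 _ | 9 _ _
"""


def load_cells(text):
    """Rebuild the {"RxCy": {"value", "candidates"}} mapping from the text board."""
    grid = [[0 if t == "_" else int(t) for t in line.split() if t != "|"]
            for line in text.strip().splitlines() if not line.startswith("-")]
    cells = {}
    for r in range(9):
        for c in range(9):
            top, left = r - r % 3, c - c % 3
            seen = set(grid[r]) | {grid[i][c] for i in range(9)}
            seen |= {grid[i][j] for i in range(top, top + 3) for j in range(left, left + 3)}
            solved = grid[r][c] != 0
            cells[f"R{r + 1}C{c + 1}"] = {
                "value": grid[r][c] if solved else None,
                "candidates": [] if solved else [d for d in range(1, 10) if d not in seen],
            }
    return cells


def row(n):
    return [f"R{n}C{c}" for c in range(1, 10)]


def column(n):
    return [f"R{r}C{n}" for r in range(1, 10)]


def block(n):
    top, left = 3 * ((n - 1) // 3), 3 * ((n - 1) % 3)
    return [f"R{top + i + 1}C{left + j + 1}" for i in range(3) for j in range(3)]


def block_summary(cells, n):
    """Return (number of unsolved cells, sorted solved digits) for block n."""
    values = [cells[key]["value"] for key in block(n)]
    solved = sorted(v for v in values if v is not None)
    return len(values) - len(solved), solved


def cells_with_candidate(cells, digit):
    return [key for key, cell in cells.items() if digit in cell["candidates"]]


def is_well_formed(cells):
    """One reading of Q6: 81 named cells, each with a valid value and candidate list."""
    if set(cells) != {f"R{r}C{c}" for r in range(1, 10) for c in range(1, 10)}:
        return False
    for cell in cells.values():
        if set(cell) != {"value", "candidates"}:
            return False
        if cell["value"] is None:
            if not all(d in range(1, 10) for d in cell["candidates"]):
                return False
        elif cell["value"] not in range(1, 10) or cell["candidates"] != []:
            return False
    return True


CELLS = load_cells(BOARD)
ANSWERS = {
    1: row(4),
    2: column(7),
    3: block(4),
    4: block_summary(CELLS, 3),
    5: cells_with_candidate(CELLS, 4),
    6: is_well_formed(CELLS),
}
```

`ANSWERS` then holds one reference answer per question, keyed by question number.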

Next, I tried it on various models.

The following were the responses.

o3-mini-high

  1. R4C1, R4C2, R4C3, R4C4, R4C5, R4C6, R4C7, R4C8, R4C9
  2. R1C7, R2C7, R3C7, R4C7, R5C7, R6C7, R7C7, R8C7, R9C7
  3. Row 4: R4C1, R4C2, R4C3, Row 5: R5C1, R5C2, R5C3, Row 6: R6C1, R6C2, R6C3
  4. Unsolved cells: 9 total − 2 solved = 7 unsolved cells. Solved cells: 2 (R1C7 and R2C9, with values 6 and 9)
  5. R2C5, R2C7, R2C8, R3C5, R3C7, R3C8, R3C9, R4C2, R4C8, R6C8, R6C9, R8C5, R8C6, R8C8, R8C9, R9C6, R9C8, R9C9
  6. Yes, the JSON Sudoku table is well-formed. Here's why: Complete Structure: The JSON object contains 81 key-value pairs corresponding to each cell on a 9×9 board (from R1C1 to R9C9). Consistent Key Format: Each cell is identified using a clear cell reference ("RxCy" where x is the row number and y is the column number). Standardised Cell Representation: Each cell's value is an object with two keys: Valid JSON Format: The JSON string adheres to proper syntax with matching braces, quotes around keys, and correct array formatting. All these points confirm that the table is both structurally and syntactically well-formed.

Score: 6/6

Perfect score for o3-mini-high.

GPT-4o

  1. ['R4C1', 'R4C2', 'R4C3', 'R4C4', 'R4C5', 'R4C6', 'R4C7', 'R4C8', 'R4C9']
  2. ['R1C7', 'R2C7', 'R3C7', 'R4C7', 'R5C7', 'R6C7']
  3. ['R4C1', 'R4C2', 'R4C3', 'R5C1', 'R5C2', 'R5C3', 'R6C1', 'R6C2', 'R6C3']
  4. 7 unsolved cells. Solved numbers in Block 3: [6, 9]
  5. ['R2C5', 'R2C7', 'R2C8', 'R3C5', 'R3C7', 'R3C8', 'R3C9', 'R4C2', 'R4C8', 'R6C8', 'R6C9']
  6. No, because there are fewer than 81 unique valid cells in the JSON structure.

Score: 3/6 (Q2, Q5, Q6 wrong)

Q2, Q5 and Q6 were wrong for GPT-4o. It did not manage to get the number of cells right.

o1

  1. R4C1, R4C2, R4C3, R4C4, R4C5, R4C6, R4C7, R4C8, R4C9
  2. R1C7, R2C7, R3C7, R4C7, R5C7, R6C7, R7C7, R8C7, R9C7
  3. R4C1: {"value": null, "candidates": [3, 8]}, R4C2: {"value": null, "candidates": [3, 4]}, R4C3: {"value": null, "candidates": [6, 8]} , R5C1: {"value": 2, "candidates": []}, R5C2: {"value": null, "candidates": [1, 3]} , R5C3: {"value": null, "candidates": [6, 7, 8]}, R6C1: {"value": null, "candidates": [7, 8]} , R6C2: {"value": 5, "candidates": []}, R6C3: {"value": 9, "candidates": []}
  4. Number of unsolved cells: 7. Solved numbers: 6 and 9
  5. R2C5: [1, 4], R2C7: [3, 4], R2C8: [1, 3, 4] , R3C5: [2, 4, 8], R3C7: [4, 5, 8], R3C8: [2, 4, 5, 7, 8], R3C9: [2, 4, 5, 8], R4C2: [3, 4], R4C8: [3, 4, 5, 6, 8], R6C8: [4, 6, 8], R6C9: [4, 6, 8], R8C5: [1, 2, 4, 8], R8C6: [2, 4, 8, 9], R8C8: [1, 2, 3, 4, 5, 8], R8C9: [1, 2, 3, 4, 5, 8], R9C6: [4, 8], R9C8: [1, 3, 4, 5, 8], R9C9: [1, 3, 4, 5, 8]
  6. Yes. Each cell is referenced exactly once (from R1C1 through R9C9), and each cell object has the required keys: "value" (either a digit or null) and "candidates" (an array). There are exactly 81 cell entries, so the JSON structure is valid and complete for a standard 9×9 Sudoku.

Score: 6/6

Perfect score and well structured explanation for o1.

4o-mini

  1. R4C1: value = null, candidates = [3, 8], R4C2: value = null, candidates = [3, 4], R4C3: value = null, candidates = [6, 8], R4C4: value = null, candidates = [5, 8], R4C5: value = 9, candidates = [], R4C6: value = 1, candidates = [], R4C7: value = 2, candidates = [], R4C8: value = null, candidates = [3, 4, 5, 6, 8], R4C9: value = 7, candidates = []
  2. R1C7: value = 6, candidates = [], R2C7: value = null, candidates = [3, 4], R3C7: value = null, candidates = [4, 5, 8], R4C7: value = 2, candidates = [], R5C7: value = null, candidates = [3, 5, 8], R6C7: value = 1, candidates = [], R7C7: value = null, candidates = [3, 8], R8C7: value = 7, candidates = [], R9C7: value = 9, candidates = []
  3. R4C1: value = null, candidates = [3, 8], R4C2: value = null, candidates = [3, 4], R4C3: value = null, candidates = [6, 8], R5C1: value = 2, candidates = [], R5C2: value = null, candidates = [1, 3], R5C3: value = null, candidates = [6, 7, 8], R6C1: value = null, candidates = [7, 8], R6C2: value = 5, candidates = [], R6C3: value = 9, candidates = []
  4. Unsolved cells in Block 3: R3C4: value = null, candidates = [2, 5, 8], R3C5: value = null, candidates = [2, 4, 8], R3C6: value = 3, candidates = [], R3C7: value = null, candidates = [4, 5, 8], R3C8: value = null, candidates = [2, 4, 5, 7, 8], R3C9: value = null, candidates = [2, 4, 5, 8]; There are 6 unsolved cells in Block 3. Solved numbers in this block: R3C6: value = 3
  5. R4C2: candidates = [3, 4], R4C3: candidates = [6, 8], R5C3: candidates = [6, 7, 8], R5C5: candidates = [6, 8], R5C6: candidates = [5, 7, 8], R5C7: candidates = [3, 5, 8], R6C3: candidates = [2, 8], R6C6: candidates = [2, 7, 8], R7C4: candidates = [1, 2, 3, 8], R8C4: candidates = [1, 2, 3, 8], R8C5: candidates = [1, 2, 4, 8], R8C6: candidates = [2, 4, 8, 9], R9C8: candidates = [1, 3, 4, 5, 8], R9C9: candidates = [1, 3, 4, 5, 8]
  6. Yes, the JSON structure is well-formed. Each cell has either a value or a candidate list, and the keys follow a consistent format (e.g., "R1C1", "R2C3"). There are no missing or malformed elements in the provided Sudoku table.

Score: 4/6 (Q4 and Q5 wrong)

4o-mini got Q4 and Q5 wrong.

DeepSeek

  1. R4C1, R4C2, R4C3, R4C4, R4C5, R4C6, R4C7, R4C8, R4C9.
  2. R1C7, R2C7, R3C7, R4C7, R5C7, R6C7, R7C7, R8C7, R9C7.
  3. R4C1, R4C2, R4C3, R5C1, R5C2, R5C3, R6C1, R6C2, R6C3.
  4. 7 unsolved cells. Solved Numbers: 6, 9.
  5. R2C5, R2C7, R2C8, R3C5, R3C7, R3C8, R3C9, R4C2, R4C8, R6C8, R6C9, R8C5, R8C6, R8C8, R8C9, R9C6, R9C8, R9C9.
  6. Yes. The JSON structure is valid, and no duplicates exist in rows, columns, or blocks among solved cells.

Score: 6/6

Perfect score for DeepSeek.

Review

It was interesting to see that despite such straightforward questions, not all the LLMs were able to get them right. In particular, only o3-mini-high, o1 and DeepSeek answered every question correctly.

Why did some models miss data that was plainly there? I think this would be hard to figure out, as the models are black boxes to us and we might never know what happened in the backend. Of course, one could argue for using RAG if we really wanted to store data in a vector database for later retrieval - but this might not work well with a Sudoku board.

The results here were also neither exhaustive nor deep - it is possible that a model that scored 6/6 would fail on a different JSON Sudoku table.

The results implied that if the models were to be used programmatically, not all models would respond the same way with the same data. Certain models seemed more likely to give false results. It would be useful to verify that the models were able to capture and retrieve samples of the datasets we provided in the prompt, as the first step to check if the model was working correctly.
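Such a check can be automated by comparing the cell references in a reply against the expected ones. The sketch below assumes replies mention cells in the "RxCy" form used throughout this article; `extract_cells` and `compare` are hypothetical helper names, and the sample reply is GPT-4o's answer to Q2 quoted above.

```python
import re

CELL_REF = re.compile(r"R[1-9]C[1-9]")


def extract_cells(reply):
    """Pull every RxCy reference out of a free-text model reply, in order."""
    return CELL_REF.findall(reply)


def compare(expected, reply):
    """Report which expected cells the reply missed and which extra ones it named."""
    wanted, given = set(expected), set(extract_cells(reply))
    return {"missing": sorted(wanted - given), "extra": sorted(given - wanted)}


# Column 7 is the expected answer to Q2.
COLUMN_7 = [f"R{r}C7" for r in range(1, 10)]
GPT_4O_Q2 = "['R1C7', 'R2C7', 'R3C7', 'R4C7', 'R5C7', 'R6C7']"

report = compare(COLUMN_7, GPT_4O_Q2)
```

For this reply, `report` shows that the last three cells of the column were dropped and that no wrong cells were added. It only checks which cells were named, not any values or candidate lists quoted alongside them.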

#sudoku #codegenerator #generativeAI #LLM #openai #deepseek

More articles by Gerald Yong
