My First Exploration with OpenAI o1-preview
Can o1-preview handle Sudoku, a puzzle whose generalised form is known to be NP-complete?
There is a lot of excitement and online buzz relating to OpenAI's o1-preview model, which is supposed to be able to "think" and formulate its route of reasoning on the fly. That is really interesting, because this new ability can allow it to handle more complex tasks.
What better way to try this out than by asking it to solve something that takes a lot of reasoning and many steps, like a Sudoku game? I regularly play Sudoku on my mobile phone via a popular Sudoku app. I won't say it gives me many minutes of pure joy (more like frustration, in fact!), but it does keep my brain active for a while, and kills some (or many) brain cells in the process. Now I am so happy that I can finally outsource the thinking to an AI and save my brain cells!
I fired up my Sudoku app and picked an "Expert" level game. Next I transcribed the game board manually into text, so that ChatGPT could pick it up from a prompt. This was the eventual prompt I sent to ChatGPT using the o1-preview model:
Can you help me solve this sudoko puzzle, in which blanks are represented by _. The rows of _ and numbers correspond to the rows in the Sudoku puzzle.
_ _ _ 1 _ _ _ _ _
_ _ 2 _ _ _ _ _ _
6 5 _ _ 9 _ _ _ _
_ _ 7 _ _ _ _ 9 _
8 _ _ _ 2 _ _ _ _
2 _ _ _ 8 _ _ 5 1
5 _ _ _ _ 7 _ 3 _
_ _ 9 3 _ _ _ 7 8
_ 7 3 _ 6 8 _ 2 5
For comparison, I also sent the same prompt to ChatGPT using the GPT-4o model, so that I could compare the responses.
And as a further comparison, I attempted to solve it myself... took me a good half hour with some mistakes along the way.
Solving via o1-preview
o1-preview started by taking some time to think. I have observed that sometimes it gives the solution directly, and sometimes it takes time to list out its reasoning steps. In this case it explained its reasoning after 35s... that is quite a feat for an Expert Sudoku puzzle.
The steps it listed were:
It started off on a promising track:
But then it realised that there were inconsistencies while trying to solve Row 9.
And then it decided that there were errors in my grid.
So the model blamed me for typos, concluding that the puzzle as given wasn't solvable.
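Whether the transcribed grid contains an obvious typo can be checked mechanically. The following is a minimal sketch, assuming the grid is pasted exactly as in the prompt above; the names `parse_grid`, `units` and `find_conflicts` are made up for illustration. It only looks for a given digit that repeats within a row, column or 3x3 grid. An empty result rules out that kind of typo but does not by itself prove the puzzle is solvable.

```python
from collections import Counter

# The puzzle exactly as it appears in the prompt; "_" marks a blank.
PUZZLE_TEXT = """
_ _ _ 1 _ _ _ _ _
_ _ 2 _ _ _ _ _ _
6 5 _ _ 9 _ _ _ _
_ _ 7 _ _ _ _ 9 _
8 _ _ _ 2 _ _ _ _
2 _ _ _ 8 _ _ 5 1
5 _ _ _ _ 7 _ 3 _
_ _ 9 3 _ _ _ 7 8
_ 7 3 _ 6 8 _ 2 5
"""


def parse_grid(text):
    """Turn the prompt text into a 9x9 list of ints, with 0 for blanks."""
    rows = [line.split() for line in text.strip().splitlines()]
    grid = [[0 if cell == "_" else int(cell) for cell in row] for row in rows]
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        raise ValueError("expected 9 rows of 9 cells")
    return grid


def units(grid):
    """Yield (name, cells) for every row, column and 3x3 grid."""
    for i in range(9):
        yield f"row {i + 1}", grid[i]
        yield f"column {i + 1}", [grid[r][i] for r in range(9)]
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            box = [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
            yield f"3x3 grid at row {br + 1}, column {bc + 1}", box


def find_conflicts(grid):
    """Return (unit name, digit) pairs where a given digit repeats in a unit."""
    conflicts = []
    for name, cells in units(grid):
        counts = Counter(d for d in cells if d != 0)
        conflicts += [(name, d) for d, n in counts.items() if n > 1]
    return conflicts


grid = parse_grid(PUZZLE_TEXT)
print(find_conflicts(grid))  # prints [] for the grid above
```

For this grid the list comes back empty, so none of the given digits clash with each other.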
So, I decided to help it a little. I told it the solution to the first row (which I had worked out manually).
The first row answer should be
9 3 8 1 7 5 2 4 6
It took another 57s, then started to solve it.
And it came to the conclusion there was still an error.
Now, I was getting a little impatient, so I tried to help it along some more. I gave it the solution to the first column, which I had worked out.
There is no issue with the puzzle.
The first column answer is
9
7
6
3
8
2
5
1
4
This time, it spent another 94s working it out, and finally came up with a solution.
Unfortunately, it was still not correct. Each of the three middle 3x3 grids had a duplicate number: "9" appeared twice in the top middle grid, "1" in the middle grid, and "2" in the bottom middle grid.
So, I decided to offer it the solution that I had worked out myself and asked if it knew why it was wrong.
Your answer is still wrong. There are duplicate numbers in each square cell.
The correct answer should be:
Row 1: 9 3 8 | 1 7 5 | 2 4 6
Row 2: 7 4 2 | 8 3 6 | 5 1 9
Row 3: 6 5 1 | 4 9 2 | 3 8 7
------------------------------
Row 4: 3 6 7 | 5 4 1 | 8 9 2
Row 5: 8 1 5 | 7 2 9 | 4 6 3
Row 6: 2 9 4 | 6 8 3 | 7 5 1
------------------------------
Row 7: 5 8 6 | 2 1 7 | 9 3 4
Row 8: 1 2 9 | 3 5 4 | 6 7 8
Row 9: 4 7 3 | 9 6 8 | 1 2 5
Do you know why you are wrong?
It sounded apologetic and explained how it could have done better.
Here's the final result:
Human: 1
OpenAI: 0
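The duplicate hunting described above can also be done by a few lines of code rather than by eye. The sketch below is one possible way, with hypothetical helper names `parse` and `check_solution`: it checks that every row, column and 3x3 grid of a filled-in answer contains each digit 1 to 9 once, and that none of the puzzle's given digits were changed. The "broken" grid is a deliberately corrupted copy of the hand-worked solution, not the model's actual answer.

```python
PUZZLE = """
_ _ _ 1 _ _ _ _ _
_ _ 2 _ _ _ _ _ _
6 5 _ _ 9 _ _ _ _
_ _ 7 _ _ _ _ 9 _
8 _ _ _ 2 _ _ _ _
2 _ _ _ 8 _ _ 5 1
5 _ _ _ _ 7 _ 3 _
_ _ 9 3 _ _ _ 7 8
_ 7 3 _ 6 8 _ 2 5
"""

SOLUTION = """
9 3 8 1 7 5 2 4 6
7 4 2 8 3 6 5 1 9
6 5 1 4 9 2 3 8 7
3 6 7 5 4 1 8 9 2
8 1 5 7 2 9 4 6 3
2 9 4 6 8 3 7 5 1
5 8 6 2 1 7 9 3 4
1 2 9 3 5 4 6 7 8
4 7 3 9 6 8 1 2 5
"""


def parse(text):
    """Parse nine lines of nine tokens into ints, with 0 for a "_" blank."""
    return [[0 if t == "_" else int(t) for t in line.split()]
            for line in text.strip().splitlines()]


def check_solution(puzzle, solution):
    """Return a list of problems; an empty list means the solution is valid."""
    problems = []
    digits = set(range(1, 10))
    for i in range(9):
        if set(solution[i]) != digits:
            problems.append(f"row {i + 1} has a missing or repeated digit")
        if {solution[r][i] for r in range(9)} != digits:
            problems.append(f"column {i + 1} has a missing or repeated digit")
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            box = {solution[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
            if box != digits:
                problems.append(
                    f"3x3 grid at row {br + 1}, column {bc + 1} has a missing or repeated digit")
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] and puzzle[r][c] != solution[r][c]:
                problems.append(f"row {r + 1}, column {c + 1} changes a given digit")
    return problems


puzzle, solution = parse(PUZZLE), parse(SOLUTION)
print(check_solution(puzzle, solution))  # prints [] for the hand-worked solution

# A deliberately broken copy (hypothetical, not the model's actual answer).
broken = [row[:] for row in solution]
broken[1][4] = 1  # a second "1" in row 2, column 5 and the top middle grid
for problem in check_solution(puzzle, broken):
    print(problem)
```

A check like this takes the guesswork out of the "is it really wrong?" step, whoever produced the answer.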
Solving via GPT-4o
I sent the same prompt to GPT-4o and the answer came up almost immediately. It seemed that it was supposed to use an internal Sudoku solver module but somehow that didn't get initiated. I am guessing its Sudoku minion is on maintenance vacation.
That was a good start! Until I realised that the last column had 2 "8"s.
To be consistent, I offered the solution to the first row, as I did with o1-preview. Here was its reply.
Again, the solution came up almost immediately, until I realised that some of the 3x3 grid blocks had duplicate numbers, e.g. the middle bottom grid had 2 "8"s and the last column had 2 "1"s, both in the middle right grid.
To be consistent, I offered the solution to the first column, as I did with o1-preview. Here was its reply.
This time the solution came almost immediately, but... the last column still had 2 "9"s and the second last row had 2 "8"s.
So, exasperated, I offered it the solution, as I did with o1-preview.
So it said it knew it had to ensure the solution adhered to the rules, yet it didn't do so. Why? That sounds almost human-like - I remember the times people were upset with us because we knew there was a process or there were rules, yet we didn't follow them. AI shouldn't become complacent like us, even though it's modelled after our thinking.
Anyway, here's the final result:
Human: 2
OpenAI: 0
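For reference, a conventional program handles a 9x9 puzzle like this one with plain backtracking search. The following is a minimal sketch of that standard technique, not a description of what either model does internally. The function names `candidates` and `solve` are illustrative, and trying the blank with the fewest remaining options first is just one common heuristic.

```python
PUZZLE = """
_ _ _ 1 _ _ _ _ _
_ _ 2 _ _ _ _ _ _
6 5 _ _ 9 _ _ _ _
_ _ 7 _ _ _ _ 9 _
8 _ _ _ 2 _ _ _ _
2 _ _ _ 8 _ _ 5 1
5 _ _ _ _ 7 _ 3 _
_ _ 9 3 _ _ _ 7 8
_ 7 3 _ 6 8 _ 2 5
"""


def candidates(grid, r, c):
    """Digits that do not already appear in the cell's row, column or 3x3 grid."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid):
    """Fill the grid in place by backtracking; return True if a solution is found."""
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                options = candidates(grid, r, c)
                # Remember the blank with the fewest options so far.
                if best is None or len(options) < len(best[2]):
                    best = (r, c, options)
    if best is None:
        return True  # no blanks left, so the grid is complete
    r, c, options = best
    for d in options:
        grid[r][c] = d
        if solve(grid):
            return True
    grid[r][c] = 0  # undo and let the caller try its next option
    return False


givens = [[0 if t == "_" else int(t) for t in line.split()]
          for line in PUZZLE.strip().splitlines()]
grid = [row[:] for row in givens]
solved = solve(grid)
if solved:
    for row in grid:
        print(" ".join(str(d) for d in row))
else:
    print("no solution found")
```

Because the search only ever places a digit that is legal at that moment, any grid it completes satisfies the Sudoku rules by construction, which is exactly the guarantee the chat replies lacked.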
Conclusion
It's good to know that there are still some things at which humans can beat AI, at least for now. As AIs get better, solving Sudoku puzzles should be neither a hard nor an impossible feat.
It's still wise not to trust AI's answers blindly, even when the response sounds confident (especially when it is absolutely convinced you made a typo on your end - has AI perfected gaslighting too?). We will still need to do our due diligence to verify that the results are credible, stress-testing them with various test cases. I think it's inevitable that we will want to outsource our thinking to AI in the future - even the promotional videos for o1-preview show a developer asking it to code a complete game - but even then we should do some thinking of our own to verify the results. We shouldn't stop all thinking just because we have AI to think for us.
I did some searching online to see if others had success with Sudoku, and I did find some successful attempts. Perhaps I was lucky (or unlucky) to catch OpenAI on a bad day.
#PromptEngineering #genAi #generativeAI #chatGPT #o1-preview #sudoku