My First Exploration with OpenAI o1-preview
Image generated by OpenAI Dall-E.

Can o1-preview handle Sudoku, whose generalised form is known to be an NP-complete problem?

There is a lot of excitement and online buzz around OpenAI's o1-preview model, which is supposed to be able to "think" and work out its line of reasoning on the fly. That is really interesting, because this new ability could let it handle more complex tasks.

What better way to try this out than by asking it to solve something that takes a lot of reasoning steps, like a Sudoku game? I regularly play Sudoku on my mobile phone via a popular Sudoku app. I won't say it gives me many minutes of pure joy (more like frustration, in fact!), but it does keep my brain active for a while, and kills some (or many) brain cells in the process. Now I am so happy that I can finally outsource the thinking to an AI and save my brain cells!

I fired up my Sudoku app and picked an "Expert" level game. Next I transcribed the game board manually into text, so that ChatGPT could pick it up from a prompt. This was the eventual prompt I sent to ChatGPT using the o1-preview model:

Can you help me solve this sudoko puzzle, in which blanks are represented by _. The rows of _ and numbers correspond to the rows in the Sudoku puzzle.

_ _ _ 1 _ _ _ _ _
_ _ 2 _ _ _ _ _ _
6 5 _ _ 9 _ _ _ _
_ _ 7 _ _ _ _ 9 _
8 _ _ _ 2 _ _ _ _ 
2 _ _ _ 8 _ _ 5 1
5 _ _ _ _ 7 _ 3 _ 
_ _ 9 3 _ _ _ 7 8
_ 7 3 _ 6 8 _ 2 5        
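
For anyone who wants to check a board like this with a program rather than by eye, the text layout above is easy to read into a grid. The following is only a minimal sketch: `parse_grid` is a hypothetical helper name (not from any library), and it assumes one puzzle row per line, cells separated by spaces, and `_` for a blank.

```python
PUZZLE_TEXT = """\
_ _ _ 1 _ _ _ _ _
_ _ 2 _ _ _ _ _ _
6 5 _ _ 9 _ _ _ _
_ _ 7 _ _ _ _ 9 _
8 _ _ _ 2 _ _ _ _
2 _ _ _ 8 _ _ 5 1
5 _ _ _ _ 7 _ 3 _
_ _ 9 3 _ _ _ 7 8
_ 7 3 _ 6 8 _ 2 5
"""


def parse_grid(text):
    """Turn the prompt's text layout into a 9x9 list of ints (0 = blank)."""
    grid = []
    for line in text.strip().splitlines():
        cells = line.split()  # split() also swallows stray trailing spaces
        if len(cells) != 9:
            raise ValueError(f"expected 9 cells per row, got {len(cells)}: {line!r}")
        grid.append([0 if cell == "_" else int(cell) for cell in cells])
    if len(grid) != 9:
        raise ValueError(f"expected 9 rows, got {len(grid)}")
    return grid


grid = parse_grid(PUZZLE_TEXT)
clues = sum(cell != 0 for row in grid for cell in row)
print(f"{clues} clues given")
```

Rejecting rows that do not have exactly 9 cells is a cheap way to catch a transcription slip before blaming (or being blamed by) the solver.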

For comparison, I also sent the same prompt to ChatGPT using the GPT-4o model, so that I could compare the responses.

As a further comparison, I also solved it myself. It took me a good half hour, with some mistakes along the way.

Solving via o1-preview

o1 started by taking some time to think. I have observed that sometimes it just gives the solution directly, and sometimes it takes the time to list out its reasoning steps. In this case it explained its reasoning after 35s, which would be quite a feat for an Expert Sudoku puzzle.

The steps it listed were:

  1. Fill in the obvious numbers
  2. Analyze row 9
  3. Analyze conflicts
  4. Identify the inconsistency
  5. Conclusion

It started off on a promising track.

But then it realised that there were inconsistencies while trying to solve Row 9.

And then it decided that there were errors in my grid.

So the model blamed me for typos, because it had decided the puzzle wasn't solvable.

So, I decided to help it a little. I told it the solution to the first row (which I had worked out manually).

The first row answer should be 
9 3 8 1 7 5 2 4 6        

It took another 57s, then started to solve it.

And it came to the conclusion there was still an error.

Now, I was getting a little impatient, so I tried to help it along some more. I gave it the solution to the first column, which I had worked out.

There is no issue with the puzzle.
The first column answer is 
9
7 
6
3
8
2
5
1
4        

This time, it spent another 94s working it out, and finally came up with a solution.

Unfortunately, it was still not correct. Each of the three 3x3 grids down the middle of the board contained a duplicate number: "9" appeared twice in the top middle grid, "1" appeared twice in the centre grid, and "2" appeared twice in the bottom middle grid.
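
Spotting duplicates like these by eye is tedious, and it is the kind of check a few lines of code can do reliably. Here is a minimal sketch in Python. The function name `find_duplicates` and the unit labels are hypothetical choices for illustration. Since the model's wrong answer is not reproduced in text here, the example uses the hand-worked solution given later in this article as a known-good board, then deliberately overwrites one cell to imitate this kind of slip.

```python
def find_duplicates(grid):
    """Return (unit, digit) pairs for every digit that appears more than once
    in a row, a column or a 3x3 grid. Blanks (0) are ignored."""
    units = {}
    for i in range(9):
        units[f"row {i + 1}"] = [grid[i][c] for c in range(9)]
        units[f"column {i + 1}"] = [grid[r][i] for r in range(9)]
    for br in range(3):
        for bc in range(3):
            # "box (1, 2)" means first band of rows, second stack of columns,
            # i.e. the top middle 3x3 grid.
            units[f"box ({br + 1}, {bc + 1})"] = [
                grid[r][c]
                for r in range(br * 3, br * 3 + 3)
                for c in range(bc * 3, bc * 3 + 3)
            ]
    problems = []
    for name, cells in units.items():
        for digit in range(1, 10):
            if cells.count(digit) > 1:
                problems.append((name, digit))
    return problems


# The hand-worked solution from later in this article.
SOLVED = [
    [9, 3, 8, 1, 7, 5, 2, 4, 6],
    [7, 4, 2, 8, 3, 6, 5, 1, 9],
    [6, 5, 1, 4, 9, 2, 3, 8, 7],
    [3, 6, 7, 5, 4, 1, 8, 9, 2],
    [8, 1, 5, 7, 2, 9, 4, 6, 3],
    [2, 9, 4, 6, 8, 3, 7, 5, 1],
    [5, 8, 6, 2, 1, 7, 9, 3, 4],
    [1, 2, 9, 3, 5, 4, 6, 7, 8],
    [4, 7, 3, 9, 6, 8, 1, 2, 5],
]

print(find_duplicates(SOLVED))  # an empty list means no rule is broken

# Illustrative only: overwrite one cell so that a 9 is repeated.
broken = [row[:] for row in SOLVED]
broken[0][4] = 9  # row 1, column 5 and the top middle grid now each hold two 9s
print(find_duplicates(broken))
```

Running a check like this on any answer, whether it comes from a model or from a person, takes far less time than the 94s of "thinking" that produced it.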

So, I decided to offer it the solution that I had worked out myself and asked if it knew why it was wrong.

Your answer is still wrong. There are duplicate numbers in each square cell.

The correct answer should be:
Row 1: 9 3 8 | 1 7 5 | 2 4 6
Row 2: 7 4 2 | 8 3 6 | 5 1 9
Row 3: 6 5 1 | 4 9 2 | 3 8 7
------------------------------
Row 4: 3 6 7 | 5 4 1 | 8 9 2
Row 5: 8 1 5 | 7 2 9 | 4 6 3
Row 6: 2 9 4 | 6 8 3 | 7 5 1
------------------------------
Row 7: 5 8 6 | 2 1 7 | 9 3 4
Row 8: 1 2 9 | 3 5 4 | 6 7 8
Row 9: 4 7 3 | 9 6 8 | 1 2 5

Do you know why you are wrong?        

It sounded apologetic and explained how it could have done better.

Here's the final result:

Human: 1
OpenAI: 0        

Solving via GPT-4o

I sent the same prompt to GPT-4o and the answer came up almost immediately. It looked as though it was supposed to call an internal Sudoku solver module, but somehow that never got started. I am guessing its Sudoku minion is on maintenance vacation.

That was a good start! Until I realised that the last column had 2 "8"s.

To be consistent, I offered the solution to the first row, as I did with o1-preview. Here was its reply.

Again, the solution came up almost immediately. Then I realised that some of the 3x3 grids had duplicate numbers: for example, the bottom middle grid had 2 "8"s, and the last column had 2 "1"s, both of them inside the middle right grid.

To be consistent, I offered the solution to the first column, as I did with o1-preview. Here was its reply.

This time the solution came almost immediately, but... the last column still had 2 "9"s and the second-to-last row had 2 "8"s.

So, exasperated, I offered it the solution, as I did with o1-preview.

So it said it knew it had to ensure the solution adhered to the rules, yet it didn't do it. Why? That sounds almost human-like - I remember the times people were upset with us because we knew there was a process or there were rules, yet we didn't follow them. AI shouldn't become complacent like us, even though it's modelled after our thinking.

Anyway, here's the final result:

Human: 2
OpenAI: 0        

Conclusion

It's good to know that there are still some things where humans can beat AI, at least for now. As AIs get better, solving Sudoku puzzles should be neither a hard nor an impossible feat.

It's still wise not to trust AI's answers blindly, even when the response sounds confident (especially when the model is absolutely convinced you made a typo on your end - has AI perfected gaslighting too?). We still need to do our due diligence to verify that the results are credible, stress-testing them with various test cases. I think it's inevitable that we will want to outsource our thinking to AI in the future - even the promotional videos for o1-preview show a developer asking it to code a complete game - but even then we should do enough thinking of our own to verify the results. We shouldn't stop thinking altogether just because we have AI to think for us.
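
As one concrete way to do that due diligence for a standard 9x9 board, a conventional program can both solve the puzzle and check the result. The sketch below is a plain backtracking search in Python that always fills the most constrained blank first. It is not how o1-preview or GPT-4o work internally, and the function names (`candidates`, `solve`, `is_valid_solution`) are hypothetical. The grid is the puzzle from the prompt, with 0 standing for a blank.

```python
PUZZLE = [
    [0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0, 0],
    [6, 5, 0, 0, 9, 0, 0, 0, 0],
    [0, 0, 7, 0, 0, 0, 0, 9, 0],
    [8, 0, 0, 0, 2, 0, 0, 0, 0],
    [2, 0, 0, 0, 8, 0, 0, 5, 1],
    [5, 0, 0, 0, 0, 7, 0, 3, 0],
    [0, 0, 9, 3, 0, 0, 0, 7, 8],
    [0, 7, 3, 0, 6, 8, 0, 2, 5],
]


def candidates(grid, r, c):
    """Digits that do not clash with the row, column or 3x3 grid of (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid):
    """Fill grid in place by backtracking; return True if a solution exists."""
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                options = candidates(grid, r, c)
                # Remember the blank with the fewest legal digits so far.
                if best is None or len(options) < len(best[2]):
                    best = (r, c, options)
    if best is None:
        return True  # no blanks left, so the board is complete
    r, c, options = best
    for d in options:
        grid[r][c] = d
        if solve(grid):
            return True
    grid[r][c] = 0  # undo, so the caller can try a different digit
    return False


def is_valid_solution(grid, puzzle):
    """Check the clues are preserved and every row, column and 3x3 grid is 1-9."""
    full = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6) for bc in (0, 3, 6)
    ]
    clues_kept = all(
        puzzle[r][c] in (0, grid[r][c]) for r in range(9) for c in range(9)
    )
    return clues_kept and all(unit == full for unit in rows + cols + boxes)


board = [row[:] for row in PUZZLE]
if solve(board):
    for row in board:
        print(" ".join(map(str, row)))
    print("passes checks:", is_valid_solution(board, PUZZLE))
else:
    print("no solution found")
```

Keeping the checker separate from the solver is deliberate: the same `is_valid_solution` call works on an answer pasted from a chatbot, which is exactly where the duplicates in this experiment would have been caught. Note that the worst-case cost of backtracking still grows quickly on larger generalised boards, which is where the NP-completeness mentioned at the start comes in.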

I did some searching online to see if others had success with Sudoku, and I did find some successful attempts. Perhaps I was just lucky (or unlucky) enough to catch OpenAI on a bad day.

#PromptEngineering #genAi #generativeAI #chatGPT #o1-preview #sudoku

More articles by Gerald Yong