The biggest issue with vibe coding
Most experienced software engineers will tell you that the majority of their time is not spent creating new code.
It is spent debugging the existing code.
Vibe coding is an amazing way to create new code effectively, but AI-based systems still struggle with debugging.
So the real vibe would be vibe debugging, not vibe coding!
I wanted to understand a little more about the challenges of self-debugging systems.
But instead of highly polished, perfect code (it is never like that), I tried to play with something closer to a real-life example of software development:
I will try to automate step 3 and see what challenges the AI will encounter (and what effective strategies I find to resolve them).
(stick with me—even if you're not a software engineer—because I'll share interesting insights that will help you craft better AI prompts, even for non-coding tasks!)
The simple self-debugging sandbox I created looked like this:
So let's play with the Debug and Create New Version of Code box.
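To make the loop concrete, here is a minimal sketch of what such a sandbox could look like. The helpers run_tests and debug_and_rewrite are hypothetical placeholders for the test runner and the AI call, not the actual code of my sandbox:

# Minimal sketch of the self-debugging sandbox (hypothetical helper names).
# Loop: run the tests -> if something fails, ask the AI for a new version
# of the code -> run the tests again with that new version.

def run_tests(code: str) -> list:
    """Execute the test suite against `code` and return the failing cases."""
    raise NotImplementedError  # placeholder: the real sandbox runs the tests here

def debug_and_rewrite(code: str, failed_test) -> str:
    """The 'Debug and Create New Version of Code' box: one AI call."""
    raise NotImplementedError  # placeholder: the real sandbox calls the model here

def sandbox(code: str, max_rounds: int = 5) -> str:
    for _ in range(max_rounds):
        failures = run_tests(code)
        if not failures:
            break  # all tests pass, nothing left to debug
        code = debug_and_rewrite(code, failures[0])
    return code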
Here are the experiments and results
Experiment 1: Just ask a simple prompt
I started with a simple prompt asking the AI to debug the code and update it:
Debug provided code and create updated version of code that will solve issues from failing tests.
here is code:
{code}
here is failed test results of function 'segment_text_pro'
{failed_test}
{code} is the entire code I'm running, and {failed_test} contains information about the input parameters, the expected result, and the actual result of the failing test case.
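To show how those placeholders get filled, here is an illustrative way to assemble the prompt; the file name and the failed_test values below are made up for the example and are not my actual test data:

# Illustrative assembly of the Experiment 1 prompt.
# The file name and the failed_test contents are made-up examples.
PROMPT_TEMPLATE = """Debug provided code and create updated version of code that will solve issues from failing tests.
here is code:
{code}
here is failed test results of function 'segment_text_pro'
{failed_test}
"""

with open("segment_text_pro.py") as f:
    code = f.read()  # the entire code under test

failed_test = {
    "input": {"input_string": "example input", "n": 3},
    "expected": "expected segmentation",
    "actual": "what the function actually returned",
}

prompt = PROMPT_TEMPLATE.format(code=code, failed_test=failed_test)
# `prompt` is what gets sent to the model; its reply becomes the new code version.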
Unfortunately, the updated versions were not even close to fixing the issues. The AI followed a completely wrong path. It claimed it needed to fix code inside a branch of an 'if' condition even though there was no way that path was executed:
n = 3
if n > 3:
    <I want to fix this code !!!!>
else:
    <defect is definitely not here>
I even tried a more advanced model, but it didn't help at all.
Experiment 2: Give me some steps
I decided to give the AI some steps it could follow while debugging. Here is the prompt I used:
What return statement actually return value in failing case?
Follow these steps:
1. return only the line that finishes the code
2. propose code changes that will fix that specific issue
here is code:
{code}
here are tests results of function 'segment_text_pro'
{tests}
That did not help fix the bug, but I noticed that it finally started to focus on the right part of the code.
So the next experiment was:
Experiment 3: Here is the solution
I gave the AI the exact location of the code it should focus on:
This element of the code is wrong:
if n == len(segments['text']):
    return segment_simple_text(input_string, n)
That was significant progress. Even though the test was still failing, the AI made some progress.
It fixed the right place in the code. Yet the solution was not perfect - it would require another iteration of debugging.
The fact that I literally told the AI where to look is not a scalable solution either, but the insight was important - divide the action into simple steps instead of one big "fix it" task.
And if you think about it for a second, that is what a software engineer would do: start by understanding which path was executed, why that one, and where to look for the root cause.
Experiment 4: Baby steps
Now debugging is divided into two steps, which look like this:
Step 1: Identify what return statement was executed in the failing case.
Step 2: Knowing that the program executed that path and returned a wrong value, focus on that part of the code and fix the bug.
The bug was still not fixed, but the identification of the executed path worked perfectly.
Experiment 5: From baby steps to walking
Step 1: Identify what return statement was executed in the failing case.
Step 2: Explain why that code was executed for the failing test.
Step 3: Based on that explanation, fix the issue for the failing case.
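Here is a rough sketch of how that three-prompt chain could be wired up, assuming a generic ask_llm helper for whatever model API you use (these are not my exact prompts or code):

# Sketch of the three-step chain (hypothetical ask_llm helper).
# Each step is a separate prompt; the answer from one step feeds the next.

def ask_llm(prompt: str) -> str:
    """Placeholder for a single model call returning the text reply."""
    raise NotImplementedError

def three_step_debug(code: str, failed_test: str) -> str:
    # Step 1: which return statement was executed in the failing case?
    executed_return = ask_llm(
        "Identify what return statement was executed in the failing case.\n"
        f"here is code:\n{code}\nhere is the failing test:\n{failed_test}"
    )
    # Step 2: why did the program take that path for this test?
    explanation = ask_llm(
        f"The failing case ended at this return statement:\n{executed_return}\n"
        "Explain why that code was executed for the failing test.\n"
        f"here is code:\n{code}\nhere is the failing test:\n{failed_test}"
    )
    # Step 3: use the explanation to produce a fixed version of the code.
    return ask_llm(
        f"Based on this explanation:\n{explanation}\n"
        "Fix the issue for the failing case and return the full updated code.\n"
        f"here is code:\n{code}"
    )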
Dividing the flow into three separate steps (each was a separate prompt) significantly improved the outcome.
That structure was able to reach results similar to experiment 3, but unlike experiment 3, it got there automatically.
However, the results were not always right. In some cases it fixed the bug; in other cases the solution was wrong.
Experiment 6: Rethink what you just did
Knowing that the flow from experiment 5 gives quite good results, I tried to see what would happen if it iterated and repeated the same process until the issue was fixed. The rules were as follows (see the sketch after this list):
If the new code fixes the bug - we are done!
If the new code causes a regression (the pass rate went down) - revert the code to the previous version.
If the new code does not break anything but the bug is not fixed - run the tests with the new code, debug, and create a new fix.
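A sketch of that decision logic, with pass_rate, test_passes, and generate_fix standing in for the test runner and the step-by-step debug flow (hypothetical names, not the actual implementation):

# Sketch of the Experiment 6 iteration rules (hypothetical helper names).

def pass_rate(code: str) -> float:
    """Run the whole test suite and return the fraction of passing tests."""
    raise NotImplementedError

def test_passes(code: str, failing_test) -> bool:
    """Check whether the originally failing test now passes."""
    raise NotImplementedError

def generate_fix(code: str, failing_test) -> str:
    """One pass of the step-by-step debug flow, returning a new code version."""
    raise NotImplementedError

def iterate_until_fixed(code: str, failing_test, max_iterations: int = 3) -> str:
    baseline = pass_rate(code)
    for _ in range(max_iterations):
        candidate = generate_fix(code, failing_test)
        candidate_rate = pass_rate(candidate)
        if candidate_rate < baseline:
            continue  # regression: revert, i.e. keep debugging the previous version
        if test_passes(candidate, failing_test):
            return candidate  # bug fixed and nothing broke: we are done
        code, baseline = candidate, candidate_rate  # nothing broke, bug still there: keep going
    return code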
The results were amazing - after no more than three iterations it was able to fix the issue correctly.
I mentioned before that the initial fix exposed another issue, but the AI diagnosed it in the next iteration and fixed it as well.
The test passed, but there was one significant issue with the code. I will get back to this later. There is one more experiment I tried:
Experiment 7: Let's make it harder now
Knowing that this specific test case was now passing, I created a new one. This one was really tricky (yes, after spending so many years testing software, I'm always looking for a new corner case).
So far the AI had to debug only the main function. For this new test case it had to find out that one of the subfunctions was returning an incorrect value and move its debugging efforts to a different function. On top of that, even though the fix was very simple, it required connecting two facts from different places in the code.
It failed miserably.
It kept focusing on the main function and either produced the same code changes over and over or just tried to rewrite the entire code.
It was clear that it needed a hint to also consider how other functions behave.
Here is updated flow:
Step 1: Identify what return statement was executed in the failing case.
Step 2: Explain why that code was executed for the failing test.
Things to consider while debugging:
- what if the functions used for preparing data are not working correctly?
Step 3: Based on that explanation, propose 3 ideas for how the code can be updated to produce the correct result for this failing case.
Step 4: Select one of the ideas for the code fix and fix the issue.
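A short sketch of how steps 3 and 4 extend the earlier chain, again with a hypothetical ask_llm helper (not my exact prompts):

# Sketch of the "propose ideas, then pick one" steps from Experiment 7
# (hypothetical ask_llm helper, same as in the earlier sketch).

def propose_and_select_fix(ask_llm, code: str, explanation: str) -> str:
    # Step 3: ask for several candidate fixes instead of a single one.
    ideas = ask_llm(
        f"Based on this explanation:\n{explanation}\n"
        "Propose 3 ideas for how the code can be updated to produce the "
        "correct result for this failing case."
    )
    # Step 4: ask the model to choose one idea and apply it to the code.
    return ask_llm(
        f"Here are the candidate fixes:\n{ideas}\n"
        "Select one of the ideas and fix the issue.\n"
        f"here is code:\n{code}"
    )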
Two things changed here: the hint in step 2 to consider the functions used for preparing data, and splitting the fix into proposing several ideas and then selecting one.
These changes led to correct identification of the issue. The AI was able to identify and explain where the issue was in the code, but it was not able to create a correct fix.
I decided to stop at this point and summarize what I've learned so far—it's already a lot, don't you think?
So here they are.
Insights
Limitations
Do you want to play with this self-debugging AI?
Reply 'yes' in the comments.
I can put it online and give you private access so you can try the experimental code.