LLM Use Case Investigated: Debugging & Code Quality. How good is Sonnet 3.7 Extended Thinking as a Debugging Partner?
Credit given to ChatGPT

While I have found that LLMs make good junior developers, they do not yet produce good code consistently enough (even when building on a strong predefined architecture) that I would unleash them on my codebase to add new features - and I say this as someone who WANTS to do this. LLMs are still too prone to "sidequesting" (where the LLM introduces features or functionality that you didn't ask for) and "death spirals" (where the LLM tries to beat the codebase into submission by changing unrelated code, which breaks other things). So while my thought experiment to build an enterprise-grade application entirely with an LLM continues, I decided to stop and examine a use case that is more in line with a reasoning model: how well can Sonnet 3.7 Extended Thinking find issues in a codebase?

In this particular case I wanted Claude to eat its own dog food. I had a git version of the code that I had "abandoned" because Claude Sonnet 3.5 had gone into a death spiral trying to find and fix a particular issue. Typically I give the LLM some period of time to find the issue before I abandon a branch and create a new one, taking the lessons learned from the previous branch to update the prompt and start the process again. In this example, Sonnet 3.5 kept death spiraling - and in one case it actually introduced a fix that hid the error.

The error is in this small block of code, where the system should be outputting two types of nodes: layout nodes (which handle position and orientation information) and definition nodes (which handle node properties and other business logic). Sonnet 3.5 wrote this section earlier in the project, before there was a clean separation of the two node types, and it didn't realize that this section was missing that separation.

[Figure: The defect]
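
To make the two node types concrete, here is a minimal sketch of the flow shape, assuming the field names that show up in the browser logs quoted later in this article. The interface names (`DefinitionNode`, `LayoutNode`, `Flow`), the `position` field, and the `makeFlowWithNode` helper are hypothetical and only for illustration; the shape is abridged (metadata is left out) and is not the project's actual code.

```typescript
// Hypothetical types inferred from the logged field names; not the project's real code.
interface DefinitionNode {
  id: string;
  type: string; // e.g. 'knowledge-base'
  config: Record<string, unknown>;
  data: Record<string, unknown>;
}

interface LayoutNode {
  id: string; // expected to match a DefinitionNode id
  position: { x: number; y: number }; // assumed field name
}

interface Flow {
  id: string;
  name: string;
  type: "flow";
  purpose: string;
  definition: {
    id: string;
    name: string;
    version: string;
    nodes: DefinitionNode[];
    connections: unknown[];
  };
  layout: { id: string; nodes: LayoutNode[]; edges: unknown[] };
}

// Builds a flow in which one node is registered in BOTH structures.
function makeFlowWithNode(flowId: string, nodeId: string, nodeType: string): Flow {
  return {
    id: flowId,
    name: "example",
    type: "flow",
    purpose: "",
    definition: {
      id: flowId,
      name: "example",
      version: "1.0.0",
      nodes: [{ id: nodeId, type: nodeType, config: {}, data: {} }],
      connections: [],
    },
    layout: {
      id: flowId,
      nodes: [{ id: nodeId, position: { x: 0, y: 0 } }],
      edges: [],
    },
  };
}
```

The point of the sketch is that every node id should appear once under `definition.nodes` and once under `layout.nodes`; code that writes only one of the two leaves the flow in an inconsistent state.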

As I wanted to determine where the death spiral was coming from, I started researching where the actual issue was myself. I didn't intend to change the code; I just wanted to see if there would be a way to build a prompt so that Claude would be able to isolate the issue itself. Given the size of the codebase (about 80 files in this particular front end section of the code), it took some time to actually find the issue myself - several hours spread across a few days. I'll give myself the benefit of the doubt and assume that the logging I added made it easier for Sonnet to find the issue; nevertheless, locating the defect took a non-trivial amount of time.

Once I found it, I wanted to see how long it would take for Sonnet 3.7 with extended thinking to find it and compare that to what Sonnet 3.5 would do.

So I pasted the log into Claude.ai and here is the result:


[Figure: The first prompt]

Note that this is a new session. There is nothing else in the context window except for the log file output as it comes directly from the browser. This took about 10 seconds.

The internal log of the thought process is:

I need to analyze these log messages from the FlowEditor.tsx file and understand what they're indicating about the project.
Looking at these logs, here's what I can observe:
The logs are from a React component called FlowEditor.tsx and contain information about a flow state management.
The flow appears to be part of a graph-based system with nodes and edges (connections), likely using something like React Flow or a similar library.
There's a lifecycle of operations happening:
  - Flow after addNode (line 405)
  - Flow after setNodes (line 424)
  - Updating flow in project callback (line 87)
  - Flow after updateFlowInProject (line 429)
  - Node changes callback (line 210)
  - Processing node changes (line 225)
  - Update flow due to something other than removal (line 285)
The flow structure contains:
  - ID: 'flow-1741661470132'
  - Name: 'asdadad'
  - Type: 'flow'
  - Purpose: '' (empty)
  - Definition with nodes and connections
  - Layout with nodes and edges
  - Metadata with creation/update timestamps
A single node was added with:
  - ID: 'a351931c-c997-4d07-86d1-af2696983e41'
  - Type: 'knowledge-base'
  - Has config and data properties
There seems to be an inconsistency in the flow:
  - Initially a node is added to the flow
  - The flow is updated with this node
  - But in the last part of the logs, the nodes array suddenly becomes empty (length: 0)
  - Yet the layout still shows a node in its array
The timestamps show this all happened on March 11, 2025 (very close to the current date).
The main issue I'm seeing is that there's a discrepancy between the flow definition and layout. At the end, the flow's definition.nodes array is empty, but the layout.nodes array still contains the node. This suggests there might be a synchronization issue between the flow definition and layout, which could lead to inconsistent state in the application.
This could be a bug where nodes are being removed from the definition but not from the layout, or vice versa. It's also possible that there's some expected behavior where nodes move between these structures during different operations, but the logs suggest this might not be working as intended.

That by itself is really interesting as it is able to determine very specifically:

  1. there is an issue - I did NOT tell it that there was an issue
  2. what the type of issue is
  3. where in the codebase the issue might be
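
The discrepancy Sonnet describes (a node present in `layout.nodes` but absent from `definition.nodes`) is the kind of thing that can be checked mechanically at each logging point. The following is a minimal sketch under that assumption; `FlowSnapshot`, `Mismatch`, and `findNodeMismatch` are hypothetical names introduced here, not functions from the project.

```typescript
// Hypothetical helper: only the fields needed for the comparison are modelled.
interface FlowSnapshot {
  definition: { nodes: { id: string }[] };
  layout: { nodes: { id: string }[] };
}

interface Mismatch {
  missingFromDefinition: string[]; // ids found in layout only
  missingFromLayout: string[]; // ids found in definition only
}

// Returns the ids in `source` that have no counterpart in `target`.
function idsMissingFrom(source: { id: string }[], target: { id: string }[]): string[] {
  return source
    .filter((s) => !target.some((t) => t.id === s.id))
    .map((s) => s.id);
}

// Compares the two node lists and reports what is out of sync in each direction.
function findNodeMismatch(flow: FlowSnapshot): Mismatch {
  return {
    missingFromDefinition: idsMissingFrom(flow.layout.nodes, flow.definition.nodes),
    missingFromLayout: idsMissingFrom(flow.definition.nodes, flow.layout.nodes),
  };
}
```

Called on the final state shown in the logs (an empty `definition.nodes` and one entry in `layout.nodes`), a check like this would list that node's id under `missingFromDefinition`, which turns a pattern a human has to spot in a log into a one-line report.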

Previously I had given this same code to Sonnet 3.5, and it tried to fix the issue with a well-meaning approach: validate project structures and "fix the missing data" - i.e. create new objects wherever the project failed validation. Now that was both obviously stupid and downright dangerous. In fact, one of the reasons that this bug went undetected for so long was that Sonnet 3.5 was trying to fix the project through validation instead of just saying "hey, there is an error". This is something that companies using LLMs for large data-centric projects MUST be on the lookout for. If you ask an LLM to fix an error, it may fix it in a way that masks a REAL issue.
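
The difference between the two approaches can be sketched in a few lines. This is an illustration only, assuming a simplified flow shape; `FlowLike`, `repairFlow`, and `validateFlow` are hypothetical names and do not come from the project or from what Sonnet 3.5 actually generated.

```typescript
// Hypothetical, simplified shape: either structure may be missing in a broken project.
interface FlowLike {
  id: string;
  definition?: { nodes?: { id: string }[] };
  layout?: { nodes?: { id: string }[] };
}

// Risky pattern: silently fills in whatever is missing, so the caller
// never learns that the data was broken in the first place.
function repairFlow(flow: FlowLike): FlowLike {
  return {
    id: flow.id,
    definition: { nodes: flow.definition?.nodes ?? [] },
    layout: { nodes: flow.layout?.nodes ?? [] },
  };
}

// Safer pattern: report every problem found and leave the data untouched.
function validateFlow(flow: FlowLike): string[] {
  const errors: string[] = [];
  const definitionNodes = flow.definition?.nodes;
  const layoutNodes = flow.layout?.nodes;
  if (!definitionNodes) errors.push(`flow ${flow.id}: definition.nodes is missing`);
  if (!layoutNodes) errors.push(`flow ${flow.id}: layout.nodes is missing`);
  if (definitionNodes && layoutNodes && definitionNodes.length !== layoutNodes.length) {
    errors.push(
      `flow ${flow.id}: ${definitionNodes.length} definition node(s) vs ${layoutNodes.length} layout node(s)`
    );
  }
  return errors;
}
```

With a repair-style function in the load path, a flow that lost its definition nodes simply comes back with an empty array and nothing is reported. A report-only validator surfaces the same condition as an error message that someone has to look at.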

So I went forward to see if it could complete the loop and get to the same level of understanding that I had.


[Figure: Sonnet 3.7 finds the issue]

The thought process is just as interesting to read, as it gives some insight into how the model reasons - and it took 14 seconds.


To isolate the source of the discrepancy without changing the code, I need to carefully analyze the sequence of events in the logs and identify where the flow definition and layout become out of sync.
Let's trace through the logs chronologically:
Initially, a node is added:
FlowEditor.tsx:405 Flow after addNode        
At this point, both the flow definition and layout are in sync. The definition contains a node of type 'knowledge-base' with ID 'a351931c-c997-4d07-86d1-af2696983e41', and the layout also contains this node.
Then we see:
FlowEditor.tsx:424 Flow after setNodes        
The flow still looks good here - definition and layout both have the node.
Next, there's a call to update the flow in the project:
FlowEditor.tsx:87 Updating flow in project callback: 
{flowId: 'flow-1741661470132'}        
The flow still seems consistent here.
Then we have:
FlowEditor.tsx:429 Flow after updateFlowInProject        
At this point, the flow still has the node in both definition and layout.
Next, a node change event occurs:
FlowEditor.tsx:210 Node changes callback: 
[{…}]
0: {id: 'a351931c-c997-4d07-86d1-af2696983e41', type: 'dimensions', dimensions: {…}}        
This shows a node change of type 'dimensions' for the same node ID.
Then:
FlowEditor.tsx:225 Processing node changes 
{changes: Array(1)}        
Finally, the critical point:
FlowEditor.tsx:285 Update flow due to something other than removal
FlowEditor.tsx:286 
{
  ...
  definition: {
    connections: [],
    id: "flow-1741661470132",
    metadata: {...},
    name: "asdadad",
    nodes: Array(0),  // <-- This is now empty!
    version: "1.0.0"
  },
  ...
  layout: {
    id: 'flow-1741661470132', 
    nodes: Array(1), // <-- This still has 1 node
    edges: Array(0)
  },
  ...
}        
At this point, the flow definition's nodes array is empty, but the layout still contains the node. This happens after processing the node dimensions change.
Looking at the specific log messages, there are a few key clues:
  - Line 285 says "Update flow due to something other than removal" - which suggests the code has a branch for handling node removal vs. other changes.
  - The node change that triggered this was of type 'dimensions', not a removal.
The issue appears to be in how the FlowEditor handles node changes of type 'dimensions'. After receiving this change, it seems to be updating the flow with an empty nodes array in the definition, while preserving the node in the layout.
Based on the logs, I can narrow down the source of the discrepancy to the code that handles node changes, specifically:
  - The node changes callback (around line 210)
  - The processing of those changes (around line 225)
  - The flow update logic for "something other than removal" (around line 285)
The problem is most likely in the logic that updates the flow definition after a dimensions change. It seems to be either:
  - Starting with a fresh flow definition that doesn't have the nodes
  - Not properly preserving the existing nodes when updating for dimension changes
  - Having some race condition where an outdated flow state (before the node was added) is being used
Without seeing the actual code, my best assessment is that the issue is in how the flow definition is constructed or updated in the branch that handles node dimension changes - somewhere around line 285 in FlowEditor.tsx.
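
To illustrate the third hypothesis in that list (an outdated flow state being used), here is a minimal sketch of how a handler that works from a stale snapshot could produce exactly the state seen in the logs. This is not the actual defect in FlowEditor.tsx, which is not reproduced in this article; `FlowState`, `applyDimensionsChangeStale`, and `applyDimensionsChange` are hypothetical names.

```typescript
// Hypothetical, flattened state: just the node ids held by each structure.
interface FlowState {
  definitionNodes: string[];
  layoutNodes: string[];
}

// Illustrative buggy handler: takes the definition from a snapshot captured
// BEFORE the node was added, but takes the layout from the current state.
function applyDimensionsChangeStale(staleSnapshot: FlowState, current: FlowState): FlowState {
  return {
    definitionNodes: staleSnapshot.definitionNodes.slice(),
    layoutNodes: current.layoutNodes.slice(),
  };
}

// Safer handler: derives both lists from the same, current state.
function applyDimensionsChange(current: FlowState): FlowState {
  return {
    definitionNodes: current.definitionNodes.slice(),
    layoutNodes: current.layoutNodes.slice(),
  };
}
```

If the snapshot was taken when both lists were empty and the current state holds one node in each, the stale variant returns zero definition nodes and one layout node, matching the `Array(0)` / `Array(1)` pair in the final log entry. The other two hypotheses would produce the same symptom, which is why the logs alone narrow the search to a region of code rather than a single line.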

So the last step is to actually have it validate that the conclusion is correct:


[Figure: Reading the actual project files on the filesystem]

[Figure: Analyzing them to validate the solution]

So what does all of this mean?

First, it means that Sonnet 3.7 Extended Thinking is very, very good at being a debugging buddy. From now on, when I have issues, I will turn them over to Sonnet and see if it can determine what is wrong. In less than 30 seconds it was able to bring me to the root cause of an issue. I don't know whether it will actually fix it, but its ability to find it is significant in itself.

Second, it STILL means that LLMs can create some truly suspect code. The original bug was written and then glossed over by Sonnet 3.5, and it was patched over by another section of code that hid the discrepancy until I started having it take out the extra dubious code.

Thus you still need a human in the loop (and I suspect we will for some time yet). Many of these systems mean well and try to do what you ask, but if you just accept what they give you at face value, you could be opening yourself up to a whole new level of punishment. You wouldn't blindly copy/paste from Stack Overflow, and you shouldn't do it here either. At least - not yet :)


#claude #sonnet #anthropic

More articles by Gregory Pierce, MBA
