Example of Agentic AI in Action: Helping an L1 Engineer Solve an Issue

The Prompt:

"Guide me step by step through troubleshooting a slow application performance issue for a critical business application. Tell me one action to take at a time. After I take the action, I will write or paste a description of what happened as my next prompt.

Tell me the first step to take."

The Interaction:

Agent: Okay, let's start by gathering some preliminary information.

Action 1: Please describe the application in question (name, type), when the slowness was first observed, and who is experiencing the issue (all users, specific locations, etc.). Please be as specific as possible.

(My Response):

The application is called "OrderPro," it's a web-based order processing system. Users started reporting slowness around 10:00 AM today. It seems to be affecting all users, regardless of their location.

Agent: Thank you for that information.

Action 2: Can you please check the server resource utilization (CPU, memory, disk I/O, network I/O) for the server(s) hosting the OrderPro application during the period when the slowness was reported (around 10:00 AM today)? Provide the average and peak utilization values for each resource.

(My Response):

I logged into the primary application server for OrderPro. I checked the performance monitoring tools. The CPU utilization was averaging around 95% between 10:00 AM and 11:00 AM, with peaks hitting 100%. Memory usage was at 80%. Disk I/O was elevated but not saturated, averaging about 60% utilization. Network I/O was normal, around 20%.
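As an aside for readers who want to reproduce the "average and peak" figures the agent asks for, here is a minimal Python sketch. The sample values are hypothetical, chosen only so that they match the CPU figures reported above (average around 95%, peak 100%); in practice they would come from whatever monitoring tool is in use.

```python
def summarize(samples):
    """Return (average, peak) for a list of utilization percentages."""
    average = round(sum(samples) / len(samples), 1)
    return average, max(samples)


# Hypothetical CPU readings (percent) taken between 10:00 and 11:00.
# These are illustrative values, not data from the incident.
cpu_samples = [92, 95, 100, 93]

cpu_avg, cpu_peak = summarize(cpu_samples)
print(f"CPU average: {cpu_avg}%, peak: {cpu_peak}%")
```

The same function can be applied to memory, disk I/O, and network I/O samples to fill in the other values the agent requested.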

Agent: High CPU utilization is a strong indicator.

Action 3: Please identify the top processes consuming CPU on the server during that time (10:00 AM - 11:00 AM). List the process names and their average CPU consumption percentage. This will help us pinpoint the source of the CPU bottleneck.

(My Response):

Using the server's performance monitoring tools, I identified the following top CPU-consuming processes during the 10:00 AM - 11:00 AM timeframe:

OrderPro.exe: Averaging 75% CPU

SQLServer.exe: Averaging 15% CPU

w3wp.exe (IIS Worker Process): Averaging 5% CPU

Other system processes: Remaining 5%

Agent: OrderPro.exe consuming the majority of the CPU points toward the application itself as the likely cause.

Action 4: Now, let's examine the OrderPro application logs for any errors, warnings, or unusual activity that coincides with the period of high CPU utilization (10:00 AM - 11:00 AM). Look for anything that might suggest a specific function or process within the application that is causing the issue. Please provide any relevant log entries.

(My Response):

I accessed the OrderPro application logs. I found repeated entries like this between 10:00 and 11:00:

ERROR: Database query timeout - Orders table. OrderID: 12345, Time: 60 seconds

WARNING: Slow processing detected for OrderID: 67890, Customer: ABC Corp

These errors and warnings are significantly more frequent than usual.
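To judge whether such entries really are "more frequent than usual," it helps to count them rather than eyeball the log. The sketch below is one possible way to do that in Python. It assumes the log format is exactly what the two sample lines above show; the real OrderPro log layout may differ, and the function and pattern names are made up for illustration.

```python
import re
from collections import Counter

# The two entries quoted above, used as sample input.
LOG_LINES = [
    "ERROR: Database query timeout - Orders table. OrderID: 12345, Time: 60 seconds",
    "WARNING: Slow processing detected for OrderID: 67890, Customer: ABC Corp",
]

# Assumed format: the line starts with the severity followed by a colon.
LEVEL_RE = re.compile(r"^(ERROR|WARNING):")
# Assumed format of the timeout message, capturing the table name.
TIMEOUT_RE = re.compile(r"Database query timeout - (\w+) table")


def summarize_log(lines):
    """Count entries per severity and query timeouts per table."""
    levels = Counter()
    timeout_tables = Counter()
    for line in lines:
        level = LEVEL_RE.match(line)
        if level:
            levels[level.group(1)] += 1
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            timeout_tables[timeout.group(1)] += 1
    return levels, timeout_tables


levels, timeout_tables = summarize_log(LOG_LINES)
print(dict(levels), dict(timeout_tables))
```

Running the same count over a normal hour and over the 10:00 - 11:00 window gives a concrete comparison to report back to the agent.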

Agent: The log entries indicate database query timeouts and slow processing, particularly related to the "Orders" table.

Action 5: Let's focus on the database side. Run a query against the database server (where SQLServer.exe is running) to identify long-running queries that access the "Orders" table. Use a tool like SQL Server Management Studio or your preferred database monitoring tool to identify queries exceeding a reasonable execution time (e.g., > 5 seconds). Please provide the text of the long-running queries.

(My Response):

I connected to the SQL Server and ran a query to identify long-running queries. I found the following query running for an extended period:

SELECT * FROM Orders WHERE CustomerID = 'ABC Corp' AND OrderDate < '2024-01-01' ORDER BY OrderDate DESC

It was running for over 30 seconds.
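The transcript does not show the query used to find long-running statements. One common approach on SQL Server is to read the sys.dm_exec_requests view together with sys.dm_exec_sql_text. The Python sketch below wraps such a query. It assumes a DB-API cursor that uses "?" placeholders (pyodbc is one example), and the function name, the "Orders" filter, and the 5-second default are illustrative choices based on the agent's suggestion, not the author's actual method.

```python
# T-SQL that lists currently executing requests over a time threshold
# whose statement text mentions a given table name.
# total_elapsed_time is reported by SQL Server in milliseconds.
LONG_RUNNING_SQL = """
SELECT r.session_id,
       r.total_elapsed_time / 1000.0 AS elapsed_seconds,
       t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.total_elapsed_time > ?
  AND t.text LIKE ?
ORDER BY r.total_elapsed_time DESC;
"""


def find_long_running(cursor, table="Orders", min_seconds=5):
    """Return rows (session_id, elapsed_seconds, query_text) for running
    requests that exceed min_seconds and reference the given table.

    `cursor` is assumed to be a DB-API cursor connected to SQL Server.
    """
    threshold_ms = min_seconds * 1000
    cursor.execute(LONG_RUNNING_SQL, (threshold_ms, f"%{table}%"))
    return cursor.fetchall()
```

This only sees requests that are executing at the moment it runs, so it needs to be run while the slowness is occurring. Reading these views also typically requires elevated server permissions.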

Agent: That long-running query appears to be the culprit.

Action 6: Analyze the query execution plan for the long-running query. Look for any missing indexes, inefficient joins, or other performance bottlenecks. Provide a summary of your findings from the execution plan analysis.

(My Response):

After reviewing the execution plan for the long-running query, it appears a full table scan is occurring on the 'Orders' table because there's no index on the CustomerID column. The query optimizer isn't able to efficiently narrow down the rows being searched, resulting in the slow performance.

Agent: The missing index on CustomerID is the root cause.

Action 7: Create an index on the CustomerID column of the Orders table. The syntax would be something like:

CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);

After creating the index, re-run the long-running query and provide the execution time.

(My Response):

I executed the following SQL command:

CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);

The index was created successfully. I then re-ran the problematic query:

SELECT * FROM Orders WHERE CustomerID = 'ABC Corp' AND OrderDate < '2024-01-01' ORDER BY OrderDate DESC

This time, it executed in less than 1 second.
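The effect of the index on the query plan can be seen on a small scale without touching a production server. The following sketch uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the plan text and the optimizer differ from what SQL Server Management Studio would show. The table layout and the sample rows (including the customer "XYZ Ltd") are hypothetical; only the query and the index statement come from the conversation above.

```python
import sqlite3

# The query from the conversation, unchanged.
QUERY = (
    "SELECT * FROM Orders "
    "WHERE CustomerID = 'ABC Corp' AND OrderDate < '2024-01-01' "
    "ORDER BY OrderDate DESC"
)


def plan(conn, sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real Orders table is not described.
conn.execute(
    "CREATE TABLE Orders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID TEXT, OrderDate TEXT)"
)
# Hypothetical sample rows for illustration only.
conn.executemany(
    "INSERT INTO Orders (CustomerID, OrderDate) VALUES (?, ?)",
    [
        ("ABC Corp", "2023-06-01"),
        ("ABC Corp", "2024-02-01"),
        ("XYZ Ltd", "2023-07-15"),
    ],
)

# Without an index on CustomerID, SQLite reports a scan of the table.
plan_before = plan(conn, QUERY)

conn.execute("CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID)")

# With the index, SQLite reports a search using IX_Orders_CustomerID.
plan_after = plan(conn, QUERY)
rows = conn.execute(QUERY).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the plan before and after the change, rather than only the elapsed time, helps confirm that the speed-up comes from the index and not from caching or a lull in traffic.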

Agent: Excellent! Creating the index significantly improved query performance.

Action 8: Monitor the OrderPro application and server resource utilization (CPU, memory) for the next hour to confirm that the slowness issue is resolved and CPU utilization has returned to normal levels. Report your findings.

(My Response):

I monitored the OrderPro application and server resources. After an hour of monitoring, the CPU utilization on the server is now averaging around 20%, with occasional peaks to 40% during high traffic periods. The application is responding much faster, and users are no longer reporting slowness.

Agent: Great. It appears the issue has been resolved.

Action 9: Document the incident, the troubleshooting steps taken, and the resolution (creating the index) in your IT incident management system. This will help with future troubleshooting efforts.

(My Response):

I've documented the entire incident, including the initial reports of slowness, the troubleshooting steps we followed, the root cause (missing index), and the resolution (creating the index) in our IT incident management system.

Agent: The incident is now successfully closed.

More articles by Nikhil A.
