How to Chat with ScreamingFrog Using Langchain and Python

Screaming Frog is a must-have in any SEO's toolkit. Depending on the size of your website, however, processing and analyzing the output of a single crawl can be a cumbersome and time-consuming task. LLMs like GPT now allow us to make this routine part of SEO analysis much easier. Here is a simple way you can chat with your ScreamingFrog crawls.

What You'll Need:

  • Rudimentary Python knowledge
  • A ScreamingFrog license
  • An OpenAI API key

Step 1: Initiate a Crawl

This is the part every SEO should be familiar with. Kick off a crawl in ScreamingFrog, configured to check everything you need for your site. When it's done, export the "Internal" report.
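
If you'd like a quick sanity check before wiring up the agent, you can load the exported report with pandas (which LangChain's CSV agent uses under the hood). The file name below is just an example; use whatever name you gave your export:

import pandas as pd

# Load the exported "Internal" report (file name is an example)
df = pd.read_csv("internal_all.csv")

# Inspect the size and the columns the agent will be able to query
print(df.shape)
print(df.columns.tolist())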

Step 2: Install Langchain

LangChain is a framework, available in Python and JavaScript, that makes it quick and easy to build interactions and applications with Large Language Models such as OpenAI's GPT. LangChain includes a series of agents that interact autonomously with LLMs to complete certain tasks. In our case, the tasks are to 1) interpret our plain-language question, 2) convert that question into a pandas command, 3) send the data to OpenAI, and 4) use GPT to return an answer in plain language.
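
To make that concrete, here is roughly the kind of translation the agent performs behind the scenes. The column name "Word Count" is an assumption based on a standard ScreamingFrog export; the agent works out the real column names from your CSV at runtime:

import pandas as pd

df = pd.read_csv("YOUR_FILE.csv")

# The agent turns "how many Addresses have a word count less than 1,500 words"
# into a pandas expression along these lines, then phrases the result in plain English
answer = df[df["Word Count"] < 1500].shape[0]
print(answer)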

To install LangChain, you first need a reasonably recent version of Python installed on your computer. You can work locally or in a Jupyter or Google Colab notebook. Installing LangChain is as simple as a pip install:

pip install langchain==0.0.125
pip install openai 
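
You can confirm the packages installed correctly with a quick version check (this simply prints the installed versions):

from importlib.metadata import version

# Should print the pinned LangChain version (0.0.125) and your openai version
print(version("langchain"), version("openai"))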
        

Step 3: Creating Our Script

This script is fairly simple, since LangChain has done most of the legwork for us: the agent we'll be using to chat with our data is already built into the library.

Here is the script:

from langchain.agents import create_csv_agent
from langchain.llms import OpenAI
import os

# Make your OpenAI API key available to the agent
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

# Point the CSV agent at your exported "Internal" report
agent = create_csv_agent(OpenAI(temperature=0), "YOUR_FILE.csv", verbose=True)

agent.run("how many Addresses have a word count less than 1,500 words")

Step 4: Reading the Output

Assuming you've followed the directions laid out here, your output should look something like this (the trace below is from a different question I asked the agent: how many pages returned a 404 response code):

> Entering new AgentExecutor chain..
Thought: I need to find the number of pages with a 404 response code
Action: python_repl_ast
Action Input: df[df['Response Code'] == 404].shape[0]
Observation: 'Response Code'
Thought: I now know the final answer
Final Answer: The number of pages with a 404 response code is 1.


> Finished chain..        

In this ScreamingFrog crawl, my site had only one 404 error (that's great!). And that's it!
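
If you want to double-check the agent's answer yourself, you can run the same filter directly in pandas. The column is called "Response Code" in the trace above; depending on your ScreamingFrog version and export settings it may appear as "Status Code" instead:

import pandas as pd

df = pd.read_csv("YOUR_FILE.csv")

# Count the rows with a 404 response; adjust the column name to match your export
print(df[df["Response Code"] == 404].shape[0])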

Enjoy chatting with your crawl data!

