How to Chat with Screaming Frog Using LangChain and Python
Screaming Frog is a must-have in any SEO's toolkit. Depending on the size of your website, however, processing and analyzing the output of a single crawl can be cumbersome and time-consuming. LLMs like GPT now make this routine part of SEO analysis much easier. Here is a simple way to chat with your Screaming Frog crawls.
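What You'll Need: a Screaming Frog crawl exported as a CSV file, Python, the langchain and openai Python packages, and an OpenAI API key.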
Step 1: Initiate a Crawl
This is the part every SEO should be familiar with. Kick off a crawl in Screaming Frog, configured to check everything you need for your site. When it's done, export the "Internal" report as a CSV file.
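Before handing the export to an agent, it can help to see what it will be working with. The agent operates on the file as a pandas DataFrame (the `df[...]` commands in its output show this), so a quick look with pandas shows the column names it can use. This is a minimal sketch that assumes the export contains "Address", "Status Code" and "Word Count" columns; a tiny hypothetical sample stands in for your real file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for a Screaming Frog "Internal" CSV export.
# With a real crawl you would pass the file path, e.g. "YOUR_FILE.csv", instead.
SAMPLE_EXPORT = """Address,Status Code,Word Count
https://example.com/,200,1820
https://example.com/blog/,200,640
https://example.com/old-page/,404,0
"""


def load_crawl(source) -> pd.DataFrame:
    """Read a crawl export (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


df = load_crawl(io.StringIO(SAMPLE_EXPORT))
print(list(df.columns))  # the column names the agent will have to work with
print(df["Status Code"].value_counts().sort_index())  # number of pages per status code
```

If a question mentions a column by a different name than the export uses, the agent's pandas command can fail, so knowing the exact names helps you phrase questions.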
Step 2: Install LangChain
LangChain is a framework, available for Python and JavaScript, that makes it quick and easy to build applications on top of large language models (LLMs) such as OpenAI's GPT. It includes agents, which use an LLM to decide which steps to take to complete a task. In our case, the agent needs to 1) interpret our plain-language question, 2) convert it into a pandas command, 3) run that command against our crawl data and pass the result back to GPT, and 4) have GPT return an answer in plain language.
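To make step 2 concrete: for a question such as "how many pages have fewer than 1,500 words?", the agent has to produce a pandas expression along the lines of the one below and then run it. This is only an illustration of the kind of command involved, not LangChain's own code; the "Word Count" column name and the sample rows are assumptions.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for a crawl export.
SAMPLE_EXPORT = """Address,Word Count
https://example.com/,1820
https://example.com/blog/,640
https://example.com/contact/,210
"""

df = pd.read_csv(io.StringIO(SAMPLE_EXPORT))

# Plain-language question: "How many pages have fewer than 1,500 words?"
# One pandas translation: build a boolean mask, then count the True values.
thin_pages = int((df["Word Count"] < 1500).sum())
print(thin_pages)  # → 2
```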
To install LangChain, you first need a recent version of Python 3. You can work locally or in a Jupyter or Google Colab notebook. Installing LangChain is as simple as a pip install:
pip install langchain==0.0.125
pip install "openai<1.0" pandas  # langchain 0.0.125 uses the pre-1.0 openai interface; pandas reads the CSV
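LangChain 0.0.125's OpenAI wrapper calls the pre-1.0 `openai.Completion` interface, which the openai package removed in version 1.0, so it's worth checking which versions pip actually installed. A small sketch using only the standard library; the helper names `installed_version` and `openai_is_pre_v1` are our own, not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def openai_is_pre_v1(ver) -> bool:
    """True if an openai version string is older than 1.0; False if missing or newer."""
    if ver is None:
        return False
    return int(ver.split(".")[0]) < 1


for pkg in ("langchain", "openai", "pandas"):
    print(pkg, installed_version(pkg))

print("openai < 1.0:", openai_is_pre_v1(installed_version("openai")))
```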
Step 3: Creating Our Script
This script is short because LangChain has already done most of the legwork for us: it provides a ready-made CSV agent that we can use to chat with our data.
Here is the script:
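import os
from langchain.agents import create_csv_agent
from langchain.llms import OpenAI
# The OpenAI wrapper reads your key from the OPENAI_API_KEY environment variable
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
# temperature=0 keeps answers as repeatable as possible; verbose=True prints the agent's reasoning
agent = create_csv_agent(OpenAI(temperature=0), "YOUR_FILE.csv", verbose=True)
# run() returns the agent's final answer as a string
print(agent.run("How many addresses have a word count of less than 1,500 words?"))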
Step 4: Reading the Output
With verbose=True, the agent prints its reasoning step by step. For example, asking how many pages return a 404 response code produced output like this:
> Entering new AgentExecutor chain..
Thought: I need to find the number of pages with a 404 response code
Action: python_repl_ast
Action Input: df[df['Response Code'] == 404].shape[0]
Observation: 'Response Code'
Thought: I now know the final answer
Final Answer: The number of pages with a 404 response code is 1.
> Finished chain..
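The Observation line is the result of the pandas command the agent ran. In the run above it reads 'Response Code', which appears to be how the agent's Python tool reports a KeyError for a column that doesn't exist; Screaming Frog's Internal export typically names that column "Status Code". The agent still produced a final answer, so it's worth cross-checking answers like this directly in pandas. A minimal sketch, with a hypothetical sample in place of the real export and a helper name (`count_status`) of our own:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the Internal export.
SAMPLE_EXPORT = """Address,Status Code
https://example.com/,200
https://example.com/blog/,200
https://example.com/old-page/,404
"""

df = pd.read_csv(io.StringIO(SAMPLE_EXPORT))


def count_status(frame: pd.DataFrame, code: int, column: str = "Status Code") -> int:
    """Count rows whose status column equals `code`; fail loudly if the column is missing."""
    if column not in frame.columns:
        raise KeyError(f"{column!r} not found; available columns: {list(frame.columns)}")
    return int((frame[column] == code).sum())


print(count_status(df, 404))  # → 1

# Looking up a column the export doesn't have raises a KeyError naming that column,
# which is what the agent's failed lookup looks like.
try:
    _ = df["Response Code"]
except KeyError as err:
    print("KeyError:", err)
```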
In my Screaming Frog crawl, the site had only one 404 error (that's great!). And that's it!
Enjoy chatting with your crawl data!