Sunday, September 24, 2023

5 Python scripts for automating SEO tasks

Python is a powerful programming language that has gained popularity within the SEO industry over the past few years. 

With its relatively simple syntax, efficient performance and abundance of libraries and frameworks, Python has revolutionized how many SEOs approach their work. 

Python offers a versatile toolset that can help make the optimization process faster, more accurate and more effective. 

This article explores five Python scripts to help boost your SEO efforts.

The easiest way to get started with Python

If you’re looking to dip your toes in Python programming, Google Colab is worth considering. 

It’s a free, web-based platform that provides a convenient playground for writing and running Python code without needing a complex local setup. 

Essentially, it allows you to access Jupyter Notebooks within your browser and provides a host of pre-installed libraries for data science and machine learning. 

Plus, it’s built on top of Google Drive, so you can easily save and share your work with others.

To get started, follow these steps:

Enable file uploads

Once you open Google Colab, you’ll first need to enable the ability to create a temporary file repository. It’s as simple as clicking the folder icon. 

This lets you upload temporary files and then download any results files.


Upload source data

Many of our Python scripts require a source file to work. To upload a file, simply click the upload button.


Once you finish the setup, you can start testing the following Python scripts.

Script 1: Automate a redirect map

Creating redirect maps for large sites can be incredibly time-consuming. Finding ways to automate the process can help us save time and focus on other tasks.

How this script works

This script focuses on analyzing the web content to find closely matching articles. 

  • First, it imports two TXT files of URLs: one is for the redirected website (source_urls.txt), and the other for the site absorbing the redirected website (target_urls.txt).
  • Then, we use the Python library Beautiful Soup to create a web scraper to get the main body content on the page. The script collects only the text of paragraph (<p>) elements, which in practice skips most header and footer content.
  • After it’s crawled the content on all pages, it uses the Python library PolyFuzz to match content between URLs with a similarity percentage.
  • Finally, it prints the results in a CSV file, including the similarity percentage. 

From here, you can manually review any URLs with a low similarity percentage to find the next closest match.
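To make the matching step more concrete, here is a minimal, standard-library-only sketch of the idea behind it: turn each page's text into TF-IDF weights and pick the target page with the highest cosine similarity. It is a simplified stand-in for what PolyFuzz does, not its actual implementation, and the function names (tfidf_vectors, cosine, best_matches) and the example.com URLs are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Build a simple word-level TF-IDF vector (a dict) for each document."""
    tokenized = [doc.lower().split() for doc in documents]
    doc_count = len(tokenized)
    # Document frequency: in how many documents each word appears
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            word: count * (math.log((1 + doc_count) / (1 + doc_freq[word])) + 1)
            for word, count in term_freq.items()
        })
    return vectors


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(word, 0.0) for word, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(source_pages, target_pages):
    """For each source URL, return (source_url, best_target_url, similarity)."""
    source_urls, target_urls = list(source_pages), list(target_pages)
    texts = [source_pages[u] for u in source_urls] + [target_pages[u] for u in target_urls]
    vectors = tfidf_vectors(texts)
    source_vecs = vectors[:len(source_urls)]
    target_vecs = vectors[len(source_urls):]
    matches = []
    for url, vec in zip(source_urls, source_vecs):
        scores = [cosine(vec, target_vec) for target_vec in target_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((url, target_urls[best], round(scores[best], 3)))
    return matches


# Hypothetical page content keyed by URL (in the script this comes from the scraper)
source_pages = {
    "https://old.example.com/red-shoes": "red running shoes for women",
    "https://old.example.com/contact": "contact our support team",
}
target_pages = {
    "https://new.example.com/support": "get in touch with our support team",
    "https://new.example.com/shoes/red": "women red running shoes on sale",
}

for row in best_matches(source_pages, target_pages):
    print(row)
```

Each printed row pairs an old URL with its closest new URL and a score between 0 and 1. Rows with a low score are the ones worth checking by hand.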

Get the script

#import libraries
from bs4 import BeautifulSoup, SoupStrainer
from polyfuzz import PolyFuzz
import concurrent.futures
import csv
import pandas as pd
import requests

#import urls
with open("source_urls.txt", "r") as file:
    url_list_a = [line.strip() for line in file]

with open("target_urls.txt", "r") as file:
    url_list_b = [line.strip() for line in file]

#create a content scraper via bs4
def get_content(url_argument):
    page_source = requests.get(url_argument).text
    strainer = SoupStrainer('p')
    soup = BeautifulSoup(page_source, 'lxml', parse_only=strainer)
    paragraph_list = [element.text for element in soup.find_all(strainer)]
    content = " ".join(paragraph_list)
    return content

#scrape the urls for content
with concurrent.futures.ThreadPoolExecutor() as executor:
    content_list_a = list(executor.map(get_content, url_list_a))
    content_list_b = list(executor.map(get_content, url_list_b))

content_dictionary = dict(zip(url_list_b, content_list_b))

#get content similarities via polyfuzz
model = PolyFuzz("TF-IDF")
model.match(content_list_a, content_list_b)
data = model.get_matches()

#map similarity data back to urls
def get_key(argument):
    for key, value in content_dictionary.items():
        if argument == value:
            return key
    #no target page matched this content
    return None

with concurrent.futures.ThreadPoolExecutor() as executor:
    result = list(executor.map(get_key, data["To"]))

#create a dataframe for the final results
to_zip = list(zip(url_list_a, result, data["Similarity"]))
df = pd.DataFrame(to_zip)
df.columns = ["From URL", "To URL", "% Identical"]

#export to a spreadsheet
with open("redirect_map.csv", "w", newline="") as file:
    columns = ["From URL", "To URL", "% Identical"]
    writer = csv.writer(file)
    #write the header row, then one row per source URL
    writer.writerow(columns)
    for row in to_zip:
        writer.writerow(row)

Script 2: Generate meta descriptions

While meta descriptions are not a direct ranking factor, they help us improve our organic click-through rates. Leaving meta descriptions blank increases the chances that Google will create its own.

If your SEO audit shows a large number of URLs missing a meta description, it may be difficult to make time to write all of those by hand, especially for ecommerce websites. 

This script is aimed to help you save time by automating that process for you.

How the script works

  • First, the script imports a list of URLs from a TXT file (urls.txt).
  • Then, it parses all of the content on the URLs.
  • Once the content is parsed, it creates meta descriptions aiming to be below 155 characters. 
  • It exports the results into a CSV file.
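The 155-character limit is the step most likely to produce awkward output, because a hard cut can land in the middle of a word. The helper below is a small sketch of one way to trim at a word boundary instead. The name trim_description and its default limit are assumptions for illustration and are not part of the script.

```python
def trim_description(text, limit=155):
    """Shorten text to at most `limit` characters, cutting at a word boundary."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    # Leave room for the three-character ellipsis
    cut = text[:limit - 3]
    # Drop a trailing partial word if the cut landed mid-word
    if text[limit - 3] != " " and " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:-") + "..."


# Hypothetical summary text that is longer than the limit
summary = "word " * 50
print(trim_description(summary))
print(len(trim_description(summary)))
```

Text that already fits is returned unchanged, so the helper can be applied to every generated summary without a separate length check.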

Get the script

!pip install sumy
from sumy.parsers.html import HtmlParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words
from sumy.summarizers.lsa import LsaSummarizer
import csv

#1) imports a list of URLs from a txt file
with open('urls.txt') as f:
    urls = [line.strip() for line in f]

results = []

# 2) analyzes the content on each URL
for url in urls:
    parser = HtmlParser.from_url(url, Tokenizer("english"))
    stemmer = Stemmer("english")
    summarizer = LsaSummarizer(stemmer)
    summarizer.stop_words = get_stop_words("english")
    description = summarizer(parser.document, 3)
    description = " ".join([sentence._text for sentence in description])
    # 3) trims the summary to 155 characters, ending with an ellipsis
    if len(description) > 155:
        description = description[:152] + '...'
    results.append({
        'url': url,
        'description': description
    })

# 4) exports the results to a csv file
with open('results.csv', 'w', newline="") as f:
    writer = csv.DictWriter(f, fieldnames=['url','description'])
    writer.writeheader()
    writer.writerows(results)

Script 3: Analyze keywords with N-grams

N-grams are not a new concept but are still useful for SEO. They can help us understand themes across large sets of keyword data.


How this script works

This script outputs results in a TXT file that breaks out the keywords into unigrams, bigrams, and trigrams. 

  • First, it imports a TXT file of all your keywords (keywords.txt).
  • Then it analyzes the keywords and counts how often each N-gram occurs (the script imports Counter from Python’s collections module).
  • Then it exports the results in a new TXT file.
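As a compact illustration of the counting step, the sketch below uses collections.Counter directly. It is a variation rather than the script itself: it counts N-grams within each keyword, so a phrase never spans two separate keywords, and the helper names (ngrams, count_ngrams) and the sample keywords are made up for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(keywords, n):
    """Count n-grams within each keyword so phrases never span two keywords."""
    counts = Counter()
    for keyword in keywords:
        counts.update(ngrams(keyword.lower().split(), n))
    return counts


# Hypothetical keyword list (the script reads these from a TXT file)
keywords = [
    "best running shoes",
    "running shoes for women",
    "best trail running shoes",
]

for n, label in [(1, "unigrams"), (2, "bigrams"), (3, "trigrams")]:
    print(label, count_ngrams(keywords, n).most_common(3))
```

Counter.most_common(k) returns the k most frequent entries, which replaces the manual sort-and-slice step.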

Get this script

#Import necessary libraries
import re
from collections import Counter

#Open the text file and read its contents into a list of words
with open('keywords.txt', 'r') as f:
    words = f.read().split()

#Use a regular expression to remove any non-alphabetic characters from the words
words = [re.sub(r'[^a-zA-Z]', '', word) for word in words]

#Initialize empty dictionaries for storing the unigrams, bigrams, and trigrams
unigrams = {}
bigrams = {}
trigrams = {}

#Iterate through the list of words and count the number of occurrences of each unigram, bigram, and trigram
for i in range(len(words)):
    # Unigrams
    if words[i] in unigrams:
        unigrams[words[i]] += 1
    else:
        unigrams[words[i]] = 1
    # Bigrams
    if i < len(words)-1:
        bigram = words[i] + ' ' + words[i+1]
        if bigram in bigrams:
            bigrams[bigram] += 1
        else:
            bigrams[bigram] = 1
    # Trigrams
    if i < len(words)-2:
        trigram = words[i] + ' ' + words[i+1] + ' ' + words[i+2]
        if trigram in trigrams:
            trigrams[trigram] += 1
        else:
            trigrams[trigram] = 1

# Sort the dictionaries by the number of occurrences
sorted_unigrams = sorted(unigrams.items(), key=lambda x: x[1], reverse=True)
sorted_bigrams = sorted(bigrams.items(), key=lambda x: x[1], reverse=True)
sorted_trigrams = sorted(trigrams.items(), key=lambda x: x[1], reverse=True)

# Write the results to a text file
with open('results.txt', 'w') as f:
    f.write("Most common unigrams:\n")
    for unigram, count in sorted_unigrams[:10]:
        f.write(unigram + ": " + str(count) + "\n")
    f.write("\nMost common bigrams:\n")
    for bigram, count in sorted_bigrams[:10]:
        f.write(bigram + ": " + str(count) + "\n")
    f.write("\nMost common trigrams:\n")
    for trigram, count in sorted_trigrams[:10]:
        f.write(trigram + ": " + str(count) + "\n")

Script 4: Group keywords into topic clusters

With new SEO projects, keyword research is always in the early stages. Sometimes we deal with thousands of keywords in a dataset, making grouping challenging. 

Python allows us to automatically cluster keywords into similar groups to identify trends and complete our keyword mapping. 

How this script works

  • This script first imports a TXT file of keywords (keywords.txt).
  • Then the script analyzes the keywords using TfidfVectorizer and AffinityPropagation.
  • Then it assigns a numeric value to each topic cluster.
  • The results are then exported into a CSV file.
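The clustering model returns one numeric label per keyword, in the same order as the input list. The sketch below shows one way to turn those labels into readable groups in a single pass. The labels here are invented for illustration (in the script they come from the fitted model), and group_by_cluster is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_cluster(keywords, labels):
    """Map each cluster number to the list of keywords assigned to it."""
    clusters = defaultdict(list)
    for keyword, label in zip(keywords, labels):
        clusters[int(label)].append(keyword)
    return dict(sorted(clusters.items()))


# Hypothetical inputs: in the script, labels come from the clustering model
keywords = ["red shoes", "blue shoes", "seo audit", "seo tools", "red boots"]
labels = [0, 0, 1, 1, 0]

for cluster_id, members in group_by_cluster(keywords, labels).items():
    print(cluster_id, members)
```

Grouping this way visits each keyword once, which may matter on large lists compared with rescanning every label for every cluster.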

Get this script

import csv
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer

# Read keywords from text file
with open("keywords.txt", "r") as f:
    keywords = f.read().splitlines()

# Create a Tf-idf representation of the keywords
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(keywords)

# Perform Affinity Propagation clustering
af = AffinityPropagation().fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_

# Get the number of clusters found
n_clusters = len(cluster_centers_indices)

# Write the clusters to a csv file
with open("clusters.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Cluster", "Keyword"])
    for i in range(n_clusters):
        cluster_keywords = [keywords[j] for j in range(len(labels)) if labels[j] == i]
        if cluster_keywords:
            for keyword in cluster_keywords:
                writer.writerow([i, keyword])
        else:
            # Keep a placeholder row for a cluster with no keywords
            writer.writerow([i, ""])

Script 5: Match keyword list to a list of predefined topics

This is similar to the previous script, except this allows you to match a list of keywords to a predefined set of topics. 

This is great for large sets of keywords because it processes them in batches of 1,000 to prevent system crashes.
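Batching itself is simple to express. The generator below is a minimal sketch of the idea, with batches as a hypothetical helper name and a generated keyword list standing in for real data.

```python
def batches(items, size=1000):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical keyword list, just to show the batch sizes
keywords = [f"keyword {i}" for i in range(2500)]
print([len(batch) for batch in batches(keywords)])
```

Because it is a generator, only one slice is handed to the processing step at a time.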

How this script works

  • This script imports a keyword list (keywords.txt) and a topics list (topics.txt).
  • Then it analyzes the topics and keyword lists and matches each keyword to the closest topic. If it doesn’t find a match, it categorizes the keyword as “Other.” 
  • The results are then exported into a CSV file.
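The matching rule can be illustrated without any NLP library: count how many words a keyword shares with each topic and keep the topic with the largest overlap. The sketch below is a simplified approximation of that logic. Its tiny stop-word set, the tokenize and categorize helpers, and the sample topics are assumptions for illustration, whereas the script relies on spaCy for tokenizing and stop words.

```python
# Small illustrative stop-word set; the script relies on spaCy's much larger list
STOP_WORDS = {"a", "an", "the", "for", "of", "in", "to", "and", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (word.strip(".,!?;:\"'()") for word in text.lower().split())
    return {word for word in words if word and word not in STOP_WORDS}


def categorize(keyword, topics, fallback="Other"):
    """Return the topic sharing the most tokens with the keyword."""
    keyword_tokens = tokenize(keyword)
    best_topic, max_overlap = fallback, 0
    for topic in topics:
        overlap = len(keyword_tokens & tokenize(topic))
        if overlap > max_overlap:
            best_topic, max_overlap = topic, overlap
    return best_topic


# Hypothetical topics and keywords
topics = ["running shoes", "hiking boots"]
print(categorize("best shoes for running", topics))
print(categorize("waterproof boots", topics))
print(categorize("garden furniture", topics))
```

Matching is on exact words, so "shoe" and "shoes" do not count as an overlap. Writing topics in the same form your keywords use helps avoid too many "Other" results.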

Get this script

import pandas as pd
import spacy
from spacy.lang.en.stop_words import STOP_WORDS

# Load the Spacy English language model
nlp = spacy.load("en_core_web_sm")

# Define the batch size for keyword analysis
BATCH_SIZE = 1000

# Load the keywords and topics files as Pandas dataframes
keywords_df = pd.read_csv("keywords.txt", header=None, names=["keyword"])
topics_df = pd.read_csv("topics.txt", header=None, names=["topic"])

# Define a function to categorize a keyword based on the closest related topic
def categorize_keyword(keyword):
    # Tokenize the keyword
    tokens = nlp(keyword.lower())
    # Remove stop words and punctuation
    tokens = [token.text for token in tokens if not token.is_stop and not token.is_punct]
    # Find the topic that has the most token overlaps with the keyword
    max_overlap = 0
    best_topic = "Other"
    for topic in topics_df["topic"]:
        topic_tokens = nlp(topic.lower())
        topic_tokens = [token.text for token in topic_tokens if not token.is_stop and not token.is_punct]
        overlap = len(set(tokens).intersection(set(topic_tokens)))
        if overlap > max_overlap:
            max_overlap = overlap
            best_topic = topic
    return best_topic

# Define a function to process a batch of keywords and return the results as a dataframe
def process_keyword_batch(keyword_batch):
    results = []
    for keyword in keyword_batch:
        category = categorize_keyword(keyword)
        results.append({"keyword": keyword, "category": category})
    return pd.DataFrame(results)

# Initialize an empty dataframe to hold the results
results_df = pd.DataFrame(columns=["keyword", "category"])

# Process the keywords in batches
for i in range(0, len(keywords_df), BATCH_SIZE):
    keyword_batch = keywords_df.iloc[i:i+BATCH_SIZE]["keyword"].tolist()
    batch_results_df = process_keyword_batch(keyword_batch)
    results_df = pd.concat([results_df, batch_results_df])

# Export the results to a CSV file
results_df.to_csv("results.csv", index=False)

Working with Python for SEO

Python is an incredibly powerful and versatile tool for SEO professionals. 

Whether you’re a beginner or a seasoned practitioner, the free scripts I’ve shared in this article offer a great starting point for exploring the possibilities of Python in SEO. 

With its intuitive syntax and vast array of libraries, Python can help you automate tedious tasks, analyze complex data, and gain new insights into your website’s performance. So why not give it a try?

Good luck, and happy coding!

Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.
