Automate PDF Renaming with Python

Introduction:

Whether you’re a researcher dealing with innumerable academic papers, a student with many lecture notes, or just someone with an expansive collection of PDF files, keeping those files organized can be overwhelming. One common approach is to rename them according to their content. For academic papers, for example, using the paper’s title as the file name makes specific files much easier to find.

In this blog post, I’ll show you how to automate the renaming of PDF files using Python. We’ll use the PyPDF2 library to read the title from each PDF’s metadata and then rename the file accordingly.

Installation:

Before we delve into the process, ensure that both Python and PyPDF2 are installed on your system. If you lack either, follow the steps below for installation.

If you don’t have Python installed, you can download it from the official Python website at https://www.python.org/downloads/.

If PyPDF2 is not installed on your system, you can add it with the pip package installer. Just run the following command in your command line interface:

pip install PyPDF2
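
If you want to confirm that the installation worked before running the script, a quick check along these lines may help. This is only a minimal sketch using the standard library; the helper name `is_installed` is my own and not part of PyPDF2.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("PyPDF2"):
        print("PyPDF2 is available.")
    else:
        print("PyPDF2 is missing. Install it with 'pip install PyPDF2'.")
```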

Python Script:

Here is the Python code to rename all PDF files within the current directory based on their title metadata:

import os
from pathlib import Path

def rename_pdf_files():
    # Check if PyPDF2 is installed
    try:
        import PyPDF2
    except ImportError:
        print("PyPDF2 is not installed. Please install it with 'pip install PyPDF2'")
        return

    # Identify the current working directory
    directory = os.getcwd()

    # Determine max title length based on current directory length
    # Subtract 1 for the separator between directory and filename, and 4 for the ".pdf" extension
    max_title_length = 260 - len(directory) - 1 - len(".pdf")

    # Enumerate all files in the directory
    for filename in os.listdir(directory):
        # Only process PDF files (match the extension case-insensitively, e.g. ".PDF")
        if filename.lower().endswith(".pdf"):
            try:
                # Open the PDF file
                with open(os.path.join(directory, filename), "rb") as file:
                    reader = PyPDF2.PdfReader(file)

                    # Get the document info (this is None when the PDF has no metadata)
                    info = reader.metadata

                    # Extract the title, treating missing metadata or an empty title as ''
                    metadata_title = (info.get('/Title') or '') if info else ''

                    # Remove forbidden characters in filenames
                    forbidden_chars = ['<', '>', ':', '"', '/', '\\', '|', '?', '*']
                    for char in forbidden_chars:
                        metadata_title = metadata_title.replace(char, '')

                    # Limit to max title length
                    metadata_title = metadata_title[:max_title_length]

                # Make sure the file is closed before renaming
                if metadata_title:
                    new_filename = os.path.join(directory, f"{metadata_title}.pdf")
                    if os.path.exists(new_filename):
                        print(f"A file with the name '{metadata_title}.pdf' already exists. Skipping this file.")
                    else:
                        Path(os.path.join(directory, filename)).rename(new_filename)
                        print(f"Renamed '{filename}' to '{metadata_title}.pdf'")
                else:
                    print(f"No title found in the metadata of '{filename}'. File name was not changed.")
            except PyPDF2.errors.PdfReadError:
                print(f"Could not read '{filename}'. Skipping this file.")
            except PermissionError:
                print(f"No permission to rename '{filename}'. Skipping this file.")
            except Exception as e:
                print(f"An error occurred while processing '{filename}': {str(e)}")

# Call the function
rename_pdf_files()

# Pause before exiting
input("Press enter to exit...")

You can also download the Python script from here.
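
The title clean-up step in the script can also be pulled out into a small helper, which makes it easier to try on its own. The sketch below is one way to do that, not the only one: the names `FORBIDDEN_CHARS` and `sanitize_title` are hypothetical, and collapsing whitespace and trimming trailing spaces and dots are additions of mine (Windows does not keep trailing spaces or dots in file names).

```python
# Characters the script removes because Windows forbids them in file names
FORBIDDEN_CHARS = '<>:"/\\|?*'


def sanitize_title(title: str, max_length: int) -> str:
    """Remove forbidden characters, tidy whitespace, and truncate the title."""
    cleaned = "".join(ch for ch in title if ch not in FORBIDDEN_CHARS)
    # Collapse runs of whitespace (including newlines) into single spaces
    cleaned = " ".join(cleaned.split())
    # Truncate, then drop any trailing spaces or dots left at the cut
    return cleaned[:max_length].rstrip(" .")


print(sanitize_title('What is "AI"? A <short> review', 50))  # What is AI A short review
```

A title that consists only of forbidden characters comes back as an empty string, so the caller can treat it the same way as a missing title.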

Conclusion:

This script renames each PDF file in the directory you run it from, using the title stored in the file’s metadata. If a PDF has no title in its metadata, or the title cannot be extracted, the file keeps its original name; the same applies if a file with the new name already exists. The script also truncates long titles so that the total path length (directory path plus file name) stays within the 260-character path limit of the Windows operating system.
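
To make the path-length arithmetic concrete, here is a rough sketch of the calculation as a standalone function. It is deliberately a little more conservative than a plain `260 - len(directory) - 1`: it counts the `.pdf` extension and reserves one character for the terminating null that the classic Windows limit includes. The names `WINDOWS_MAX_PATH` and `max_title_length` are my own, and the example directory is made up.

```python
WINDOWS_MAX_PATH = 260  # classic Windows limit; it includes the terminating null


def max_title_length(directory: str, extension: str = ".pdf") -> int:
    """Return how many title characters fit after the directory, separator, and extension."""
    budget = WINDOWS_MAX_PATH - 1   # reserve one character for the terminating null
    budget -= len(directory) + 1    # directory plus one path separator
    budget -= len(extension)        # room for the file extension
    return max(budget, 0)           # never return a negative length


# "C:\docs" is 7 characters long: 259 - (7 + 1) - 4 = 247
print(max_title_length("C:\\docs"))  # 247
```

Clamping at zero matters because a negative value used as a slice bound would cut characters from the end of the title instead of limiting its length.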

This simple script can save a lot of time if you’re managing a large number of PDF files. With a little modification, it could also be adapted to handle other types of documents or to use other pieces of metadata for renaming.
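
As an illustration of using other metadata, the sketch below builds a name of the form "Author - Title.pdf". It assumes the metadata behaves like a dictionary with the standard PDF keys `/Author` and `/Title`; the function name `build_filename` and the sample values are hypothetical, and the result would still need the same forbidden-character clean-up and length limit as in the script.

```python
def build_filename(metadata: dict, extension: str = ".pdf") -> str:
    """Build 'Author - Title.pdf', falling back to whichever field is present."""
    title = (metadata.get("/Title") or "").strip()
    author = (metadata.get("/Author") or "").strip()
    # Keep only the fields that actually contain text
    parts = [part for part in (author, title) if part]
    if not parts:
        return ""  # nothing usable; the caller should keep the original name
    return " - ".join(parts) + extension


print(build_filename({"/Author": "Jane Doe", "/Title": "Sample Paper"}))  # Jane Doe - Sample Paper.pdf
```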

I hope this script makes managing your PDF files a little more convenient. If you have any questions or suggestions, don’t hesitate to drop a comment below!
