Introduction:
Whether you’re a researcher dealing with innumerable academic papers, a student with many lecture notes, or just someone with an expansive collection of PDF files, the process of organizing your PDF files can be quite overwhelming. One common approach to keeping files organized is to rename them according to their content. In the case of academic papers, for example, renaming the files with the title of the paper can significantly simplify the process of finding specific files.
In this blog post, I’ll show you how to automate the renaming of PDF files with Python. We’ll use the PyPDF2 library to read the title stored in each file’s metadata and then rename the files accordingly.
Installation:
Before we delve into the process, ensure that both Python and PyPDF2 are installed on your system. If you lack either, follow the steps below for installation.
If you don’t have Python installed, you can download it from the official Python website at https://www.python.org/downloads/.
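If PyPDF2 is not installed on your system, you can add it with the pip package installer. Just run the following command in your terminal:
```
pip install PyPDF2
```
Note that PyPDF2 is no longer maintained; its development continues in the pypdf package (pip install pypdf), whose PdfReader offers the same metadata interface used below.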
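To confirm the installation from Python itself, you can look for the library without importing it by using importlib.util.find_spec from the standard library. This is a small sketch; the function name available_pdf_library is just an illustration, and it also checks for pypdf, the package that PyPDF2 was merged back into.

```python
import importlib.util


def available_pdf_library():
    """Return the name of the first installed PDF library, or None.

    Checks for the maintained "pypdf" package first, then the older "PyPDF2".
    """
    for name in ("pypdf", "PyPDF2"):
        # find_spec returns None when a top-level package cannot be found
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    library = available_pdf_library()
    if library is None:
        print("Neither pypdf nor PyPDF2 is installed. Try: pip install pypdf")
    else:
        print(f"Found {library}")
```

Because find_spec only locates the package, this check is cheap and does not run any of the library’s import-time code.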
Python Script:
Here is the Python code to rename all PDF files within the current directory based on their title metadata:
```python
import os
from pathlib import Path


def rename_pdf_files():
    # Check if PyPDF2 is installed
    try:
        import PyPDF2
    except ImportError:
        print("PyPDF2 is not installed. Please install it with 'pip install PyPDF2'")
        return

    # Identify the current working directory
    directory = os.getcwd()

    # Windows' classic MAX_PATH is 260 characters, including a terminating null.
    # Leave room for the null, the path separator and the ".pdf" extension, and
    # keep the file name itself within the usual 255-character limit.
    max_title_length = min(260 - 1 - len(directory) - 1 - len(".pdf"), 255 - len(".pdf"))
    if max_title_length <= 0:
        print("The current directory path is too long to rename files safely.")
        return

    forbidden_chars = ['<', '>', ':', '"', '/', '\\', '|', '?', '*']

    # Enumerate all files in the directory
    for filename in os.listdir(directory):
        # Only process PDF files (matches ".pdf" and ".PDF")
        if not filename.lower().endswith(".pdf"):
            continue
        old_path = os.path.join(directory, filename)
        try:
            # Read the title while the file is open; reader.metadata is None
            # when the PDF has no document information dictionary
            with open(old_path, "rb") as file:
                reader = PyPDF2.PdfReader(file)
                info = reader.metadata
                metadata_title = str(info.title or "") if info else ""
            # The file is closed here, so the rename below also works on Windows

            # Remove forbidden characters and control characters such as newlines
            for char in forbidden_chars:
                metadata_title = metadata_title.replace(char, '')
            metadata_title = "".join(c for c in metadata_title if ord(c) >= 32)
            # Limit to max title length; drop surrounding spaces and trailing dots
            metadata_title = metadata_title[:max_title_length].strip().rstrip(". ")

            if not metadata_title:
                print(f"No title found in the metadata of '{filename}'. File name was not changed.")
                continue

            new_name = f"{metadata_title}.pdf"
            new_path = os.path.join(directory, new_name)
            if new_name == filename:
                print(f"'{filename}' is already named after its title.")
            elif os.path.exists(new_path):
                print(f"A file with the name '{new_name}' already exists. Skipping this file.")
            else:
                Path(old_path).rename(new_path)
                print(f"Renamed '{filename}' to '{new_name}'")
        except PyPDF2.errors.PdfReadError:
            print(f"Could not read '{filename}'. Skipping this file.")
        except PermissionError:
            print(f"No permission to rename '{filename}'. Skipping this file.")
        except Exception as e:
            print(f"An error occurred while processing '{filename}': {e}")


if __name__ == "__main__":
    # Call the function
    rename_pdf_files()
    # Pause before exiting so the console window stays open
    input("Press enter to exit...")
```
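Before running a rename over a folder of important papers, it can help to preview the result first. The sketch below is a hypothetical dry-run helper, plan_renames, that works on a plain dictionary mapping file names to already-cleaned titles (filling that dictionary, for example with the PyPDF2 code above, is left to you). It compares names case-insensitively, as Windows does, and refuses any new name that already exists or was claimed by an earlier file in the same plan.

```python
def plan_renames(titles):
    """Propose new names without touching the disk.

    `titles` maps each original file name to its cleaned title; an empty title
    means "keep the original name". Returns (renames, skipped), where renames
    maps old names to new names. Files already named after their title appear
    in neither result.
    """
    renames = {}
    skipped = []
    # Names that are already in use, compared case-insensitively because
    # Windows treats "Paper.pdf" and "paper.pdf" as the same file
    taken = {name.lower() for name in titles}
    for filename, title in sorted(titles.items()):
        if not title:
            skipped.append(filename)  # no usable title
            continue
        new_name = f"{title}.pdf"
        if new_name == filename:
            continue  # nothing to do
        if new_name.lower() in taken:
            skipped.append(filename)  # clashes with an existing or planned name
            continue
        taken.add(new_name.lower())
        renames[filename] = new_name
    return renames, skipped


if __name__ == "__main__":
    plan, skipped = plan_renames(
        {"scan1.pdf": "Deep Learning", "scan2.pdf": "Deep Learning", "notes.pdf": ""}
    )
    for old, new in plan.items():
        print(f"{old} -> {new}")
    print("Skipped:", skipped)
```

Once the plan looks right, applying it is a loop over renames calling Path(directory, old).rename(Path(directory, new)). The helper is deliberately conservative: it never renames onto a name that exists when planning starts, even if that file is itself about to be renamed.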
Conclusion:
This script renames each PDF file in the directory you run it from according to the title stored in the file’s metadata. If a PDF has no title in its metadata, or the title cannot be extracted, the file keeps its original name; a file is also skipped when another file already has the new name. In addition, the script truncates the title so that the full path (directory path plus file name) stays within the traditional 260-character path limit of Windows.
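The path-length rule is simple arithmetic. Classic Windows APIs limit a full path to MAX_PATH = 260 characters, and that count includes a terminating null character, so the visible path can be at most 259 characters; a single file-name component is typically limited to 255 characters. The sketch below (the helper names max_title_length and fits are illustrative) reserves room for the null, the path separator and the ".pdf" extension. Recent versions of Windows can lift MAX_PATH when long-path support is enabled, so treat these numbers as a conservative default.

```python
import ntpath

MAX_PATH = 260       # Windows' classic limit, counting a terminating null character
MAX_COMPONENT = 255  # typical limit for a single file-name component (e.g. NTFS)


def max_title_length(directory, extension=".pdf"):
    """Longest title keeping directory + separator + title + extension within MAX_PATH."""
    by_path = MAX_PATH - 1 - len(directory) - 1 - len(extension)  # null, separator
    by_name = MAX_COMPONENT - len(extension)
    return max(0, min(by_path, by_name))


def fits(directory, title, extension=".pdf"):
    """Check a candidate file name against both limits."""
    # ntpath builds a Windows-style path on any operating system
    full_path = ntpath.join(directory, title + extension)
    return len(full_path) <= MAX_PATH - 1 and len(title + extension) <= MAX_COMPONENT


if __name__ == "__main__":
    directory = "C:\\Users\\me\\Papers"
    limit = max_title_length(directory)
    print(limit, fits(directory, "a" * limit), fits(directory, "a" * (limit + 1)))
```

For the 18-character directory above, the title may be up to 260 - 1 - 18 - 1 - 4 = 236 characters long.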
This simple script can save a lot of time if you manage a large number of PDF files. With a little modification, it could also be adapted to other types of documents or use other pieces of metadata, such as the author, for renaming.
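As one example of such a modification, the sketch below builds a file name from several document-info fields, such as author and title, instead of the title alone. The function build_filename and its parameters are hypothetical; it takes a plain dictionary keyed by the standard PDF document-info names ("/Author", "/Title") so that it runs without any PDF library. Feeding it values read from PyPDF2’s reader.metadata is an assumption you should check against your own files.

```python
def build_filename(metadata, fields=("/Author", "/Title"), separator=" - ", extension=".pdf"):
    """Join the non-empty metadata values named in `fields` into a file name.

    `metadata` is any mapping of PDF document-info keys to strings.
    Returns None if none of the requested fields has a usable value.
    """
    forbidden = set('<>:"/\\|?*')
    parts = []
    for key in fields:
        value = str(metadata.get(key) or "")
        # Drop characters Windows does not allow in file names, plus control characters
        value = "".join(c for c in value if c not in forbidden and ord(c) >= 32).strip()
        if value:
            parts.append(value)
    if not parts:
        return None
    return separator.join(parts) + extension


if __name__ == "__main__":
    print(build_filename({"/Title": "A Study of Things", "/Author": "Jane Doe"}))
```

Changing the fields tuple (for example to ("/Subject", "/Title")) changes which metadata ends up in the name without touching the rest of the logic.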
I hope this script brings some convenience to your PDF file management. If you have any questions or suggestions, don’t hesitate to drop a comment below!