Welcome to Pdfstruct
A Python module that builds on the PyMuPDF library to extract the physical and logical structure of a PDF file.
It mainly aims to detect section titles and tables of contents, and it also handles splitting aggregated PDF files.
This module is used in the LIRIAe project (Projet de Liseuse et Recherche Intelligente pour les Autorités environnementales).
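
To give a feel for the kind of analysis involved, here is a minimal sketch of a font-size heuristic for spotting section titles on top of PyMuPDF. It is not Pdfstruct's API: the names `iter_spans`, `guess_titles` and `min_ratio` are hypothetical, and the rule "larger than body text means title" is an assumption made for illustration only.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Span = Tuple[str, float]  # (text, font size in points)


def iter_spans(pdf_path: str) -> Iterable[Span]:
    """Yield (text, size) for every non-empty text span in the PDF."""
    import fitz  # PyMuPDF, imported lazily so guess_titles() has no dependency

    with fitz.open(pdf_path) as doc:
        for page in doc:
            for block in page.get_text("dict")["blocks"]:
                if block["type"] != 0:  # 0 = text block, 1 = image block
                    continue
                for line in block["lines"]:
                    for span in line["spans"]:
                        text = span["text"].strip()
                        if text:
                            yield text, round(span["size"], 1)


def guess_titles(spans: Iterable[Span], min_ratio: float = 1.15) -> List[str]:
    """Return spans whose font size is noticeably larger than the body size.

    Hypothetical heuristic: the body size is taken to be the font size that
    covers the most characters in the document.
    """
    spans = list(spans)
    if not spans:
        return []
    chars_per_size = Counter()
    for text, size in spans:
        chars_per_size[size] += len(text)
    body_size = chars_per_size.most_common(1)[0][0]
    return [text for text, size in spans if size >= body_size * min_ratio]
```

Under these assumptions, `guess_titles(iter_spans("report.pdf"))` would list candidate titles for a file named `report.pdf`. When a PDF already embeds an outline, PyMuPDF's `Document.get_toc()` returns it directly as `[level, title, page]` entries, so a heuristic like this one is mostly useful when that outline is missing.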