I don't think it's possible to do this (convert PDF to JPG) with PyPDF2. As I said, I haven't used this library, but judging by the documentation it doesn't seem to have this capability. What it can do is take a full page and add it as-is to another PDF. It can also extract images that are stored as such in the document, but your PDF doesn't really contain images as such.
To convert a PDF to images you can use ImageMagick. It is a cross-platform software suite that lets you display, edit, create and convert a large number of image formats.
There are many bindings for Python; one of the most recent, and the only one I've worked with, is Wand.
You have to install ImageMagick in the way appropriate for your system. The vast majority of Linux distros have their own package in the official repositories, if you don't want to compile from source. Then you just have to install Wand (via pip).
Once that is done, you can use something like this:
import os
from wand.color import Color
from wand.image import Image


def pdf_to_jpg(pdf_path, output_path=None, resolution=200):
    """Save every page of pdf_path as a separate JPG file."""
    pdf_name = os.path.splitext(os.path.basename(pdf_path))[0]
    # Default to the directory that contains the PDF.
    if not output_path:
        output_path = os.path.dirname(pdf_path)

    # resolution is the density (in DPI) used to rasterize the PDF.
    with Image(filename=pdf_path, resolution=resolution) as pdf:
        for n, page in enumerate(pdf.sequence):
            with Image(page) as image:
                image.format = 'jpg'
                # JPG has no transparency, so flatten onto a white background.
                image.background_color = Color('white')
                image.alpha_channel = 'remove'
                image_name = os.path.join(output_path, '{}-{}.jpg'.format(pdf_name, n))
                image.save(filename=image_name)


pdf_to_jpg("1.pdf")
I've tried it with your PDF and the image comes out perfectly.
If you want to obtain parts of the page separately (for example, just the graphs) and all your PDFs have exactly the same layout, you can simply crop the parts you want out of the resulting image.
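The cropping idea could be sketched roughly like this with Wand's `Image.crop`. The helper names `fraction_to_box` and `crop_region` are made up for illustration, and the region is given as fractions of the page (an assumption of this sketch, not something your PDFs dictate) so that the same numbers keep working if you change the resolution:

```python
def fraction_to_box(width, height, left, top, right, bottom):
    """Convert fractional page coordinates (0.0 to 1.0) into a pixel box.

    Hypothetical helper: returns (left, top, right, bottom) in pixels.
    """
    if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
        raise ValueError("expected 0 <= left < right <= 1 and 0 <= top < bottom <= 1")
    return (
        round(left * width),
        round(top * height),
        round(right * width),
        round(bottom * height),
    )


def crop_region(image_path, output_path, left, top, right, bottom):
    """Crop a fractional region out of an image file and save it."""
    # Imported here so fraction_to_box stays usable without ImageMagick installed.
    from wand.image import Image

    with Image(filename=image_path) as image:
        box = fraction_to_box(image.width, image.height, left, top, right, bottom)
        # Image.crop takes left, top, right, bottom in pixels.
        image.crop(*box)
        image.save(filename=output_path)
    return box
```

For example, `crop_region("1-0.jpg", "1-0-graph.jpg", 0.1, 0.5, 0.9, 0.9)` would keep a band in the lower half of the first page; the fractions here are placeholders that you would have to measure on your own layout.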
Depending on the system, you may also need to install Ghostscript if it isn't already present, because ImageMagick depends on it to read PDFs. It is important to install the build (32 or 64 bits) that matches the Python and ImageMagick you are using.
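If you're not sure whether Ghostscript is available, a quick check along these lines may help. It is only a sketch: it assumes the usual executable names (`gs` on Linux/macOS, `gswin64c` or `gswin32c` for the Windows console builds) and only looks on the `PATH`, so a Ghostscript installed elsewhere won't be found. The function names are made up for illustration.

```python
import shutil
import struct

# Assumed executable names: "gs" on Linux/macOS, "gswin64c"/"gswin32c"
# for the 64-bit/32-bit Windows console builds.
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(names=GHOSTSCRIPT_NAMES):
    """Return the path of the first Ghostscript executable on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def python_bits():
    """Return 32 or 64 depending on the running Python build."""
    # The size of a C pointer (in bytes) tells us the interpreter's bitness.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python build: {}-bit".format(python_bits()))
    print("Ghostscript:", find_ghostscript() or "not found on PATH")
```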