How to convert a PDF to JPG


I need to write a program that takes a PDF file containing some data and two charts and converts it to JPG, so that I can insert it into another PDF with additional information. Alternatively, I could add a second page with the additional information to the original PDF.

I have been researching and found that I might be able to do it with PyPDF2, but I have not managed to. I could not get the code right or load the PDF file.

This is my code:

import PyPDF2

pdf_file = open('C:.pdf')
read_pdf = PyPDF2.PdfFileReader(pdf_file)
number_of_pages = read_pdf.getNumPages()
page = read_pdf.getPage(0)
page_content = page.extractText()
print (page_content)

I get the following error:

Traceback (most recent call last):
  File "C:\Python34\pruebasya\manejo-pdf2.py", line 3, in <module>
    pdf_file = open('c:.pdf')
OSError: [Errno 22] Invalid argument: 'c:\x01.pdf'
    
asked by FREDDY LLANOS 24.10.2017 at 17:04

1 answer


I do not think PyPDF2 can do this (convert a PDF to JPG). I have not used the library myself, but judging by its documentation it does not seem to have that capability. What you can do with it is take the full page and add it as-is to another PDF. It can also extract images that are stored as such in the document, but your PDF does not really contain embedded images.
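As a rough illustration of the "take the full page and add it to another PDF" option, here is a minimal sketch. It assumes PyPDF2 2.x or later, where the classes are named PdfReader and PdfWriter (in the older release used in the question the equivalents are PdfFileReader, PdfFileWriter, getPage and addPage). The function name and the file names in the usage line are made up for the example.

```python
def append_first_page(source_pdf, target_pdf, output_pdf):
    """Copy every page of target_pdf, then append page 0 of source_pdf."""
    # Assumption: PyPDF2 2.x or later. Imported here so that merely
    # loading this snippet does not require the library.
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()

    # Keep all pages of the document that receives the extra page.
    for page in PdfReader(target_pdf).pages:
        writer.add_page(page)

    # Add the first page of the other PDF, unchanged, at the end.
    writer.add_page(PdfReader(source_pdf).pages[0])

    with open(output_pdf, "wb") as fh:
        writer.write(fh)


# Hypothetical usage:
# append_first_page("1.pdf", "extra-info.pdf", "combined.pdf")
```

The page is copied as vector content, so nothing is rasterised and no quality is lost, which may be enough if the JPG was only an intermediate step.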

To convert a PDF to images you can use ImageMagick. It is a cross-platform software suite that lets you display, edit, create and convert a large number of image formats.

There are several Python bindings for it; one of the most recent, and the only one I have worked with, is Wand.

You have to install ImageMagick for your system. The vast majority of Linux distros also provide their own package in the official repositories, if you do not want to compile from source. Then you just have to install Wand (via pip).

Once that is done, you can use something like this:

import os
from wand.color import Color
from wand.image import Image


def pdf_to_jpg(pdf_path, output_path=None, resolution=200):
    """Save every page of pdf_path as <pdf name>-<page index>.jpg."""
    pdf_name = os.path.splitext(os.path.basename(pdf_path))[0]
    # By default, write the images next to the PDF.
    if not output_path:
        output_path = os.path.dirname(pdf_path)

    # resolution is the DPI used to rasterise the pages.
    with Image(filename=pdf_path, resolution=resolution) as pdf:
        for n, page in enumerate(pdf.sequence):
            with Image(page) as image:
                image.format = 'jpg'
                # JPG has no transparency: flatten onto a white background.
                image.background_color = Color('white')
                image.alpha_channel = 'remove'
                image_name = os.path.join(output_path, '{}-{}.jpg'.format(pdf_name, n))
                image.save(filename=image_name)


pdf_to_jpg("1.pdf")

I've tried it with your PDF and the image comes out perfectly.

If you want to obtain parts of the page separately (for example, just the charts) and all your PDFs have exactly the same layout, you can simply crop the parts you want out of the resulting image.
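The cropping idea could be sketched roughly like this with Wand. The function names, the file names and the pixel coordinates in the usage lines are all hypothetical; you would measure the real box once on an image rendered from your own layout. The scale_box helper is only there because a box measured at one resolution has to be rescaled if you later render at a different DPI.

```python
def scale_box(box, from_resolution, to_resolution):
    """Rescale a (left, top, right, bottom) pixel box between two DPI values."""
    factor = to_resolution / from_resolution
    return tuple(round(value * factor) for value in box)


def crop_region(image_path, box, output_path):
    """Save the (left, top, right, bottom) region of image_path to output_path."""
    # Imported here so scale_box stays usable without ImageMagick installed.
    from wand.image import Image

    left, top, right, bottom = box
    with Image(filename=image_path) as image:
        image.crop(left, top, right, bottom)
        image.save(filename=output_path)


# Hypothetical usage: a chart measured at 200 DPI on the first page image.
# chart_box = (150, 400, 1450, 1100)
# crop_region("1-0.jpg", chart_box, "chart-1.jpg")
# At 300 DPI the same region would be scale_box(chart_box, 200, 300).
```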

Depending on the system, you may also need to install Ghostscript, because ImageMagick depends on it to read PDFs and it may not already be installed. It is important to install the build (32 or 64 bits) that matches the Python and ImageMagick versions you are using.
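If you are not sure whether Ghostscript is available, a quick check along these lines may help. It is only a sketch using the standard library: it looks for the usual executable names on PATH, and the function name is my own. On Windows, ImageMagick may locate Ghostscript by other means, so a negative result is a hint rather than proof.

```python
import shutil

# Usual Ghostscript executable names: "gs" on Linux/macOS,
# "gswin64c" / "gswin32c" for the 64-bit / 32-bit Windows builds.
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(names=GHOSTSCRIPT_NAMES):
    """Return the path of the first Ghostscript executable on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_ghostscript() or "Ghostscript was not found on PATH")
```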

    
answered 27.10.2017 at 02:09