extract name of a pdf with python

Question

score 1

Good afternoon,

I hope you can help me.

I have a PDF whose file name follows the pattern "nombre_apellidop_apellidom_edad.pdf" (first name, paternal surname, maternal surname, age). I need to extract the file name and split it so that I can use each piece of data separately. For example:

Jose_Perez_Martinez_16.pdf

first name: Jose
paternal surname: Perez
maternal surname: Martinez
age: 16

I am currently using the PyPDF2 module to read the contents, and it works very well, but I don't know whether the same module can also read the title (that is, the file name) and do what I described above.

I hope you can help me. Regards.

python python-3.x

asked by Memo 09.05.2017 at 21:03

2 answers

score 0

First get the list of your PDFs with os.listdir("tu_directorio"). Then make a list with the keys of your dictionary, datos = ["nombre","apellidoP","apellidoM","edad"]. For each file name, remove the extension with replace(".pdf", ""), split what remains on "_" with cadena.split("_"), and turn the result into a dictionary with dict(zip(keys, values)):

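import os

# Keep only the PDF files from the directory listing
pdfs = [x for x in os.listdir("c://") if x.endswith(".pdf")]
datos = ["nombre", "apellidoP", "apellidoM", "edad"]
# Strip the extension, split on "_" and pair each part with its key
info = [dict(zip(datos, x.replace(".pdf", "").split("_"))) for x in pdfs]
print(info)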

[{'nombre': 'Jose', 'apellidoP': 'Perez', 'apellidoM': 'Martinez', 'edad': '16'}]
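A Python 3 variant of the same approach can be sketched with pathlib: Path.glob("*.pdf") skips files that aren't PDFs, and the .stem attribute gives the file name without its extension. This is a minimal sketch assuming every PDF in the folder follows the nombre_apellidop_apellidom_edad.pdf pattern; the helper name leer_pdfs is hypothetical.

```python
from pathlib import Path

# Keys for each "_"-separated part of the file name, in order
DATOS = ["nombre", "apellidoP", "apellidoM", "edad"]


def leer_pdfs(directorio):
    """Return one dict per PDF in `directorio` (hypothetical helper).

    Assumes every file name has the form nombre_apellidop_apellidom_edad.pdf.
    """
    info = []
    # glob("*.pdf") matches only PDF files; sorted() gives a stable order
    for ruta in sorted(Path(directorio).glob("*.pdf")):
        # .stem is the file name without its final extension
        partes = ruta.stem.split("_")
        info.append(dict(zip(DATOS, partes)))
    return info
```

For example, leer_pdfs("c://") returns a list with one dictionary per PDF in that folder. The sorted() call only makes the order predictable, since directory listings come back in arbitrary order.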

answered 09.05.2017 at 21:23

score 2 · Accepted Answer

If the names of your files always have the structure:

nombre_apellidop_apellidom_edad.pdf

you don't need anything special: just take the file's path and combine os.path with str.split:

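import os

ruta = "/Jose_Perez_Martinez_16.pdf"

# basename drops the directories; splitext drops the ".pdf" extension
nombre_pdf = os.path.splitext(os.path.basename(ruta))[0]
# Unpack the four "_"-separated fields (raises ValueError if there aren't exactly 4)
nombre, apellidop, apellidom, edad = nombre_pdf.split('_')

print('''
    nombre: {}
    apellidop: {}
    apellidom: {}
    edad: {}'''.format(nombre, apellidop, apellidom, edad))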

Output:

    nombre: Jose
    apellidop: Perez
    apellidom: Martinez
    edad: 16
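If some files might not follow the pattern, a regular expression with named groups (a swapped-in technique, not the one used in the answers above) can validate the whole name and convert the age to an integer in one step. A sketch, assuming the name is exactly four "_"-separated fields with a numeric age and a .pdf extension; the helper datos_de_pdf and the pattern PATRON are hypothetical:

```python
import os
import re

# Hypothetical pattern: three non-empty fields without "_", a numeric age,
# then the .pdf extension (matched case-insensitively)
PATRON = re.compile(
    r"(?P<nombre>[^_]+)_(?P<apellidop>[^_]+)_(?P<apellidom>[^_]+)_(?P<edad>\d+)\.pdf",
    re.IGNORECASE,
)


def datos_de_pdf(ruta):
    """Parse the file name at `ruta` into a dict (hypothetical helper)."""
    nombre_archivo = os.path.basename(ruta)
    # fullmatch requires the whole file name to fit the pattern
    coincidencia = PATRON.fullmatch(nombre_archivo)
    if coincidencia is None:
        raise ValueError(f"Unexpected file name: {nombre_archivo!r}")
    datos = coincidencia.groupdict()
    # Store the age as a number instead of a string
    datos["edad"] = int(datos["edad"])
    return datos


print(datos_de_pdf("/Jose_Perez_Martinez_16.pdf"))
# {'nombre': 'Jose', 'apellidop': 'Perez', 'apellidom': 'Martinez', 'edad': 16}
```

Compared with plain tuple unpacking, this also rejects names whose last field isn't a number, and the error message shows which file didn't match.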