Extract the name of a PDF file with Python

1

Good afternoon,

I hope you can help me.

I have a PDF file whose name follows the pattern "nombre_apellidop_apellidom_edad.pdf". I need to extract the file name and split it so that I can use each piece of data separately. For example:

Jose_Perez_Martinez_16.pdf

  • nombre (first name): Jose
  • apellidop (paternal surname): Perez
  • apellidom (maternal surname): Martinez
  • edad (age): 16

I am currently using the PyPDF2 module to read the content of the file, and it works very well, but I do not know whether that same module can also read the file name so that I can do what I described above.

I hope you can help me. Regards.

    
asked by Memo 09.05.2017 at 23:03

2 answers

2

If the names of your files always have the structure:

  

nombre_apellidop_apellidom_edad.pdf

you do not need anything special for that; just use the file's path together with str.split:

import os

ruta = "/Jose_Perez_Martinez_16.pdf"

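# Drop the directory and the ".pdf" extension, leaving "Jose_Perez_Martinez_16"
nombre_pdf = os.path.splitext(os.path.basename(ruta))[0]
# Split on "_" and unpack the four fields
nombre, apellidop, apellidom, edad = nombre_pdf.split('_')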

print('''
    nombre: {}
    apellidop: {}
    apellidom: {}
    edad: {}'''.format(nombre, apellidop, apellidom, edad))

Output:

    nombre: Jose
    apellidop: Perez
    apellidom: Martinez
    edad: 16
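
If some file names might not follow the four-part pattern, the unpacking above raises a ValueError with a rather generic message. A minimal sketch of a more defensive variant, assuming Python 3; the helper name parse_pdf_name and the decision to convert the age to an integer are illustrative assumptions, not part of the original answer:

```python
from pathlib import Path

FIELDS = ("nombre", "apellidop", "apellidom", "edad")


def parse_pdf_name(ruta):
    """Split 'nombre_apellidop_apellidom_edad.pdf' into a dict (hypothetical helper)."""
    # Path.stem drops the directory and the final extension.
    partes = Path(ruta).stem.split("_")
    if len(partes) != len(FIELDS):
        raise ValueError("unexpected file name: {!r}".format(ruta))
    datos = dict(zip(FIELDS, partes))
    # Assumption: the last field is always a whole number.
    datos["edad"] = int(datos["edad"])
    return datos


print(parse_pdf_name("/Jose_Perez_Martinez_16.pdf"))
```

Returning a dictionary instead of four separate variables keeps the fields together, which may be convenient if you later process many files.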

    
answered 09.05.2017 at 23:14
0

First, list the files in your directory with os.listdir("tu_directorio"). Then define the keys of your dictionary, datos = ["nombre","apellidoP","apellidoM","edad"]. For each PDF file name, remove the extension with replace(".pdf", ""), split the rest with split("_"), and turn the result into a dictionary with dict(zip(keys, values)):

import os

pdfs = os.listdir("c://")
datos = ["nombre", "apellidoP", "apellidoM", "edad"]
# str.split replaces string.split, which was removed in Python 3;
# the endswith check skips entries that are not PDF files
info = [dict(zip(datos, x.replace(".pdf", "").split("_"))) for x in pdfs if x.endswith(".pdf")]
print(info)
  

[{'nombre': 'Jose', 'apellidoP': 'Perez', 'apellidoM': 'Martinez', 'edad': '16'}]
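
Note that dict(zip(...)) silently truncates when a file name has more or fewer than four parts. A possible sketch that scans a directory and skips names that do not match the pattern, assuming Python 3 and pathlib; the helper name leer_pdfs and the temporary demonstration files are hypothetical:

```python
import tempfile
from pathlib import Path

DATOS = ["nombre", "apellidoP", "apellidoM", "edad"]


def leer_pdfs(directorio):
    """Return one dict per well-formed PDF file name in directorio (hypothetical helper)."""
    info = []
    # sorted() makes the result order predictable; glob order is not guaranteed.
    for ruta in sorted(Path(directorio).glob("*.pdf")):
        partes = ruta.stem.split("_")
        if len(partes) == len(DATOS):  # skip names that do not match the pattern
            info.append(dict(zip(DATOS, partes)))
    return info


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for nombre in ("Jose_Perez_Martinez_16.pdf", "notas.txt", "informe.pdf"):
        (Path(tmp) / nombre).touch()
    print(leer_pdfs(tmp))
```

Here only Jose_Perez_Martinez_16.pdf yields a dictionary: notas.txt is not matched by the "*.pdf" pattern, and informe.pdf does not split into four parts.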

    
answered 09.05.2017 at 23:23