I have a PDF file with thousands of names. Is it possible to extract the content to a text file?
Short answer: yes.
You don't strictly need the "Linux" tag, since there are libraries in Python, PHP, etc. that can help with this task, and there are even online tools. But I'll assume you meant to scope the question to Linux :)
Since you asked about Linux, there are tools like pdftotext from the poppler-utils package.
That is, run
$ sudo apt install poppler-utils
in your terminal, and then you can convert a PDF file to text with
$ pdftotext <your pdf file>.pdf file.txt
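If you want something a bit more defensive than the bare command, here is a minimal sketch of a shell helper. The function name pdf_to_text and the file names are made up for illustration. -layout (keep the physical layout of the page) and -enc UTF-8 (output encoding) are standard pdftotext options, but whether -layout actually helps depends on how the names are arranged in your PDF, so treat it as something to try rather than a guaranteed fix.

```shell
# Hypothetical helper: convert one PDF to a UTF-8 text file.
# Usage: pdf_to_text input.pdf [output.txt]
pdf_to_text() {
    local pdf="$1"
    # Default output name: same as the input, with .pdf swapped for .txt
    local txt="${2:-${pdf%.pdf}.txt}"

    # Fail early if the input file does not exist
    if [ ! -f "$pdf" ]; then
        echo "no such file: $pdf" >&2
        return 1
    fi

    # Fail with a hint if poppler-utils is not installed
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found; install poppler-utils first" >&2
        return 127
    fi

    # -layout tries to preserve columns; -enc UTF-8 sets the output encoding
    pdftotext -layout -enc UTF-8 "$pdf" "$txt"
}
```

For example, pdf_to_text names.pdf would write names.txt next to the PDF. If the columns come out jumbled, try the same call without -layout.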
Another tool is ebook-convert from the calibre package. Install the package with
$ sudo apt install calibre
and then run a command similar to the previous one
$ ebook-convert <your pdf file>.pdf file.txt
There are comparisons of these tools that may interest you.
What I haven't looked at closely is the "Spanish" part you mentioned. If you run into problems such as accented characters not coming out correctly, they can be fixed with the iconv tool. Maybe you can check that for now, and the answer can be updated accordingly.
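As a rough sketch of the iconv step: the helper below re-encodes a text file to UTF-8. The function name to_utf8 is made up, and the default source encoding ISO-8859-1 (Latin-1) is only an assumption; it is a common culprit for broken Spanish accents, but your file may use something else.

```shell
# Hypothetical helper: print a text file re-encoded as UTF-8.
# Usage: to_utf8 input.txt [source-encoding]
to_utf8() {
    local src="$1"
    # Assumed default: Latin-1. Pass a second argument to override it.
    local from="${2:-ISO-8859-1}"
    # -f is the encoding to convert from, -t the encoding to convert to
    iconv -f "$from" -t UTF-8 "$src"
}
```

For example, to_utf8 file.txt > file_utf8.txt. If you are not sure what encoding the extracted text is in, file -i file.txt reports a best guess at the charset.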