Under Linux, I can use grep to search a bunch of
files for a character string. Is there an equivalent
command for searching pdf files?
[...] might do some typesetting "magic" (e.g. ligatures, etc.) that might make things [...]
Text in PDFs is sometimes compressed. So one can either use
programs like "Agent Ransack" to search for text in PDFs or
tools like "pdftotext" to first create a text file for every
PDF file and then grep those text files.
ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
> Text in PDFs is sometimes compressed. So one can either use programs
> like "Agent Ransack" to search for text in PDFs or tools like
> "pdftotext" to first create a text file for every PDF file and then
> grep those text files.
PS: "Agent Ransack" is Windows software. "pdftotext" is also available
for Linux. Converting all PDFs to text files needs to be done only
once, and then search operations on those text files are faster than
scanning the PDF files for text on every search!
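
The "convert once, then grep" approach described above could be sketched roughly as follows. This is only an illustration: the function name convert_pdfs and the directory ~/papers are made up for the example, and it assumes bash plus the pdftotext tool mentioned above (called in its usual "pdftotext input.pdf output.txt" form).

```shell
#!/usr/bin/env bash
# Sketch: convert every PDF in a directory to a sibling .txt file.
# convert_pdfs is a hypothetical helper name, not an existing tool.
convert_pdfs() {
    local dir=${1:-.}
    local pdf txt
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue        # glob matched nothing
        txt=${pdf%.pdf}.txt
        # Convert only if the text file is missing or older than the PDF,
        # so repeated runs do not redo work that is already done.
        if [ ! -e "$txt" ] || [ "$pdf" -nt "$txt" ]; then
            pdftotext "$pdf" "$txt"
        fi
    done
}

# Example use (directory and search word are placeholders):
#   convert_pdfs ~/papers
#   grep -il 'blabla' ~/papers/*.txt
```

The final grep -il prints the names of the text files that contain the word, ignoring case; each name maps back to a PDF with the same base name.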
On 3 Apr 2024 14:29:40 GMT, Stefan Ram wrote:
> ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>> Text in PDFs is sometimes compressed. So one can either use programs
>> like "Agent Ransack" to search for text in PDFs or tools like
>> "pdftotext" to first create a text file for every PDF file and then
>> grep those text files.
> PS: "Agent Ransack" is Windows software. "pdftotext" is also
> available for Linux. Converting all PDFs to text files needs to be
> done only once, and then search operations on those text files are
> faster than scanning the PDF files for text on every search!
I should maybe have elaborated a bit. Sometimes I remember a certain
phrase or word but forget which PDF it is in. With text files I can do
grep blabla *.txt, and I wanted an equivalent. Using pdftotext would
mean running it on every suspect PDF. Since a lot of PDF files are
searchable, I figured that such a command might exist.
But if there really is a pdfgrep command, that might do the job. I will
do some googling, thanks.
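
For what it is worth, a rough stand-in for "grep blabla *.txt" on PDFs can be put together from pdftotext alone, without keeping any text files around. The sketch below assumes bash and that pdftotext writes to standard output when the output file is given as "-"; the function name pdf_grep_files is made up for this example. If pdfgrep is installed, something like "pdfgrep -i blabla *.pdf" should do much the same job.

```shell
#!/usr/bin/env bash
# Sketch: print the names of the PDFs whose extracted text matches a pattern.
# pdf_grep_files is a hypothetical helper name, not an existing command.
pdf_grep_files() {
    local pattern=$1
    shift
    local pdf
    for pdf in "$@"; do
        # Extract the text to stdout and let grep decide whether it matches.
        if pdftotext "$pdf" - 2>/dev/null | grep -qi -- "$pattern"; then
            printf '%s\n' "$pdf"
        fi
    done
}

# Example use (search word is a placeholder):
#   pdf_grep_files blabla *.pdf
```

This re-extracts the text on every search, so it is slower than grepping pre-converted text files, but it needs no preparation step.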
I installed pdfgrep in my Kubuntu system, but it is
not happy. Although the man file is there, even help
doesn't work:

  pdfgrep --help
  terminate called after throwing an instance of 'std::runtime_error'
On 04/04/2024 10:50, db wrote:
> [...]
> I installed pdfgrep in my Kubuntu system, but it is not happy. Although
> the man file is there, even help doesn't work:
I just installed pdfgrep_2.1.2-1build1_amd64.deb in my Mint 20.1 and it
seems to work OK. What version is the Kubuntu one?
Peter