Blog by Sumana Harihareswara, Changeset founder

05 May 2022, 13:00 p.m.

PDFtk, qpdf, And Dealing With Password-Protected PDFs

I slice, dice, and transform documents often enough that I rely heavily on pandoc and pdftk. I often use pandoc to turn Markdown, HTML, wiki syntax, reStructuredText, etc. into each other or into LibreOffice, MS Office, and similar formats. And I use pdftk (pdftk-java, technically) to say, "turn pages 1 & 5 from this PDF into a new one" or "concatenate these 4 PDFs into a new one".
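As a rough sketch of those two pdftk jobs: the function names and file names below are made up for illustration, and I'm assuming pdftk (or pdftk-java) is on the PATH.

```shell
# Hypothetical helper: copy pages 1 and 5 of an input PDF into a new file.
# Usage: pick_pages input.pdf output.pdf
pick_pages() {
  pdftk "$1" cat 1 5 output "$2"
}

# Hypothetical helper: concatenate several PDFs, in order, into one file.
# Usage: concat_pdfs output.pdf first.pdf second.pdf third.pdf fourth.pdf
concat_pdfs() {
  out="$1"
  shift
  pdftk "$@" cat output "$out"
}
```

For example, `concat_pdfs all.pdf a.pdf b.pdf c.pdf d.pdf` would write the four inputs, in that order, to all.pdf.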

Recently I wanted to take a password-protected file, select several page ranges from it, and emit a new non-protected file. I ran into a problem while trying to do this:

Error: Invalid PDF: unknown.encryption.type.r

Error: Failed to open input PDF file:

redacted-name-of-file.pdf

Errors encountered. No output created.

Done. Input errors, so no output created.

I looked around and found this GitLab issue -- newer PDFs are often protected with AES-256 encryption, which pdftk doesn't yet support. Thanks to ergo mesh in that thread, who pointed to a solution: qpdf. I am more familiar with pdftk's syntax, so I did it in two steps:

qpdf redacted-name-of-file.pdf --replace-input --password=SECRETPASSWORD --decrypt # note to self: remove this entry from history later

pdftk redacted-name-of-file.pdf cat 3-8 10-16 output /tmp/redacted-file-unprotected-edited.pdf

but I'm likely to learn qpdf's page selection syntax soon and switch to it entirely, in which case I could have done this in one step. I also could have saved the password in a file and then read it into qpdf instead of typing it on the command line, which I'll probably start doing. But here's how I remove a specific item from my bash history:

history -a # append recent history

history # print out history lines, so user can look for the offset number of the relevant line, e.g., 301

history -d 301 # now it's gone from the current history list!

history -w # actually seal the deal & write the deletion to the history file
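For the one-step version mentioned earlier, here is a rough sketch using qpdf's --pages and --password-file options. The function name and file names are hypothetical, and I'm assuming a qpdf recent enough to have --password-file (10.2 or later, as far as I know). The password file holds the password on its first line, so it is worth restricting its permissions (e.g., chmod 600).

```shell
# Hypothetical helper: decrypt a PDF and keep only the given page ranges,
# reading the password from a file so it never lands in shell history.
# Usage: extract_unprotected input.pdf password-file "3-8,10-16" output.pdf
extract_unprotected() {
  # "." after --pages means "the primary input file"; "--" ends the page spec.
  qpdf "$1" --password-file="$2" --decrypt --pages . "$3" -- "$4"
}
```

Note that qpdf separates page ranges with commas (3-8,10-16) where pdftk uses spaces (3-8 10-16).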

Hope this helps! Also, the qpdf documentation on how PDF encryption/password protection works is pretty enlightening.

(By the way: the pdftk in Debian Linux is a Java port, because the original pdftk written in C++ depended on GCJ which Debian removed. Plus the original pdftk's code hasn't been updated since version 2.02 in 2013.)

Comments

Magnus
06 May 2022, 18:30 p.m.

Another way to handle the history thing is to start the command with a space. That way bash won't save it to history in the first place.
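One caveat on the leading-space trick: it only works when bash's HISTCONTROL variable includes ignorespace (or ignoreboth, which also skips consecutive duplicate lines). Some distributions set this in the default ~/.bashrc, but it isn't guaranteed. A minimal sketch for checking and enabling it in the current session:

```shell
# Print the current value; if it is empty, commands with a leading space ARE saved.
echo "HISTCONTROL=${HISTCONTROL:-}"

# Enable it for this session (add the same line to ~/.bashrc to make it stick).
HISTCONTROL=ignoreboth
```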