• Understanding pdfseparate error messages

    From Richard Owlett@rowlett@access.net to alt.os.linux.debian on Thu Jul 24 10:16:27 2025
    From Newsgroup: alt.os.linux.debian

    I'm running Debian 12.8.
    I have a 100+ page PDF document.
    I wish to extract 2 of those pages, each to its own PDF file for later editing.

    I'm focusing on poppler-utils as it appears to offer tools for current
    and future goals.

    Doing "pdftotext -layout -f 116 -l 116 TFP2021.pdf jul24-a.txt" comes
    very close to what I want.

    Having been surrounded by TECO-buffs in the 70's, comparing the output
    of "pdftotext -f 116 -l 116 TFP2021.pdf jul24-b.txt" to the above
    suggests an approach to resolving.

    It involves being able to edit a *SINGLE* page rather than all 100+ companion pages.

    I tried "pdfseparate -f 116 -l 116 TFP2021.pdf dianostic.pdf" and got
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3557294): Missing 'endstream' or incorrect stream length
    [multiple repetitions of those 2 lines]
    Syntax Error (3556857): Bad FCHECK in flate stream
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3866517): Bad FCHECK in flate stream

    How/where do I find interpretation of those?

    TIA

    *A postscript

    I had originally composed this message before discovering "pdfseparate"
    had created output files that appear to be what I intended.

    I'm still interested in the meaning of the error messages as it may hint
    as to why "pdftotext" wasn't *exactly* what I hoped for.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Richard Kettlewell@invalid@invalid.invalid to alt.os.linux.debian on Thu Jul 24 21:55:07 2025
    From Newsgroup: alt.os.linux.debian

    Richard Owlett <rowlett@access.net> writes:
    I tried "pdfseparate -f 116 -l 116 TFP2021.pdf dianostic.pdf" and got
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3557294): Missing 'endstream' or incorrect stream length
    [multiple repetitions of those 2 lines]
    Syntax Error (3556857): Bad FCHECK in flate stream
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3866517): Bad FCHECK in flate stream

    I think the PDF is malformed, although not necessarily fatally so.

    https://superuser.com/questions/1383547/when-i-am-trying-extract-text-from-pdf-using-pdftotext-command-in-linux-i-got
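
    For background on the "Bad FCHECK" message: PDF flate streams carry zlib data, and RFC 1950 requires that the first two bytes of zlib data (CMF and FLG), read as one 16-bit big-endian number, be a multiple of 31. The sketch below applies that test. The helper names fcheck_ok and fcheck_at are made up for illustration, and the offset passed to fcheck_at is an assumption the caller has to supply: the number in the poppler message appears to be a position in the file, but it is not necessarily the first byte of the stream data.

```shell
#!/bin/sh
# fcheck_ok and fcheck_at are hypothetical helper names, not poppler tools.

# Succeeds when two zlib header bytes (CMF, FLG) satisfy the RFC 1950 rule:
# CMF*256 + FLG must be a multiple of 31.
fcheck_ok() {
    [ $(( ($1 * 256 + $2) % 31 )) -eq 0 ]
}

# Reads two bytes at a byte offset in a file and applies the same test.
# Which offset marks the start of a stream's data is an assumption
# that the caller must supply.
fcheck_at() {
    file=$1
    offset=$2
    # od prints the two bytes as unsigned decimals, e.g. " 120 156"
    set -- $(od -An -tu1 -j "$offset" -N 2 "$file")
    [ $# -eq 2 ] && fcheck_ok "$1" "$2"
}
```

    For example, "fcheck_ok 0x78 0x9C" succeeds, since 0x78 0x9C is a common zlib header. A stream whose recorded length is wrong can plausibly make the reader start decoding at the wrong byte, which would trip this test.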
    --
    https://www.greenend.org.uk/rjk/
  • From William Unruh@unruh@invalid.ca to alt.os.linux.debian on Fri Jul 25 05:46:02 2025
    From Newsgroup: alt.os.linux.debian


    Why not use pdfseparate to extract the two pages you want, and then use pdftotext on each of the two pages?
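
    A minimal sketch of that two-step approach, assuming poppler-utils is installed. The helper name extract_page_text and the page-N.pdf / page-N.txt output names are invented for illustration. Because the source file is known to trigger diagnostics, the sketch checks that an output file was written rather than relying on the exit status of pdfseparate, whose behaviour on malformed input is not verified here.

```shell
#!/bin/sh
# extract_page_text is a hypothetical helper, not part of poppler-utils.
extract_page_text() {
    src=$1      # source PDF, e.g. TFP2021.pdf
    page=$2     # physical page number, e.g. 116
    out_pdf="page-${page}.pdf"
    out_txt="page-${page}.txt"

    # -f and -l are equal, so exactly one page is written to out_pdf.
    pdfseparate -f "$page" -l "$page" "$src" "$out_pdf"

    # Judge success by the presence of a non-empty output file.
    [ -s "$out_pdf" ] || { echo "no output for page $page" >&2; return 1; }

    # The single-page file needs no -f/-l; -layout keeps the physical layout.
    pdftotext -layout "$out_pdf" "$out_txt" || return 1

    # Report the name of the text file that was produced.
    printf '%s\n' "$out_txt"
}
```

    Calling "extract_page_text TFP2021.pdf 116" would leave page-116.pdf and page-116.txt in the current directory.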

  • From Richard Owlett@rowlett@access.net to alt.os.linux.debian on Fri Jul 25 08:30:28 2025
    From Newsgroup: alt.os.linux.debian

    On 7/24/25 3:55 PM, Richard Kettlewell wrote:
    Richard Owlett <rowlett@access.net> writes:
    I tried "pdfseparate -f 116 -l 116 TFP2021.pdf dianostic.pdf" and got
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3557294): Missing 'endstream' or incorrect stream length
    [multiple repetitions of those 2 lines]
    Syntax Error (3556857): Bad FCHECK in flate stream
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3866517): Bad FCHECK in flate stream

    I think the PDF is malformed, although not necessarily fatally so.

    https://superuser.com/questions/1383547/when-i-am-trying-extract-text-from-pdf-using-pdftotext-command-in-linux-i-got


    Interesting read as is manpage for mutool {just installed}.

    I have no control of the source document[1] as it is a USDA publication.
    Is there an independent site, which when given a URL will evaluate
    structural correctness [esp one whose evaluation would be worth
    reporting to USDA]?

    [1]https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf

  • From Richard Owlett@rowlett@access.net to alt.os.linux.debian on Fri Jul 25 08:33:58 2025
    From Newsgroup: alt.os.linux.debian

    On 7/25/25 12:46 AM, William Unruh wrote:
    Why not use pdfseparate to extract the two pages you want, and then use pdftotext on each of the two pages?

    That's what I'd done.


  • From Richard Owlett@rowlett@access.net to alt.os.linux.debian on Fri Jul 25 09:48:25 2025
    From Newsgroup: alt.os.linux.debian

    On 7/25/25 8:30 AM, Richard Owlett wrote:
    On 7/24/25 3:55 PM, Richard Kettlewell wrote:
    Richard Owlett <rowlett@access.net> writes:
    I tried "pdfseparate -f 116 -l 116 TFP2021.pdf dianostic.pdf" and got
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3557294): Missing 'endstream' or incorrect stream length
    [multiple repetitions of those 2 lines]
    Syntax Error (3556857): Bad FCHECK in flate stream
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3866517): Bad FCHECK in flate stream

    I think the PDF is malformed, although not necessarily fatally so.

    https://superuser.com/questions/1383547/when-i-am-trying-extract-text-from-pdf-using-pdftotext-command-in-linux-i-got



    Interesting read as is manpage for mutool {just installed}.

    I have no control of the source document[1] as it is a USDA publication.
    Is there an independent site, which when given a URL will evaluate
    structural correctness [esp one whose evaluation would be worth
    reporting to USDA]?

    [1]https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf


    Just tried mutool for the first time.
    Got an interesting permutation of warnings, errors, and post-processing
    of the result(s) ;}!
    Is there a good newbie intro to mutool somewhere?
    TIA
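
    One possible first experiment with mutool, sketched under assumptions: "mutool clean input.pdf output.pdf" rewrites the file, and MuPDF tries to repair damaged structure while loading it. Running pdfseparate on the rewritten copy and counting the remaining "Syntax Error" lines gives a rough idea of whether the rewrite helped. The helper name repair_then_split and the output file names are invented for illustration, and whether the rewrite clears these particular errors is not something the thread confirms.

```shell
#!/bin/sh
# repair_then_split is a hypothetical helper; file names are illustrative.
repair_then_split() {
    src=$1      # source PDF, e.g. TFP2021.pdf
    page=$2     # physical page number, e.g. 116
    clean="${src%.pdf}-clean.pdf"

    # Rewrite the document; mutool reports its own warnings on stderr.
    mutool clean "$src" "$clean" || return 1

    # Split the page from the rewritten copy and keep poppler's diagnostics.
    pdfseparate -f "$page" -l "$page" "$clean" "page-${page}.pdf" \
        2> "page-${page}.log"

    # Print how many syntax errors remain (0 if none were logged).
    grep -c 'Syntax Error' "page-${page}.log" || true
}
```

    For instance, "repair_then_split TFP2021.pdf 116" prints the number of syntax-error lines poppler still reports for the cleaned copy.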
  • From William Unruh@unruh@invalid.ca to alt.os.linux.debian on Sun Jul 27 16:30:58 2025
    From Newsgroup: alt.os.linux.debian

    On 2025-07-25, Richard Owlett <rowlett@access.net> wrote:
    On 7/25/25 8:30 AM, Richard Owlett wrote:
    On 7/24/25 3:55 PM, Richard Kettlewell wrote:
    Richard Owlett <rowlett@access.net> writes:
    I tried "pdfseparate -f 116 -l 116 TFP2021.pdf dianostic.pdf" and got
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3557294): Missing 'endstream' or incorrect stream length
    [multiple repetitions of those 2 lines]
    Syntax Error (3556857): Bad FCHECK in flate stream
    Syntax Error (3868069): Missing 'endstream' or incorrect stream length
    Syntax Error (3866517): Bad FCHECK in flate stream

    I think the PDF is malformed, although not necessarily fatally so.

    Yes. If I open the PDF in Chrome, I can go to page 116 and print it to
    PDF without trouble. Similarly with xpdf. However, okular uses the
    document's own page number instead of the physical page number as the
    page location. (Thus you have to go to page 106 in okular to get to the
    Market Basket page.)

    If I try pdfseparate, I also get that whole list of

    Missing 'endstream' or incorrect stream length

    error messages, but the correct single page is created (i.e., those error
    messages appear to be warnings and can be ignored). And they do not appear
    to slow things down (it takes much less than a second to create that
    output page).
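
    A rough way to confirm that observation on one's own machine, assuming poppler-utils is installed: send the diagnostics to a log file, count them, and ask pdfinfo how many pages the output holds. The helper name check_single_page and its argument order are invented for illustration.

```shell
#!/bin/sh
# check_single_page is a hypothetical helper, not part of poppler-utils.
check_single_page() {
    src=$1; page=$2; out=$3; log=$4

    # Keep the diagnostics off the terminal but available for later reading.
    pdfseparate -f "$page" -l "$page" "$src" "$out" 2> "$log"
    status=$?

    # Count the "Syntax Error" lines that were logged.
    n=$(grep -c 'Syntax Error' "$log")

    # pdfinfo prints a "Pages:" line; take the number from it.
    pages=$(pdfinfo "$out" 2>/dev/null | awk '/^Pages:/ {print $2}')

    echo "exit=$status messages=$n pages=$pages"

    # Succeed only if the output really is a one-page document.
    [ "$pages" = "1" ]
}
```

    For example, "check_single_page TFP2021.pdf 116 page-116.pdf page-116.log" prints a one-line summary and succeeds only when the output is a single-page PDF, however many messages were logged.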