• Re: is_binary_file()

    From bart@bc@freeuk.com to comp.lang.c on Thu Dec 18 12:49:30 2025
    From Newsgroup: comp.lang.c

    On 18/12/2025 07:44, Bonita Montero wrote:
    Am 17.12.2025 um 20:35 schrieb Michael Sanders:
    Sigh, here we go...
    Everything in every thread I've read from you is faster, better,
    & you finished in two hours & yet here you are.
    *Very rich* Bonita. Thank you, but as others have pointed out:
    This is a C newsgroup, not a C++ newsgroup.
    That's not so important because it's about the general principle.
    Writing it in C would only be slightly different.

    So, why not post C versions? Or, I guess it would be extended C.

    static const char phrase[] = "I dont want or care about C++...";
    printf("%s\n", phrase);



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Dec 18 14:06:40 2025
    From Newsgroup: comp.lang.c

    Am 18.12.2025 um 13:49 schrieb bart:
    So, why not post C versions? Or, I guess it would be extended C.
    Because it's easier to write safe code in C++.
    For example, I'm using a span of AVX2/AVX-512 words.
    While debugging I have bounds checking with that.
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.c on Thu Dec 18 13:17:39 2025
    From Newsgroup: comp.lang.c

    In article <10i0u79$aa6d$1@raubtier-asyl.eternal-september.org>,
    Bonita Montero <Bonita.Montero@gmail.com> wrote:
    Am 18.12.2025 um 13:49 schrieb bart:
    So, why not post C versions? Or, I guess it would be extended C.

    Because it's easier to write safe code in C++.

    Wouldn't it be easier still to just not post at all?
    --

    "If God wanted us to believe in him, he'd exist."

    (Linda Smith on "10 Funniest Londoners", TimeOut, 23rd June, 2005.)
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Dec 18 16:03:24 2025
    From Newsgroup: comp.lang.c

    Am 18.12.2025 um 14:17 schrieb Kenny McCormack:
    Because it's easier to write safe code in C++.
    Wouldn't it be easier still to just not post at all?
    Some people here thought they could develop efficient code for that.
    I just wanted to show that this can be done a lot faster.


  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Dec 27 03:13:50 2025
    From Newsgroup: comp.lang.c

    On Sun, 7 Dec 2025 19:01:02 +0000, Richard Harnden wrote:

    A text file is supposed to end with a '\n'

    PDF files end with that. The object index comes at the end, and each
    index entry is fixed in length and ends with \015\012.

    But the spec makes it very clear that PDF files are not supposed to be
    treated as text files.
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Dec 27 03:18:07 2025
    From Newsgroup: comp.lang.c

    On Sun, 7 Dec 2025 03:43:40 -0700, Louis Krupp wrote:

    This brings back memories, most of them fond.

    Many former users of Burroughs systems seem to feel the same. ;)

    I have an unflattering story about John McCarthy, the father of Lisp,
    who was an IBM man who took over a computing centre where there was
    a Burroughs machine ...
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Dec 27 05:51:13 2025
    From Newsgroup: comp.lang.c

    On Mon, 8 Dec 2025 13:51:49 +0100, Bonita Montero wrote:

    From the glibc Reference Manual:

    “The distinction between text and binary streams is only meaningful
    on systems where text files have a different internal
    representation. On Unix systems, there is no difference between the
    two; the ‘b’ is accepted but ignored.”

    However, you need to distinguish the two if you want, like Python
    does, to be able to have a “universal newline” mode, where you can
    correctly handle line breaks in files written on any of the three main
    platform families: *nix/Unix, Windows, and macOS.

    This is such a useful idea I’m surprised no one has suggested that C
    should offer the option.
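
    A minimal sketch of what such a "universal newline" reader could
    look like in C, assuming the stream is opened in binary mode ("rb")
    so the C library does no translation of its own. The name
    read_line_universal and its truncation policy are illustrative
    assumptions, not part of any standard library and not a description
    of how Python implements the feature:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one line from fp, treating "\n", "\r\n",
   and a lone "\r" as line terminators. The terminator is not stored.
   At most size-1 bytes are kept in buf; any excess is silently dropped.
   Returns 1 if a line (possibly empty) was read, 0 at end of input. */
static int read_line_universal(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;
    int got_any = 0;

    assert(size > 0);               /* room for the terminating NUL */

    while ((c = getc(fp)) != EOF) {
        got_any = 1;
        if (c == '\n')
            break;                  /* Unix-style LF */
        if (c == '\r') {
            /* CR: swallow a following LF (CRLF), otherwise push back. */
            int next = getc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);
            break;
        }
        if (n + 1 < size)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return got_any;
}
```

    Called in a loop until it returns 0, this yields the same lines for
    LF, CRLF, and lone-CR input. Lines longer than the buffer are
    truncated rather than reported, which a real implementation would
    probably want to handle differently.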
  • From Paul@nospam@needed.invalid to comp.lang.c on Sat Dec 27 01:28:18 2025
    From Newsgroup: comp.lang.c

    On Fri, 12/26/2025 10:13 PM, Lawrence D’Oliveiro wrote:
    On Sun, 7 Dec 2025 19:01:02 +0000, Richard Harnden wrote:

    A text file is supposed to end with a '\n'

    PDF files end with that. The object index comes at the end, and each
    index entry is fixed in length and ends with \015\012.

    But the spec makes it very clear that PDF files are not supposed to be treated as text files.


    The best you can do, is for the PDF to be entirely text except for
    some bytes near the top (second line). It's not exactly clear what they do,
    but I've seen at least one document that misses the binary line. That binary-thing could be a hash over the document.

    At least in this PDF, the document is 99% text. And Mutool can be
    used to convert a "mostly binary" PDF, into a "mostly text" PDF.

    If a PDF is encrypted, it is unlikely to have a textual representation
    when naively opening it.

    PDFs can be "anywhere from 99% binary to 99% text". It all depends.
    Generally, the ones that are mostly text are the simplest of documents.
    Rich media documents will have a lot more binary that cannot be
    simplified by simple transformations. You could start in the first place,
    by using different source materials that had closer-to-textual representation to fix that.

    ***********************************************************************************************************
    %PDF-1.4
    <=== these can "look like binary" "25 B8 9A 92 9D 0A"
    1 0 obj<</Type/Catalog/Pages 3 0 R>>
    endobj
    2 0 obj<</Producer(GemBox GemBox.Pdf 1.7 (17.0.35.1042; .NET Framework))/CreationDate(D:20211028151721+02'00')>>
    endobj
    3 0 obj<</Type/Pages/Kids[4 0 R]/Count 1/MediaBox[0 0 595.32 841.92]>>
    endobj
    4 0 obj<</Type/Page/Parent 3 0 R/Resources<</Font<</F0 6 0 R>>>>/Contents 5 0 R>>
    endobj
    5 0 obj<</Length 59>>stream
    BT
    /F0 12 Tf
    1 0 0 1 100 702.7366667 Tm
    (Hello World!)Tj
    ET
    endstream
    endobj
    6 0 obj<</Type/Font/Subtype/Type1/BaseFont/Helvetica/FirstChar 32/LastChar 114/Widths 7 0 R/FontDescriptor 8 0 R>>
    endobj
    7 0 obj[278 278 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 722 0 0 0 0 0 0 0 0 0 0 0 0 0 0 944 0 0 0 0 0 0 0 0 0 0 0 0 556 556 0 0 0 0 0 0 222 0 0 556 0 0 333]
    endobj
    8 0 obj<</Type/FontDescriptor/Flags 32/FontName/Helvetica/FontFamily(Helvetica)/FontWeight 500/ItalicAngle 0/FontBBox[-166 -225 1000 931]/CapHeight 718/XHeight 523/Ascent 718/Descent -207/StemH 76/StemV 88>>
    endobj
    xref
    0 9
    0000000000 65535 f
    0000000015 00000 n
    0000000059 00000 n
    0000000179 00000 n
    0000000257 00000 n
    0000000346 00000 n
    0000000451 00000 n
    0000000573 00000 n
    0000000773 00000 n
    trailer
    <</Root 1 0 R/ID[<9392A59F3BE7B840805D62746E8A4F29><9392A59F3BE7B840805D62746E8A4F29>]/Info 2 0 R/Size 9>>
    startxref
    988
    %%EOF
    ***********************************************************************************************************

    If "there has to be binary in it", it's on the second line.
    The other lines can be text... if the tools and print drivers
    wanted to do it that way.

    Paul
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Dec 27 21:27:21 2025
    From Newsgroup: comp.lang.c

    On Sat, 27 Dec 2025 01:28:18 -0500, Paul wrote:

    The best you can do, is for the PDF to be entirely text except for
    some bytes near the top (second line). It's not exactly clear what
    they do ...

    The spec recommended the insertion of junk like that simply to
    dissuade file sniffers from concluding that the file is a text
    document.
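
    A rough sketch of how a sniffer might look for that marker, given
    the first few hundred bytes of a file. The function names are made
    up for illustration, and the threshold of four bytes with the high
    bit set is my reading of the spec's recommendation, so treat it as
    an assumption rather than a rule every PDF follows:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf begins with the "%PDF-" header. */
static int has_pdf_header(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 5 && memcmp(buf, "%PDF-", 5) == 0;
}

/* Hypothetical helper: returns 1 if the second line is a '%' comment
   containing at least four bytes >= 0x80 (assumed threshold). */
static int has_binary_marker(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    size_t high = 0;

    assert(buf != NULL);

    /* Skip the header line, then its end-of-line bytes (CR, LF, or both). */
    while (i < len && buf[i] != '\n' && buf[i] != '\r')
        i++;
    while (i < len && (buf[i] == '\n' || buf[i] == '\r'))
        i++;

    if (i >= len || buf[i] != '%')
        return 0;                   /* second line is not a comment */

    /* Count high-bit bytes up to the end of the comment line. */
    for (i++; i < len && buf[i] != '\n' && buf[i] != '\r'; i++)
        if (buf[i] >= 0x80)
            high++;

    return high >= 4;
}
```

    On the bytes Paul quoted ("25 B8 9A 92 9D 0A" right after the
    header), has_binary_marker would return 1. On a fully printable PDF
    it returns 0, so a text-versus-binary heuristic still has to
    recognise the "%PDF-" header on its own.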
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Dec 28 00:12:53 2025
    From Newsgroup: comp.lang.c

    On Sat, 6 Dec 2025 03:14:55 -0500, Paul wrote:

    .. with CRLF, NEL line terminators

    Who uses NEL? Only IBM, as far as I know.

    Also, the only difference between XML 1.0 and XML 1.1 is that the
    latter adds NEL as a permitted line terminator.
  • From richard@richard@cogsci.ed.ac.uk (Richard Tobin) to comp.lang.c on Sun Dec 28 00:43:12 2025
    From Newsgroup: comp.lang.c

    In article <10ipsm4$3ssi3$5@dont-email.me>,
    Lawrence D Oliveiro <ldo@nz.invalid> wrote:

    Also, the only difference between XML 1.0 and XML 1.1 is that the
    latter adds NEL as a permitted line terminator.

    There are some differences concerning control characters too.

    -- Richard
  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Sun Dec 28 02:49:19 2025
    From Newsgroup: comp.lang.c

    On Tue, 9 Dec 2025 06:38:47 -0500, Paul wrote:

    I tested the "find.exe" in Cygwin64 and it did not finish. I used
    Process Monitor to see what it was doing, and there was a lot of
    registry activity. (There should not be registry activity by
    find.exe or file.exe )

    You’ve got the source code, you can see where that’s coming from.

    If it’s not coming from the Cygwin code, it’s something in Windows
    itself.

    I tried the file.exe command and it didn't provide output and the
    machine hung. My machine never hangs. It's a model citizen. Windows
    Defender did not trip. An offline scan with Windows Defender did not
    find anything.

    That sort of thing seems par for the course with Windows ...
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Sun Dec 28 05:46:34 2025
    From Newsgroup: comp.lang.c

    Paul <nospam@needed.invalid> wrote:
    On Fri, 12/26/2025 10:13 PM, Lawrence D’Oliveiro wrote:
    On Sun, 7 Dec 2025 19:01:02 +0000, Richard Harnden wrote:

    A text file is supposed to end with a '\n'

    PDF files end with that. The object index comes at the end, and each
    index entry is fixed in length and ends with \015\012.

    But the spec makes it very clear that PDF files are not supposed to be
    treated as text files.


    The best you can do, is for the PDF to be entirely text except for
    some bytes near the top (second line). It's not exactly clear what they do, but I've seen at least one document that misses the binary line. That binary-thing could be a hash over the document.

    I did a little development on PDF-s. For debugging it is convenient
    to have a 100% printable form; such PDF-s are perfectly valid. Adobe
    encourages putting in a bunch of nonprintable characters to
    discourage silly tools from "converting text encoding", which
    would mangle PDF-s.
    --
    Waldek Hebisch
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Mon Dec 29 16:06:50 2025
    From Newsgroup: comp.lang.c

    Am 27.12.2025 um 06:51 schrieb Lawrence D’Oliveiro:
    On Mon, 8 Dec 2025 13:51:49 +0100, Bonita Montero wrote:

    From the glibc Reference Manual:

    “The distinction between text and binary streams is only meaningful
    on systems where text files have a different internal
    representation. On Unix systems, there is no difference between the
    two; the ‘b’ is accepted but ignored.”
    However, you need to distinguish the two if you want, like Python
    does, to be able to have a “universal newline” mode, where you can
    correctly handle line breaks in files written on any of the three main
    platform families: *nix/Unix, Windows, and macOS.
    No, MacOS, not macOS; the latter is "MacOS" since macOS X.
    This is such a useful idea I’m surprised no one has suggested that C
    should offer the option.


  • From mjos_examine@m6502x64@gmail.com to comp.lang.c on Mon Dec 29 11:49:01 2025
    From Newsgroup: comp.lang.c

    On 2025-12-29 10:06 a.m., Bonita Montero wrote:
    However, you need to distinguish the two if you want, like Python
    does, to be able to have a “universal newline” mode, where you can
    correctly handle line breaks in files written on any of the three main
    platform families: *nix/Unix, Windows, and macOS.
    No, MacOS, not macOS; the latter is "MacOS" since macOS X.

    Your assertion is contrary to that operating system vendor's own stance
    and branding.
    https://www.apple.com/os/macos/

  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Mon Dec 29 20:49:12 2025
    From Newsgroup: comp.lang.c

    Am 29.12.2025 um 17:49 schrieb mjos_examine:
    Your assertion is contrary to that operating system vendor's own
    stance and branding.
    https://www.apple.com/os/macos

    There's nothing about the distinction between MacOS and macOS on this page.


  • From Lawrence D’Oliveiro@ldo@nz.invalid to comp.lang.c on Tue Dec 30 01:52:07 2025
    From Newsgroup: comp.lang.c

    On Mon, 29 Dec 2025 16:06:50 +0100, Bonita Montero wrote:

    Am 27.12.2025 um 06:51 schrieb Lawrence D’Oliveiro:

    On Mon, 8 Dec 2025 13:51:49 +0100, Bonita Montero wrote:

    From the glibc Reference Manual:

    “The distinction between text and binary streams is only
    meaningful on systems where text files have a different internal
    representation. On Unix systems, there is no difference between
    the two; the ‘b’ is accepted but ignored.”

    However, you need to distinguish the two if you want, like Python
    does, to be able to have a “universal newline” mode, where you can
    correctly handle line breaks in files written on any of the three main
    platform families: *nix/Unix, Windows, and macOS.

    This is such a useful idea I’m surprised no one has suggested that C
    should offer the option.

    Way to distract from my point!