• Re: encapsulating directory operations

    From Jakob Bohm@egenagwemdimtapsar@jbohm.dk to comp.lang.c on Sun Aug 17 21:04:11 2025
    From Newsgroup: comp.lang.c

    On 2025-05-21 12:00, Paul Edwards wrote:
    "Lawrence D'Oliveiro" <ldo@nz.invalid> wrote in message news:100jhor$2lgt3$4@dont-email.me...
    On Wed, 21 May 2025 10:23:27 +1000, Paul Edwards wrote:

    ...

    The C90 standard deferred to MVS - probably still does -
    and says that you can't open a file as "w", then read it as
    "rb" and write (a new file) as "wb", and still access (the
    new file) with "r".

    I was shocked when I saw IBM's C library lose the newlines
    when I did the above, and went to look at the standard to
    show that IBM was violating C90 - but it turns out they
    weren't.

    That sort of means you can't write a "zip" program portably,
    against the theoretical C90 file system. Or you would have
    to have flags to say which files need to be opened as text
    or binary.


    I believe the Info-ZIP group's ZIP program overcame this problem by
    somehow extending the same feature that handles the difference between
    line endings on UNIX (LF), CP/M (CRLF) and classic Mac OS (CR), but I
    haven't checked.

    Also, the file system peculiarity is probably the same one I experienced
    when transferring some of my old work from VM/CMS to MS-DOS decades ago.

    I do not agree with IBM's C library, and PDPCLIB does
    not have that behavior, so that constraint could potentially
    be dropped in a C90+ standard.

    BFN. Paul.




    Enjoy

    Jakob
    --
    Jakob Bohm, MSc.Eng., I speak only for myself, not my company
    This public discussion message is non-binding and may contain errors
    All trademarks and other things belong to their owners, if any.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Jakob Bohm@egenagwemdimtapsar@jbohm.dk to comp.lang.c on Sun Aug 17 22:34:30 2025
    From Newsgroup: comp.lang.c

    On 2025-05-22 07:14, Keith Thompson wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Thu, 22 May 2025 02:20:36 +0200, Jakob Bohm wrote:
    The later UNIX-like file system NTFS ...

    It was (and is) hard to describe NTFS as "Unix-like". Yes, it had
    hierarchical directories and long(ish) file names, but not much else.
    Drive letters were inherited (indirectly) from DEC OSes, of all things,
    along with an insistence on having filename extensions, restrictions on
    characters allowed in names etc.

    I consider NTFS UNIX-like because it is built around inodes (called MFT
    entries) and inode numbers, with no inherent special treatment of MS-DOS
    metacharacters in the fundamental logic, except in the few places that
    parse user-supplied path strings, such as symlink targets or path
    strings passed through from API calls.

    When the CP/M-style directory listing operations are done on an NTFS
    directory, the NTFS code loads the file names and inode numbers from the
    directory storage, then checks each inode to fill in details such as
    MS-DOS file attributes and POSIX-style time stamps (using 64-bit time
    since 1601-01-01 00:00:00 GMT). Next, the user-mode API logic converts
    filenames to the locale character set, discards the unwanted time stamps
    and converts the rest to the API encoding (which may be MS-DOS API local
    time since 1980, POSIX time since 1970, or Win32 FILETIME, which is the
    same as NTFS time).
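
The last conversion step mentioned above (NTFS/Win32 FILETIME to POSIX time) can be sketched in a few lines of C. This is only an illustration, not the code Windows uses: the function and macro names are made up for this example, and it assumes the documented FILETIME definition of 100-nanosecond ticks counted from 1601-01-01 00:00:00 UTC, which puts the POSIX epoch 11644473600 seconds later.

```c
#include <assert.h>
#include <stdint.h>

/* FILETIME counts 100 ns ticks, so there are 10^7 ticks per second. */
#define FILETIME_TICKS_PER_SECOND 10000000ULL

/* Seconds from 1601-01-01 (FILETIME epoch) to 1970-01-01 (POSIX epoch). */
#define SECONDS_1601_TO_1970 11644473600LL

/* Hypothetical helper: FILETIME tick count to whole POSIX seconds.
   Sub-second precision is simply truncated. */
int64_t filetime_to_unix_seconds(uint64_t filetime)
{
    return (int64_t)(filetime / FILETIME_TICKS_PER_SECOND)
           - SECONDS_1601_TO_1970;
}

/* Hypothetical helper: POSIX seconds to a FILETIME tick count.
   Dates before 1601 cannot be represented and are rejected. */
uint64_t unix_seconds_to_filetime(int64_t unix_seconds)
{
    assert(unix_seconds >= -SECONDS_1601_TO_1970);
    return (uint64_t)(unix_seconds + SECONDS_1601_TO_1970)
           * FILETIME_TICKS_PER_SECOND;
}
```

The MS-DOS format is deliberately left out here: it is a packed local-time date and time pair rather than a simple tick count, so it needs field-by-field handling.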


    I don't believe that NTFS requires filename extensions.
    My understanding is that a file name is stored as a single string
    (with some restrictions).

    Symlinks were not even added until Windows Vista. And you have to have
    special privileges to create them.


    The mechanism for symlinks (NTFS reparse points) was included from the
    start, but exposure of that to user mode was always limited. I'm
    unsure if the crippled POSIX subsystem in the NT 3.10 release included
    symlinks or only hardlinks.


    Enjoy

    Jakob
    --
    Jakob Bohm, MSc.Eng., I speak only for myself, not my company
    This public discussion message is non-binding and may contain errors
    All trademarks and other things belong to their owners, if any.
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.c on Mon Aug 18 00:30:35 2025
    From Newsgroup: comp.lang.c

    On 2025-05-21, Paul Edwards <mutazilah@gmail.com> wrote:
    Do note one more thing.

    The C90 standard deferred to MVS - probably still does -
    and says that you can't open a file as "w", then read it as
    "rb" and write (a new file) as "wb", and still access (the
    new file) with "r".

    You mean:

    - write a text file, close it; then
    - open it as a binary file and copy the bytes to another, new binary file; and
    - finally, read the new binary file in text mode?

    I don't see how that would be allowed to lose any newlines.

    You made a bitwise copy of the file.
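
For anyone who wants to try the sequence on their own implementation, here is a minimal sketch of the three steps in standard C. The function names and any file names passed to them are placeholders chosen for this example. On typical POSIX and Windows systems the newline count survives the round trip; whether it does on a record-oriented system such as the one Paul describes is exactly the point under discussion.

```c
#include <assert.h>
#include <stdio.h>

/* Step 1: write `text` to `path` through a text stream ("w").
   Returns 0 on success, -1 on failure. */
int write_text(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int rc = (fputs(text, f) == EOF) ? -1 : 0;
    if (fclose(f) == EOF)
        rc = -1;
    return rc;
}

/* Step 2: copy `src` to `dst` byte for byte through binary streams. */
int copy_binary(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    unsigned char buf[4096];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) == EOF)
        rc = -1;
    return rc;
}

/* Step 3: read `path` through a text stream ("r") and count the
   new-line characters the program sees. Returns -1 on failure. */
long count_newlines_text(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long count = 0;
    int c;
    while ((c = getc(f)) != EOF) {
        if (c == '\n')
            count++;
    }
    fclose(f);
    return count;
}
```

Calling write_text() on one file, copy_binary() to a second file, and then comparing count_newlines_text() on both files shows whether the implementation preserves line structure across the binary copy.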

    The worst thing that can happen is this: stdio implementations are not
    required to keep the exact length of a binary file down to a byte.

    Binary files can be rounded up and have padding bytes at the end.

    E.g. you write 37 bytes, but the file ends up 256 bytes long.

    If an implementation has this issue, it probably will still represent
    text files in such a way that when you copy a binary file, including
    any gratuitous padding, the text file will come out right.

    A recent draft of ISO C says "A binary stream is an ordered sequence of
    characters that can transparently record internal data. Data read in
    from a binary stream shall compare equal to the data that were earlier
    written out to that stream, under the same implementation. Such a stream
    may, however, have an implementation-defined number of null characters
    appended to the end of the stream."
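
A portable program that verifies a binary copy therefore has to tolerate that trailing padding. The following is a rough sketch of such a comparison; the function name is invented for this example and it works on in-memory buffers only, leaving the file reading to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `readback` consists of the bytes in
   `written` followed only by null characters (possibly none), which is
   the most the quoted wording guarantees for a binary stream.
   Returns 0 otherwise. */
int equal_up_to_null_padding(const unsigned char *written, size_t wlen,
                             const unsigned char *readback, size_t rlen)
{
    assert(written != NULL && readback != NULL);

    /* Data may grow through padding but must never be truncated. */
    if (rlen < wlen)
        return 0;

    /* The part that was actually written must match exactly. */
    if (memcmp(written, readback, wlen) != 0)
        return 0;

    /* Anything beyond it must be null padding. */
    for (size_t i = wlen; i < rlen; i++) {
        if (readback[i] != 0)
            return 0;
    }
    return 1;
}
```

Note the limitation this implies: if the original data itself ends in null bytes, the reader cannot tell them apart from padding, so formats meant to be portable usually record their own length.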

    I have C90 somewhere, I can look that up too, but I suspect it was
    the same.

    I was shocked when I saw IBM's C library lose the newlines
    when I did the above, and went to look at the standard to
    show that IBM was violating C90 - but it turns out they
    weren't.

    Losing the newlines in the above scenario (bit copy made of text
    file as a binary file) makes no sense.

    If it is true, someone went out of their way to fuck up something
    simple. (IBM would never do that, right?)

    If we copy a text file as binary and the newlines change, it suggests
    that the implementation is running afoul of the "binary stream is an
    ordered sequence of characters that can transparently record
    internal data" requirement.

    That sort of means you can't write a "zip" program portably,
    against the theoretical C90 file system.

    Or you would have
    to have flags to say which files need to be opened as text
    or binary.

    This is probably a good idea anyway; you don't want to be compressing
    the proprietary-format binary images of text files, if they are to
    decompress correctly on another system.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Tue Aug 19 18:09:48 2025
    From Newsgroup: comp.lang.c

    On 2025-08-17 20:30, Kaz Kylheku wrote:
    On 2025-05-21, Paul Edwards <mutazilah@gmail.com> wrote:
    Do note one more thing.

    The C90 standard deferred to MVS - probably still does -
    and says that you can't open a file as "w", then read it as
    "rb" and write (a new file) as "wb", and still access (the
    new file) with "r".

    You mean:

    - write a text file, close it; then
    - open it as a binary file and copy the bytes to another, new binary file; and
    - finally, read the new binary file in text mode?

    I don't see how that would be allowed to lose any newlines.

    You made a bitwise copy of the file.
    ...
    A recent draft of ISO C says "A binary stream is an ordered sequence of
    characters that can transparently record internal data. Data read in
    from a binary stream shall compare equal to the data that were earlier
    written out to that stream, under the same implementation. Such a stream
    may, however, have an implementation-defined number of null characters
    appended to the end of the stream."

    I have C90 somewhere, I can look that up too, but I suspect it was
    the same.

    It was.

    This can interact with text mode, if the representation of new-lines in
    text mode involves null characters. I know of two different ways that
    have actually been used where this could be a problem. One method uses
    null characters to represent a single new-line. The other method stores
    lines in fixed-length blocks, with the end of a line indicated by
    padding to the end of the block with null characters. Either way, the
    padding bytes that may be added in binary mode could be interpreted as
    extra newlines in text mode. That does not match Paul's problem:
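
To make the second method concrete, here is a toy model of it in C. It is not taken from any real library: the record length of 8 and the function name are assumptions made purely for illustration. Each fixed-length block is one line, with null characters filling the unused tail of the block.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed record length for this toy model only. */
enum { RECORD_LEN = 8 };

/* Decode `len` bytes of null-padded fixed-length records into ordinary
   text with '\n' line terminators. Writes a null-terminated string to
   `out` (capacity `outcap`) and returns the number of lines produced,
   or -1 if `out` is too small. A trailing partial record is ignored. */
long decode_records(const unsigned char *data, size_t len,
                    char *out, size_t outcap)
{
    assert(data != NULL && out != NULL);

    size_t o = 0;
    long lines = 0;

    for (size_t base = 0; base + RECORD_LEN <= len; base += RECORD_LEN) {
        /* Copy the record up to its first null character. */
        for (size_t i = 0; i < RECORD_LEN && data[base + i] != 0; i++) {
            if (o + 1 >= outcap)
                return -1;
            out[o++] = (char)data[base + i];
        }
        /* The end of the block is the end of the line. */
        if (o + 1 >= outcap)
            return -1;
        out[o++] = '\n';
        lines++;
    }

    if (o >= outcap)
        return -1;
    out[o] = '\0';
    return lines;
}
```

In this model, a file holding two records decodes to two lines. If a binary copy appends one more block of null characters as padding, the same decoder reports three lines, the last one empty. That is the "extra newlines" effect, the opposite of the lost newlines Paul reported.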

    I was shocked when I saw IBM's C library lose the newlines
    when I did the above, and went to look at the standard to
    show that IBM was violating C90 - but it turns out they
    weren't.

    It would help if Paul would identify the clauses from C90 that he
    interpreted as permitting such behavior.