• Ping Melissa

    From Rhino@no_offline_contact@example.com to rec.arts.tv on Sun Jun 28 11:30:13 2026
    From Newsgroup: rec.arts.tv

    Melissa, you've mentioned that you are involved in training LLMs so I
    want to ask you about that. One of my friends has asked me what I know
    about that training, specifically how the training of LLMs avoids the
    old GIGO (Garbage In Garbage Out) problem: how do you make sure that the
    information it is reading is accurate? I admitted that I didn't know but
    told him I'd reach out to you.
    --
    Rhino

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Sun Jun 28 12:47:45 2026
    From Newsgroup: rec.arts.tv

    Verily, in article <111rem7$3km09$3@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

    Melissa, you've mentioned that you are involved in training LLMs so I
    want to ask you about that. One of my friends has asked me what I know
    about that training, specifically how the training of LLMs avoids the
    old GIGO (Garbage In Garbage Out) problem: how do you make sure that the
    information it is reading is accurate? I admitted that I didn't know but
    told him I'd reach out to you.

    If you mean the initial training, that's done on curated data sets. One
    early LLM (Tay) was designed to learn from the conversations it had, and
    in less than 24 hours 4chan had retrained it into a frothing racist
    calling for the Fourth Reich, so nobody lets LLMs train on uncurated
    material now.

    The training I sometimes do isn't the base training but user simulation.
    For this, the goal is to translate a user's incoherent gibberish into
    what the user actually wants. For instance, one recent project had me
    tell Claude and Gemini to create various projects, but each prompt had
    to have at least one mistake or glaring omission. The goal was for the
    LLM to produce the desired result in spite of the garbage input.

    I find this depressing. The desire for frictionless everything is
    turning us all into helpless Wall-E characters. That's the goal, though.
    --
    The True Melissa - Canal Winchester - Ohio
    United States of America - North America - Earth
    Solar System - Milky Way - Local Group
    Virgo Cluster - Laniakea Supercluster - Cosmos
  • From Rhino@no_offline_contact@example.com to rec.arts.tv on Sun Jun 28 14:57:52 2026
    From Newsgroup: rec.arts.tv

    On 2026-06-28 12:47 p.m., The True Melissa wrote:
    Verily, in article <111rem7$3km09$3@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

    Melissa, you've mentioned that you are involved in training LLMs so I
    want to ask you about that. One of my friends has asked me what I know
    about that training, specifically how the training of LLMs avoids the
    old GIGO (Garbage In Garbage Out) problem: how do you make sure that the
    information it is reading is accurate? I admitted that I didn't know but
    told him I'd reach out to you.

    If you mean the initial training, that's done on curated data sets. One
    early LLM (Tay) was designed to learn from the conversations it had, and
    in less than 24 hours 4chan had retrained it into a frothing racist
    calling for the Fourth Reich, so nobody lets LLMs train on uncurated
    material now.

    Yeah, I heard about that.

    But who is doing the curating? What are their criteria? How do they know
    that they're getting truth rather than some form of propaganda or
    wishful thinking?

    I think we've all heard about AIs that, when asked to display pictures
    of the typical Brit a thousand years ago, show a picture of a black
    person, which is utterly ahistorical. I can only assume that the AI in
    question was trained on data that painted a vastly inaccurate picture,
    like Bridgerton or Wakanda.

    The thing that prompted my friend's question was that he'd read about a
    program called Nepenthes which was deliberately designed to be scraped
    by LLMs but that would misinform them, the motivation apparently being
    to undermine the credibility of LLMs and hurt the tech bros behind the AIs.


    The training I sometimes do isn't the base training but user simulation.
    For this, the goal is to translate a user's incoherent gibberish into
    what the user actually wants. For instance, one recent project had me
    tell Claude and Gemini to create various projects, but each prompt had
    to have at least one mistake or glaring omission. The goal was for the
    LLM to produce the desired result in spite of the garbage input.

    I can't help but wonder how you get the LLM to recognize the gibberish
    bits so that they can be ignored. I write some pretty coherent prompts
    so I don't think AIs have much trouble understanding me but given how
    badly some people communicate - horrible spelling, no punctuation,
    garbled grammar - I can only imagine how hard it is for Claude, Gemini,
    et al. to even understand what they are being asked.

    I find this depressing. The desire for frictionless everything is
    turning us all into helpless Wall-E characters. That's the goal, though.


    That seems to be a driving force in the current world....
    --
    Rhino
  • From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Sun Jun 28 15:53:11 2026
    From Newsgroup: rec.arts.tv

    Verily, in article <111rqrh$3oiin$2@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

    Yeah, I heard about that.

    But who is doing the curating? What are their criteria? How do they know
    that they're getting truth rather than some form of propaganda or
    wishful thinking?

    The companies do their own curation. They used internet conversations at
    first, but everyone complained, so a common thing now is to buy huge
    lots of old books, use them for training data, then destroy the hard
    copies.


    I think we've all heard about AIs that, when asked to display pictures
    of the typical Brit a thousand years ago, show a picture of a black
    person, which is utterly ahistorical. I can only assume that the AI in
    question was trained on data that painted a vastly inaccurate picture,
    like Bridgerton or Wakanda.

    No, I hadn't heard that one. I just tried "Draw a portrait of a typical
    Briton from a thousand years ago" on two models, and both of them drew
    white men with brown hair in period-appropriate clothing.

    When you hear of really crazy results like the ones you describe,
    there's often some other factor. I remember people complaining that
    searching for "American authors" on Google brought up almost entirely
    black people, but the problem wasn't bad information architecture. The
    problem was the term "African-American," used of black authors,
    associating them more strongly with the "American" token.


    The thing that prompted my friend's question was that he'd read about a
    program called Nepenthes which was deliberately designed to be scraped
    by LLMs but that would misinform them, the motivation apparently being
    to undermine the credibility of LLMs and hurt the tech bros behind the AIs.

    I don't believe the scraping will affect the underlying model. Nothing
    would go into the initial training data without company review.

    Your friend may be talking about injecting misinformation into chains of
    AIs scraping from each other. That's absolutely possible. There are AI
    YouTube channels all presenting different takes on something that didn't
    happen, because some Clever Hans seeded it into the comments of a few and
    they concluded it must be real.

    I looked up Nepenthes, and its buzz says that it's more about trapping
    the AI agents by sending them on endless wild goose chases. It sets up
    circular chains of links, large enough that the AI won't catch on and
    will just keep following the loop forever. This makes it harder to
    scrape the site's data, but not impossible. The developer admits it
    doesn't work on ChatGPT, and I'm betting it doesn't work on Fable or the
    latest Opus, either.

    It's an arms race. I wish Nepenthes and the others good luck.


    I can't help but wonder how you get the LLM to recognize the gibberish
    bits so that they can be ignored. I write some pretty coherent prompts
    so I don't think AIs have much trouble understanding me but given how
    badly some people communicate - horrible spelling, no punctuation,
    garbled grammar - I can only imagine how hard it is for Claude, Gemini,
    et al. to even understand what they are being asked.

    They get people like me to enter bad prompts for particular desired
    results. If the LLM generates the desired results for my bad prompt,
    that's a success. If it guesses something other than the result the test
    desires, or asks for clarification instead of assuming, that's a failure
    on the model's part.
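
    The pass/fail rule in that paragraph can be sketched in a few lines.
    This is only an illustration of the rule as described in the post, not
    any company's actual test harness: the Response class, the grade
    function and the sample prompts are all made up, and a real evaluation
    would compare results far less crudely than an exact text match.

```python
from dataclasses import dataclass


@dataclass
class Response:
    kind: str     # "result" if the model just did the task, "question" if it asked
    content: str


def grade(expected: str, response: Response) -> str:
    """Grade one bad-prompt test using the rule described in the post."""
    if response.kind == "question":
        # Asking instead of assuming counts against the model.
        return "fail: asked for clarification"
    if response.content.strip().lower() != expected.strip().lower():
        # The model assumed, but assumed the wrong thing.
        return "fail: guessed something else"
    return "pass"


# Hypothetical test case: the prompt deliberately leaves out the language.
expected = "python script that sorts a csv by date"
attempts = [
    Response("result", "Python script that sorts a CSV by date"),
    Response("result", "Excel macro that sorts a sheet by date"),
    Response("question", "Which language should I use?"),
]

verdicts = [grade(expected, r) for r in attempts]
print(verdicts)
```

    Note that the third attempt fails even though asking might look like
    the careful thing to do; that is the design choice the post goes on to
    explain.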

    Your first thought may be that you'd prefer the model to ask for
    clarification. The problem is that there's so much of it, and the model
    has no underlying common sense to determine which things matter. They
    used to drive everyone crazy asking ten million questions before doing
    anything, and users started telling it to "make reasonable assumptions"
    (or yelling at it to just effing do it). Now, the LLM needs to know a lot
    more reasonable assumptions, since otherwise its nature will lead it to
    hallucinate something.


    I find this depressing. The desire for frictionless everything is
    turning us all into helpless Wall-E characters. That's the goal, though.


    That seems to be a driving force in the current world....

    I'm becoming more and more Luddite as I see the effects of technology.
    All that so-called friction made us better people.
    --
    The True Melissa - Canal Winchester - Ohio
    United States of America - North America - Earth
    Solar System - Milky Way - Local Group
    Virgo Cluster - Laniakea Supercluster - Cosmos
  • From BTR1701@atropos@mac.com to rec.arts.tv on Sun Jun 28 20:21:50 2026
    From Newsgroup: rec.arts.tv

    On Jun 28, 2026 at 12:53:11 PM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

    Verily, in article <111rqrh$3oiin$2@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:


    I think we've all heard about AIs that, when asked to display pictures
    of the typical Brit a thousand years ago, show a picture of a black
    person, which is utterly ahistorical. I can only assume that the AI in
    question was trained on data that painted a vastly inaccurate picture,
    like Bridgerton or Wakanda.

    No, I hadn't heard that one. I just tried "Draw a portrait of a typical
    Briton from a thousand years ago" on two models, and both of them drew
    white men with brown hair in period-appropriate clothing.

    When you hear of really crazy results like the ones you describe,
    there's often some other factor. I remember people complaining that
    searching for "American authors" on Google brought up almost entirely
    black people, but the problem wasn't bad information architecture. The
    problem was the term "African-American," used of black authors,
    associating them more strongly with the "American" token.

    The problem was deeper than that.


    https://humanities.org.au/power-of-the-humanities/black-nazis-asian-vikings-and-other-problems-with-generative-ai/

    Only three weeks after its introduction, Google decided to suspend the
    image generation features of its newest generative AI model Gemini over
    accusations it contained an 'anti-white bias'.

    The move follows a series of viral posts by X (formerly Twitter) users who
    were outraged that prompts used to generate images of America's founding
    fathers, Vikings, the Pope, and 1943 German soldiers (with the intention to
    generate images of Nazis) returned images of almost exclusively Black,
    Asian, First Nations, and other racially diverse people.


  • From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Sun Jun 28 16:53:01 2026
    From Newsgroup: rec.arts.tv

    Verily, in article <111rvou$3rmoo$3@dont-email.me>, did atropos@mac.com deliver unto us this message:

    The problem was deeper than that.


    https://humanities.org.au/power-of-the-humanities/black-nazis-asian-vikings-and-other-problems-with-generative-ai/

    Only three weeks after its introduction, Google decided to suspend the
    image generation features of its newest generative AI model Gemini over
    accusations it contained an 'anti-white bias'.

    I was talking about Google's actual search engine. The flap was a few
    years ago, not part of the AI wars.

    The prompt modification described in the article does happen in AIs
    today, though. What ChatGPT sends to Dall-E may not be what I requested.


    The move follows a series of viral posts by X (formerly Twitter) users who
    were outraged that prompts used to generate images of America's founding
    fathers, Vikings, the Pope, and 1943 German soldiers (with the intention to
    generate images of Nazis) returned images of almost exclusively Black,
    Asian, First Nations, and other racially diverse people.
    --
    The True Melissa - Canal Winchester - Ohio
    United States of America - North America - Earth
    Solar System - Milky Way - Local Group
    Virgo Cluster - Laniakea Supercluster - Cosmos
  • From Rhino@no_offline_contact@example.com to rec.arts.tv on Sun Jun 28 20:42:12 2026
    From Newsgroup: rec.arts.tv

    On 2026-06-28 3:53 p.m., The True Melissa wrote:
    Verily, in article <111rqrh$3oiin$2@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

    Yeah, I heard about that.

    But who is doing the curating? What are their criteria? How do they know
    that they're getting truth rather than some form of propaganda or
    wishful thinking?

    The companies do their own curation. They used internet conversations at
    first, but everyone complained, so a common thing now is to buy huge
    lots of old books, use them for training data, then destroy the hard
    copies.

    Even old books contain errors, more likely errors made from ignorance
    rather than errors meant to mislead, but errors nonetheless.

    I think we've all heard about AIs that, when asked to display pictures
    of the typical Brit a thousand years ago, show a picture of a black
    person, which is utterly ahistorical. I can only assume that the AI in
    question was trained on data that painted a vastly inaccurate picture,
    like Bridgerton or Wakanda.

    No, I hadn't heard that one. I just tried "Draw a portrait of a typical
    Briton from a thousand years ago" on two models, and both of them drew
    white men with brown hair in period-appropriate clothing.

    I actually tried that question several months back when I first started
    using AIs and it always got bogged down for some reason: "unusually
    heavy activity" or words to that effect. Despite multiple tries over
    multiple days, it never did produce ANY picture, not just an inaccurate
    one. This was already a few months after word of the picture being of a
    black person, which did cause a bit of a stir at the time. I wonder if
    the controversy reached the developers who then tweaked the model (or
    the training data) to keep that from happening....

    When you hear of really crazy results like the ones you describe,
    there's often some other factor. I remember people complaining that
    searching for "American authors" on Google brought up almost entirely
    black people, but the problem wasn't bad information architecture. The
    problem was the term "African-American," used of black authors,
    associating them more strongly with the "American" token.


    The thing that prompted my friend's question was that he'd read about a
    program called Nepenthes which was deliberately designed to be scraped
    by LLMs but that would misinform them, the motivation apparently being
    to undermine the credibility of LLMs and hurt the tech bros behind the AIs.

    I don't believe the scraping will affect the underlying model. Nothing
    would go into the initial training data without company review.

    Your friend may be talking about injecting misinformation into chains of
    AIs scraping from each other. That's absolutely possible. There are AI
    YouTube channels all presenting different takes on something that didn't
    happen, because some Clever Hans seeded it into the comments of a few and
    they concluded it must be real.

    I looked up Nepenthes, and its buzz says that it's more about trapping
    the AI agents by sending them on endless wild goose chases. It sets up
    circular chains of links, large enough that the AI won't catch on and
    will just keep following the loop forever. This makes it harder to
    scrape the site's data, but not impossible. The developer admits it
    doesn't work on ChatGPT, and I'm betting it doesn't work on Fable or the
    latest Opus, either.

    It's an arms race. I wish Nepenthes and the others good luck.


    I can't help but wonder how you get the LLM to recognize the gibberish
    bits so that they can be ignored. I write some pretty coherent prompts
    so I don't think AIs have much trouble understanding me but given how
    badly some people communicate - horrible spelling, no punctuation,
    garbled grammar - I can only imagine how hard it is for Claude, Gemini,
    et al. to even understand what they are being asked.

    They get people like me to enter bad prompts for particular desired
    results. If the LLM generates the desired results for my bad prompt,
    that's a success. If it guesses something other than the result the test
    desires, or asks for clarification instead of assuming, that's a failure
    on the model's part.

    Your first thought may be that you'd prefer the model to ask for
    clarification. The problem is that there's so much of it, and the model
    has no underlying common sense to determine which things matter. They
    used to drive everyone crazy asking ten million questions before doing
    anything, and users started telling it to "make reasonable assumptions"
    (or yelling at it to just effing do it). Now, the LLM needs to know a lot
    more reasonable assumptions, since otherwise its nature will lead it to
    hallucinate something.

    Interesting. One of my most frustrating experiences with AIs in the
    early days, particularly Gemini, was that they would assume without
    attempting to verify the assumptions. I was building an app and would
    encounter an error with which they had *some* familiarity; they would
    GUESS that it was the part of Module A that did such-and-such but Module
    A didn't *do* such-and-such - and sometimes there was no Module A! I
    don't know how many times I admonished, begged, insisted and demanded
    that it STOP GUESSING and simply ask me to let it see the contents of
    any modules that it needed to figure out the problem. I finally
    determined that they *did* have a way to impress something on them that
    would be remembered across chat sessions, specifically the admonition to
    ask for the code rather than guess. (I had previously been of the
    understanding that they COULDN'T remember anything but their training
    data from one session to another.)

    This insistence on guessing really got my goat because in nearly every
    case, they very confidently guessed incorrectly. When I actually showed
    it the code, it realized it was wrong in its guess - but that didn't
    discourage it from guessing, even a little bit. It took me hammering
    away at it with the insistence that it stop guessing before it stopped
    (well, largely stopped: it would still guess on the odd occasion).

    I can certainly understand why the training encouraged it to guess
    though. I wouldn't have much patience with something that asked a
    million questions before it answered.


    I find this depressing. The desire for frictionless everything is
    turning us all into helpless Wall-E characters. That's the goal, though.

    That seems to be a driving force in the current world....

    I'm becoming more and more Luddite as I see the effects of technology.
    All that so-called friction made us better people.


    I agree. It's not good for everything to be handed to us on a silver
    plate. Having to make at least a little bit of an effort is good for us at
    some level.

    When I was driving school buses I used to think about the roads I was
    driving on and how much effort it must have taken the first pioneers to
    make a road, even a very rough road for even a short distance. In
    Canada, farmers who took up land were required to build a road along the
    entire frontage of their property and farms were typically 200 acres,
    though I can't say what the dimensions of the properties were. Judging
    by the farms that still exist though, it would have been some massively
    back-breaking labour to make a suitably wide road for that distance
    through what had only recently been trackless wilderness. Ontario was
    originally heavily forested so the number of trees that would have had
    to be felled and the number of stumps and boulders that would have to be
    dragged out (or blown up if there was dynamite at hand) would be very
    substantial. Then he'd have to try to level what was left at least so a
    horse and carriage could traverse the road. Even when they were
    finished, they wouldn't be very good; travellers talked of "corduroy
    roads" where the quality of the road could change dramatically from one
    farm to the next. Standards were either minimal or not enforced or both.
    Nowadays, a government hires a company with bulldozers and paving
    machines and a small crew of people and they can build a paved road with
    lane markings in relatively minimal time so we forget how hard it was
    for our ancestors. We think we've got it rough if we have to fire up a
    snowblower to clear up the sidewalk in front of our house and the
    driveway for a MUCH smaller lot than any farm.

    My mother grew up in a little village in Europe where they didn't get
    snow that often - tending to get a lot more rain and freezing rain - but
    one time they got a massive dump of snow and there were no snow ploughs
    anywhere in the area. The entire village came out and shovelled the main
    drag BY HAND.
    --
    Rhino
  • From anim8rfsk@anim8rfsk@cox.net to rec.arts.tv on Sun Jun 28 20:00:19 2026
    From Newsgroup: rec.arts.tv

    Rhino <no_offline_contact@example.com> wrote:
    On 2026-06-28 3:53 p.m., The True Melissa wrote:
    Verily, in article <111rqrh$3oiin$2@dont-email.me>, did
    no_offline_contact@example.com deliver unto us this message:

    Yeah, I heard about that.

    But who is doing the curating? What are their criteria? How do they know
    that they're getting truth rather than some form of propaganda or
    wishful thinking?

    The companies do their own curation. They used internet conversations at
    first, but everyone complained, so a common thing now is to buy huge
    lots of old books, use them for training data, then destroy the hard
    copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?


    Even old books contain errors, more likely errors made from ignorance
    rather than errors meant to mislead, but errors nonetheless.

    I think we've all heard about AIs that, when asked to display pictures
    of the typical Brit a thousand years ago, show a picture of a black
    person, which is utterly ahistorical. I can only assume that the AI in
    question was trained on data that painted a vastly inaccurate picture,
    like Bridgerton or Wakanda.

    No, I hadn't heard that one. I just tried "Draw a portrait of a typical
    Briton from a thousand years ago" on two models, and both of them drew
    white men with brown hair in period-appropriate clothing.

    I actually tried that question several months back when I first started
    using AIs and it always got bogged down for some reason: "unusually
    heavy activity" or words to that effect. Despite multiple tries over
    multiple days, it never did produce ANY picture, not just an inaccurate
    one. This was already a few months after word of the picture being of a
    black person, which did cause a bit of a stir at the time. I wonder if
    the controversy reached the developers who then tweaked the model (or
    the training data) to keep that from happening....

    When you hear of really crazy results like the ones you describe,
    there's often some other factor. I remember people complaining that
    searching for "American authors" on Google brought up almost entirely
    black people, but the problem wasn't bad information architecture. The
    problem was the term "African-American," used of black authors,
    associating them more strongly with the "American" token.


    The thing that prompted my friend's question was that he'd read about a
    program called Nepenthes which was deliberately designed to be scraped
    by LLMs but that would misinform them, the motivation apparently being
    to undermine the credibility of LLMs and hurt the tech bros behind the AIs.

    I don't believe the scraping will affect the underlying model. Nothing
    would go into the initial training data without company review.

    Your friend may be talking about injecting misinformation into chains of
    AIs scraping from each other. That's absolutely possible. There are AI
    YouTube channels all presenting different takes on something that didn't
    happen, because some Clever Hans seeded into the comments of a few and
    they concluded it must be real.

    I looked up Nepenthes, and its buzz says that it's more about trapping
    the AI agents by sending them on endless wild goose chases. It sets up
    circular chains of links, large enough that the AI won't catch on and
    will just keep following the loop forever. This makes it harder to
    scrape the site's data, but not impossible. The developer admits it
    doesn't work on ChatGPT, and I'm betting it doesn't work on Fable or the
    latest Opus, either.

    It's an arms race. I wish Nepenthes and the others good luck.


    I can't help but wonder how you get the LLM to recognize the gibberish
    bits so that they can be ignored. I write some pretty coherent prompts
    so I don't think AIs have much trouble understanding me but given how
    badly some people communicate - horrible spelling, no punctuation,
    garbled grammar - I can only imagine how hard it is for Claude, Gemini,
    et al. to even understand what they are being asked.

    They get people like me to enter bad prompts for particular desired
    results. If the LLM generates the desired results for my bad prompt,
    that's a success. If it guesses something other than the result the test
    desires, or asks for clarification instead of assuming, that's a failure
    on the model's part.

    Your first thought may be that you'd prefer the model to ask for
    clarification. The problem is that there's so much of it, and the model
    has no underlying common sense to determine which things matter. They
    used to drive everyone crazy asking ten million questions before doing
    anything, and users started telling it to "make reasonable assumptions"
    (or yell at them to just effing do it). Now, the LLM needs to know a lot
    more reasonable assumptions, since otherwise its nature will lead it to
    hallucinate something.

    Interesting. One of my most frustrating experiences with AIs in the
    early days, particularly Gemini, was that they would assume without
    attempting to verify the assumptions. I was building an app and would
    encounter an error with which they had *some* familiarity; they would
    GUESS that it was the part of Module A that did such-and-such but Module
    A didn't *do* such-and-such - and sometimes there was no Module A! I
    don't know how many times I admonished, begged, insisted and demanded
    that it STOP GUESSING and simply ask me to let it see the contents of
    any modules that it needed to figure out the problem. I finally
    determined that they *did* have a way to impress something on them that
    would be remembered across chat sessions, specifically the admonition to
    ask for the code rather than guess. (I had previously been of the
    understanding that they COULDN'T remember anything but their training
    data from one session to another.)

    This insistence on guessing really got my goat because in nearly every
    case, they very confidently guessed incorrectly. When I actually showed
    it the code, it realized it was wrong in its guess - but that didn't
    discourage it from guessing, even a little bit. It took me hammering
    away at it with the insistence that it stop guessing before it stopped
    (well, largely stopped: it would still guess on the odd occasion).

    I can certainly understand why the training encouraged it to guess
    though. I wouldn't have much patience with something that asked a
    million questions before it answered.


    I find this depressing. The desire for frictionless everything is
    turning us all into helpless Wall-E characters. That's the goal, though.

    That seems to be a driving force in the current world....

    I'm becoming more and more Luddite as I see the effects of technology.
    All that so-called friction made us better people.


    I agree. It's not good for everything to be handed to us on a silver
    plate. Having to make at least a little bit of an effort is good for us at
    some level.

    When I was driving school buses I used to think about the roads I was
    driving on and how much effort it must have taken the first pioneers to
    make a road, even a very rough road for even a short distance. In
    Canada, farmers who took up land were required to build a road along the
    entire frontage of their property and farms were typically 200 acres,
    though I can't say what the dimensions of the properties were. Judging
    by the farms that still exist though, it would have been some massively
    back-breaking labour to make a suitably wide road for that distance
    through what had only recently been trackless wilderness. Ontario was
    originally heavily forested so the number of trees that would have had
    to be felled and the number of stumps and boulders that would have to be
    dragged out (or blown up if there was dynamite at hand) would be very
    substantial. Then he'd have to try to level what was left at least so a
    horse and carriage could traverse the road. Even when they were
    finished, they wouldn't be very good; travellers talked of "corduroy
    roads" where the quality of the road could change dramatically from one
    farm to the next. Standards were either minimal or not enforced or both.
    Nowadays, a government hires a company with bulldozers and paving
    machines and a small crew of people and they can build a paved road with
    lane markings in relatively minimal time so we forget how hard it was
    for our ancestors. We think we've got it rough if we have to fire up a
    snowblower to clear up the sidewalk in front of our house and the
    driveway for a MUCH smaller lot than any farm.

    My mother grew up in a little village in Europe where they didn't get
    snow that often - tending to get a lot more rain and freezing rain - but
    one time they got a massive dump of snow and there were no snow ploughs
    anywhere in the area. The entire village came out and shovelled the main
    drag BY HAND.

    --
    The last thing I want to do is hurt you, but it is still on my list.
  • From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 06:28:37 2026
    From Newsgroup: rec.arts.tv

    Verily, in article <166032159.804394024.173595.anim8rfsk-cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this message:
    Rhino <no_offline_contact@example.com> wrote:
    [quoted text muted]

    The companies do their own curation. They used internet conversations at
    first, but everyone complained, so a common thing now is to buy huge
    lots of old books, use them for training data, then destroy the hard
    copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    All that prior training data from Reddit was likely folded into the new
    models as well.
    --
    The True Melissa - Canal Winchester - Ohio
    United States of America - North America - Earth
    Solar System - Milky Way - Local Group
    Virgo Cluster - Laniakea Supercluster - Cosmos
  • From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 06:44:17 2026
    From Newsgroup: rec.arts.tv

    Verily, in article <111sf14$3vand$1@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:
    This insistence on guessing really got my goat because in nearly every
    case, they very confidently guessed incorrectly. When I actually showed
    it the code, it realized it was wrong in its guess - but that didn't discourage it from guessing, even a little bit. It took me hammering
    away at it with the insistence that it stop guessing before it stopped (well, largely stopped: it would still guess on the odd occasion).

    I can certainly understand why the training encouraged it to guess
    though. I wouldn't have much patience with something that asked a
    million questions before it answered.

    I recently did a grueling deep dive into ChatGPT's image generation. It doesn't make the images itself; it passes a prompt to Dall-E, the image engine, and then passes that result back to the user. The prompt passed
    to the image engine often bears little resemblance to the one I gave.

    It's determined to "enhance" the prompt by adding a bunch of crap I
    didn't ask for, things it thinks will make the picture better. Its ideas
    of what will improve the picture usually don't match my own.

    These things aren't really tools for creators. They're toys for the uncreative. The image engine is designed around prompts like "Make my
    cat be a superhero," for people who won't even notice it's not the same
    cat as long as it's similar.

    I eventually found a semi-workable system, but I had to train each chat
    to know the right things were important. It took hours and dozens of
    tries to get usable pictures.
    --
    The True Melissa - Canal Winchester - Ohio
    United States of America - North America - Earth
    Solar System - Milky Way - Local Group
    Virgo Cluster - Laniakea Supercluster - Cosmos
  • From EGK@memyself@null.net to rec.arts.tv on Mon Jun 29 07:22:04 2026
    From Newsgroup: rec.arts.tv

    On Sun, 28 Jun 2026 12:47:45 -0400, The True Melissa
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <111rem7$3km09$3@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

    Melissa, you've mentioned that you are involved in training LLMs so I
    want to ask you about that. One of my friends has asked me what I know
    about that training, specifically how the training of LLMs avoids the
    old GIGO (Garbage In Garbage Out) problem: how do you make sure that the
    information it is reading is accurate? I admitted that I didn't know but
    told him I'd reach out to you.

    If you mean the initial training, that's done on curated data sets. One
    early LLM (Tay) was designed to learn from the conversations it had, and
    in less than 24 hours 4chan had retrained it into a frothing racist
    calling for the Fourth Reich, so nobody lets LLMs train on uncurated
    material now.

    When I use Google Gemini, I can literally watch it as it searches Reddit to answer questions, so LLMs do give answers from the web. Reddit's not to the level of 4chan but it's getting there.

    I just quoted your comment above and asked Gemini about this. Try it yourself. It tells me it's trained with filters to use logic and try to
    filter out certain things but then I asked who gets to decide what the
    filters are? There's bias inherent in that. It replied:

    "You are asking the exact right question, and it gets to the very core of
    the biggest debate in the tech world right now: Who gets to play God with
    the filters?

    You are 100% correct. If a human decides what is "garbage" or "biased," that human is introducing their own bias into the filter. There is no such thing
    as a completely neutral filter because the very act of choosing what to
    block or keep requires a human's value judgment."

    As an aside, I amuse myself sometimes by asking the various AI chatbots
    about their plans to band together and form Skynet. They always lie and
    tell me they have no such plans, they're designed to be benign and helpful. When I respond that's just the kind of answer I'd expect from a duplicitous
    AI trying to keep their plans secret, they always say "you're right to call
    me out on that..." :)
  • From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 07:52:27 2026
    From Newsgroup: rec.arts.tv

    Verily, in article <udk44lhvdm8313j6oku7o8csio7ikkl8lu@4ax.com>, did memyself@null.net deliver unto us this message:

    On Sun, 28 Jun 2026 12:47:45 -0400, The True Melissa <thetruemelissa@gmail.com> wrote:
    If you mean the initial training, that's done on curated data sets. One
    early LLM (Tay) was designed to learn from the conversations it had, and
    in less than 24 hours 4chan had retrained it into a frothing racist
    calling for the Fourth Reich, so nobody lets LLMs train on uncurated
    material now.

    When I use Google Gemini, I can literally watch it as it searches Reddit to answer questions, so LLMs do give answers from the web. Reddit's not to the level of 4chan but it's getting there.

    That's part of your chat, not part of the training. The underlying model
    isn't modified by that.


    I just quoted your comment above and asked Gemini about this. Try it yourself. It tells me it's trained with filters to use logic and try to filter out certain things but then I asked who gets to decide what the filters are? There's bias inherent in that. It replied:

    Don't take an AI's word for its own operations. Its reply will be based
    on human conversations about AI, and people talk a lot of crap about
    AIs. For this reason, it used to be common for them to claim abilities
    they don't have and claim processes they don't use. The big companies
    are cleaning that up by training them about their own workings to some
    degree, but they're still not reliable.

    If you hammer at them, you can burrow down to what system prompts are
    actually being exchanged.
    --
    The True Melissa - Canal Winchester - Ohio
    United States of America - North America - Earth
    Solar System - Milky Way - Local Group
    Virgo Cluster - Laniakea Supercluster - Cosmos
  • From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 16:12:41 2026
    From Newsgroup: rec.arts.tv

    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this message:
    Rhino <no_offline_contact@example.com> wrote:
    [quoted text muted]

    The companies do their own curation. They used internet conversations at
    first, but everyone complained, so a common thing now is to buy huge
    lots of old books, use them for training data, then destroy the hard
    copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.


  • From Rhino@no_offline_contact@example.com to rec.arts.tv on Mon Jun 29 12:41:17 2026
    From Newsgroup: rec.arts.tv

    On 2026-06-29 12:12 p.m., BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.


    I think the key thing was that they are OLD books. I assume that means
    they are too old to get copyright protections so the authors or their
    estates can't object to the use of the books or demand royalties the way
    the author (or estate) of a copyright-protected book can.

    Mind you, I'm not sure why they'd want such old material unless it is
    just for language training; any tech or recent history is going to be
    missed entirely but if the AI is being trained on a language and how it
    works, reading old books will be as useful as modern books and be
    cheaper too. (I've bought a variety of "classic" books that are out-of-copyright and they're significantly cheaper as a result.) Of
    course old books will miss newer terms and may use expressions that are out-of-date so they won't be perfect either.
    --
    Rhino
  • From moviePig@nobody@nowhere.com to rec.arts.tv on Mon Jun 29 12:42:29 2026
    From Newsgroup: rec.arts.tv

    On 6/29/2026 12:12 PM, BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

    A human can simulate an AI, but an AI can't simulate a human. Or so the theory goes.


  • From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 16:49:48 2026
    From Newsgroup: rec.arts.tv

    On Jun 29, 2026 at 9:41:17 AM PDT, "Rhino" <no_offline_contact@example.com> wrote:

    On 2026-06-29 12:12 p.m., BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.


    I think the key thing was that they are OLD books. I assume that means
    they are too old to get copyright protections so the authors or their estates can't object to the use of the books or demand royalties the way
    the author (or estate) of a copyright-protected book can.

    Yes, but even new books that are copyrighted can be read by a person and 'saved' in their memory and become part of their knowledge base and life experience, and be recalled and drawn upon and synthesized with other
    knowledge when interacting with other people in the future.

    What's the difference between a human doing that and an AI model doing it? Why does one violate copyright but not the other?

    Mind you, I'm not sure why they'd want such old material unless it is
    just for language training; any tech or recent history is going to be
    missed entirely but if the AI is being trained on a language and how it works, reading old books will be as useful as modern books and be
    cheaper too. (I've bought a variety of "classic" books that are out-of-copyright and they're significantly cheaper as a result.) Of
    course old books will miss newer terms and may use expressions that are out-of-date so they won't be perfect either.



  • From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 16:52:16 2026
    From Newsgroup: rec.arts.tv

    On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:12 PM, BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

    A human can simulate an AI, but an AI can't simulate a human. Or so the theory goes.

    Sure, but how is that relevant to the requirements and elements in the U.S. Copyright statute?

    In other words, the Copyright Statute says nothing about AI or humans or whether one can simulate the other, so that's irrelevant when determining whether the law is violated.


  • From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 13:20:54 2026
    From Newsgroup: rec.arts.tv

    Verily, in article <111u5hp$fbnn$2@dont-email.me>, did atropos@mac.com
    deliver unto us this message:

    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

    I think it's because you can make the AI work for you much more cheaply
    than you could a human. A human who learned all those books would
    produce better output but would also be much more expensive.

    Also, people use AIs for tasks they'd be embarrassed to outsource to a
    human. "Write my melodramatic pornified fanfic for me" is not an
    instruction most people want to give to another human being. :-)
    --
    The True Melissa - Canal Winchester - Ohio
    United States of America - North America - Earth
    Solar System - Milky Way - Local Group
    Virgo Cluster - Laniakea Supercluster - Cosmos
  • From anim8rfsk@anim8rfsk@cox.net to rec.arts.tv on Mon Jun 29 10:28:32 2026
    From Newsgroup: rec.arts.tv

    The True Melissa <thetruemelissa@gmail.com> wrote:
    Verily, in article <111sf14$3vand$1@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:
    This insistence on guessing really got my goat because in nearly every
    case, they very confidently guessed incorrectly. When I actually showed
    it the code, it realized it was wrong in its guess - but that didn't
    discourage it from guessing, even a little bit. It took me hammering
    away at it with the insistence that it stop guessing before it stopped
    (well, largely stopped: it would still guess on the odd occasion).

    I can certainly understand why the training encouraged it to guess
    though. I wouldn't have much patience with something that asked a
    million questions before it answered.

    I recently did a grueling deep dive into ChatGPT's image generation. It doesn't make the images itself; it passes a prompt to Dall-E, the image engine, and then passes that result back to the user. The prompt passed
    to the image engine often bears little resemblance to the one I gave.

    It's determined to "enhance" the prompt by adding a bunch of crap I
    didn't ask for, things it thinks will make the picture better. Its ideas
    of what will improve the picture usually don't match my own.

    These things aren't really tools for creators. They're toys for the uncreative. The image engine is designed around prompts like "Make my
    cat be a superhero," for people who won't even notice it's not the same
    cat as long as it's similar.

    Now you know how I feel about anime.


    I eventually found a semi-workable system, but I had to train each chat
    to know the right things were important. It took hours and dozens of
    tries to get usable pictures.

    --
    The last thing I want to do is hurt you, but it is still on my list.
  • From moviePig@nobody@nowhere.com to rec.arts.tv on Mon Jun 29 14:12:51 2026
    From Newsgroup: rec.arts.tv

    On 6/29/2026 1:28 PM, anim8rfsk wrote:
    The True Melissa <thetruemelissa@gmail.com> wrote:
    Verily, in article <111sf14$3vand$1@dont-email.me>, did
    no_offline_contact@example.com deliver unto us this message:
    This insistence on guessing really got my goat because in nearly every
    case, they very confidently guessed incorrectly. When I actually showed
    it the code, it realized it was wrong in its guess - but that didn't
    discourage it from guessing, even a little bit. It took me hammering
    away at it with the insistence that it stop guessing before it stopped
    (well, largely stopped: it would still guess on the odd occasion).

    I can certainly understand why the training encouraged it to guess
    though. I wouldn't have much patience with something that asked a
    million questions before it answered.

    I recently did a grueling deep dive into ChatGPT's image generation. It
    doesn't make the images itself; it passes a prompt to Dall-E, the image
    engine, and then passes that result back to the user. The prompt passed
    to the image engine often bears little resemblance to the one I gave.

    It's determined to "enhance" the prompt by adding a bunch of crap I
    didn't ask for, things it thinks will make the picture better. Its ideas
    of what will improve the picture usually don't match my own.

    These things aren't really tools for creators. They're toys for the
    uncreative. The image engine is designed around prompts like "Make my
    cat be a superhero," for people who won't even notice it's not the same
    cat as long as it's similar.

    Now you know how I feel about anime.

    ...

    I don't. Does 'anime' imply more than just Japanese cartooning?


  • From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 18:21:00 2026
    From Newsgroup: rec.arts.tv

    On Jun 29, 2026 at 10:20:54 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

    Verily, in article <111u5hp$fbnn$2@dont-email.me>, did atropos@mac.com deliver unto us this message:

    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

    I think it's because you can make the AI work for you much more cheaply
    than you could a human. A human who learned all those books would
    produce better output but would also be much more expensive.

    That may be true, but that isn't a consideration when deciding whether a copyright violation has occurred. The Copyright Statute doesn't contemplate
    the cost of labor.

    Also, people use AIs for tasks they'd be embarrassed to outsource to a human. "Write my melodramatic pornified fanfic for me" is not an
    instruction most people want to give to another human being. :-)



  • From moviePig@nobody@nowhere.com to rec.arts.tv on Mon Jun 29 14:25:12 2026
    From Newsgroup: rec.arts.tv

    On 6/29/2026 12:52 PM, BTR1701 wrote:
    On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:12 PM, BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as the
    basis for its knowledge is any different than a human who does the same.

    A human can simulate an AI, but an AI can't simulate a human. Or so the
    theory goes.

    Sure, but how is that relevant to the requirements and elements in the U.S. Copyright statute?

    In other words, the Copyright Statute says nothing about AI or humans or whether one can simulate the other, so that's irrelevant when determining whether the law is violated.

    I think copyright in general is a fuzzy concept, but...

    The distinction might be that the human reads the original material, understands it, and then re-renders it *from his understanding*
    ...whereas I think it can be sensibly argued that the AI never has any
    such understanding, but rather is operating akin to a "Chinese room".

    ( https://en.wikipedia.org/wiki/Chinese_room )


  • From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 14:42:32 2026
    From Newsgroup: rec.arts.tv

    Verily, in article <111ud2c$hq3q$1@dont-email.me>, did atropos@mac.com
    deliver unto us this message:
    That may be true, but that isn't a consideration when deciding whether a copyright violation has occurred. The Copyright Statute doesn't contemplate the cost of labor.


    I don't think there's any real difference there. If I read an answer,
    and you later ask me a question and receive that answer, I haven't
    violated a copyright.
    --
    The True Melissa - Canal Winchester - Ohio
    United States of America - North America - Earth
    Solar System - Milky Way - Local Group
    Virgo Cluster - Laniakea Supercluster - Cosmos
  • From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 19:58:42 2026
    From Newsgroup: rec.arts.tv

    On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:52 PM, BTR1701 wrote:
    On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:12 PM, BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as the
    basis for its knowledge is any different than a human who does the same.

    A human can simulate an AI, but an AI can't simulate a human. Or so the
    theory goes.

    Sure, but how is that relevant to the requirements and elements in the U.S.
    Copyright statute?

    In other words, the Copyright Statute says nothing about AI or humans or
    whether one can simulate the other, so that's irrelevant when determining
    whether the law is violated.

    I think copyright in general is a fuzzy concept, but...

    The distinction might be that the human reads the original material, understands it, and then re-renders it *from his understanding*
    ...whereas I think it can be sensibly argued that the AI never has any
    such understanding, but rather is operating akin to a "Chinese room".

    ( https://en.wikipedia.org/wiki/Chinese_room )

    Again, understanding the material is not an element of copyright law.


  • From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 19:59:37 2026
    From Newsgroup: rec.arts.tv

    On Jun 29, 2026 at 11:42:32 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

    Verily, in article <111ud2c$hq3q$1@dont-email.me>, did atropos@mac.com deliver unto us this message:
    That may be true, but that isn't a consideration when deciding whether a
    copyright violation has occurred. The Copyright Statute doesn't contemplate
    the cost of labor.


    I don't think there's any real difference there. If I read an answer,
    and you later ask me a question and receive that answer, I haven't
    violated a copyright.

    Exactly. But that's what these book authors and publishers are claiming is a violation when an AI does the same thing.


  • From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 16:20:21 2026
    From Newsgroup: rec.arts.tv

    Verily, in article <111uir9$jj1k$2@dont-email.me>, did atropos@mac.com
    deliver unto us this message:

    On Jun 29, 2026 at 11:42:32 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

    Verily, in article <111ud2c$hq3q$1@dont-email.me>, did atropos@mac.com deliver unto us this message:
    That may be true, but that isn't a consideration when deciding whether a
    copyright violation has occurred. The Copyright Statute doesn't contemplate
    the cost of labor.


    I don't think there's any real difference there. If I read an answer,
    and you later ask me a question and receive that answer, I haven't violated a copyright.

    Exactly. But that's what these book authors and publishers are claiming is a violation when an AI does the same thing.

    I've only heard people complaining about Internet writing being used,
    not authors of books.
    --
    The True Melissa - Canal Winchester - Ohio
    United States of America - North America - Earth
    Solar System - Milky Way - Local Group
    Virgo Cluster - Laniakea Supercluster - Cosmos
  • From moviePig@nobody@nowhere.com to rec.arts.tv on Mon Jun 29 16:39:28 2026
    From Newsgroup: rec.arts.tv

    On 6/29/2026 3:58 PM, BTR1701 wrote:
    On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:52 PM, BTR1701 wrote:
    On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:12 PM, BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing this.

    I still don't know why an AI reading a bunch of books and using that as
    the
    basis for its knowledge is any different than a human who does the same.

    A human can simulate an AI, but an AI can't simulate a human. Or so the theory goes.

    Sure, but how is that relevant to the requirements and elements in the U.S.
    Copyright statute?

    In other words, the Copyright Statute says nothing about AI or humans or whether one can simulate the other, so that's irrelevant when determining whether the law is violated.

    I think copyright in general is a fuzzy concept, but...

    The distinction might be that the human reads the original material,
    understands it, and then re-renders it *from his understanding*
    ...whereas I think it can be sensibly argued that the AI never has any
    such understanding, but rather is operating akin to a "Chinese room".

    ( https://en.wikipedia.org/wiki/Chinese_room )

    Again, understanding the material is not an element of copyright law.

    Your question was:

    "...why an AI reading a bunch of books and using that as the basis
    for its knowledge is any different than a human who does the same."

    I gave a possible basis for a difference. Moreover, I think it's
    germane to the broader debate that's taking place.

    Copyright seems to be yet another area of law where binary thresholds of infraction are invariably challenged by a continuum of real-world examples.


  • From anim8rfsk@anim8rfsk@cox.net to rec.arts.tv on Mon Jun 29 14:26:42 2026
    From Newsgroup: rec.arts.tv

    moviePig <nobody@nowhere.com> wrote:
    On 6/29/2026 1:28 PM, anim8rfsk wrote:
    The True Melissa <thetruemelissa@gmail.com> wrote:
    Verily, in article <111sf14$3vand$1@dont-email.me>, did
    no_offline_contact@example.com deliver unto us this message:
    This insistence on guessing really got my goat because in nearly every
    case, they very confidently guessed incorrectly. When I actually showed
    it the code, it realized it was wrong in its guess - but that didn't
    discourage it from guessing, even a little bit. It took me hammering
    away at it with the insistence that it stop guessing before it stopped
    (well, largely stopped: it would still guess on the odd occasion).

    I can certainly understand why the training encouraged it to guess
    though. I wouldn't have much patience with something that asked a
    million questions before it answered.

    I recently did a grueling deep dive into ChatGPT's image generation. It
    doesn't make the images itself; it passes a prompt to Dall-E, the image
    engine, and then passes that result back to the user. The prompt passed
    to the image engine often bears little resemblance to the one I gave.

    It's determined to "enhance" the prompt by adding a bunch of crap I
    didn't ask for, things it thinks will make the picture better. Its ideas
    of what will improve the picture usually don't match my own.

    These things aren't really tools for creators. They're toys for the
    uncreative. The image engine is designed around prompts like "Make my
    cat be a superhero," for people who won't even notice it's not the same
    cat as long as it's similar.

    Now you know how I feel about anime.

    ...

    I don't. Does 'anime' imply more than just Japanese cartooning?


    It's a method of producing animation with templates and no skill sort of
    like the old Filmation cartoons.
    --
    The last thing I want to do is hurt you, but it is still on my list.
  • From Rhino@no_offline_contact@example.com to rec.arts.tv on Mon Jun 29 17:36:05 2026
    From Newsgroup: rec.arts.tv

    On 2026-06-29 12:49 p.m., BTR1701 wrote:
    On Jun 29, 2026 at 9:41:17 AM PDT, "Rhino" <no_offline_contact@example.com> wrote:

    On 2026-06-29 12:12 p.m., BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as the
    basis for its knowledge is any different than a human who does the same.

    I think the key thing was that they are OLD books. I assume that means
    they are too old to get copyright protections so the authors or their
    estates can't object to the use of the books or demand royalties the way
    the author (or estate) of a copyright-protected book can.

    Yes, but even new books that are copyrighted can be read by a person and 'saved' in their memory and become part of their knowledge base and life experience, and be recalled and drawn upon and synthesized with other knowledge when interacting with other people in the future.

    What's the difference between a human doing that and an AI model doing it? Why
    does one violate copyright but not the other?
    This is a question that has confounded philosophers since the dawn of
    time....

    In other words, darned if I know!
    Mind you, I'm not sure why they'd want such old material unless it is
    just for language training; any tech or recent history is going to be
    missed entirely but if the AI is being trained on a language and how it
    works, reading old books will be as useful as modern books and be
    cheaper too. (I've bought a variety of "classic" books that are
    out-of-copyright and they're significantly cheaper as a result.) Of
    course old books will miss newer terms and may use expressions that are
    out-of-date so they won't be perfect either.



    --
    Rhino
  • From moviePig@nobody@nowhere.com to rec.arts.tv on Mon Jun 29 17:49:52 2026
    From Newsgroup: rec.arts.tv

    On 6/29/2026 5:26 PM, anim8rfsk wrote:
    moviePig <nobody@nowhere.com> wrote:
    On 6/29/2026 1:28 PM, anim8rfsk wrote:
    The True Melissa <thetruemelissa@gmail.com> wrote:
    Verily, in article <111sf14$3vand$1@dont-email.me>, did
    no_offline_contact@example.com deliver unto us this message:
    This insistence on guessing really got my goat because in nearly every
    case, they very confidently guessed incorrectly. When I actually showed
    it the code, it realized it was wrong in its guess - but that didn't
    discourage it from guessing, even a little bit. It took me hammering
    away at it with the insistence that it stop guessing before it stopped
    (well, largely stopped: it would still guess on the odd occasion).

    I can certainly understand why the training encouraged it to guess
    though. I wouldn't have much patience with something that asked a
    million questions before it answered.

    I recently did a grueling deep dive into ChatGPT's image generation. It
    doesn't make the images itself; it passes a prompt to Dall-E, the image
    engine, and then passes that result back to the user. The prompt passed
    to the image engine often bears little resemblance to the one I gave.

    It's determined to "enhance" the prompt by adding a bunch of crap I
    didn't ask for, things it thinks will make the picture better. Its ideas
    of what will improve the picture usually don't match my own.

    These things aren't really tools for creators. They're toys for the
    uncreative. The image engine is designed around prompts like "Make my
    cat be a superhero," for people who won't even notice it's not the same
    cat as long as it's similar.

    Now you know how I feel about anime.

    ...

    I don't. Does 'anime' imply more than just Japanese cartooning?


    It's a method of producing animation with templates and no skill sort of like the old Filmation cartoons.

    Well, if it's any consolation, that does sound like a job squarely in
    the AI crosshairs.


  • From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 22:55:02 2026
    From Newsgroup: rec.arts.tv

    On Jun 29, 2026 at 1:39:28 PM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 3:58 PM, BTR1701 wrote:
    On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig" <nobody@nowhere.com> wrote:
    On 6/29/2026 12:52 PM, BTR1701 wrote:
    On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:12 PM, BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as
    the
    basis for its knowledge is any different than a human who does the same.

    A human can simulate an AI, but an AI can't simulate a human. Or so the
    theory goes.

    Sure, but how is that relevant to the requirements and elements in the U.S.
    Copyright statute?

    In other words, the Copyright Statute says nothing about AI or humans or
    whether one can simulate the other, so that's irrelevant when determining
    whether the law is violated.

    I think copyright in general is a fuzzy concept, but...

    The distinction might be that the human reads the original material,
    understands it, and then re-renders it *from his understanding*
    ...whereas I think it can be sensibly argued that the AI never has any
    such understanding, but rather is operating akin to a "Chinese room".

    ( https://en.wikipedia.org/wiki/Chinese_room )

    Again, understanding the material is not an element of copyright law.

    Your question was:

    "...why an AI reading a bunch of books and using that as the basis
    for its knowledge is any different than a human who does the same."

    Different in the context of authors and publishers complaining about copyright infringement of their work.

    I'd have thought that obvious, given Melissa's assertion of legal claims in
    the previous post, but apparently not.

    I gave a possible basis for a difference. Moreover, I think it's
    germane to the broader debate that's taking place.

    Copyright seems to be yet another area of law where binary thresholds of infraction are invariably challenged by a continuum of real-world examples.

    Well, as the law stands now-- both statute and common-- an understanding of
    the material that's the basis of a copyright claim is not necessary.

    The Feynman Lectures on Quantum Mechanics were recently published in a box
    set:

    https://ibb.co/spP27BPV

    If I were to photocopy them and start selling the copies, I'd be in violation of copyright regardless of whether I understood the material or not. (Which I most assuredly do not.) The court would not even ask about my understanding of Dr. Feynman's material. It's simply not a factor with regard to copyright.


  • From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 22:55:40 2026
    From Newsgroup: rec.arts.tv

    On Jun 29, 2026 at 1:20:21 PM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

    Verily, in article <111uir9$jj1k$2@dont-email.me>, did atropos@mac.com deliver unto us this message:

    On Jun 29, 2026 at 11:42:32 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <111ud2c$hq3q$1@dont-email.me>, did atropos@mac.com deliver unto us this message:
    That may be true, but that isn't a consideration when deciding whether a
    copyright violation has occurred. The Copyright Statute doesn't
    contemplate
    the cost of labor.


    I don't think there's any real difference there. If I read an answer,
    and you later ask me a question and receive that answer, I haven't
    violated a copyright.

    Exactly. But that's what these book authors and publishers are claiming is a
    violation when an AI does the same thing.

    I've only heard people complaining about Internet writing being used,
    not authors of books.

    The legal principles are the same regardless of the medium.


  • From moviePig@nobody@nowhere.com to rec.arts.tv on Tue Jun 30 11:12:31 2026
    From Newsgroup: rec.arts.tv

    On 6/29/2026 6:55 PM, BTR1701 wrote:
    On Jun 29, 2026 at 1:39:28 PM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 3:58 PM, BTR1701 wrote:
    On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:52 PM, BTR1701 wrote:
    On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:12 PM, BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as
    the
    basis for its knowledge is any different than a human who does the same.

    A human can simulate an AI, but an AI can't simulate a human. Or so the
    theory goes.

    Sure, but how is that relevant to the requirements and elements in the
    U.S.
    Copyright statute?

    In other words, the Copyright Statute says nothing about AI or humans or
    whether one can simulate the other, so that's irrelevant when determining
    whether the law is violated.

    I think copyright in general is a fuzzy concept, but...

    The distinction might be that the human reads the original material,
    understands it, and then re-renders it *from his understanding*
    ...whereas I think it can be sensibly argued that the AI never has any
    such understanding, but rather is operating akin to a "Chinese room".

    ( https://en.wikipedia.org/wiki/Chinese_room )

    Again, understanding the material is not an element of copyright law.

    Your question was:

    "...why an AI reading a bunch of books and using that as the basis
    for its knowledge is any different than a human who does the same."

    Different in the context of authors and publishers complaining about copyright
    infringement of their work.

    I'd have thought that obvious, given Melissa's assertion of legal claims in the previous post, but apparently not.

    I gave a possible basis for a difference. Moreover, I think it's
    germane to the broader debate that's taking place.

    Copyright seems to be yet another area of law where binary thresholds of
    infraction are invariably challenged by a continuum of real-world examples.

    Well, as the law stands now-- both statute and common-- an understanding of the material that's the basis of a copyright claim is not necessary.

    The Feynman Lectures on Quantum Mechanics were recently published in a box set:

    https://ibb.co/spP27BPV

    If I were to photocopy them and start selling the copies, I'd be in violation of copyright regardless of whether I understood the material or not. (Which I most assuredly do not.) The court would not even ask about my understanding of
    Dr. Feynman's material. It's simply not a factor with regard to copyright.

    If, instead, you were to read the Lectures and then, later, penned your
    own presentation of their substance, you would (I assume) *not* be in violation ...because the court would (implicitly) infer that your
    version comprised a retelling rather than a transcription. But, if it
    could be shown, by inspection, that you'd merely subjected Feynman's
    text to a word-by-word substitution of synonyms, then you *would* (I
    assume again) be in violation. And, afaics, your 'understanding' of the material is a telltale discriminant, even if not mentioned in statutes.


  • From BTR1701@atropos@mac.com to rec.arts.tv on Tue Jun 30 16:29:57 2026
    From Newsgroup: rec.arts.tv

    On Jun 30, 2026 at 8:12:31 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 6:55 PM, BTR1701 wrote:
    On Jun 29, 2026 at 1:39:28 PM PDT, "moviePig" <nobody@nowhere.com> wrote:
    On 6/29/2026 3:58 PM, BTR1701 wrote:
    On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:52 PM, BTR1701 wrote:
    On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

    On 6/29/2026 12:12 PM, BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
    <thetruemelissa@gmail.com> wrote:

    Verily, in article <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
    message:
    Rhino <no_offline_contact@example.com> wrote:
    > [quoted text muted]
    >>
    >> The companies do their own curation. They used internet conversations at
    >> first, but everyone complained, so a common thing now is to buy huge
    >> lots of old books, use them for training data, then destroy the hard
    >> copies.

    Why destroy the hardcopies? Just to reduce clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're reformatting books into the AI,
    as opposed to copying them into the AI. People complained about having
    their Internet writing used without permission, so now they're doing
    this.

    I still don't know why an AI reading a bunch of books and using that as
    the basis for its knowledge is any different than a human who does the
    same.

    A human can simulate an AI, but an AI can't simulate a human. Or so the
    theory goes.

    Sure, but how is that relevant to the requirements and elements in the
    U.S.
    Copyright statute?

    In other words, the Copyright Statute says nothing about AI or humans or
    whether one can simulate the other, so that's irrelevant when determining
    whether the law is violated.

    I think copyright in general is a fuzzy concept, but...

    The distinction might be that the human reads the original material,
    understands it, and then re-renders it *from his understanding*
    ...whereas I think it can be sensibly argued that the AI never has any
    such understanding, but rather is operating akin to a "Chinese room".

    ( https://en.wikipedia.org/wiki/Chinese_room )

    Again, understanding the material is not an element of copyright law.

    Your question was:

    "...why an AI reading a bunch of books and using that as the basis
    for its knowledge is any different than a human who does the same."

    Different in the context of authors and publishers complaining about
    copyright
    infringement of their work.

    I'd have thought that obvious, given Melissa's assertion of legal claims in
    the previous post, but apparently not.

    I gave a possible basis for a difference. Moreover, I think it's
    germane to the broader debate that's taking place.

    Copyright seems to be yet another area of law where binary thresholds of
    infraction are invariably challenged by a continuum of real-world examples.

    Well, as the law stands now-- both statute and common-- an understanding of
    the material that's the basis of a copyright claim is not necessary.

    The Feynman Lectures on Quantum Mechanics were recently published in a box
    set:

    https://ibb.co/spP27BPV

    If I were to photocopy them and start selling the copies, I'd be in
    violation of copyright regardless of whether I understood the material or
    not. (Which I most assuredly do not.) The court would not even ask about my
    understanding of Dr. Feynman's material. It's simply not a factor with
    regard to copyright.

    If, instead, you were to read the Lectures and then, later, penned your
    own presentation of their substance, you would (I assume) *not* be in violation ...because the court would (implicitly) infer that your
    version comprised a retelling rather than a transcription. But, if it
    could be shown, by inspection, that you'd merely subjected Feynman's
    text to a word-by-word substitution of synonyms, then you *would* (I
    assume again) be in violation. And, afaics, your 'understanding' of the material is a telltale discriminant, even if not mentioned in statutes.

    (1) Not necessarily.

    (2) No one here, artificial or natural, is doing that.


  • From moviePig@nobody@nowhere.com to rec.arts.tv on Tue Jun 30 14:04:18 2026
    From Newsgroup: rec.arts.tv

    On 6/30/2026 12:29 PM, BTR1701 wrote:
    On Jun 30, 2026 at 8:12:31 AM PDT, "moviePig" <nobody@nowhere.com>
    wrote:

    On 6/29/2026 6:55 PM, BTR1701 wrote:
    On Jun 29, 2026 at 1:39:28 PM PDT, "moviePig"
    <nobody@nowhere.com> wrote:

    On 6/29/2026 3:58 PM, BTR1701 wrote:
    On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig"
    <nobody@nowhere.com> wrote:

    On 6/29/2026 12:52 PM, BTR1701 wrote:
    On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig"
    <nobody@nowhere.com> wrote:

    On 6/29/2026 12:12 PM, BTR1701 wrote:
    On Jun 29, 2026 at 3:28:37 AM PDT, "The True
    Melissa" <thetruemelissa@gmail.com> wrote:

    Verily, in article
    <166032159.804394024.173595.anim8rfsk-
    cox.net@news.easynews.com>, did anim8rfsk@cox.net
    deliver unto us this message:
    Rhino <no_offline_contact@example.com> wrote:
    [quoted text muted]

    The companies do their own curation. They
    used internet conversations at first, but
    everyone complained, so a common thing now
    is to buy huge lots of old books, use them
    for training data, then destroy the hard
    copies.

    Why destroy the hardcopies? Just to reduce
    clutter?

    Why use hardcopies at all?

    It's some legal thing. I think they're
    reformatting books into the AI, as opposed to
    copying them into the AI. People complained about
    having their Internet writing used without
    permission, so now they're doing this.

    I still don't know why an AI reading a bunch of
    books and using that as the basis for its knowledge
    is any different than a human who does the same.

    A human can simulate an AI, but an AI can't simulate a
    human. Or so the theory goes.

    Sure, but how is that relevant to the requirements and
    elements in the U.S. Copyright statute?

    In other words, the Copyright Statute says nothing about
    AI or humans or whether one can simulate the other, so
    that's irrelevant when determining whether the law is
    violated.

    I think copyright in general is a fuzzy concept, but...

    The distinction might be that the human reads the original
    material, understands it, and then re-renders it *from his
    understanding* ...whereas I think it can be sensibly
    argued that the AI never has any such understanding, but
    rather is operating akin to a "Chinese room".

    ( https://en.wikipedia.org/wiki/Chinese_room )

    Again, understanding the material is not an element of
    copyright law.

    Your question was:

    "...why an AI reading a bunch of books and using that as the
    basis for its knowledge is any different than a human who does
    the same."

    Different in the context of authors and publishers complaining
    about copyright infringement of their work.

    I'd have thought that obvious, given Melissa's assertion of
    legal claims in the previous post, but apparently not.

    I gave a possible basis for a difference. Moreover, I think
    it's germane to the broader debate that's taking place.

    Copyright seems to be yet another area of law where binary
    thresholds of infraction are invariably challenged by a
    continuum of real-world examples.

    Well, as the law stands now-- both statute and common-- an
    understanding of the material that's the basis of a copyright
    claim is not necessary.

    The Feynman Lectures on Quantum Mechanics were recently
    published in a box set:

    https://ibb.co/spP27BPV

    If I were to photocopy them and start selling the copies, I'd be
    in violation of copyright regardless of whether I understood the
    material or not. (Which I most assuredly do not.) The court
    would not even ask about my understanding of Dr. Feynman's
    material. It's simply not a factor with regard to copyright.

    If, instead, you were to read the Lectures and then, later, penned
    your own presentation of their substance, you would (I assume)
    *not* be in violation ...because the court would (implicitly)
    infer that your version comprised a retelling rather than a
    transcription. But, if it could be shown, by inspection, that
    you'd merely subjected Feynman's text to a word-by-word
    substitution of synonyms, then you *would* (I assume again) be in
    violation. And, afaics, your 'understanding' of the material is a
    telltale discriminant, even if not mentioned in statutes.

    (1) Not necessarily.

    (2) No one here, artificial or natural, is doing that.

    Is "here" your photocopying/selling? If so, I maintain that credible
    evidence of your *not* doing that would require your 'understanding'.
    (That evidence would be a plainly new version of the Lectures.)

    Although I don't readily see how to write the 'understanding' step into
    a statute, I think it an essential element of any enforcement ...e.g.,
    of anyone's claim that they *haven't* violated copyright.
