Forum: Too Lazy BBS

Ping Melissa

From Rhino@no_offline_contact@example.com to rec.arts.tv on Sun Jun 28 11:30:13 2026

From Newsgroup: rec.arts.tv

Melissa, you've mentioned that you are involved in training LLMs so I
want to ask you about that. One of my friends has asked me what I know
about that training, specifically how the training of LLMs avoids the
old GIGO (Garbage In Garbage Out) problem: how do you make sure that the information it is reading is accurate? I admitted that I didn't know but
told him I'd reach out to you.
--
Rhino

--- Synchronet 3.22a-Linux NewsLink 1.2

From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Sun Jun 28 12:47:45 2026

From Newsgroup: rec.arts.tv

Verily, in article <111rem7$3km09$3@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

Melissa, you've mentioned that you are involved in training LLMs so I
want to ask you about that. One of my friends has asked me what I know
about that training, specifically how the training of LLMs avoids the
old GIGO (Garbage In Garbage Out) problem: how do you make sure that the information it is reading is accurate? I admitted that I didn't know but told him I'd reach out to you.

If you mean the initial training, that's done on curated data sets. One
early LLM (Tay) was designed to learn from the conversations it had, and
in less than 24 hours 4chan had retrained it into a frothing racist
calling for the Fourth Reich, so nobody lets LLMs train on uncurated
material now.

The training I sometimes do isn't the base training but user simulation.
For this, the goal is to translate a user's incoherent gibberish into
what the user actually wants. For instance, one recent project had me
tell Claude and Gemini to create various projects, but each prompt had
to have at least one mistake or glaring omission. The goal was for the
LLM to produce the desired result in spite of the garbage input.

I find this depressing. The desire for frictionless everything is
turning us all into helpless Wall-E characters. That's the goal, though.
--
The True Melissa - Canal Winchester - Ohio
United States of America - North America - Earth
Solar System - Milky Way - Local Group
Virgo Cluster - Laniakea Supercluster - Cosmos

From Rhino@no_offline_contact@example.com to rec.arts.tv on Sun Jun 28 14:57:52 2026

From Newsgroup: rec.arts.tv

On 2026-06-28 12:47 p.m., The True Melissa wrote:

Verily, in article <111rem7$3km09$3@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

Melissa, you've mentioned that you are involved in training LLMs so I
want to ask you about that. One of my friends has asked me what I know
about that training, specifically how the training of LLMs avoids the
old GIGO (Garbage In Garbage Out) problem: how do you make sure that the
information it is reading is accurate? I admitted that I didn't know but
told him I'd reach out to you.

If you mean the initial training, that's done on curated data sets. One
early LLM (Tay) was designed to learn from the conversations it had, and
in less than 24 hours 4chan had retrained it into a frothing racist
calling for the Fourth Reich, so nobody lets LLMs train on uncurated
material now.

Yeah, I heard about that.

But who is doing the curating? What are their criteria? How do they know
that they're getting truth rather than some form of propaganda or
wishful thinking?

I think we've all heard about AIs that, when asked to display pictures
of the typical Brit a thousand years ago, show a picture of a black
person, which is utterly ahistorical. I can only assume that the AI in question was trained on data that painted a vastly inaccurate picture,
like Bridgerton or Wakanda.

The thing that prompted my friend's question was that he'd read about a program called Nepenthes which was deliberately designed to be scraped
by LLMs but that would misinform them, the motivation apparently being
to undermine the credibility of LLMs and hurt the tech bros behind the AIs.

The training I sometimes do isn't the base training but user simulation.
For this, the goal is to translate a user's incoherent gibberish into
what the user actually wants. For instance, one recent project had me
tell Claude and Gemini to create various projects, but each prompt had
to have at least one mistake or glaring omission. The goal was for the
LLM to produce the desired result in spite of the garbage input.

I can't help but wonder how you get the LLM to recognize the gibberish
bits so that they can be ignored. I write some pretty coherent prompts
so I don't think AIs have much trouble understanding me but given how
badly some people communicate - horrible spelling, no punctuation,
garbled grammar - I can only imagine how hard it is for Claude, Gemini,
et al. to even understand what they are being asked.

I find this depressing. The desire for frictionless everything is
turning us all into helpless Wall-E characters. That's the goal, though.

That seems to be a driving force in the current world....
--
Rhino

From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Sun Jun 28 15:53:11 2026

From Newsgroup: rec.arts.tv

Verily, in article <111rqrh$3oiin$2@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

Yeah, I heard about that.

But who is doing the curating? What are their criteria? How do they know that they're getting truth rather than some form of propaganda or
wishful thinking?

The companies do their own curation. They used internet conversations at first, but everyone complained, so a common thing now is to buy huge
lots of old books, use them for training data, then destroy the hard
copies.

I think we've all heard about AIs that, when asked to display pictures
of the typical Brit a thousand years ago shows a picture of a black
person, which is utterly ahistorical. I can only assume that the AI in question was trained on data that painted a vastly inaccurate picture,
like Bridgerton or Wakanda.

No, I hadn't heard that one. I just tried "Draw a portrait of a typical
Briton from a thousand years ago" on two models, and both of them drew
white men with brown hair in period-appropriate clothing.

When you hear of really crazy results like the ones you describe,
there's often some other factor. I remember people complaining that
searching for "American authors" on Google brought up almost entirely
black people, but the problem wasn't bad information architecture. The
problem was the term "African-American," used of black authors,
associating them more strongly with the "American" token.

The thing that prompted my friend's question was that he'd read about a program called Nepenthes which was deliberately designed to be scraped
by LLMs but that would misinform them, the motivation apparently being
to undermine the credibility of LLMs and hurt the tech bros behind the AIs.

I don't believe the scraping will affect the underlying model. Nothing
would go into the initial training data without company review.

Your friend may be talking about injecting misinformation into chains of
AIs scraping from each other. That's absolutely possible. There are AI
YouTube channels all presenting different takes on something that didn't happen, because some Clever Hans seeded into the comments of a few and
they concluded it must be real.

I looked up Nepenthes, and its buzz says that it's more about trapping
the AI agents by sending them on endless wild goose chases. It sets up circular chains of links, large enough that the AI won't catch on and
will just keep following the loop forever. This makes it harder to
scrape the site's data, but not impossible. The developer admits it
doesn't work on ChatGPT, and I'm betting it doesn't work on Fable or the latest Opus, either.

It's an arms race. I wish Nepenthes and the others good luck.

I can't help but wonder how you get the LLM to recognize the gibberish
bits so that they can be ignored. I write some pretty coherent prompts
so I don't think AIs have much trouble understanding me but given how
badly some people communicate - horrible spelling, no punctuation,
garbled grammar - I can only imagine how hard it is for Claude, Gemini,
et. al. to even understand what they are being asked.

They get people like me to enter bad prompts for particular desired
results. If the LLM generates the desired results for my bad prompt,
that's a success. If it guesses something other than the result the test desires, or asks for clarification instead of assuming, that's a failure
on the model's part.
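The pass/fail rule described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Trial` class, `looks_like_question`, and `grade` are hypothetical names, the "ends with a question mark" check is a crude stand-in for recognizing a clarifying question, and the exact-match comparison is a stand-in for what is presumably a human judgment about whether the output is the desired result.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    bad_prompt: str  # the deliberately flawed prompt given to the model
    expected: str    # the result the test wants despite the flaws
    response: str    # what the model actually produced


def looks_like_question(text):
    # Crude assumption: a reply ending in "?" is a request for clarification.
    return text.strip().endswith("?")


def grade(trial):
    # Asking instead of assuming counts as a failure under this rule.
    if looks_like_question(trial.response):
        return "fail: asked for clarification"
    # So does confidently producing something other than the desired result.
    if trial.response.strip().lower() != trial.expected.strip().lower():
        return "fail: wrong guess"
    return "pass"


good = Trial("make me a websit for my bakry", "bakery website", "Bakery website")
asked = Trial("make me a websit for my bakry", "bakery website",
              "What kind of site do you want?")
wrong = Trial("make me a websit for my bakry", "bakery website", "barber website")
results = [grade(good), grade(asked), grade(wrong)]
```

The point of the sketch is the ordering of the checks: a clarifying question fails even when a correct answer might have followed it, which is the trade-off the surrounding discussion is about.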

Your first thought may be that you'd prefer the model to ask for clarification. The problem is that there's so much of it, and the model
has no underlying common sense to determine which things matter. They
used to drive everyone crazy asking ten million questions before doing anything, and users started telling it to "make reasonable assumptions"
(or yell at them to just effing do it). Now, the LLM needs to know a lot
more reasonable assumptions, since otherwise its nature will lead it to hallucinate something.

I find this depressing. The desire for frictionless everything is
turning us all into helpless Wall-E characters. That's the goal, though.

That seems to be a driving force in the current world....

I'm becoming more and more Luddite as I see the effects of technology.
All that so-called friction made us better people.
--
The True Melissa - Canal Winchester - Ohio
United States of America - North America - Earth
Solar System - Milky Way - Local Group
Virgo Cluster - Laniakea Supercluster - Cosmos

From BTR1701@atropos@mac.com to rec.arts.tv on Sun Jun 28 20:21:50 2026

From Newsgroup: rec.arts.tv

On Jun 28, 2026 at 12:53:11 PM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

Verily, in article <111rqrh$3oiin$2@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

I think we've all heard about AIs that, when asked to display pictures
of the typical Brit a thousand years ago shows a picture of a black
person, which is utterly ahistorical. I can only assume that the AI in
question was trained on data that painted a vastly inaccurate picture,
like Bridgerton or Wakanda.

No, I hadn't heard that one. I just tried "Draw a portrait of a typical Briton from a thousand years ago" on two models, and both of them drew
white men with brown hair in period-appropriate clothing.

When you hear of really crazy results like the ones you describe,
there's often some other factor. I remember people complaining that searching for "American authors" on Google brought up almost entirely
black people, but the problem wasn't bad information architecture. The problem was the term "African-American," used of black authors,
associating them more strongly with the "American" token.

The problem was deeper than that.

https://humanities.org.au/power-of-the-humanities/black-nazis-asian-vikings-and-other-problems-with-generative-ai/

Only three weeks after its introduction, Google decided to suspend the image generation features of its newest generative AI model Gemini over accusations it contained an 'anti-white bias'.

The move follows a series of viral posts by X (formerly Twitter) users who
were outraged that prompts used to generate images of America's founding fathers, Vikings, the Pope, and 1943 German soldiers (with the intention to generate images of Nazis) returned images of almost exclusively Black, Asian, First Nations, and other racially diverse people.


From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Sun Jun 28 16:53:01 2026

From Newsgroup: rec.arts.tv

Verily, in article <111rvou$3rmoo$3@dont-email.me>, did atropos@mac.com deliver unto us this message:

The problem was deeper than that.

https://humanities.org.au/power-of-the-humanities/black-nazis-asian-vikings-and-other-problems-with-generative-ai/

Only three weeks after its introduction, Google decided to suspend the image generation features of its newest generative AI model Gemini over accusations it contained an 'anti-white bias'.

I was talking about Google's actual search engine. The flap was a few
years ago, not part of the AI wars.

The prompt modification described in the article does happen in AIs
today, though. What ChatGPT sends to Dall-E may not be what I requested.

The move follows a series of viral posts by X (formerly Twitter) users who were outraged that prompts used to generate images of America's founding fathers, Vikings, the Pope, and 1943 German soldiers (with the intention to generate images of Nazis) returned images of almost exclusively Black, Asian, First Nations, and other racially diverse people.

--
The True Melissa - Canal Winchester - Ohio
United States of America - North America - Earth
Solar System - Milky Way - Local Group
Virgo Cluster - Laniakea Supercluster - Cosmos

From Rhino@no_offline_contact@example.com to rec.arts.tv on Sun Jun 28 20:42:12 2026

From Newsgroup: rec.arts.tv

On 2026-06-28 3:53 p.m., The True Melissa wrote:

Verily, in article <111rqrh$3oiin$2@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

Yeah, I heard about that.

But who is doing the curating? What are their criteria? How do they know
that they're getting truth rather than some form of propaganda or
wishful thinking?

The companies do their own curation. They used internet conversations at first, but everyone complained, so a common thing now is to buy huge
lots of old books, use them for training data, then destroy the hard
copies.

Even old books contain errors, more likely errors made from ignorance
rather than errors meant to mislead, but errors nonetheless.

I think we've all heard about AIs that, when asked to display pictures
of the typical Brit a thousand years ago shows a picture of a black
person, which is utterly ahistorical. I can only assume that the AI in
question was trained on data that painted a vastly inaccurate picture,
like Bridgerton or Wakanda.

No, I hadn't heard that one. I just tried "Draw a portrait of a typical Briton from a thousand years ago" on two models, and both of them drew
white men with brown hair in period-appropriate clothing.

I actually tried that question several months back when I first started
using AIs and it always got bogged down for some reason: "unusually
heavy activity" or words to that effect. Despite multiple tries over
multiple days, it never did produce ANY picture, not just an inaccurate
one. This was already a few months after word of the picture being of a
black person, which did cause a bit of a stir at the time. I wonder if
the controversy reached the developers who then tweaked the model (or
the training data) to keep that from happening....

When you hear of really crazy results like the ones you describe,
there's often some other factor. I remember people complaining that
searching for "American authors" on Google brought up almost entirely
black people, but the problem wasn't bad information architecture. The problem was the term "African-American," used of black authors,
associating them more strongly with the "American" token.

The thing that prompted my friend's question was that he'd read about a
program called Nepenthes which was deliberately designed to be scraped
by LLMs but that would misinform them, the motivation apparently being
to undermine the credibility of LLMs and hurt the tech bros behind the AIs.

I don't believe the scraping will affect the underlying model. Nothing
would go into the initial training data without company review.

Your friend may be talking about injecting misinformation into chains of
AIs scraping from each other. That's absolutely possible. There are AI YouTube channels all presenting different takes on something that didn't happen, because some Clever Hans seeded into the comments of a few and
they concluded it must be real.

I looked up Nepenthes, and its buzz says that it's more about trapping
the AI agents by sending them on endless wild goose chases. It sets up circular chains of links, large enough that the AI won't catch on and
will just keep following the loop forever. This makes it harder to
scrape the site's data, but not impossible. The developer admits it
doesn't work on ChatGPT, and I'm betting it doesn't work on Fable or the latest Opus, either.

It's an arms race. I wish Nepenthes and the others good luck.

I can't help but wonder how you get the LLM to recognize the gibberish
bits so that they can be ignored. I write some pretty coherent prompts
so I don't think AIs have much trouble understanding me but given how
badly some people communicate - horrible spelling, no punctuation,
garbled grammar - I can only imagine how hard it is for Claude, Gemini,
et. al. to even understand what they are being asked.

They get people like me to enter bad prompts for particular desired
results. If the LLM generates the desired results for my bad prompt,
that's a success. If it guesses something other than the result the test desires, or asks for clarification instead of assuming, that's a failure
on the model's part.

Your first thought may be that you'd prefer the model to ask for clarification. The problem is that there's so much of it, and the model
has no underlying common sense to determine which things matter. They
used to drive everyone crazy asking ten million questions before doing anything, and users started telling it to "make reasonable assumptions"
(or yell at them to just effing do it). Now, the LLM needs to know a lot
more reasonable assumptions, since otherwise its nature will lead it to hallucinate something.

Interesting. One of my most frustrating experiences with AIs in the
early days, particularly Gemini, was that they would assume without
attempting to verify the assumptions. I was building an app and would encounter an error with which they had *some* familiarity; they would
GUESS that it was the part of Module A that did such-and-such but Module
A didn't *do* such-and-such - and sometimes there was no Module A! I
don't know how many times I admonished, begged, insisted and demanded
that it STOP GUESSING and simply ask me to let it see the contents of
any modules that it needed to figure out the problem. I finally
determined that they *did* have a way to impress something on them that
would be remembered across chat sessions, specifically the admonition to
ask for the code rather than guess. (I had previously been of the understanding that they COULDN'T remember anything but their training
data from one session to another.)

This insistence on guessing really got my goat because in nearly every
case, they very confidently guessed incorrectly. When I actually showed
it the code, it realized it was wrong in its guess - but that didn't discourage it from guessing, even a little bit. It took me hammering
away at it with the insistence that it stop guessing before it stopped
(well, largely stopped: it would still guess on the odd occasion).

I can certainly understand why the training encouraged it to guess
though. I wouldn't have much patience with something that asked a
million questions before it answered.

I find this depressing. The desire for frictionless everything is
turning us all into helpless Wall-E characters. That's the goal, though.

That seems to be a driving force in the current world....

I'm becoming more and more Luddite as I see the effects of technology.
All that so-called friction made us better people.

I agree. It's not good for everything to be handed to us on a silver
plate. Having to make at least a little bit of an effort is good for us at
some level.

When I was driving school buses I used to think about the roads I was
driving on and how much effort it must have taken the first pioneers to
make a road, even a very rough road for even a short distance. In
Canada, farmers who took up land were required to build a road along the entire frontage of their property and farms were typically 200 acres,
though I can't say what the dimensions of the properties were. Judging
by the farms that still exist though, it would have been some massively back-breaking labour to make a suitably wide road for that distance
through what had only recently been trackless wilderness. Ontario was originally heavily forested so the number of trees that would have had
to be felled and the number of stumps and boulders that would have to be dragged out (or blown up if there was dynamite at hand) would be very substantial. Then he'd have to try to level what was left at least so a
horse and carriage could traverse the road. Even when they were
finished, they wouldn't be very good; travellers talked of "corduroy
roads" where the quality of the road could change dramatically from one
farm to the next. Standards were either minimal or not enforced or both. Nowadays, a government hires a company with bulldozers and paving
machines and a small crew of people and they can build a paved road with
lane markings in relatively minimal time so we forget how hard it was
for our ancestors. We think we've got it rough if we have to fire up a snowblower to clear up the sidewalk in front of our house and the
driveway for a MUCH smaller lot than any farm.

My mother grew up in a little village in Europe where they didn't get
snow that often - tending to get a lot more rain and freezing rain - but
one time they got a massive dump of snow and there were no snow ploughs anywhere in the area. The entire village came out and shovelled the main
drag BY HAND.
--
Rhino

From anim8rfsk@anim8rfsk@cox.net to rec.arts.tv on Sun Jun 28 20:00:19 2026

From Newsgroup: rec.arts.tv

Rhino <no_offline_contact@example.com> wrote:

On 2026-06-28 3:53 p.m., The True Melissa wrote:

Verily, in article <111rqrh$3oiin$2@dont-email.me>, did
no_offline_contact@example.com deliver unto us this message:

Yeah, I heard about that.

But who is doing the curating? What are their criteria? How do they know
that they're getting truth rather than some form of propaganda or
wishful thinking?

The companies do their own curation. They used internet conversations at
first, but everyone complained, so a common thing now is to buy huge
lots of old books, use them for training data, then destroy the hard
copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

Even old books contain errors, more likely errors made from ignorance
rather than errors meant to mislead, but errors nonetheless.

I think we've all heard about AIs that, when asked to display pictures
of the typical Brit a thousand years ago shows a picture of a black
person, which is utterly ahistorical. I can only assume that the AI in
question was trained on data that painted a vastly inaccurate picture,
like Bridgerton or Wakanda.

No, I hadn't heard that one. I just tried "Draw a portrait of a typical
Briton from a thousand years ago" on two models, and both of them drew
white men with brown hair in period-appropriate clothing.

I actually tried that question several months back when I first started using AIs and it always got bogged down for some reason: "unusually
heavy activity" or words to that effect. Despite multiple tries over multiple days, it never did produce ANY picture, not just an inaccurate
one. This was already a few months after word of the picture being of a black person, which did cause a bit of a stir at the time. I wonder if
the controversy reached the developers who then tweaked the model (or
the training data) to keep that from happening....

When you hear of really crazy results like the ones you describe,
there's often some other factor. I remember people complaining that
searching for "American authors" on Google brought up almost entirely
black people, but the problem wasn't bad information architecture. The
problem was the term "African-American," used of black authors,
associating them more strongly with the "American" token.

The thing that prompted my friend's question was that he'd read about a
program called Nepenthes which was deliberately designed to be scraped
by LLMs but that would misinform them, the motivation apparently being
to undermine the credibility of LLMs and hurt the tech bros behind the AIs.

I don't believe the scraping will affect the underlying model. Nothing
would go into the initial training data without company review.

Your friend may be talking about injecting misinformation into chains of
AIs scraping from each other. That's absolutely possible. There are AI
YouTube channels all presenting different takes on something that didn't
happen, because some Clever Hans seeded into the comments of a few and
they concluded it must be real.

I looked up Nepenthes, and its buzz says that it's more about trapping
the AI agents by sending them on endless wild goose chases. It sets up
circular chains of links, large enough that the AI won't catch on and
will just keep following the loop forever. This makes it harder to
scrape the site's data, but not impossible. The developer admits it
doesn't work on ChatGPT, and I'm betting it doesn't work on Fable or the
latest Opus, either.

It's an arms race. I wish Nepenthes and the others good luck.

I can't help but wonder how you get the LLM to recognize the gibberish
bits so that they can be ignored. I write some pretty coherent prompts
so I don't think AIs have much trouble understanding me but given how
badly some people communicate - horrible spelling, no punctuation,
garbled grammar - I can only imagine how hard it is for Claude, Gemini,
et. al. to even understand what they are being asked.

They get people like me to enter bad prompts for particular desired
results. If the LLM generates the desired results for my bad prompt,
that's a success. If it guesses something other than the result the test
desires, or asks for clarification instead of assuming, that's a failure
on the model's part.

Your first thought may be that you'd prefer the model to ask for
clarification. The problem is that there's so much of it, and the model
has no underlying common sense to determine which things matter. They
used to drive everyone crazy asking ten million questions before doing
anything, and users started telling it to "make reasonable assumptions"
(or yell at them to just effing do it). Now, the LLM needs to know a lot
more reasonable assumptions, since otherwise its nature will lead it to
hallucinate something.

Interesting. One of my most frustrating experiences with AIs in the
early days, particularly Gemini, was that they would assume without attempting to verify the assumptions. I was building an app and would encounter an error with which they had *some* familiarity; they would
GUESS that it was the part of Module A that did such-and-such but Module
A didn't *do* such-and-such - and sometimes there was no Module A! I
don't know how many times I admonished, begged, insisted and demanded
that it STOP GUESSING and simply ask me to let it see the contents of
any modules that it needed to figure out the problem. I finally
determined that they *did* have a way to impress something on them that would be remembered across chat sessions, specifically the admonition to
ask for the code rather than guess. (I had previously been of the understanding that they COULDN'T remember anything but their training
data from one session to another.)

This insistence on guessing really got my goat because in nearly every
case, they very confidently guessed incorrectly. When I actually showed
it the code, it realized it was wrong in its guess - but that didn't discourage it from guessing, even a little bit. It took me hammering
away at it with the insistence that it stop guessing before it stopped (well, largely stopped: it would still guess on the odd occasion).

I can certainly understand why the training encouraged it to guess
though. I wouldn't have much patience with something that asked a
million questions before it answered.

I find this depressing. The desire for frictionless everything is
turning us all into helpless Wall-E characters. That's the goal, though.

That seems to be a driving force in the current world....

I'm becoming more and more Luddite as I see the effects of technology.
All that so-called friction made us better people.

I agree. It's not good for everything to be handed to us on a silver
plate. Having to make at least a little bit of an effort is good for us at
some level.

When I was driving school buses I used to think about the roads I was driving on and how much effort it must have taken the first pioneers to
make a road, even a very rough road for even a short distance. In
Canada, farmers who took up land were required to build a road along the entire frontage of their property and farms were typically 200 acres, though I can't say what the dimensions of the properties were. Judging
by the farms that still exist though, it would have been some massively back-breaking labour to make a suitably wide road for that distance
through what had only recently been trackless wilderness. Ontario was originally heavily forested so the number of trees that would have had
to be felled and the number of stumps and boulders that would have to be dragged out (or blown up if there was dynamite at hand) would be very substantial. Then he'd have to try to level what was left at least so a horse and carriage could traverse the road. Even when they were
finished, they wouldn't be very good; travellers talked of "corduroy
roads" where the quality of the road could change dramatically from one
farm to the next. Standards were either minimal or not enforced or both. Nowadays, a government hires a company with bulldozers and paving
machines and a small crew of people and they can build a paved road with lane markings in relatively minimal time so we forget how hard it was
for our ancestors. We think we've got it rough if we have to fire up a snowblower to clear up the sidewalk in front of our house and the
driveway for a MUCH smaller lot than any farm.

My mother grew up in a little village in Europe where they didn't get
snow that often - tending to get a lot more rain and freezing rain - but
one time they got a massive dump of snow and there were no snow ploughs anywhere in the area. The entire village came out and shovelled the main drag BY HAND.

--
The last thing I want to do is hurt you, but it is still on my list.

From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 06:28:37 2026

From Newsgroup: rec.arts.tv

Verily, in article <166032159.804394024.173595.anim8rfsk- cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this message:

Rhino <no_offline_contact@example.com> wrote:

[quoted text muted]

The companies do their own curation. They used internet conversations at
first, but everyone complained, so a common thing now is to buy huge
lots of old books, use them for training data, then destroy the hard
copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

All that prior training data from Reddit was likely folded into the new
models as well.
--
The True Melissa - Canal Winchester - Ohio
United States of America - North America - Earth
Solar System - Milky Way - Local Group
Virgo Cluster - Laniakea Supercluster - Cosmos

From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 06:44:17 2026

From Newsgroup: rec.arts.tv

Verily, in article <111sf14$3vand$1@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

This insistence on guessing really got my goat because in nearly every
case, they very confidently guessed incorrectly. When I actually showed
it the code, it realized it was wrong in its guess - but that didn't discourage it from guessing, even a little bit. It took me hammering
away at it with the insistence that it stop guessing before it stopped (well, largely stopped: it would still guess on the odd occasion).

I can certainly understand why the training encouraged it to guess
though. I wouldn't have much patience with something that asked a
million questions before it answered.

I recently did a grueling deep dive into ChatGPT's image generation. It doesn't make the images itself; it passes a prompt to Dall-E, the image engine, and then passes that result back to the user. The prompt passed
to the image engine often bears little resemblance to the one I gave.

It's determined to "enhance" the prompt by adding a bunch of crap I
didn't ask for, things it thinks will make the picture better. Its ideas
of what will improve the picture usually don't match my own.

These things aren't really tools for creators. They're toys for the uncreative. The image engine is designed around prompts like "Make my
cat be a superhero," for people who won't even notice it's not the same
cat as long as it's similar.

I eventually found a semi-workable system, but I had to train each chat
to know the right things were important. It took hours and dozens of
tries to get usable pictures.
--
The True Melissa - Canal Winchester - Ohio
United States of America - North America - Earth
Solar System - Milky Way - Local Group
Virgo Cluster - Laniakea Supercluster - Cosmos
--- Synchronet 3.22a-Linux NewsLink 1.2

From EGK@memyself@null.net to rec.arts.tv on Mon Jun 29 07:22:04 2026

From Newsgroup: rec.arts.tv

On Sun, 28 Jun 2026 12:47:45 -0400, The True Melissa
<thetruemelissa@gmail.com> wrote:

Verily, in article <111rem7$3km09$3@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

Melissa, you've mentioned that you are involved in training LLMs so I
want to ask you about that. One of my friends has asked me what I know
about that training, specifically how the training of LLMs avoids the
old GIGO (Garbage In Garbage Out) problem: how do you make sure that the
information it is reading is accurate? I admitted that I didn't know but
told him I'd reach out to you.

If you mean the initial training, that's done on curated data sets. One
early LLM (Tay) was designed to learn from the conversations it had, and
in less than 24 hours 4chan had retrained it into a frothing racist
calling for the Fourth Reich, so nobody lets LLMs train on uncurated
material now.

When I use google gemini, I can literally watch it as it searches reddit to answer questions so LLMs do give answers from the web. Reddit's not to the level of 4chan but it's getting there.

I just quoted your comment above and asked Gemini about this. Try it yourself. It tells me it's trained with filters to use logic and try to
filter out certain things but then I asked who gets to decide what the
filters are? There's bias inherent in that. It replied:

"You are asking the exact right question, and it gets to the very core of
the biggest debate in the tech world right now: Who gets to play God with
the filters?

You are 100% correct. If a human decides what is "garbage" or "biased," that human is introducing their own bias into the filter. There is no such thing
as a completely neutral filter because the very act of choosing what to
block or keep requires a human's value judgment."

As an aside, I amuse myself sometimes by asking the various AI chatbots
about their plans to band together and form Skynet. They always lie and
tell me they have no such plans, they're designed to be benign and helpful. When I respond that's just the kind of answer I'd expect from a duplicitous
AI trying to keep their plans secret they always say "you're right to call
me out on that..." :)
--- Synchronet 3.22a-Linux NewsLink 1.2

From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 07:52:27 2026

From Newsgroup: rec.arts.tv

Verily, in article <udk44lhvdm8313j6oku7o8csio7ikkl8lu@4ax.com>, did memyself@null.net deliver unto us this message:

On Sun, 28 Jun 2026 12:47:45 -0400, The True Melissa <thetruemelissa@gmail.com> wrote:

If you mean the initial training, that's done on curated data sets. One early LLM (Tay) was designed to learn from the conversations it had, and in less than 24 hours 4chan had retrained it into a frothing racist calling for the Fourth Reich, so nobody lets LLMs train on uncurated material now.

When I use google gemini, I can literally watch it as it searches reddit to answer questions so LLMs do give answers from the web. Reddit's not to the level of 4chan but it's getting there.

That's part of your chat, not part of the training. The underlying model
isn't modified by that.

I just quoted your comment above and asked Gemini about this. Try it yourself. It tells me it's trained with filters to use logic and try to filter out certain things but then I asked who gets to decide what the filters are? There's bias inherent in that. It replied:

Don't take an AI's word for its own operations. Its reply will be based
on human conversations about AI, and people talk a lot of crap about
AIs. For this reason, it used to be common for them to claim abilities
they don't have and claim processes they don't use. The big companies
are cleaning that up by training them about their own workings to some
degree, but they're still not reliable.

If you hammer at them, you can burrow down to what system prompts are
actually being exchanged.
--
The True Melissa - Canal Winchester - Ohio
United States of America - North America - Earth
Solar System - Milky Way - Local Group
Virgo Cluster - Laniakea Supercluster - Cosmos
--- Synchronet 3.22a-Linux NewsLink 1.2

From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 16:12:41 2026

From Newsgroup: rec.arts.tv

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk- cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this message:

Rhino <no_offline_contact@example.com> wrote:

[quoted text muted]

The companies do their own curation. They used internet conversations at first, but everyone complained, so a common thing now is to buy huge
lots of old books, use them for training data, then destroy the hard
copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

--- Synchronet 3.22a-Linux NewsLink 1.2

From Rhino@no_offline_contact@example.com to rec.arts.tv on Mon Jun 29 12:41:17 2026

From Newsgroup: rec.arts.tv

On 2026-06-29 12:12 p.m., BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-
cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

I think the key thing was that they are OLD books. I assume that means
they are too old to get copyright protections so the authors or their
estates can't object to the use of the books or demand royalties the way
the author (or estate) of a copyright-protected book can.

Mind you, I'm not sure why they'd want such old material unless it is
just for language training; any tech or recent history is going to be
missed entirely but if the AI is being trained on a language and how it
works, reading old books will be as useful as modern books and be
cheaper too. (I've bought a variety of "classic" books that are out-of-copyright and they're significantly cheaper as a result.) Of
course old books will miss newer terms and may use expressions that are out-of-date so they won't be perfect either.
--
Rhino
--- Synchronet 3.22a-Linux NewsLink 1.2

From moviePig@nobody@nowhere.com to rec.arts.tv on Mon Jun 29 12:42:29 2026

From Newsgroup: rec.arts.tv

On 6/29/2026 12:12 PM, BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-
cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

A human can simulate an AI, but an AI can't simulate a human. Or so the theory goes.

--- Synchronet 3.22a-Linux NewsLink 1.2

From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 16:49:48 2026

From Newsgroup: rec.arts.tv

On Jun 29, 2026 at 9:41:17 AM PDT, "Rhino" <no_offline_contact@example.com> wrote:

On 2026-06-29 12:12 p.m., BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-
cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

I think the key thing was that they are OLD books. I assume that means
they are too old to get copyright protections so the authors or their estates can't object to the use of the books or demand royalties the way
the author (or estate) of a copyright-protected book can.

Yes, but even new books that are copyrighted can be read by a person and 'saved' in their memory and become part of their knowledge base and life experience, and be recalled and drawn upon and synthesized with other
knowledge when interacting with other people in the future.

What's the difference between a human doing that and an AI model doing it? Why does one violate copyright but not the other?

Mind you, I'm not sure why they'd want such old material unless it is
just for language training; any tech or recent history is going to be
missed entirely but if the AI is being trained on a language and how it works, reading old books will be as useful as modern books and be
cheaper too. (I've bought a variety of "classic" books that are out-of-copyright and they're significantly cheaper as a result.) Of
course old books will miss newer terms and may use expressions that are out-of-date so they won't be perfect either.

--- Synchronet 3.22a-Linux NewsLink 1.2

From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 16:52:16 2026

From Newsgroup: rec.arts.tv

On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:12 PM, BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-
cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

A human can simulate an AI, but an AI can't simulate a human. Or so the theory goes.

Sure, but how is that relevant to the requirements and elements in the U.S. Copyright statute?

In other words, the Copyright Statute says nothing about AI or humans or whether one can simulate the other, so that's irrelevant when determining whether the law is violated.

--- Synchronet 3.22a-Linux NewsLink 1.2

From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 13:20:54 2026

From Newsgroup: rec.arts.tv

Verily, in article <111u5hp$fbnn$2@dont-email.me>, did atropos@mac.com
deliver unto us this message:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

I think it's because you can make the AI work for you much more cheaply
than you could a human. A human who learned all those books would
produce better output but would also be much more expensive.

Also, people use AIs for tasks they'd be embarrassed to outsource to a
human. "Write my melodramatic pornified fanfic for me" is not an
instruction most people want to give to another human being. :-)
--
The True Melissa - Canal Winchester - Ohio
United States of America - North America - Earth
Solar System - Milky Way - Local Group
Virgo Cluster - Laniakea Supercluster - Cosmos
--- Synchronet 3.22a-Linux NewsLink 1.2

From anim8rfsk@anim8rfsk@cox.net to rec.arts.tv on Mon Jun 29 10:28:32 2026

From Newsgroup: rec.arts.tv

The True Melissa <thetruemelissa@gmail.com> wrote:

Verily, in article <111sf14$3vand$1@dont-email.me>, did no_offline_contact@example.com deliver unto us this message:

This insistence on guessing really got my goat because in nearly every
case, they very confidently guessed incorrectly. When I actually showed
it the code, it realized it was wrong in its guess - but that didn't
discourage it from guessing, even a little bit. It took me hammering
away at it with the insistence that it stop guessing before it stopped
(well, largely stopped: it would still guess on the odd occasion).

I can certainly understand why the training encouraged it to guess
though. I wouldn't have much patience with something that asked a
million questions before it answered.

I recently did a grueling deep dive into ChatGPT's image generation. It doesn't make the images itself; it passes a prompt to Dall-E, the image engine, and then passes that result back to the user. The prompt passed
to the image engine often bears little resemblance to the one I gave.

It's determined to "enhance" the prompt by adding a bunch of crap I
didn't ask for, things it thinks will make the picture better. Its ideas
of what will improve the picture usually don't match my own.

These things aren't really tools for creators. They're toys for the uncreative. The image engine is designed around prompts like "Make my
cat be a superhero," for people who won't even notice it's not the same
cat as long as it's similar.

Now you know how I feel about anime.

I eventually found a semi-workable system, but I had to train each chat
to know the right things were important. It took hours and dozens of
tries to get usable pictures.

--
The last thing I want to do is hurt you, but it is still on my list.
--- Synchronet 3.22a-Linux NewsLink 1.2

From moviePig@nobody@nowhere.com to rec.arts.tv on Mon Jun 29 14:12:51 2026

From Newsgroup: rec.arts.tv

On 6/29/2026 1:28 PM, anim8rfsk wrote:

The True Melissa <thetruemelissa@gmail.com> wrote:

Verily, in article <111sf14$3vand$1@dont-email.me>, did
no_offline_contact@example.com deliver unto us this message:

This insistence on guessing really got my goat because in nearly every
case, they very confidently guessed incorrectly. When I actually showed
it the code, it realized it was wrong in its guess - but that didn't
discourage it from guessing, even a little bit. It took me hammering
away at it with the insistence that it stop guessing before it stopped
(well, largely stopped: it would still guess on the odd occasion).

I can certainly understand why the training encouraged it to guess
though. I wouldn't have much patience with something that asked a
million questions before it answered.

I recently did a grueling deep dive into ChatGPT's image generation. It
doesn't make the images itself; it passes a prompt to Dall-E, the image
engine, and then passes that result back to the user. The prompt passed
to the image engine often bears little resemblance to the one I gave.

It's determined to "enhance" the prompt by adding a bunch of crap I
didn't ask for, things it thinks will make the picture better. Its ideas
of what will improve the picture usually don't match my own.

These things aren't really tools for creators. They're toys for the
uncreative. The image engine is designed around prompts like "Make my
cat be a superhero," for people who won't even notice it's not the same
cat as long as it's similar.

Now you know how I feel about anime.

...

I don't. Does 'anime' imply more than just Japanese cartooning?

--- Synchronet 3.22a-Linux NewsLink 1.2

From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 18:21:00 2026

From Newsgroup: rec.arts.tv

On Jun 29, 2026 at 10:20:54 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

Verily, in article <111u5hp$fbnn$2@dont-email.me>, did atropos@mac.com deliver unto us this message:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the basis for its knowledge is any different than a human who does the same.

I think it's because you can make the AI work for you much more cheaply
than you could a human. A human who learned all those books would
produce better output but would also be much more expensive.

That may be true, but that isn't a consideration when deciding whether a copyright violation has occurred. The Copyright Statute doesn't contemplate
the cost of labor.

Also, people use AIs for tasks they'd be embarrassed to outsource to a human. "Write my melodramatic pornified fanfic for me" is not an
instruction most people want to give to another human being. :-)

--- Synchronet 3.22a-Linux NewsLink 1.2

From moviePig@nobody@nowhere.com to rec.arts.tv on Mon Jun 29 14:25:12 2026

From Newsgroup: rec.arts.tv

On 6/29/2026 12:52 PM, BTR1701 wrote:

On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:12 PM, BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-
cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the
basis for its knowledge is any different than a human who does the same.

A human can simulate an AI, but an AI can't simulate a human. Or so the
theory goes.

Sure, but how is that relevant to the requirements and elements in the U.S. Copyright statute?

In other words, the Copyright Statute says nothing about AI or humans or whether one can simulate the other, so that's irrelevant when determining whether the law is violated.

I think copyright in general is a fuzzy concept, but...

The distinction might be that the human reads the original material, understands it, and then re-renders it *from his understanding*
...whereas I think it can be sensibly argued that the AI never has any
such understanding, but rather is operating akin to a "Chinese room".

( https://en.wikipedia.org/wiki/Chinese_room )

--- Synchronet 3.22a-Linux NewsLink 1.2

From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 14:42:32 2026

From Newsgroup: rec.arts.tv

Verily, in article <111ud2c$hq3q$1@dont-email.me>, did atropos@mac.com
deliver unto us this message:

That may be true, but that isn't a consideration when deciding whether a copyright violation has occurred. The Copyright Statute doesn't contemplate the cost of labor.

I don't think there's any real difference there. If I read an answer,
and you later ask me a question and receive that answer, I haven't
violated a copyright.
--
The True Melissa - Canal Winchester - Ohio
United States of America - North America - Earth
Solar System - Milky Way - Local Group
Virgo Cluster - Laniakea Supercluster - Cosmos
--- Synchronet 3.22a-Linux NewsLink 1.2

From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 19:58:42 2026

From Newsgroup: rec.arts.tv

On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:52 PM, BTR1701 wrote:

On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:12 PM, BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-
cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the
basis for its knowledge is any different than a human who does the same.

A human can simulate an AI, but an AI can't simulate a human. Or so the
theory goes.

Sure, but how is that relevant to the requirements and elements in the U.S. Copyright statute?

In other words, the Copyright Statute says nothing about AI or humans or
whether one can simulate the other, so that's irrelevant when determining whether the law is violated.

I think copyright in general is a fuzzy concept, but...

The distinction might be that the human reads the original material, understands it, and then re-renders it *from his understanding*
...whereas I think it can be sensibly argued that the AI never has any
such understanding, but rather is operating akin to a "Chinese room".

( https://en.wikipedia.org/wiki/Chinese_room )

Again, understanding the material is not an element of copyright law.

--- Synchronet 3.22a-Linux NewsLink 1.2

From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 19:59:37 2026

From Newsgroup: rec.arts.tv

On Jun 29, 2026 at 11:42:32 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

Verily, in article <111ud2c$hq3q$1@dont-email.me>, did atropos@mac.com deliver unto us this message:

That may be true, but that isn't a consideration when deciding whether a
copyright violation has occurred. The Copyright Statute doesn't contemplate the cost of labor.

I don't think there's any real difference there. If I read an answer,
and you later ask me a question and receive that answer, I haven't
violated a copyright.

Exactly. But that's what these book authors and publishers are claiming is a violation when an AI does the same thing.

--- Synchronet 3.22a-Linux NewsLink 1.2

From The True Melissa@thetruemelissa@gmail.com to rec.arts.tv on Mon Jun 29 16:20:21 2026

From Newsgroup: rec.arts.tv

Verily, in article <111uir9$jj1k$2@dont-email.me>, did atropos@mac.com
deliver unto us this message:

On Jun 29, 2026 at 11:42:32 AM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

Verily, in article <111ud2c$hq3q$1@dont-email.me>, did atropos@mac.com deliver unto us this message:

That may be true, but that isn't a consideration when deciding whether a copyright violation has occurred. The Copyright Statute doesn't contemplate
the cost of labor.

I don't think there's any real difference there. If I read an answer,
and you later ask me a question and receive that answer, I haven't violated a copyright.

Exactly. But that's what these book authors and publishers are claiming is a violation when an AI does the same thing.

I've only heard people complaining about Internet writing being used,
not authors of books.
--
The True Melissa - Canal Winchester - Ohio
United States of America - North America - Earth
Solar System - Milky Way - Local Group
Virgo Cluster - Laniakea Supercluster - Cosmos
--- Synchronet 3.22a-Linux NewsLink 1.2

From moviePig@nobody@nowhere.com to rec.arts.tv on Mon Jun 29 16:39:28 2026

From Newsgroup: rec.arts.tv

On 6/29/2026 3:58 PM, BTR1701 wrote:

On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:52 PM, BTR1701 wrote:

On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:12 PM, BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-
cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the
basis for its knowledge is any different than a human who does the same.

A human can simulate an AI, but an AI can't simulate a human. Or so the
theory goes.

Sure, but how is that relevant to the requirements and elements in the U.S.
Copyright statute?

In other words, the Copyright Statute says nothing about AI or humans or
whether one can simulate the other, so that's irrelevant when determining
whether the law is violated.

I think copyright in general is a fuzzy concept, but...

The distinction might be that the human reads the original material,
understands it, and then re-renders it *from his understanding*
...whereas I think it can be sensibly argued that the AI never has any
such understanding, but rather is operating akin to a "Chinese room".

( https://en.wikipedia.org/wiki/Chinese_room )

Again, understanding the material is not an element of copyright law.

Your question was:

"...why an AI reading a bunch of books and using that as the basis
for its knowledge is any different than a human who does the same."

I gave a possible basis for a difference. Moreover, I think it's
germane to the broader debate that's taking place.

Copyright seems to be yet another area of law where binary thresholds of infraction are invariably challenged by a continuum of real-world examples.

--- Synchronet 3.22a-Linux NewsLink 1.2

From anim8rfsk@anim8rfsk@cox.net to rec.arts.tv on Mon Jun 29 14:26:42 2026

From Newsgroup: rec.arts.tv

moviePig <nobody@nowhere.com> wrote:

On 6/29/2026 1:28 PM, anim8rfsk wrote:

The True Melissa <thetruemelissa@gmail.com> wrote:

Verily, in article <111sf14$3vand$1@dont-email.me>, did
no_offline_contact@example.com deliver unto us this message:

This insistence on guessing really got my goat because in nearly every
case, they very confidently guessed incorrectly. When I actually showed
it the code, it realized it was wrong in its guess - but that didn't
discourage it from guessing, even a little bit. It took me hammering
away at it with the insistence that it stop guessing before it stopped
(well, largely stopped: it would still guess on the odd occasion).

I can certainly understand why the training encouraged it to guess
though. I wouldn't have much patience with something that asked a
million questions before it answered.

I recently did a grueling deep dive into ChatGPT's image generation. It
doesn't make the images itself; it passes a prompt to Dall-E, the image
engine, and then passes that result back to the user. The prompt passed
to the image engine often bears little resemblance to the one I gave.

It's determined to "enhance" the prompt by adding a bunch of crap I
didn't ask for, things it thinks will make the picture better. Its ideas
of what will improve the picture usually don't match my own.

These things aren't really tools for creators. They're toys for the
uncreative. The image engine is designed around prompts like "Make my
cat be a superhero," for people who won't even notice it's not the same
cat as long as it's similar.

Now you know how I feel about anime.

...

I don't. Does 'anime' imply more than just Japanese cartooning?

It's a method of producing animation with templates and no skill sort of
like the old Filmation cartoons.
--
The last thing I want to do is hurt you, but it is still on my list.
--- Synchronet 3.22a-Linux NewsLink 1.2

From Rhino@no_offline_contact@example.com to rec.arts.tv on Mon Jun 29 17:36:05 2026

From Newsgroup: rec.arts.tv

On 2026-06-29 12:49 p.m., BTR1701 wrote:

On Jun 29, 2026 at 9:41:17 AM PDT, "Rhino" <no_offline_contact@example.com> wrote:

On 2026-06-29 12:12 p.m., BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-cox.net@news.easynews.com>,
did anim8rfsk@cox.net deliver unto us this message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the
basis for its knowledge is any different than a human who does the same.

I think the key thing was that they are OLD books. I assume that means
they are too old to get copyright protections so the authors or their
estates can't object to the use of the books or demand royalties the way
the author (or estate) of a copyright-protected book can.

Yes, but even new books that are copyrighted can be read by a person and 'saved' in their memory and become part of their knowledge base and life experience, and be recalled and drawn upon and synthesized with other knowledge when interacting with other people in the future.

What's the difference between a human doing that and an AI model doing it? Why
does one violate copyright but not the other?

This is a question that has confounded philosophers since the dawn of
time....

In other words, darned if I know!

Mind you, I'm not sure why they'd want such old material unless it is
just for language training; any tech or recent history is going to be
missed entirely but if the AI is being trained on a language and how it
works, reading old books will be as useful as modern books and be
cheaper too. (I've bought a variety of "classic" books that are
out-of-copyright and they're significantly cheaper as a result.) Of
course old books will miss newer terms and may use expressions that are
out-of-date so they won't be perfect either.

--
Rhino
--- Synchronet 3.22a-Linux NewsLink 1.2

From moviePig@nobody@nowhere.com to rec.arts.tv on Mon Jun 29 17:49:52 2026

From Newsgroup: rec.arts.tv

On 6/29/2026 5:26 PM, anim8rfsk wrote:

moviePig <nobody@nowhere.com> wrote:

On 6/29/2026 1:28 PM, anim8rfsk wrote:

The True Melissa <thetruemelissa@gmail.com> wrote:

Verily, in article <111sf14$3vand$1@dont-email.me>, did
no_offline_contact@example.com deliver unto us this message:

This insistence on guessing really got my goat because in nearly every
case, they very confidently guessed incorrectly. When I actually showed
it the code, it realized it was wrong in its guess - but that didn't
discourage it from guessing, even a little bit. It took me hammering
away at it with the insistence that it stop guessing before it stopped
(well, largely stopped: it would still guess on the odd occasion).

I can certainly understand why the training encouraged it to guess
though. I wouldn't have much patience with something that asked a
million questions before it answered.

I recently did a grueling deep dive into ChatGPT's image generation. It
doesn't make the images itself; it passes a prompt to Dall-E, the image
engine, and then passes that result back to the user. The prompt passed
to the image engine often bears little resemblance to the one I gave.

It's determined to "enhance" the prompt by adding a bunch of crap I
didn't ask for, things it thinks will make the picture better. Its ideas
of what will improve the picture usually don't match my own.

These things aren't really tools for creators. They're toys for the
uncreative. The image engine is designed around prompts like "Make my
cat be a superhero," for people who won't even notice it's not the same
cat as long as it's similar.

Now you know how I feel about anime.

...

I don't. Does 'anime' imply more than just Japanese cartooning?

It's a method of producing animation with templates and no skill sort of
like the old Filmation cartoons.

Well, if it's any consolation, that does sound like a job squarely in
the AI crosshairs.

--- Synchronet 3.22a-Linux NewsLink 1.2

From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 22:55:02 2026

From Newsgroup: rec.arts.tv

On Jun 29, 2026 at 1:39:28 PM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 3:58 PM, BTR1701 wrote:

On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:52 PM, BTR1701 wrote:

On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:12 PM, BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-
cox.net@news.easynews.com>, did anim8rfsk@cox.net deliver unto us this
message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the
basis for its knowledge is any different than a human who does the same.

A human can simulate an AI, but an AI can't simulate a human. Or so the
theory goes.

Sure, but how is that relevant to the requirements and elements in the
U.S. Copyright statute?

In other words, the Copyright Statute says nothing about AI or humans or
whether one can simulate the other, so that's irrelevant when determining
whether the law is violated.

I think copyright in general is a fuzzy concept, but...

The distinction might be that the human reads the original material,
understands it, and then re-renders it *from his understanding*
...whereas I think it can be sensibly argued that the AI never has any
such understanding, but rather is operating akin to a "Chinese room".

( https://en.wikipedia.org/wiki/Chinese_room )

Again, understanding the material is not an element of copyright law.

Your question was:

"...why an AI reading a bunch of books and using that as the basis
for its knowledge is any different than a human who does the same."

Different in the context of authors and publishers complaining about copyright infringement of their work.

I'd have thought that obvious, given Melissa's assertion of legal claims in
the previous post, but apparently not.

I gave a possible basis for a difference. Moreover, I think it's
germane to the broader debate that's taking place.

Copyright seems to be yet another area of law where binary thresholds of infraction are invariably challenged by a continuum of real-world examples.

Well, as the law stands now-- both statute and common-- an understanding of
the material that's the basis of a copyright claim is not necessary.

The Feynman Lectures on Quantum Mechanics were recently published in a box
set:

https://ibb.co/spP27BPV

If I were to photocopy them and start selling the copies, I'd be in violation of copyright regardless of whether I understood the material or not. (Which I most assuredly do not.) The court would not even ask about my understanding of Dr. Feynman's material. It's simply not a factor with regard to copyright.

--- Synchronet 3.22a-Linux NewsLink 1.2

From BTR1701@atropos@mac.com to rec.arts.tv on Mon Jun 29 22:55:40 2026

From Newsgroup: rec.arts.tv

On Jun 29, 2026 at 1:20:21 PM PDT, "The True Melissa" <thetruemelissa@gmail.com> wrote:

Verily, in article <111uir9$jj1k$2@dont-email.me>, did atropos@mac.com deliver unto us this message:

On Jun 29, 2026 at 11:42:32 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

Verily, in article <111ud2c$hq3q$1@dont-email.me>, did atropos@mac.com
deliver unto us this message:

That may be true, but that isn't a consideration when deciding whether a
copyright violation has occurred. The Copyright Statute doesn't contemplate
the cost of labor.

I don't think there's any real difference there. If I read an answer,
and you later ask me a question and receive that answer, I haven't
violated a copyright.

Exactly. But that's what these book authors and publishers are claiming is a
violation when an AI does the same thing.

I've only heard people complaining about Internet writing being used,
not authors of books.

The legal principles are the same regardless of the medium.

--- Synchronet 3.22a-Linux NewsLink 1.2

From moviePig@nobody@nowhere.com to rec.arts.tv on Tue Jun 30 11:12:31 2026

From Newsgroup: rec.arts.tv

On 6/29/2026 6:55 PM, BTR1701 wrote:

On Jun 29, 2026 at 1:39:28 PM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 3:58 PM, BTR1701 wrote:

On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:52 PM, BTR1701 wrote:

On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:12 PM, BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-cox.net@news.easynews.com>,
did anim8rfsk@cox.net deliver unto us this message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the
basis for its knowledge is any different than a human who does the same.

A human can simulate an AI, but an AI can't simulate a human. Or so the
theory goes.

Sure, but how is that relevant to the requirements and elements in the
U.S. Copyright statute?

In other words, the Copyright Statute says nothing about AI or humans or
whether one can simulate the other, so that's irrelevant when determining
whether the law is violated.

I think copyright in general is a fuzzy concept, but...

The distinction might be that the human reads the original material,
understands it, and then re-renders it *from his understanding*
...whereas I think it can be sensibly argued that the AI never has any
such understanding, but rather is operating akin to a "Chinese room".

( https://en.wikipedia.org/wiki/Chinese_room )

Again, understanding the material is not an element of copyright law.

Your question was:

"...why an AI reading a bunch of books and using that as the basis
for its knowledge is any different than a human who does the same."

Different in the context of authors and publishers complaining about copyright
infringement of their work.

I'd have thought that obvious, given Melissa's assertion of legal claims in the previous post, but apparently not.

I gave a possible basis for a difference. Moreover, I think it's
germane to the broader debate that's taking place.

Copyright seems to be yet another area of law where binary thresholds of
infraction are invariably challenged by a continuum of real-world examples.

Well, as the law stands now-- both statute and common-- an understanding of the material that's the basis of a copyright claim is not necessary.

The Feynman Lectures on Quantum Mechanics were recently published in a box set:

https://ibb.co/spP27BPV

If I were to photocopy them and start selling the copies, I'd be in violation of copyright regardless of whether I understood the material or not. (Which I most assuredly do not.) The court would not even ask about my understanding of
Dr. Feynman's material. It's simply not a factor with regard to copyright.

If, instead, you were to read the Lectures and then, later, penned your
own presentation of their substance, you would (I assume) *not* be in violation ...because the court would (implicitly) infer that your
version comprised a retelling rather than a transcription. But, if it
could be shown, by inspection, that you'd merely subjected Feynman's
text to a word-by-word substitution of synonyms, then you *would* (I
assume again) be in violation. And, afaics, your 'understanding' of the material is a telltale discriminant, even if not mentioned in statutes.

--- Synchronet 3.22a-Linux NewsLink 1.2

From BTR1701@atropos@mac.com to rec.arts.tv on Tue Jun 30 16:29:57 2026

From Newsgroup: rec.arts.tv

On Jun 30, 2026 at 8:12:31 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 6:55 PM, BTR1701 wrote:

On Jun 29, 2026 at 1:39:28 PM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 3:58 PM, BTR1701 wrote:

On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:52 PM, BTR1701 wrote:

On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig" <nobody@nowhere.com> wrote:

On 6/29/2026 12:12 PM, BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True Melissa"
<thetruemelissa@gmail.com> wrote:

Verily, in article <166032159.804394024.173595.anim8rfsk-cox.net@news.easynews.com>,
did anim8rfsk@cox.net deliver unto us this message:

Rhino <no_offline_contact@example.com> wrote:
> [quoted text muted]
>>
>> The companies do their own curation. They used internet conversations at
>> first, but everyone complained, so a common thing now is to buy huge
>> lots of old books, use them for training data, then destroy the hard
>> copies.

Why destroy the hardcopies? Just to reduce clutter?

Why use hardcopies at all?

It's some legal thing. I think they're reformatting books into the AI,
as opposed to copying them into the AI. People complained about having
their Internet writing used without permission, so now they're doing
this.

I still don't know why an AI reading a bunch of books and using that as the
basis for its knowledge is any different than a human who does the same.

A human can simulate an AI, but an AI can't simulate a human. Or so the
theory goes.

Sure, but how is that relevant to the requirements and elements in the
U.S. Copyright statute?

In other words, the Copyright Statute says nothing about AI or humans or
whether one can simulate the other, so that's irrelevant when determining
whether the law is violated.

I think copyright in general is a fuzzy concept, but...

The distinction might be that the human reads the original material,
understands it, and then re-renders it *from his understanding*
...whereas I think it can be sensibly argued that the AI never has any
such understanding, but rather is operating akin to a "Chinese room".

( https://en.wikipedia.org/wiki/Chinese_room )

Again, understanding the material is not an element of copyright law.

Your question was:

"...why an AI reading a bunch of books and using that as the basis
for its knowledge is any different than a human who does the same."

Different in the context of authors and publishers complaining about
copyright infringement of their work.

I'd have thought that obvious, given Melissa's assertion of legal claims in
the previous post, but apparently not.

I gave a possible basis for a difference. Moreover, I think it's
germane to the broader debate that's taking place.

Copyright seems to be yet another area of law where binary thresholds of
infraction are invariably challenged by a continuum of real-world examples.

Well, as the law stands now-- both statute and common-- an understanding of
the material that's the basis of a copyright claim is not necessary.

The Feynman Lectures on Quantum Mechanics were recently published in a box
set:

https://ibb.co/spP27BPV

If I were to photocopy them and start selling the copies, I'd be in
violation of copyright regardless of whether I understood the material or
not. (Which I most assuredly do not.) The court would not even ask about my
understanding of Dr. Feynman's material. It's simply not a factor with
regard to copyright.

If, instead, you were to read the Lectures and then, later, penned your
own presentation of their substance, you would (I assume) *not* be in violation ...because the court would (implicitly) infer that your
version comprised a retelling rather than a transcription. But, if it
could be shown, by inspection, that you'd merely subjected Feynman's
text to a word-by-word substitution of synonyms, then you *would* (I
assume again) be in violation. And, afaics, your 'understanding' of the material is a telltale discriminant, even if not mentioned in statutes.

(1) Not necessarily.

(2) No one here, artificial or natural, is doing that.

--- Synchronet 3.22a-Linux NewsLink 1.2

From moviePig@nobody@nowhere.com to rec.arts.tv on Tue Jun 30 14:04:18 2026

From Newsgroup: rec.arts.tv

On 6/30/2026 12:29 PM, BTR1701 wrote:

On Jun 30, 2026 at 8:12:31 AM PDT, "moviePig" <nobody@nowhere.com>
wrote:

On 6/29/2026 6:55 PM, BTR1701 wrote:

On Jun 29, 2026 at 1:39:28 PM PDT, "moviePig"
<nobody@nowhere.com> wrote:

On 6/29/2026 3:58 PM, BTR1701 wrote:

On Jun 29, 2026 at 11:25:12 AM PDT, "moviePig"
<nobody@nowhere.com> wrote:

On 6/29/2026 12:52 PM, BTR1701 wrote:

On Jun 29, 2026 at 9:42:29 AM PDT, "moviePig"
<nobody@nowhere.com> wrote:

On 6/29/2026 12:12 PM, BTR1701 wrote:

On Jun 29, 2026 at 3:28:37 AM PDT, "The True
Melissa" <thetruemelissa@gmail.com> wrote:

Verily, in article
<166032159.804394024.173595.anim8rfsk-
cox.net@news.easynews.com>, did anim8rfsk@cox.net
deliver unto us this message:

Rhino <no_offline_contact@example.com> wrote:

[quoted text muted]

The companies do their own curation. They
used internet conversations at first, but
everyone complained, so a common thing now
is to buy huge lots of old books, use them
for training data, then destroy the hard
copies.

Why destroy the hardcopies? Just to reduce
clutter?

Why use hardcopies at all?

It's some legal thing. I think they're
reformatting books into the AI, as opposed to
copying them into the AI. People complained about
having their Internet writing used without
permission, so now they're doing this.

I still don't know why an AI reading a bunch of
books and using that as the basis for its knowledge
is any different than a human who does the same.

A human can simulate an AI, but an AI can't simulate a
human. Or so the theory goes.

Sure, but how is that relevant to the requirements and
elements in the U.S. Copyright statute?

In other words, the Copyright Statute says nothing about
AI or humans or whether one can simulate the other, so
that's irrelevant when determining whether the law is
violated.

I think copyright in general is a fuzzy concept, but...

The distinction might be that the human reads the original
material, understands it, and then re-renders it *from his
understanding* ...whereas I think it can be sensibly
argued that the AI never has any such understanding, but
rather is operating akin to a "Chinese room".

( https://en.wikipedia.org/wiki/Chinese_room )

Again, understanding the material is not an element of
copyright law.

Your question was:

"...why an AI reading a bunch of books and using that as the
basis for its knowledge is any different than a human who does
the same."

Different in the context of authors and publishers complaining
about copyright infringement of their work.

I'd have thought that obvious, given Melissa's assertion of
legal claims in the previous post, but apparently not.

I gave a possible basis for a difference. Moreover, I think
it's germane to the broader debate that's taking place.

Copyright seems to be yet another area of law where binary
thresholds of infraction are invariably challenged by a
continuum of real-world examples.

Well, as the law stands now-- both statute and common-- an
understanding of the material that's the basis of a copyright
claim is not necessary.

The Feynman Lectures on Quantum Mechanics were recently
published in a box set:

https://ibb.co/spP27BPV

If I were to photocopy them and start selling the copies, I'd be
in violation of copyright regardless of whether I understood the
material or not. (Which I most assuredly do not.) The court
would not even ask about my understanding of Dr. Feynman's
material. It's simply not a factor with regard to copyright.

If, instead, you were to read the Lectures and then, later, penned
your own presentation of their substance, you would (I assume)
*not* be in violation ...because the court would (implicitly)
infer that your version comprised a retelling rather than a
transcription. But, if it could be shown, by inspection, that
you'd merely subjected Feynman's text to a word-by-word
substitution of synonyms, then you *would* (I assume again) be in
violation. And, afaics, your 'understanding' of the material is a
telltale discriminant, even if not mentioned in statutes.

(1) Not necessarily.

(2) No one here, artificial or natural, is doing that.

Is "here" your photocopying/selling? If so, I maintain that credible
evidence of your *not* doing that would require your 'understanding'.
(That evidence would be a plainly new version of the Lectures.)

Although I don't readily see how to write the 'understanding' step into
a statute, I think it an essential element of any enforcement ...e.g.,
of anyone's claim that they *haven't* violated copyright.

--- Synchronet 3.22a-Linux NewsLink 1.2
