From Newsgroup: comp.misc
Mike Spencer <mds@bogus.nodomain.nowhere> wrote or quoted:
> Can someone suggest books at more or less the same level of
> technicality that I might look at to catch up a bit on how neural
> nets are now constructed, trained, connected, etc. to produce what is
> being called "large language models"?
From short to long:
"The Little Book of Deep Learning" - Fran|oois Fleuret
(GPT in section 5.3 "Attention Models")
"Understanding Deep Learning" - Simon J.D. Prince (section 12.5
"Transformers for natural language processing")
"Dive into Deep Learning" - Aston Zhang, Zachary C. Lipton,
Mu Li, And Alexander J. Smola (chapter 11 "Attention
Mechanisms and Transformers")
Hands-on:
"The Complete Guide to Deep Learning with Python Keras, Tensorflow,
And Pytorch" - Joseph G. Derek (section 15.4 "NLP Applications:
Chatbots, Sentiment Analysis Tools") - only conditionally
recommended due to deficiencies in source code formatting
--- Synchronet 3.21a-Linux NewsLink 1.2