• Cubeful and matchful training of a BG bot

    From MK@playbg-rgb@yahoo.com to rec.games.backgammon on Wed Apr 10 15:51:22 2024
    From Newsgroup: rec.games.backgammon

    Hi Ian,

    Since this specific subject also strayed from the value of
    cube ownership, I'll do the same with it by creating a new
    thread and posting it also to RGB and Bgonline. My response
    to you is below the quoted posts.

    ---------------------------------------------------------
    *From:* MK <playbg-rgb@yahoo.com>
    *Sent:* Wednesday, April 3, 2024 10:01:17 PM
    *To:* Ian Shaw <Ian.Shaw@riverauto.co.uk>; GnuBg Bug <bug-gnubg@gnu.org>
    *Subject:* Re: Interesting question/experiment about value of cube ownership

    On 4/2/2024 5:13 AM, Ian Shaw wrote:

    What would be your proposed structure for training a
    cubeful bot? What gains and obstacles do you foresee.

    I don't know what you mean by "structure". What I propose
    is doing the same thing done training TD-Gammon v.1, i.e.
    random self-play, but this time also cubeful and matchful,
    i.e. random cube as well as checker decisions.

    Apparently Tesauro still works at IBM with access to huge
    CPU power. Perhaps he can be put to shame for the damage
    he caused to BG AI by what he did with TD-Gammon v.2 and
    be urged to redeem himself.

    In other forums, people talk about doing "XG rollouts on
    Amazon's cloud servers", etc. Doing more biased rollouts
    is plain stupid/illogical. Any such efforts would be put
    to better use in training a new bot instead. The question
    is who would volunteer to do it.

    People like the AlphaZero team, etc. don't seem to want
    to touch "gamblegammon" with a ten-foot pole, possibly
    because of the gambling nature of the game.

    In the past, I have suggested in RGB that a random rollout
    feature could be added to GnuBG and results from trusted
    users could be collected over time in a central database
    to gradually create a bot that won't rely on concocted,
    biased/inaccurate cube formulas and match equity tables.

    Unfortunately the faithful are happy with their dogmas
    and no better bots are likely in the near future... :(

    ----------------------------------------------------------

    On 4/3/2024 11:44 PM, Ian Shaw wrote:
    MK: What I PROPOSE is doing the same thing done training
    TD-Gammon v.1, I.E. random self-play, but this time also
    cubeful and MATCHFUL, i.e. random cube as well as checker
    decisions.

    As I remember it (though it's many years since I read the
    research), the self-play wasn't accomplished by picking
    random moves. It was the initial network weights that were
    random. The move picked was the best-ranked move of all
    the evaluated moves. This is a calculation, not a random
    selection.
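
    A minimal sketch of the scheme described above, assuming a
    linear evaluator standing in for the neural network: the
    weights start out random, and the move played is whichever
    candidate's resulting position evaluates highest. The
    function names, the feature vectors and the one-step update
    rule are illustrative assumptions, not TD-Gammon's code.

```python
import random


def make_random_weights(n_features, seed=0):
    """Start from small random weights (the only random ingredient here)."""
    rng = random.Random(seed)
    return [rng.uniform(-0.1, 0.1) for _ in range(n_features)]


def evaluate(weights, features):
    """Score a position as a weighted sum of its (hypothetical) features."""
    return sum(w * x for w, x in zip(weights, features))


def pick_best_move(weights, candidates):
    """candidates: list of (move_name, features_of_resulting_position).

    The choice is a calculation: the highest-evaluated candidate wins.
    """
    return max(candidates, key=lambda c: evaluate(weights, c[1]))


def td_update(weights, features, target, alpha=0.1):
    """Nudge the evaluation of a position toward a later estimate or result."""
    error = target - evaluate(weights, features)
    return [w + alpha * error * x for w, x in zip(weights, features)]
```

    Repeating pick_best_move and td_update over many self-play
    games is what gradually turns the random starting weights
    into a usable evaluator.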

    How do you propose to rank double vs no double, and take
    vs pass?

    ----------------------------------------------------------

    I didn't say the selection was random. The self-play moves
    were random. There were no "calculations" either. Moves were
    compared and better performing ones rose up in rank. It was
    kind of a "bubble sorting" of large numbers of statistical
    data. I remember that Tom Keith had used the expression
    "percolating up" in describing how he trained a Hypergammon
    bot through cubeless random self-play. It's the only way
    (using "empirical data and scientific method") to train a
    "non-human-biased" BG bot (or at least one as minimally
    biased as is technically possible).

    To answer your last question, just like checker decisions,
    cube decisions to double, take, pass, etc. would be random
    also and the "correct" cube decisions would "bubble up" the
    same way. It will take huge amounts of computing power and
    time, but nowadays we have both.
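
    As a rough illustration of how a cube decision could
    "bubble up" from random choices, here is a toy sketch. It
    is not backgammon: the position is reduced to a single
    assumed number p (the doubler's chance of winning), the
    cube is dead after the take, and the function names are
    made up for the example. The responder takes or passes at
    random, and the average payoff of each choice is tallied.

```python
import random
from collections import defaultdict


def tally_random_cube_responses(trials=40000, seed=0):
    """Average the responder's payoff for random take/pass choices."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(trials):
        p = rng.choice([0.6, 0.7, 0.8, 0.9])   # doubler's winning chance
        action = rng.choice(["take", "pass"])  # random cube decision
        if action == "pass":
            payoff = -1.0                      # responder gives up 1 point
        else:
            # Game is played out at 2 points; the doubler wins with chance p.
            payoff = -2.0 if rng.random() < p else 2.0
        totals[(p, action)] += payoff
        counts[(p, action)] += 1
    return {key: totals[key] / counts[key] for key in counts}


def best_response(averages, p):
    """The choice with the higher average payoff is the one that rises."""
    return max(("take", "pass"), key=lambda a: averages[(p, a)])
```

    With enough trials the take ranks higher at p = 0.6 and
    p = 0.7 and the pass at p = 0.8 and p = 0.9, in line with
    the familiar 25% dead-cube take point. Note that the
    averages are only as good as the play that follows the
    decision, which in this toy is a single coin flip.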

    For "matchful" play, checker and/or cube decisions based on
    match score need to be random as well, even if that requires
    exponentially more computing power and time. Again, we have
    both. It's just a matter of whether we want to do it. We can
    distribute the task and/or spread it over time to let the
    empirical, statistical data trickle in and accumulate.
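
    One possible shape for the distributed part, as a sketch
    only: each contributor submits running totals and counts
    per (match score, position, action), and the central tally
    simply adds them up, so batches can arrive in any order
    and at any time. The key layout and the function names are
    assumptions made for the example, not an existing GnuBG
    feature.

```python
from collections import defaultdict


def merge_tallies(*batches):
    """Combine per-contributor {key: (payoff_total, trial_count)} tallies."""
    merged = defaultdict(lambda: [0.0, 0])
    for batch in batches:
        for key, (total, count) in batch.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: (total, count) for key, (total, count) in merged.items()}


def average_payoff(merged, key):
    """Average outcome observed so far for one (score, position, action)."""
    total, count = merged[key]
    return total / count
```

    Because only sums and counts are stored, a new batch never
    requires reprocessing the old ones.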

    Perhaps other people more knowledgeable in bot training can
    suggest ways to go about it in more technical details.

    MK