• 8086 32-bit multiply

    From Paul Edwards@mutazilah@nospicedham.gmail.com to comp.lang.asm.x86 on Fri Apr 23 04:42:29 2021
    From Newsgroup: comp.lang.asm.x86

    Hi.

    Since 1994 I have been working on a project to
    create a public domain version of MSDOS, called
    PDOS. There is an 8086 version and an 80386
    version which can be found here:

    http://pdos.sourceforge.net/

    I took some shortcuts along the way to get it to
    work at all, and one of those has finally bitten me.

    I'm getting incorrect results from this:

    https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdpclib/dossupa.asm

    ; multiply cx:bx by dx:ax, result in dx:ax

    public __I4M
    __I4M:
    public __U4M
    __U4M:
    public f_lxmul@
    f_lxmul@ proc
    push bp
    mov bp,sp
    push cx

    push ax
    mul cx
    mov cx, ax
    pop ax
    mul bx
    add dx, cx

    pop cx
    pop bp
    ret
    f_lxmul@ endp
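
    A minimal C model of the routine above may help show where the wrong
    results come from. It is only a sketch: the function names are
    hypothetical, and the 16-bit registers are modelled with uint16_t.
    Under those assumptions, the incoming DX (high word of the first
    operand) is overwritten by the first MUL and never multiplied by BX,
    so results appear to go wrong whenever DX is non-zero on entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the two-MUL routine: a = dx:ax, b = cx:bx. */
uint32_t two_mul_model(uint32_t a, uint32_t b)
{
    uint16_t ax = (uint16_t)a;          /* low word of first operand   */
    uint16_t bx = (uint16_t)b;          /* low word of second operand  */
    uint16_t cx = (uint16_t)(b >> 16);  /* high word of second operand */

    uint16_t cross = (uint16_t)((uint32_t)ax * cx);   /* mul cx ; mov cx, ax */
    uint32_t low   = (uint32_t)ax * bx;               /* pop ax ; mul bx     */
    uint16_t dx    = (uint16_t)((low >> 16) + cross); /* add dx, cx          */

    /* Note: the high word of a (incoming dx) is never used. */
    return ((uint32_t)dx << 16) | (uint16_t)low;
}

/* Illustrative check: fine when dx is 0 on entry, wrong otherwise. */
void check_two_mul_model(void)
{
    assert(two_mul_model(0x00001234u, 0x00015678u)
           == (uint32_t)((uint64_t)0x00001234u * 0x00015678u));
    assert(two_mul_model(0x00010000u, 3u) == 0u); /* expected 0x00030000 */
}
```

    The second assertion documents the failure case: 0x00010000 * 3 should
    give 0x00030000, but the model returns 0.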


    Does anyone have some public domain (explicit notice)
    8086 (not 80386) code they are willing to share to do
    this? Not LGPL. Not BSD. Public domain. The entire
    codebase of tens of thousands of lines of code is
    public domain.

    Also let me know if you wish to be acknowledged in
    the source code and/or code check-in. Some people
    prefer to remain anonymous.

    There are other routines in there that may not work
    properly either, but I haven't come across them yet.

    Thanks. Paul.

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From DJ Delorie@dj@nospicedham.delorie.com to comp.lang.asm.x86 on Fri Apr 23 19:24:37 2021
    From Newsgroup: comp.lang.asm.x86

    Paul Edwards <mutazilah@nospicedham.gmail.com> writes:
    > ; multiply cx:bx by dx:ax, result in dx:ax

    Such would have three multiplies and a few adds:

    LSW = bx * ax (lower 16, save upper 16 in XX)

    MSW = bx * dx + cx * ax + XX (from lsw)
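
    The outline above can be sketched in C, with uint16_t standing in for
    the 16-bit registers. The names mul32_low, lsw, msw and xx are
    illustrative only; this is a sketch of the three-multiply idea, not a
    drop-in replacement for the assembly routine.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of a*b from 16-bit halves: a = dx:ax, b = cx:bx. */
uint32_t mul32_low(uint32_t a, uint32_t b)
{
    uint16_t ax = (uint16_t)a, dx = (uint16_t)(a >> 16);
    uint16_t bx = (uint16_t)b, cx = (uint16_t)(b >> 16);

    uint32_t lo  = (uint32_t)bx * ax;     /* full 32-bit product bx*ax */
    uint16_t lsw = (uint16_t)lo;          /* lower 16 bits             */
    uint16_t xx  = (uint16_t)(lo >> 16);  /* upper 16 bits ("XX")      */

    /* Only the low 16 bits of each cross product can reach the result. */
    uint16_t msw = (uint16_t)((uint32_t)bx * dx + (uint32_t)cx * ax + xx);

    return ((uint32_t)msw << 16) | lsw;
}

/* Illustrative check against a widened 64-bit multiply. */
void check_mul32_low(void)
{
    assert(mul32_low(0x00010000u, 3u) == 0x00030000u);
    assert(mul32_low(0xFFFFFFFFu, 0xFFFFFFFFu) == 1u);
    assert(mul32_low(0x12345678u, 0x9ABCDEF0u)
           == (uint32_t)((uint64_t)0x12345678u * 0x9ABCDEF0u));
}
```

    The cx * dx product is left out on purpose: it has weight 2^32 and
    cannot affect the low 32 bits.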

  • From wolfgang kern@nowhere@nospicedham.never.at to comp.lang.asm.x86 on Sat Apr 24 02:46:36 2021
    From Newsgroup: comp.lang.asm.x86

    On 23.04.2021 13:42, Paul Edwards wrote:

    [x8086 only]

    > ; multiply cx:bx by dx:ax, result in dx:ax

    the result of 32*32 bit doesn't fit into 32 bit.
    either go with the given limits (16*16 bit) or
    build a cascade with intermediate variables aka
    MUL-ADD chains.
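
    As a rough illustration of such a cascade, the C sketch below builds
    the full 64-bit product from four 16*16 multiplies and an add chain.
    The name mul32_full and the use of uint64_t to package the result are
    assumptions made for the example; on an 8086 the four result words
    would live in registers or memory instead.

```c
#include <assert.h>
#include <stdint.h>

/* Full 64-bit product of two 32-bit values via 16*16 -> 32 multiplies. */
uint64_t mul32_full(uint32_t a, uint32_t b)
{
    uint32_t a0 = a & 0xFFFFu, a1 = a >> 16;
    uint32_t b0 = b & 0xFFFFu, b1 = b >> 16;

    uint32_t p00 = a0 * b0;  /* weight 2^0  */
    uint32_t p01 = a0 * b1;  /* weight 2^16 */
    uint32_t p10 = a1 * b0;  /* weight 2^16 */
    uint32_t p11 = a1 * b1;  /* weight 2^32 */

    /* Middle column: three 16-bit values, so the sum fits in 18 bits. */
    uint32_t mid = (p00 >> 16) + (p01 & 0xFFFFu) + (p10 & 0xFFFFu);

    uint32_t lo = (mid << 16) | (p00 & 0xFFFFu);
    uint32_t hi = p11 + (p01 >> 16) + (p10 >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}

/* Illustrative check: the largest operands need all 64 bits. */
void check_mul32_full(void)
{
    assert(mul32_full(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFE00000001ULL);
    assert(mul32_full(0x12345678u, 0x9ABCDEF0u)
           == (uint64_t)0x12345678u * 0x9ABCDEF0u);
}
```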
    __
    wolfgang

  • From Paul Edwards@mutazilah@nospicedham.gmail.com to comp.lang.asm.x86 on Fri Apr 23 20:16:04 2021
    From Newsgroup: comp.lang.asm.x86

    On Saturday, April 24, 2021 at 9:36:28 AM UTC+10, DJ Delorie wrote:
    > Paul Edwards <muta...@nospicedham.gmail.com> writes:
    > > ; multiply cx:bx by dx:ax, result in dx:ax
    > Such would have three multiplies and a few adds:
    >
    > LSW = bx * ax (lower 16, save upper 16 in XX)
    >
    > MSW = bx * dx + cx * ax + XX (from lsw)

    Thanks for the algorithm! I thought I might be able to do that,
    but my brain started to melt down. Here's what I came up with,
    which causes a hang, but at least it happened after I got the
    results of some calculations. I'll see if I can figure out what
    is happening.


    ; multiply cx:bx by dx:ax, result in dx:ax

    public __I4M
    __I4M:
    public __U4M
    __U4M:
    public f_lxmul@
    f_lxmul@ proc
    push bp
    mov bp,sp
    push bx
    push cx
    push si
    push di

    push ax
    push bx

    ; I think this multiplies bx * ax and puts the upper 16 bits in ax
    ; and lower 16 bits in bx
    mul bx

    ; Save upper 16 in si and lower 16 in di
    mov si, ax
    mov di, bx

    ; This does the equivalent of bx * dx
    pop bx
    mov ax, dx
    mul bx
    mov dx, ax

    ; Now we do cx * ax with upper 16 bits in ax and lower in cx
    pop ax
    mul cx

    ; Now we need to add the results of those two multiplies together
    ; lower 16 bits first, so we can get the carry
    push bp ; ran out of registers!
    mov bp, bx
    mov bx, ax
    mov ax, 1
    add dx, cx
    jc noone
    mov ax, 1
    noone:

    push ax

    ; Now the other lower 16 bits we saved
    mov ax, 1
    add dx, di
    jc noone2
    mov ax, 1
    noone2:

    push ax

    ; Upper 16 bits
    mov ax, bx
    add bx, ax
    pop ax
    add bx, ax ; one carry
    pop ax
    add bx, ax ; the other carry
    mov ax, bp
    add bx, ax

    ; store in proper output register
    mov dx, bx

    pop bp

    pop di
    pop si
    pop cx
    pop bx
    pop bp
    ret
    f_lxmul@ endp


    BFN. Paul.

  • From Paul Edwards@mutazilah@nospicedham.gmail.com to comp.lang.asm.x86 on Fri Apr 23 20:17:33 2021
    From Newsgroup: comp.lang.asm.x86

    On Saturday, April 24, 2021 at 10:51:35 AM UTC+10, wolfgang kern wrote:

    > [x8086 only]
    > > ; multiply cx:bx by dx:ax, result in dx:ax
    > the result of 32*32 bit doesn't fit into 32 bit.

    Good point. I didn't think of that. I can't multiply
    17 bits by 17 bits, one of the registers needs to
    be 0. But I assume I need to at least overflow in
    a predictable manner.

    > either go with the given limits (16*16 bit) or
    > build a cascade with intermediate variables aka
    > MUL-ADD chains.

    See my most recent post. :-)

    BFN. Paul.

  • From wolfgang kern@nowhere@nospicedham.never.at to comp.lang.asm.x86 on Sat Apr 24 10:36:46 2021
    From Newsgroup: comp.lang.asm.x86

    On 24.04.2021 05:17, Paul Edwards wrote:

    > > [x8086 only]
    > > > ; multiply cx:bx by dx:ax, result in dx:ax
    > > the result of 32*32 bit doesn't fit into 32 bit.
    >
    > Good point. I didn't think of that. I can't multiply
    > 17 bits by 17 bits, one of the registers needs to
    > be 0. But I assume I need to at least overflow in
    > a predictable manner.
    >
    > > either go with the given limits (16*16 bit) or
    > > build a cascade with intermediate variables aka
    > > MUL-ADD chains.
    >
    > See my most recent post. :-)

    you create a stack frame but don't use a single variable in it,
    and it may hang because your stack isn't balanced.
    __
    wolfgang

  • From Terje Mathisen@terje.mathisen@nospicedham.tmsw.no to comp.lang.asm.x86 on Sat Apr 24 12:17:08 2021
    From Newsgroup: comp.lang.asm.x86

    Paul Edwards wrote:
    > ; multiply cx:bx by dx:ax, result in dx:ax
    >
    > public __I4M
    > __I4M:
    > public __U4M
    > __U4M:
    > public f_lxmul@
    > f_lxmul@ proc
    > push bp
    > mov bp,sp
    > push cx
    >
    > push ax
    > mul cx
    > mov cx, ax
    > pop ax
    > mul bx
    > add dx, cx
    >
    > pop cx
    > pop bp
    > ret
    > f_lxmul@ endp


    As several have noted, the code above is missing at least one MUL!

    Please test it, then feel free to use (with or without attribution) this totally untested but reasonably efficient/short code:

    mov si,ax
    mov di,dx
    mul cx ;; hi * lo
    xchg ax,di ;; First mul saved, grab org dx
    mul bx ;; lo * hi
    add di,ax ;; top word of result

    mov ax,si ;; retrieve original AX
    mul bx ;; lo * lo
    add dx,di

    At this point DX:AX has the low 32 bits of the multiplication result.
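
    For anyone who wants to check the sequence above without an assembler,
    here is a hedged C sketch that steps through it instruction by
    instruction. The Regs struct, mul16 and lxmul_model are names invented
    for the illustration; MUL is modelled as DX:AX = AX * operand.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t ax, bx, cx, dx, si, di; } Regs;

/* Model of "mul r16": DX:AX = AX * operand. */
static void mul16(Regs *r, uint16_t operand)
{
    uint32_t p = (uint32_t)r->ax * operand;
    r->ax = (uint16_t)p;
    r->dx = (uint16_t)(p >> 16);
}

/* a arrives in dx:ax, b in cx:bx; returns dx:ax after the sequence. */
uint32_t lxmul_model(uint32_t a, uint32_t b)
{
    Regs r = { (uint16_t)a, (uint16_t)b, (uint16_t)(b >> 16),
               (uint16_t)(a >> 16), 0, 0 };
    uint16_t t;

    r.si = r.ax;                      /* mov si,ax  */
    r.di = r.dx;                      /* mov di,dx  */
    mul16(&r, r.cx);                  /* mul cx     */
    t = r.ax; r.ax = r.di; r.di = t;  /* xchg ax,di */
    mul16(&r, r.bx);                  /* mul bx     */
    r.di = (uint16_t)(r.di + r.ax);   /* add di,ax  */
    r.ax = r.si;                      /* mov ax,si  */
    mul16(&r, r.bx);                  /* mul bx     */
    r.dx = (uint16_t)(r.dx + r.di);   /* add dx,di  */

    return ((uint32_t)r.dx << 16) | r.ax;
}

/* Illustrative check against a widened 64-bit multiply. */
void check_lxmul_model(void)
{
    assert(lxmul_model(0x00010000u, 3u) == 0x00030000u);
    assert(lxmul_model(0x12345678u, 0x9ABCDEF0u)
           == (uint32_t)((uint64_t)0x12345678u * 0x9ABCDEF0u));
}
```

    Note that the sequence overwrites SI and DI, so a caller that needs
    them preserved would have to save and restore them around it.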

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

  • From anton@anton@nospicedham.mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.asm.x86 on Sat Apr 24 14:01:21 2021
    From Newsgroup: comp.lang.asm.x86

    Paul Edwards <mutazilah@nospicedham.gmail.com> writes:
    > On Saturday, April 24, 2021 at 10:51:35 AM UTC+10, wolfgang kern wrote:
    >
    > > [x8086 only]
    > > > ; multiply cx:bx by dx:ax, result in dx:ax
    > > the result of 32*32 bit doesn't fit into 32 bit.
    >
    > Good point. I didn't think of that. I can't multiply
    > 17 bits by 17 bits, one of the registers needs to
    > be 0. But I assume I need to at least overflow in
    > a predictable manner.

    The usual way is to produce the lower 32 bits of the result, i.e.,
    produce a*b mod 2^32. And thanks to the magic of 2s-complement
    arithmetic, the result is the same for unsigned multiplication and for
    signed multiplication (the results for the high 32 bits would differ,
    but you are not interested in that).
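
    A small C sketch of that property, with hypothetical helper names:
    both products are widened to 64 bits first (so the signed multiply
    cannot overflow), then split into low and high 32-bit halves. The low
    halves agree for signed and unsigned interpretation; the high halves
    need not.

```c
#include <assert.h>
#include <stdint.h>

/* Low and high 32 bits of the unsigned 64-bit product. */
uint32_t umul_low(uint32_t a, uint32_t b)  { return (uint32_t)((uint64_t)a * b); }
uint32_t umul_high(uint32_t a, uint32_t b) { return (uint32_t)(((uint64_t)a * b) >> 32); }

/* Same for the signed product, reinterpreted as a 64-bit pattern. */
uint32_t smul_low(int32_t a, int32_t b)
{
    return (uint32_t)(uint64_t)((int64_t)a * b);
}
uint32_t smul_high(int32_t a, int32_t b)
{
    return (uint32_t)((uint64_t)((int64_t)a * b) >> 32);
}

/* Illustrative check: -1 and 0xFFFFFFFF share the same 32-bit pattern. */
void check_signed_unsigned(void)
{
    assert(smul_low(-1, 2) == 0xFFFFFFFEu);
    assert(umul_low(0xFFFFFFFFu, 2u) == 0xFFFFFFFEu);  /* same low half     */
    assert(smul_high(-1, 2) == 0xFFFFFFFFu);
    assert(umul_high(0xFFFFFFFFu, 2u) == 1u);          /* different high half */
}
```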

    - anton
    --
    M. Anton Ertl  Some things have to be seen to be believed
    anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
    http://www.complang.tuwien.ac.at/anton/home.html

  • From Paul Edwards@mutazilah@nospicedham.gmail.com to comp.lang.asm.x86 on Sat Apr 24 14:00:07 2021
    From Newsgroup: comp.lang.asm.x86

    On Saturday, April 24, 2021 at 8:22:39 PM UTC+10, Terje Mathisen wrote:
    > As several have noted, the code above is missing at least one MUL!
    >
    > Please test it, then feel free to use (with or without attribution) this totally untested but reasonably efficient/short code:
    >
    > mov si,ax
    > mov di,dx
    > mul cx ;; hi * lo
    > xchg ax,di ;; First mul saved, grab org dx
    > mul bx ;; lo * hi
    > add di,ax ;; top word of result
    >
    > mov ax,si ;; retrieve original AX
    > mul bx ;; lo * lo
    > add dx,di
    >
    > At this point DX:AX has the low 32 bits of the multiplication result.

    Thanks so much!!!

    I have tested it and it works fine. I have committed the
    change, with attribution:

    https://sourceforge.net/p/pdos/gitcode/ci/master/tree/pdpclib/dossupa.asm

    BFN. Paul.
