• rep movsb vs. simpler instructions for memcpy/memmove (was: Why VAX ..)

    From Anton Ertl@21:1/5 to Michael S on Wed Mar 12 16:46:36 2025
    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 12 Mar 2025 11:28:36 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
    My experiments were with the code in
    <https://github.com/AntonErtl/move/>.

    None of those are the simple loops that I mentioned above.

    They are not. If you want short code, rep movsb is unbeatable (for
    memmove(), you have to do a little more, however).
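    To illustrate the "little more" that memmove() needs, here is a minimal
    sketch in C with GCC/Clang inline assembly. It is not the code from the
    repository linked above; the names repmovsb_memcpy and repmovsb_memmove
    are made up for this example, and the byte-loop fallback for non-x86-64
    targets is an assumption added so that the sketch compiles everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Forward copy of n bytes. On x86-64 with GCC/Clang this is one rep movsb
   (RDI = destination, RSI = source, RCX = count); elsewhere a plain byte
   loop stands in for it. */
static void *repmovsb_memcpy(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}

/* memmove must also handle overlap. A forward copy is only unsafe when
   dst lies strictly inside (src, src+n); in that case copy backwards. */
static void *repmovsb_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    if (n == 0 || d == s)
        return dst;
    /* Unsigned subtraction wraps when d < s, so this single comparison
       is true for both "dst below src" and "dst at or beyond src+n". */
    if ((uintptr_t)d - (uintptr_t)s >= n)
        return repmovsb_memcpy(dst, src, n);
#if defined(__x86_64__) && defined(__GNUC__)
    /* Start at the last byte, set the direction flag, and restore it
       afterwards, because the ABI expects DF to be clear. */
    d += n - 1;
    s += n - 1;
    __asm__ volatile("std\n\trep movsb\n\tcld"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory", "cc");
#else
    while (n--)
        d[n] = s[n];
#endif
    return dst;
}
```

    Calling repmovsb_memmove(buf + 2, buf, 8) on a buffer holding
    "abcdefghij" takes the backward path and leaves "ababcdefgh" in it.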

    I posted performance results in
    <2017Sep19.082137@mips.complang.tuwien.ac.at>
    <2017Sep20.184358@mips.complang.tuwien.ac.at>
    <2017Sep23.174313@mips.complang.tuwien.ac.at>

    My routines were generally faster than rep movsb, except for pretty
    large blocks (16KB).


    Idiots from corporate IT blocked http://al.howardknight.net/

    I sympathize. At my workplace, Usenet is blocked (probably
    unintentionally), so I have to post from home.

    So, link to google groups

    Sorry, I cannot provide that service. Trying to access
    groups.google.com tells me:

    |Couldn’t sign you in
    |
    |The browser you’re using doesn’t support JavaScript, or has JavaScript
    |turned off.
    |
    |To keep your Google Account secure, try signing in on a browser that
    |has JavaScript turned on.

    I certainly won't turn on JavaScript for Google, and apparently Google
    wants me to log in to a Google account to access groups.google.com. I
    don't have a Google account and I don't want one.

    But all I would do is try whether google groups finds the message-ids.
    You can do that yourself.

    or, if posts are relatively recent, to
    https://www.novabbs.com/devel/thread.php?group=comp.arch
    would be helpful.

    The posts are from 2017; these message-ids are not randomly generated.

    I don't know why gnu memcpy is huge. I don't even know if it is
    really *that* huge. But several KB is a number that I have seen
    stated by other people.

    I stated in one of these messages that I have seen an 11KB memmove in
    glibc. Let's see:

    objdump -t /debian8/usr/lib/x86_64-linux-gnu/libc.a|grep .text|grep 'memmove'
    00000000000001a0 g i .text 0000000000000047 __libc_memmove
    0000000000000000 g F .text 000000000000019f __memmove_sse2
    00000000000001a0 g i .text 0000000000000047 memmove
    0000000000000000 g F .text.ssse3 0000000000000009 __memmove_chk_ssse3
    0000000000000010 g F .text.ssse3 0000000000002b67 __memmove_ssse3
    0000000000000000 g F .text.ssse3 0000000000000009 __memmove_chk_ssse3_back
    0000000000000010 g F .text.ssse3 0000000000002b06 __memmove_ssse3_back
    ...

    Yes, 11111 bytes for __memmove_ssse3. Debian 8 is one of the systems
    I used at the time.

    Let's see how it looks in Debian 12:

    objdump -t /usr/lib/x86_64-linux-gnu/libc.a|grep .text|grep 'memmove'|grep -v wmemmove
    0000000000000000 l F .text 00000000000000f6 __libc_memmove_ifunc
    0000000000000000 g i .text 00000000000000f6 __libc_memmove
    0000000000000000 g i .text 00000000000000f6 memmove
    0000000000000010 g F .text.avx 000000000000002f __memmove_avx_unaligned
    0000000000000080 g F .text.avx 00000000000006de __memmove_avx_unaligned_erms
    0000000000000010 g F .text.avx.rtm 000000000000002d __memmove_avx_unaligned_rtm
    0000000000000080 g F .text.avx.rtm 00000000000006df __memmove_avx_unaligned_erms_rtm
    0000000000000020 g F .text.avx512 0000000000000009 __memmove_chk_avx512_no_vzeroupper
    0000000000000030 g F .text.avx512 000000000000073b __memmove_avx512_no_vzeroupper
    0000000000000010 g F .text.evex512 0000000000000037 __memmove_avx512_unaligned
    0000000000000080 g F .text.evex512 00000000000007a0 __memmove_avx512_unaligned_erms
    0000000000000020 g F .text 0000000000000009 __memmove_chk_erms
    0000000000000030 g F .text 000000000000002d __memmove_erms
    0000000000000010 g F .text.evex 0000000000000034 __memmove_evex_unaligned
    0000000000000080 g F .text.evex 00000000000007bb __memmove_evex_unaligned_erms
    0000000000000010 g F .text 0000000000000028 __memmove_sse2_unaligned
    0000000000000080 g F .text 0000000000000552 __memmove_sse2_unaligned_erms
    0000000000000040 g F .text.ssse3 0000000000000f3d __memmove_ssse3
    0000000000000000 g F .text 000000000000000e __memmove_chk

    So __memmove_ssse3 is no longer that big ("only" 3901 bytes); it's
    still the biggest implementation, but many others are quite a bit
    bigger than the 0x113=275 bytes of my ssememmove.
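    The byte counts quoted above are the hexadecimal size column of the
    objdump -t output read as decimal (0x2b67 = 11111, 0xf3d = 3901,
    0x113 = 275). A small C sketch of that conversion follows; the function
    name symbol_size is hypothetical, and it assumes that the size and the
    symbol name are the last two whitespace-separated fields of a line, as
    they are in the listings above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Extract the size (hex field before the name) and the symbol name from
   one line of `objdump -t` output. Returns 1 on success, 0 if the line
   is too long, has fewer than two fields, or the size is not hex. */
static int symbol_size(const char *line, char *name, size_t name_cap,
                       unsigned long *size)
{
    char buf[256];
    const char *prev = NULL, *last = NULL;
    char *tok, *end;

    assert(name != NULL && size != NULL && name_cap > 0);
    if (strlen(line) >= sizeof buf)
        return 0;
    strcpy(buf, line);

    /* Walk all fields, remembering the last two. */
    for (tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        prev = last;
        last = tok;
    }
    if (prev == NULL || strlen(last) >= name_cap)
        return 0;

    *size = strtoul(prev, &end, 16);
    if (end == prev || *end != '\0')
        return 0;
    strcpy(name, last);
    return 1;
}
```

    Fed the Debian 8 line for __memmove_ssse3, this yields a size of 11111;
    fed the Debian 12 line for the same symbol, it yields 3901.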

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Anton Ertl on Thu Mar 13 23:06:02 2025
    On Wed, 12 Mar 2025 16:46:36 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 12 Mar 2025 11:28:36 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
    My experiments were with the code in
    <https://github.com/AntonErtl/move/>.

    None of those are the simple loops that I mentioned above.

    They are not. If you want short code, rep movsb is unbeatable (for memmove(), you have to do a little more, however).

    I posted performance results in
    <2017Sep19.082137@mips.complang.tuwien.ac.at>
    <2017Sep20.184358@mips.complang.tuwien.ac.at>
    <2017Sep23.174313@mips.complang.tuwien.ac.at>

    My routines were generally faster than rep movsb, except for pretty
    large blocks (16KB).


    Idiots from corporate IT blocked http://al.howardknight.net/

    I sympathize. At my workplace, Usenet is blocked (probably
    unintentionally), so I have to post from home.

    So, link to google groups

    Sorry, I cannot provide that service. Trying to access
    groups.google.com tells me:

    |Couldn’t sign you in
    |
    |The browser you’re using doesn’t support JavaScript, or has JavaScript
    |turned off.
    |
    |To keep your Google Account secure, try signing in on a browser that
    |has JavaScript turned on.

    I certainly won't turn on JavaScript for Google, and apparently Google
    wants me to log in to a Google account to access groups.google.com. I
    don't have a Google account and I don't want one.


    For me it works fine without login. But not without JS.
    For those who are willing to use JS, the link: https://groups.google.com/g/comp.arch/c/ULvFgEM_ZSY/m/ysPySToGAwAJ

    But all I would do is try whether google groups finds the message-ids.
    You can do that yourself.


    GG only searches by content. It appears to have no idea about
    message-ids.

    or, if posts are relatively recent, to
    https://www.novabbs.com/devel/thread.php?group=comp.arch
    would be helpful.

    The posts are from 2017; these message-ids are not randomly generated.


    Then GG is the only place to find it that I am aware of.
    http://al.howardknight.net helped me to see the start of the message,
    but not the full message.
    And eternal-september is still struggling with restoration of its
    archives after the crash of 9 months ago. More and more it looks like
    they will never be restored.


  • From Anton Ertl@21:1/5 to Michael S on Fri Mar 14 12:43:27 2025
    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 12 Mar 2025 16:46:36 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 12 Mar 2025 11:28:36 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
    My experiments were with the code in
    <https://github.com/AntonErtl/move/>.
    ...
    I posted performance results in
    <2017Sep19.082137@mips.complang.tuwien.ac.at>
    <2017Sep20.184358@mips.complang.tuwien.ac.at>
    <2017Sep23.174313@mips.complang.tuwien.ac.at>
    ...
    http://al.howardknight.net helped me to see the start of the message,
    but not the full message.

    That's deplorable. The postings with the second and third Message-Id
    are delivered from http://al.howardknight.net in full. For
    <2017Sep19.082137@mips.complang.tuwien.ac.at>, the remaining parts
    (including a few lines still shown by http://al.howardknight.net) are:

    |K8 (Athlon 64 X2 4400+), glibc 2.3.6
    |    1     8    32    64   128   256   512    1K    2K    4K    8K   16K block size
    |   21    28    54    90   162   307   595  1171  2325  4632  9244 18467 repmovsb
    |   17    40    69    80   104   161   253   433   794  1514  2955  5836 memmove
    |   24    31    57    82    98   129   199   323   570  1064  2053  4032 memcpy
    |   21    28    53    87   155   292   566  1113  2206  4394  8768 17516 repmovsb aligned
    |   17    40    33    37    46    68   118   234   451   834  1635  3237 memmove aligned
    |   24    31    56    45    54    72   120   193   338   627  1207  2367 memcpy aligned
    |   17    27    53    89   161   306   594  1171  2325  4629  9248 18461 repmovsb blksz-1
    |   17    37    61    81   105   152   251   433   792  1513  2952  5825 memmove blksz-1
    |   20    30    56    83   100   130   202   325   572  1067  2054  4030 memcpy blksz-1
    |
    |K10 (Phenom II X2 560), glibc 2.19
    |    1     8    32    64   128   256   512    1K    2K    4K    8K   16K block size
    |   15    22    48    84   157   309   566  1080  2107  4161  8270 16487 repmovsb
    |   16    35    56    69   104   152   262   456   839  1604  3135  6201 memmove
    |   16    19    13    19    31    68   114   226   408   774  1505  2968 memcpy
    |   14    21    48    85   158   122   154   219   348   606  1122  2155 repmovsb aligned
    |   16    39    35    38    46    63    95   190   364   664  1268  2583 memmove aligned
    |   19    21    13    20    25    56    89   177   306   566  1084  2121 memcpy aligned
    |   14    21    47    83   155   300   565  1079  2106  4160  8269 16487 repmovsb blksz-1
    |   17    32    55    68    91   156   261   454   837  1602  3131  6190 memmove blksz-1
    |   17    23    13    18    30    69   114   228   411   774  1508  2966 memcpy blksz-1
    |
    |Zen (Ryzen 5 1600X), glibc 2.24
    |    1     8    32    64   128   256   512    1K    2K    4K    8K   16K block size
    |   25    33    57   105   110   119   140   184   321   599  1160  2324 repmovsb
    |   13    14    13    14    30    42    65   107   175   325   600  1222 memmove
    |   10    10    11    12    30    43    67   113   185   329   604  1226 memcpy
    |   25    33    57    83    87    95   111   143   207   335   594  1136 repmovsb aligned
    |   12    13    12    13    16    24    40    72   136   264   536  1094 memmove aligned
    |   11    11    12    11    21    27    42    74   139   267   541  1092 memcpy aligned
    |   23    32    56    90   110   120   140   184   321   600  1160  2324 repmovsb blksz-1
    |   13    13    14    13    30    42    67   108   176   325   599  1219 memmove blksz-1
    |   10    10    11    12    31    43    67   113   185   331   604  1221 memcpy blksz-1
    |
    |Zen (Ryzen 5 1600X), glibc 2.3.6 (-static)
    |    1     8    32    64   128   256   512    1K    2K    4K    8K   16K block size
    |   25    32    56   106   111   119   140   184   321   600  1161  2334 repmovsb
    |   10    18    29    36    49    77   132   263   501   940  1816  3581 memmove
    |   26    34    59    80    88   102   133   198   342   599  1114  2182 memcpy
    |   25    33    56    85    89    97   113   145   209   337   595  1145 repmovsb aligned
    |   10    18    20    19    24    40    72   137   286   542  1054  2110 memmove aligned
    |   26    34    59    50    55    70   100   165   311   567  1079  2126 memcpy aligned
    |   22    32    56    90   111   119   142   184   321   600  1161  2338 repmovsb blksz-1
    |    8    16    29    36    49    76   131   261   499   938  1814  3582 memmove blksz-1
    |   24    33    58    82    88   101   134   198   345   602  1117  2184 memcpy blksz-1
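
    One way to read these tables is as ratios between rows. The C sketch
    below copies the two unaligned rows for Zen with glibc 2.24 from the
    table above and divides them column by column. The excerpt does not
    state the unit of the figures, so only the unitless ratio is computed;
    the array and function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NSIZES 12   /* block sizes 1, 8, 32, ..., 16K as in the table header */

/* Zen (Ryzen 5 1600X), glibc 2.24, unaligned rows, copied from the table. */
static const int zen_repmovsb[NSIZES] =
    {25, 33, 57, 105, 110, 119, 140, 184, 321, 599, 1160, 2324};
static const int zen_memmove[NSIZES] =
    {13, 14, 13, 14, 30, 42, 65, 107, 175, 325, 600, 1222};

/* Ratio a[i] / b[i] for column i; values above 1 mean row a is slower. */
static double ratio_at(const int *a, const int *b, size_t i)
{
    assert(i < NSIZES && b[i] != 0);
    return (double)a[i] / (double)b[i];
}
```

    For the 64-byte column (index 3) the ratio repmovsb/memmove is
    105/14 = 7.5; for the 16K column (index 11) it is 2324/1222, roughly 1.9.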

    And eternal-september is still struggling with restoration of its
    archives after the crash of 9 months ago. More and more it looks like
    they will never be restored.

    My impression pretty soon after the event was that it would not
    happen. Given that they lost the mapping message-id <-> article
    number, the insertion of the old messages would have been disruptive
    to clients that work with the article number. It was bad enough as it
    was.

    - anton
