Forum: Too Lazy BBS

What's purpose of "gather" instructions?

From Branimir Maksimovic@branimir.maksimovic@nospicedham.gmail.com to comp.lang.asm.x86 on Thu May 27 10:37:25 2021

From Newsgroup: comp.lang.asm.x86

I tried them recently and they are slow, slow,
slower than loading manually ;)
I mean, like the "loop" instruction: useless ;)
--
current job title: senior software engineer
skills: x86 assembler,c++,c,rust,go,nim,haskell...

press any key to continue or any other to quit...

--- Synchronet 3.21d-Linux NewsLink 1.2

From Terje Mathisen@terje.mathisen@nospicedham.tmsw.no to comp.lang.asm.x86 on Thu May 27 16:23:59 2021

From Newsgroup: comp.lang.asm.x86

Branimir Maksimovic wrote:

> I tried them recently and they are slow, slow,
> slower than loading manually ;)
> I mean, like the "loop" instruction: useless ;)

Gather is supposed to run at minimum one word per cycle, but preferably
all loads that come from the same cache line should happen in a single
cycle, so that looking up stuff in a compact structure should be
reasonably fast, and much faster than scalar loads.
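
To make the cache-line point concrete, here is a minimal C sketch (not from the original post) of what a 32-bit gather computes, plus a helper that counts how many distinct cache lines a set of indices touches. The function names, the 64-byte line size, and the assumption that the table starts on a line boundary are all illustrative. By the reasoning above, fewer lines touched should mean fewer cycles for a well-implemented gather.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, and a table that starts on a line boundary. */
#define CACHE_LINE_BYTES 64u

/* Scalar reference for a 32-bit gather: dst[i] = base[idx[i]]. */
static void gather_u32_scalar(uint32_t *dst, const uint32_t *base,
                              const uint32_t *idx, size_t n)
{
    assert(dst != NULL && base != NULL && idx != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = base[idx[i]];
}

/* Cache line (relative to the start of the table) that element idx falls in. */
static size_t line_of(uint32_t idx)
{
    return ((size_t)idx * sizeof(uint32_t)) / CACHE_LINE_BYTES;
}

/* Number of distinct cache lines touched by the n indices.
   Quadratic scan; fine for vector-width n such as 8 or 16. */
static size_t gather_lines_touched(const uint32_t *idx, size_t n)
{
    size_t lines = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (line_of(idx[j]) == line_of(idx[i])) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            lines++;
    }
    return lines;
}
```

With indices 0..7 all eight elements sit in a single line; with indices 0, 16, 32, ... every element lands in a line of its own, which is the case where a gather has the least to offer over scalar loads.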

The first Larrabee CPU had gather implemented in an external chip, so it
was effectively a coprocessor. The idea was that you would set up a bunch
of these as part of a big processing loop, then stream the results through.

I.e. typical GPU optimizing for bandwidth, not latency.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From anton@anton@nospicedham.mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.asm.x86 on Thu May 27 14:56:51 2021

From Newsgroup: comp.lang.asm.x86

Branimir Maksimovic <branimir.maksimovic@nospicedham.gmail.com> writes:

> I tried them recently and they are slow, slow,
> slower than loading manually ;)
> I mean, like the "loop" instruction: useless ;)

Possible explanations:

1) An instruction set designer thought that this could be implemented
   better than by using scalar loads, but

   a) the hardware designers did not get around to it.
   b) the hardware designers tried, but the result was buggy, and was
      disabled in delivered hardware.

   Still, there is a slight benefit to having these instructions: If
   there ever is a useful hardware implementation, software people can
   use it in the knowledge that their code will at least run on a
   variety of hardware (some may have a switch between using gather
   instructions and scalar code, but not everyone can afford
   development time for all CPU variations).

2) The instruction already worked better than the scalar code in the
   Xeon Phi (I dimly remember reading something like that, although
   looking at the cycle numbers I found the claim questionable), and
   was added to other CPUs to support software that uses the
   instruction. The problem with this theory is that Xeon Phi
   supports (a variant of) AVX-512, but the Haswell and Skylake
   (client) support only AVX2.
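
The "switch between using gather instructions and scalar code" mentioned under 1) could look roughly like the C sketch below. It is not from the original post. It assumes GCC or Clang on x86 (for the `target` attribute and `__builtin_cpu_supports`) and falls back to the scalar loop everywhere else. The names `gather_fn`, `gather_scalar`, `gather_avx2`, `select_gather` and the `prefer_gather` flag are hypothetical; the flag stands in for whatever measurement tells you that gather is actually faster on the machine at hand.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

typedef void (*gather_fn)(int32_t *dst, const int32_t *base,
                          const int32_t *idx, size_t n);

/* Portable path: one scalar load per element. */
static void gather_scalar(int32_t *dst, const int32_t *base,
                          const int32_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = base[idx[i]];
}

#ifdef HAVE_X86_DISPATCH
/* AVX2 path: eight 32-bit elements per VPGATHERDD, scalar loop for the tail.
   The target attribute lets this compile without -mavx2 on the command line. */
static __attribute__((target("avx2")))
void gather_avx2(int32_t *dst, const int32_t *base,
                 const int32_t *idx, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i vi = _mm256_loadu_si256((const __m256i *)(idx + i));
        __m256i v  = _mm256_i32gather_epi32((const int *)base, vi, 4);
        _mm256_storeu_si256((__m256i *)(dst + i), v);
    }
    for (; i < n; i++)
        dst[i] = base[idx[i]];
}
#endif

/* Pick an implementation once, then call through the returned pointer.
   prefer_gather is a caller-supplied hint (hypothetical), e.g. from a benchmark. */
static gather_fn select_gather(int prefer_gather)
{
#ifdef HAVE_X86_DISPATCH
    if (prefer_gather && __builtin_cpu_supports("avx2"))
        return gather_avx2;
#endif
    (void)prefer_gather;
    return gather_scalar;
}
```

Both paths compute the same result, so code written against `select_gather` runs everywhere; whether passing a nonzero `prefer_gather` pays off is exactly the per-CPU question this thread is about.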

- anton
--
M. Anton Ertl  Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at  Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html
