Forum: Too Lazy BBS

Fwd: Fast lookup of 234 possible 1/2/3 character codes

From Robert Prins@robert@nospicedham.prino.org to comp.lang.asm.x86 on Mon Mar 20 20:58:39 2023

From Newsgroup: comp.lang.asm.x86

On 2023-03-09 15:28, Robert Prins wrote:

On 2023-03-09 10:19, Terje Mathisen wrote:

This is much better!

Thank you!

I still think you should pay more attention to the cost of branches, i.e. a >> sequential scan of a shortish segment of the country array is very cheap, on
the order of 2-3 clock cycles/country, while a single mispredicted branch cost
8-20 cycles depending upon the CPU you are using.

An AMD FX8150 and an Intel i7-4710MQ.

Assuming this really is so important, why not "waste" a bit more lookup table
RAM for a full 2-char index?

It's Utterly Unimportant (with two huge "U"s), the program as-is runs in 0.32
seconds clock-time, but it's just interesting to see how far I can go in pure
Pascal, to translate that subsequently into in-line assembler, where I'm in many
cases harking back to Turbo Pascal 3 times, by coding post-Pentium instructions
as "db" sequences, the original code dates back to 1994, and pretty amazingly,
many of my original record structures are very AVX friendly!

Anyway, given that my ['A'..'Z'] array had one position open, I've used that to
replace the general last-matched country with a per-initial character last-matched country, and that had cut the chop-count even more:

Original full range-a-a-a-a-a-a-a-a-a-a-a-a-a-a-a-a : 40,362
Low/high per initial character (pic): 14,907
As previous + first most used pic-a-a :-a 2,871
As previous + cache pic-a-a-a-a-a-a-a-a-a-a-a-a :-a 1,747

And now I'm going to think slightly outside the box, but will have to knock up a
bit of REXX to actually test this, by what might be counter-intuitive, upping
the low and/or high value of some ranges into the previous/next (or, unlikely,
but never say never, pre-previous or over-next) range, so that an initial first
lookup immediately finds the second most-used country, without generating an excessive number of additional lookups for even less used countries.

E.g. for countries starting with "L" I currently only have three, in order of
use, "LT", "LV" (74 lookups), and "L" (45 lookups), both requiring three chops
for a total of 357 chops. If I were to extend the "L" range into the "M" range
so that a first chop returns "LV" I save 148 chops, which means that even if I
need up to three more chops for "L", I still save time, and for this specific
case, extending the "Lx" range to "MEX" gives me single-chop hits on "LV" and
still the same three-chop hits on "L", so reducing the number of chops to 209.
In this specific case, I could go even further, and take the lo-range back into
the "K" countries and get to "L" in just two chops.

Doing the same for the also easy to try three additional (after "GB") "G*" countries saves me another 96 chops if I put "GR", the second most used, in the
initial middle of the range (by extending the high value to "IR"), without affecting "GE" and "GH"chop counts.

I'm going to see how this works out before doing anything else, as this change
would be easy to implement in Pascal and in my PL/I version of the same on z/OS,
I keep the two and they should give the same output, if they don't I know that
I've screwed up somewhere!

And the optimal for the current (and probably near future) set of data is now

1,173

I did write some REXX to scan both the input data and data that was passed into
the "srch_cnty" routine. I then extended the chop intervals by the length-of-the-interval + 20 on both the left and right (obviously not going below 0 or above 234) and let that run, marking the lowest number of chops and shortest intervals (not less than zero) and that gave me the above number, just
over 2.9% of the original.

I'm going to leave it at that for now. The new intervals will, as I said, unlikely change a lot in the near future, I may encounter a few more nationalities of drivers and visit a few more countries, but in small numbers they won't materially affect the current chopping intervals.

This is what I got out for the per interval scans:

+++ -1- min max 1/2
// 'A' 15 A A AFG AZ 8 (ARK)
// 'B' 18 B B AZ D 32 (BY ) -2-
// 'C' 15 CZ <> CH C DJI 42 (CO )
// 'D' 6 D D CYM EAK 51 (DK ) -2-
// 'E' 12 E E DZ GB 64 (EST) -2-
// 'F' 6 F F FIN FSM 70 (FL ) -2-
// 'G' 14 GB GB G IR 84 (GR ) -2-
// 'H' 5 H <> HR H IR 91 (HR )
// 'I' 7 I <> IL IL J 96 (IRL) -2-
// 'J' 2 J J JA JA 100 (JA )
// 'K' 10 KZ KZ IRQ KZ 103 (KGZ)
// 'L' 7 LT <> L KS MNE 117 (LV ) -2-
// 'M' 15 MA MAL M NA 126 (MK )
// 'N' 11 NL NL LS PNG 133 (N ) -2-
// 'O' 1 OM OM 144 (OM )
// 'P' 11 PL <> PK NZ PY 149 (PK )
// 'Q' 1 Q Q 156 (Q )
// 'R' 24 RO <> RL RA SGP 171 (RMM)
// 'S' 19 S S RMM T 185 (SF ) -2-
// 'T' 12 TR TR T USA 204 (TJ )
// 'U' 5 UA <> USA UA V 214 (USA) -2-
// 'V' 3 VN VN V VU 218 (VN )
// 'W' 9 WAN WAN UZ WV 222 (WAN)
// 'X'
// 'Y' 3 YU YU YAR YV 230 (YU )
// 'Z' 3 ZA ZA Z ZW 233 (ZA )

In other words, if it's not the last scanned value (-1-), I next try the most frequent (+++) value, and after that I scan the sometimes longer interval that is created to (mostly) find the second most occurring value on the first chop, and in some cases even the third most on the second chop, manually added the I-know-these as "-2-" above, and obviously a last found value is saved in the "-1-" position.

And yes, now that I have the code, it wouldn't be too hard to adapt it to actually only include the combination of 95 nationalities of drivers and countries I have visited, and reduce the chop-count even more. Obviously it is highly unlikely to ever beat a look-up table, but the advantage of this is that
it's easily portable between x86 (little endian) and z/OS (big endian).

Robert
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather - https://prino.neocities.org/
Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

--- Synchronet 3.21d-Linux NewsLink 1.2

Who's Online
Recent Visitors
- Geek2
  Thu Jul 2 11:41:05 2026
  from Euclid, Oh via Telnet
- Hannibal
  Thu Jul 2 05:49:27 2026
  from Des Moines via SSH
- Geek2
  Wed Jul 1 16:31:20 2026
  from Euclid, Oh via Telnet
- Hannibal
  Tue Jun 30 16:45:42 2026
  from Des Moines via SSH

System Info

Sysop:	Amessyroom
Location:	Fayetteville, NC
Users:	70
Nodes:	6 (0 / 6)
Uptime:	38:05:33
Calls:	948
Calls today:	2
Files:	1,325
Messages:	280,561

Fwd: Fast lookup of 234 possible 1/2/3 character codes

Who's Online

Recent Visitors

System Info