From Newsgroup: alt.lang.asm
On 2025-08-18 07:47, R.Wieser wrote:
Robert,
There's the obvious, from the existing code:
[snip code]
You're a bit light on information, so I have to make some assumptions :
"il_dloc" is a constant, holding the value 63
Your current strings all have the length 63 (in their first byte)
I read a full line, use 2 YMM moves to move the 63(+1) characters into the string, set string[0] to 63, and scan. Empty strings are not possible; the shortest ones are 4 characters.
Danger, Will Robinson, danger!
Assume an empty string, padded with 63 spaces. ECX will count down to zero, and the loop's check will exit only because you are lucky that the string-length byte is *not* 32 (the character code of a space).
Just imagine /someone/ has stored 32 spaces (padded with another 31 spaces) and correctly set the string's length. Yep, the string-length byte would look like another space, causing ECX to underflow and wrap around. (Don't say never, as Murphy's law tries to tell us. :-) )
IOW, for code less likely to bomb, you also need to check for ECX underflowing (becoming less than one).
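To make the failure mode concrete, here is a minimal sketch in C rather than assembler. It assumes the layout described above: a 64-byte Pascal-style string with the length in byte 0 and the text, padded with spaces, in bytes 1..63. The names trimmed_len and fill_padded are hypothetical and are not taken from the snipped code. The index i plays the role of ECX, and the "i >= 1" test is the underflow guard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build a 64-byte Pascal-style string.
 * s[0] is the length byte, s[1..63] the text, padded with spaces. */
static void fill_padded(unsigned char *s, const char *text, unsigned declared_len)
{
    size_t n = strlen(text);
    assert(n <= 63 && declared_len <= 63);
    memset(s + 1, ' ', 63);
    memcpy(s + 1, text, n);
    s[0] = (unsigned char)declared_len;
}

/* Hypothetical helper: length of the string without trailing blanks.
 * Without the "i >= 1" guard, an all-blank string whose length byte
 * happens to be 32 (' ') would let the scan run past s[0]. */
static size_t trimmed_len(const unsigned char *s)
{
    size_t i = s[0];                 /* index of the last declared character */
    while (i >= 1 && s[i] == ' ')    /* guard first, then compare */
        --i;
    return i;
}
```

With the guard in place the all-blank case simply ends at zero, whatever value the length byte holds.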
The program is processing my own data, and there might be a handful of others using it. A few weeks ago, for the first time in a couple of years, someone asked for a copy, but still hasn't used it. It's written (nominally) in Virtual Pascal, but nowadays probably well over 90% is inline assembler, including significant use of post-Pentium (MMX, SSEx and even AVX) instructions. Source can be found at <https://prino.neocities.org/miscellaneous/hitchtech.html>, in lift32bit.rar
-- part #2
Obviously I could just look for four trailing blanks (EAX), add 3 to ECX on non-blank, look for two TBs (AX), and then for 1 TB (AL), but is there anything cleverer?
Not that I know of.
Other than getting rid of that "add 3 to ecx on non-blank", that is: assume that ECX points to the /last/ character to check (start with il_dloc + 4), and check it and the three chars before it.
Don't forget to check for ECX underflow.
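The four, then two, then one scheme can be modelled in C as below. This is only a sketch under the same assumed layout (length in byte 0, text in bytes 1..63), and trim_by_words is a hypothetical name. Here "end" stands in for ECX and always indexes the last character still to be checked, so no "add 3" correction is needed; the "end >= 4", "end >= 2" and "end >= 1" tests are the underflow checks. In assembler the three compares would be against EAX, AX and AL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: trimmed length of a Pascal-style string,
 * stripping trailing blanks 4, then 2, then 1 byte at a time. */
static unsigned trim_by_words(const unsigned char *s)
{
    unsigned end;
    uint32_t four;
    uint16_t two;

    assert(s != NULL);
    end = s[0];                          /* last character still to check */

    /* Dword step: compare s[end-3..end] with four blanks at once. */
    while (end >= 4) {
        memcpy(&four, s + end - 3, 4);   /* unaligned-safe load */
        if (four != 0x20202020u)
            break;
        end -= 4;
    }
    /* Word step: at most three trailing blanks can be left here. */
    if (end >= 2) {
        memcpy(&two, s + end - 1, 2);
        if (two == 0x2020u)
            end -= 2;
    }
    /* Byte step. */
    if (end >= 1 && s[end] == ' ')
        end -= 1;
    return end;
}
```

Because every byte of the comparison constants is 0x20, the result does not depend on byte order. Whether this beats a plain byte loop on strings this short is something only measurement can tell.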
Though if speed is the target, you could take a look at "scasd" (moving "backwards" over the string), followed by a "scasw" and "scasb". More
setup (and teardown) needed, but /possibly/ faster in execution.
I think the overhead of SCASx is way too high for such short strings.
FWIW, the program runs in less than 0.5 seconds in the assembler-ised version and in 0.75 seconds in the 99.9% pure Pascal version; speeding it up (ha, ha, ha) is just something to keep my mind engaged.
Robert
PS: And no, given this reply by a long-time experienced Pascal user, the pure Pascal version will not work in FPC, and I've never tried to compile it in Delphi 6 or the old free TurboDelphi. Then again, neither of them would ever be able to perform the equivalent of my manual optimisations.
--
Robert AH Prins
robert(a)prino(d)org
The hitchhiking grandfather -
https://prino.neocities.org/
Some REXX code for use on z/OS -
https://prino.neocities.org/zOS/zOS-Tools.html