• Re: Reverse SCAN SPLIT

    From albert@spenarnc.xs4all.nl@21:1/5 to dxforth@gmail.com on Sat Oct 19 13:36:37 2024
    In article <fea4967aee99bd0d235a21fbd6253e167b5690a9@i2pn2.org>,
    dxf <dxforth@gmail.com> wrote:
    On 17/10/2024 9:48 pm, albert@spenarnc.xs4all.nl wrote:
    In article <73ec4c8359439c78d77d4fce31fc50b2@www.novabbs.com>,
    mhx <mhx@iae.nl> wrote:
    On Thu, 17 Oct 2024 8:28:26 +0000, albert@spenarnc.xs4all.nl wrote:

    In article <nnd$231969a2$24a04042@87f25e33f755b9dd>,
    [..]
    Compare to what I'm doing. Promoting the actual API specification
    so that you can decide whether you want to actually use it.

    $/


    STACKEFFECT: sc c --- sc1 sc2

    DESCRIPTION: []

    Find the first c in the string constant sc and split it at that
    address. Return the strings after and before c into sc1 and sc2
    respectively. If the character is not present sc1 is a null string
    (its address is zero) and sc2 is the original string. Both sc1 and
    sc2 may be empty strings (i.e. their count is zero), if c is the
    last or first character in sc .

    Wil Baden chose to keep c in sc2. Do you have a reason to
    remove it?

    It seems logical to remove it. I normally use lots of
    `1 /STRING' and `-LEADING' or `-TRAILING' sequences in further
    processing of Split-At-Char results, but not always.
    Maybe because an empty sc2 is less informative than an sc2 of
    size 1?

    In the rare case that you want the delimiter :
    "orang utan" BL $/
    ( "utan" "orang" )
    you simply do
    1+
    ( "utan" "orang " )

    Applying 1+ is not foolproof in the case of empty sc2.
    Realize that
    " " BL $/ results in (OKAY)
    "" ""
    "" BL $/ results in (this goes wrong)
    0.0 ""

    You mean if the character is missing, resulting in a sc1
    that is a null-string.
    If the character is present 1+ works all the time.

    If the character is not present, you shouldn't do that, right.
    In that case sc1 is a null string, (not merely empty) and
    you should test sc1 first.
    That could happen if you split a Linux file on linefeeds and the
    last linefeed is missing. Normally I would do
    BEGIN ^J $/ TYPE CR OVER 0= UNTIL
    Or splitting on $A.
    BEGIN $A $/ TYPE $A EMIT OVER 0= UNTIL

    Groetjes Albert
    --
    Temu exploits Christians: (Disclaimer, only 10 apostles)
    Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
    Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
    And Gifts For Friends Family And Colleagues.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ahmed@21:1/5 to All on Mon Oct 7 10:03:28 2024
    And with 00 for hours and minutes when they are absent

    : :t ( add cnt -- add 2 1 | add1 2 add2 2 2 | add1 2 add2 2 add3 2 3)
    0 -rot bounds dup >r swap do
    i c@ [char] : = if 1+ i 1+ 2 rot then
    1 -loop 1+ r> 2 rot ;

    : .t ( n --)
    case
    1 of ." 00 hrs" space ." 00 min" space type space ." sec" endof
    2 of ." 00 hrs" space type space ." min" space type space ." sec"
    endof
    3 of type space ." hrs" space type space ." min" space type space
    ." sec" endof
    endcase ;


    s" 10:20:30" :t .t 10 hrs 20 min 30 sec ok
    s" 20:30" :t .t 00 hrs 20 min 30 sec ok
    s" 30" :t .t 00 hrs 00 min 30 sec ok

    Ahmed

  • From Ahmed@21:1/5 to All on Mon Oct 7 09:55:34 2024
    What about this:

    : :t ( add cnt -- add 2 1 | add1 2 add2 2 2 | add1 2 add2 2 add3 2 3)
    0 -rot bounds dup >r swap do
    i c@ [char] : = if 1+ i 1+ 2 rot then
    1 -loop 1+ r> 2 rot ;

    : .t ( n --)
    case
    1 of type space ." sec" endof
    2 of type space ." min" space type space ." sec" endof
    3 of type space ." hrs" space type space ." min" space type space
    ." sec" endof
    endcase ;


    s" 10:20:30" :t .t 10 hrs 20 min 30 sec
    s" 20:30" :t .t 20 min 30 sec
    s" 30" :t .t 30 sec

    Ahmed

  • From Ahmed@21:1/5 to dxf on Mon Oct 7 19:25:20 2024
    On Mon, 7 Oct 2024 12:07:16 +0000, dxf wrote:

    ..

    Interesting. I'd do the numeric conversion in the main routine if
    possible.
    There's a parsing issue with s" :30"

    And what about this:


    : :t ( add cnt -- add 2 1 | add1 2 add2 2 2 | add1 2 add2 2 add3 2 3)
    bounds ( end start)
    dup ( end start start)
    >r ( end start ) ( r: start)
    swap ( start end ) ( r: start)
    dup ( start end pa)
    -rot ( pa start end )
    do ( pa)
    i ( pa add)
    c@ ( pa c)
    [char] : = ( pa f)
    if ( pa)
    i ( pa add)
    - ( pa-add)
    dup ( pa-add pa-add)
    2 ( pa-add pa-add 2)
    > ( pa-add t|f)
    if ( pa-add)
    drop ( )
    i ( add)
    dup ( add add)
    1+ ( add add+1)
    2 ( add add+1 2)
    rot ( add+1 2 add)
    else ( pa-add)
    1 = if ( )
    s" 00" ( add 2)
    i ( add 2 add)
    else ( )
    i ( add)
    dup 1+ 1 ( add add+1 1)
    rot ( add+1 1 add)
    then
    then
    then
    -1 +loop ( ... add+1 1|2 pa)
    r> ( pa start)
    tuck ( start pa start)
    - ( start pa-st)
    dup 0= if 2drop s" 00" then
    ;


    : .t ( s_add s_cnt m_add m_cnt h_add h_cnt)
    type space ." hr" space
    type space ." min" space
    type space ." sec"
    ;

    with stack juggling !!!!!!!!!!

    Some tests:


    s" 10:1:2" :t .t 10 hr 1 min 2 sec ok
    s" :10:" :t .t 00 hr 10 min 00 sec ok
    s" ::" :t .t 00 hr 00 min 00 sec ok
    s" ::1" :t .t 00 hr 00 min 1 sec ok
    s" :10:1" :t .t 00 hr 10 min 1 sec ok
    s" :10:" :t .t 00 hr 10 min 00 sec ok
    s" 10:10:" :t .t 10 hr 10 min 00 sec ok
    s" 10::" :t .t 10 hr 00 min 00 sec ok

    Ahmed

  • From Ahmed@21:1/5 to dxf on Tue Oct 8 06:02:06 2024
    On Tue, 8 Oct 2024 2:58:47 +0000, dxf wrote:
    ..
    swap dup -rot --> over

    Changed

    ...
    But "5" needs to work :)

    I think now it works.

    Here is the new version:


    : :t ( add cnt -- add 2 1 | add1 2 add2 2 2 | add1 2 add2 2 add3 2 3)
    0 ( add cnt n)
    -rot ( n add cnt)
    bounds ( n end start)
    dup ( n end start start)
    >r ( n end start ) ( r: start)
    over ( n pa start end )
    do ( n pa)
    i ( n pa add)
    c@ ( n pa c)
    [char] : = ( n pa f)
    if ( n pa)
    swap ( pa n)
    1+ ( pa n+1)
    i ( pa n+1 add)
    rot ( n+1 add pa)
    swap ( n+1 pa add)
    - ( n+1 pa-add)
    dup ( n+1 pa-add pa-add)
    2 ( n+1 pa-add pa-add 2)
    > ( n+1 pa-add t|f)
    if ( n+1 pa-add)
    drop ( n+1)
    i ( n+1 add)
    swap ( add n+1)
    i ( add n+1 add)
    1+ ( add n+1 add+1)
    2 ( add n+1 add+1 2)
    rot ( add add+1 2 n+1)
    >r ( add add+1 2 ) ( r: n+1)
    rot ( add+1 2 add)
    r> ( add+1 2 add n+1)
    swap ( add+1 2 n+1 add)
    else ( n+1 pa-add)
    1 = if ( n+1)
    s" 00" ( n+1 add 2)
    rot ( add 2 n+1)
    i ( add 2 n+1 add)
    else ( n+1)
    i ( n+1 add)
    swap ( add n+1)
    i 1+ 1 ( add n+1 add+1 1)
    rot ( add add+1 1 n+1)
    >r ( add add+1 1) ( r: n+1)
    rot ( add+1 1 add)
    r> ( add+1 1 add n+1)
    swap ( add+1 1 n+1 add)
    then
    then
    then
    -1 +loop ( ... add+1 1|2 n pa)
    r> ( n pa start)
    tuck ( n start pa start)
    - ( n start pa-st)
    dup 0= if 2drop else rot 1+ then ;


    : .t ( n --)
    case
    1 of ." 00 hr" space ." 00 min" space type space ." sec" endof
    2 of ." 00 hr" space type space ." min" space type space ." sec"
    endof
    3 of type space ." hr" space type space ." min" space type space
    ." sec" endof
    endcase ;

    Some tests:

    s" " :t .t ok
    s" 1" :t .t 00 hr 00 min 1 sec ok
    s" 15" :t .t 00 hr 00 min 15 sec ok
    s" :15" :t .t 00 hr 00 min 15 sec ok
    s" 2:15" :t .t 00 hr 2 min 15 sec ok
    s" 20:15" :t .t 00 hr 20 min 15 sec ok
    s" :20:15" :t .t 00 hr 20 min 15 sec ok
    s" 3:20:15" :t .t 3 hr 20 min 15 sec ok
    s" 13:20:15" :t .t 13 hr 20 min 15 sec ok
    s" 1:2:1" :t .t 1 hr 2 min 1 sec ok
    s" 1::1" :t .t 1 hr 00 min 1 sec ok
    s" ::1" :t .t 00 hr 00 min 1 sec ok
    s" :1:" :t .t 00 hr 1 min 00 sec ok
    s" 1:1:" :t .t 1 hr 1 min 00 sec ok
    s" 1::" :t .t 1 hr 00 min 00 sec ok
    s" :1" :t .t 00 hr 00 min 1 sec ok
    s" :" :t .t 00 hr 00 min 00 sec ok
    s" ::" :t .t 00 hr 00 min 00 sec ok


    Ahmed

  • From albert@spenarnc.xs4all.nl@21:1/5 to dxforth@gmail.com on Thu Oct 10 10:00:02 2024
    In article <5c65a8f1fdfc3e9937a825842fe23dc2758f48ef@i2pn2.org>,
    dxf <dxforth@gmail.com> wrote:
    Earlier I mentioned scanning in reverse. Here's an implementation.

    [undefined] dxforth [if]
    : \CHAR ( a u -- a2 u2 c ) 1- 2dup + c@ ;
    [then]

    \ As for SCAN but scan from end
    : SCAN< ( a u c -- a2 u2 | a 0 )
    >r over swap begin dup while \char r@ = until 1+ then
    rot drop rdrop ;

    Compare that with the meticulously designed, exhaustively specified
    and eminently useful -- $/ -- .
    After 40 years it has not taken over the world ...

    NAME: $/

    STACKEFFECT: sc c --- sc1 sc2

    DESCRIPTION: []

    Find the first c in the string constant sc and split it at that
    address. Return the strings after and before c into sc1 and sc2
    respectively. If the character is not present sc1 is a null string
    (its address is zero) and sc2 is the original string. Both sc1 and sc2
    may be empty strings (i.e. their count is zero), if c is the last or
    first character in sc .
    (sc is c-addr len )


    The subtle difference between an empty string (a-add 0 ) and
    a null-string ( 0 0 ) allows you to handle empty lines in a file
    gracefully.
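A minimal sketch of the behaviour specified above, written in standard Forth rather than ciforth. SPLIT-FIRST is a hypothetical name chosen so that it does not clash with a native $/ ; it is not Albert's implementation, and it assumes one address unit per character.

```forth
\ SPLIT-FIRST is a hypothetical stand-in for $/ (assumption, not ciforth source).
\ sc1 = text after the first c, sc2 = text before it (sc2 ends up on top).
\ If c is absent, sc1 is the null string ( 0 0 ) and sc2 is the original.
: SPLIT-FIRST ( c-addr u c -- c-addr1 u1 c-addr2 u2 )
   >R 2DUP                         ( a u a' u' ) ( R: c )
   BEGIN
      \ keep going while characters remain and the current one is not c
      DUP IF OVER C@ R@ <> ELSE FALSE THEN
   WHILE
      1 /STRING                    \ step past a non-matching character
   REPEAT
   R> DROP                         ( a u a' u' )
   DUP IF                          \ found: a' addresses the delimiter
      1 /STRING                    ( a u a1 u1 )  \ sc1 starts after c
      2SWAP DROP                   ( a1 u1 a )
      2 PICK OVER - 1-             ( a1 u1 a u2 ) \ u2 = a1 - a - 1
   ELSE                            \ not found: u' is zero
      2DROP 0 0 2SWAP              ( 0 0 a u )
   THEN ;

\ Usage: S" orang utan" BL SPLIT-FIRST TYPE CR TYPE
\ types sc2 ("orang") first, then sc1 ("utan").
```

Testing the address of sc1 for zero, not its count, is what tells "delimiter absent" apart from "delimiter was the last character": in the latter case sc1 is empty but its address points just past the end of the input.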



    \ As for SPLIT but scan from end. Latter string is topmost.
    : SPLIT< ( a u c -- a2 u2 a3 u3 )
    >r 2dup r> scan< 2swap 2 pick /string ;

    If you go for SPLIT< from end define SCAN< .
    I have named it $\

    Get the name of an executable from the source file:
    "aap.frt" &. $\ 2DROP TYPE
    aap OK
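The reverse split can be sketched the same way in standard Forth. SPLIT-LAST is a hypothetical name, not the ciforth $\ source. The result order (text before c second, text after c on top) is inferred from the aap.frt example above; returning a null string ( 0 0 ) as the "after" part when c is absent is an assumption, since the thread does not specify that case. One address unit per character is assumed.

```forth
\ SPLIT-LAST is a hypothetical stand-in for $\ (assumption, not ciforth source).
: SPLIT-LAST ( c-addr u c -- c-addr1 u1 c-addr2 u2 )
   >R 2DUP                         ( a u a n ) ( R: c )
   BEGIN
      \ keep going while characters remain and the last one is not c
      DUP IF 2DUP + 1- C@ R@ <> ELSE FALSE THEN
   WHILE
      1-                           \ drop the last character from the window
   REPEAT
   R> DROP                         ( a u a n )  \ n = 0 means c was not found
   DUP IF                          \ found: a+n-1 addresses the last c
      NIP >R                       ( a u ) ( R: n )
      OVER R@ 1-                   ( a u a n-1 )     \ text before c
      2SWAP R> /STRING             ( a n-1 a+n u-n ) \ text after c on top
   ELSE                            \ assumed behaviour when c is absent
      2DROP 0 0                    ( a u 0 0 )
   THEN ;

\ Usage: S" aap.frt" CHAR . SPLIT-LAST 2DROP TYPE
\ types the part before the last dot: aap
```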

    These words are elementary and should be defined in the core
    in assembler, possibly (Intel) taking advantage of the string
    words.

    SSLAS0:
    POP AX _C{ char}
    POP CX _C{ count}
    MOV BX,CX
    POP DI _C{ addr}
    OR DI,DI _C{Clear zero flag.}
    MOV DX,DI _C{ Copy}
    CLD _C{ INC DIRECTION}
    REPNZ SCASB _C{ Compare BYTE} <<<<<<<<<<<
    JZ SSLAS1
    <loads of stuff to handle the corner cases>



    <SNIP>

    Groetjes Albert
  • From albert@spenarnc.xs4all.nl@21:1/5 to the.beez.speaks@gmail.com on Thu Oct 17 10:28:26 2024
    In article <nnd$231969a2$24a04042@87f25e33f755b9dd>,
    Hans Bezemer <the.beez.speaks@gmail.com> wrote:
    On 10-10-2024 11:57, dxf wrote:
    I wasn't aware you had reverse split.
    I was coming to the conclusion SCAN< as I defined it was of little
    value on its own and planned to subsume it into reverse split.
    OTOH a reverse SCAN that gave the same results as forward SCAN might
    be useful.

    I got a full load of the whole shebang - since I have to parse some
    crazy stuff sometimes. It may not be pretty, but it served me well
    through the years (since 2004).

    Basically I can scan whatever I like however I like it:

    ---8<---
    : (NO) NOT ;
    : (YES) ;

    defer is-type ( c -- f)

    : (-tokenize) ( a1 n1 xt -- a2 n2 )
    is ?not begin dup while 2dup 1- chars + c@ is-type ?not while 1- repeat
    ;
    ( a1 n1 xt -- a2 n2)
    : (tokenize) is ?not begin dup while over c@ is-type ?not while chop
    repeat ;
    : scan> ['] (no) (tokenize) ; ( a1 n1 -- a2 n2 )
    : scan< ['] (no) (-tokenize) ; ( a1 n1 -- a2 n2 )
    : skip> ['] (yes) (tokenize) ; ( a1 n1 -- a2 n2 )
    : skip< ['] (yes) (-tokenize) ; ( a1 n1 -- a2 n2 )
    : split> 2dup scan> 2swap >r over r> swap - ;
    : split< dup >r scan< 2dup chars + -rot r> over - -rot ;
    ( a1 n1 -- a2 n2 a3 n3)
    ---8<---

    It's still 4tH stuff so your mileage may vary.

    Compare to what I'm doing. Promoting the actual API specification
    so that you can decide whether you want to actually use it.

    $/


    STACKEFFECT: sc c --- sc1 sc2

    DESCRIPTION: []

    Find the first c in the string constant sc and split it at that
    address. Return the strings after and before c into sc1 and sc2
    respectively. If the character is not present sc1 is a null string
    (its address is zero) and sc2 is the original string. Both sc1 and sc2
    may be empty strings (i.e. their count is zero), if c is the last or
    first character in sc .


    Hans Bezemer

  • From minforth@21:1/5 to All on Thu Oct 17 09:16:09 2024
    All good if the input string contains tokens that are delimited
    by special characters. This is not always the case, especially
    if tokens follow each other directly without a gap (can happen
    when OCR scanning documents, for example). Then you need
    lexical tokenisers which are more difficult to implement.

  • From albert@spenarnc.xs4all.nl@21:1/5 to mhx on Thu Oct 17 12:48:24 2024
    In article <73ec4c8359439c78d77d4fce31fc50b2@www.novabbs.com>,
    mhx <mhx@iae.nl> wrote:
    On Thu, 17 Oct 2024 8:28:26 +0000, albert@spenarnc.xs4all.nl wrote:

    In article <nnd$231969a2$24a04042@87f25e33f755b9dd>,
    [..]
    Compare to what I'm doing. Promoting the actual API specification
    so that you can decide whether you want to actually use it.

    $/


    STACKEFFECT: sc c --- sc1 sc2

    DESCRIPTION: []

    Find the first c in the string constant sc and split it at that
    address. Return the strings after and before c into sc1 and sc2
    respectively. If the character is not present sc1 is a null string
    (its address is zero) and sc2 is the original string. Both sc1 and
    sc2 may be empty strings (i.e. their count is zero), if c is the
    last or first character in sc .

    Wil Baden chose to keep c in sc2. Do you have a reason to
    remove it?

    It seems logical to remove it. I normally use lots of
    `1 /STRING' and `-LEADING' or `-TRAILING' sequences in further
    processing of Split-At-Char results, but not always.
    Maybe because an empty sc2 is less informative than an sc2 of
    size 1?

    In the rare case that you want the delimiter :
    "orang utan" BL $/
    ( "utan" "orang" )
    you simply do
    1+
    ( "utan" "orang " )

    The idiom ... $/ DUP WHILE ... is so pervasive that I
    must insist that an empty sc2 is immensely informative.


    -marcel

    Groetjes Albert
  • From mhx@21:1/5 to albert@spenarnc.xs4all.nl on Thu Oct 17 10:29:32 2024
    On Thu, 17 Oct 2024 8:28:26 +0000, albert@spenarnc.xs4all.nl wrote:

    In article <nnd$231969a2$24a04042@87f25e33f755b9dd>,
    [..]
    Compare to what I'm doing. Promoting the actual API specification
    so that you can decide whether you want to actually use it.

    $/


    STACKEFFECT: sc c --- sc1 sc2

    DESCRIPTION: []

    Find the first c in the string constant sc and split it at that
    address. Return the strings after and before c into sc1 and sc2
    respectively. If the character is not present sc1 is a null string
    (its address is zero) and sc2 is the original string. Both sc1 and
    sc2 may be empty strings (i.e. their count is zero), if c is the
    last or first character in sc .

    Wil Baden chose to keep c in sc2. Do you have a reason to
    remove it?

    It seems logical to remove it. I normally use lots of
    `1 /STRING' and `-LEADING' or `-TRAILING' sequences in further
    processing of Split-At-Char results, but not always.
    Maybe because an empty sc2 is less informative than an sc2 of
    size 1?

    -marcel
