• Re: ( Substring function in Python, Lisp) -- [Hijack] contains [hijk]

    From Madhu@enometh@meer.net to comp.lang.lisp on Mon Feb 17 12:21:53 2025
    From Newsgroup: comp.lang.lisp

    * Paul Rubin <87bjv1vaos.fsf@nightsong.com> :
    Wrote on Sun, 16 Feb 2025 13:31:15 -0800:

    Madhu <enometh@meer.net> writes:
    This becomes the Longest Increasing Subsequence Problem

    Still not quite, the subsequence is supposed to be consecutive, not just increasing. Like "abcde" not "acegi".

    Hen's posts really bring out the my reading comprehension skills.

    "Longest Increasing Consecutive Subsequence Problem"

    Google finds plenty of code but I guess it is not of any research
    interest.

    FWIW My code is pretty clunky, somehow TIME SBCL says it conses 0 bytes
    when I run it on individual words without interning although when I'm explicitly manipulating plists and arrays.

    (defun intern-lics (word &optional hash-table dp)
    "Returns the longest increasing consective sequence (lics) of WORD.
    If DP is supplied it should be an array with fill-pointer with enough
    capacity to contain (LENGTH WORD) integers. If hash-table is supplied
    it should be an EQUAL hashtable with STRING keys and LIST values.
    WORD is pushed onto the list which is the hash-value of the lics.
    WORD is assumed consist of lowercase ASCII characters.
    Ignores subsequences of length 1."
    (let* ((n (length word))
    (map nil) ;; (getf map c) -> last index of c in word
    (dp ;; dp[i] = length of the lics that end at index i of word
    (cond (dp (setf (fill-pointer dp) n) dp)
    (t (make-array n))))
    (maximum -1) ;;the length of the lics
    (index -1) ;;index in word where where the lics ends
    )
    (loop for i from 0 for c across word
    for prev = (code-char (1- (char-code c)))
    ;; if prev is present in word before the ith index, then c
    ;; will add to the increasing subsequence which has prev.
    for prev-idx = (getf map prev)
    for newl = (if prev-idx
    (1+ (elt dp prev-idx))
    1)
    do
    (setf (elt dp i) newl)
    (setf (getf map c) i)
    (when (< maximum (elt dp i))
    (setq maximum (elt dp i))
    (setq index i)))
    (when (> maximum 2) ; otherwise not interesting
    (let ((subseq (map 'string 'code-char
    (reverse (loop for i downfrom (char-code (elt word index))
    repeat maximum
    collect i)))))
    (when hash-table
    (pushnew word (gethash subseq hash-table nil)
    :test #'equal))
    subseq))))


    (intern-lics "absconder") ;"abcde"
    (intern-lics "afterstudy") ; => "rstu"

    (time (loop repeat 10 do (intern-lics "absconder")))
    #||
    0.000 seconds of real time
    0.000074 seconds of total run time (0.000073 user, 0.000001 system)
    100.00% CPU
    148,248 processor cycles
    0 bytes consed
    ||#

    ;; 0 bytes? rly?

    (defvar $a(slurp-lines "/usr/share/dict/words"))
    (length $a)
    ;; => 234937
    (map-into $a 'string-downcase $a)
    (setq $a (delete-duplicates $a :test #'equalp))
    (length $a)
    ;; => 233615
    (defvar $h (make-hash-table :test #'equal))
    (clrhash $h)
    (setq $array (make-array 124 :fill-pointer t))
    (time (map nil (lambda (w) (intern-lics w $h $array)) $a))

    #||
    1.916 seconds of real time
    1.916782 seconds of total run time (1.916782 user, 0.000000 system)
    100.05% CPU
    4,047,899,870 processor cycles
    61,527,200 bytes consed
    ||#

    (hash-table-count $h)
    ;; => 44
    (setq $keys (sort (hash-keys $h) #'string<))
    (sort (group2 $keys :test #'equal :key #'length) #'< :key #'car)
    ((3 "xyz" "wxy" "uvw" "tuv" "stu" "rst" "qrs" "pqr" "opq" "nop" "mno" "lmn"
    "klm" "jkl" "ijk" "hij" "ghi" "fgh" "efg" "def" "cde" "bcd" "abc")
    (4 "stuv" "rstu" "qrst" "nopq" "mnop" "lmno" "klmn" "ijkl" "hijk" "fghi"
    "efgh" "defg" "cdef" "bcde" "abcd")
    (5 "rstuv" "lmnop" "klmno" "efghi" "defgh" "abcde"))


    (gethash "abcde" $h)
    ("oxylabracidae" "cerambycidae" "carbacidometer" "bambocciade" "amoebicide"
    "ambuscader" "ambuscade" "amblycephalidae" "abstractedness" "absconder"
    "abscondence" "abscondedly" "absconded" "aborticide" "abjectedness")
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Madhu@enometh@meer.net to comp.lang.lisp on Mon Feb 17 12:39:05 2025
    From Newsgroup: comp.lang.lisp

    * HenHanna <507d06f8062ea2f4dc1b226979b23021@www.novabbs.com> :
    Wrote on Sun, 16 Feb 2025 18:33:58 +0000:
    On Sun, 16 Feb 2025 15:43:20 +0000, Madhu wrote:

    [Badly proofread, sorry. The letters "LCA" are meaningless in this
    context. it could be LIS (longest increasing subsequence)]

    Using a suitable implementaion in a stupid way:
    (defun lca (string)
    (map 'string 'code-char (lca::longest-inc-seq (map 'list 'char-code string))))

    and running it on to extract into a hashtable with the keys as lcas

    (hash-table-count $h2)
    ;; => 20437

    (gethash "abcde" $h2)
    ("oxylabracidae" "cerambycidae" "bambocciade" "amoebicide" "ambuscade" "absconded" "aborticide")


    (sort (mapcar (lambda (x) (cons (car x) (length (cdr x))))
    (group2 (hash-keys $h2) :test #'= :key #'length))
    #'< :key #'car)

    ;; "length of lca . number of words"
    ((2 . 241) (3 . 1596) (4 . 4833) (5 . 7024) (6 . 4961) (7 . 1545) (8 .
    217)
    (9 . 19) (10 . 1))
    ____________

    (9 . 19) (10 . 1))

    wow.... I'd love to know what these Longest words are!
    who is the (sole) Grand winner?

    This historgram was of just the keys, or the subsequences, not the
    words. The longest increasing subsequence was of length 10,
    "achilopsty" and the word was

    (gethash "achilopsty" $h2)
    ;; => ("tarsochiloplasty"), T

    #||
    lrwxrwxrwx 1 root root 4 Dec 28 2020 /usr/share/dict/words -> web2
    -rw-r--r-- 1 root root 2486824 Oct 21 2000 /usr/share/dict/web2
    ||#



    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.lisp on Sun Feb 16 23:44:22 2025
    From Newsgroup: comp.lang.lisp

    Madhu <enometh@meer.net> writes:
    FWIW My code is pretty clunky, somehow TIME SBCL says it conses 0 bytes
    when I run it on individual words without interning although when I'm explicitly manipulating plists and arrays.

    Yikes. I did a Python version that conses a lot but is fairly concise,
    I thought. From the looks of things, your /usr/share/dict/words is
    larger than mine. Mine has about 100k entries and this code took about
    0.7 seconds on my laptop, not great, but obviously many optimizations
    are possible with the same basic approach:

    from collections import defaultdict

    def runlen(word:str) -> int:
    d = defaultdict(int)
    def pred(c): return chr(ord(c)-1) # unicode predecessor
    for c in word:
    d[c] = d[pred(c)] + 1
    return max((n,c) for c,n in d.items())

    def main():
    for w in open('/usr/share/dict/words'):
    n,c = runlen(w.strip())
    if n>=5: print(w.strip(),n,c)

    main()
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From HenHanna@HenHanna@dev.null to rec.puzzles,comp.lang.lisp,comp.lang.python on Mon Feb 17 11:50:35 2025
    From Newsgroup: comp.lang.lisp

    On Mon, 17 Feb 2025 7:09:05 +0000, Madhu wrote:

    * HenHanna <507d06f8062ea2f4dc1b226979b23021@www.novabbs.com> :
    Wrote on Sun, 16 Feb 2025 18:33:58 +0000:
    On Sun, 16 Feb 2025 15:43:20 +0000, Madhu wrote:

    [Badly proofread, sorry. The letters "LCA" are meaningless in this
    context. it could be LIS (longest increasing subsequence)]

    Using a suitable implementaion in a stupid way:
    (defun lca (string)
    (map 'string 'code-char (lca::longest-inc-seq (map 'list 'char-code
    string))))

    and running it on to extract into a hashtable with the keys as lcas

    (hash-table-count $h2)
    ;; => 20437

    (gethash "abcde" $h2)
    ("oxylabracidae" "cerambycidae" "bambocciade" "amoebicide" "ambuscade"
    "absconded" "aborticide")


    (sort (mapcar (lambda (x) (cons (car x) (length (cdr x))))
    (group2 (hash-keys $h2) :test #'= :key #'length))
    #'< :key #'car)

    ;; "length of lca . number of words"
    ((2 . 241) (3 . 1596) (4 . 4833) (5 . 7024) (6 . 4961) (7 . 1545) (8 .
    217)
    (9 . 19) (10 . 1))
    ____________

    (9 . 19) (10 . 1))

    wow.... I'd love to know what these Longest words are!
    who is the (sole) Grand winner?

    This historgram was of just the keys, or the subsequences, not the
    words. The longest increasing subsequence was of length 10,
    "achilopsty" and the word was

    (gethash "achilopsty" $h2)
    ;; => ("tarsochiloplasty"), T

    #||
    lrwxrwxrwx 1 root root 4 Dec 28 2020 /usr/share/dict/words -> web2 -rw-r--r-- 1 root root 2486824 Oct 21 2000 /usr/share/dict/web2
    ||#

    ______________

    tarsochiloplasty

    thank you... that word is in the dictionary I'm using.

    This looks good too: abdominoplasty


    whch contains abd in opsty
    ach il opsty


    abdominoplasty -- is shorter and its meaning is kinda obvious!
    --- Synchronet 3.21d-Linux NewsLink 1.2