Sysop: | Amessyroom |
---|---|
Location: | Fayetteville, NC |
Users: | 35 |
Nodes: | 6 (0 / 6) |
Uptime: | 29:44:52 |
Calls: | 322 |
Calls today: | 1 |
Files: | 959 |
Messages: | 81,836 |
Posted today: | 3 |
A few weeks ago, i was curious to see What English words contained ( >abcd... ) consecutive letters of the alphabet.
defrag defg
defang defg
defog defg
hijack hijk
________________________________(Is there such a word containing 5
letters? )
Does it come standard in some library (package) in Python or
(Gauche)Scheme?
defrag defg
defang defg
defog defg
hijack hijk
________________________________(Is there such a word containing 5
letters? )
(intern-runs "hijacks")
;; => (0 . 3) - longest substr of length 3 at position 0
"firefighting" and "prizefighting" both have EFGHI which has length 5.
$ awk 'BEGIN {for(i=97; i<=118; i++) printf("%c.*%c.*%c.*%c.*%c\n", i,
i+1, i+2, i+3, i+4);}' | while read e; do grep -i $e
/usr/share/dict/words; done
richard@cogsci.ed.ac.uk (Richard Tobin) writes:
$ awk 'BEGIN {for(i=97; i<=118; i++) printf("%c.*%c.*%c.*%c.*%c\n", i,
i+1, i+2, i+3, i+4);}' | while read e; do grep -i $e
/usr/share/dict/words; done
Nice! That picked up "Kilimanjaro" which my more complicated Python
script missed, because it didn't case-fold.
Madhu <enometh@meer.net> writes:
(intern-runs "hijacks")
;; => (0 . 3) - longest substr of length 3 at position 0
I think that is not the question that was asked.
Although OP used the
term "substring", I think what was actually wanted is usually called a subsequence, i.e. the characters in it don't have to be consecutive. So "hijacks" has HIJK which has length 4. "firefighting" and
"prizefighting" both have EFGHI which has length 5.
* Paul Rubin <874j0ucpn1.fsf@nightsong.com> :
Wrote on Sat, 15 Feb 2025 23:30:58 -0800:
Madhu <enometh@meer.net> writes:
(intern-runs "hijacks")
;; => (0 . 3) - longest substr of length 3 at position 0
I think that is not the question that was asked.
oops.
Although OP used the
term "substring", I think what was actually wanted is usually called a
subsequence, i.e. the characters in it don't have to be consecutive. So
"hijacks" has HIJK which has length 4. "firefighting" and
"prizefighting" both have EFGHI which has length 5.
This becomes the Longest Increasing Subsequence Problem
https://en.wikipedia.org/wiki/Longest_increasing_subsequence
There are several good writeups about this on the web and lecture notes,
much better than the implementation I once cobbled up from the wiki
In common lisp: https://kaygun.github.io/clean/2014-10-26-longest_increasing_subsequence.html https://kaygun.github.io/clean/2015-10-26-longest_increasing_subsequence_revisited.html
Using a suitable implementaion in a stupid way:
(defun lca (string)
(map 'string 'code-char (lca::longest-inc-seq (map 'list 'char-code
string))))
and running it on to extract into a hashtable with the keys as lcas
(hash-table-count $h2)
;; => 20437
(gethash "abcde" $h2)
("oxylabracidae" "cerambycidae" "bambocciade" "amoebicide" "ambuscade"
"absconded" "aborticide")
(sort (mapcar (lambda (x) (cons (car x) (length (cdr x))))
(group2 (hash-keys $h2) :test #'= :key #'length))
#'< :key #'car)
;; "length of lca . number of words"
((2 . 241) (3 . 1596) (4 . 4833) (5 . 7024) (6 . 4961) (7 . 1545) (8 .
217)
(9 . 19) (10 . 1))
(9 . 19) (10 . 1))
$ awk 'BEGIN {for(i=97; i<=118; i++) printf("%c.*%c.*%c.*%c.*%c\n", i,
i+1, i+2, i+3, i+4);}' | while read e; do grep -i $e
/usr/share/dict/words; done
Nice! That picked up "Kilimanjaro" which my more complicated Python
script missed, because it didn't case-fold.
The Substring function is a nice Programming puzzle.
          Why did I have to write it myself?
                    Does it come standard in some library (package)
                      in Python or (Gauche)Scheme?
________________
A few weeks ago, i was curious to see What English words contained ( abcd... ) consecutive letters of the alphabet.
       defrag         defg
       defang         defg
       defog          defg
       hijack         hijk
________________________________(Is there such a word containing 5
letters? )
           Am i too Abced-minded ????
This becomes the Longest Increasing Subsequence Problem
Madhu <enometh@meer.net> writes:
This becomes the Longest Increasing Subsequence Problem
Still not quite, the subsequence is supposed to be consecutive, not just increasing. Like "abcde" not "acegi".
((3 "xyz" "wxy" "uvw" "tuv" "stu" "rst" "qrs" "pqr" "opq" "nop" "mno" "lmn""klm" "jkl" "ijk" "hij" "ghi" "fgh" "efg" "def" "cde" "bcd" "abc")
On Sun, 16 Feb 2025 15:43:20 +0000, Madhu wrote:
Using a suitable implementaion in a stupid way:
(defun lca (string)
(map 'string 'code-char (lca::longest-inc-seq (map 'list 'char-code string))))
and running it on to extract into a hashtable with the keys as lcas
(hash-table-count $h2)
;; => 20437
(gethash "abcde" $h2)
("oxylabracidae" "cerambycidae" "bambocciade" "amoebicide" "ambuscade" "absconded" "aborticide")
(sort (mapcar (lambda (x) (cons (car x) (length (cdr x))))
(group2 (hash-keys $h2) :test #'= :key #'length))
#'< :key #'car)
;; "length of lca . number of words"
((2 . 241) (3 . 1596) (4 . 4833) (5 . 7024) (6 . 4961) (7 . 1545) (8 .
217)
(9 . 19) (10 . 1))
____________
(9 . 19) (10 . 1))
wow.... I'd love to know what these Longest words are!
who is the (sole) Grand winner?
FWIW My code is pretty clunky, somehow TIME SBCL says it conses 0 bytes
when I run it on individual words without interning although when I'm explicitly manipulating plists and arrays.
* HenHanna <507d06f8062ea2f4dc1b226979b23021@www.novabbs.com> :
Wrote on Sun, 16 Feb 2025 18:33:58 +0000:
On Sun, 16 Feb 2025 15:43:20 +0000, Madhu wrote:
[Badly proofread, sorry. The letters "LCA" are meaningless in this
context. it could be LIS (longest increasing subsequence)]
Using a suitable implementaion in a stupid way:
(defun lca (string)
(map 'string 'code-char (lca::longest-inc-seq (map 'list 'char-code
string))))
and running it on to extract into a hashtable with the keys as lcas
(hash-table-count $h2)
;; => 20437
(gethash "abcde" $h2)
("oxylabracidae" "cerambycidae" "bambocciade" "amoebicide" "ambuscade"
"absconded" "aborticide")
(sort (mapcar (lambda (x) (cons (car x) (length (cdr x))))
(group2 (hash-keys $h2) :test #'= :key #'length))
#'< :key #'car)
;; "length of lca . number of words"
((2 . 241) (3 . 1596) (4 . 4833) (5 . 7024) (6 . 4961) (7 . 1545) (8 .
217)
(9 . 19) (10 . 1))
____________
(9 . 19) (10 . 1))
wow.... I'd love to know what these Longest words are!
who is the (sole) Grand winner?
This historgram was of just the keys, or the subsequences, not the
words. The longest increasing subsequence was of length 10,
"achilopsty" and the word was
(gethash "achilopsty" $h2)
;; => ("tarsochiloplasty"), T
#||
lrwxrwxrwx 1 root root 4 Dec 28 2020 /usr/share/dict/words -> web2 -rw-r--r-- 1 root root 2486824 Oct 21 2000 /usr/share/dict/web2
||#
tarsochiloplasty