Sysop: | Amessyroom |
---|---|
Location: | Fayetteville, NC |
Users: | 43 |
Nodes: | 6 (0 / 6) |
Uptime: | 98:25:22 |
Calls: | 290 |
Files: | 905 |
Messages: | 76,483 |
Here is a simple solution, which assumes that only one way exists to
split a word, and that a simple heuristic suffices to disambiguate
between possible matches at a given point (implemented are first-match
and longest-first-match, via the operations SIMPLE-FIND-PART and GREEDY-FIND-PART).
If this isn't sufficient, you can either try to change the code below
to support some form of backtracking/non-determinism, or you can check
out Screamer, which is an extension to Common Lisp for non-determinstic programming, which makes this task much easier IMHO. Screamer is by
Jeffrey Mark Siskind (http://www.neci.nj.nec.com/homepages/qobi).
I'd recommend using Screamer, since I'd imagine you will want to
process your word fragments further, and most things NLP will imply
some non-determinism.
Regs, Pierre.
;;; Utility function
(defun starts-with (string start-string &key (start 0))
(let ((start-length (+ start (length start-string))))
(and (>= (length string) start-length)
(string-equal string start-string :start1 start
:end1 start-length))))
;;; The different part-finders, which implement first-match and
;;; longest-first-match heuristics respectively.
(defun simple-find-part (string part-list &key (start 0))
(dolist (part part-list)
(when (starts-with string part :start start)
(return part))))
(defun greedy-find-part (string part-list &key (start 0))
(loop with result = nil
with length = 0
for part in part-list
do
(when (and (starts-with string part :start start)
(> (length part) length))
(setq result part
length (length part)))
finally
(return result)))
;;; The main function.
(defun break-apart (word part-finder &rest part-lists)
(loop with word-length = (length word)
for index = 0 then (+ index (length part))
for part-list in part-lists
for part = (funcall part-finder word part-list :start index)
never (or (not part) (>= index word-length))
collect part into result
finally
(return (and (= index word-length)
result))))
;;; Examples
#|
* (break-apart "astronaut" #'simple-find-part
'("as" "co" "ast") '("tro" "ro" "mp")
'("na" "ut") '("ut" "er"))
("as" "tro" "na" "ut")
* (break-apart "astronaut" #'greedy-find-part
'("as" "co" "ast") '("tro" "ro" "mp")
'("na" "ut") '("ut" "er"))
("ast" "ro" "na" "ut")
* (break-apart "astronaut" #'simple-find-part
'("as" "co" "ast") '("tro" "mp")
'("na" "ut") '("ut" "er"))
("as" "tro" "na" "ut")
* (break-apart "astronaut" #'greedy-find-part
'("as" "co" "ast") '("tro" "mp")
'("na" "ut") '("ut" "er"))
NIL