• Soundex Algorithm in AWK

    From Mike Sanders@21:1/5 to All on Fri Aug 23 05:11:33 2024
    # Soundex Algorithm in AWK: Michael Sanders 2024
    # example usage: awk -f soundex.awk < words.txt
    # see also: https://en.wikipedia.org/wiki/Soundex

    { print $0 " : " soundex($0) }

    function soundex(word, i, code, c, firstLetter, lastCode, buf) {
        word = toupper(word) # convert word to uppercase
        gsub(/[^A-Z]/, "", word) # drop spaces, digits, punctuation
        if (word == "") return "" # nothing left to encode
        firstLetter = substr(word, 1, 1)
        buf = lastCode = ""

        # map of letters to soundex digits
        for (i = 1; i <= length(word); i++) {
            c = substr(word, i, 1)
            if (c ~ /[BFPV]/) code = "1"
            else if (c ~ /[CGJKQSXZ]/) code = "2"
            else if (c ~ /[DT]/) code = "3"
            else if (c ~ /[L]/) code = "4"
            else if (c ~ /[MN]/) code = "5"
            else if (c ~ /[R]/) code = "6"
            else code = "" # A, E, I, O, U, H, W, Y carry no digit

            # H and W are transparent: they do not break a run of equal codes
            if (i > 1 && (c == "H" || c == "W")) continue

            # ignore consecutive identical codes; the 1st letter is kept as a
            # letter, but its code still suppresses a matching neighbor
            if (i > 1 && code != lastCode && code != "") buf = buf code
            lastCode = code # a vowel resets this to ""
        }

        # combine 1st letter with buf, pad with zeros or truncate to 4 characters
        return substr(firstLetter buf "000", 1, 4)
    }

    # eof
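
A quick way to sanity-check any Soundex routine is to run it against the reference names on the Wikipedia page linked above. The block below is a minimal, table-driven sketch for that purpose. It is an alternative formulation, not the script above: the names soundex_demo, sx and digit are made up for illustration, and the rules it assumes are the usual American Soundex ones (a vowel separates equal codes, H and W do not, and the first letter's own code suppresses a matching second letter).

```shell
#!/bin/sh
# soundex_demo, sx and digit are hypothetical names used only in this sketch.
soundex_demo() {
    printf '%s\n' "$@" | awk '
    # digit for one uppercase letter; "0" marks letters that carry no code
    function digit(c) {
        return substr("01230120022455012623010202", index("ABCDEFGHIJKLMNOPQRSTUVWXYZ", c), 1)
    }

    function sx(word,    i, c, d, last, out) {
        word = toupper(word)
        gsub(/[^A-Z]/, "", word)              # letters only
        if (word == "") return ""
        out  = substr(word, 1, 1)             # first letter is kept as a letter
        last = digit(out)                     # but its digit still counts
        for (i = 2; i <= length(word); i++) {
            c = substr(word, i, 1)
            if (c == "H" || c == "W") continue    # transparent letters
            d = digit(c)
            if (d != "0" && d != last) out = out d
            last = d                          # a vowel sets last to "0"
        }
        return substr(out "000", 1, 4)        # pad or truncate to 4 characters
    }

    { print $0 " : " sx($0) }
    '
}

soundex_demo Robert Rupert Rubin Ashcraft Tymczak Pfister Honeyman
```

With those seven names the sketch should print R163, R163, R150, A261, T522, P236 and H555. Tymczak and Pfister are the interesting ones: the first checks that a vowel lets the same digit appear twice, the second that a second letter sharing the first letter's code is dropped.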

    --
    :wq
    Mike Sanders
