Fast approximate string matching

Problem:
You want to write a helper program for Hangman game. it would need to find all the possible word candidates from a very large dictionary. It would also need to be faster than blink of an eye.

Solution:
Simply store the dictionary into BK-tree, search the tree and filter the results.

\
\ 8th demo code
\
needs nk/gui

22 font:system font:new "font1" font:atlas! drop
22 1.5 n:* constant ROW-HEIGHT
22 1.2 n:* constant LIST-ROW-HEIGHT


: metric  \ s1 s2 -- n
  true s:dist ;

\ Import dictionary as BK-tree
"words.bk" app:asset >s tree:parse ' metric tree:cmp! constant words

: split-join+offsets?  \  s1 -- s2 a
  a:new swap "_" >r
  repeat
    r@ s:search null? if
      drop
      break
    else
      _swap dup n:1+ >r a:push swap s:len r@ n:> !if
        rdrop
        break
      else
        r>
      then
    then
  again
  r> s:/ "" a:join swap ;

: compare  \  s1 a s2 -- s1 a T
  s:len 3 pick s:len nip 3 pick a:len nip n:+ n:= if
    swap ( '_ s:! ) a:each! swap "_" s:/ "" a:join
    2 pick s:=
  else
    drop false
  then ;

: candidates?  \  s -- a
  dup>r split-join+offsets? a:len words r> rot tree:search nip
  ' compare a:filter 2nip ;

a:new constant list-data

: getcount \ -- n
  list-data a:len n:1- nip ;

: getitem \ n -- s
  list-data swap a:@ nip ;

: new-win
  {
    name: "main",
    title: "Word search",           
    wide: 0.5,                     
    high: 0.5,                   
    bg: "white"
  }
  nk:win
  "list" nk:list-new nk:m! ;

: list-item  \ s --
  nk:TEXT_LEFT "black" nk:label-colored ;

: search
  list-data a:clear "word" nk:get candidates? a:+ drop ;

: main-render
  {
    bg: "white",
    font: "f1",
    flags: [ @nk:WINDOW_NO_SCROLLBAR ],
    word: ""
  }

  nk:begin
    null  { rows: [@ROW-HEIGHT, -1], cols: [0.8, 0.2], cgap: 4, rgap: 4 } nk:layout-grid-begin
      0 1 0 1 nk:grid rect>local nk:grid-push
        "word" nk:get 31 nk:EDIT_FIELD nk:PLUGIN_FILTER_DEFAULT nk:edit-string if
          "word" swap nk:set
        else
          drop
        then
      0 1 1 1 nk:grid nk:rect>local nk:grid-push
        "Search" ' search nk:button-label
      1 1 0 2 nk:grid nk:rect>local nk:grid-push
        "list" nk:TEXT_LEFT LIST-ROW-HEIGHT nk:getcount "list" nk:m@ dup>r nk:list-begin if
          LIST-ROW-HEIGHT 1 nk:layout-row-dynamic
            (
              getitem nk:list-item
            ) r@ nk:list-range drop loop
          r@ nk:list-end
        then rdrop
    nk:layout-grid-end
  nk:end ;

: app:main
  new-win ' main-render -1 nk:render-loop ;
``´

I make extensive use of this technique in C# to do things like make capitalization / accenting / punctuation exceptions to simple title casing of individual words when creating a “presentation name” from an upper case and un-punctuated version of a business name (the keys are title case words and the value is the presentable version; it’s blindingly fast). I also use a dictionary to quickly find rules for exact or starts-with matching on various phrases when looking for applicable in-memory rules to apply.

I frankly have no idea what this syntax in the post is actually doing, but partial matches can be done with a partitioned dictionary where the key is the first 2 to 4 characters and then the value is a List that you just iterate to find the match from a finite subset of the universe of possible matches.

In .NET and some other runtimes, you can take advantage of runtime string interning to substantially reduce the memory footprint of such structures as well, at a modest cost of initialization speed – for keys and values whose cardinality are well known.