Fast approximate string matching

You want to write a helper program for Hangman game. it would need to find all the possible word candidates from a very large dictionary. It would also need to be faster than blink of an eye.

Simply store the dictionary into BK-tree, search the tree and filter the results.

\ 8th demo code
needs nk/gui

22 font:system font:new "font1" font:atlas! drop
22 1.5 n:* constant ROW-HEIGHT
22 1.2 n:* constant LIST-ROW-HEIGHT

: metric  \ s1 s2 -- n
  true s:dist ;

\ Import dictionary as BK-tree
"words.bk" app:asset >s tree:parse ' metric tree:cmp! constant words

: split-join+offsets?  \  s1 -- s2 a
  a:new swap "_" >r
    r@ s:search null? if
      _swap dup n:1+ >r a:push swap s:len r@ n:> !if
  r> s:/ "" a:join swap ;

: compare  \  s1 a s2 -- s1 a T
  s:len 3 pick s:len nip 3 pick a:len nip n:+ n:= if
    swap ( '_ s:! ) a:each! swap "_" s:/ "" a:join
    2 pick s:=
    drop false
  then ;

: candidates?  \  s -- a
  dup>r split-join+offsets? a:len words r> rot tree:search nip
  ' compare a:filter 2nip ;

a:new constant list-data

: getcount \ -- n
  list-data a:len n:1- nip ;

: getitem \ n -- s
  list-data swap a:@ nip ;

: new-win
    name: "main",
    title: "Word search",           
    wide: 0.5,                     
    high: 0.5,                   
    bg: "white"
  "list" nk:list-new nk:m! ;

: list-item  \ s --
  nk:TEXT_LEFT "black" nk:label-colored ;

: search
  list-data a:clear "word" nk:get candidates? a:+ drop ;

: main-render
    bg: "white",
    font: "f1",
    flags: [ @nk:WINDOW_NO_SCROLLBAR ],
    word: ""

    null  { rows: [@ROW-HEIGHT, -1], cols: [0.8, 0.2], cgap: 4, rgap: 4 } nk:layout-grid-begin
      0 1 0 1 nk:grid rect>local nk:grid-push
        "word" nk:get 31 nk:EDIT_FIELD nk:PLUGIN_FILTER_DEFAULT nk:edit-string if
          "word" swap nk:set
      0 1 1 1 nk:grid nk:rect>local nk:grid-push
        "Search" ' search nk:button-label
      1 1 0 2 nk:grid nk:rect>local nk:grid-push
        "list" nk:TEXT_LEFT LIST-ROW-HEIGHT nk:getcount "list" nk:m@ dup>r nk:list-begin if
          LIST-ROW-HEIGHT 1 nk:layout-row-dynamic
              getitem nk:list-item
            ) r@ nk:list-range drop loop
          r@ nk:list-end
        then rdrop
  nk:end ;

: app:main
  new-win ' main-render -1 nk:render-loop ;

I make extensive use of this technique in C# to do things like make capitalization / accenting / punctuation exceptions to simple title casing of individual words when creating a “presentation name” from an upper case and un-punctuated version of a business name (the keys are title case words and the value is the presentable version; it’s blindingly fast). I also use a dictionary to quickly find rules for exact or starts-with matching on various phrases when looking for applicable in-memory rules to apply.

I frankly have no idea what this syntax in the post is actually doing, but partial matches can be done with a partitioned dictionary where the key is the first 2 to 4 characters and then the value is a List that you just iterate to find the match from a finite subset of the universe of possible matches.

In .NET and some other runtimes, you can take advantage of runtime string interning to substantially reduce the memory footprint of such structures as well, at a modest cost of initialization speed – for keys and values whose cardinality are well known.