Best syntax to filter a list

DaveS · 13 March 2023 18:35

Screen Shot 2023-03-13 at 11.29.22 AM

What would you expect to type to filter the list to what you might be looking for?
Note : the Table is populated with a “SELECT” statement in SQL

have the App add “%” to each end of entry and use like
require the user to enter any leading or trailing “%”

With #1… .if the user typed “man”… it would search for “%man%”
otherwise it would search for only what the user entered… and if they were not savvy enough to know a bit of SQL, they might never realize why they can’t find something.

npalardy · 13 March 2023 18:36

I’d use what the user enters it as if they entered %man%

BUT I would order by by Levenshtein distance or something so that closer matches show first

DaveS · 13 March 2023 18:55

nah

SELECT * FROM manpages WHERE manpage LIKE "%<user>%" ORDER BY manpage


besides... "closer"? closer to what?

npalardy · 13 March 2023 19:06

closer to the originally entered string

so a result of MAN would be a “distance” of 0 from man (no characters need to be added ore removed)

a result of nm would be at least 3

the first m changes to n - distance 1
the a changes to m - distance 1
the last n is removed - distance 1

EDIT: overall Levenshtein distance is fairly easy to compute

I probably have Xojo code somewhere that does it

Seems DaveS posted it once

DaveS · 13 March 2023 19:12

yeah… I new what Levenshtien is… I used it in Oracle back in the day…

Just wasn’t sure how you meant to apply it… gonna have to think on that one…
it could slow things down too much

npalardy · 13 March 2023 19:15

unfortunately its not built in to sqlite

soundex may be depending on how it was compiled

in the IDE filter I just computed it and then sorted

it was just an in memory db so speed didnt seem to be horrible unless you used really short filters like single letters

jmadren · 13 March 2023 19:29

Levenshtein distance is more useful for “fuzzy” matches rather than for filtering. Like in spell checkers. SQLite does have a spell check extension available - search for sqlite spellfix.

Personally, I would assume the user knows how to spell what he’s looking for and only wants entries that contain the exact search term. I would use two queries, the first that matches with only a trailing ‘%’, ordered by the returned item, then a second query using leading and trailing ‘%’, excluding entries already matched in the first query.

npalardy · 13 March 2023 19:40

I’m imagining the filter in the Xojo IDE which does, or did, use Levenshtein distance for ordering the results

results that more closely match the filter are towards the top of the list
this that are further away are further down the list

From my Kitchen Sink Starter

Allowing a user to enter something explicitly that includes wildcards wouldn’t be horrible either

jmadren · 13 March 2023 20:16

Except that’s not how they appear to me. To me, they just all have the word “string” in them, with the only difference being how long the entire item is. Again, sorting all that begin with “string” at the top, then sorting the remaining (separately) would make more sense. Then I can focus on or skip entire sections that are similar instead of having to read each and every item.

npalardy · 13 March 2023 20:28

Fair
I think if you look closely you will see that items that require fewer changes to become the search string String appear closer to the top than others that require more changes

StringUtils requires removal of “Utils” - remove 5 characters at the end
StringExtensions require removing “Extensions” - removing 10 characters at the end
If it was just lexical sorting Extensions should appear before utils
But it doesnt

It is subtle but it seemed useful
Maybe this is more illustrative
if the search is for “string” then I get the same results
But titles that are more similar to tring appear closer to the top

jmadren · 13 March 2023 21:10

How are the top entries “more similar”? It’s just length!

And yes, I know exactly what it was doing, and it’s why I was pointing out that unless there are entries that contain, for example, “strange” or even “ring” for it to fuzzy match to, then all it’s really doing is ordering by character length (same as “number of characters that must be removed”). In other words, this list could simply be created using “where name like ‘%tring%’ order by length(name)” without needing an LD function.

That’s why I state that LD is only good for fuzzy matches, such as when trying to find a “correct” entry for a misspelled word.

Note how your result list it the same for both searches.

And yes, IMO in this particular case, lexical sorting is actually better (but only sorting within each “group” as I laid out before).

npalardy · 13 March 2023 21:38

Length is relevant because that influences how many changes are required and so alters the LD distance

I suppose the IDE’s filter could have been made to find everything that had some maximum distance and that would have been a better fuzzy match

But that also would require using a version of sqlite that had such a thing built in - which it doesnt by default - or by adding a custom sqlite function
Neither of which Xojo did/does

You can of course file a feature request for that

edit : sqlite HAS an extension
https://www.sqlite.org/spellfix1.html
but

The spellfix1 virtual table is not included in the SQLite amalgamation and is not a part of any standard SQLite build. It is a loadable extension.

Xojo uses the amalgamation affect

DaveS · 13 March 2023 21:52

Thanks for the input… I think I have decided to go this way

Screen Shot 2023-03-13 at 2.50.33 PM

DaveDuke · 15 March 2023 08:49

Not sure if SQLite supports regex but I have found in other sql languages that it’s significantly faster. In Postgres you use ~ “search” maybe SQLite supports regex searching? If someone knows.

npalardy · 15 March 2023 12:07

There is glob, regex, and match
https://sqlite.org/lang_expr.html#glob

HOWEVER,

No regexp() user function is defined by default and so use of the REGEXP operator will normally result in an error message.
The default match() function implementation raises an exception and is not really useful for anything.

so GLOB is the only one that seems to be available in all builds

DaveS · 15 March 2023 14:28

Thanks for all the input… this seems to work just fine, and way fast enough

   var SQL="SELECT * "
    + " FROM manPages "
    + " WHERE isHidden=0"
    if lastFilter != "" {
        var zFilter = ""
        switch currentFilterMode {
            case menuTAGS.tagFILTER_CONTAINS.rawValue   : zFilter = "%\(lastFilter)%"
            case menuTAGS.tagFILTER_BEGINSWITH.rawValue : zFilter = "\(lastFilter)%"
            case menuTAGS.tagFILTER_ENDSWITH.rawValue   : zFilter = "%\(lastFilter)"
            case menuTAGS.tagFILTER_EQUALTO.rawValue    : zFilter = "\(lastFilter)"
            default : break
        }
        SQL+=" AND manpage Like '\(zFilter)'"
    }
    SQL += " ORDER BY isFAVORITE desc,manPage"