Tweak fuzzy matching parameters
Fuzzy matching have a number of configurable parameters. The default values in the initial implementation may not be optimal.
What should the default values be?
At the time of writing, the configurable parameters are:
--no-fuzzy: disables fuzzy matching
--fuzzy-min-length=VALUE: search strings shorter than this will not be fuzzy matched. Default: 3.
--fuzzy-max-length-discrepancy=VALUE: maximum allowed difference between the fuzzy match, and the search string. Default: 2.
--fuzzy-max-distance=VALUE: maximum allowed levenshtein distance between the fuzzy match and the search string. Default: 1.
I disagree there should be tuneable parameters for fuzzy matching.
I think whatever
fzf does by default is great. Copy that. Leave some constants in the source code if people feel a strong need to tune them, but otherwise, keep
fuzzel simple by not exposing these tuning options.
Let me ponder this for a while. And perhaps look at fzf's implementation. It's possible it's too different, and that it uses other parameters, in which case we can't copy them. If so, having command line options, at least for a while/couple of releases, will make it a whole lot easier to tune them.
Here's a test case that current works poorly with
fuzzel but works well with
fzf and Rofi's fuzzy mode:
I tried using this as a dmenu drop-in for
dmenu-lpass, which helps select passwords from the Lastpass password manager.
I have a many entries that include some version of my name, like
**firstname.lastname@example.org**. But the entry I needed to search for contained AUR and only exactly one entry returned that string.
With Rofi's fuzzy match or
fzf, search for "AUR" returns an exact match, but with the current tuning of
fuzzel, the entry I'm looking for isn't even in the top 10 results. Even when I try to turn down the fuzziness of
fuzzel, I still don't get the exact match for "AUR" in the top 10:
The exact match should be a better result than the fuzzy matches.
Hmm, yeah, that obviously makes sense. That requires a different change than tweaking the parameters however; what you're after is a different sorting mechanism, where exact matches are sorted before fuzzy matches.
Still, this is something that does make a lot of sense, so let's try to implement this.
@dnkl -- I can confirm all the fixes in the last 24 hours work. Thanks!
Here's another place that fuzzy matching can improve.
I have an entry named "AmigoFox". With both
rofi --matching fuzzy, I can match this string by typing [Fox Amigo].
*So, both words match exactly, but not in that order. *
With the default
fuzzel tuning, no match is found. Here's a command to test with:
echo -e "a\n\nb\nc\nAmigoFox" | fuzzel --dmenu echo -e "a\n\nb\nc\nAmigoFox" | fzf echo -e "a\n\nb\nc\nAmigoFox" | rofi -matching fuzzy
I also accidentally tested what happens when a blank line is included. Turns out all three treat an empty line as a valid entry, although it doesn't seem useful to me.
echo -e "a\n\nb\nc\nAmigoFox" | fzf
At least fzf doesn't really appear to be doing fuzzy matching in this case. Instead, it's matching each word that you type, whereas fuzzel matches the entire string.
Or something like that. It obviously does fuzzy matching as well.
Let me test a bit more. I might decide to release what we currently have, and do improvements like this one later.
The best part about how fzf performs matching imho is that when i try to match:
I usually go left to right typing only parts that should belong to one file only.
This gives me option to further narrow down my search even if i dont know what I m looking for.
In this case it would be something like
and this also works with spaces in between. (
us sh myf rs)
This is behavior I totally miss in fuzzel.
I for example use it to replay already played youtube videos, another case where many share the title prefix but differ in suffix and I m unable to use it like this.
Am I describing different matching type? Because what I see it does now is that it allows me to make mistakes in what I type (to some extent)
I think that simply separating the input query along spaces and matching the parts separately could go a long way. I find myself mistyping things a lot less, which makes the Levenshtein distance much less useful, but I would often like to search for multiple parts of the string.
For example type "Bi Stu" to find "Bitwig Studio" among my applications or "swa tap" to find
swaymsg "input * tap enabled" among my quick commands. (Similarly "blu di" for
bluetoothctl disconnect, "sli smi" for "Slightly Smiling Face" among emojis and so on...)
(Also, one other application that behaves this way and could be looked at for implementation ideas is
Deleting a branch is permanent. It CANNOT be undone. Continue?