Tweak fuzzy matching parameters #76

Open
opened 8 months ago by dnkl · 10 comments
dnkl commented 8 months ago
Owner

Fuzzy matching have a number of configurable parameters. The default values in the initial implementation may not be optimal.

What should the default values be?

At the time of writing, the configurable parameters are:

  • --no-fuzzy: disables fuzzy matching
  • --fuzzy-min-length=VALUE: search strings shorter than this will not be fuzzy matched. Default: 3.
  • --fuzzy-max-length-discrepancy=VALUE: maximum allowed difference between the fuzzy match, and the search string. Default: 2.
  • --fuzzy-max-distance=VALUE: maximum allowed levenshtein distance between the fuzzy match and the search string. Default: 1.
Fuzzy matching have a number of configurable parameters. The default values in the initial implementation may not be optimal. What _should_ the default values be? At the time of writing, the configurable parameters are: * `--no-fuzzy`: disables fuzzy matching * `--fuzzy-min-length=VALUE`: search strings shorter than this will **not** be fuzzy matched. **Default: 3**. * `--fuzzy-max-length-discrepancy=VALUE`: maximum allowed difference between the fuzzy match, and the search string. **Default: 2**. * `--fuzzy-max-distance=VALUE`: maximum allowed levenshtein distance between the fuzzy match and the search string. **Default: 1**.
dnkl added the
enhancement
question
labels 8 months ago

I disagree there should be tuneable parameters for fuzzy matching.

I think whatever fzf does by default is great. Copy that. Leave some constants in the source code if people feel a strong need to tune them, but otherwise, keep fuzzel simple by not exposing these tuning options.

I disagree there should be tuneable parameters for fuzzy matching. I think whatever `fzf` does by default is great. Copy that. Leave some constants in the source code if people feel a strong need to tune them, but otherwise, keep `fuzzel` simple by not exposing these tuning options.
Poster
Owner

Let me ponder this for a while. And perhaps look at fzf's implementation. It's possible it's too different, and that it uses other parameters, in which case we can't copy them. If so, having command line options, at least for a while/couple of releases, will make it a whole lot easier to tune them.

Let me ponder this for a while. And perhaps look at fzf's implementation. It's possible it's too different, and that it uses other parameters, in which case we _can't_ copy them. If so, having command line options, at least for a while/couple of releases, will make it a whole lot easier to tune them.

Here's a test case that current works poorly with fuzzel but works well with fzf and Rofi's fuzzy mode:

I tried using this as a dmenu drop-in for dmenu-lpass, which helps select passwords from the Lastpass password manager.

I have a many entries that include some version of my name, like **markstos** or **mark@stosberg.com**. But the entry I needed to search for contained AUR and only exactly one entry returned that string.

With Rofi's fuzzy match or fzf, search for "AUR" returns an exact match, but with the current tuning of fuzzel, the entry I'm looking for isn't even in the top 10 results. Even when I try to turn down the fuzziness of fuzzel, I still don't get the exact match for "AUR" in the top 10:

  --fuzzy-max-length-discrepancy=0 --fuzzy-max-distance=0

The exact match should be a better result than the fuzzy matches.

Here's a test case that current works poorly with `fuzzel` but works well with `fzf` and Rofi's fuzzy mode: I tried using this as a dmenu drop-in for `dmenu-lpass`, which helps select passwords from the Lastpass password manager. I have a many entries that include some version of my name, like `**markstos**` or `**mark@stosberg.com**`. But the entry I needed to search for contained **AUR** and *only exactly one entry returned that string.* With Rofi's fuzzy match or `fzf`, search for "AUR" returns an exact match, but with the current tuning of `fuzzel`, the entry I'm looking for isn't even in the top 10 results. Even when I try to turn down the fuzziness of `fuzzel`, I still don't get the exact match for "AUR" in the top 10: ``` --fuzzy-max-length-discrepancy=0 --fuzzy-max-distance=0 ``` The exact match should be a better result than the fuzzy matches.
Poster
Owner

Hmm, yeah, that obviously makes sense. That requires a different change than tweaking the parameters however; what you're after is a different sorting mechanism, where exact matches are sorted before fuzzy matches.

Still, this is something that does make a lot of sense, so let's try to implement this.

Hmm, yeah, that obviously makes sense. That requires a different change than tweaking the parameters however; what you're after is a different sorting mechanism, where exact matches are sorted before fuzzy matches. Still, this is something that does make a lot of sense, so let's try to implement this.
Poster
Owner

what you're after is a different sorting mechanism, where exact matches are sorted before fuzzy matches.

f4101774d7 implements this.

> what you're after is a different sorting mechanism, where exact matches are sorted before fuzzy matches. https://codeberg.org/dnkl/fuzzel/commit/f4101774d73aef5757b38b08dd5262b2d5477746 implements this.

@dnkl -- I can confirm all the fixes in the last 24 hours work. Thanks!

Here's another place that fuzzy matching can improve.

I have an entry named "AmigoFox". With both fzf or rofi --matching fuzzy, I can match this string by typing [Fox Amigo].

*So, both words match exactly, but not in that order. *

With the default fuzzel tuning, no match is found. Here's a command to test with:

echo -e "a\n\nb\nc\nAmigoFox" | fuzzel --dmenu
echo -e "a\n\nb\nc\nAmigoFox" | fzf
echo -e "a\n\nb\nc\nAmigoFox" | rofi -matching fuzzy

I also accidentally tested what happens when a blank line is included. Turns out all three treat an empty line as a valid entry, although it doesn't seem useful to me.

@dnkl -- I can confirm all the fixes in the last 24 hours work. Thanks! Here's another place that fuzzy matching can improve. I have an entry named "AmigoFox". With both `fzf` or `rofi --matching fuzzy`, I can match this string by typing [Fox Amigo]. *So, both words match exactly, but not in that order. * With the default `fuzzel` tuning, no match is found. Here's a command to test with: ``` echo -e "a\n\nb\nc\nAmigoFox" | fuzzel --dmenu echo -e "a\n\nb\nc\nAmigoFox" | fzf echo -e "a\n\nb\nc\nAmigoFox" | rofi -matching fuzzy ``` I also accidentally tested what happens when a blank line is included. Turns out all three treat an empty line as a valid entry, although it doesn't seem useful to me.
Poster
Owner

echo -e "a\n\nb\nc\nAmigoFox" | fzf

At least fzf doesn't really appear to be doing fuzzy matching in this case. Instead, it's matching each word that you type, whereas fuzzel matches the entire string.

Or something like that. It obviously does fuzzy matching as well.

Let me test a bit more. I might decide to release what we currently have, and do improvements like this one later.

> echo -e "a\n\nb\nc\nAmigoFox" | fzf At least fzf doesn't really appear to be doing fuzzy matching in this case. Instead, it's matching each _word_ that you type, whereas fuzzel matches the entire string. Or something like that. It obviously does fuzzy matching as well. Let me test a bit more. I might decide to release what we currently have, and do improvements like this one later.

The best part about how fzf performs matching imho is that when i try to match:

/usr/share/myfolder/myfile.rs

I usually go left to right typing only parts that should belong to one file only.
This gives me option to further narrow down my search even if i dont know what I m looking for.
In this case it would be something like

usshmyfrs

and this also works with spaces in between. (us sh myf rs)

This is behavior I totally miss in fuzzel.
I for example use it to replay already played youtube videos, another case where many share the title prefix but differ in suffix and I m unable to use it like this.

Am I describing different matching type? Because what I see it does now is that it allows me to make mistakes in what I type (to some extent)

The best part about how fzf performs matching imho is that when i try to match: ``` /usr/share/myfolder/myfile.rs ``` I usually go left to right typing only parts that should belong to one file only. This gives me option to further narrow down my search even if i dont know what I m looking for. In this case it would be something like ``` usshmyfrs ``` and this also works with spaces in between. (`us sh myf rs`) This is behavior I totally miss in fuzzel. I for example use it to replay already played youtube videos, another case where many share the title prefix but differ in suffix and I m unable to use it like this. Am I describing different matching type? Because what I see it does now is that it allows me to make mistakes in what I type (to some extent)

I think that simply separating the input query along spaces and matching the parts separately could go a long way. I find myself mistyping things a lot less, which makes the Levenshtein distance much less useful, but I would often like to search for multiple parts of the string.

For example type "Bi Stu" to find "Bitwig Studio" among my applications or "swa tap" to find swaymsg "input * tap enabled" among my quick commands. (Similarly "blu di" for bluetoothctl disconnect, "sli smi" for "Slightly Smiling Face" among emojis and so on...)

(Also, one other application that behaves this way and could be looked at for implementation ideas is bemenu.)

I think that simply separating the input query along spaces and matching the parts separately could go a long way. I find myself mistyping things a lot less, which makes the Levenshtein distance much less useful, but I would often like to search for multiple parts of the string. For example type _"Bi Stu"_ to find _"Bitwig Studio"_ among my applications or _"swa tap"_ to find ` swaymsg "input * tap enabled"` among my quick commands. (Similarly _"blu di"_ for `bluetoothctl disconnect`, _"sli smi"_ for _"Slightly Smiling Face"_ among emojis and so on...) (Also, one other application that behaves this way and could be looked at for implementation ideas is `bemenu`.)
Poster
Owner

My plan moving forward is to improve this iteratively. I will probably do the next release with what we currently have, and then improve upon that.

One of the first (and easiest things) is to do what @isti115 suggests; matching space separated sub-parts separately.

My plan moving forward is to improve this iteratively. I will probably do the next release with what we currently have, and then improve upon that. One of the first (and easiest things) is to do what @isti115 suggests; matching space separated sub-parts separately.
Sign in to join this conversation.
No Milestone
No Assignees
4 Participants
Notifications
Due Date

No due date set.

Dependencies

No dependencies set.

Reference: dnkl/fuzzel#76
Loading…
There is no content yet.