Grapheme shaping using libutf8proc #100

Manually merged
dnkl merged 30 commits from harfbuzz into master 4 months ago
dnkl commented 1 year ago
Owner

This is a POC for grapheme shaping. That is, we shape individual grapheme clusters, but not whole text runs. This means we do get support for e.g. emoji ZWJ sequences, but we do not get support for e.g. ligatures.

Unicode grapheme cluster segmentation is done with libutf8proc (https://github.com/JuliaStrings/utf8proc), and text shaping is done with fcft_grapheme_rasterize() (available in fcft-2.3.0 and later).

This renders several classes of Unicode sequences correctly, but not all. The biggest problem is finding out the display width of the grapheme; glibc's wcswidth() is just the sum of wcwidth() for each character, which is wrong in many cases.

Initially, I implemented a dumb my_wcswidth() that never returns anything larger than 2. Update: reverted to using glibc's wcswidth() for now, as that causes the least breakage with client applications.

Another problem was graphemes that result in multiple glyphs that are supposed to be rendered after each other, e.g. 👩‍👩‍👧👶; these were rendered on top of each other, just like g̈. Update: this is somewhat working now.

utf8proc is an optional dependency. Whether to compile foot with support for grapheme cluster segmentation or not is controlled with the -Dgrapheme-clustering=disabled|enabled|auto meson command line option.

There is also a run-time option, tweak.grapheme-shaping (disabled by default) that enables/disables grapheme shaping. When disabled, old-style zero-width combining is done (i.e. zero-width codepoints are appended to the last cell) instead of cluster segmentation, and the rendering engine will not use fcft_grapheme_rasterize().

Thus, you can get partial shaping support without utf8proc; you can still enable tweak.grapheme-shaping and foot will use fcft_grapheme_rasterize() on all cells with more than a single base character. However, since foot cannot do grapheme cluster segmentation without utf8proc, we are limited to the most basic sequences. In practice, this means combining marks (which foot's renderer handles even without grapheme shaping).

In other words, to be able to render emoji sequences, you need both a foot build with utf8proc support, and an fcft build with HarfBuzz support, and enable tweak.grapheme-shaping.

TODO

  • Display width - use wcswidth() or implement our own? We use wcswidth().
  • Multi-glyph positioning - odd values from harfbuzz, and seems to depend on font
  • Weird vertical glyph positioning data from HarfBuzz - fcft currently ignores these.
  • Maximum grapheme cluster length? Or allocate dynamically? Keep allocating dynamically.
  • Text/emoji presentation selection. This is done in fcft.
  • Add compile time option to enable/disable grapheme shaping (-Dgrapheme-clustering)
  • Add runtime configuration option to enable/disable grapheme shaping (tweak.grapheme-shaping).
  • Make utf8proc an optional dependency (required if -Dgrapheme-clustering=enabled).
  • Detect when fcft is harfbuzz capable.
  • Restore old combining marks handling, to be used both when utf8proc is not available/has been disabled, and when grapheme shaping has been disabled in the configuration.
  • Decide which OTF tags to enable. Taken from fontconfig.
  • Run benchmarks to verify foot hasn't gotten slower (when grapheme shaping is disabled).
  • Variant selectors.

@sterni

dnkl added the enhancement label 1 year ago
dnkl commented 1 year ago
Poster
Owner

> This renders several classes of Unicode sequences correctly, but not all. The biggest problem is finding out the display width of the grapheme; glibc's wcswidth() is just the sum of wcwidth() for each character, which is wrong in many cases.
>
> For now, I've implemented a dumb my_wcswidth() that never returns anything larger than 2.

For this to be done correctly, you probably won't get around touching UnicodeData.txt. kitty, for example, generates header files for lookup operations from it (https://github.com/kovidgoyal/kitty/blob/master/gen-wcwidth.py#L494).

My current guess is that it's actually relatively simple once you have already split the graphemes: the first code point decides the (initial) width; the following code points then either decrease the width (e.g. a text presentation variation selector after an emoji, I suppose), increase it (e.g. an emoji presentation variation selector if the default presentation is text), or leave it untouched. This is only my guess currently (at first glance, kitty seems to do it like this), and I am not sure it works in all cases; Unicode is huge, after all.
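The guess above could be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_base_width()` is a hypothetical stand-in for a real width table (such as one generated from the Unicode data files), and `cluster_width()` is an invented name, not something in foot or utf8proc. It also assumes the input has already been split into a single grapheme cluster.

```c
#include <stddef.h>
#include <stdint.h>

#define VS15 0xFE0Eu /* VARIATION SELECTOR-15: requests text presentation */
#define VS16 0xFE0Fu /* VARIATION SELECTOR-16: requests emoji presentation */

/* Hypothetical, crude stand-in for a real width table. It only
 * approximates the handful of code points used for illustration. */
static int
toy_base_width(uint32_t cp)
{
    if (cp >= 0x1F300 && cp <= 0x1FAFF)
        return 2; /* treat this pictographic range as wide */
    return 1;
}

/* Width of ONE already-segmented grapheme cluster: the first code
 * point sets the initial width, variation selectors adjust it, and
 * everything else (ZWJ, combining marks, ...) leaves it untouched. */
static int
cluster_width(const uint32_t *cps, size_t n)
{
    if (n == 0)
        return 0;

    int width = toy_base_width(cps[0]);
    for (size_t i = 1; i < n; i++) {
        if (cps[i] == VS15)
            width = 1;
        else if (cps[i] == VS16)
            width = 2;
    }
    return width;
}
```

With this sketch, U+2764 followed by U+FE0F comes out as width 2, and a woman+ZWJ+woman+ZWJ+girl cluster stays at width 2 instead of growing with every joined emoji. Whether that matches what client applications expect is a separate question.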

> Another problem is graphemes that result in multiple glyphs that are supposed to be rendered after each other, e.g. 👩‍👩‍👧👶; these are currently rendered on top of each other, just like g̈.

Just to understand you correctly: In the case that there's no family glyph in the font they should be rendered after each other, but if it is missing they still are?!

sterni reviewed 1 year ago
util.h Outdated
my_wcswidth(const wchar_t *s, size_t n)
{
    int ret = wcswidth(s, n);
    return max(0, min(ret, 2));
sterni commented 1 year ago
Poster

Have you tried something like:

static inline int
my_wcswidth(const wchar_t *s, size_t n)
{
  int ret = 0;
  for(size_t i = 0; i < n; i++) {
    ret += utf8proc_charwidth((utf8proc_int32_t) s[i]);
  }
  
  return max(0, ret);
}

This would be roughly the approach I outlined in my comment without having to do nasty code generation ourselves. Something else we could try is to base this on character categories.

dnkl commented 1 year ago
Poster
Owner

That won't work, because there are sequences where each individual character has a non-zero width, but when joined, the width is not the sum of the individual characters.

Several emoji+ZWJ+pictographic sequences and (flag) tag sequences behave like this, I believe.
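To make the objection concrete, here is a small sketch comparing a per-code-point sum with the clamp used in the reviewed util.h snippet. `toy_charwidth()` is a hypothetical stand-in for `wcwidth()` or `utf8proc_charwidth()` and only knows about ZWJ and one emoji range; the function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-code-point width function. */
static int
toy_charwidth(uint32_t cp)
{
    if (cp == 0x200D)
        return 0; /* ZERO WIDTH JOINER */
    if (cp >= 0x1F300 && cp <= 0x1FAFF)
        return 2; /* treat this pictographic range as wide */
    return 1;
}

/* Naive approach: sum the widths of the individual code points. */
static int
summed_width(const uint32_t *cps, size_t n)
{
    int ret = 0;
    for (size_t i = 0; i < n; i++)
        ret += toy_charwidth(cps[i]);
    return ret;
}

/* Same sum, clamped to the range 0..2 (one cluster never spans
 * more than two cells). */
static int
clamped_width(const uint32_t *cps, size_t n)
{
    int ret = summed_width(cps, n);
    if (ret < 0)
        ret = 0;
    if (ret > 2)
        ret = 2;
    return ret;
}
```

For woman+ZWJ+woman+ZWJ+girl the plain sum is 2+0+2+0+2 = 6, although a font with a family glyph draws it in two cells; the clamp yields 2 for this cluster.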

sterni commented 1 year ago
Poster

Right, of course.


> Another problem is graphemes that result in multiple glyphs that are supposed to be rendered after each other, e.g. 👩‍👩‍👧👶; these are currently rendered on top of each other, just like g̈.

Judging from the Codeberg markdown editor, the issue was one ZWJ too many:
i.e. woman<ZWJ>woman<ZWJ>girl<ZWJ>baby instead of woman<ZWJ>woman<ZWJ>girlbaby. Probably harfbuzz doesn't know that you can't add a baby to a family ZWJ sequence (see emoji-zwj-sequences.txt, https://www.unicode.org/Public/emoji/13.0/emoji-zwj-sequences.txt), and utf8proc probably just continues the grapheme after a ZWJ.

A similar case without the last ZWJ works fine (see attachments):

Prelude Data.Char> test <- readFile "/home/lukas/foot-test.txt"
Prelude Data.Char> putStr test
👨‍👨‍👦👶
Prelude Data.Char> map ord test  -- code points
[128104,8205,128104,8205,128102,128118,10]
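The effect of the extra ZWJ on segmentation can be illustrated with a deliberately simplified sketch. It is not the UAX #29 algorithm and not what utf8proc does internally; it only applies one toy rule (never break next to a ZWJ), and `count_clusters()` is a name invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ZWJ 0x200Du /* ZERO WIDTH JOINER */

/* Toy segmentation: a cluster boundary exists between two code
 * points unless one of them is a ZWJ. Real grapheme segmentation
 * has many more rules (combining marks, modifiers, flags, ...). */
static size_t
count_clusters(const uint32_t *cps, size_t n)
{
    if (n == 0)
        return 0;

    size_t clusters = 1;
    for (size_t i = 1; i < n; i++) {
        bool joined = cps[i] == ZWJ || cps[i - 1] == ZWJ;
        if (!joined)
            clusters++;
    }
    return clusters;
}
```

Fed the code points from the GHCi session above (without the trailing newline), this yields two clusters: the family and the baby. With a ZWJ inserted before the baby, everything collapses into one cluster, which the shaper then has to handle as a single unit.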

Also I managed to get foot to segfault and abort while looking through emoji-test.txt for testing.

It seems to crash when glyphs are missing; for example, it crashes when rendering this line (see the attached log):

1F972                                      ; fully-qualified     # 🥲 E13.0 smiling face with tear

There is also another crash, which I can trigger by running head -n 300 emoji-test.txt a couple of times. It doesn't always crash, and often fails with different errors. I suspect there are characters missing in unrendered parts of the output, which sometimes causes issues. Some of the crashes I got were:

info: fcft.c:607: /nix/store/yfmmkp58xfv1vrzrzsgnn4f0k1h5md6v-google-fonts-2019-07-14/share/fonts/truetype/AdobeBlank-Regular.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/vdwhbj433rlsdlknwpmyq3sk9gv3x1ac-noto-fonts-emoji-unstable-2019-10-22/share/fonts/noto/NotoColorEmoji.ttf: size=45.23pt/109px, dpi=173.51
info: fcft.c:607: /nix/store/vdwhbj433rlsdlknwpmyq3sk9gv3x1ac-noto-fonts-emoji-unstable-2019-10-22/share/fonts/noto/NotoColorEmoji.ttf: size=45.23pt/109px, dpi=173.51
free(): corrupted unsorted chunks
fish: “./result/bin/foot” terminated by signal SIGABRT (Abort)

info: vt.c:611: wanted = 2, 20
info: vt.c:611: wanted = 2, 20
info: vt.c:611: wanted = 2, 20
info: vt.c:611: wanted = 2, 20
info: vt.c:611: wanted = 2, 20
info: vt.c:611: wanted = 2, 20
info: vt.c:611: wanted = 2, 20
info: vt.c:611: wanted = 2, 20
info: vt.c:611: wanted = 2, 20
corrupted size vs. prev_size
fish: “./result/bin/foot” terminated by signal SIGABRT (Abort)

It also segfaulted once like in the initial example. I can't really narrow this down to a specific page, but I could provide you with a backtrace from gdb or similar if you can't reproduce these issues.

Collaborator

ZWJ sequences seem like a bit of a misfeature to me. There's no standard way for a terminal and a client to agree on the width of any given sequence and, according to the Unicode spec, implementations are basically free to invent any number of their own sequences.

I was thinking how to solve this issue in my text editor and the only reasonable approach I've come up with so far is to not support them and to deliberately not emit the ZWJ codepoints to the terminal. Otherwise, the editor and the terminal can end up with completely different ideas of the cursor position.

dnkl commented 1 year ago
Poster
Owner

@craigbarnes think of this branch as an experiment; it helps with testing fcft+harfbuzz, and it lets me see just what kinds of hoops we need to jump through in foot.

I was fully aware there would be a display width problem. And it is already clear that the problem is much worse than I had anticipated.

And it's not just ZWJ; there are other sequences without ZWJs that are still considered a single grapheme. Skin modifiers, for example.

As long as we have to use special libraries, or even parse the Unicode spec ourselves and implement our own functions, i.e. as long as there's no common interface between the terminal and the application, this feature will not be enabled by default, if it gets merged at all.

dnkl commented 1 year ago
Poster
Owner

@sterni

>> Another problem is graphemes that result in multiple glyphs that are supposed to be rendered after each other, e.g. 👩‍👩‍👧👶; these are currently rendered on top of each other, just like g̈.

> Just to understand you correctly: In the case that there's no family glyph in the font they should be rendered after each other, but if it is missing they still are?!

In this case, as in several others, I think it depends on the font. In my case, harfbuzz returned two glyphs, one family-like and one baby. These are supposed to be rendered after each other (but are still considered a single grapheme). Other fonts may have a single glyph for it.

The problem is that I'm not (yet) getting any positioning information from harfbuzz, so I'm just rendering all the glyphs I get on top of each other. There's probably a way to get positioning offsets from harfbuzz, which then need to be applied to the glyphs.
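For reference, HarfBuzz reports per-glyph positions through `hb_buffer_get_glyph_positions()`, whose `hb_glyph_position_t` entries carry `x_advance`, `y_advance`, `x_offset` and `y_offset`. The sketch below shows the usual pen-advance way of applying such values. To stay self-contained it uses its own `struct glyph_pos` that merely mirrors those four fields; `place_glyphs()` and `struct point` are hypothetical names, and the values are assumed to be already converted to pixels.

```c
#include <stddef.h>

/* Mirrors the four positioning fields of hb_glyph_position_t;
 * assumed here to be in pixels already. */
struct glyph_pos {
    int x_advance, y_advance;
    int x_offset, y_offset;
};

struct point {
    int x, y;
};

/* Compute where each glyph of a cluster is drawn, relative to the
 * cluster origin: the offset shifts the glyph from the current pen
 * position, the advance moves the pen for the next glyph. */
static void
place_glyphs(const struct glyph_pos *pos, size_t n, struct point *out)
{
    int pen_x = 0, pen_y = 0;

    for (size_t i = 0; i < n; i++) {
        out[i].x = pen_x + pos[i].x_offset;
        out[i].y = pen_y + pos[i].y_offset;
        pen_x += pos[i].x_advance;
        pen_y += pos[i].y_advance;
    }
}
```

Two glyphs with non-zero advances (a family glyph followed by a baby glyph) then land side by side, while a combining mark with zero advance and a negative x offset lands on top of its base, as in g̈.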

dnkl commented 1 year ago
Poster
Owner

> Judging from the Codeberg markdown editor, the issue was one ZWJ too many:
> i.e. woman<ZWJ>woman<ZWJ>girl<ZWJ>baby instead of woman<ZWJ>woman<ZWJ>girlbaby. Probably harfbuzz doesn't know that you can't add a baby to a family ZWJ sequence (see emoji-zwj-sequences.txt) and utf8proc probably just continues the grapheme after a ZWJ.

That could very well be it. The sequence was listed with a non-standard warning, if I remember correctly.

Still, the glyphs shouldn't be rendered on top of each other :)

dnkl commented 1 year ago
Poster
Owner

@sterni regarding the crash, it would be helpful if you could get a stack trace. I'm seeing a crash in FreeType/libpng, for what appears to be a valid glyph index:

#0  0x00007ffff668bf2a in __memmove_sse2_unaligned_erms () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007ffff63711be in ?? () from /usr/lib/libpng16.so.16
No symbol table info available.
#2  0x00007ffff6363a58 in png_read_row () from /usr/lib/libpng16.so.16
No symbol table info available.
#3  0x00007ffff63655a2 in png_read_image () from /usr/lib/libpng16.so.16
No symbol table info available.
#4  0x00007ffff72afded in ?? () from /usr/lib/libfreetype.so.6
No symbol table info available.
#5  0x00007ffff72aff81 in ?? () from /usr/lib/libfreetype.so.6
No symbol table info available.
#6  0x00007ffff72ae124 in ?? () from /usr/lib/libfreetype.so.6
No symbol table info available.
#7  0x00007ffff72b7a76 in ?? () from /usr/lib/libfreetype.so.6
No symbol table info available.
#8  0x00007ffff728a362 in ?? () from /usr/lib/libfreetype.so.6
No symbol table info available.
#9  0x00007ffff726dd14 in FT_Load_Glyph () from /usr/lib/libfreetype.so.6
No symbol table info available.
#10 0x000055555599fe14 in glyph_for_index (inst=0x60b00001bc60, index=563, subpixel=FCFT_SUBPIXEL_NONE, glyph=0x604000045550)
    at ../../subprojects/fcft/fcft.c:987
        pix = 0x0
        data = 0x0
        err = 0
        render_flags = 2
        bgr = false
        unlock_ft_lock = false
        bitmap = 0x612000010158
        pix_format = PIXMAN_a1
        width = 7
        rows = 15
        stride = 4
        __PRETTY_FUNCTION__ = "glyph_for_index"
        x = 0
        y = 11
#11 0x00005555559a4998 in glyph_for_wchar (inst=0x60b00001bc60, wc=128071 L'👇', subpixel=FCFT_SUBPIXEL_NONE, glyph=0x604000045550)
    at ../../subprojects/fcft/fcft.c:1273
        idx = 563
        ret = 254
#12 0x00005555559a6d5f in fcft_glyph_rasterize (_font=0x611000011840, wc=128071 L'👇', subpixel=FCFT_SUBPIXEL_NONE)
    at ../../subprojects/fcft/fcft.c:1430
        it = 0x606000011a20
        it_next = 0x606000011a80
        font = 0x611000011840
        entry = 0x61d000011938
        glyph = 0x604000045550
        __PRETTY_FUNCTION__ = "fcft_glyph_rasterize"
        noone = true
        got_glyph = false

Edit: it appears to crash for different reasons every time. Probably memory corruption.

Edit 2: I can run through it if I set footrc:workers=0. So it's a threading/locking issue.

Edit 3: should be fixed now :)


I can't reproduce the old crash on the branch anymore, but I found a new one:

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./result/bin/foot...
(No debugging symbols found in ./result/bin/foot)
(gdb) run
Starting program: /nix/store/fqxvmhwdky3a1w6s612yb2kpqmd1d20n-foot-unstable/bin/foot 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/aqq6367snc1zh3fs1pc4j4zm5h80vkkz-glibc-2.31/lib/libthread_db.so.1".
info: main.c:315: version: 1.4.4
info: main.c:322: arch: x86_64/64-bit, 
info: config.c:1658: loading configuration from /home/lukas/.config/footrc
info: main.c:337: locale: en_US.UTF-8
info: wayland.c:1049: eDP-1: 1920x1080+0x0@60Hz 0x226D 12.70" scale=1 PPI=174x180 (physical) PPI=174x180 (logical), DPI=173.51
[Detaching after fork from child process 7080]
info: wayland.c:1185: requesting SSD decorations
[New Thread 0x7fffe9bb5700 (LWP 7081)]
info: fcft.c:205: fcft: 2.2.6
info: [New Thread 0x7fffe93b4700 (LWP 7082)]
fcft.c:215: fontconfig: 2.13.92
info: fcft.c:221: freetype: 2.10.2
[New Thread 0x7fffe8bb3700 (LWP 7083)]
[New Thread 0x7fffe3fff700 (LWP 7084)]
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-Bold.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-Oblique.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-BoldOblique.ttf: size=10.00pt/24px, dpi=173.51
[Thread 0x7fffe93b4700 (LWP 7082) exited]
[Thread 0x7fffe9bb5700 (LWP 7081) exited]
[Thread 0x7fffe8bb3700 (LWP 7083) exited]
info: terminal.c:557: cell width=14, height=29
info: terminal.c:498: using 4 rendering threads
[Thread 0x7fffe3fff700 (LWP 7084) exited]
[New Thread 0x7fffe3fff700 (LWP 7088)]
[New Thread 0x7fffe8bb3700 (LWP 7089)]
[New Thread 0x7fffe93b4700 (LWP 7090)]
[New Thread 0x7fffe9bb5700 (LWP 7091)]
info: wayland.c:646: using SSD decorations
info: wayland.c:1331: cursor theme: (null), size: 24, scale: 1
 err: render.c:2207: failed to load xcursor pointer 'text'
info: fcft.c:607: /nix/store/yfmmkp58xfv1vrzrzsgnn4f0k1h5md6v-google-fonts-2019-07-14/share/fonts/truetype/AdobeBlank-Regular.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/vdwhbj433rlsdlknwpmyq3sk9gv3x1ac-noto-fonts-emoji-unstable-2019-10-22/share/fonts/noto/NotoColorEmoji.ttf: size=45.23pt/109px, dpi=173.51
info: vt.c:612: wanted = 2, 20
 err: fcft.c:988: /nix/store/yfmmkp58xfv1vrzrzsgnn4f0k1h5md6v-google-fonts-2019-07-14/share/fonts/truetype/AdobeBlank-Regular.ttf: failed to load glyph #1396: unknown error

Thread 8 "foot:render:3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe93b4700 (LWP 7090)]
0x00007ffff7e36670 in pixman_image_get_format () from /nix/store/dahdfj53m84rjlzjhhg1kqlzw4wr8z7n-pixman-0.38.4/lib/libpixman-1.so.0
(gdb) backtrace 
#0  0x00007ffff7e36670 in pixman_image_get_format () from /nix/store/dahdfj53m84rjlzjhhg1kqlzw4wr8z7n-pixman-0.38.4/lib/libpixman-1.so.0
#1  0x000000000041b070 in render_cell ()
#2  0x000000000041dfbd in render_worker_thread ()
#3  0x00007ffff7cc3fa2 in start_thread () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libpthread.so.0
#4  0x00007ffff7bf4d2f in clone () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6

Inside foot I executed:

tail -n +212 emoji-test.txt | head -n 1

which is this line:

1F573 FE0F                                 ; fully-qualified     # 🕳️ E0.7 hole
I can't reproduce the old crash on the branch anymore, but there is a new one:

```
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./result/bin/foot...
(No debugging symbols found in ./result/bin/foot)
(gdb) run
Starting program: /nix/store/fqxvmhwdky3a1w6s612yb2kpqmd1d20n-foot-unstable/bin/foot
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/aqq6367snc1zh3fs1pc4j4zm5h80vkkz-glibc-2.31/lib/libthread_db.so.1".
info: main.c:315: version: 1.4.4
info: main.c:322: arch: x86_64/64-bit,
info: config.c:1658: loading configuration from /home/lukas/.config/footrc
info: main.c:337: locale: en_US.UTF-8
info: wayland.c:1049: eDP-1: 1920x1080+0x0@60Hz 0x226D 12.70" scale=1 PPI=174x180 (physical) PPI=174x180 (logical), DPI=173.51
[Detaching after fork from child process 7080]
info: wayland.c:1185: requesting SSD decorations
[New Thread 0x7fffe9bb5700 (LWP 7081)]
info: fcft.c:205: fcft: 2.2.6
info: [New Thread 0x7fffe93b4700 (LWP 7082)]
fcft.c:215: fontconfig: 2.13.92
info: fcft.c:221: freetype: 2.10.2
[New Thread 0x7fffe8bb3700 (LWP 7083)]
[New Thread 0x7fffe3fff700 (LWP 7084)]
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-Bold.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-Oblique.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-BoldOblique.ttf: size=10.00pt/24px, dpi=173.51
[Thread 0x7fffe93b4700 (LWP 7082) exited]
[Thread 0x7fffe9bb5700 (LWP 7081) exited]
[Thread 0x7fffe8bb3700 (LWP 7083) exited]
info: terminal.c:557: cell width=14, height=29
info: terminal.c:498: using 4 rendering threads
[Thread 0x7fffe3fff700 (LWP 7084) exited]
[New Thread 0x7fffe3fff700 (LWP 7088)]
[New Thread 0x7fffe8bb3700 (LWP 7089)]
[New Thread 0x7fffe93b4700 (LWP 7090)]
[New Thread 0x7fffe9bb5700 (LWP 7091)]
info: wayland.c:646: using SSD decorations
info: wayland.c:1331: cursor theme: (null), size: 24, scale: 1
 err: render.c:2207: failed to load xcursor pointer 'text'
info: fcft.c:607: /nix/store/yfmmkp58xfv1vrzrzsgnn4f0k1h5md6v-google-fonts-2019-07-14/share/fonts/truetype/AdobeBlank-Regular.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/vdwhbj433rlsdlknwpmyq3sk9gv3x1ac-noto-fonts-emoji-unstable-2019-10-22/share/fonts/noto/NotoColorEmoji.ttf: size=45.23pt/109px, dpi=173.51
info: vt.c:612: wanted = 2, 20
 err: fcft.c:988: /nix/store/yfmmkp58xfv1vrzrzsgnn4f0k1h5md6v-google-fonts-2019-07-14/share/fonts/truetype/AdobeBlank-Regular.ttf: failed to load glyph #1396: unknown error

Thread 8 "foot:render:3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe93b4700 (LWP 7090)]
0x00007ffff7e36670 in pixman_image_get_format () from /nix/store/dahdfj53m84rjlzjhhg1kqlzw4wr8z7n-pixman-0.38.4/lib/libpixman-1.so.0
(gdb) backtrace
#0  0x00007ffff7e36670 in pixman_image_get_format () from /nix/store/dahdfj53m84rjlzjhhg1kqlzw4wr8z7n-pixman-0.38.4/lib/libpixman-1.so.0
#1  0x000000000041b070 in render_cell ()
#2  0x000000000041dfbd in render_worker_thread ()
#3  0x00007ffff7cc3fa2 in start_thread () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libpthread.so.0
#4  0x00007ffff7bf4d2f in clone () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
```

Inside foot I executed:

```
tail -n +212 emoji-test.txt | head -n 1
```

which is this line:

```
1F573 FE0F ; fully-qualified # 🕳️ E0.7 hole
```
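
For anyone who wants to feed foot the same sequence without the test file: the line above boils down to U+1F573 followed by U+FE0F (variation selector-16). The C sketch below encodes those two code points as UTF-8. `encode_utf8` and `encode_sequence` are hypothetical helpers written for this illustration; they are not part of foot or fcft.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from foot/fcft): encode one Unicode scalar
 * value as UTF-8. Returns the number of bytes written, or 0 if the
 * value is a surrogate or beyond U+10FFFF. */
static size_t
encode_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: encode a whole code point sequence into out.
 * Returns the total byte count, or 0 if any code point is invalid or
 * the buffer is too small. */
static size_t
encode_sequence(const uint32_t *cps, size_t count,
                unsigned char *out, size_t out_size)
{
    assert(cps != NULL && out != NULL);

    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned char tmp[4];
        size_t n = encode_utf8(cps[i], tmp);
        if (n == 0 || total + n > out_size)
            return 0;
        for (size_t j = 0; j < n; j++)
            out[total++] = tmp[j];
    }
    return total;
}
```

Writing the resulting bytes to the terminal (for example with `fwrite` to stdout) should exercise the same emoji sequence as the `tail | head` pipeline, minus the surrounding text on that line.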

The other crash looks like this:

Starting program: /nix/store/fqxvmhwdky3a1w6s612yb2kpqmd1d20n-foot-unstable/bin/foot 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/aqq6367snc1zh3fs1pc4j4zm5h80vkkz-glibc-2.31/lib/libthread_db.so.1".
info: main.c:315: version: 1.4.4
info: main.c:322: arch: x86_64/64-bit, 
info: config.c:1658: loading configuration from /home/lukas/.config/footrc
info: main.c:337: locale: en_US.UTF-8
info: wayland.c:1049: eDP-1: 1920x1080+0x0@60Hz 0x226D 12.70" scale=1 PPI=174x180 (physical) PPI=174x180 (logical), DPI=173.51
[Detaching after fork from child process 7562]
info: wayland.c:1185: requesting SSD decorations
[New Thread 0x7fffe9bb5700 (LWP 7563)]
info: fcft.c:205: fcft: 2.2.6
info: fcft.c:215: fontconfig: 2.13.92
info: fcft.c:221: freetype: 2.10.2
[New Thread 0x7fffe93b4700 (LWP 7564)]
[New Thread 0x7fffe8bb3700 (LWP 7565)]
[New Thread 0x7fffe3fff700 (LWP 7566)]
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-Bold.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-BoldOblique.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-Oblique.ttf: size=10.00pt/24px, dpi=173.51
[Thread 0x7fffe93b4700 (LWP 7564) exited]
[Thread 0x7fffe9bb5700 (LWP 7563) exited]
[Thread 0x7fffe3fff700 (LWP 7566) exited]
info: terminal.c:557: cell width=14, height=29
info: terminal.c:498: using 4 rendering threads
[New Thread 0x7fffe3fff700 (LWP 7570)]
[Thread 0x7fffe8bb3700 (LWP 7565) exited]
[New Thread 0x7fffe8bb3700 (LWP 7571)]
[New Thread 0x7fffe93b4700 (LWP 7572)]
[New Thread 0x7fffe9bb5700 (LWP 7573)]
info: wayland.c:646: using SSD decorations
info: wayland.c:1331: cursor theme: (null), size: 24, scale: 1
 err: render.c:2207: failed to load xcursor pointer 'text'
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 3, 20
info: vt.c:612: wanted = 4, 20
info: vt.c:612: wanted = 5, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 3, 20
info: vt.c:612: wanted = 4, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 3, 20
info: vt.c:612: wanted = 4, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 3, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: vt.c:612: wanted = 2, 20
info: fcft.c:607: /nix/store/yfmmkp58xfv1vrzrzsgnn4f0k1h5md6v-google-fonts-2019-07-14/share/fonts/truetype/AdobeBlank-Regular.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/vdwhbj433rlsdlknwpmyq3sk9gv3x1ac-noto-fonts-emoji-unstable-2019-10-22/share/fonts/noto/NotoColorEmoji.ttf: size=45.23pt/109px, dpi=173.51
info: fcft.c:607: /nix/store/vdwhbj433rlsdlknwpmyq3sk9gv3x1ac-noto-fonts-emoji-unstable-2019-10-22/share/fonts/noto/NotoColorEmoji.ttf: size=45.23pt/109px, dpi=173.51
info: fcft.c:607: /nix/store/vdwhbj433rlsdlknwpmyq3sk9gv3x1ac-noto-fonts-emoji-unstable-2019-10-22/share/fonts/noto/NotoColorEmoji.ttf: size=45.23pt/109px, dpi=173.51
info: fcft.c:607: /nix/store/vdwhbj433rlsdlknwpmyq3sk9gv3x1ac-noto-fonts-emoji-unstable-2019-10-22/share/fonts/noto/NotoColorEmoji.ttf: size=45.23pt/109px, dpi=173.51
free(): corrupted unsorted chunks

Thread 6 "foot:render:1" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe3fff700 (LWP 7570)]
0x00007ffff7b3508a in raise () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
(gdb) backtrace 
#0  0x00007ffff7b3508a in raise () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
#1  0x00007ffff7b1f528 in abort () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
#2  0x00007ffff7b768a8 in __libc_message () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
#3  0x00007ffff7b7da0a in malloc_printerr () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
#4  0x00007ffff7b7f8c1 in _int_free () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
#5  0x00007ffff79f2a57 in png_destroy_read_struct () from /nix/store/zjrnaa0c9hiqkc21ny5p5k3ijk731prv-libpng-apng-1.6.37/lib/libpng16.so.16
#6  0x00007ffff7a807df in Load_SBit_Png () from /nix/store/6nlkl2jqywpyxwk7c8il8kr5d3fkplpz-freetype-2.10.2/lib/libfreetype.so.6
#7  0x00007ffff7a80c01 in tt_sbit_decoder_load_png () from /nix/store/6nlkl2jqywpyxwk7c8il8kr5d3fkplpz-freetype-2.10.2/lib/libfreetype.so.6
#8  0x00007ffff7a7ee03 in tt_sbit_decoder_load_image () from /nix/store/6nlkl2jqywpyxwk7c8il8kr5d3fkplpz-freetype-2.10.2/lib/libfreetype.so.6
#9  0x00007ffff7a885d5 in tt_face_load_sbit_image () from /nix/store/6nlkl2jqywpyxwk7c8il8kr5d3fkplpz-freetype-2.10.2/lib/libfreetype.so.6
#10 0x00007ffff7a5b682 in TT_Load_Glyph () from /nix/store/6nlkl2jqywpyxwk7c8il8kr5d3fkplpz-freetype-2.10.2/lib/libfreetype.so.6
#11 0x00007ffff7a40b94 in FT_Load_Glyph () from /nix/store/6nlkl2jqywpyxwk7c8il8kr5d3fkplpz-freetype-2.10.2/lib/libfreetype.so.6
#12 0x00007ffff7ce1a62 in glyph_for_index () from /nix/store/bmgxagf5gsmrvw2yh5r5hj3nnmjy2zya-fcft-unstable/lib/libfcft.so.3
#13 0x00007ffff7ce3931 in fcft_glyph_rasterize () from /nix/store/bmgxagf5gsmrvw2yh5r5hj3nnmjy2zya-fcft-unstable/lib/libfcft.so.3
#14 0x000000000041adb8 in render_cell ()
#15 0x000000000041dfbd in render_worker_thread ()
#16 0x00007ffff7cc3fa2 in start_thread () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libpthread.so.0
#17 0x00007ffff7bf4d2f in clone () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
(gdb) 

After head -n 300 emoji-test.txt.

dnkl commented 1 year ago
Poster
Owner

@sterni not sure if you noticed my updates in https://codeberg.org/dnkl/foot/issues/100#issuecomment-76706, but I _think_ all crashes are fixed - you just need to update fcft.
dnkl changed title from WIP: text shaping using libutf8proc + fcft/harfbuzz to WIP: grapheme shaping using libutf8proc + fcft/harfbuzz 1 year ago
dnkl changed title from WIP: grapheme shaping using libutf8proc + fcft/harfbuzz to WIP: POC: grapheme shaping using libutf8proc + fcft/harfbuzz 1 year ago

I can still reproduce the [first one](https://codeberg.org/dnkl/foot/pulls/100#issuecomment-76724).
dnkl commented 1 year ago
Poster
Owner

@sterni works for me, but I'm using Joypixels. Will try with Noto Color Emoji.
dnkl commented 1 year ago
Poster
Owner

@sterni hmm, I still cannot reproduce. Does the crash happen every time? Or just now and then? Is the backtrace the same every time?

Can you post a "full" backtrace, `bt full`?

The crash happens every time I run `tail -n +212 emoji-test.txt | head -n 1` with this [emoji-test.txt](https://www.unicode.org/Public/emoji/13.0/emoji-test.txt).

foot and fcft were built from their respective harfbuzz branches.

Full backtrace (the same every time, as far as I can remember):

Reading symbols from ./result/bin/foot...
(gdb) run
Starting program: /nix/store/zqr2a4rdqd1qynf4j9lg9h7r5jm471kd-foot-unstable/bin/foot 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/aqq6367snc1zh3fs1pc4j4zm5h80vkkz-glibc-2.31/lib/libthread_db.so.1".
info: main.c:315: version: 1.4.4
info: main.c:322: arch: x86_64/64-bit, 
info: config.c:1658: loading configuration from /home/lukas/.config/footrc
info: main.c:337: locale: en_US.UTF-8
info: wayland.c:1049: eDP-1: 1920x1080+0x0@60Hz 0x226D 12.70" scale=1 PPI=174x180 (physical) PPI=174x180 (logical), DPI=173.51
[Detaching after fork from child process 20940]
info: wayland.c:1185: requesting SSD decorations
[New Thread 0x7fffe9bb4700 (LWP 20941)]
info: fcft.c:205: fcft: 2.2.6
info: fcft.c:215: fontconfig: 2.13.92
info: fcft.c:221: freetype: 2.10.2
[New Thread 0x7fffe13b3700 (LWP 20942)]
[New Thread 0x7fffe928f700 (LWP 20943)]
[New Thread 0x7fffe8a8e700 (LWP 20944)]
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-Bold.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-BoldOblique.ttf: size=10.00pt/24px, dpi=173.51
info: fcft.c:607: /nix/store/574qz01bkjk61457qcpgv143668g9b1l-dejavu-fonts-2.37/share/fonts/truetype/DejaVuSansMono-Oblique.ttf: size=10.00pt/24px, dpi=173.51
[Thread 0x7fffe9bb4700 (LWP 20941) exited]
[Thread 0x7fffe13b3700 (LWP 20942) exited]
[Thread 0x7fffe8a8e700 (LWP 20944) exited]
[Thread 0x7fffe928f700 (LWP 20943) exited]
info: terminal.c:557: cell width=14, height=29
info: terminal.c:498: using 4 rendering threads
[New Thread 0x7fffe8a8e700 (LWP 21004)]
[New Thread 0x7fffe928f700 (LWP 21005)]
[New Thread 0x7fffe13b3700 (LWP 21006)]
[New Thread 0x7fffe9bb4700 (LWP 21009)]
info: wayland.c:646: using SSD decorations
info: wayland.c:1331: cursor theme: (null), size: 24, scale: 1
 err: render.c:2209: failed to load xcursor pointer 'text'
info: vt.c:612: wanted = 2, 20
info: fcft.c:607: /nix/store/yfmmkp58xfv1vrzrzsgnn4f0k1h5md6v-google-fonts-2019-07-14/share/fonts/truetype/AdobeBlank-Regular.ttf: size=10.00pt/24px, dpi=173.51
 err: fcft.c:988: /nix/store/yfmmkp58xfv1vrzrzsgnn4f0k1h5md6v-google-fonts-2019-07-14/share/fonts/truetype/AdobeBlank-Regular.ttf: failed to load glyph #1396: unknown error
info: fcft.c:1541: OLD: y-advance: 1077418302, y-offset: 0
info: fcft.c:1542: OLD: x-advance: -1813430637, x-offset: 0
info: fcft.c:1541: OLD: y-advance: 0, y-offset: 0
info: fcft.c:1542: OLD: x-advance: 0, x-offset: -1

Thread 9 "foot:render:4" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe9bb4700 (LWP 21009)]
0x00007ffff7e36670 in pixman_image_get_format () from /nix/store/dahdfj53m84rjlzjhhg1kqlzw4wr8z7n-pixman-0.38.4/lib/libpixman-1.so.0
(gdb) bt full
#0  0x00007ffff7e36670 in pixman_image_get_format () from /nix/store/dahdfj53m84rjlzjhhg1kqlzw4wr8z7n-pixman-0.38.4/lib/libpixman-1.so.0
No symbol table info available.
#1  0x000000000041af47 in render_cell (term=term@entry=0x4fa330, pix=pix@entry=0x51be00, row=row@entry=0x529f40, col=col@entry=1, row_no=row_no@entry=6, 
    has_cursor=has_cursor@entry=false) at ../render.c:475
        glyph = 0x7fffe40b0c80
        i = 0
        cell = <optimized out>
        width = <optimized out>
        height = <optimized out>
        x = 16
        y = 176
        __PRETTY_FUNCTION__ = "render_cell"
        is_selected = <optimized out>
        _fg = <optimized out>
        _bg = <optimized out>
        fg = {red = 56540, green = 56540, blue = 52428, alpha = 65535}
        bg = {red = 4369, green = 4369, blue = 4369, alpha = 65535}
        font = 0x7fffe400aad0
        composed = <optimized out>
        single = 0x0
        glyphs = <optimized out>
        glyph_count = 2
        cols_left = <optimized out>
        cell_cols = 1
        clip = {extents = {x1 = 16, y1 = 176, x2 = 30, y2 = 205}, data = 0x0}
        clr_pix = 0x7fffe41e8360
#2  0x000000000041e01d in render_row (cursor_col=-1, row_no=6, row=0x529f40, pix=0x51be00, term=0x4fa330) at ../render.c:877
        col = 1
#3  render_worker_thread (_ctx=<optimized out>) at ../render.c:927
        row = 0x529f40
        cursor_col = -1
        row_no = 6
        buf = 0x54fcb0
        frame_done = false
        cursor = {col = 0, row = 7}
        ctx = <optimized out>
        term = 0x4fa330
        my_id = <optimized out>
        proc_title = "foot:render:4\000\000"
        start = 0x4fb320
        done = 0x4fb340
--Type <RET> for more, q to quit, c to continue without paging--
        lock = 0x4fb360
        __PRETTY_FUNCTION__ = "render_worker_thread"
#4  0x00007ffff7cc2fa2 in start_thread () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libpthread.so.0
No symbol table info available.
#5  0x00007ffff7bf3d2f in clone () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
No symbol table info available.
(gdb) 
dnkl commented 1 year ago
Poster
Owner

Hmm, ok, I have a hunch. I've pushed updates to both fcft and foot. Let me know if that stops the crash.

Now slightly different:

Thread 8 "foot:render:3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe93b3700 (LWP 4112)]
0x000000000041adeb in render_cell (term=term@entry=0x4f9e00, pix=pix@entry=0x51e7a0, row=row@entry=0x517a10, col=col@entry=23, row_no=row_no@entry=7, 
    has_cursor=has_cursor@entry=false) at ../render.c:447
447	   int cell_cols = glyph_count > 0 ? max(1, min(glyphs[0]->cols, cols_left)) : 1;
(gdb) bt full
#0  0x000000000041adeb in render_cell (term=term@entry=0x4f9e00, pix=pix@entry=0x51e7a0, row=row@entry=0x517a10, col=col@entry=23, row_no=row_no@entry=7, 
    has_cursor=has_cursor@entry=false) at ../render.c:447
        cell = <optimized out>
        width = 14
        height = 29
        x = 324
        y = 205
        __PRETTY_FUNCTION__ = "render_cell"
        is_selected = <optimized out>
        _fg = <optimized out>
        _bg = <optimized out>
        fg = {red = 56540, green = 56540, blue = 52428, alpha = 65535}
        bg = {red = 4369, green = 4369, blue = 4369, alpha = 65535}
        font = 0x7fffe4006730
        composed = 0x0
        single = 0x0
        glyphs = 0x7fffe0201120
        glyph_count = 2
        cols_left = 21
        cell_cols = <optimized out>
        clip = {extents = {x1 = 338, y1 = 205, x2 = 352, y2 = 234}, data = 0x0}
        clr_pix = <optimized out>
#1  0x000000000041dffd in render_row (cursor_col=-1, row_no=7, row=0x517a10, pix=0x51e7a0, term=0x4f9e00) at ../render.c:879
        col = 23
#2  render_worker_thread (_ctx=<optimized out>) at ../render.c:929
        row = 0x517a10
        cursor_col = -1
        row_no = 7
        buf = 0x54f780
        frame_done = false
        cursor = {col = 38, row = 9}
        ctx = <optimized out>
        term = 0x4f9e00
        my_id = <optimized out>
        proc_title = "foot:render:3\000\000"
        start = 0x4fadf0
        done = 0x4fae10
        lock = 0x4fae30
        __PRETTY_FUNCTION__ = "render_worker_thread"
#3  0x00007ffff7cc2fa2 in start_thread () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libpthread.so.0
No symbol table info available.
#4  0x00007ffff7bf3d2f in clone () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
No symbol table info available.
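
For readers following the backtrace: the faulting line dereferences `glyphs[0]` while `glyph_count` is 2. One possible explanation, which is an assumption and not confirmed in this thread, is that the first array entry was NULL or otherwise invalid after a glyph failed to load. A defensive version of the column computation might look like the sketch below. `struct glyph_stub` and `cell_cols_for` are hypothetical stand-ins, not foot's or fcft's actual types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the glyph type; only the field used here. */
struct glyph_stub {
    int cols;
};

/* Sketch of the computation on render.c:447 with a NULL check added.
 * Falls back to a single column when there is no usable first glyph. */
static int
cell_cols_for(const struct glyph_stub *const *glyphs, size_t glyph_count,
              int cols_left)
{
    /* A non-zero count with no array would be a caller bug. */
    assert(glyphs != NULL || glyph_count == 0);

    if (glyph_count == 0 || glyphs[0] == NULL)
        return 1;

    /* Equivalent to max(1, min(glyphs[0]->cols, cols_left)). */
    int cols = glyphs[0]->cols;
    if (cols > cols_left)
        cols = cols_left;
    return cols < 1 ? 1 : cols;
}
```

This only guards the first entry against NULL; it would not help if the pointer were dangling, and it says nothing about how the actual fix in fcft was done.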
dnkl commented 1 year ago
Poster
Owner

@sterni yet another fix to fcft pushed. Sorry for making you do all this...


That fixed it! `emoji-test.txt` no longer makes foot crash. There are some strange rendering quirks right now, but I am not sure what causes them; this will need further investigation:

  • The astronaut ZWJ sequences don't work
  • Some flag sequences don't work, some do.

Possibly missing fallback fonts, not sure.

dnkl commented 1 year ago
Poster
Owner

I think a lot depends on the font. For example, if by astronaut ZWJ you mean https://emojipedia.org/man-astronaut/ (with all the various skin modifiers), then it works for me (with Joypixels).

The flags issue is a bug. The same flag sometimes renders correctly, sometimes as only one of the regional indicators.

dnkl commented 1 year ago
Poster
Owner

> The flags issue is a bug. The same flag sometimes renders correctly, sometimes as only one of the regional indicators.

These should now render much better.


Another curious crash, this time with Unifont as the only font. footrc:

font=Unifont,Unifont Upper

The crash is triggered by `tail -n +44 emoji-test.txt | head -n 1`, which prints the slightly smiling face; curiously, that glyph is included in `Unifont Upper`.

The crash doesn't happen with this footrc:

font=DejaVu Sans Mono:size=10,Noto Color Emoji:size=10,Twitter Color Emoji:size=10,Noto Emoji:size=10
Thread 6 "foot:render:1" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe3fff700 (LWP 16110)]
0x000000000041adeb in render_cell (term=term@entry=0x4fb1c0, 
    pix=pix@entry=0x52b000, row=row@entry=0x503650, col=col@entry=67, 
    row_no=row_no@entry=8, has_cursor=has_cursor@entry=false)
    at ../render.c:447
447	   int cell_cols = glyph_count > 0 ? max(1, min(glyphs[0]->cols, cols_left)) : 1;
(gdb) bt full
#0  0x000000000041adeb in render_cell (term=term@entry=0x4fb1c0, 
    pix=pix@entry=0x52b000, row=row@entry=0x503650, col=col@entry=67, 
    row_no=row_no@entry=8, has_cursor=has_cursor@entry=false)
    at ../render.c:447
        cell = <optimized out>
        width = 10
        height = 20
        x = 672
        y = 162
        __PRETTY_FUNCTION__ = "render_cell"
        is_selected = <optimized out>
        _fg = <optimized out>
        _bg = <optimized out>
        fg = {red = 56540, green = 56540, blue = 52428, alpha = 65535}
        bg = {red = 4369, green = 4369, blue = 4369, alpha = 65535}
        font = 0x7fffe400d200
        composed = 0x0
        single = 0x0
        glyphs = 0x7fffe3ffec78
        glyph_count = 1
        cols_left = 26
        cell_cols = <optimized out>
        clip = {extents = {x1 = 682, y1 = 162, x2 = 692, y2 = 182}, data = 0x0}
        clr_pix = <optimized out>
#1  0x000000000041dffd in render_row (cursor_col=-1, row_no=8, row=0x503650, pix=0x52b000, term=0x4fb1c0) at ../render.c:879
        col = 67
#2  render_worker_thread (_ctx=<optimized out>) at ../render.c:929
        row = 0x503650
        cursor_col = -1
        row_no = 8
        buf = 0x558470
        frame_done = false
        cursor = {col = 0, row = 10}
        ctx = <optimized out>
        term = 0x4fb1c0
        my_id = <optimized out>
        proc_title = "foot:render:1\000\000"
        start = 0x4fc1b0
        done = 0x4fc1d0
        lock = 0x4fc1f0
        __PRETTY_FUNCTION__ = "render_worker_thread"
#3  0x00007ffff7cc2fa2 in start_thread () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libpthread.so.0
No symbol table info available.
#4  0x00007ffff7bf3d2f in clone () from /nix/store/mh78fk3x12q2a77srgkzv16h0irl8r61-glibc-2.31/lib/libc.so.6
No symbol table info available.
dnkl commented 1 year ago
Poster
Owner

Is that a debug or release build? If release, can you reproduce with a debug build?

dnkl commented 1 year ago
Poster
Owner

And/or do this: start a new foot instance. Run tail -n +44 emoji-test.txt | head -n 1. You should see a new font being loaded, since Unifont does not have that glyph.

Let me know which font it is. For me, running foot with foot -f Unifont,Unifont\ Upper works, and that glyph is loaded from Joypixels.


Strange, can't reproduce the crash anymore, don't know what was wrong. The glyph is in Unifont Upper btw and is rendered correctly now.

Edit: I'm experiencing a strange delay when opening nvim emoji-test.txt with Unifont, can you reproduce this?

dnkl commented 1 year ago
Poster
Owner

> I’m experiencing a strange delay when opening nvim emoji-test.txt with Unifont, can you reproduce this?

Yes. Most likely it's harfbuzz lazy-initializing something, when the fallback font is lazy-loaded. But I haven't actually analyzed it; it's just a guess.

dnkl commented 1 year ago
Poster
Owner

> The glyph is in Unifont Upper btw and is rendered correctly now.

Ah, this font doesn't seem to be packaged in Arch, nor AUR. Will try to install it manually.


> Yes. Most likely it's harfbuzz lazy-initializing something, when the fallback font is lazy-loaded. But I haven't actually analyzed it; it's just a guess.

Fallback fonts still seem to be a dodgy thing with this branch. I can reproduce this delay with other font configurations as well, so it's not Unifont specific. It seems to me that fcft will load system fallback fonts not specified using `-f`?!

> These should now render much better.

Yes, flag rendering is fixed, except for some ZWJ instances (rainbow flag and transgender flag). Here it seems to me that the handling of the variation selector is messed up:

The unqualified emojis render correctly: `1F3F3 200D 26A7` and `1F3F3 200D 1F308`. The qualified emojis do not (qualified here means that the variation selector for emoji presentation is added explicitly): `1F3F3 FE0F 200D 1F308` and `1F3F3 FE0F 200D 26A7 FE0F`, for example (you can observe this pretty well in `emoji-test.txt`).

Can you try to reproduce this? I just noticed that I have an outdated version of harfbuzz, so it might be a matter of updating, because these characters were added quite recently.
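For anyone who wants to reproduce this without digging through `emoji-test.txt`, here is a minimal sketch that turns the code point lists above into UTF-8 so they can be written to the terminal. The helper names `utf8_encode` and `utf8_encode_seq` are made up for this example; they are not part of foot, fcft or harfbuzz.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8.
 * Returns the number of bytes written to out (1 to 4). */
static size_t utf8_encode(uint32_t cp, char out[4])
{
    /* Surrogates and values above U+10FFFF are not valid scalar values. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode n code points into buf and NUL-terminate it.
 * buf must have room for 4 * n + 1 bytes. Returns the byte length. */
static size_t utf8_encode_seq(const uint32_t *cps, size_t n, char *buf)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        len += utf8_encode(cps[i], buf + len);
    buf[len] = '\0';
    return len;
}
```

Feeding it `{0x1F3F3, 0x200D, 0x1F308}` and `{0x1F3F3, 0xFE0F, 0x200D, 0x1F308}` and printing both buffers with `fputs(buf, stdout)` should put the unqualified and qualified rainbow flag next to each other for comparison.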

dnkl commented 1 year ago
Poster
Owner

> It seems to me that fcft will load system fallback fonts not specified using -f?!

Correct. It will try the user-specified fallback fonts first, then try the fontconfig provided fallback fonts.

> variation selector is messed up:

As far as I have seen, harfbuzz doesn't do anything with the variation selector. But I haven't tested that much yet.

And yes, I can reproduce this. I'll need to do some investigation. It is possible fcft needs to detect the selectors and select font based on that (each harfbuzz instance is tied to a specific font, and harfbuzz does not do any font fallback or font selection).


> And yes, I can reproduce this. I’ll need to do some investigation. It is possible fcft needs to detect the selectors and select font based on that (each harfbuzz instance is tied to a specific font, and harfbuzz does not do any font fallback or font selection).

Very strange, though, that harfbuzz doesn't do anything about valid Unicode sequences. I.e., if harfbuzz is running with JoyPixels, `1F3F3 FE0F 200D 26A7 FE0F` should definitely result in 🏳️‍⚧️ (Edit: of course codeberg doesn't support this one yet…).

dnkl commented 1 year ago
Poster
Owner

It might be related to how fcft searches for fonts that contain the code points that make up the grapheme cluster.

Right now it searches for a font that has all code points.

Still, it does match e.g. Joypixels for 🏳️‍⚧️, but then harfbuzz seems to return two glyphs that are overlaid. At least it looks that way.


Harfbuzz seems to support these correctly, though; `hb-shape` returns the same glyphs for all four variants:

$ grep "transgender flag" emoji-test.txt | cut -d"#" -f2 | cut -d " " -f2 | while read line; do hb-shape "/nix/store/3ckw9d7bw6kn2qyn7n3nnsax12y2jyg3-joypixels-6.0.0/share/fonts/truetype/joypixels.ttf" "$line"; done
[gid1816=0+2400|gid3=0+0]
[gid1816=0+2400|gid3=0+0]
[gid1816=0+2400|gid3=0+0]
[gid1816=0+2400|gid3=0+0]
dnkl commented 1 year ago
Poster
Owner

Turned out to be the font selection in fcft; it only used fonts that had all code points.

Joypixels did not match the presentation selectors, but other fonts did.

So I did two things: first, don't bother matching presentation selectors (and ZWJ); second, only match emoji fonts when the emoji selector is present in the grapheme, and only match non-emoji fonts when the text selector is present in the grapheme.

This is probably not 100% correct, but is a start. It might be time to look at other implementations and see what they are doing.

I haven't pushed this yet, as it needs to be cleaned up a bit first.
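To make the rule concrete, here is a rough sketch of the selection logic described above. It is not the fcft code: `grapheme_font_query`, `enum font_pref` and the `CP_*` macros are hypothetical names, and the handling of a grapheme that contains both selectors (first one wins) is an assumption made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which kind of font the grapheme asks for (hypothetical type). */
enum font_pref { FONT_PREF_ANY, FONT_PREF_EMOJI, FONT_PREF_TEXT };

#define CP_ZWJ  0x200D  /* zero width joiner */
#define CP_VS15 0xFE0E  /* text presentation selector */
#define CP_VS16 0xFE0F  /* emoji presentation selector */

/* Split a grapheme cluster into the code points a font must actually
 * cover and the presentation it prefers.
 *
 * ZWJ and the two presentation selectors are not copied to `required`,
 * so a font is no longer rejected just for lacking them.
 * `required` must have room for n entries. Returns the number of
 * code points written to `required`.
 *
 * Assumption: if both selectors occur, the first one decides. */
static size_t grapheme_font_query(const uint32_t *cps, size_t n,
                                  uint32_t *required, enum font_pref *pref)
{
    assert(cps != NULL && required != NULL && pref != NULL);

    size_t count = 0;
    *pref = FONT_PREF_ANY;

    for (size_t i = 0; i < n; i++) {
        switch (cps[i]) {
        case CP_ZWJ:
            break;
        case CP_VS15:
            if (*pref == FONT_PREF_ANY)
                *pref = FONT_PREF_TEXT;
            break;
        case CP_VS16:
            if (*pref == FONT_PREF_ANY)
                *pref = FONT_PREF_EMOJI;
            break;
        default:
            required[count++] = cps[i];
            break;
        }
    }
    return count;
}
```

For `1F3F3 FE0F 200D 26A7 FE0F` this leaves only `1F3F3` and `26A7` to be matched against a font's coverage, with an emoji font preferred, which is why Joypixels can be picked even though it has no glyphs for the selectors themselves.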


I think we have more serious problems with wcwidth than we bargained for if we want to create a correct implementation. Technical Report 29 states:

> Display of Grapheme Clusters. Grapheme clusters are not the same as ligatures. For example, the grapheme cluster “ch” in Slovak is not normally a ligature and, conversely, the ligature “fi” is not a grapheme cluster. Default grapheme clusters do not necessarily reflect text display. For example, the sequence <f, i> may be displayed as a single glyph on the screen, but would still be two grapheme clusters.

This means we actually have to look at what HarfBuzz gives us in order to determine how wide a grapheme cluster actually is. From a foot-independent perspective this makes a lot of sense: many grapheme clusters will have different widths depending on whether the font supports them or not, etc.

For foot this means big trouble as far as I can see, since vt.c already has to reserve a certain number of cells for grapheme clusters even though it doesn't yet know which glyphs these will resolve to. I'm really not sure how to implement this in a clean way and without calling fcft twice.

The good news, at least, is that if we go down this road, the groundwork for #57 would already be laid.

dnkl commented 1 year ago
Poster
Owner

> I’m really not sure how to implement this in a clean way and without calling fcft twice.

The big problem is it wouldn't be twice. It could potentially be many times for each grapheme. We don't know when the grapheme cluster ends (until it has ended), which means we'd need to call fcft for each incoming code point.

That would kill performance.

For the time being, I'm going to stick with wcswidth(), knowing very well that it looks weird in many cases; sometimes a glyph gets cut in half, and sometimes we reserve too many cells. But it is also the only variant that doesn't cause complete client application breakage.

But then I also think that as long as we're using wcswidth(), this branch isn't really good enough to get merged; we can display grapheme clusters, sort of, but it wouldn't be useful to people wishing to implement "correct" client applications. And we'd probably still break some applications due to our grapheme cluster segmentation.


> The big problem is it wouldn’t be twice. It could potentially be many times for each grapheme. We don’t know when the grapheme cluster ends (until it has ended), which means we’d need to call fcft for each incoming code point.

Okay, that makes it pretty much impossible, since not even a permanent lookup table shared between `vt.c` and `render.c` would solve the issue.

In that case I feel it is a good idea to stick to standard `wcswidth` (maybe even without the `max` call?). There's really no way we can satisfyingly implement 100% correct grapheme cluster rendering, and even if we could, terminal applications would break left and right. For example, `kitty` with its heuristics has this problem: every time I paste an emoji into a command, my `fish` prompt gets messed up completely, to the point where I can't really edit it without guesswork anymore.

dnkl commented 1 year ago
Poster
Owner

> maybe even without the max call?

I removed this a while ago :)

I'll ignore the display width problem for a little while longer, and just try to get "everything" else working correctly.

Then we'll see where we are and in which form this PR should be merged.


> I’ll ignore the display width problem for a little while longer, and just try to get “everything” else working correctly.

Definitely the right call, we can either not solve the problem or come up with some kind of hack…

dnkl force-pushed harfbuzz from 01b3c60fdb to d66236e92d 5 months ago
Poster
Owner

I've resurrected this PR. The width is now calculated using wcswidth() - no custom function anymore.

I'm thinking it might actually be useful, in at least two ways:

  • Using utf8proc for grapheme clusters is more correct than our current "append zero-width codepoints to the previous cell and just assume they are valid combining characters".
  • We can start paying attention to variation selectors, and make sure to pick the correct type of font.

By using wcswidth() for width calculation, we don't break any applications.

utf8proc is still optional, and grapheme shaping (and clustering) would be opt-in.

dnkl force-pushed harfbuzz from 43b24bd531 to 1ce8f9da8c 5 months ago
Poster
Owner

Variant selectors:

https://codeberg.org/attachments/a5c33826-1286-40d0-be1b-318f5ac9be2c

dnkl changed title from WIP: POC: grapheme shaping using libutf8proc + fcft/harfbuzz to WIP: grapheme shaping using libutf8proc + fcft/harfbuzz 5 months ago
dnkl changed title from WIP: grapheme shaping using libutf8proc + fcft/harfbuzz to WIP: grapheme shaping using libutf8proc 5 months ago
Poster
Owner

So... performance-wise, this is roughly twice as slow as current master with grapheme-shaping disabled, and even slower when enabled.

Taking that and everything else into consideration (i.e. the fact that we must use wcswidth(), which makes some graphemes occupy 4 or more cells even though the glyph is only two cells wide) makes me once again reconsider.

I'll give this some more thought, but I am leaning towards closing without merging. Unless anyone has any compelling reasons for why this should be merged?

dnkl force-pushed harfbuzz from 3e42fcb850 to 549a19e13f 4 months ago
Poster
Owner

I found the performance problem; I had bumped the maximum number of combining characters in a "compose chain" from 6 to 20. This resulted in much worse cache behavior.

Looking at emojis, the longest chains are the three "subdivision flags" (England, Scotland and Wales); they are 7 codepoints long.

Backing down from 20 to 7 characters makes the compose struct 8 bytes larger than in master (we've added a 4-byte width member as well).
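The size arithmetic can be checked with a small sketch. The layouts below are hypothetical stand-ins, not foot's actual `compose` struct: they assume a 4-byte base character, an array of 4-byte combining characters, a one-byte count and, on the branch, the new 4-byte `width` member.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical master layout: at most 6 combining characters, no width. */
struct compose_6 {
    uint32_t base;
    uint32_t combining[6];
    uint8_t count;
};

/* Hypothetical branch layout: 7 combining characters plus a 4-byte width. */
struct compose_7w {
    uint32_t base;
    uint32_t combining[7];
    uint8_t count;
    int32_t width;
};

/* Hypothetical layout with the limit bumped to 20 combining characters. */
struct compose_20w {
    uint32_t base;
    uint32_t combining[20];
    uint8_t count;
    int32_t width;
};

/* On common ABIs (4-byte alignment for uint32_t) the branch layout is
 * 8 bytes larger than the master layout: one extra array slot plus the
 * width member. Compile-time check of that claim for these layouts. */
_Static_assert(sizeof(struct compose_7w) - sizeof(struct compose_6) == 8,
               "expected the 7-character layout to be 8 bytes larger");
```

With these assumed layouts the 20-character variant no longer fits in a single 64-byte cache line while the 7-character one does, which is at least consistent with the cache behavior described above.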

With this change, performance is now much closer to master, but not quite. But I don't think we can ever get to the exact same level of performance. We should however be close enough now.

That leaves the wcwidth() problem. I don't think that'll ever go away. Looking at e.g. kitty, which has its own wcswidth() implementation, we can see that it does some things better. But complex sequences, like 👩🏻‍🚀, are still allocated 4 cells rather than two. Foot currently allocates 6 cells for the same sequence.

To summarize: we now have everything in place to do grapheme cluster segmentation and grapheme shaping. We recognize the emoji presentation selector, and force a minimum width of two when it is detected. Other than that, we are currently adding the wcwidth() of all codepoints together (we could use wcswidth() but would then lose support for the emoji selector).
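The width rule summarized above can be sketched as follows. This is an illustration, not foot's code: `grapheme_width` is a hypothetical name, and the per-code-point width is passed in as a callback so the example does not depend on the locale. `demo_width` is a crude stand-in for wcwidth() that only knows about the code points used in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-code-point width callback; in a real terminal this would wrap wcwidth(). */
typedef int (*cp_width_fn)(uint32_t cp);

/* Crude stand-in for wcwidth(), for demonstration only (assumption). */
static int demo_width(uint32_t cp)
{
    if (cp == 0x200D || cp == 0xFE0E || cp == 0xFE0F)
        return 0;   /* joiner and presentation selectors */
    if (cp >= 0x1F300 && cp <= 0x1FAFF)
        return 2;   /* rough approximation of the wide emoji blocks */
    return 1;
}

/* Width of a grapheme cluster: the sum of the widths of its code points,
 * with a minimum of two cells when the emoji presentation selector
 * (U+FE0F) is part of the cluster. Negative widths (non-printable code
 * points) are ignored. */
static int grapheme_width(const uint32_t *cps, size_t n, cp_width_fn width_of)
{
    assert(cps != NULL && width_of != NULL);

    int total = 0;
    bool emoji_selector = false;

    for (size_t i = 0; i < n; i++) {
        if (cps[i] == 0xFE0F)
            emoji_selector = true;

        int w = width_of(cps[i]);
        if (w > 0)
            total += w;
    }

    if (emoji_selector && total < 2)
        total = 2;
    return total;
}
```

With `demo_width`, the astronaut sequence `1F469 1F3FB 200D 1F680` adds up to the 6 cells mentioned above, while `2764 FE0F` is bumped from one cell to two by the selector rule.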

Thus I think we're at a point where we can merge this as a compile-time, and run-time opt-in: you need to both compile with libutf8proc, and set tweak.grapheme-shaping=true in foot.ini (and fcft needs to have been compiled with harfbuzz support).

If/when better alternatives for wcswidth() show up (I have looked), we can switch to that.

dnkl changed title from WIP: grapheme shaping using libutf8proc to Grapheme shaping using libutf8proc 4 months ago

This is really nice work and I am happy you figured out the perf problem!

Are you planning on compiling the AUR version with libutf8proc?

Poster
Owner

> Are you planning on compiling the AUR version with libutf8proc?

@y0ast I haven't really thought about that yet.

But I don't see why not; it's a very small library, and the number of (code) lines added to foot when it's enabled can be counted on two hands, no kidding.

In fact, the overall diff is very small, with or without libutf8proc.

dnkl force-pushed harfbuzz from 5629a82272 to c286c935ba 4 months ago
Poster
Owner

Oops, looks like I broke underline support...

Poster
Owner

> Oops, looks like I broke underline support...

Fixed

dnkl force-pushed harfbuzz from 1867f14ce0 to 7c134c76ea 4 months ago
dnkl force-pushed harfbuzz from 7c134c76ea to 5f4b6e6fad 4 months ago

Hey. I found you while searching for software implementing SM/RM ?2026, as I am compiling a list of software that supports this extension.

I have actually implemented full grapheme cluster support with the same motivations as you have. You might want to read about the architecture: https://github.com/christianparpart/contour/blob/master/docs/text-stack.md

On top of that, I started creating a small grapheme cluster spec so that we (the TE community) can hopefully move on. I would like to share that with you ASAP, if you are interested?

Poster
Owner

> On top of that, I started creating a small grapheme cluster spec so that we (the TE community) can hopefully move on. I would like to share that with you ASAP, if you are interested?

Sure, I'd be happy to read it, and then we can take it from there, I guess :)

dnkl force-pushed harfbuzz from 258ea216b4 to df2f679936 4 months ago
dnkl force-pushed harfbuzz from df2f679936 to 2ae1d1ba65 4 months ago
dnkl force-pushed harfbuzz from 2ae1d1ba65 to befe0b0018 4 months ago
dnkl force-pushed harfbuzz from befe0b0018 to 705f01873d 4 months ago
dnkl force-pushed harfbuzz from 705f01873d to f3c71320e1 4 months ago
dnkl force-pushed harfbuzz from f3c71320e1 to 591f5d594f 4 months ago
dnkl force-pushed harfbuzz from ede12a0bf0 to 3474605c1d 4 months ago
Poster
Owner

I realized 7 codepoints isn't enough: 👩🏻‍❤️‍💋‍👨🏾, for example, is 10(?) codepoints.

Bumping it to 10 once again made us twice as slow as current master :/

So I had a cup of coffee and tried to actually use my brain. The result? We're now faster than the master branch, even with grapheme-shaping enabled. With grapheme-shaping disabled, we're even faster.

How? Use a binary search tree to store the compose chains, instead of one big, dynamically allocated, unsorted array. See https://codeberg.org/dnkl/foot/commit/43d670891c7a442c50c13d9a900d24a43da8a7b6 for details.

So, there's no longer an upper limit to the number of codepoints (well there is, 256, but I don't see that ever being reached). Things are fast as shit.
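As a rough illustration of the idea (not the code from the commit above), compose chains can be stored as nodes in a binary search tree ordered by a key derived from their codepoints, so a lookup walks one path through the tree instead of scanning an unsorted array. Everything here is an assumption made for the sketch: the struct layout, the names `chain`, `chain_key` and `chain_find_or_insert`, the choice of an FNV-1a style hash as the key, and the lack of rebalancing. The 8-bit `count` field is one plausible source of the upper limit mentioned above; a `uint8_t` holds at most 255.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: one node per compose chain. */
struct chain {
    uint32_t *chars;        /* codepoints, allocated to the exact length */
    struct chain *left;
    struct chain *right;
    uint32_t key;           /* search key derived from the codepoints */
    uint8_t count;          /* number of codepoints (8 bits caps the length) */
    uint8_t width;          /* display width, in cells */
};

/* FNV-1a style hash over the codepoints (an assumption, any key would do). */
static uint32_t chain_key(const uint32_t *chars, size_t count)
{
    uint32_t key = 2166136261u;
    for (size_t i = 0; i < count; i++) {
        key ^= chars[i];
        key *= 16777619u;
    }
    return key;
}

/* Returns the existing node for this chain, or inserts a new one.
 * Returns NULL on allocation failure. */
static struct chain *chain_find_or_insert(struct chain **root,
                                          const uint32_t *chars,
                                          size_t count, uint8_t width)
{
    assert(count >= 1 && count <= UINT8_MAX);

    uint32_t key = chain_key(chars, count);
    struct chain **slot = root;

    while (*slot != NULL) {
        struct chain *node = *slot;
        if (key == node->key && count == node->count &&
            memcmp(chars, node->chars, count * sizeof(chars[0])) == 0)
            return node;
        /* Colliding keys with different content go to the right. */
        slot = key < node->key ? &node->left : &node->right;
    }

    struct chain *node = calloc(1, sizeof(*node));
    if (node == NULL)
        return NULL;
    node->chars = malloc(count * sizeof(chars[0]));
    if (node->chars == NULL) {
        free(node);
        return NULL;
    }
    memcpy(node->chars, chars, count * sizeof(chars[0]));
    node->key = key;
    node->count = (uint8_t)count;
    node->width = width;
    *slot = node;
    return node;
}
```

Because each node allocates only as many codepoints as it needs, short chains stay small and long ZWJ sequences no longer force every chain to reserve room for the worst case.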

All good? I think so. The memory usage for compose chains has gone up slightly, depending on what you compare with.

In current master, each compose chain is 28 bytes and can hold 6 codepoints. In this branch, before the switch to a binary search tree, each chain was 48(!) bytes. Now, it's 32 bytes. That's with zero codepoints. A chain with two codepoints is 40 bytes. Definitely more than 28. But still less than 48. In fact, we can have up to 4 codepoints (which covers a lot of emoji sequences) and still use less, or the same amount of, memory as before.
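The arithmetic behind those numbers can be written out as a small sketch. The 4 bytes per codepoint is inferred from the figures above (40 bytes for two codepoints minus 32 bytes for none); the constant and function names are hypothetical, and allocator overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>

enum {
    MASTER_CHAIN_BYTES      = 28,  /* master: fixed size, up to 6 codepoints */
    FIXED_ARRAY_CHAIN_BYTES = 48,  /* this branch, before the tree */
    TREE_NODE_BYTES         = 32,  /* this branch, tree node with no codepoints */
    CODEPOINT_BYTES         = 4,   /* inferred: (40 - 32) / 2 */
};

/* Break-even point: four codepoints cost exactly as much as the old
 * fixed-size chain did. */
static_assert(TREE_NODE_BYTES + 4 * CODEPOINT_BYTES == FIXED_ARRAY_CHAIN_BYTES,
              "4 codepoints should match the pre-tree chain size");

/* Bytes used by a tree-based chain holding the given number of codepoints. */
static size_t tree_chain_bytes(size_t codepoints)
{
    return TREE_NODE_BYTES + codepoints * CODEPOINT_BYTES;
}
```

So a chain costs more than master's 28 bytes at any length, but stays at or below the previous 48 bytes for up to four codepoints.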

Benchmarks (unicode-random):

  • This branch, with grapheme-shaping: 0.181s
  • This branch, without grapheme-shaping: 0.139s
  • Master (no grapheme shaping support): 0.242s
dnkl force-pushed harfbuzz from 9b06a5c39a to f19797a5af 4 months ago
dnkl added 3 commits 4 months ago
Poster
Owner

One last thing... is this (as in, this feature, grapheme shaping) bloat?

Turns out, a PGO build of this branch is smaller than a PGO build of current master!

Master:

section                size     addr
.interp                  28      792
.note.gnu.property       64      824
.note.gnu.build-id       36      888
.note.ABI-tag            32      924
.gnu.hash                28      960
.dynsym                6720      992
.dynstr                4361     7712
.gnu.version            560    12074
.gnu.version_r          432    12640
.rela.dyn             40392    13072
.init                    27    57344
.text                218853    57376
.fini                    13   276232
.rodata               38880   278528
.eh_frame_hdr            36   317408
.eh_frame               144   317448
.init_array               8   322984
.fini_array              16   322992
.data.rel.ro          26416   323008
.dynamic                560   349424
.got                   2256   349984
.data                   648   352256
.bss                   1608   352928
.comment                 36        0
Total                342154

This branch:

section                size     addr
.interp                  28      792
.note.gnu.property       64      824
.note.gnu.build-id       36      888
.note.ABI-tag            32      924
.gnu.hash                28      960
.dynsym                6816      992
.dynstr                4460     7808
.gnu.version            568    12268
.gnu.version_r          432    12840
.rela.dyn             40488    13272
.init                    27    57344
.text                217429    57376
.fini                    13   274808
.rodata               39008   278528
.eh_frame_hdr            36   317536
.eh_frame               144   317576
.init_array               8   322952
.fini_array              16   322960
.data.rel.ro          26416   322976
.dynamic                576   349392
.got                   2288   349968
.data                   648   352256
.bss                   1608   352928
.comment                 36        0
Total                341205
dnkl added 3 commits 4 months ago
dnkl merged commit 80c2d9d89d into master manually 4 months ago
The pull request has been manually merged as 80c2d9d89d.