Improve performance when forced to double-buffer
Foot currently relies heavily on the compositor allowing us to re-use shm surfaces in the next frame.
The protocol even mentions this as an important optimization for GL(ES) compositors with wl_shm clients.
However, some compositors do not release the shm buffer immediately, and instead force the client to double-buffer (i.e. swap between two buffers). KDE/KWin is one of them. And it looks like wlroots-based compositors may start doing this as well: https://github.com/swaywm/wlroots/issues/2705#issuecomment-823856101
Foot "handles" this already, but in a very inefficient way; it copies the old buffer and then applies the current frame's damage.
While having to double-buffer will have a negative performance impact, we can at least do better:
- scan grid and check for damage
- if the entire grid needs to be re-drawn, go ahead and do that, i.e. ignore the old buffer's content.
- if not, build region(s) of areas that we're going to redraw
- re-apply last frame's scroll damage
- copy last frame's damaged regions to the new buffer, excluding new damage
Use either pixman or memcpy() to copy the regions from the old frame. Investigate whether this should be done on a per-row basis, or as cell chunks.
If copying entire rows, then a row needs to be copied if both a) at least one of its cells was updated in the last frame, and b) at least one of its cells is clean in the new frame.
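
A minimal sketch of the row-based variant, to make the copy rule concrete. Everything here is hypothetical and not foot's actual code: `struct row_state`, `row_needs_copy()` and `copy_stale_rows()` are made-up names, and the sketch assumes both buffers share the same stride and cell height. It also ignores scroll damage, which would have to be re-applied before this step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-row summary; NOT foot's real data structures. */
struct row_state {
    bool dirty_prev; /* a) at least one cell was updated in the last frame */
    bool clean_now;  /* b) at least one cell is clean in the new frame */
};

/* A row must be copied only if both a) and b) hold. */
static bool
row_needs_copy(const struct row_state *row)
{
    return row->dirty_prev && row->clean_now;
}

/*
 * Copy the rows that need it from the old buffer into the new one.
 * Assumes both buffers use the same stride (bytes per pixel line) and
 * that each grid row is cell_height pixel lines tall.
 * Returns the number of grid rows copied.
 */
static size_t
copy_stale_rows(uint8_t *new_buf, const uint8_t *old_buf,
                const struct row_state *rows, size_t row_count,
                size_t cell_height, size_t stride)
{
    assert(new_buf != NULL && old_buf != NULL && rows != NULL);

    const size_t row_bytes = cell_height * stride;
    size_t copied = 0;

    for (size_t r = 0; r < row_count; r++) {
        if (!row_needs_copy(&rows[r]))
            continue;

        /* Whole-row copy; clean cells are restored, dirty ones get redrawn later */
        memcpy(new_buf + r * row_bytes, old_buf + r * row_bytes, row_bytes);
        copied++;
    }

    return copied;
}
```

Rows that were untouched in the last frame are already identical in both buffers, and rows that are fully redrawn in the new frame get overwritten anyway, so both are skipped. A cell-chunk variant would copy less data per row, at the cost of more and smaller memcpy() calls.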
To kick things off, I hacked shm.c to be able to force double buffering. Foot currently deals with double buffering by doing a memcpy() of the old frame, and then applying the current frame's damage on top of that.
Compared to when we are able to re-use the buffers, I'm seeing a ~330% slowdown: from an average frame render time of ~300us to ~1300us. This was with "small" damage (i.e. typing a single letter at a time at the prompt at the bottom of the screen).
This suggests that on this particular setup, the memcpy() takes ~1000us. This appears to hold true for "large" damage frames as well (e.g. running ls /usr/bin), where the average frame render time increased by roughly the same amount, from ~5000us to ~6000us. Here, the relative slowdown is of course smaller: ~20%.
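
For reference, the arithmetic behind those percentages, as a small sketch. The helper names are made up for illustration; the inputs are the approximate averages quoted above.

```c
#include <assert.h>

/* Estimated fixed cost (in us) that the full-frame copy adds per frame. */
static int
copy_overhead_us(int reuse_us, int double_buffered_us)
{
    assert(double_buffered_us >= reuse_us);
    return double_buffered_us - reuse_us;
}

/* Relative slowdown in whole percent (integer division, rounds down). */
static int
slowdown_percent(int reuse_us, int double_buffered_us)
{
    assert(reuse_us > 0);
    return 100 * copy_overhead_us(reuse_us, double_buffered_us) / reuse_us;
}
```

With the numbers above, `slowdown_percent(300, 1300)` gives 333 and `slowdown_percent(5000, 6000)` gives 20, while `copy_overhead_us()` is 1000 in both cases. In other words, the copy looks like a roughly constant per-frame cost, independent of how much was damaged.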