Improve performance when forced to double-buffer #478

Closed
opened 6 months ago by dnkl · 1 comment
dnkl commented 6 months ago
Owner

Foot currently relies heavily on the compositor allowing us to re-use shm surfaces in the next frame.

The protocol even mentions this as an important optimization for GL(ES) compositors with wl_shm clients.

However, not all compositors release the shm buffer immediately; some instead force the client to double-buffer (i.e. swap between two buffers). KDE/KWin is one of them, and it looks like wlroots-based compositors may start doing this as well: https://github.com/swaywm/wlroots/issues/2705#issuecomment-823856101

Foot "handles" this already, but in a very inefficient way; it copies the old buffer and then applies the current frame's damage.

While having to double-buffer will have a negative performance impact, we can at least do better:

  • scan grid and check for damage
    • if the entire grid needs to be redrawn, go ahead and do that, i.e. ignore the old buffer's content.
    • if not, build region(s) of areas that we're going to redraw
  • re-apply last frame's scroll damage
  • copy last frame's damaged regions to the new buffer, excluding new damage

Either use pixman or memcpy() to copy the regions from the old frame. Investigate whether this should be done on a per-row basis or in cell chunks.
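A minimal sketch of the cell-chunk variant with memcpy(), not foot's actual code. The `struct frame` layout, the per-cell `prev_dirty`/`cur_dirty` flag arrays and the function name are all hypothetical, and the sketch assumes strict alternation between two buffers, so the destination buffer differs from the source only by the last frame's damage. Scroll damage is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical buffer layout: 32-bit pixels, fixed-size cells, no padding. */
struct frame {
    uint32_t *pixels;    /* (cols * cell_w) x (rows * cell_h) pixels */
    int cols, rows;      /* grid size, in cells */
    int cell_w, cell_h;  /* cell size, in pixels */
};

/*
 * Copy from src (last frame) to dst (the buffer we are about to render into)
 * every cell that was redrawn in the last frame (prev_dirty) and will NOT be
 * redrawn in this frame (cur_dirty). Adjacent qualifying cells on the same
 * grid row are merged into one chunk, so each pixel line of the chunk needs
 * a single memcpy(). Returns the number of cells copied.
 */
size_t
copy_stale_cells(struct frame *dst, const struct frame *src,
                 const bool *prev_dirty, const bool *cur_dirty)
{
    assert(dst->cols == src->cols && dst->rows == src->rows);
    assert(dst->cell_w == src->cell_w && dst->cell_h == src->cell_h);

    const size_t stride = (size_t)dst->cols * dst->cell_w; /* pixels per line */
    size_t copied = 0;

    for (int r = 0; r < dst->rows; r++) {
        const bool *prev = &prev_dirty[(size_t)r * dst->cols];
        const bool *cur = &cur_dirty[(size_t)r * dst->cols];

        int c = 0;
        while (c < dst->cols) {
            if (!prev[c] || cur[c]) {
                c++;            /* already up to date, or redrawn anyway */
                continue;
            }

            /* Extend the chunk while the next cell also qualifies */
            int end = c + 1;
            while (end < dst->cols && prev[end] && !cur[end])
                end++;

            const size_t x = (size_t)c * dst->cell_w;
            const size_t len = (size_t)(end - c) * dst->cell_w;

            for (int y = 0; y < dst->cell_h; y++) {
                const size_t off = ((size_t)r * dst->cell_h + y) * stride + x;
                memcpy(&dst->pixels[off], &src->pixels[off],
                       len * sizeof(dst->pixels[0]));
            }

            copied += (size_t)(end - c);
            c = end;
        }
    }
    return copied;
}
```

Merging adjacent cells keeps the number of memcpy() calls proportional to the number of damaged runs rather than the number of damaged cells; whether that beats copying whole rows is exactly the open question above.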

If copying entire rows, then a row needs to be copied only if both of the following hold: a) at least one cell in it was updated in the last frame, and b) at least one cell in it is clean in the new frame.
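The row rule can be sketched as a small predicate plus a copy loop. This is an illustration only: the flag arrays, `row_bytes` (the size of one full grid row of pixels) and both function names are assumptions, not foot's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * A row is copied only if a) at least one of its cells was updated in the
 * last frame (otherwise the destination already holds the right content) and
 * b) at least one of its cells stays clean in the new frame (otherwise the
 * whole row is redrawn and the copy would be wasted).
 */
bool
row_needs_copy(const bool *prev_dirty, const bool *cur_dirty, int cols)
{
    bool updated_last_frame = false;
    bool has_clean_cell = false;

    for (int c = 0; c < cols; c++) {
        updated_last_frame |= prev_dirty[c];
        has_clean_cell |= !cur_dirty[c];
    }
    return updated_last_frame && has_clean_cell;
}

/*
 * Copy qualifying rows from src (last frame) to dst (new frame).
 * prev_dirty and cur_dirty hold rows * cols flags in row-major order.
 * Must run before the new frame's cells are rendered, since a copied row may
 * include cells that are about to be redrawn. Returns the rows copied.
 */
size_t
copy_stale_rows(uint8_t *dst, const uint8_t *src, size_t row_bytes,
                int rows, int cols,
                const bool *prev_dirty, const bool *cur_dirty)
{
    assert(rows >= 0 && cols > 0);

    size_t copied = 0;
    for (int r = 0; r < rows; r++) {
        const bool *prev = &prev_dirty[(size_t)r * cols];
        const bool *cur = &cur_dirty[(size_t)r * cols];

        if (!row_needs_copy(prev, cur, cols))
            continue;

        memcpy(dst + (size_t)r * row_bytes, src + (size_t)r * row_bytes,
               row_bytes);
        copied++;
    }
    return copied;
}
```

In the "typing at the prompt" case, this would copy at most the one or two rows touched by the previous frame instead of the whole buffer.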

dnkl added the performance label 6 months ago
Poster
Owner

To kick things off, I hacked shm.c to be able to force double buffering. Foot currently deals with double buffering by doing a memcpy() of the old frame, and then applying the current frame's damage on top of that.

Compared to when we are able to re-use the buffers, I'm seeing a ~330% slowdown: from an average frame render time of ~300us to ~1300us. This was with "small" damage (i.e. typing a single letter at a time at the prompt at the bottom of the screen).

This suggests that on this particular setup, the memcpy() takes ~1000us. This appears to hold true for "large" damage frames as well (e.g. running ls /usr/bin), where the average frame render time increased by roughly the same amount, from ~5000us to ~6000us. Here, the relative slowdown is of course smaller: ~20%.
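The arithmetic behind these figures, as a small sketch (the helper names are made up for illustration; the inputs are the averages quoted above):

```c
#include <assert.h>

/* Relative slowdown in percent, e.g. 300us -> 1300us is roughly 333%. */
double
slowdown_pct(double before_us, double after_us)
{
    assert(before_us > 0.0);
    return (after_us - before_us) * 100.0 / before_us;
}

/* Absolute cost added per frame; attributed to the full-buffer memcpy(). */
double
added_cost_us(double before_us, double after_us)
{
    return after_us - before_us;
}
```

With the numbers above, `added_cost_us(300, 1300)` and `added_cost_us(5000, 6000)` both give 1000us, which is why the copy looks like a fixed per-frame cost, while `slowdown_pct()` gives roughly 333% for the small-damage case and 20% for the large one.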

dnkl referenced this issue from a commit 6 months ago
dnkl closed this issue 6 months ago