Improve performance when being forced to double buffer #488
Manually merged: dnkl merged 12 commits from double-buffering into master 2 years ago
This is still very early, but the idea is that when the compositor forces us to double buffer (i.e. we are swapping between two shm buffers), instead of copying the old buffer wholesale and then applying the current frame's damage, we re-apply the old frame's damage first, then our own.
In short:
First, while applying this frame’s scroll damage, copy it to the buffer’s scroll damage list (so that we can access it via term->render.last_buf).
Also, when iterating and rendering the grid, build a pixman region of the damaged regions. This is currently done on a per-row basis. This is also stored in the buffer.
Now, when being forced to double buffer, first iterate the old buffer’s damage, and re-apply it to the current buffer. Then, composite the old buffer on top of the current buffer, using the old frame’s damage region as clip region. This effectively copies everything that was rendered to the last frame. Remember, this is on a per-row basis.
Then we go on and render the frame as usual.
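The steps above can be sketched roughly as follows. This is a simplified model, not foot's actual code: it uses plain arrays instead of shm buffers and pixman regions, it ignores scroll damage entirely, and `struct buffer`, `render_frame()` and `reapply_last_damage()` are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ROWS 4
#define COLS 8 /* pixels per row, one uint32_t each */

/* Hypothetical stand-in for an shm buffer, plus the per-row damage
 * that was recorded the last time we rendered into it. */
struct buffer {
    uint32_t pix[ROWS][COLS];
    bool dirty[ROWS]; /* rows rendered by the frame that used this buffer */
};

/* Render one frame: fill every damaged row with 'color', and remember
 * which rows were touched so the next frame can re-apply them. */
static void
render_frame(struct buffer *buf, const bool damage[ROWS], uint32_t color)
{
    for (int r = 0; r < ROWS; r++) {
        buf->dirty[r] = damage[r];
        if (!damage[r])
            continue;
        for (int c = 0; c < COLS; c++)
            buf->pix[r][c] = color;
    }
}

/* 'cur' is one frame behind 'last'. Instead of copying all of 'last',
 * copy only the rows the last frame damaged. Returns rows copied. */
static int
reapply_last_damage(struct buffer *cur, const struct buffer *last)
{
    assert(cur != last);

    int copied = 0;
    for (int r = 0; r < ROWS; r++) {
        if (!last->dirty[r])
            continue;
        memcpy(cur->pix[r], last->pix[r], sizeof(cur->pix[r]));
        copied++;
    }
    return copied;
}
```

When swapping between two buffers, each frame would call `reapply_last_damage(cur, last)` first and `render_frame(cur, ...)` second. The buffer we get back is only ever missing what the previous frame rendered, which is why copying the damaged rows is enough.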
Note that it would be really nice if we could subtract the current frame’s damage region from the clip region (no point in copying areas we’re going to overwrite anyway). Unfortunately, that’s harder than it looks; the current frame’s damage region is only valid after this frame’s scroll damage has been applied, while the last frame’s damage region is only valid before it’s been applied.
Translating one to the other isn’t easy, since scroll damage isn’t just about counting lines - there may be multiple scroll damage records, each with its own scrolling region. This creates very complex scenarios.
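To give a feel for why this gets complicated, here is a small sketch that maps a single row from "before scroll damage" coordinates to "after scroll damage" coordinates. It is an assumption-laden toy: `struct scroll_record` and `translate_row()` are hypothetical, only forward scrolling is modelled, and a scrolling region is taken to be the half-open row range [start, end).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scroll damage record: rows [start, end) were scrolled
 * up by 'lines' rows. */
struct scroll_record {
    int start;
    int end;
    int lines;
};

/* Translate 'row' through all records, in order. Returns the row's new
 * position, or -1 if it was scrolled out of its region. */
static int
translate_row(int row, const struct scroll_record *recs, size_t count)
{
    for (size_t i = 0; i < count && row >= 0; i++) {
        const struct scroll_record *s = &recs[i];
        assert(s->start <= s->end && s->lines >= 0);

        if (row < s->start || row >= s->end)
            continue; /* outside this scrolling region: unaffected */

        row = (row - s->lines >= s->start) ? row - s->lines : -1;
    }
    return row;
}
```

With the records `{4, 10, 1}` followed by `{0, 4, 2}`, the contiguous rows 2, 3, 4, 5 end up at 0, 1, gone, and 4 respectively. A single damaged rectangle can thus be split, shifted by different amounts, and partially dropped, and that is with just two records.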
Edit: we now subtract the current frame's damage from the copy-region if the current frame has no scroll damage.
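A minimal sketch of that rule, using one bit per row in place of a pixman region (the actual code presumably does a region subtraction instead). `copy_rows()` and its parameters are hypothetical names, and the bitmask limits the toy to 64 rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit r set means row r is damaged.
 * Returns the rows to copy from the old buffer: the last frame's
 * damage, minus whatever the current frame will overwrite anyway.
 * The subtraction is skipped when the current frame has scroll damage,
 * since the two masks are then in different coordinate spaces. */
static uint64_t
copy_rows(uint64_t last_damage, uint64_t cur_damage, bool cur_has_scroll)
{
    if (cur_has_scroll)
        return last_damage;
    return last_damage & ~cur_damage;
}
```

For example, if the last frame damaged rows 0-3 and the current frame damages rows 1-2 without scrolling, only rows 0 and 3 need to be copied.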
I do have some early benchmark numbers, but will wait before publishing them since they are from debug builds. It does look promising though. There is a very noticeable/measurable performance hit, but the improvement compared to the master branch is still huge.
TODO
Closes #478
Benchmark results (regular LTO release builds):
Sway 1.6, wlroots 0.13.0
Terminal size: 135x67 cells
Surface size: 953x1024
(CPU: i5-8250U CPU @ 1.60GHz, 4/8 cores/threads, 6MB L3)
Times are in microseconds (µs).
Numbers in parentheses are the time taken to “prepare” the buffer before applying the current frame’s damage (hence it’s always zero in the “Immediate release” column).
Not covered here: ignoring old buffer content and instead re-rendering the entire frame.
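For a sense of scale, the size of one buffer at the surface size listed above can be worked out as below. This assumes 4 bytes per pixel and no stride padding, neither of which is stated here; `buffer_bytes()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of a tightly packed pixel buffer.
 * Assumption: no per-row padding (stride == width * bytes_per_pixel). */
static size_t
buffer_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* 953 x 1024 at 4 bytes per pixel:
 *   953 * 1024 * 4 = 3,903,488 bytes, roughly 3.7 MiB per buffer. */
```

Under those assumptions, a full copy reads about 3.7 MiB and writes another 3.7 MiB, which together is more than the 6MB L3 cache of the test machine.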
Observations:
memcpy() is a fairly expensive operation on buffers as large as these. Also, in addition to taking time, it can easily thrash the cache, slowing things down further.

no double buffering, foot re-uses previous frame’s buffer
foot re-applies last frame’s damage before applying current frame’s damage
foot copies the old buffer (all of it) before applying current frame’s damage
running cat - in the shell, at the bottom of the screen, typing a single letter at a time
large C file in vim, moving cursor with arrow keys without scrolling the content
large C file in vim, scrolling content by holding down arrow key
0c727d2e00 to 9bc0572c4d 2 years ago
9bc0572c4d to 8047e7372c 2 years ago
Title changed from "WIP: improve performance when being forced to double buffer" to "Improve performance when being forced to double buffer" 2 years ago
9cfe0548e8 to dc4f60fd4f 2 years ago
04215bac6c into master manually 2 years ago