We regularly receive complaints about frame scheduling: compositing not being synchronized to vblanks, missed frames, problems repainting monitors with different refresh rates, and so on. This blog post will (hopefully) explain why these issues are present and how we plan to fix them.
Past & Present
With the current scheduling algorithm, compositing /should/ start right after a vblank. A vblank (vertical blanking interval) is the period between the end of scanning out one frame and the start of scanning out the next; it spans the vertical front porch, the sync pulse, and the vertical back porch. Simply put, it’s the moment just before the display starts scanning out the contents of the next frame.
One thing that’s worth pointing out is that buffers are not swapped after finishing a compositing cycle. They are swapped at the start of the next compositing cycle, in other words, at the next vblank.
KWin assumes that glXSwapBuffers() and eglSwapBuffers() will always block until the next vblank. By delaying the buffer swap, we have more time to process input events, do some window manager things, etc. However, this assumption is outdated. Nowadays, it’s rare to see a GLX or an EGL implementation where a buffer swap operation blocks when rendering double buffered.
If the buffer swap operation doesn’t block, which is typically the case with Mesa drivers, glXSwapBuffers() or eglSwapBuffers() will be called at the end of a compositing cycle. There is a catch, though: compositing won’t be synchronized to vblanks.
Since compositing is not synchronized with vblanks anymore, you may notice that animations in some applications don’t look as butter smooth as they should. This issue can be easily verified using the black frame insertion test.
Another problem with our compositing scheduling algorithm is latency. Ideally, if you press a key, the corresponding symbol should show up on the screen as soon as possible. In practice, things are slightly different.
With the current compositing timing, if you press a key on the keyboard, it may take up to two frames before the corresponding symbol shows up on the screen. The same goes for videos: the audio might be playing up to two frames ahead of what is on the screen.
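To put the two-frame figure in perspective, here is a rough back-of-the-envelope calculation of the worst-case delay at a few common refresh rates. The function name is made up for this illustration, and the sketch only covers the delay added by waiting for refresh cycles, not input or application processing time.

```python
def worst_case_latency_ms(refresh_rate_hz: float, frames: int = 2) -> float:
    """Upper bound on the delay when a change has to wait `frames` refresh cycles."""
    frame_time_ms = 1000.0 / refresh_rate_hz  # duration of one refresh cycle
    return frames * frame_time_ms


for rate in (60, 120, 144):
    print(f"{rate} Hz: up to {worst_case_latency_ms(rate):.1f} ms")
# 60 Hz: up to 33.3 ms
# 120 Hz: up to 16.7 ms
# 144 Hz: up to 13.9 ms
```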
Monitors With Different Refresh Rates
Things get trickier if you have several monitors with different refresh rates. On X11, compositing is throttled to the lowest refresh rate among them. In other words, if you have two monitors with a refresh rate of 60Hz and one with a refresh rate of 120Hz, compositing will be performed at a rate of 60Hz. There is probably nothing that we can do about it.
On Wayland, it’s a completely different situation. From the technical point of view, we don’t have anything that prevents compositing from being performed separately per screen at different refresh rates. But due to historical reasons, compositing on Wayland is throttled similarly to the X11 case.
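The difference between the two models can be sketched in a few lines. This is only an illustration of the throttling rule described above; the function names are invented and do not correspond to anything in KWin.

```python
def throttled_rates(refresh_rates_hz):
    """One shared compositing clock: every monitor is repainted at the lowest rate."""
    lowest = min(refresh_rates_hz)
    return [lowest for _ in refresh_rates_hz]


def per_screen_rates(refresh_rates_hz):
    """Per-screen compositing: every monitor is repainted at its own rate."""
    return list(refresh_rates_hz)


monitors = [60, 60, 120]
print(throttled_rates(monitors))   # [60, 60, 60]
print(per_screen_rates(monitors))  # [60, 60, 120]
```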
Our main goals are to unlock true per screen rendering on Wayland and reduce latency caused by compositing (both on X11 and Wayland). Some work has already been started to fix compositing timing, and if things go smoothly, you should be able to enjoy improved frame timings in KDE Plasma 5.21.
If we start compositing as close as possible to the next vblank, then applications, such as video players, will be able to get their contents on the screen in the shortest amount of time without inducing any screen tearing.
The main drawback of this approach is that the compositor has to know exactly how much time it will take to render the next frame. In other words, we need a reliable way to predict the future. Easy, no problem!
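The scheduling idea can be sketched as follows: work out when the next vblank is expected, then start compositing a predicted render time before it. This is a simplified illustration under the assumption that the timestamp of the last vblank and the refresh interval are known; all names are hypothetical and timestamps are in microseconds.

```python
def next_vblank_us(last_vblank_us, refresh_interval_us, now_us):
    """Timestamp of the first vblank strictly after `now_us`."""
    cycles = (now_us - last_vblank_us) // refresh_interval_us + 1
    return last_vblank_us + cycles * refresh_interval_us


def compositing_start_us(last_vblank_us, refresh_interval_us, now_us, predicted_render_us):
    """Return (start, target): when to start painting and which vblank is targeted."""
    target = next_vblank_us(last_vblank_us, refresh_interval_us, now_us)
    start = target - predicted_render_us
    # If there is not enough time left before the targeted vblank,
    # aim for a later one instead of starting in the past.
    while start < now_us:
        target += refresh_interval_us
        start = target - predicted_render_us
    return start, target


# 60Hz display (about 16667 us per frame), predicted render time of 4000 us.
print(compositing_start_us(0, 16667, 5000, 4000))   # (12667, 16667)
print(compositing_start_us(0, 16667, 15000, 4000))  # (29334, 33334)
```

In the second call, only 1667 us remain before the upcoming vblank, which is less than the predicted render time, so the sketch targets the following vblank. An underestimated render time has the same effect in practice: the frame misses its vblank and is shown one cycle late.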
The main idea behind the compositing timing rework is to introduce a new class, called RenderLoop, that notifies the compositor when it’s a good time to start painting the next frame. On X11, there is going to be only one RenderLoop. On Wayland, every output is going to have its own RenderLoop.
As mentioned previously, the compositor needs to predict how long it will take to render the next frame. We solve this inconvenient problem by making two guesses:
- The first guess is based on a desired latency level that comes from a config. If the desired latency level is high, the predicted render time will be longer; on the other hand, if the desired latency level is low, the predicted render time will be shorter;
- The second guess is based on the duration of previous compositing cycles.
The RenderLoop makes two guesses, and the one with the longer render time is used for scheduling compositing for the next frame. By making two estimates rather than one, we hope that animations will be more or less stable.
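A minimal sketch of the two-guess approach might look like this. It is not KWin's RenderLoop: the class name, the latency levels, the budget fractions, and the choice of taking the slowest recent frame as the history-based guess are all assumptions made for the sake of illustration.

```python
from collections import deque

# Hypothetical mapping from the configured latency level to a share of the
# refresh interval reserved for rendering. These are not KWin's actual values.
LATENCY_BUDGET = {"low": 0.25, "medium": 0.5, "high": 0.75}


class RenderTimePredictor:
    def __init__(self, refresh_interval_us, latency_level="medium", history_size=10):
        self.refresh_interval_us = refresh_interval_us
        self.latency_level = latency_level
        self.history = deque(maxlen=history_size)  # recent render times, in us

    def record(self, render_time_us):
        """Remember how long a finished compositing cycle took."""
        self.history.append(render_time_us)

    def predict_us(self):
        # First guess: derived from the desired latency level in the config.
        config_guess = int(self.refresh_interval_us * LATENCY_BUDGET[self.latency_level])
        # Second guess: derived from previous cycles (here: the slowest recent one).
        history_guess = max(self.history, default=0)
        # The longer of the two estimates is used for scheduling.
        return max(config_guess, history_guess)


predictor = RenderTimePredictor(refresh_interval_us=16000, latency_level="low")
print(predictor.predict_us())  # 4000, only the config-based guess is available
predictor.record(3000)
predictor.record(9000)
print(predictor.predict_us())  # 9000, a slow recent frame outweighs the config
```

Taking the longer estimate errs on the side of starting early: latency grows a little after a slow frame, but the next frame is less likely to miss its vblank.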
There is no “silver bullet” solution for the render time prediction problem, unfortunately. In the end, it all comes down to making a trade-off between latency and stability. The config option lets the user decide what matters the most. It’s worth noting that with the default latency level, the compositor will make a compromise between frame latency and animation stability that should be good enough for most users.
The introduction of the RenderLoop helper is only half of the battle. At the moment, all compositing is done on the main thread and it can get crowded. For example, if you have several outputs with different refresh rates, some of them will have to wait until it’s their turn to get repainted. This may result in missed vblanks, and thus laggy frames. In order to address this issue, we need to put compositing on different threads. That way, monitors will be repainted independently of each other. There is no concrete milestone for compositing on different threads, but most likely, it’s going to be KDE Plasma 5.22.
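The effect of a crowded main thread can be illustrated with a small, idealized model. It assumes that each output has a fixed render time and a deadline equal to its refresh interval, and it ignores contention for the GPU and other shared resources, so it is a sketch of the reasoning rather than a description of how KWin behaves. All names are hypothetical.

```python
def finish_times_shared_thread(render_times_us):
    """Outputs are painted one after another; each waits for the ones before it."""
    finish, elapsed = [], 0
    for render_time in render_times_us:
        elapsed += render_time
        finish.append(elapsed)
    return finish


def finish_times_per_thread(render_times_us):
    """Each output is painted on its own thread and does not wait for the others."""
    return list(render_times_us)


def missed_vblank(finish_times_us, deadlines_us):
    """True for every output whose frame was finished after its deadline."""
    return [finish > deadline for finish, deadline in zip(finish_times_us, deadlines_us)]


# A 60Hz output that needs 6000 us to paint, followed by a 144Hz output
# that needs 4000 us. Deadlines are the refresh intervals in microseconds.
render_times = [6000, 4000]
deadlines = [16667, 6944]

print(missed_vblank(finish_times_shared_thread(render_times), deadlines))  # [False, True]
print(missed_vblank(finish_times_per_thread(render_times), deadlines))     # [False, False]
```

In the shared-thread case, the 144Hz output finishes after 10000 us because it has to wait for the 60Hz output, and so it misses its vblank even though its own frame takes only 4000 us to paint.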
Currently, the compositing infrastructure in KWin is heavily influenced by X11 requirements, e.g. there is only one compositing clock, compositing is throttled to the lowest refresh rate, etc. Besides that, incorrect assumptions were unfortunately made about the behavior of glXSwapBuffers() and eglSwapBuffers(), which result in frame drops and other related issues. With the ongoing Wayland improvements, we hope to fix the aforementioned issues.
22 thoughts on “Compositing Scheduling in KWin: Past, Present, and Future”
The OpenGL vblank API is horrible. Applications should be able to select() on a file descriptor to monitor the vblank interrupt. The Amiga had that 40 years ago.
Is the Vulkan API any better here?
As far as I know, right now, there isn’t any Vulkan extension that you could use to get present timing info. But there is a WIP PR to add such an extension, see https://github.com/KhronosGroup/Vulkan-Docs/pull/1364
How does this relate to VRR (variable refresh rate)?
Mutter moved input to a separate thread:
Is this planned for kwin too?
Adaptive sync is still unsupported. We definitely want to add support for VRR. As for handling input on a separate thread, I’m not sure about its benefits with the current architecture. For what it’s worth, the desktop shell runs in a separate process.
I think VRR would be the biggest seller for KDE Wayland. Right now it’s impossible to run FreeSync with a multi-monitor setup on X. Wayland makes it possible. I think sway made some improvements in that regard.
I’m really excited for things to come!
How does this relate to kwin-lowlatency?
With the ongoing improvements, you won’t need to resort to kwin-lowlatency. Also, it’s sad that the author of the fork decided not to work in collaboration with upstream to improve frame timings.
Thank you for the explanation.
So I can switch back to kwin from kwin-lowlatency when 5.21 is released?
If everything goes smoothly, yes.
Does this mean we can also get unredirection support 🙂
Obviously, we’ll need some sort of exclusive mode for Wayland when gaming, like Mutter got in 3.38. Aside from frame timing improvements, kwin-lowlatency provides this on X and it’s quite nice to still have window shadows while running fullscreen apps and games. I think it was shortsighted to remove such functionality from kwin, and if there were code issues, the goal should have been to improve it rather than just remove it.
No, there is still no unredirection support, but yeah, we need to bring it back.
Wonderful! Your approach sounds really good and I’m glad this pain point of KWin is getting addressed. I wanted to work on that from the day I added the per-screen rendering to the DRM platform 😉 And threaded rendering has also been a dream of mine for years. Really looking forward to it.
Yes, once compositing is performed on different threads, it will be /amazing/. However, it will be quite a challenge, mostly due to the effects system. 😦
Would it make sense to adjust the effect system in the end to the new compositing system rather than the other way around?
Couldn’t you make this configurable? I would prefer having smooth compositing to all the effects (and I’m certainly not the only one).
I’m not sure what you mean. Either way, the latency level will be configurable. So, you will be able to choose what matters the most to you.
It is nice to read explanations like the above & I look forward to the improvements being released at some point. Thanks for your work!
This is good progress. A couple of related questions:
1. What about adaptive sync (variable refresh rate)? Are there any plans to support that on Wayland? It’s a critical feature for gaming and something that currently prevents me from switching to the Wayland session in Plasma (together with the subsurface clipping problem, which still persists, for example in KMail).
2. Are there any plans to implement Vulkan/WSI rendering as a future alternative to OpenGL/EGL one?
Vulkan should also be a much better option for multithreading.
As I said a few comments above, adaptive sync is still unsupported. But we do want to implement it. As for a Vulkan render backend, we’ll have to support it one day, but at the moment, we have much more important problems.