This has been a known fact inside Nokia (MeeGo) for quite a long time, due to the various performance issues we’ve had to work around, but for some reason it wasn’t acknowledged as an issue when it was brought up on the mailing list.
So, in order to prove beyond reasonable doubt that there is indeed an issue, I wrote this test. It is very minimal: there’s essentially nothing of a typical GStreamer pipeline, just an element and an app that pushes buffers to it, that’s it. But then, optionally, a queue (a typical element in a GStreamer pipeline) is added in the middle, which introduces a thread boundary, and then the fun begins:
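The essence of that thread boundary can be sketched without GStreamer at all. The following is a hypothetical stand-in for the real test (`run_test`, `queue_push`, etc. are illustrative names, not GStreamer API): a producer thread pushes fixed-size buffers through a mutex-protected queue to a consumer thread, which is roughly what a queue element does between two streaming threads.

```c
/* Hypothetical stand-in for the real test: a producer thread pushes
 * fixed-size "buffers" through a mutex-protected queue to a consumer
 * thread, mimicking the thread boundary a GStreamer queue introduces. */
#include <pthread.h>
#include <stdlib.h>

#define QUEUE_LEN 16

struct buffer { size_t size; };

struct queue {
    struct buffer *items[QUEUE_LEN];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

static void queue_push(struct queue *q, struct buffer *buf)
{
    pthread_mutex_lock(&q->lock); /* this lock is where contention happens */
    while (q->count == QUEUE_LEN)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = buf;
    q->tail = (q->tail + 1) % QUEUE_LEN;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static struct buffer *queue_pop(struct queue *q)
{
    struct buffer *buf;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    buf = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return buf;
}

struct test_args { struct queue *q; int nbuffers; size_t size; };

static void *producer(void *data)
{
    struct test_args *args = data;
    for (int i = 0; i < args->nbuffers; i++) {
        struct buffer *buf = malloc(sizeof(*buf));
        buf->size = args->size;
        queue_push(args->q, buf);
    }
    queue_push(args->q, NULL); /* sentinel: end of stream */
    return NULL;
}

/* Push nbuffers buffers of 2^size_exp bytes across the thread
 * boundary; returns the total bytes "processed" by the consumer. */
static long run_test(int size_exp, int nbuffers)
{
    struct queue q = { .lock = PTHREAD_MUTEX_INITIALIZER,
                       .not_empty = PTHREAD_COND_INITIALIZER,
                       .not_full = PTHREAD_COND_INITIALIZER };
    struct test_args args = { &q, nbuffers, (size_t)1 << size_exp };
    pthread_t thread;
    long total = 0;
    struct buffer *buf;

    pthread_create(&thread, NULL, producer, &args);
    while ((buf = queue_pop(&q)) != NULL) {
        total += (long)buf->size;
        free(buf);
    }
    pthread_join(&thread, NULL);
    return total;
}
```

The point is that smaller buffers mean more lock/unlock cycles per byte of data: every push and pop takes the same lock, so halving the buffer size roughly doubles the opportunities for contention.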
The buffer-size legend corresponds to powers of two (5 => 2 ^ 5 = 32 bytes), and the CPU time, as reported by the system (getrusage), is in ms. You can see that on ARM systems not only is more CPU time wasted, but adding a queue makes things worse at a faster rate.
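For reference, this is roughly how the CPU time is obtained (a minimal sketch; `cpu_time_ms` is an illustrative helper, but `getrusage` is the actual system call used):

```c
#include <sys/resource.h>

/* Return this process's CPU time (user + system) in milliseconds,
 * as reported by getrusage(). */
static long cpu_time_ms(void)
{
    struct rusage usage;

    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return (usage.ru_utime.tv_sec + usage.ru_stime.tv_sec) * 1000L +
           (usage.ru_utime.tv_usec + usage.ru_stime.tv_usec) / 1000L;
}
```

Sampling this before and after the buffer-pushing loop gives the CPU time actually consumed, independent of how long the test takes in wall-clock terms.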
Note that this test is doing nothing but pushing buffers around; all the CPU time is wasted on GStreamer operations. In a real scenario the situation is much worse, because there isn’t only one thread but multiple, and many elements involved, so the wasted CPU time I measured has to be multiplied many times.
Now, this has been profiled before, and everything points to pthread_mutex_lock, which is only a problem when there’s contention, and contention happens more often in GStreamer when buffers are small. When a lock is contended, the futex syscall is issued, and that is very expensive on ARM, although it probably depends on which specific system you are using.
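To make the distinction concrete: on Linux with glibc, the uncontended path of pthread_mutex_lock is just an atomic operation in userspace, and only when another thread already holds the lock does it fall back to the futex syscall. A hypothetical micro-benchmark of the contended case (names like `run_contended` are my own, for illustration):

```c
/* Several threads hammering one mutex: the contended case, where
 * pthread_mutex_lock falls back to the futex() syscall instead of
 * taking its userspace fast path (an atomic compare-and-swap). */
#include <pthread.h>

#define NTHREADS 4
#define ITERS 100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static void *hammer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run NTHREADS threads contending on the same lock; returns the
 * final counter value (NTHREADS * ITERS if the locking is correct). */
static long run_contended(void)
{
    pthread_t threads[NTHREADS];
    counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, hammer, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

Timing this with the getrusage approach above, and comparing against a single thread doing NTHREADS * ITERS iterations alone, would show the cost of the syscall fallback on a given system.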
Fortunately for me, I don’t need good latency, so I can just push one-second buffers and forget about GStreamer performance issues. If you are experiencing the same and can afford high latency, just increase the buffer sizes; if not, then you are screwed :p
Hopefully this answers Wim’s question of what a “small buffer” means, why it’s not good, and when it’s a problem.