The AutoFlushCounter would flush command buffers on any attachment change (write mask or bindings change) if there was a pending query. This is to get query results as soon as possible for draw skips, but it's assuming that a full occlusion query _pass_ happened, that we want to flush it's data before getting onto draws, rather than the queries being randomly interspersed throughout a pass that also draws.
Xenoblade 2 repeatedly switches between performing a samples passed query and outputting to a render target on each draw, and flips the write mask to do so. Flushing the command buffer every 2 draws isn't ideal, so it's best that we only do this if the pattern matches the large block style of occlusion query.
This change makes this flush only happen after a few consecutive query reports. "Consecutive" is interrupted by attachment changes or command buffer flush.
This doesn't really solve the issue where it resets more queries than it uses, it just stops the game doing it as often. I'm not sure of the best way to do that. The cost of resetting could probably be reduced by using query pools with more than one element and resetting in bulk.
* Reset queries on same command buffer
Vulkan seems to complain when the queries are reset on another command buffer. No idea why, the spec really could be written better in this regard. This fixes complaints, and hopefully any implementations that care extensively about them.
This change _guesses_ how many queries need to be reset and resets as many as possible at the same time to avoid splitting render passes. If it resets too many queries, we didn't waste too much time - if it runs out of resets it will batch reset 10 more.
The number of queries reset is the maximum number of queries in the last 3 frames. This has been worked into the AutoFlushCounter so that it only resets up to 32 if it is yet to force a command buffer submission in this attachment.
This is only done for samples passed queries right now, as they have by far the most resets.
* Address Feedback
* Periodically Flush Commands for Vulkan
NVIDIA's OpenGL driver has a built-in mechanism to automatically flush commands to GPU when a lot have been queued. It's also pretty inconsistent, but we'll ignore that for now.
Our Vulkan implementation only submits a command buffer (flush equivalent) when it needs to. This is typically when another command buffer needs to be sequenced after it, presenting a frame, or an edge case where we flush around GPU queries to get results sooner.
This difference in flush behaviour causes a notable difference between Vulkan and OpenGL when we have to wait for commands. In the worst case, we will wait for a sync point that has just been created. In Vulkan, this sync point is created by flushing the command buffer, and storing a waitable fence that signals its completion. Our command buffer contains _every command that we queued since the last submit_, which could be an entire frame's worth of draws.
This has a huge effect on CPU <-> GPU latency. The more commands in a command buffer, the longer we have to wait for it to complete, which results in wasted time. Because we don't know when the guest will force us to wait, we always want the smallest possible latency.
By periodically flushing, we ensure that each command buffer takes a more consistent, smaller amount of time to execute, and that the back of the GPU queue isn't as far away when we need to wait for something to happen. This also might reduce time that the GPU is left inactive while commands are being built.
The main affected game is Pokemon Sword, which got significantly faster in overworld areas due to reduced waiting time when it flushes a shadow map from the main GPU thread.
Another affected game is BOTW, which gets faster depending on the area. This game flushes textures/buffers from its game thread, which is the bottleneck.
Flush latency and throughput may be improved on other games that are inexplicably slower than OpenGL. It's possible that certain games could have their performance _decreased_ slightly due to flushes not being free, but it is unlikely.
Also, flushing to get query results sooner has been tweaked to improve the number of full draw skips that can be done. (tested in SMO)
* Remove unused variable
* Fix possible issue with early query flush