* Use copy dependency for textures that differ in multisample but are otherwise compatible
* Remove allowMs flag, as it's no longer required for correctness; it's just an optimization now
* Dispose intermediate pool
* Prefetch capabilities before spawning translation threads.
The backend multithreading layer expects only one thread to submit commands at a time. When compiling shaders, the translator may request the host GPU capabilities from the backend, and many translators can do this at the same time.
There's a caching mechanism in place so that the capabilities are only fetched once. By triggering the fetch before spawning the threads, the async translation threads no longer try to queue onto the backend queue all at the same time.
The capabilities do need to be checked from the GPU thread, because OpenGL needs a context to check them, so it's not possible to call the underlying backend directly.
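As a rough illustration of the prefetch described above, here is a minimal Python sketch. `CapabilityCache`, `fake_backend_fetch`, and the capability names are all hypothetical stand-ins, not the project's actual types; the point is only that the single expensive fetch happens on the spawning thread, so the workers only ever read the cached value.

```python
import threading


class CapabilityCache:
    """Hypothetical stand-in for a cached host capability lookup."""

    def __init__(self, fetch):
        self._fetch = fetch      # expensive call that would go through the backend queue
        self._lock = threading.Lock()
        self._caps = None
        self.fetch_count = 0
        self.fetched_by = None

    def get(self):
        with self._lock:
            if self._caps is None:
                # Only the first caller pays for the fetch.
                self._caps = self._fetch()
                self.fetch_count += 1
                self.fetched_by = threading.current_thread().name
            return self._caps


def fake_backend_fetch():
    # Assumption: placeholder values, not real capability names.
    return {"supports_astc": False, "max_texture_size": 16384}


cache = CapabilityCache(fake_backend_fetch)
spawner = threading.current_thread().name
cache.get()  # prefetch before any translation thread exists

results = []


def translate(shader_id):
    caps = cache.get()  # cache hit: no backend round trip
    results.append((shader_id, caps["max_texture_size"]))


threads = [threading.Thread(target=translate, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After this runs, `cache.fetch_count` is 1 and the fetch was performed by the spawning thread rather than by any worker.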
* Initialize the capabilities when setting the GPU thread + missing call in headless
* Remove private variables
* Fix various issues with texture sync
A variable called _actionRegistered is used to keep track of whether a tracking action has been registered for a given texture group handle. This variable is set when the action is registered, and should be unset when it is consumed. This is used to skip registering the tracking action if it's already registered, saving some time for render targets that are modified very often.
There were two issues with this. The worst issue was that the tracking action handler exits early if the handle's modified flag is false, which means that it never reset _actionRegistered, as that was done within the Sync() method called later. The second issue was that this variable was set to true after the sync action was registered, so it was technically possible for the action to run immediately and set the flag to false, and for the registering code to then set it back to true.
Both situations would lead to the action never being registered again, as the texture group handle would be sure the action is already registered. This breaks the texture for the remaining runtime, or until it is disposed.
It was also possible for a texture to register sync once, and then for the last modified sync number to not update on future frames. This may have caused some more minor issues.
Seems to fix the Xenoblade flashing bug. Obviously this needs a lot of testing, since it was random chance. I typically had the most luck getting it to happen by switching time of day on the event theatre screen for a while, then entering the equipment screen by pressing X on an event.
May also fix weird things like random chance air swimming in BOTW, maybe a few texture streaming bugs.
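The two ordering problems described above can be sketched in a few lines of Python. This is a single-threaded illustration with hypothetical names (`GroupHandle`, `signal_modified`, `sync_action`), not the actual texture group code; it only shows the flag being set before registration and cleared before the early exit.

```python
class GroupHandle:
    """Hypothetical model of a texture group handle's tracking state."""

    def __init__(self):
        self.modified = False
        self.action_registered = False
        self.registrations = 0

    def signal_modified(self, register_action):
        self.modified = True
        if not self.action_registered:
            # Set the flag BEFORE registering, so an action that runs
            # immediately cannot have its reset overwritten afterwards.
            self.action_registered = True
            self.registrations += 1
            register_action(self.sync_action)

    def sync_action(self):
        # Consume the flag first, so the early exit below cannot skip the reset.
        self.action_registered = False
        if not self.modified:
            return
        self.modified = False
        # ... the actual sync work would happen here ...


# Case 1: the registered action runs immediately.
immediate = GroupHandle()
immediate.signal_modified(lambda action: action())
immediate.signal_modified(lambda action: action())

# Case 2: the modified flag is cleared before the deferred action runs.
deferred_actions = []
deferred = GroupHandle()
deferred.signal_modified(deferred_actions.append)
deferred.modified = False
deferred_actions.pop()()                            # takes the early-exit path
deferred.signal_modified(deferred_actions.append)   # can register again
```

In both cases the handle can register its action again afterwards, which is the property the fix restores.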
* Exchange rather than CompareExchange
* New shader cache implementation
* Remove some debug code
* Take transform feedback varying count into account
* Create shader cache directory if it does not exist + fragment output map related fixes
* Remove debug code
* Only check texture descriptors if the constant buffer is bound
* Also check CPU VA on GetSpanMapped
* Remove more unused code and move cache related code
* XML docs + remove more unused methods
* Better codegen for TransformFeedbackDescriptor.AsSpan
* Support migration from old cache format, remove more unused code
Shader cache rebuild now also rewrites the shared toc and data files
* Fix migration error with BRX shaders
* Add a limit to the async translation queue
This prevents the queue from growing very large when the async translation threads are not able to keep up
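A bounded queue is one common way to get this behaviour. The sketch below is an assumption-laden illustration: the limit value, the names, and the choice to block the producer when the queue is full are not taken from this log, which does not say what happens once the limit is reached.

```python
import queue
import threading

MAX_PENDING = 4  # assumption: the real limit is not stated in this log

pending = queue.Queue(maxsize=MAX_PENDING)
translated = []


def worker():
    while True:
        item = pending.get()
        if item is None:            # sentinel: stop the worker
            break
        translated.append(item * 2)  # stand-in for translating one shader


t = threading.Thread(target=worker)
t.start()

max_seen = 0
for shader_id in range(32):
    pending.put(shader_id)           # blocks while the queue is full
    max_seen = max(max_seen, pending.qsize())
pending.put(None)
t.join()
```

However slow the worker is, the queue never holds more than `MAX_PENDING` entries, so memory use stays bounded.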
* Re-create specialization state on recompile
This might be required if a new version of the shader translator requires more or less state, or if there is a bug related to the GPU state access
* Make shader cache more error resilient
* Add some missing XML docs and move GpuAccessor docs to the interface/use inheritdoc
* Address early PR feedback
* Fix rebase
* Remove IRenderer.CompileShader and IShader interface, replace with new ShaderSource struct passed to CreateProgram directly
* Handle some missing exceptions
* Make shader cache purge delete both old and new shader caches
* Register textures on new specialization state
* Translate and compile shaders in forward order (eliminates diffs due to different binding numbers)
* Limit in-flight shader compilation to the maximum number of compilation threads
* Replace ParallelDiskCacheLoader state changed event with a callback function
* Better handling for invalid constant buffer 1 data length
* Do not create the old cache directory structure if the old cache does not exist
* Constant buffer use should be per-stage. This change will invalidate existing new caches (file format version was incremented)
* Replace rectangle texture with just coordinate normalization
* Skip incompatible shaders that are missing texture information, instead of crashing
This is required if we, for example, add support for a new texture instruction to the shader translator, and that instruction then allows access to textures that were not accessed before. In this scenario, the old cache entry is no longer usable
* Fix coordinates normalization on cubemap textures
* Check if title ID is null before combining shader cache path
* More robust constant buffer address validation on spec state
* More robust constant buffer address validation on spec state (2)
* Regenerate shader cache with one stream, rather than one per shader.
* Only create shader cache directory during initialization
* Logging improvements
* Proper shader program disposal
* PR feedback, and add a comment on serialized structs
* XML docs for RegisterTexture
Co-authored-by: riperiperi <rhy3756547@hotmail.com>
* De-tile GOB when DMA copying from block linear to pitch kind memory regions
* XML docs + nits
* Remove using
* No flush for regular buffer copies
* Add back ulong casts, fix regression due to oversight
* Allow textures to have their data partially mapped
* Explicitly check for invalid memory ranges on the MultiRangeList
* Update GetWritableRegion to also support unmapped ranges
* Collapse AsSpan().Slice(..) calls into AsSpan(..)
Less code and a bit faster
* Collapse an Array.Clear(array, 0, array.Length) call to Array.Clear(array)
* Do not allow render targets not explicitly written by the fragment shader to be modified
* Shader cache version bump
* Remove blank lines
* Avoid redundant color mask updates
* HostShaderCacheEntry can be null
* Avoid more redundant glColorMask calls
* nit: Mask -> Masks
* Fix currentComponentMask
* More efficient way to update _currentComponentMasks
* Add timestamp to 16-byte semaphore releases.
BOTW was reading a ulong 8 bytes after a semaphore return. Turns out this is the timestamp it was trying to do performance calculation with, so I've made it write when necessary.
This mode was also added to the DMA semaphore I added recently, as it is required by a few games. (I think Quake?)
The timestamp code has been moved to GPU context. Check other games with an unusually low framerate cap or dynamic resolution to see if they have improved.
* Cast dma semaphore payload to ulong to fill the space
* Write timestamp first
Might be just worrying too much, but we don't want the application reading the timestamp if it sees the payload before the timestamp is written.
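The layout and write order described above might look like the following Python sketch. `release_semaphore` and the use of a `bytearray` as guest memory are hypothetical; the 4-byte short release is an assumption, while the 8-byte payload followed by an 8-byte timestamp follows the notes above.

```python
import struct


def release_semaphore(memory, address, payload, timestamp, wide):
    """Write a semaphore release into a bytearray standing in for guest memory."""
    if wide:
        # 16-byte release: the timestamp sits 8 bytes after the payload and is
        # written first, so a reader that sees the payload finds a valid timestamp.
        struct.pack_into("<Q", memory, address + 8, timestamp)
        struct.pack_into("<Q", memory, address, payload)
    else:
        # Assumption: the short form writes only a 4-byte payload.
        struct.pack_into("<I", memory, address, payload & 0xFFFFFFFF)


wide_mem = bytearray(32)
release_semaphore(wide_mem, 8, 0x1234, 999, wide=True)
wide_payload, wide_timestamp = struct.unpack_from("<QQ", wide_mem, 8)

short_mem = bytearray(16)
release_semaphore(short_mem, 0, 0xABCD, 999, wide=False)
```

In a single-threaded sketch the write order is not observable; it only matters when another thread or the guest polls the payload while the release is in progress.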
This fixes an issue where the render scale array would not be updated when technically the scales on the flat array were the same, but the start index for the vertex scales was different.
* Add support for BC1/2/3 decompression (for 3D textures)
* Optimize and clean up
* Unsafe not needed here
* Fix alpha value interpolation when a0 <= a1
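For context on the a0 <= a1 case: in the standard BC3 alpha block, the two endpoint values select between two palette modes. The Python sketch below builds the 8-entry alpha palette for both modes. The function name is hypothetical and truncating integer division is an assumption; decoders differ in how they round.

```python
def bc3_alpha_palette(a0, a1):
    """Build the 8-entry alpha palette of a BC3 alpha block from its endpoints."""
    palette = [a0, a1]
    if a0 > a1:
        # Six interpolated values between the two endpoints.
        for i in range(1, 7):
            palette.append(((7 - i) * a0 + i * a1) // 7)
    else:
        # Four interpolated values, then explicit fully transparent and opaque.
        for i in range(1, 5):
            palette.append(((5 - i) * a0 + i * a1) // 5)
        palette += [0, 255]
    return palette


eight_step = bc3_alpha_palette(255, 0)
six_step = bc3_alpha_palette(0, 255)
```

Treating the second mode like the first produces wrong alpha for any block that relies on the explicit 0 and 255 entries.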
* Stop using glTransformFeedbackVarying and use explicit layout on the shader
* This is no longer needed
* Shader cache version bump
* Fix gl_PerVertex output for tessellation control shaders
This fixes some regressions caused by #2971 which caused rendered 3D texture data to be lost for most slices. Fixes issues with Xenoblade 2's colour grading, probably a ton of other games.
This also removes the check from TextureCache, making it the tiniest bit smaller (any win is a win here).
* Implement IMUL shader instruction
* Implement PCNT/CONT instruction and fix FFMA32I
* Add HFMA232I to the table
* Shader cache version bump
* No Rc on Ffma32i
* Initial test for texture sync
* WIP new texture flushing setup
* Improve rules for incompatible overlaps
Fixes a lot of issues with Unreal Engine games. Still a few minor issues (some caused by the DMA fast path?). Needs docs and cleanup.
* Cleanup, improvements
Improve rules for fast DMA
* Small tweak to group together flushes of overlapping handles.
* Fixes, flush overlapping texture data for ASTC and BC4/5 compressed textures.
Fixes the new Life is Strange game.
* Flush overlaps before init data, fix 3d texture size/overlap stuff
* Fix 3D Textures, faster single layer flush
Note: nosy people can no longer merge this with Vulkan. (unless they are nosy enough to implement the new backend methods)
* Remove unused method
* Minor cleanup
* More cleanup
* Use the More Fun and Hopefully No Driver Bugs method for getting compressed tex too
This one's for metro
* Address feedback, ASTC+ETC to FormatClass
* Change offset to use Span slice rather than IntPtr Add
* Fix this too
* Add support for render scale to vertex stage.
Occasionally games read off textureSize on the vertex stage to inform the fragment shader what size a texture is without querying in there. Scales were not present in the vertex shader to correct the sizes, so games were providing the raw upscaled texture size to the fragment shader, which was incorrect.
One downside is that the fragment and vertex support buffer description must be identical, so the full size scales array must be defined when used. I don't think this will have an impact though. Another is that the fragment texture count must be updated when vertex shader textures are used. I'd like to correct this so that the update is folded into the update for the scales.
Also cleans up a bunch of things, like it making no sense to call CommitRenderScale for each stage.
Fixes render scale causing a weird offset bloom in Super Mario Party and Clubhouse Games. Clubhouse Games still has a pixelated look in a number of its games due to something else it does in the shader.
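The size correction itself is simple arithmetic. As a hedged Python illustration (the function name is hypothetical, and the real correction happens in generated shader code using the scales array, not on the CPU):

```python
def guest_texture_size(host_size, scale):
    """Undo render scale so a size query returns the size the game expects."""
    return tuple(int(dim / scale) for dim in host_size)


# A 1280x720 render target upscaled 2x is 2560x1440 on the host.
corrected = guest_texture_size((2560, 1440), 2.0)
unscaled = guest_texture_size((1280, 720), 1.0)
```

Without the scale available in the vertex stage, the equivalent of `host_size` was passed on unmodified, which is what produced the offset bloom described above.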
* Split out support buffer update, lazy updates.
* Commit support buffer before compute dispatch
* Remove unnecessary qualifier.
* Address Feedback
* Flip scissor box when the YNegate bit is set
* Flip scissor based on screen scissor state, account for negative scissor Y
* No need for abs when we already know the value is negative
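A minimal Python sketch of a vertical scissor flip, under stated assumptions: the function name, the parameters, and the clamping behaviour are hypothetical and are not taken from the actual change. It only illustrates mirroring a box against the target height and handling a Y that ends up negative.

```python
def flip_scissor_y(y, height, target_height):
    """Mirror a scissor box vertically within a target of the given height."""
    flipped = target_height - (y + height)
    if flipped < 0:
        # The box overshoots the edge: shrink it by the overshoot and clamp.
        # flipped is already known to be negative, so adding it subtracts.
        height += flipped
        flipped = 0
    return flipped, max(height, 0)


inside = flip_scissor_y(10, 20, 100)
round_trip = flip_scissor_y(*inside, 100)
overshoot = flip_scissor_y(90, 20, 100)
```

Flipping a box that lies fully inside the target twice returns the original box; a box that overshoots is clamped to the edge with a reduced height.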