* Implement storage buffer operations using new Load/Store instruction
* Extend GenerateMultiTargetStorageOp to also match access with constant offset, and log and comments
* Remove now unused code
* Catch more complex cases of global memory usage
* Shader cache version bump
* Extend global access elimination to work with more shared memory cases
* Change alignment requirement from 16 bytes to 8 bytes, handle cases where we need more than 16 storage buffers
* Tweak preferencing to catch more cases
* Enable CB0 elimination even when host storage buffer alignment is > 16 (for Intel)
* Fix storage buffer bindings
* Simplify some code
* Shader cache version bump
* Fix typo
* Extend global memory elimination to handle shared memory with multiple possible offsets and local memory
* GPU: Avoid using garbage size for non-cb0 storage buffers
In the depths area, Tears of the Kingdom uses a global memory access with address on constant buffer slot 6. This isn't standard and thus doesn't actually have a size 8 bytes after it, so we were reading back a garbage size that ended up very large (at least in version 1.1.0), and would synchronize a lot of data per frame.
This PR makes storage buffers created from addresses outside constant buffer slot 0 get their size as the number of bytes remaining in the GPU mapping starting at the given virtual address. This should bound the buffer to a reasonable size, and ideally stop it crossing into other memory.
* Limit max size
* Add TODO
* Feedback
* Allow any shader SSBO constant buffer slot and offset
* Fix slot value passed to SetUsedStorageBuffer on fallback case
* Shader cache version
* Ensure that the storage buffer source constant buffer offset is word aligned
* Fix FirstBinding on GetUniformBufferDescriptors
* WIP texture pre-flush
Improve performance of TextureView GetData to buffer
Fix copy/sync ordering
Fix minor bug
Make this actually work
WIP host mapping stuff
* Fix usage flags
* message
* Cleanup 1
* Fix rebase
* Fix
* Improve pre-flush rules
* Fix pre-flush
* A lot of cleanup
* Use the host memory bits
* Select the correct memory type
* Cleanup TextureGroupHandle
* Missing comment
* Remove debugging logs
* Revert BufferHandle _value access modifier
* One interrupt action at a time.
* Support D32S8 to D24S8 conversion, safeguards
* Interrupt cannot happen in sync handle's lock
Waitable needs to be checked twice now, but this should stop it from deadlocking.
* Remove unused using
* Address some feedback
* Address feedback
* Address more feedback
* Address more feedback
* Improve sync rules
Should allow for faster sync in some cases.