Mirror/Ryujinx

mirror of https://github.com/Ryujinx/Ryujinx.git synced 2024-11-20 05:16:33 +00:00

Author	SHA1	Message	Date
jhorv	5131b71437	Reducing memory allocations (#4537 ) * add RecyclableMemoryStream dependency and MemoryStreamManager * organize BinaryReader/BinaryWriter extensions * add StreamExtensions to reduce need for BinaryWriter * simple replacments of MemoryStream with RecyclableMemoryStream * add write ReadOnlySequence<byte> support to IVirtualMemoryManager * avoid 0-length array creation * rework IpcMessage and related types to greatly reduce memory allocation by using RecylableMemoryStream, keeping streams around longer, avoiding their creation when possible, and avoiding creation of BinaryReader and BinaryWriter when possible * reduce LINQ-induced memory allocations with custom methods to query KPriorityQueue * use RecyclableMemoryStream in StreamUtils, and use StreamUtils in EmbeddedResources * add constants for nanosecond/millisecond conversions * code formatting * XML doc adjustments * fix: StreamExtension.WriteByte not writing non-zero values for lengths <= 16 * XML Doc improvements. Implement StreamExtensions.WriteByte() block writes for large-enough count values. * add copyless path for StreamExtension.Write(ReadOnlySpan<int>) * add default implementation of IVirtualMemoryManager.Write(ulong, ReadOnlySequence<byte>); remove previous explicit implementations * code style fixes * remove LINQ completely from KScheduler/KPriorityQueue by implementing a custom struct-based enumerator	2023-03-17 13:14:50 +01:00
riperiperi	1fc90e57d2	Update range for remapped sparse textures instead of recreating them (#4442 ) * Update sparsely mapped texture ranges without recreating Important TODO in TexturePool. Smaller TODO: should I look into making textures with views also do this? It needs to be able to detect if the views can be instantly deleted without issue if they're now remapped. * Actually do partial updates * Signal group dirty after mappings changed * Fix various issues (should work now) * Further optimisation Should load a lot less data (16x) when partial updating 3d textures. * Improve stability * Allow granular uploads on large textures, improve rules * Actually avoid updating slices that aren't modified. * Address some feedback, minor optimisation * Small tweak * Refactor DereferenceRequest More specific initialization methods. * Improve code for resetting handles * Explain data loading a bit more * Add some safety for setting null from different threads. All texture sets come from the one thread, but null sets can come from multiple. Only decrement ref count if we succeeded the null set first. * Address feedback 1 * Make a bit safer	2023-03-14 17:08:44 -03:00
riperiperi	fc43aecbbd	Memory: Faster Split for NonOverlappingRangeList (#4451 ) I noticed that in Xenoblade 2, the game can end up spending a lot of time adding and removing tracking handles. One of the main causes of this is actually splitting existing handles, which does the following: - Remove existing handle from list - Update existing handle to end at split address, create new handle starting at split address - Add updated handle (left) to list - Add new handle (right) to list This costs 1 deletion and 2 insertions. When there are more handles, this gets a lot more expensive, as insertions are done by copying all values to the right, and deletions by copying values to the left. This PR simply allows it to look up the handle being split, and replace its entry with the new end address without insertion or deletion. This makes a split only cost one insertion and a binary search lookup (very cheap). This isn't all of the cost on Xenoblade 2, but it does significantly reduce it. There might be something else to this - we could find a way to reduce the handle count for the game (merging on deletion? buffer deletion?), we could use a different structure for virtual regions, as the current one is optimal for buffer lookups which nearly always read, memory tracking has more of a balance between read/write. That's for a later date though, this was an easy improvment.	2023-02-21 10:53:38 +01:00
gdkchan	efb135b74c	Clear CPU side data on GPU buffer clears (#4125 ) * Clear CPU side data on GPU buffer clears * Implement tracked fill operation that can signal other resource types except buffer * Fix tests, add missing XML doc * PR feedback	2023-02-16 18:28:49 -03:00
gdkchan	86fd0643c2	Implement support for page sizes > 4KB (#4252 ) * Implement support for page sizes > 4KB * Check and work around more alignment issues * Was not meant to change this * Use MemoryBlock.GetPageSize() value for signal handler code * Do not take the path for private allocations if host supports 4KB pages * Add Flags attribute on MemoryMapFlags * Fix dirty region size with 16kb pages Would accidentally report a size that was too high (generally 16k instead of 4k, uploading 4x as much data) Co-authored-by: riperiperi <rhy3756547@hotmail.com>	2023-01-17 05:13:24 +01:00
gnisman	b402b4e7f6	Change GetPageSize to use Environment.SystemPageSize (#4291 ) * Change GetPageSize to use Environment.SystemPageSize * Fix PR comment	2023-01-14 15:37:04 -03:00
gdkchan	5e0f8e8738	Implement JIT Arm64 backend (#4114 ) * Implement JIT Arm64 backend * PPTC version bump * Address some feedback from Arm64 JIT PR * Address even more PR feedback * Remove unused IsPageAligned function * Sync Qc flag before calls * Fix comment and remove unused enum * Address riperiperi PR feedback * Delete Breakpoint IR instruction that was only implemented for Arm64	2023-01-10 19:16:59 -03:00
Mary-nyan	b6614c6ad5	chore: Update tests dependencies (#3978 ) * chore: Update tests dependencies * Apply TSR Berry suggestion to add a GC.SuppressFinalize in MemoryBlock.cs * Ensure we wait for the test thread to be dead on PartialUnmap * Use platform attribute for os specific tests * Make P/Invoke methods private * Downgrade NUnit3TestAdapter to 4.1.0 * test: Disable warning about platform compat for ThreadLocalMap() Co-authored-by: TSR Berry <20988865+TSRBerry@users.noreply.github.com>	2023-01-01 17:35:29 +01:00
Berkan Diler	0d3b82477e	Use new ArgumentNullException and ObjectDisposedException throw-helper API (#4163 )	2022-12-27 20:27:11 +01:00
Isaac Marovitz	0fbcd630bc	Replace `DllImport` usage with `LibraryImport` (#4084 ) * Replace usage of `DllImport` with `LibraryImport` * Mark methods as `partial` * Marshalling * More `partial` & marshalling * More `partial` and marshalling * More partial and marshalling * Update GdiPlusHelper to LibraryImport * Unicorn * More Partial * Marshal * Specify EntryPoint * Specify EntryPoint * Change GlobalMemoryStatusEx to LibraryImport * Change RegisterClassEx to LibraryImport * Define EntryPoints * Update Ryujinx.Ava/Ui/Controls/Win32NativeInterop.cs Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com> * Update Ryujinx.Graphics.Nvdec.FFmpeg/Native/FFmpegApi.cs Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com> * Move return mashal * Remove calling convention specification * Remove calling conventions * Update Ryujinx.Common/SystemInfo/WindowsSystemInfo.cs Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com> * Update Ryujinx/Modules/Updater/Updater.cs Co-authored-by: Mary-nyan <thog@protonmail.com> * Update Ryujinx.Ava/Modules/Updater/Updater.cs Co-authored-by: Mary-nyan <thog@protonmail.com> Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com> Co-authored-by: Mary-nyan <thog@protonmail.com>	2022-12-15 18:07:31 +01:00
riperiperi	e211c3f00a	UI: Add Metal surface creation for MoltenVK (#3980 ) * Initial implementation of metal surface across UIs * Fix SDL2 on windows * Update Ryujinx/Ryujinx.csproj Co-authored-by: Mary-nyan <thog@protonmail.com> * Address Feedback Co-authored-by: Mary-nyan <thog@protonmail.com>	2022-12-06 19:00:25 -03:00
Andrey Sukharev	4da44e09cb	Make structs readonly when applicable (#4002 ) * Make all structs readonly when applicable. It should reduce amount of needless defensive copies * Make structs with trivial boilerplate equality code record structs * Remove unnecessary readonly modifiers from TextureCreateInfo * Make BitMap structs readonly too	2022-12-05 14:47:39 +01:00
merry	a5c2aead67	ConcurrentBitmap: Use Interlocked Or/And (#3937 )	2022-11-29 13:47:57 +00:00
riperiperi	65778a6b78	GPU: Don't trigger uploads for redundant buffer updates (#3828 ) * Initial implementation * Actually do The Thing * Add remark about performance to IVirtualMemoryManager	2022-11-24 15:50:15 +01:00
riperiperi	a16682cfd3	Allow _volatile to be set from MultiRegionHandle checks again (#3830 ) * Allow _volatile to be set from MultiRegionHandle checks again Tracking handles have a `_volatile` flag which indicates that the resource being tracked is modified every time it is used under a new sequence number. This is used to reduce the time spent reprotecting memory for tracking writes to commonly modified buffers, like constant buffers. This optimisation works by detecting if a buffer is modified every time a check happens. If a buffer is checked but it is not dirty, then that data is likely not modified every sequence number, and should use memory protection for write tracking. If the opposite is the case all the time, it is faster to just assume it's dirty as we'd just be wasting time protecting the memory. The new MultiRegionBitmap could not notify handles that they had been checked as part of the fast bitmap lookup, so bindings larger than 4096 bytes wouldn't trigger it at all. This meant that they would be subject to a ton of reprotection if they were modified often. This does mean there are two separate sources for a _volatile set: VolatileOrDirty + _checkCount, and the bitmap check. These shouldn't interfere with each other, though. This fixes performance regressions from #3775 in Pokemon Sword, and hopefully Yu-Gi-Oh! RUSH DUEL: Dawn of the Battle Royale. May affect other games. * Fix stupid mistake	2022-11-18 02:54:20 +00:00
Mary-nyan	c6d05301aa	infra: Migrate to .NET 7 (#3795 ) * Update readme to mention .NET 7 * infra: Migrate to .NET 7 .NET 7 is still in preview but this prepare for the release coming up next month. * Use Random.Shared in CreateRandom * Move UInt128Utils.cs to Ryujinx.Common project * Fix inverted parameters in System.UInt128 constructor * Fix Visual Studio complains on Ryujinx.Graphics.Vic * time: Fix missing alignment enforcement in SystemClockContext Fixes at least Smash * time: Fix missing alignment enforcement in SteadyClockContext Fix games (like recent version of Smash) using time shared memory * Switch to .NET 7.0.100 release * Enable Tiered PGO * Ensure CreateId validity requirements are meet when doing random generation Also enforce correct packing layout for other Mii structures. This fix a Mario Kart 8 crashes related to the default Miis.	2022-11-09 20:22:43 +01:00
riperiperi	3d98e1361b	GPU: Use a bitmap to track buffer modified flags. (#3775 ) * Initial implementation * Some improvements. * Fix incorrect cast * Performance improvement and improved correctness * Add very fast path when all handles are checked. * Slightly faster * Add comment * De-virtualize region handle All region handles are now bitmap backed. * Remove non-bitmap tracking * Remove unused methods * Add docs, remove unused methods * Address Feedback * Rename file	2022-10-29 22:07:37 +00:00
gdkchan	7d26e4ac7b	Fix mapping leaks caused by UnmapView not working on Linux (#3650 ) * Add test for UnmapView mapping leaks * Throw when UnmapView fails on Linux * Fix UnmapView * Remove throw	2022-10-19 01:02:45 +00:00
gdkchan	356e480bf5	Fix partial unmap reprotection on Windows (#3702 )	2022-09-14 17:46:37 +02:00
gdk	7dd69f2d0e	Allocation free tree lookup	2022-09-10 16:23:49 +02:00
gdk	c646638680	Update several methods to use GetNode directly and avoid array allocations	2022-09-10 16:23:49 +02:00
gdk	65f2a82b97	Optimize PlaceholderManager.UnreserveRange	2022-09-10 16:23:49 +02:00
gdk	93dd6d525a	Fix potential issue with partial unmap We must also do the unmap operation with the RWLock, otherwise faults on the unmapped region will cause crashes and the whole thing becomes pointless	2022-09-10 16:23:49 +02:00
gdk	96d4ad952c	Fix reprotection regression	2022-09-10 16:23:49 +02:00
gdk	45e520a27c	Rewrite PlaceholderManager4KB to use intrusive RBTree, and to coalesce free placeholders Also make the other placeholder manager use intrusive RBTree, allows the IntervalTree that was added just for this to be deleted	2022-09-10 16:23:49 +02:00
gdkchan	6922862db8	Optimize kernel memory block lookup and consolidate RBTree implementations (#3410 ) * Implement intrusive red-black tree, use it for HLE kernel block manager * Implement TreeDictionary using IntrusiveRedBlackTree * Implement IntervalTree using IntrusiveRedBlackTree * Implement IntervalTree (on Ryujinx.Memory) using IntrusiveRedBlackTree * Make PredecessorOf and SuccessorOf internal, expose Predecessor and Successor properties on the node itself * Allocation free tree node lookup	2022-08-26 18:21:48 +00:00
Nicholas Rodine	951700fdd8	Removed unused usings. (#3593 ) * Removed unused usings. * Added back using, now that it's used. * Removed extra whitespace.	2022-08-18 18:04:54 +02:00
riperiperi	14ce9e1567	Move partial unmap handler to the native signal handler (#3437 ) * Initial commit with a lot of testing stuff. * Partial Unmap Cleanup Part 1 * Fix some minor issues, hopefully windows tests. * Disable partial unmap tests on macos for now Weird issue. * Goodbye magic number * Add COMPlus_EnableAlternateStackCheck for tests `COMPlus_EnableAlternateStackCheck` is needed for NullReferenceException handling to work on linux after registering the signal handler, due to how dotnet registers its own signal handler. * Address some feedback * Force retry when memory is mapped in memory tracking This case existed before, but returning `false` no longer retries, so it would crash immediately after unprotecting the memory... Now, we return `true` to deliberately retry. This case existed before (was just broken by this change) and I don't really want to look into fixing the issue right now. Technically, this means that on guest code partial unmaps will retry _due to this_ rather than hitting the handler. I don't expect this to cause any issues. This should fix random crashes in Xenoblade Chronicles 2. * Use IsRangeMapped * Suppress MockMemoryManager.UnmapEvent warning This event is not signalled by the mock memory manager. * Remove 4kb mapping	2022-07-29 19:16:29 -03:00
gdkchan	232b1012b0	Fix ThreadingLock deadlock on invalid access and TerminateProcess (#3407 )	2022-06-24 02:53:16 +02:00
gdkchan	dd8f97ab9e	Remove freed memory range from tree on memory block disposal (#3347 ) * Remove freed memory range from tree on memory block disposal * PR feedback	2022-06-05 15:12:42 -03:00
gdkchan	54deded929	Fix shared memory leak on Windows (#3319 ) * Fix shared memory leak on Windows * Fix memory leak caused by RO session disposal not decrementing the memory manager ref count * Fix UnmapViewInternal deadlock * Was not supposed to add those back	2022-05-05 14:58:59 -03:00
gdkchan	074190e03c	Remove AddProtection count > 0 assert (#3315 )	2022-05-04 14:07:10 -03:00
gdkchan	95017b8c66	Support memory aliasing (#2954 ) * Back to the origins: Make memory manager take guest PA rather than host address once again * Direct mapping with alias support on Windows * Fixes and remove more of the emulated shared memory * Linux support * Make shared and transfer memory not depend on SharedMemoryStorage * More efficient view mapping on Windows (no more restricted to 4KB pages at a time) * Handle potential access violations caused by partial unmap * Implement host mapping using shared memory on Linux * Add new GetPhysicalAddressChecked method, used to ensure the virtual address is mapped before address translation Also align GetRef behaviour with software memory manager * We don't need a mirrorable memory block for software memory manager mode * Disable memory aliasing tests while we don't have shared memory support on Mac * Shared memory & SIGBUS handler for macOS * Fix typo + nits + re-enable memory tests * Set MAP_JIT_DARWIN on x86 Mac too * Add back the address space mirror * Only set MAP_JIT_DARWIN if we are mapping as executable * Disable aliasing tests again (still fails on Mac) * Fix UnmapView4KB (by not casting size to int) * Use ref counting on memory blocks to delay closing the shared memory handle until all blocks using it are disposed * Address PR feedback * Make RO hold a reference to the guest process memory manager to avoid early disposal Co-authored-by: nastys <nastys@users.noreply.github.com>	2022-05-02 20:30:02 -03:00
riperiperi	4a892fbdc9	Fix flush action from multiple threads regression (#3311 ) If two or more threads encounter a region of memory where a read action has been registered, then they must _both_ wait on the data. Clearing the action before it completed was causing the null check above to fail, so the action would only be run on the first thread, and the second would end up continuing without waiting. Depending on what the game does, this could be disasterous. This fixes a regression introduced by #3302 with Pokemon Legends Arceus, and possibly Catherine. There are likely other affected games. What is fixed in that PR should still be fixed.	2022-05-02 12:31:53 +02:00
riperiperi	d64594ec74	Fix various issues with texture sync (#3302 ) * Fix various issues with texture sync A variable called _actionRegistered is used to keep track of whether a tracking action has been registered for a given texture group handle. This variable is set when the action is registered, and should be unset when it is consumed. This is used to skip registering the tracking action if it's already registered, saving some time for render targets that are modified very often. There were two issues with this. The worst issue was that the tracking action handler exits early if the handle's modified flag is false... which means that it never reset _actionRegistered, as that was done within the Sync() method called later. The second issue was that this variable was set true after the sync action was registered, so it was technically possible for the action to run immediately, set the flag to false, then set it to true. Both situations would lead to the action never being registered again, as the texture group handle would be sure the action is already registered. This breaks the texture for the remaining runtime, or until it is disposed. It was also possible for a texture to register sync once, then on future frames the last modified sync number did not update. This may have caused some more minor issues. Seems to fix the Xenoblade flashing bug. Obviously this needs a lot of testing, since it was random chance. I typically had the most luck getting it to happen by switching time of day on the event theatre screen for a while, then entering the equipment screen by pressing X on an event. May also fix weird things like random chance air swimming in BOTW, maybe a few texture streaming bugs. * Exchange rather than CompareExchange	2022-04-29 18:34:11 -03:00
gdkchan	0a24aa6af2	Allow textures to have their data partially mapped (#2629 ) * Allow textures to have their data partially mapped * Explicitly check for invalid memory ranges on the MultiRangeList * Update GetWritableRegion to also support unmapped ranges	2022-02-22 13:34:16 -03:00
riperiperi	cda659955c	Texture Sync, incompatible overlap handling, data flush improvements. (#2971 ) * Initial test for texture sync * WIP new texture flushing setup * Improve rules for incompatible overlaps Fixes a lot of issues with Unreal Engine games. Still a few minor issues (some caused by dma fast path?) Needs docs and cleanup. * Cleanup, improvements Improve rules for fast DMA * Small tweak to group together flushes of overlapping handles. * Fixes, flush overlapping texture data for ASTC and BC4/5 compressed textures. Fixes the new Life is Strange game. * Flush overlaps before init data, fix 3d texture size/overlap stuff * Fix 3D Textures, faster single layer flush Note: nosy people can no longer merge this with Vulkan. (unless they are nosy enough to implement the new backend methods) * Remove unused method * Minor cleanup * More cleanup * Use the More Fun and Hopefully No Driver Bugs method for getting compressed tex too This one's for metro * Address feedback, ASTC+ETC to FormatClass * Change offset to use Span slice rather than IntPtr Add * Fix this too	2022-01-09 13:28:48 -03:00
Mary	00c69f2098	Remove usage of Mono.Posix.NETStandard accross all projects (#2906 ) * Remove usage of Mono.Posix.NETStandard in Ryujinx project * Remove usage of Mono.Posix.NETStandard in ARMeilleure project * Remove usage of Mono.Posix.NETStandard in Ryujinx.Memory project * Address gdkchan's comments	2021-12-08 18:24:26 -03:00
Mary	f39fce8f54	misc: Migrate usage of RuntimeInformation to OperatingSystem (#2901 ) Very basic migration across the codebase.	2021-12-04 20:02:30 -03:00
Mary	57d3296ba4	infra: Migrate to .NET 6 (#2829 ) * infra: Migrate to .NET 6 * Rollback version naming change * Workaround .NET 6 ZipArchive API issues * ci: Switch to VS 2022 for AppVeyor CI is now ready for .NET 6 * Suppress WebClient warning in DoUpdateWithMultipleThreads * Attempt to workaround System.Drawing.Common changes on 6.0.0 * Change keyboard rendering from System.Drawing to ImageSharp * Make the software keyboard renderer multithreaded * Bump ImageSharp version to 1.0.4 to fix a bug in Image.Load * Add fallback fonts to the keyboard renderer * Fix warnings * Address caian's comment * Clean up linux workaround as it's uneeded now * Update readme Co-authored-by: Caian Benedicto <caianbene@gmail.com>	2021-11-28 21:24:17 +01:00
Mary	b4dc33efc2	kernel: Clear pages allocated with SetHeapSize (#2776 ) * kernel: Clear pages allocated with SetHeapSize Before this commit, all new pages allocated by SetHeapSize were not cleared by the kernel. This would cause undefined data to be pass to the userland and possibly resulting in weird memory corruption. This commit also add support for custom fill heap and ipc value (that is also supported by the official kernel) * Remove dots at the end of KPageTableBase.MapPages new documentation * Remove unused _stackFillValue	2021-10-24 18:52:59 -03:00
Mary	85d8d1d7ca	misc: Fix IVirtualMemoryManager.Fill ignoring value (#2775 ) This fix IVirtualMemoryManager.Fill to actually use the provided fill value instead of 0. This have no implication at the moment as everything that use it pass 0 but it is needed for some upcoming kernel fixes.	2021-10-24 18:16:59 -03:00
riperiperi	fff48bb45a	Smaller initial size for ModifiedRangeList & directly inherit range list (#2663 ) This fixes a potential regression with the new range list changes, where the cost for creating new ones would be rather large due to creating a 1024 size array. Also reduces cost for range list inheritance by using the first existing range list as a base, rather than creating a new one then adding both lists to it. The growth size for the RangeList is now identical to its initial size. Every 32 elements was probably a little too common - now it is 1024 for most things and 8 for the buffer modified range list. The Unmapped and SyncMethod methods have been changed to ensure that they behave properly if the range list is set null. Cleaned up a few calls to use the null-conditional operator.	2021-10-04 15:38:59 -03:00
riperiperi	d92fff541b	Replace CacheResourceWrite with more general "precise" write (#2684 ) * Replace CacheResourceWrite with more general "precise" write The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance. This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced. The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization. I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in rabbids kingdom battle. However - all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested. This should fix regressions from #2624. Test any games that were broken by that. (RF4, rabbids kingdom battle) I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing) * Add tests. * Add XML docs for GpuRegionHandle * Skip UpdateProtection if only precise actions were called This allows precise actions to skip reprotection costs.	2021-09-29 02:27:03 +02:00
riperiperi	db97b1d7d2	Implement and use an Interval Tree for the MultiRangeList (#2641 ) * Implement and use an Interval Tree for the MultiRangeList * Feedback * Address Feedback * Missed this somehow	2021-09-19 14:55:07 +02:00
riperiperi	7379bc2f39	Array based RangeList that caches Address/EndAddress (#2642 ) * Array based RangeList that caches Address/EndAddress In isolation, this was more than 2x faster than the RangeList that checks using the interface. In practice I'm seeing much better results than I expected. The array is used because checking it is slightly faster than using a list, which loses time to struct copies, but I still want that data locality. A method has been added to the list to update the cached end address, as some users of the RangeList currently modify it dynamically. Greatly improves performance in Super Mario Odyssey, Xenoblade and any other GPU limited games. * Address Feedback	2021-09-19 14:22:26 +02:00
riperiperi	54adc5f9fb	Ensure that all threads wait for a read tracking action to complete. (#2597 ) * Lock around tracking action consume + execute. Not particularly fast. * Lock around preaction registration and use * Create a lock object * Nit	2021-08-29 16:03:41 -03:00
riperiperi	ec3e848d79	Add a Multithreading layer for the GAL, multi-thread shader compilation at runtime (#2501 ) * Initial Implementation About as fast as nvidia GL multithreading, can be improved with faster command queuing. * Struct based command list Speeds up a bit. Still a lot of time lost to resource copy. * Do shader init while the render thread is active. * Introduce circular span pool V1 Ideally should be able to use structs instead of references for storing these spans on commands. Will try that next. * Refactor SpanRef some more Use a struct to represent SpanRef, rather than a reference. * Flush buffers on background thread * Use a span for UpdateRenderScale. Much faster than copying the array. * Calculate command size using reflection * WIP parallel shaders * Some minor optimisation * Only 2 max refs per command now. The command with 3 refs is gone. 😌 * Don't cast on the GPU side * Remove redundant casts, force sync on window present * Fix Shader Cache * Fix host shader save. * Fixup to work with new renderer stuff * Make command Run static, use array of delegates as lookup Profile says this takes less time than the previous way. * Bring up to date * Add settings toggle. Fix Muiltithreading Off mode. * Fix warning. * Release tracking lock for flushes * Fix Conditional Render fast path with threaded gal * Make handle iteration safe when releasing the lock This is mostly temporary. * Attempt to set backend threading on driver Only really works on nvidia before launching a game. * Fix race condition with BufferModifiedRangeList, exceptions in tracking actions * Update buffer set commands * Some cleanup * Only use stutter workaround when using opengl renderer non-threaded * Add host-conditional reservation of counter events There has always been the possibility that conditional rendering could use a query object just as it is disposed by the counter queue. This change makes it so that when the host decides to use host conditional rendering, the query object is reserved so that it cannot be deleted. Counter events can optionally start reserved, as the threaded implementation can reserve them before the backend creates them, and there would otherwise be a short amount of time where the counter queue could dispose the event before a call to reserve it could be made. * Address Feedback * Make counter flush tracked again. Hopefully does not cause any issues this time. * Wait for FlushTo on the main queue thread. Currently assumes only one thread will want to FlushTo (in this case, the GPU thread) * Add SDL2 headless integration * Add HLE macro commands. Co-authored-by: Mary <mary@mary.zone>	2021-08-27 00:31:29 +02:00
gdkchan	bb8a920b63	Do not dirty memory tracking region handles if they are partially unmapped (#2536 )	2021-08-11 21:50:33 +02:00
riperiperi	4b60371e64	Return mapped buffer pointer directly for flush, WriteableRegion for textures (#2494 ) * Return mapped buffer pointer directly for flush, WriteableRegion for textures A few changes here to generally improve performance, even for platforms not using the persistent buffer flush. - Texture and buffer flush now return a ReadOnlySpan<byte>. It's guaranteed that this span is pinned in memory, but it will be overwritten on the next flush from that thread, so it is expected that the data is used before calling again. - As a result, persistent mappings no longer copy to a new array - rather the persistent map is returned directly as a Span<>. A similar host array is used for the glGet flushes instead of allocating new arrays each time. - Texture flushes now do their layout conversion into a WriteableRegion when the texture is not MultiRange, which allows the flush to happen directly into guest memory rather than into a temporary span, then copied over. This avoids another copy when doing layout conversion. Overall, this saves 1 data copy for buffer flush, 1 copy for linear textures with matching source/target stride, and 2 copies for block textures or linear textures with mismatching strides. * Fix tests * Fix array pointer for Mesa/Intel path * Address some feedback * Update method for getting array pointer.	2021-07-19 19:10:54 -03:00

1 2

73 commits