Commit graph

11 commits

Author SHA1 Message Date
Logan Stromberg
6eb85e846f
Reduce some unnecessary allocations in DMA handler (#2886)
* experimental changes to try and reduce allocations in kernel threading and DMA handler

* Simplify the changes in this branch to just 1. Don't make unnecessary copies of data just for texture-texture transfers and 2. Add a fast path for 1bpp linear byte copies

* forgot to check src + dst linearity in 1bpp DMA fast path. Fixes the UE4 regression.

* removing dev log I left in

* Generalizing the DMA linear fast path to cases other than 1bpp copies

* revert kernel changes

* revert whitespace

* remove unneeded references

* PR feedback

Co-authored-by: Logan Stromberg <lostromb@microsoft.com>
Co-authored-by: gdk <gab.dark.100@gmail.com>
2022-07-14 15:45:56 -03:00
gdkchan
79408b68c3
De-tile GOB when DMA copying from block linear to pitch kind memory regions (#3207)
* De-tile GOB when DMA copying from block linear to pitch kind memory regions

* XML docs + nits

* Remove using

* No flush for regular buffer copies

* Add back ulong casts, fix regression due to oversight
2022-03-20 13:55:07 -03:00
gdkchan
7bfb5f79b8
When copying linear textures, DMA should ignore region X/Y (#3121) 2022-02-16 11:13:45 +01:00
riperiperi
c52158b733
Add timestamp to 16-byte/4-word semaphore releases. (#3049)
* Add timestamp to 16-byte semaphore releases.

BOTW was reading a ulong 8 bytes after a semaphore return. Turns out this is the timestamp it was trying to do performance calculation with, so I've made it write when necessary.

This mode was also added to the DMA semaphore I added recently, as it is required by a few games. (i think quake?)

The timestamp code has been moved to GPU context. Check other games with an unusually low framerate cap or dynamic resolution to see if they have improved.

* Cast dma semaphore payload to ulong to fill the space

* Write timestamp first

Might be just worrying too much, but we don't want the applcation reading timestamp if it sees the payload before timestamp is written.
2022-01-27 22:50:32 +01:00
riperiperi
cda659955c
Texture Sync, incompatible overlap handling, data flush improvements. (#2971)
* Initial test for texture sync

* WIP new texture flushing setup

* Improve rules for incompatible overlaps

Fixes a lot of issues with Unreal Engine games. Still a few minor issues (some caused by dma fast path?) Needs docs and cleanup.

* Cleanup, improvements

Improve rules for fast DMA

* Small tweak to group together flushes of overlapping handles.

* Fixes, flush overlapping texture data for ASTC and BC4/5 compressed textures.

Fixes the new Life is Strange game.

* Flush overlaps before init data, fix 3d texture size/overlap stuff

* Fix 3D Textures, faster single layer flush

Note: nosy people can no longer merge this with Vulkan. (unless they are nosy enough to implement the new backend methods)

* Remove unused method

* Minor cleanup

* More cleanup

* Use the More Fun and Hopefully No Driver Bugs method for getting compressed tex too

This one's for metro

* Address feedback, ASTC+ETC to FormatClass

* Change offset to use Span slice rather than IntPtr Add

* Fix this too
2022-01-09 13:28:48 -03:00
gdkchan
a87f7f2029
Fix DMA copy fast path line size when xCount < stride (#2942) 2021-12-26 13:05:26 -03:00
riperiperi
521a07e612
Add support for releasing a semaphore to DmaClass (#2926)
* Add support for releasing a semaphore to DmaClass

Fixes freezes in OpenGL games, primarily GameMaker ones such as Undertale.

* Address Feedback
2021-12-19 11:32:52 -03:00
gdkchan
ac4ec1a015
Account for negative strides on DMA copy (#2623)
* Account for negative strides on DMA copy

* Should account for non-zero Y
2021-09-11 22:54:18 +02:00
gdkchan
ff5df5d8a1
Support non-contiguous copies on I2M and DMA engines (#2473)
* Support non-contiguous copies on I2M and DMA engines

* Vector copy should start aligned on I2M

* Nits

* Zero extend the offset
2021-08-04 22:20:58 +02:00
gdkchan
40b21cc3c4
Separate GPU engines (part 2/2) (#2440)
* 3D engine now uses DeviceState too, plus new state modification tracking

* Remove old methods code

* Remove GpuState and friends

* Optimize DeviceState, force inline some functions

* This change was not supposed to go in

* Proper channel initialization

* Optimize state read/write methods even more

* Fix debug build

* Do not dirty state if the write is redundant

* The YControl register should dirty either the viewport or front face state too, to update the host origin

* Avoid redundant vertex buffer updates

* Move state and get rid of the Ryujinx.Graphics.Gpu.State namespace

* Comments and nits

* Fix rebase

* PR feedback

* Move changed = false to improve codegen

* PR feedback

* Carry RyuJIT a bit more
2021-07-11 17:20:40 -03:00
gdkchan
8b44eb1c98
Separate GPU engines and make state follow official docs (part 1/2) (#2422)
* Use DeviceState for compute and i2m

* Migrate 2D class, more comments

* Migrate DMA copy engine

* Remove now unused code

* Replace GpuState by GpuAccessorState on GpuAcessor, since compute no longer has a GpuState

* More comments

* Add logging (disabled)

* Add back i2m on 3D engine
2021-07-07 20:56:06 -03:00