Commit graph

1512 commits

Author SHA1 Message Date
Chenj168
7ad8b3ef75
Move the OpActivator to OpCodeTable class for improve performance (#1001)
* Move the OpActivator to OpCodeTable class, for reduce the use of ConcurrentDictionary

* Modify code style.
2020-03-29 19:52:56 +11:00
Xpl0itR
2816b2bf17
Discord Rich Presence update (#1029)
* Fix bug

* Add a few more titles and move the list from dat file into the code

* requested changes
2020-03-29 14:25:54 +11:00
gdkchan
ab4867505e
Implement GPU scissors (#1058)
* Implement GPU scissors

* Remove unused using

* Add missing changes for Clear
2020-03-29 14:02:58 +11:00
Elise
06bf25521f
Implement NOP and stub DEPBAR shader instructions (#1041)
* Implement NOP and stub DEPBAR shader instruction

* Fix a few issues and formatting stuff

* Remove OpCodeNop/Depbar and use OpCode instead

* Fix NOP shader instruction opcode

* Fix formatting
2020-03-26 19:30:16 -03:00
Thog
0dd38028cb
Make Device Location Name configuration (custom TZ) (#1031)
This permit to use arbitrary timezone (instead of UTC).

Useful for games like ACNH.
2020-03-26 09:23:21 +11:00
jduncanator
82c3df83c4
prepo: Add a MessagePack object formatter (#1034) 2020-03-26 08:33:18 +11:00
Thog
5423daea56
ui: Make it possible to open the device save directory (#1040)
* Add an open device folder option

* Simplify logic from previous commit

* Address Xpl0itR's comments

* Address Ac_K comment
2020-03-25 18:09:38 +01:00
Elise
d5670aff77
Fix unhandled UnauthorizedAccessException causing crash while listing… (#1025)
* Fix unhandled UnauthorizedAccessException causing crash while listing directories

* Actually handle not having privileges for a directory

* Fix log message when not having privileges for a directory

* Remove unneccesary empty lines

* Remove unneccecssary space
2020-03-25 17:17:54 +01:00
gdkchan
1586450a38
Implement VMNMX shader instruction (#1032)
* Implement VMNMX shader instruction

* No need for the gap on the enum

* Fix typo
2020-03-25 15:49:10 +01:00
Ac_K
56374c8633
prepo: Implement RequestImmediateTransmission and GetTransmissionStatus (#1033)
* prepo: Implement RequestImmediateTransmission and GetTransmissionStatus

This implement RequestImmediateTransmission and GetTransmissionStatus of the prepo service accurately to RE.
Since we don't use reports, I've explained what the calls do in the real service.

Close #958

* fix comment
2020-03-25 12:28:15 +01:00
Alex Barney
21c9c04f9f
Add IMultiCommitManager (#1011)
* Update LibHac

* Add IMultiCommitManager

* Updates

* Delete NuGet.Config

* Add command version
2020-03-25 19:14:35 +11:00
riperiperi
f695a215ad
Add Fast Paths for Crypto instructions (A32/A64) (#1026)
* Add Fast Paths for Crypto instructions (A32/A64)

* Replace additional XOR with passing in const zero.
2020-03-25 17:20:29 +11:00
Ac_K
a40d8d4a17
Fix gpu vendor name parsing (#1030)
* Update GLRenderer.cs

* print the vendor name fully
2020-03-25 00:12:01 +01:00
jduncanator
daecb1193d
prepo: Resolve JSON parsing issues in prepo report handling (#1022)
It seems MsgPack.Cli incorrectly converts some unicode control characters directly to JSON literal Unicode strings (eg. '\1'). As these are invalid, pretty printing the JSON report fails.

This removes pretty printing, pending a proper implementation, and tidies up MsgPack deserialization.
2020-03-24 23:26:37 +01:00
HorrorTroll
17200214df
Add GPU name in status bar (#984)
* Add GPU name in status bar

* Fixed like Ac_K suggest

* Nit.

* Minor fix

* Minor change.

* Nit.

* Fixed for ATI vendor

* Minor fix, again...
2020-03-24 22:54:09 +01:00
LDj3SNuD
1de16f7653
Add Fcvtas_S/V & Fcvtau_S/V. (#1018) 2020-03-24 22:53:49 +01:00
Chenj168
d0960e75aa
Fix the item name cannot be displayed in profiler view. (#1021) 2020-03-24 18:48:03 +01:00
Thog
e590262531
friends: Stub GetBlockedUserListIds (#1017) 2020-03-23 22:19:45 +01:00
Ac_K
b2d307d34f
Fix Prepo parsing reports (#1016)
This fix the parsing of prepo service reports which could failed in some edge case. I've improved the parsing of the object to a JSON string too.
2020-03-23 21:58:41 +01:00
gdkchan
6edc929894
Implement ICMP shader instruction (#1010) 2020-03-23 17:32:30 +01:00
gdkchan
9a208c4fb5
Keep sRGB enabled for texture blits (#1009) 2020-03-23 09:34:26 +11:00
gdkchan
49d7b1c7d8
Implement textureQueryLevels (#1007) 2020-03-23 08:31:31 +11:00
Chenj168
31b94f4641
Move the MakeOp to OpCodeTable class, for reduce the use of ConcurrentDictionary (#996) 2020-03-20 15:15:37 +11:00
gdkchan
8e64984158
Support partial invalidation on texture access (#1000)
* Support partial invalidation on texture access

* Fix typo

* PR feedback

* Fix modified size clamping
2020-03-20 14:17:11 +11:00
Ac_K
32d3f3f690
Implement GetRegionCode and add the RegionCode to settings (#999)
This implement `GetRegionCode` accordingly to RE. I've added a setting in the GUI and a field in the Configuration file with a way to update the Configuration file if needed.
2020-03-20 09:37:55 +11:00
Chenj168
561d64e5bf
Modify TranslatedFunction.GetPointer () to optimize performance (#995)
* add local var to reduce calling Marshal.GetFunctionPointerForDelegate

* modify code style
2020-03-20 09:11:20 +11:00
riperiperi
8226997bc7
CodeGen Optimisations (LSRA and Translator) (#978)
* Start of JIT garbage collection improvements

- thread static pool for Operand, MemoryOperand, Operation
- Operands and Operations are always to be constructed via their static
helper classes, so they can be pooled.
- removing LinkedList from Node for sources/destinations (replaced with
List<>s for now, but probably could do arrays since size is bounded)
- removing params constructors from Node
- LinkedList<> to List<> with Clear() for Operand assignments/uses
- ThreadStaticPool is very simple and basically just exists for the
purpose of our specific translation allocation problem. Right now it
will stay at the worst case allocation count for that thread (so far) -
the pool can never shrink.

- Still some cases of Operand[] that haven't been removed yet. Will need
to evaluate them (eg. is there a reasonable max number of params for
Calls?)

* ConcurrentStack instead of ConcurrentQueue for Rejit

* Optimize some parts of LSRA

- BitMap now operates on 64-bit int rather than 32-bit
- BitMap is now pooled in a ThreadStatic pool (within lrsa)
- BitMap now is now its own iterator. Marginally speeds up iterating
through the bits.
- A few cases where enumerators were generated have been converted to
forms that generate less garbage.
- New data structure for sorting _usePositions in LiveIntervals. Much
faster split, NextUseAfter, initial insertion. Random insertion is
slightly slower.
- That last one is WIP since you need to insert the values backwards. It
would be ideal if it just flipped it for you, uncomplicating things on
the caller side.

* Use a static pool of thread static pools. (yes.)

Prevents each execution thread creating its own lowCq pool and making me cry.

* Move constant value to top, change naming convention.

* Fix iteration of memory operands.

* Increase max thread count.

* Address Feedback
2020-03-18 22:44:32 +11:00
Thog
7475e180b4
audren: Accept REV8 (#993)
REV8 only added changes on performance buffer and wavebuffer dsp command
generation.

As we don't support any of those, we can just increment the revision
number for now.
2020-03-18 09:43:47 +11:00
gdkchan
8bb64ac69c
Improve shader sampler type selection (#989) 2020-03-15 11:24:45 +11:00
riperiperi
8ce3993afa
Fix GTK window crash by using 24 bit surface on unix, 32 bit on windows. (#976)
* Use 24 bit surface on unix, 32 bit on windows.

* Address jd's comment

Co-authored-by: Thomas Guillemard <me@thog.eu>
2020-03-14 22:36:56 +01:00
riperiperi
dd433c1296
Implement AESMC, AESIMC, AESE, AESD and VEOR AArch32 instructions (#982)
* Add VEOR and AES instructions.

* Add tests for crypto instructions.

* Update ValueSource name.
2020-03-14 10:29:58 +11:00
gdkchan
ff2bac9c90
Implement MME shadow RAM (#987) 2020-03-13 12:30:26 +11:00
riperiperi
d904706fc0
Use a Jump Table for direct and indirect calls/jumps, removing transitions to managed (#975)
* Implement Jump Table for Native Calls

NOTE: this slows down rejit considerably! Not recommended to be used
without codegen optimisation or AOT.

- Does not work on Linux
- A32 needs an additional commit.

* A32 Support

(WIP)

* Actually write Direct Call pointers to the table

That would help.

* Direct Calls: Rather than returning to the translator, attempt to keep within the native stack frame.

A return to the translator can still happen, but only by exceptionally
bubbling up to it.

Also:
- Always translate lowCq as a function. Faster interop with the direct
jumps, and this will be useful in future if we want to do speculative
translation.
- Tail Call Detection: after the decoding stage, detect if we do a tail
call, and avoid translating into it. Detected if a jump is made to an
address outwith the contiguous sequence of blocks surrounding the entry
point. The goal is to reduce code touched by jit and rejit.

* A32 Support

* Use smaller max function size for lowCq, fix exceptional returns

When a return has an unexpected value and there is no code block
following this one, we now return the value rather than continuing.

* CompareAndSwap (buggy)

* Ensure CompareAndSwap does not get optimized away.

* Use CompareAndSwap to make the dynamic table thread safe.

* Tail call for linux, throw on too many arguments.

* Combine CompareAndSwap 128 and 32/64.

They emit different IR instructions since their PreAllocator behaviour
is different, but now they just have one function on EmitterContext.

* Fix issues separating from optimisations.

* Use a stub to find and execute missing functions.

This allows us to skip doing many runtime comparisons and branches, and reduces the amount of code we need to emit significantly.

For the indirect call table, this stub also does the work of moving in the highCq address to the table when one is found.

* Make Jump Tables and Jit Cache dynmically resize

Reserve virtual memory, commit as needed.

* Move TailCallRemover to its own class.

* Multithreaded Translation (based on heuristic)

A poor one, at that. Need to get core count for a better one, which
means a lot of OS specific garbage.

* Better priority management for background threads.

* Bound core limit a bit more

Past a certain point the load is not paralellizable and starts stealing from the main thread. Likely due to GC, memory, heap allocation thread contention. Reduce by one core til optimisations come to improve the situation.

* Fix memory management on linux.

* Temporary solution to some sync problems.

This will make sure threads exit correctly, most of the time. There is a potential race where setting the sync counter to 0 does nothing (counter stays at what it was before, thread could take too long to exit), but we need to find a better way to do this anyways. Synchronization frequency has been tightened as we never enter blockwise segments of code. Essentially this means, check every x functions or loop iterations, before lowcq blocks existed and were worth just as much. Ideally it should be done in a better way, since functions can be anywhere from 1 to 5000 instructions. (maybe based on host timer, or an interrupt flag from a scheduler thread)

* Address feedback minus CompareAndSwap change.

* Use default ReservedRegion granularity.

* Merge CompareAndSwap with its V128 variant.

* We already got the source, no need to do it again.

* Make sure all background translation threads exit.

* Fix CompareAndSwap128

Detection criteria was a bit scuffed.

* Address Comments.
2020-03-12 14:20:55 +11:00
gdkchan
c26f3774bd
Implement VMULL, VMLSL, VRSHR, VQRSHRN, VQRSHRUN AArch32 instructions + other fixes (#977)
* Implement VMULL, VMLSL, VQRSHRN, VQRSHRUN AArch32 instructions plus other fixes

* Re-align opcode table

* Re-enable undefined, use subclasses to fix checks

* Add test and fix VRSHR instruction

* PR feedback
2020-03-11 11:49:27 +11:00
gdkchan
89ccec197e
Implement VMOVL and VORR.I32 AArch32 SIMD instructions (#960)
* Implement VMOVL and VORR.I32 AArch32 SIMD instructions

* Rename <dt> to <size> on test description

* Rename Widen to Long and improve VMOVL implementation a bit
2020-03-10 16:17:30 +11:00
Alex Barney
08c0e3829b
Insert the SD card by default (#968) 2020-03-10 09:34:35 +11:00
gdkchan
61d79facd1
Optimize x64 loads and stores using complex addressing modes (#972)
* Optimize x64 loads and stores using complex addressing modes

* This was meant to be used for testing
2020-03-10 09:29:34 +11:00
Xpl0itR
e2bb5e8091
Move status information from the title bar to the new status bar (#948)
* Move status information from the title bar to the new status bar

* jd's requested changes

* Ack's requested changes

* gdk's requested changes

* Remove frame time statistics
2020-03-07 13:40:06 +11:00
gdkchan
ab3b6ea6d4
A64 SIMD LDP and STP with size = 0b11 is undefined (#971) 2020-03-07 13:39:52 +11:00
jduncanator
54501962f6
Fix branch with CC and predicate, and a case of SYNC propagation (#967) 2020-03-06 11:09:49 +11:00
jduncanator
68e15c1a74
Implement Fast Paths for most A32 SIMD instructions (#952)
* Begin work on A32 SIMD Intrinsics

* More instructions, some cleanup.

* Intrinsics for Move instructions (zip etc)

These pass the existing tests.

* Intrinsics for some of Cvt

While doing this I noticed that the conversion for int/fp was incorrect
in the slow path. I'll fix this in the original repo.

* Intrinsics for more Arithmetic instructions.

* Intrinsics for Vext

* Fix VEXT Intrinsic for double words.

* Use InsertPs to move scalar values.

* Cleanup, fix VPADD.f32 and VMIN signed integer.

* Cleanup, add SSE2 support for scalar insert.

Works similarly to the IR scalar insert, but obviously this one works
directly on V128.

* Minor cleanup.

* Enable intrinsic for FP64 to integer conversion.

* Address feedback apart from splitting out intrinsic float abs

Also: bad VREV encodings as undefined rather than throwing in translation.

* Move float abs to helper, fix bug with cvt

* Rename opc2 & 3 to match A32 docs, use ArgumentOutOfRangeException appropriately.

* Get name of variable at compilation rather than string literal.

* Use correct double sign mask.
2020-03-05 11:41:33 +11:00
gdkchan
d9ed827696
Don't decode blocks starting outside mapped memory & undefined instead of throw on invalid sysreg coprocessor (#964)
* Don't decode blocks in invalid memory locations.

* Emit undefined instruction on invalid coprocessor

...rather than throwing.

* Call undefined instruction directly.
2020-03-04 16:25:27 -03:00
Ac_K
25c3b8b356
Implement some calls of ISelfController (#965)
* Implement some calls of ISelfController

This PR implement some calls of ISelfController:

- EnterFatalSection
- LeaveFatalSection
- GetAccumulatedSuspendedTickValue (close #937)

According to RE of the 8.1.0 am service.

* thread safe increment/decrement

* Fix thread safe

* remove unused using
2020-03-04 00:41:41 -03:00
Alex Barney
cecbd256a5
Add support for cache storage (#936)
* Update LibHac

* Run EnsureApplicationCacheStorage when launching a game

* Add new FS commands
2020-03-03 15:07:06 +01:00
gdkchan
dc97457bf0
Initial support for double precision shader instructions. (#963)
* Implement DADD, DFMA and DMUL shader instructions

* Rename FP to FP32

* Correct double immediate

* Classic mistake
2020-03-03 15:02:08 +01:00
emmauss
3045c1a186
update glwidget package (#961) 2020-03-03 13:49:18 +11:00
Thog
47f079d23e
stub GetNintendoAccountUserResourceCacheForApplication (#962) 2020-03-03 01:07:27 +11:00
Thog
3b531de670
Implement mii:u and mii:e entirely (#955)
* Implement mii:u and mii:e entirely

Co-authored-by: AcK77 <Acoustik666@gmail.com>

This commit implement the mii service accurately.

This is based on Ac_k work but was polished and updated to 7.x.

Please note that the following calls are partially implemented:

- Convert: Used to convert from old console format (Wii/Wii U/3ds)
- Import and Export: this is shouldn't be accesible in production mode.

* Remove some debug leftovers

* Make it possible to load an arbitrary mii database from a Switch

* Address gdk's comments

* Reduce visibility of all the Mii code

* Address Ac_K's comments

* Remove the StructLayout of DatabaseSessionMetadata

* Add a missing line return in DatabaseSessionMetadata

* Misc fixes and style changes

* Fix some issues from last commit

* Fix database server metadata UpdateCounter in MarkDirty (Thanks Moose for the catch)

* MountCounter should only be incremented when no error is reported

* Fix FixDatabase

Co-authored-by: Alex Barney <thealexbarney@gmail.com>
2020-03-02 09:56:01 +11:00
gdkchan
7d1a294eae
Implement SMULWB, SMULWT, SMLAWB, SMLAWT (AArch32) (#953)
* Implement SMULWB, SMULWT, SMLAWB, SMLAWT, and add tests for some multiply instructions

* Improve test descriptions

* Rename SMULH to SMUL__

Co-authored-by: jduncanator <1518948+jduncanator@users.noreply.github.com>
2020-03-01 18:47:05 +11:00
gdkchan
fb0939f9b6
Add SSAT, SSAT16, USAT and USAT16 ARM32 instructions (#954)
* Implement SMULWB, SMULWT, SMLAWB, SMLAWT, and add tests for some multiply instructions

* Improve test descriptions

* Rename SMULH to SMUL__

* Add SSAT, SSAT16, USAT and USAT16 ARM32 instructions

* Fix new tests

* Replace AND 0xFFFF with 16-bits zero extension (more efficient)
2020-03-01 07:51:55 +11:00