Commit graph

75 commits

Author SHA1 Message Date
mageven
ca5d8e58dd
Add progress reporting to PTC and Shader Cache (#2057)
* UI changes

* Add progress reporting to PTC & ShaderCache

* Account for null events and expand docs

Co-authored-by: Joshi234 <46032261+Joshi234@users.noreply.github.com>
2021-03-03 01:39:36 +01:00
LDj3SNuD
bcbf240d2e
PPTC: Fix unwanted propagation of a relocatable constant in a specific case. (#1990)
* Fix unwanted propagation of a relocatable constant in a specific case.

* Ptc.InternalVersion = 1990

* Nit to retrigger the Checks.
2021-02-23 13:15:45 +01:00
mageven
9bda7b4699
Implement VCNT instruction (#1963)
* Implement VCNT based on AArch64 CNT

Add tests

* Update PTC version

* Address LDj's comments

* Explicit size in encoding
* Tighter tests
* Replace SoftFallback with IR helper

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Reduce one BitwiseAnd from IR fallback

Based on popcount64b from https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation

* Rename parameter and add assert

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2021-02-22 16:26:13 +01:00
LDj3SNuD
dc0adb533d
PPTC & Pool Enhancements. (#1968)
* PPTC & Pool Enhancements.

* Avoid buffer allocations in CodeGenContext.GetCode(). Avoid stream allocations in PTC.PtcInfo.

Refactoring/nits.

* Use XXHash128, for Ptc.Load & Ptc.Save, x10 faster than Md5.

* Why not a nice Span.

* Added a simple PtcFormatter library for deserialization/serialization, which does not require reflection, in use at PtcJumpTable and PtcProfiler; improves maintainability and simplicity/readability of affected code.

* Nits.

* Revert #1987.

* Revert "Revert #1987."

This reverts commit 998be765cf.
2021-02-22 03:23:48 +01:00
FICTURE7
1586880114
Turn Copy into Fill in HybridAllocator (#2010)
* Turn Copy into Fill in HybridAllocator

* Set PTC internal verison
2021-02-21 18:33:59 +01:00
gdkchan
9d82d27df2
Fix memory tracking performance regression (#2026)
* Fix memory tracking performance regression

* Set PTC version
2021-02-17 09:16:20 +11:00
gdkchan
715b605e95
Validate CPU virtual addresses on access (#1987)
* Enable PTE null checks again

* Do address validation on EmitPtPointerLoad, and make it branchless

* PTC version increment

* Mask of pointer tag for exclusive access

* Move mask to the correct place

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2021-02-16 19:04:19 +01:00
sharmander
40797a1283
Optimization | Modify Add (Integer) Instruction to use LEA instead. (#1971)
* Optimization | Modify Add Instruction to use LEA instead.

Currently, the add instruction requires 4 registers to take place. By using LEA, we can effectively perform the same working using 3 registers, reducing memory spills and improving translation efficiency.

* Fix IsSameOperandDestSrc1 Check for Add

* Use LEA if Dest != SRC1

* Update IsSameOperandDestSrc1 to account for Cases where Dest and Src1 can be same for add

* Fix error in logic

* Typo

* Add paranthesis for clarity

* Compare registers as requested.

* Cleanup if statement, use same comparison method as generateCopy

* Make change as recommended by gdk

* Perform check only when Add calls are made

* use ensure sametype for lea, fix else

* Update comment

* Update version #
2021-02-08 10:49:46 +11:00
gdkchan
ee28ccebf4
Disable partial JIT invalidation on unmap (#1991) 2021-02-08 10:25:14 +11:00
gdkchan
dcce407071
Lower precision of estimate instruction results to match Arm behavior (#1943)
* Lower precision of estimate instruction results to match Arm behavior

* PTC version update

* Nits
2021-01-28 10:23:00 +11:00
mageven
c19cfca183
Implement PRFM (register variant) as NOP (#1956)
* Implement PRFM (register variant) as NOP

Fix typo pfrm -> prfm
Add comments to distinguish variants

* Increment PTC version
2021-01-26 16:09:27 +11:00
FICTURE7
ddf1105bcb
Add VCLZ.* fast path (#1917)
* Add VCLZ fast path

* Add VCLZ.8B/16B SSSE3 fast path
* Add VCLZ.4H/8H SSSE3 fast path
* Add VCLZ.2S/4S SSE2 fast path

* Improve CLZ.4H/8H fast path

* Improve CLZ.2S/4S fast path

* Set PPTC version
2021-01-25 10:01:25 +11:00
LDj3SNuD
68f6b79fd3
Add a simple Pools Limiter. (#1830)
* Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs.

Added invalidation of .cache files in the event of reuse on a different user operating system.

Added .info and .cache files invalidation in case of a failed stream decompression.

Nits.

* InternalVersion = 1712;

* Nits.

* Address comment.

* Get rid of BinaryFormatter.

Nits.

* Move Ptc.LoadTranslations().

Nits.

* Nits.

* Fixed corner cases (in case backup copies have to be used). Added save logs.

* Not core fixes.

* Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers.

* Add LoadTranslations log.

* Nits.

* Removed the search and management of LowCq overlapping functions.

* Final increment of .info and .cache flags.

* Nit.

* Free up memory allocated by Pools during any PPTC translations at boot time.

* Nit due to rebase.

* Add a simple Pools Limiter.

* Nits.

* Fix missing JumpTable.RegisterFunction() due to rebase.

Clear MemoryStreams as soon as possible, when they are no longer needed.

* Code cleaning.

* Nit for retrigger Checks.

* Update Ptc.cs

* Contextual refactoring of Translator. Ignore resetting of pools for DirectCallStubs.

* Nit for retrigger Checks.
2021-01-12 19:04:02 +01:00
LDj3SNuD
430ba6da65
CPU (A64): Add Pmull_V Inst. with Clmul fast path for the "1/2D -> 1Q" variant & Sse fast path and slow path for both the "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test. (#1817)
* Add Pmull_V Sse fast path only, both "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test.

* Add Clmul fast path for the 128 bits variant.

* Small optimisation (save 60 instructions) for the Sse fast path about the 128 bits variant.

* Add slow path, both variants. Fix V128 Shl/Shr when shift = 0.

* A32: Add Vmull_I P64 variant (slow path); not tested.

* A32: Add Vmull_I_P8_P64 Test and fix P64 variant.
2021-01-04 23:45:54 +01:00
Ac_K
5b9c876155 Hotfix for #1814 2020-12-24 04:44:39 +01:00
LDj3SNuD
2502f1f07f
Free up memory allocated by Pools during any PPTC translations at boot time. (#1814)
* Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs.

Added invalidation of .cache files in the event of reuse on a different user operating system.

Added .info and .cache files invalidation in case of a failed stream decompression.

Nits.

* InternalVersion = 1712;

* Nits.

* Address comment.

* Get rid of BinaryFormatter.

Nits.

* Move Ptc.LoadTranslations().

Nits.

* Nits.

* Fixed corner cases (in case backup copies have to be used). Added save logs.

* Not core fixes.

* Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers.

* Add LoadTranslations log.

* Nits.

* Removed the search and management of LowCq overlapping functions.

* Final increment of .info and .cache flags.

* Nit.

* Free up memory allocated by Pools during any PPTC translations at boot time.

* Nit due to rebase.
2020-12-24 03:58:36 +01:00
LDj3SNuD
8a33e884f8
Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s). Fix Vfma_V slow path not using StandardFPSCRValue(). (#1775)
* Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s).

Add Vfma_S & Vfms_S Fma fast paths.
Add Vfnma_S inst. with Fma/Sse fast paths and slow path.
Add Vfnms_S Sse fast path.

Add Tests for affected inst.s.

Nits.

* InternalVersion = 1775

* Nits.

* Fix Vfma_V slow path not using StandardFPSCRValue().

* Nit: Fix Vfma_V order.

* Add Vfms_V Sse fast path and slow path.

* Add Vfma_V and Vfms_V Test.
2020-12-17 20:43:41 +01:00
LDj3SNuD
b5c215111d
PPTC Follow-up. (#1712)
* Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs.

Added invalidation of .cache files in the event of reuse on a different user operating system.

Added .info and .cache files invalidation in case of a failed stream decompression.

Nits.

* InternalVersion = 1712;

* Nits.

* Address comment.

* Get rid of BinaryFormatter.

Nits.

* Move Ptc.LoadTranslations().

Nits.

* Nits.

* Fixed corner cases (in case backup copies have to be used). Added save logs.

* Not core fixes.

* Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers.

* Add LoadTranslations log.

* Nits.

* Removed the search and management of LowCq overlapping functions.

* Final increment of .info and .cache flags.

* Nit.

* GetIndirectFunctionAddress(): Validate that writing actually takes place in dynamic table memory range (and not elsewhere).

* Fix Ptc.UpdateInfo() due to rebase.

* Nit for retrigger Checks.

* Nit for retrigger Checks.
2020-12-17 20:32:09 +01:00
gdkchan
61634dd415
Clear JIT cache on exit (#1518)
* Initial cache memory allocator implementation

* Get rid of CallFlag

* Perform cache cleanup on exit

* Basic cache invalidation

* Thats not how conditionals works in C# it seems

* Set PTC version to PR number

* Address PR feedback

* Update InstEmitFlowHelper.cs

* Flag clear on address is no longer needed

* Do not include exit block in function size calculation

* Dispose jump table

* For future use

* InternalVersion = 1519 (force retest).

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2020-12-16 17:07:42 -03:00
gdkchan
c8bb3cc50e
Fix register read after write on STREX implementation (#1801)
* Fix register read after write on STREX implementation

* PTC version update
2020-12-13 12:19:38 -03:00
sharmander
36f6bbf5b9
CPU: Implement VFNMA.F32 | F.64 (#1783)
* Implement VFNMA.F<32/64>

* Update PTC Version

* Update Implementation & Renames & Correct Order

* Fix alignment

* Update implementation to not trigger assert

* Actually use the intrinsic that makes sense :)
2020-12-07 21:04:01 -03:00
LDj3SNuD
567ea726e1
Add support for guest Fz (Fpcr) mode through host Ftz and Daz (Mxcsr) modes (fast paths). (#1630)
* Add support for guest Fz (Fpcr) mode through host Ftz and Daz (Mxcsr) modes (fast paths).

* Ptc.InternalVersion = 1630

* Nits.

* Address comments.

* Update Ptc.cs

* Address comment.
2020-12-07 10:37:07 +01:00
sharmander
b479a43939
CPU: Implement VFNMS.F32/64 (#1758)
* Add necessary methods / op-code

* Enable Support for FMA Instruction Set

* Add Intrinsics / Assembly Opcodes for VFMSUB231XX.

* Add X86 Instructions for VFMSUB231XX

* Implement VFNMS

* Implement VFNMS Tests

* Add special cases for FMA instructions.

* Update PPTC Version

* Remove unused Op

* Move Check into Assert / Cleanup

* Rename and cleanup

* Whitespace

* Whitespace / Rename

* Re-sort

* Address final requests

* Implement VFMA.F64

* Simplify switch

* Simplify FMA Instructions into their own IntrinsicType.

* Remove whitespace

* Fix indentation

* Change tests for Vfnms -- disable inf / nan

* Move args up, not description ;)

* Undo vfma

* Completely remove vfms code.,

* Fix order of instruction in assembler
2020-12-03 20:20:02 +01:00
riperiperi
9852cb9c9e
Use backup when PTC compression is corrupt (#1704)
* Use backup when PTC compression is corrupt

* Apply suggestions from code review

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2020-11-20 02:51:59 +01:00
LDj3SNuD
0679084f11
CPU (A64): Add FP16/FP32 fast paths (F16C Intrinsics) for Fcvt_S, Fcvtl_V & Fcvtn_V Instructions. Now HardwareCapabilities uses CpuId. (#1650)
* net5.0

* CPU (A64): Add FP16/FP32 fast paths (F16C Intrinsics) for Fcvt_S, Fcvtl_V & Fcvtn_V Instructions. Switch to .NET 5.0.

Nits.

Tests performed successfully in both debug and release mode (for all instructions involved).

* Address comment.

* Update appveyor.yml

* Revert "Update appveyor.yml"

This reverts commit 27cdd59e8b.

* Remove Assembler CpuId.

* Update appveyor.yml

* Address comment.
2020-11-18 19:35:54 +01:00
Mary
863edae328
shader cache: Fix Linux boot issues (#1709)
* shader cache: Fix Linux boot issues

This rollback the init logic back to previous state, and replicate the
way PTC handle initialization.

* shader cache: set default state of ready for translation event to false

* Fix cpu unit tests
2020-11-17 22:40:19 +01:00
riperiperi
b4d8d893a4
Memory Read/Write Tracking using Region Handles (#1272)
* WIP Range Tracking

- Texture invalidation seems to have large problems
- Buffer/Pool invalidation may have problems
- Mirror memory tracking puts an additional `add` in compiled code, we likely just want to make HLE access slower if this is the final solution.
- Native project is in the messiest possible location.
- [HACK] JIT memory access always uses native "fast" path
- [HACK] Trying some things with texture invalidation and views.

It works :)

Still a few hacks, messy things, slow things

More work in progress stuff (also move to memory project)

Quite a bit faster now.
- Unmapping GPU VA and CPU VA will now correctly update write tracking regions, and invalidate textures for the former.
- The Virtual range list is now non-overlapping like the physical one.
- Fixed some bugs where regions could leak.
- Introduced a weird bug that I still need to track down (consistent invalid buffer in MK8 ribbon road)

Move some stuff.

I think we'll eventually just put the dll and so for this in a nuget package.

Fix rebase.

[WIP] MultiRegionHandle variable size ranges

- Avoid reprotecting regions that change often (needs some tweaking)
- There's still a bug in buffers, somehow.
- Might want different api for minimum granularity

Fix rebase issue

Commit everything needed for software only tracking.

Remove native components.

Remove more native stuff.

Cleanup

Use a separate window for the background context, update opentk. (fixes linux)

Some experimental changes

Should get things working up to scratch - still need to try some things with flush/modification and res scale.

Include address with the region action.

Initial work to make range tracking work

Still a ton of bugs

Fix some issues with the new stuff.

* Fix texture flush instability

There's still some weird behaviour, but it's much improved without this. (textures with cpu modified data were flushing over it)

* Find the destination texture for Buffer->Texture full copy

Greatly improves performance for nvdec videos (with range tracking)

* Further improve texture tracking

* Disable Memory Tracking for view parents

This is a temporary approach to better match behaviour on master (where invalidations would be soaked up by views, rather than trigger twice)

The assumption is that when views are created to a texture, they will cover all of its data anyways. Of course, this can easily be improved in future.

* Introduce some tracking tests.

WIP

* Complete base tests.

* Add more tests for multiregion, fix existing test.

* Cleanup Part 1

* Remove unnecessary code from memory tracking

* Fix some inconsistencies with 3D texture rule.

* Add dispose tests.

* Use a background thread for the background context.

Rather than setting and unsetting a context as current, doing the work on a dedicated thread with signals seems to be a bit faster.

Also nerf the multithreading test a bit.

* Copy to texture with matching alignment

This extends the copy to work for some videos with unusual size, such as tutorial videos in SMO. It will only occur if the destination texture already exists at XCount size.

* Track reads for buffer copies. Synchronize new buffers before copying overlaps.

* Remove old texture flushing mechanisms.

Range tracking all the way, baby.

* Wake the background thread when disposing.

Avoids a deadlock when games are closed.

* Address Feedback 1

* Separate TextureCopy instance for background thread

Also `BackgroundContextWorker.InBackground` for a more sensible idenfifier for if we're in a background thread.

* Add missing XML docs.

* Address Feedback

* Maybe I should start drinking coffee.

* Some more feedback.

* Remove flush warning, Refocus window after making background context
2020-10-16 17:18:35 -03:00
LDj3SNuD
04e330cc00
Add Umaal & Vabd_I, Vabdl_I, Vaddl_I, Vhadd, Vqshrn, Vshll inst.s (slow paths). (#1577)
* Add Umaal & Vabd_I, Vabdl_I, Vaddl_I, Vhadd, Vqshrn, Vshll inst.s (slow paths).

No test provided (i.e. draft).

* Ptc InternalVersion = 1577
2020-10-13 22:41:33 +02:00
gdkchan
f2b12c9749
Remove old, unused CPU optimization (#1586) 2020-09-30 16:16:34 -03:00
gdkchan
6c9565693f
IPC refactor part 1: Use explicit separate threads to process requests (#1447)
* Changes to allow explicit management of service threads

* Remove now unused code

* Remove ThreadCounter, its no longer needed

* Allow and use separate server per service, also fix exit issues

* New policy change: PTC version now uses PR number
2020-09-22 14:50:40 +10:00
LDj3SNuD
66b799a6e4
Fix host stack overflow caused by some recursive guest methods. (#1528)
* Fix host stack overflow caused by some recursive guest methods.

* PPTC flag up.

* Address comments.

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2020-09-19 20:16:30 -03:00
FICTURE7
f60033e0aa
Implement block placement (#1549)
* Implement block placement

Implement a simple pass which re-orders cold blocks at the end of the
list of blocks in the CFG.

* Set PPTC version

* Use Array.Resize

Address gdkchan's feedback
2020-09-19 20:00:24 -03:00
FICTURE7
36ec1bc6c0
Relax block ordering constraints (#1535)
* Relax block ordering constraints

Before `block.Next` had to follow `block.ListNext`, now it does not.
Instead `CodeGenerator` will now emit the necessary jump instructions
to ensure control flow.

This makes control flow and block order modifications easier. It also
eliminates some simple cases of redundant branches.

* Set PPTC version
2020-09-12 12:32:53 -03:00
FICTURE7
4c7bebf3e6
Do not emit StoreToContext before Return (#1537)
* Do not emit StoreToContext before Return

* Set PPTC version
2020-09-07 12:52:17 +10:00
gdkchan
6cc187da59
SIMD&FP load/store with scale > 4 should be undefined (#1522)
* SIMD&FP load/store with scale > 4 should be undefined

* Catch more invalid encodings for FP&SIMD LDR/STR (reg variant)

* Set PTC version to PR number
2020-09-01 17:02:23 -03:00
FICTURE7
92f7f163ef
Improve static branch prediction along fast path for memory accesses (#1484)
* Improve static branch prediction along fast path for memory accesses

* Set PPTC interval version
2020-08-31 20:55:15 -03:00
mageven
b9398f1f3a
Allow launching with custom data directories (#1505)
* Allow launching with custom data directories

Don't load alternate keys when using custom directory

* Address gdkchan's comments

* Misc fixes to log levels

Added more enabled log levels by default

Moved successful config updation to Notice as
1. It's not a warning
2. Warnings could've been disabled by the config load and hence message
   would be lost
2020-08-30 18:51:53 +02:00
LDj3SNuD
6938988427
Fix Vcvt_FI & Vcvt_RM; Add Vfma_S & Vfms_S. Add Tests. (#1471)
* Fix Vcvt_FI & Vcvt_RM; Add Vfma_S & Vfms_S. Add Tests.

* Address PR feedback & Nit.
2020-08-13 02:34:02 -03:00
mageven
9c6a3aacc2
Fix PTC version increment from #1433 (#1462) 2020-08-09 05:32:00 +02:00
LDj3SNuD
e36e97c64d
CPU: This PR fixes Fpscr, among other things. (#1433)
* CPU: This PR fixes Fpscr, among other things.

* Add Fpscr.Qc = 1 if sat. for Vqrshrn & Vqrshrun.

* Fix Vcmp & Vcmpe opcode table.

* Revert "Fix Vcmp & Vcmpe opcode table."

This reverts commit c117d9410d.

* Address PR feedbacks.
2020-08-08 17:18:51 +02:00
Ficture Seven
ee22517d92
Improve branch operations (#1442)
* Add Compare instruction

* Add BranchIf instruction

* Use test when BranchIf & Compare against 0

* Propagate Compare into BranchIfTrue/False use

- Propagate Compare operations into their BranchIfTrue/False use and
  turn these into a BranchIf.

- Clean up Comparison enum.

* Replace BranchIfTrue/False with BranchIf

* Use BranchIf in EmitPtPointerLoad

- Using BranchIf early instead of BranchIfTrue/False improves LCQ and
  reduces the amount of work needed by the Optimizer.

  EmitPtPointerLoader was a/the big producer of BranchIfTrue/False.

- Fix asserts firing when assembling BitwiseAnd because of type
  mismatch in EmitStoreExclusive. This is harmless and should not
  cause any diffs.

* Increment PPTC interval version

* Improve IRDumper for BranchIf & Compare

* Use BranchIf in EmitNativeCall

* Clean up

* Do not emit test when immediately preceded by and
2020-08-05 08:52:33 +10:00
mageven
a33dc2f491
Improved Logger (#1292)
* Logger class changes only

Now compile-time checking is possible with the help of Nullable Value
types.

* Misc formatting

* Manual optimizations

PrintGuestLog
PrintGuestStackTrace
Surfaceflinger DequeueBuffer

* Reduce SendVibrationXX log level to Debug

* Add Notice log level

This level is always enabled and used to print system info, etc...
Also, rewrite LogColor to switch expression as colors are static

* Unify unhandled exception event handlers

* Print enabled LogLevels during init

* Re-add App Exit disposes in proper order

nit: switch case spacing

* Revert PrintGuestStackTrace to Info logs due to #1407

PrintGuestStackTrace is now called in some critical error handlers
so revert to old behavior as KThread isn't part of Guest.

* Batch replace Logger statements
2020-08-04 01:32:53 +02:00
gdkchan
9878fc2d3c
Implement inline memory load/store exclusive and ordered (#1413)
* Implement inline memory load/store exclusive

* Fix missing REX prefix on 8-bits CMPXCHG

* Increment PTC version due to bugfix

* Remove redundant memory checks

* Address PR feedback

* Increment PPTC version
2020-07-30 11:29:28 -03:00
Ficture Seven
b3c051bbec
Use movd,movq for i32/64 VectorExtract %x, 0x0 (#1439)
* Use movd,movq for i32/64 VectorExtract %x, 0x0

* Increment PPTC interval version

* Use else-if instead

- Address gdkchan's feedback.
- Clean up Debug.Assert calls

* Inline `count` expression into Debug.Assert

Apparently the CoreCLR JIT will not eliminate this. :(
2020-07-30 15:52:26 +10:00
gdkchan
2678bf0010
PPTC version increment (#1427) 2020-07-25 20:01:57 +02:00
Valentin PONS
3af2ce74ec
Implements some 32-bit instructions (VBIC, VTST, VSRA) (#1192)
* Added some 32 bits instructions:

* VBIC
* VTST
* VSRA

* Incremented the PTC

* Add tests and fix implementation

* Fixed VBIC immediate opcode mapping

* Hey hey!

* Nit.

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
Co-authored-by: LDj3SNuD <dvitiello@gmail.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2020-07-19 15:11:58 -03:00
LDj3SNuD
56a61a5758
CPU: A32: Fix Vabs_V & Vneg_V (S8, S16, S32 & F32); add Tests. (#1394)
* Fix Vabs_V & Vneg_V (S8, S16, S32 & F32); add Tests.

* Update Ptc.cs
2020-07-17 10:57:49 -03:00
LDj3SNuD
88619d71b8
CPU: A32: Add Vadd & Vsub Wide (S/U_8/16/32) Inst.s with Test. (#1390) 2020-07-17 14:21:40 +10:00
LDj3SNuD
a804db6eed
Add Fmax/minv_V & S/Ushl_S Inst.s with Tests. Fix Maxps/d & Minps/d d… (#1335)
* Add Fmax/minv_V & S/Ushl_S Inst.s with Tests. Fix Maxps/d & Minps/d double zero sign handling. Allows better handling of NaNs.

* Optimized EmitSse2VectorIsNaNOpF() for multiple uses per opF.
2020-07-13 21:08:47 +10:00
riperiperi
d7044b10a2
Add SSE4.2 Path for CRC32, add A32 variant, add tests for non-castagnoli variants. (#1328)
* Add CRC32 A32 instructions.

* Fix CRC32 instructions.

* Add CRC intrinsic and fast path.

Loop is currently unrolled, will look into adding temp vars after tests are added.

* Begin work on Crc tests

* Fix SSE4.2 path for CRC32C, finialize tests.

* Remove unused IR path.

* Fix spacing between prefix checks.

* This should be Src.

* PTC Version

* OpCodeTable Order

* Integer check improvement. Value and Crc can be either 32 or 64 size.

* This wasn't necessary...

* If size is 3, value type must be I64.

* Fix same src+dest handling for non crc intrinsics.

* Pre-fix (ha) issue with vex encodings
2020-07-13 20:48:14 +10:00