Commit graph

74 commits

Author SHA1 Message Date
gdkchan
729ff5337c
Fix increment on Arm32 NEON VLDn/VSTn instructions with regs > 1 (#3695)
* Fix increment on Arm32 NEON VLDn/VSTn instructions with regs > 1

* PPTC version bump

* PR feedback
2022-09-13 08:24:09 +02:00
gdkchan
4d69286a9c
Implement VRINT (vector) Arm32 NEON instructions (#3691) 2022-09-11 15:44:27 +00:00
merry
1529e6cf0d
T32: Add Vfp instructions (#3690) 2022-09-10 23:03:14 -03:00
gdkchan
f468db7602
Implement Thumb (32-bit) memory (ordered), multiply, extension and bitfield instructions (#3687)
* Implement Thumb (32-bit) memory (ordered), multiply and bitfield instructions

* Remove public from interface

* Fix T32 BL immediate and implement signed and unsigned extend instructions
2022-09-10 22:51:00 -03:00
gdkchan
c64524a240
Add ADD (zx imm12), NOP, MOV (rs), LDA, TBB, TBH, MOV (zx imm16) and CLZ thumb instructions (#3683)
* Add ADD (zx imm12), NOP, MOV (register shifted), LDA, TBB, TBH, MOV (zx imm16) and CLZ thumb instructions, fix LDRD, STRD, CBZ, CBNZ and BLX (reg)

* Bump PPTC version
2022-09-09 22:09:11 -03:00
gdkchan
db45688aa8
Implement VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON instructions (#3677)
* Implement VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON instructions

* PPTC version

* Fix VQADD/VQSUB

* Improve MRC/MCR handling and exception messages

In case data is being recompiled as code, we don't want to throw at emit stage, instead we should only throw if it actually tries to execute
2022-09-09 21:47:38 -03:00
gdkchan
eba682b767
Implement some 32-bit Thumb instructions (#3614)
* Implement some 32-bit Thumb instructions

* Optimize OpCode32MemMult using PopCount
2022-08-25 09:59:34 +00:00
Nicholas Rodine
951700fdd8
Removed unused usings. (#3593)
* Removed unused usings.

* Added back using, now that it's used.

* Removed extra whitespace.
2022-08-18 18:04:54 +02:00
gdkchan
2bb9b33da1
Implement Arm32 Sha256 and MRS Rd, CPSR instructions (#3544)
* Implement Arm32 Sha256 and MRS Rd, CPSR instructions

* Add tests using Arm64 outputs
2022-08-05 19:03:50 +02:00
merry
6a1a03566a
T32: Implement load/store single (immediate) (#3186)
* T32: Implement load/store single (immediate)

* tests

* tidy formatting

* address comments
2022-04-21 01:25:43 +02:00
merry
7af9fcbc06
T32: Implement Data Processing (Modified Immediate) instructions (#3178)
* T32: Implement Data Processing (Modified Immediate) instructions

* Update tests

* switch -> lookup table
2022-03-06 22:25:01 +01:00
merry
747081d2c7
Decoders: Fix instruction lengths for 16-bit B instructions (#3177) 2022-03-05 16:20:24 +01:00
merry
497199bb50
Decoder: Exit on trapping instructions, and resume execution at trapping instruction (#3153)
* Decoder: Exit on trapping instructions, and resume execution at trapping instruction

* Resume at trapping address

* remove mustExit
2022-03-04 23:16:58 +01:00
merry
bd9ac0fdaa
T32: Implement B, B.cond, BL, BLX (#3155)
* Decoders: Make IsThumb a function of OpCode32

* OpCode32: Fix GetPc

* T32: Implement B, B.cond, BL, BLX

* rm usings
2022-03-04 23:05:08 +01:00
merry
7b35ebc64a
T32: Implement ALU (shifted register) instructions (#3135)
* T32: Implement ADC, ADD, AND, BIC, CMN, CMP, EOR, MOV, MVN, ORN, ORR, RSB, SBC, SUB, TEQ, TST (shifted register)

* OpCodeTable: Sort T32 list

* Tests: Rename RandomTestCase to PrecomputedThumbTestCase

* T32: Tests for AluRsImm instructions

* fix nit

* fix nit 2
2022-02-22 19:11:28 -03:00
merry
dc063eac83
ARMeilleure: Implement single stepping (#3133)
* Decoder: Implement SingleInstruction decoder mode

* Translator: Implement Step

* DecoderMode: Rename Normal to MultipleBlocks
2022-02-22 11:11:42 -03:00
merry
747876dc67
Decoders: Add IOpCode32HasSetFlags (#3136) 2022-02-18 01:33:43 +01:00
merry
98e05ee4b7
ARMeilleure: Thumb support (All T16 instructions) (#3105)
* Decoders: Add InITBlock argument

* OpCodeTable: Minor cleanup

* OpCodeTable: Remove existing thumb instruction implementations

* OpCodeTable: Prepare for thumb instructions

* OpCodeTables: Improve thumb fast lookup

* Tests: Prepare for thumb tests

* T16: Implement BX

* T16: Implement LSL/LSR/ASR (imm)

* T16: Implement ADDS, SUBS (reg)

* T16: Implement ADDS, SUBS (3-bit immediate)

* T16: Implement MOVS, CMP, ADDS, SUBS (8-bit immediate)

* T16: Implement ANDS, EORS, LSLS, LSRS, ASRS, ADCS, SBCS, RORS, TST, NEGS, CMP, CMN, ORRS, MULS, BICS, MVNS (low registers)

* T16: Implement ADD, CMP, MOV (high reg)

* T16: Implement BLX (reg)

* T16: Implement LDR (literal)

* T16: Implement {LDR,STR}{,H,B,SB,SH} (register)

* T16: Implement {LDR,STR}{,B,H} (immediate)

* T16: Implement LDR/STR (SP)

* T16: Implement ADR

* T16: Implement Add to SP (immediate)

* T16: Implement ADD/SUB (SP)

* T16: Implement SXTH, SXTB, UXTH, UTXB

* T16: Implement CBZ, CBNZ

* T16: Implement PUSH, POP

* T16: Implement REV, REV16, REVSH

* T16: Implement NOP

* T16: Implement LDM, STM

* T16: Implement SVC

* T16: Implement B (conditional)

* T16: Implement B (unconditional)

* T16: Implement IT

* fixup! T16: Implement ADD/SUB (SP)

* fixup! T16: Implement Add to SP (immediate)

* fixup! T16: Implement IT

* CpuTestThumb: Add randomized tests

* Remove inITBlock argument

* Address nits

* Use index to handle IfThenBlockState

* Reduce line noise

* fixup

* nit
2022-02-17 19:39:45 -03:00
merry
86b37d0ff7
ARMeilleure: A32: Implement SHSUB8 and UHSUB8 (#3089)
* ARMeilleure: A32: Implement UHSUB8

* ARMeilleure: A32: Implement SHSUB8
2022-02-08 10:46:42 +01:00
merry
88d3ffb97c
ARMeilleure: A32: Implement SHADD8 (#3086) 2022-02-06 12:25:45 -03:00
merry
222b1ad7da
ARMeilleure: OpCodeTable: Add CMN (RsReg) (#3087) 2022-02-06 02:01:05 +01:00
sharmander
60f7cba30a
Implement FCVTNS (Scalar GP) (#2953)
* Implement FCVTNS (Scalar GP)

* Update Ptc Version
2022-01-19 22:21:44 -03:00
sharmander
e5f7ff1eee
CPU - Implement FCVTMS (Vector) (#2937)
* Add FCVTMS_V Implementation to Armeilleure

* Fix opcode designation

* Add tests

* Amend Ptc version

* Fix OpCode / Tests

* Create Math.Floor helper method + Update implementation

* Address gdk comments

* Re-address gdk comments

* Update ARMeilleure/Decoders/OpCodeTable.cs

Co-authored-by: gdkchan <gab.dark.100@gmail.com>

* Update Tests to use 2S (4S) and 2D

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2022-01-04 16:45:28 -03:00
gdkchan
e24949ca2c
Implement CSDB instruction (#2927) 2021-12-19 11:19:05 -03:00
Piyachet Kanda
3e2f89b4fd
Implement UHADD8 instruction (#2908)
* Implement UHADD8 instruction along with a test unit

* Update PTC revision number
2021-12-08 17:05:59 -03:00
Mary
501c3d5cea
Implement MSR instruction for A32 (#2585)
* Implement MSR instruction

Fix #1342.

Now Pocket Rumble is playable.

* Address gdkchan's comments

* Address gdkchan's comments

* Address gdkchan's comment
2021-08-27 00:07:44 +02:00
gdkchan
ab9d4b862d
Implement VORN (register) Arm32 instruction (#2396) 2021-06-23 23:21:23 +02:00
FICTURE7
89791ba68d
Add inlined on translation call counting (#2190)
* Add EntryTable<TEntry>

* Add on translation call counting

* Add Counter

* Add PPTC support

* Make Counter a generic & use a 32-bit counter instead

* Return false on overflow

* Set PPTC version

* Print more information about the rejit queue

* Make Counter<T> disposable

* Remove Block.TailCall since it is not used anymore

* Apply suggestions from code review

Address gdkchan's feedback

Co-authored-by: gdkchan <gab.dark.100@gmail.com>

* Fix more stale docs

* Remove rejit requests queue logging

* Make Counter<T> finalizable

Most certainly quite an odd use case.

* Make EntryTable<T>.TryAllocate set entry to default

* Re-trigger CI

* Dispose Counters before they hit the finalizer queue

* Re-trigger CI

Just for good measure...

* Make EntryTable<T> expandable

* EntryTable is now expandable instead of being a fixed slab.
* Remove EntryTable<T>.TryAllocate
* Remove Counter<T>.TryCreate

Address LDj3SNuD's feedback

* Apply suggestions from code review

Address LDj3SNuD's feedback

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Remove useless return

* POH approach, but the sequel

* Revert "POH approach, but the sequel"

This reverts commit 5f5abaa247.

The sequel got shelved

* Add extra documentation

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2021-04-18 23:43:53 +02:00
LDj3SNuD
4bd1ad16f9
Add Sqdmulh_Ve & Sqrdmulh_Ve Inst.s with Tests. (#2139) 2021-03-25 23:33:32 +01:00
mageven
9bda7b4699
Implement VCNT instruction (#1963)
* Implement VCNT based on AArch64 CNT

Add tests

* Update PTC version

* Address LDj's comments

* Explicit size in encoding
* Tighter tests
* Replace SoftFallback with IR helper

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Reduce one BitwiseAnd from IR fallback

Based on popcount64b from https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation

* Rename parameter and add assert

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2021-02-22 16:26:13 +01:00
mageven
c19cfca183
Implement PRFM (register variant) as NOP (#1956)
* Implement PRFM (register variant) as NOP

Fix typo pfrm -> prfm
Add comments to distinguish variants

* Increment PTC version
2021-01-26 16:09:27 +11:00
LDj3SNuD
c3e0c41da3
CPU (A64): Add Fmaxnmp & Fminnmp Scalar Inst.s, Fast & Slow Paths; with Tests. (#1894) 2021-01-20 09:12:33 +11:00
LDj3SNuD
430ba6da65
CPU (A64): Add Pmull_V Inst. with Clmul fast path for the "1/2D -> 1Q" variant & Sse fast path and slow path for both the "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test. (#1817)
* Add Pmull_V Sse fast path only, both "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test.

* Add Clmul fast path for the 128 bits variant.

* Small optimisation (save 60 instructions) for the Sse fast path about the 128 bits variant.

* Add slow path, both variants. Fix V128 Shl/Shr when shift = 0.

* A32: Add Vmull_I P64 variant (slow path); not tested.

* A32: Add Vmull_I_P8_P64 Test and fix P64 variant.
2021-01-04 23:45:54 +01:00
LDj3SNuD
8a33e884f8
Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s). Fix Vfma_V slow path not using StandardFPSCRValue(). (#1775)
* Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s).

Add Vfma_S & Vfms_S Fma fast paths.
Add Vfnma_S inst. with Fma/Sse fast paths and slow path.
Add Vfnms_S Sse fast path.

Add Tests for affected inst.s.

Nits.

* InternalVersion = 1775

* Nits.

* Fix Vfma_V slow path not using StandardFPSCRValue().

* Nit: Fix Vfma_V order.

* Add Vfms_V Sse fast path and slow path.

* Add Vfma_V and Vfms_V Test.
2020-12-17 20:43:41 +01:00
LDj3SNuD
b5c215111d
PPTC Follow-up. (#1712)
* Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs.

Added invalidation of .cache files in the event of reuse on a different user operating system.

Added .info and .cache files invalidation in case of a failed stream decompression.

Nits.

* InternalVersion = 1712;

* Nits.

* Address comment.

* Get rid of BinaryFormatter.

Nits.

* Move Ptc.LoadTranslations().

Nits.

* Nits.

* Fixed corner cases (in case backup copies have to be used). Added save logs.

* Not core fixes.

* Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers.

* Add LoadTranslations log.

* Nits.

* Removed the search and management of LowCq overlapping functions.

* Final increment of .info and .cache flags.

* Nit.

* GetIndirectFunctionAddress(): Validate that writing actually takes place in dynamic table memory range (and not elsewhere).

* Fix Ptc.UpdateInfo() due to rebase.

* Nit for retrigger Checks.

* Nit for retrigger Checks.
2020-12-17 20:32:09 +01:00
sharmander
e901b7850c
CPU: Implement VRINTX.F32 | VRINTX.F64 (#1776)
* Start implementation

* Draft

* Updated opcode.

Needs verification.

* Clean up code.

* Update implementation and tests.

* Update implemenation + tests

* Get RM from FPSCR + Do not use emit/addintrinsic

* Remove "fast" path, as recommended by gdk.

* Variable DELETED.

* Update ARMeilleure/Decoders/OpCodeTable.cs

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>

* Move method

* stringing things together :)

Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2020-12-16 20:27:15 -03:00
sharmander
3332b29f01
CPU: Implement VFMA (Vector) (#1762)
* Implement VFMA.F64

* Simplify switch

* Simplify FMA Instructions into their own IntrinsicType.

* Remove whitespace

* Fix indentation

* Change tests for Vfnms -- disable inf / nan

* Move args up, not description ;)

* Implementation Complete.

All Tests Pass (Slow / Fast Path)

* Move location of function in assembler + test updates.

* Shift params upwards

* Remove unused function

* Update PTC version.

* Add comments / re-oreder opcode table.

* Remove whitespace

* Fix nit

* Fix nit.

* Fix whitespace

* Wrong opcode was used by a bad merge.

* Addressed rip's comments.
2020-12-15 00:01:52 -03:00
sharmander
36f6bbf5b9
CPU: Implement VFNMA.F32 | F.64 (#1783)
* Implement VFNMA.F<32/64>

* Update PTC Version

* Update Implementation & Renames & Correct Order

* Fix alignment

* Update implementation to not trigger assert

* Actually use the intrinsic that makes sense :)
2020-12-07 21:04:01 -03:00
sharmander
b479a43939
CPU: Implement VFNMS.F32/64 (#1758)
* Add necessary methods / op-code

* Enable Support for FMA Instruction Set

* Add Intrinsics / Assembly Opcodes for VFMSUB231XX.

* Add X86 Instructions for VFMSUB231XX

* Implement VFNMS

* Implement VFNMS Tests

* Add special cases for FMA instructions.

* Update PPTC Version

* Remove unused Op

* Move Check into Assert / Cleanup

* Rename and cleanup

* Whitespace

* Whitespace / Rename

* Re-sort

* Address final requests

* Implement VFMA.F64

* Simplify switch

* Simplify FMA Instructions into their own IntrinsicType.

* Remove whitespace

* Fix indentation

* Change tests for Vfnms -- disable inf / nan

* Move args up, not description ;)

* Undo vfma

* Completely remove vfms code.,

* Fix order of instruction in assembler
2020-12-03 20:20:02 +01:00
gdkchan
2f16491712
Get rid of Reflection.Emit dependency on CPU and Shader projects (#1626)
* Get rid of Reflection.Emit dependency on CPU and Shader projects

* Remove useless private sets

* Missed those due to the alignment
2020-10-21 09:13:44 -03:00
LDj3SNuD
04e330cc00
Add Umaal & Vabd_I, Vabdl_I, Vaddl_I, Vhadd, Vqshrn, Vshll inst.s (slow paths). (#1577)
* Add Umaal & Vabd_I, Vabdl_I, Vaddl_I, Vhadd, Vqshrn, Vshll inst.s (slow paths).

No test provided (i.e. draft).

* Ptc InternalVersion = 1577
2020-10-13 22:41:33 +02:00
gdkchan
6cc187da59
SIMD&FP load/store with scale > 4 should be undefined (#1522)
* SIMD&FP load/store with scale > 4 should be undefined

* Catch more invalid encodings for FP&SIMD LDR/STR (reg variant)

* Set PTC version to PR number
2020-09-01 17:02:23 -03:00
LDj3SNuD
2cb8bd7006
CPU (A64): Add Scvtf_S_Fixed & Ucvtf_S_Fixed with Tests. (#1492) 2020-08-31 20:48:21 -03:00
LDj3SNuD
6938988427
Fix Vcvt_FI & Vcvt_RM; Add Vfma_S & Vfms_S. Add Tests. (#1471)
* Fix Vcvt_FI & Vcvt_RM; Add Vfma_S & Vfms_S. Add Tests.

* Address PR feedback & Nit.
2020-08-13 02:34:02 -03:00
Valentin PONS
3af2ce74ec
Implements some 32-bit instructions (VBIC, VTST, VSRA) (#1192)
* Added some 32 bits instructions:

* VBIC
* VTST
* VSRA

* Incremented the PTC

* Add tests and fix implementation

* Fixed VBIC immediate opcode mapping

* Hey hey!

* Nit.

Co-authored-by: gdkchan <gab.dark.100@gmail.com>
Co-authored-by: LDj3SNuD <dvitiello@gmail.com>
Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>
2020-07-19 15:11:58 -03:00
LDj3SNuD
56a61a5758
CPU: A32: Fix Vabs_V & Vneg_V (S8, S16, S32 & F32); add Tests. (#1394)
* Fix Vabs_V & Vneg_V (S8, S16, S32 & F32); add Tests.

* Update Ptc.cs
2020-07-17 10:57:49 -03:00
LDj3SNuD
88619d71b8
CPU: A32: Add Vadd & Vsub Wide (S/U_8/16/32) Inst.s with Test. (#1390) 2020-07-17 14:21:40 +10:00
Ficture Seven
863b0c8dcb
Fix Decode exception condition (#1377) 2020-07-15 17:48:16 +10:00
LDj3SNuD
a804db6eed
Add Fmax/minv_V & S/Ushl_S Inst.s with Tests. Fix Maxps/d & Minps/d d… (#1335)
* Add Fmax/minv_V & S/Ushl_S Inst.s with Tests. Fix Maxps/d & Minps/d double zero sign handling. Allows better handling of NaNs.

* Optimized EmitSse2VectorIsNaNOpF() for multiple uses per opF.
2020-07-13 21:08:47 +10:00
riperiperi
d7044b10a2
Add SSE4.2 Path for CRC32, add A32 variant, add tests for non-castagnoli variants. (#1328)
* Add CRC32 A32 instructions.

* Fix CRC32 instructions.

* Add CRC intrinsic and fast path.

Loop is currently unrolled, will look into adding temp vars after tests are added.

* Begin work on Crc tests

* Fix SSE4.2 path for CRC32C, finialize tests.

* Remove unused IR path.

* Fix spacing between prefix checks.

* This should be Src.

* PTC Version

* OpCodeTable Order

* Integer check improvement. Value and Crc can be either 32 or 64 size.

* This wasn't necessary...

* If size is 3, value type must be I64.

* Fix same src+dest handling for non crc intrinsics.

* Pre-fix (ha) issue with vex encodings
2020-07-13 20:48:14 +10:00