rjx-mirror/Ryujinx.HLE/HOS/Kernel/Threading/KThread.cs

1217 lines
36 KiB
C#
Raw Normal View History

using Ryujinx.Common.Logging;
using Ryujinx.Cpu;
using Ryujinx.HLE.HOS.Kernel.Common;
using Ryujinx.HLE.HOS.Kernel.Process;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
using System.Threading;
namespace Ryujinx.HLE.HOS.Kernel.Threading
2018-02-04 23:08:20 +00:00
{
class KThread : KSynchronizationObject, IKFutureSchedulerObject
2018-02-04 23:08:20 +00:00
{
public const int MaxWaitSyncObjects = 64;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
private int _hostThreadRunning;
public Thread HostThread { get; private set; }
public ARMeilleure.State.ExecutionContext Context { get; private set; }
2018-02-04 23:08:20 +00:00
public long AffinityMask { get; set; }
public long ThreadUid { get; private set; }
public long TotalTimeRunning { get; set; }
public KSynchronizationObject SignaledObj { get; set; }
public ulong CondVarAddress { get; set; }
private ulong _entrypoint;
public ulong MutexAddress { get; set; }
public KProcess Owner { get; private set; }
private ulong _tlsAddress;
public ulong TlsAddress => _tlsAddress;
public ulong TlsDramAddress { get; private set; }
public KSynchronizationObject[] WaitSyncObjects { get; }
public int[] WaitSyncHandles { get; }
public long LastScheduledTime { get; set; }
public LinkedListNode<KThread>[] SiblingsPerCore { get; private set; }
public LinkedList<KThread> Withholder { get; set; }
public LinkedListNode<KThread> WithholderNode { get; set; }
public LinkedListNode<KThread> ProcessListNode { get; set; }
private LinkedList<KThread> _mutexWaiters;
private LinkedListNode<KThread> _mutexWaiterNode;
public KThread MutexOwner { get; private set; }
2018-06-26 04:09:32 +00:00
public int ThreadHandleForUserMutex { get; set; }
private ThreadSchedState _forcePauseFlags;
public KernelResult ObjSyncResult { get; set; }
public int DynamicPriority { get; set; }
public int CurrentCore { get; set; }
public int BasePriority { get; set; }
public int PreferredCore { get; set; }
private long _affinityMaskOverride;
private int _preferredCoreOverride;
#pragma warning disable CS0649
private int _affinityOverrideCount;
#pragma warning restore CS0649
public ThreadSchedState SchedFlags { get; private set; }
private int _shallBeTerminated;
public bool ShallBeTerminated { get => _shallBeTerminated != 0; set => _shallBeTerminated = value ? 1 : 0; }
public bool SyncCancelled { get; set; }
public bool WaitingSync { get; set; }
private bool _hasExited;
private bool _hasBeenInitialized;
private bool _hasBeenReleased;
public bool WaitingInArbitration { get; set; }
private KScheduler _scheduler;
private KSchedulingData _schedulingData;
public long LastPc { get; set; }
public KThread(KernelContext context) : base(context)
2018-02-04 23:08:20 +00:00
{
_scheduler = KernelContext.Scheduler;
_schedulingData = KernelContext.Scheduler.SchedulingData;
WaitSyncObjects = new KSynchronizationObject[MaxWaitSyncObjects];
WaitSyncHandles = new int[MaxWaitSyncObjects];
SiblingsPerCore = new LinkedListNode<KThread>[KScheduler.CpuCoresCount];
_mutexWaiters = new LinkedList<KThread>();
}
public KernelResult Initialize(
ulong entrypoint,
ulong argsPtr,
ulong stackTop,
int priority,
int defaultCpuCore,
KProcess owner,
ThreadType type = ThreadType.User,
ThreadStart customHostThreadStart = null)
{
if ((uint)type > 3)
{
throw new ArgumentException($"Invalid thread type \"{type}\".");
}
PreferredCore = defaultCpuCore;
AffinityMask |= 1L << defaultCpuCore;
SchedFlags = type == ThreadType.Dummy
? ThreadSchedState.Running
: ThreadSchedState.None;
CurrentCore = PreferredCore;
DynamicPriority = priority;
BasePriority = priority;
ObjSyncResult = KernelResult.ThreadNotStarted;
_entrypoint = entrypoint;
if (type == ThreadType.User)
{
if (owner.AllocateThreadLocalStorage(out _tlsAddress) != KernelResult.Success)
{
return KernelResult.OutOfMemory;
}
TlsDramAddress = owner.MemoryManager.GetDramAddressFromVa(_tlsAddress);
MemoryHelper.FillWithZeros(owner.CpuMemory, (long)_tlsAddress, KTlsPageInfo.TlsEntrySize);
}
bool is64Bits;
if (owner != null)
{
Owner = owner;
owner.IncrementReferenceCount();
owner.IncrementThreadCount();
is64Bits = (owner.MmuFlags & 1) != 0;
}
else
{
is64Bits = true;
}
HostThread = new Thread(customHostThreadStart ?? (() => ThreadStart(entrypoint)));
Context = CpuContext.CreateExecutionContext();
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
bool isAarch32 = (Owner.MmuFlags & 1) == 0;
Context.IsAarch32 = isAarch32;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Context.SetX(0, argsPtr);
if (isAarch32)
{
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Context.SetX(13, (uint)stackTop);
}
else
{
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Context.SetX(31, stackTop);
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Context.CntfrqEl0 = 19200000;
Context.Tpidr = (long)_tlsAddress;
owner.SubscribeThreadEventHandlers(Context);
ThreadUid = KernelContext.NewThreadUid();
HostThread.Name = $"HLE.HostThread.{ThreadUid}";
_hasBeenInitialized = true;
if (owner != null)
{
owner.AddThread(this);
if (owner.IsPaused)
{
KernelContext.CriticalSection.Enter();
if (ShallBeTerminated || SchedFlags == ThreadSchedState.TerminationPending)
{
KernelContext.CriticalSection.Leave();
return KernelResult.Success;
}
_forcePauseFlags |= ThreadSchedState.ProcessPauseFlag;
CombineForcePauseFlags();
KernelContext.CriticalSection.Leave();
}
}
return KernelResult.Success;
}
public KernelResult Start()
{
if (!KernelContext.KernelInitialized)
{
KernelContext.CriticalSection.Enter();
if (!ShallBeTerminated && SchedFlags != ThreadSchedState.TerminationPending)
{
_forcePauseFlags |= ThreadSchedState.KernelInitPauseFlag;
CombineForcePauseFlags();
}
KernelContext.CriticalSection.Leave();
}
KernelResult result = KernelResult.ThreadTerminating;
KernelContext.CriticalSection.Enter();
if (!ShallBeTerminated)
{
KThread currentThread = KernelContext.Scheduler.GetCurrentThread();
while (SchedFlags != ThreadSchedState.TerminationPending &&
currentThread.SchedFlags != ThreadSchedState.TerminationPending &&
!currentThread.ShallBeTerminated)
{
if ((SchedFlags & ThreadSchedState.LowMask) != ThreadSchedState.None)
{
result = KernelResult.InvalidState;
break;
}
if (currentThread._forcePauseFlags == ThreadSchedState.None)
{
if (Owner != null && _forcePauseFlags != ThreadSchedState.None)
{
CombineForcePauseFlags();
}
SetNewSchedFlags(ThreadSchedState.Running);
result = KernelResult.Success;
break;
}
else
{
currentThread.CombineForcePauseFlags();
KernelContext.CriticalSection.Leave();
KernelContext.CriticalSection.Enter();
if (currentThread.ShallBeTerminated)
{
break;
}
}
}
}
KernelContext.CriticalSection.Leave();
return result;
}
public void Exit()
{
// TODO: Debug event.
if (Owner != null)
{
Owner.ResourceLimit?.Release(LimitableResource.Thread, 0, 1);
_hasBeenReleased = true;
}
KernelContext.CriticalSection.Enter();
_forcePauseFlags &= ~ThreadSchedState.ForcePauseMask;
ExitImpl();
KernelContext.CriticalSection.Leave();
DecrementReferenceCount();
}
public ThreadSchedState PrepareForTermination()
{
KernelContext.CriticalSection.Enter();
ThreadSchedState result;
if (Interlocked.CompareExchange(ref _shallBeTerminated, 1, 0) == 0)
{
if ((SchedFlags & ThreadSchedState.LowMask) == ThreadSchedState.None)
{
SchedFlags = ThreadSchedState.TerminationPending;
}
else
{
if (_forcePauseFlags != ThreadSchedState.None)
{
_forcePauseFlags &= ~ThreadSchedState.ThreadPauseFlag;
ThreadSchedState oldSchedFlags = SchedFlags;
SchedFlags &= ThreadSchedState.LowMask;
AdjustScheduling(oldSchedFlags);
}
if (BasePriority >= 0x10)
{
SetPriority(0xF);
}
if ((SchedFlags & ThreadSchedState.LowMask) == ThreadSchedState.Running)
{
// TODO: GIC distributor stuffs (sgir changes ect)
}
SignaledObj = null;
ObjSyncResult = KernelResult.ThreadTerminating;
ReleaseAndResume();
}
}
result = SchedFlags;
KernelContext.CriticalSection.Leave();
return result & ThreadSchedState.LowMask;
}
public void Terminate()
{
ThreadSchedState state = PrepareForTermination();
if (state != ThreadSchedState.TerminationPending)
{
KernelContext.Synchronization.WaitFor(new KSynchronizationObject[] { this }, -1, out _);
}
}
public void HandlePostSyscall()
{
ThreadSchedState state;
do
{
if (ShallBeTerminated || SchedFlags == ThreadSchedState.TerminationPending)
{
KernelContext.Scheduler.ExitThread(this);
Exit();
// As the death of the thread is handled by the CPU emulator, we differ from the official kernel and return here.
break;
}
KernelContext.CriticalSection.Enter();
if (ShallBeTerminated || SchedFlags == ThreadSchedState.TerminationPending)
{
state = ThreadSchedState.TerminationPending;
}
else
{
if (_forcePauseFlags != ThreadSchedState.None)
{
CombineForcePauseFlags();
}
state = ThreadSchedState.Running;
}
KernelContext.CriticalSection.Leave();
} while (state == ThreadSchedState.TerminationPending);
}
private void ExitImpl()
{
KernelContext.CriticalSection.Enter();
SetNewSchedFlags(ThreadSchedState.TerminationPending);
_hasExited = true;
Signal();
KernelContext.CriticalSection.Leave();
}
public KernelResult Sleep(long timeout)
{
KernelContext.CriticalSection.Enter();
if (ShallBeTerminated || SchedFlags == ThreadSchedState.TerminationPending)
{
KernelContext.CriticalSection.Leave();
return KernelResult.ThreadTerminating;
}
SetNewSchedFlags(ThreadSchedState.Paused);
if (timeout > 0)
{
KernelContext.TimeManager.ScheduleFutureInvocation(this, timeout);
}
KernelContext.CriticalSection.Leave();
if (timeout > 0)
{
KernelContext.TimeManager.UnscheduleFutureInvocation(this);
}
return 0;
}
public void Yield()
{
KernelContext.CriticalSection.Enter();
if (SchedFlags != ThreadSchedState.Running)
{
KernelContext.CriticalSection.Leave();
KernelContext.Scheduler.ContextSwitch();
return;
}
if (DynamicPriority < KScheduler.PrioritiesCount)
{
// Move current thread to the end of the queue.
_schedulingData.Reschedule(DynamicPriority, CurrentCore, this);
}
_scheduler.ThreadReselectionRequested = true;
KernelContext.CriticalSection.Leave();
KernelContext.Scheduler.ContextSwitch();
}
public void YieldWithLoadBalancing()
{
KernelContext.CriticalSection.Enter();
if (SchedFlags != ThreadSchedState.Running)
{
KernelContext.CriticalSection.Leave();
KernelContext.Scheduler.ContextSwitch();
return;
}
int prio = DynamicPriority;
int core = CurrentCore;
KThread nextThreadOnCurrentQueue = null;
if (DynamicPriority < KScheduler.PrioritiesCount)
{
// Move current thread to the end of the queue.
_schedulingData.Reschedule(prio, core, this);
Func<KThread, bool> predicate = x => x.DynamicPriority == prio;
nextThreadOnCurrentQueue = _schedulingData.ScheduledThreads(core).FirstOrDefault(predicate);
}
IEnumerable<KThread> SuitableCandidates()
{
foreach (KThread thread in _schedulingData.SuggestedThreads(core))
{
int srcCore = thread.CurrentCore;
if (srcCore >= 0)
{
KThread selectedSrcCore = _scheduler.CoreContexts[srcCore].SelectedThread;
if (selectedSrcCore == thread || ((selectedSrcCore?.DynamicPriority ?? 2) < 2))
{
continue;
}
}
// If the candidate was scheduled after the current thread, then it's not worth it,
// unless the priority is higher than the current one.
if (nextThreadOnCurrentQueue.LastScheduledTime >= thread.LastScheduledTime ||
nextThreadOnCurrentQueue.DynamicPriority < thread.DynamicPriority)
{
yield return thread;
}
}
}
KThread dst = SuitableCandidates().FirstOrDefault(x => x.DynamicPriority <= prio);
if (dst != null)
{
_schedulingData.TransferToCore(dst.DynamicPriority, core, dst);
_scheduler.ThreadReselectionRequested = true;
}
if (this != nextThreadOnCurrentQueue)
{
_scheduler.ThreadReselectionRequested = true;
}
KernelContext.CriticalSection.Leave();
KernelContext.Scheduler.ContextSwitch();
}
public void YieldAndWaitForLoadBalancing()
{
KernelContext.CriticalSection.Enter();
if (SchedFlags != ThreadSchedState.Running)
{
KernelContext.CriticalSection.Leave();
KernelContext.Scheduler.ContextSwitch();
return;
}
int core = CurrentCore;
_schedulingData.TransferToCore(DynamicPriority, -1, this);
KThread selectedThread = null;
if (!_schedulingData.ScheduledThreads(core).Any())
{
foreach (KThread thread in _schedulingData.SuggestedThreads(core))
{
if (thread.CurrentCore < 0)
{
continue;
}
KThread firstCandidate = _schedulingData.ScheduledThreads(thread.CurrentCore).FirstOrDefault();
if (firstCandidate == thread)
{
continue;
}
if (firstCandidate == null || firstCandidate.DynamicPriority >= 2)
{
_schedulingData.TransferToCore(thread.DynamicPriority, core, thread);
selectedThread = thread;
}
break;
}
}
if (selectedThread != this)
{
_scheduler.ThreadReselectionRequested = true;
}
KernelContext.CriticalSection.Leave();
KernelContext.Scheduler.ContextSwitch();
}
public void SetPriority(int priority)
{
KernelContext.CriticalSection.Enter();
BasePriority = priority;
UpdatePriorityInheritance();
KernelContext.CriticalSection.Leave();
}
public KernelResult SetActivity(bool pause)
{
KernelResult result = KernelResult.Success;
KernelContext.CriticalSection.Enter();
ThreadSchedState lowNibble = SchedFlags & ThreadSchedState.LowMask;
if (lowNibble != ThreadSchedState.Paused && lowNibble != ThreadSchedState.Running)
{
KernelContext.CriticalSection.Leave();
return KernelResult.InvalidState;
}
KernelContext.CriticalSection.Enter();
if (!ShallBeTerminated && SchedFlags != ThreadSchedState.TerminationPending)
{
if (pause)
{
// Pause, the force pause flag should be clear (thread is NOT paused).
if ((_forcePauseFlags & ThreadSchedState.ThreadPauseFlag) == 0)
{
_forcePauseFlags |= ThreadSchedState.ThreadPauseFlag;
CombineForcePauseFlags();
}
else
{
result = KernelResult.InvalidState;
}
}
else
{
// Unpause, the force pause flag should be set (thread is paused).
if ((_forcePauseFlags & ThreadSchedState.ThreadPauseFlag) != 0)
{
ThreadSchedState oldForcePauseFlags = _forcePauseFlags;
_forcePauseFlags &= ~ThreadSchedState.ThreadPauseFlag;
if ((oldForcePauseFlags & ~ThreadSchedState.ThreadPauseFlag) == ThreadSchedState.None)
{
ThreadSchedState oldSchedFlags = SchedFlags;
SchedFlags &= ThreadSchedState.LowMask;
AdjustScheduling(oldSchedFlags);
}
}
else
{
result = KernelResult.InvalidState;
}
}
}
KernelContext.CriticalSection.Leave();
KernelContext.CriticalSection.Leave();
return result;
}
public void CancelSynchronization()
{
KernelContext.CriticalSection.Enter();
if ((SchedFlags & ThreadSchedState.LowMask) != ThreadSchedState.Paused || !WaitingSync)
{
SyncCancelled = true;
}
else if (Withholder != null)
{
Withholder.Remove(WithholderNode);
SetNewSchedFlags(ThreadSchedState.Running);
Withholder = null;
SyncCancelled = true;
}
else
{
SignaledObj = null;
ObjSyncResult = KernelResult.Cancelled;
SetNewSchedFlags(ThreadSchedState.Running);
SyncCancelled = false;
}
KernelContext.CriticalSection.Leave();
}
public KernelResult SetCoreAndAffinityMask(int newCore, long newAffinityMask)
{
KernelContext.CriticalSection.Enter();
bool useOverride = _affinityOverrideCount != 0;
// The value -3 is "do not change the preferred core".
if (newCore == -3)
{
newCore = useOverride ? _preferredCoreOverride : PreferredCore;
if ((newAffinityMask & (1 << newCore)) == 0)
{
KernelContext.CriticalSection.Leave();
return KernelResult.InvalidCombination;
}
}
if (useOverride)
{
_preferredCoreOverride = newCore;
_affinityMaskOverride = newAffinityMask;
}
else
{
long oldAffinityMask = AffinityMask;
PreferredCore = newCore;
AffinityMask = newAffinityMask;
if (oldAffinityMask != newAffinityMask)
{
int oldCore = CurrentCore;
if (CurrentCore >= 0 && ((AffinityMask >> CurrentCore) & 1) == 0)
{
if (PreferredCore < 0)
{
CurrentCore = HighestSetCore(AffinityMask);
}
else
{
CurrentCore = PreferredCore;
}
}
AdjustSchedulingForNewAffinity(oldAffinityMask, oldCore);
}
}
KernelContext.CriticalSection.Leave();
return KernelResult.Success;
}
private static int HighestSetCore(long mask)
{
for (int core = KScheduler.CpuCoresCount - 1; core >= 0; core--)
{
if (((mask >> core) & 1) != 0)
{
return core;
}
}
return -1;
}
private void CombineForcePauseFlags()
{
ThreadSchedState oldFlags = SchedFlags;
ThreadSchedState lowNibble = SchedFlags & ThreadSchedState.LowMask;
SchedFlags = lowNibble | _forcePauseFlags;
AdjustScheduling(oldFlags);
}
private void SetNewSchedFlags(ThreadSchedState newFlags)
{
KernelContext.CriticalSection.Enter();
ThreadSchedState oldFlags = SchedFlags;
SchedFlags = (oldFlags & ThreadSchedState.HighMask) | newFlags;
if ((oldFlags & ThreadSchedState.LowMask) != newFlags)
{
AdjustScheduling(oldFlags);
}
KernelContext.CriticalSection.Leave();
}
public void ReleaseAndResume()
{
KernelContext.CriticalSection.Enter();
if ((SchedFlags & ThreadSchedState.LowMask) == ThreadSchedState.Paused)
{
if (Withholder != null)
{
Withholder.Remove(WithholderNode);
SetNewSchedFlags(ThreadSchedState.Running);
Withholder = null;
}
else
{
SetNewSchedFlags(ThreadSchedState.Running);
}
}
KernelContext.CriticalSection.Leave();
}
public void Reschedule(ThreadSchedState newFlags)
{
KernelContext.CriticalSection.Enter();
ThreadSchedState oldFlags = SchedFlags;
SchedFlags = (oldFlags & ThreadSchedState.HighMask) |
(newFlags & ThreadSchedState.LowMask);
AdjustScheduling(oldFlags);
KernelContext.CriticalSection.Leave();
}
public void AddMutexWaiter(KThread requester)
{
AddToMutexWaitersList(requester);
requester.MutexOwner = this;
UpdatePriorityInheritance();
}
public void RemoveMutexWaiter(KThread thread)
{
if (thread._mutexWaiterNode?.List != null)
{
_mutexWaiters.Remove(thread._mutexWaiterNode);
}
thread.MutexOwner = null;
UpdatePriorityInheritance();
}
public KThread RelinquishMutex(ulong mutexAddress, out int count)
{
count = 0;
if (_mutexWaiters.First == null)
{
return null;
}
KThread newMutexOwner = null;
LinkedListNode<KThread> currentNode = _mutexWaiters.First;
do
{
// Skip all threads that are not waiting for this mutex.
while (currentNode != null && currentNode.Value.MutexAddress != mutexAddress)
{
currentNode = currentNode.Next;
}
if (currentNode == null)
{
break;
}
LinkedListNode<KThread> nextNode = currentNode.Next;
_mutexWaiters.Remove(currentNode);
currentNode.Value.MutexOwner = newMutexOwner;
if (newMutexOwner != null)
{
// New owner was already selected, re-insert on new owner list.
newMutexOwner.AddToMutexWaitersList(currentNode.Value);
}
else
{
// New owner not selected yet, use current thread.
newMutexOwner = currentNode.Value;
}
count++;
currentNode = nextNode;
}
while (currentNode != null);
if (newMutexOwner != null)
{
UpdatePriorityInheritance();
newMutexOwner.UpdatePriorityInheritance();
}
return newMutexOwner;
}
private void UpdatePriorityInheritance()
{
// If any of the threads waiting for the mutex has
// higher priority than the current thread, then
// the current thread inherits that priority.
int highestPriority = BasePriority;
if (_mutexWaiters.First != null)
{
int waitingDynamicPriority = _mutexWaiters.First.Value.DynamicPriority;
if (waitingDynamicPriority < highestPriority)
{
highestPriority = waitingDynamicPriority;
}
}
if (highestPriority != DynamicPriority)
{
int oldPriority = DynamicPriority;
DynamicPriority = highestPriority;
AdjustSchedulingForNewPriority(oldPriority);
if (MutexOwner != null)
{
// Remove and re-insert to ensure proper sorting based on new priority.
MutexOwner._mutexWaiters.Remove(_mutexWaiterNode);
MutexOwner.AddToMutexWaitersList(this);
MutexOwner.UpdatePriorityInheritance();
}
}
}
private void AddToMutexWaitersList(KThread thread)
{
LinkedListNode<KThread> nextPrio = _mutexWaiters.First;
int currentPriority = thread.DynamicPriority;
while (nextPrio != null && nextPrio.Value.DynamicPriority <= currentPriority)
{
nextPrio = nextPrio.Next;
}
if (nextPrio != null)
{
thread._mutexWaiterNode = _mutexWaiters.AddBefore(nextPrio, thread);
}
else
{
thread._mutexWaiterNode = _mutexWaiters.AddLast(thread);
}
}
private void AdjustScheduling(ThreadSchedState oldFlags)
{
if (oldFlags == SchedFlags)
{
return;
}
if (oldFlags == ThreadSchedState.Running)
{
// Was running, now it's stopped.
if (CurrentCore >= 0)
{
_schedulingData.Unschedule(DynamicPriority, CurrentCore, this);
}
for (int core = 0; core < KScheduler.CpuCoresCount; core++)
{
if (core != CurrentCore && ((AffinityMask >> core) & 1) != 0)
{
_schedulingData.Unsuggest(DynamicPriority, core, this);
}
}
}
else if (SchedFlags == ThreadSchedState.Running)
{
// Was stopped, now it's running.
if (CurrentCore >= 0)
{
_schedulingData.Schedule(DynamicPriority, CurrentCore, this);
}
for (int core = 0; core < KScheduler.CpuCoresCount; core++)
{
if (core != CurrentCore && ((AffinityMask >> core) & 1) != 0)
{
_schedulingData.Suggest(DynamicPriority, core, this);
}
}
}
_scheduler.ThreadReselectionRequested = true;
}
private void AdjustSchedulingForNewPriority(int oldPriority)
{
if (SchedFlags != ThreadSchedState.Running)
{
return;
}
// Remove thread from the old priority queues.
if (CurrentCore >= 0)
{
_schedulingData.Unschedule(oldPriority, CurrentCore, this);
}
for (int core = 0; core < KScheduler.CpuCoresCount; core++)
{
if (core != CurrentCore && ((AffinityMask >> core) & 1) != 0)
{
_schedulingData.Unsuggest(oldPriority, core, this);
}
}
// Add thread to the new priority queues.
KThread currentThread = _scheduler.GetCurrentThread();
if (CurrentCore >= 0)
{
if (currentThread == this)
{
_schedulingData.SchedulePrepend(DynamicPriority, CurrentCore, this);
}
else
{
_schedulingData.Schedule(DynamicPriority, CurrentCore, this);
}
}
for (int core = 0; core < KScheduler.CpuCoresCount; core++)
{
if (core != CurrentCore && ((AffinityMask >> core) & 1) != 0)
{
_schedulingData.Suggest(DynamicPriority, core, this);
}
}
_scheduler.ThreadReselectionRequested = true;
}
private void AdjustSchedulingForNewAffinity(long oldAffinityMask, int oldCore)
{
if (SchedFlags != ThreadSchedState.Running || DynamicPriority >= KScheduler.PrioritiesCount)
{
return;
}
// Remove thread from the old priority queues.
for (int core = 0; core < KScheduler.CpuCoresCount; core++)
{
if (((oldAffinityMask >> core) & 1) != 0)
{
if (core == oldCore)
{
_schedulingData.Unschedule(DynamicPriority, core, this);
}
else
{
_schedulingData.Unsuggest(DynamicPriority, core, this);
}
}
}
// Add thread to the new priority queues.
for (int core = 0; core < KScheduler.CpuCoresCount; core++)
{
if (((AffinityMask >> core) & 1) != 0)
{
if (core == CurrentCore)
{
_schedulingData.Schedule(DynamicPriority, core, this);
}
else
{
_schedulingData.Suggest(DynamicPriority, core, this);
}
}
}
_scheduler.ThreadReselectionRequested = true;
}
public void SetEntryArguments(long argsPtr, int threadHandle)
{
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Context.SetX(0, (ulong)argsPtr);
Context.SetX(1, (ulong)threadHandle);
}
public void TimeUp()
{
ReleaseAndResume();
}
public string GetGuestStackTrace()
{
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
return Owner.Debugger.GetGuestStackTrace(Context);
}
public void PrintGuestStackTrace()
{
Logger.Info?.Print(LogClass.Cpu, $"Guest stack trace:\n{GetGuestStackTrace()}\n");
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
public void Execute()
{
if (Interlocked.CompareExchange(ref _hostThreadRunning, 1, 0) == 0)
{
HostThread.Start();
}
}
private void ThreadStart(ulong entrypoint)
{
Owner.CpuContext.Execute(Context, entrypoint);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
ThreadExit();
Use a Jump Table for direct and indirect calls/jumps, removing transitions to managed (#975) * Implement Jump Table for Native Calls NOTE: this slows down rejit considerably! Not recommended to be used without codegen optimisation or AOT. - Does not work on Linux - A32 needs an additional commit. * A32 Support (WIP) * Actually write Direct Call pointers to the table That would help. * Direct Calls: Rather than returning to the translator, attempt to keep within the native stack frame. A return to the translator can still happen, but only by exceptionally bubbling up to it. Also: - Always translate lowCq as a function. Faster interop with the direct jumps, and this will be useful in future if we want to do speculative translation. - Tail Call Detection: after the decoding stage, detect if we do a tail call, and avoid translating into it. Detected if a jump is made to an address outwith the contiguous sequence of blocks surrounding the entry point. The goal is to reduce code touched by jit and rejit. * A32 Support * Use smaller max function size for lowCq, fix exceptional returns When a return has an unexpected value and there is no code block following this one, we now return the value rather than continuing. * CompareAndSwap (buggy) * Ensure CompareAndSwap does not get optimized away. * Use CompareAndSwap to make the dynamic table thread safe. * Tail call for linux, throw on too many arguments. * Combine CompareAndSwap 128 and 32/64. They emit different IR instructions since their PreAllocator behaviour is different, but now they just have one function on EmitterContext. * Fix issues separating from optimisations. * Use a stub to find and execute missing functions. This allows us to skip doing many runtime comparisons and branches, and reduces the amount of code we need to emit significantly. For the indirect call table, this stub also does the work of moving in the highCq address to the table when one is found. * Make Jump Tables and Jit Cache dynmically resize Reserve virtual memory, commit as needed. * Move TailCallRemover to its own class. * Multithreaded Translation (based on heuristic) A poor one, at that. Need to get core count for a better one, which means a lot of OS specific garbage. * Better priority management for background threads. * Bound core limit a bit more Past a certain point the load is not paralellizable and starts stealing from the main thread. Likely due to GC, memory, heap allocation thread contention. Reduce by one core til optimisations come to improve the situation. * Fix memory management on linux. * Temporary solution to some sync problems. This will make sure threads exit correctly, most of the time. There is a potential race where setting the sync counter to 0 does nothing (counter stays at what it was before, thread could take too long to exit), but we need to find a better way to do this anyways. Synchronization frequency has been tightened as we never enter blockwise segments of code. Essentially this means, check every x functions or loop iterations, before lowcq blocks existed and were worth just as much. Ideally it should be done in a better way, since functions can be anywhere from 1 to 5000 instructions. (maybe based on host timer, or an interrupt flag from a scheduler thread) * Address feedback minus CompareAndSwap change. * Use default ReservedRegion granularity. * Merge CompareAndSwap with its V128 variant. * We already got the source, no need to do it again. * Make sure all background translation threads exit. * Fix CompareAndSwap128 Detection criteria was a bit scuffed. * Address Comments.
2020-03-12 03:20:55 +00:00
Context.Dispose();
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
}
private void ThreadExit()
{
KernelContext.Scheduler.ExitThread(this);
KernelContext.Scheduler.RemoveThread(this);
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
public bool IsCurrentHostThread()
{
return Thread.CurrentThread == HostThread;
}
public override bool IsSignaled()
{
return _hasExited;
}
protected override void Destroy()
{
if (_hasBeenInitialized)
{
FreeResources();
bool released = Owner != null || _hasBeenReleased;
if (Owner != null)
{
Owner.ResourceLimit?.Release(LimitableResource.Thread, 1, released ? 0 : 1);
Owner.DecrementReferenceCount();
}
else
{
KernelContext.ResourceLimit.Release(LimitableResource.Thread, 1, released ? 0 : 1);
}
}
}
private void FreeResources()
{
Owner?.RemoveThread(this);
if (_tlsAddress != 0 && Owner.FreeThreadLocalStorage(_tlsAddress) != KernelResult.Success)
{
throw new InvalidOperationException("Unexpected failure freeing thread local storage.");
}
KernelContext.CriticalSection.Enter();
// Wake up all threads that may be waiting for a mutex being held by this thread.
foreach (KThread thread in _mutexWaiters)
{
thread.MutexOwner = null;
thread._preferredCoreOverride = 0;
thread.ObjSyncResult = KernelResult.InvalidState;
thread.ReleaseAndResume();
}
KernelContext.CriticalSection.Leave();
Owner?.DecrementThreadCountAndTerminateIfZero();
2018-02-04 23:08:20 +00:00
}
}
}