Ryujinx/ARMeilleure/Translation/Translator.cs

577 lines
19 KiB
C#
Raw Normal View History

using ARMeilleure.CodeGen;
using ARMeilleure.Common;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
using ARMeilleure.Decoders;
using ARMeilleure.Diagnostics;
using ARMeilleure.Instructions;
using ARMeilleure.IntermediateRepresentation;
using ARMeilleure.Memory;
POWER - Performance Optimizations With Extensive Ramifications (#2286) * Refactoring of KMemoryManager class * Replace some trivial uses of DRAM address with VA * Get rid of GetDramAddressFromVa * Abstracting more operations on derived page table class * Run auto-format on KPageTableBase * Managed to make TryConvertVaToPa private, few uses remains now * Implement guest physical pages ref counting, remove manual freeing * Make DoMmuOperation private and call new abstract methods only from the base class * Pass pages count rather than size on Map/UnmapMemory * Change memory managers to take host pointers * Fix a guest memory leak and simplify KPageTable * Expose new methods for host range query and mapping * Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists * Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking) * Add a SharedMemoryStorage class, will be useful for host mapping * Sayonara AddVaRangeToPageList, you served us well * Start to implement host memory mapping (WIP) * Support memory tracking through host exception handling * Fix some access violations from HLE service guest memory access and CPU * Fix memory tracking * Fix mapping list bugs, including a race and a error adding mapping ranges * Simple page table for memory tracking * Simple "volatile" region handle mode * Update UBOs directly (experimental, rough) * Fix the overlap check * Only set non-modified buffers as volatile * Fix some memory tracking issues * Fix possible race in MapBufferFromClientProcess (block list updates were not locked) * Write uniform update to memory immediately, only defer the buffer set. * Fix some memory tracking issues * Pass correct pages count on shared memory unmap * Armeilleure Signal Handler v1 + Unix changes Unix currently behaves like windows, rather than remapping physical * Actually check if the host platform is unix * Fix decommit on linux. * Implement windows 10 placeholder shared memory, fix a buffer issue. * Make PTC version something that will never match with master * Remove testing variable for block count * Add reference count for memory manager, fix dispose Can still deadlock with OpenAL * Add address validation, use page table for mapped check, add docs Might clean up the page table traversing routines. * Implement batched mapping/tracking. * Move documentation, fix tests. * Cleanup uniform buffer update stuff. * Remove unnecessary assignment. * Add unsafe host mapped memory switch On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work. * Remove C# exception handlers They have issues due to current .NET limitations, so the meilleure one fully replaces them for now. * Fix MapPhysicalMemory on the software MemoryManager. * Null check for GetHostAddress, docs * Add configuration for setting memory manager mode (not in UI yet) * Add config to UI * Fix type mismatch on Unix signal handler code emit * Fix 6GB DRAM mode. The size can be greater than `uint.MaxValue` when the DRAM is >4GB. * Address some feedback. * More detailed error if backing memory cannot be mapped. * SetLastError on all OS functions for consistency * Force pages dirty with UBO update instead of setting them directly. Seems to be much faster across a few games. Need retesting. * Rebase, configuration rework, fix mem tracking regression * Fix race in FreePages * Set memory managers null after decrementing ref count * Remove readonly keyword, as this is now modified. * Use a local variable for the signal handler rather than a register. * Fix bug with buffer resize, and index/uniform buffer binding. Should fix flickering in games. * Add InvalidAccessHandler to MemoryTracking Doesn't do anything yet * Call invalid access handler on unmapped read/write. Same rules as the regular memory manager. * Make unsafe mapped memory its own MemoryManagerType * Move FlushUboDirty into UpdateState. * Buffer dirty cache, rather than ubo cache Much cleaner, may be reusable for Inline2Memory updates. * This doesn't return anything anymore. * Add sigaction remove methods, correct a few function signatures. * Return empty list of physical regions for size 0. * Also on AddressSpaceManager Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-24 20:52:44 +00:00
using ARMeilleure.Signal;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
using ARMeilleure.State;
using ARMeilleure.Translation.Cache;
using ARMeilleure.Translation.PTC;
using Ryujinx.Common;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.InteropServices;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
using System.Threading;
Reduce JIT GC allocations (#2515) * Turn `MemoryOperand` into a struct * Remove `IntrinsicOperation` * Remove `PhiNode` * Remove `Node` * Turn `Operand` into a struct * Turn `Operation` into a struct * Clean up pool management methods * Add `Arena` allocator * Move `OperationHelper` to `Operation.Factory` * Move `OperandHelper` to `Operand.Factory` * Optimize `Operation` a bit * Fix `Arena` initialization * Rename `NativeList<T>` to `ArenaList<T>` * Reduce `Operand` size from 88 to 56 bytes * Reduce `Operation` size from 56 to 40 bytes * Add optimistic interning of Register & Constant operands * Optimize `RegisterUsage` pass a bit * Optimize `RemoveUnusedNodes` pass a bit Iterating in reverse-order allows killing dependency chains in a single pass. * Fix PPTC symbols * Optimize `BasicBlock` a bit Reduce allocations from `_successor` & `DominanceFrontiers` * Fix `Operation` resize * Make `Arena` expandable Change the arena allocator to be expandable by allocating in pages, with some of them being pooled. Currently 32 pages are pooled. An LRU removal mechanism should probably be added to it. Apparently MHR can allocate bitmaps large enough to exceed the 16MB limit for the type. * Move `Arena` & `ArenaList` to `Common` * Remove `ThreadStaticPool` & co * Add `PhiOperation` * Reduce `Operand` size from 56 from 48 bytes * Add linear-probing to `Operand` intern table * Optimize `HybridAllocator` a bit * Add `Allocators` class * Tune `ArenaAllocator` sizes * Add page removal mechanism to `ArenaAllocator` Remove pages which have not been used for more than 5s after each reset. I am on fence if this would be better using a Gen2 callback object like the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right now if a large translation happens, the pages will be freed only after a reset. This reset may not happen for a while because no new translation is hit, but the arena base sizes are rather small. * Fix `OOM` when allocating larger than page size in `ArenaAllocator` Tweak resizing mechanism for Operand.Uses and Assignemnts. * Optimize `Optimizer` a bit * Optimize `Operand.Add<T>/Remove<T>` a bit * Clean up `PreAllocator` * Fix phi insertion order Reduce codegen diffs. * Fix code alignment * Use new heuristics for degree of parallelism * Suppress warnings * Address gdkchan's feedback Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that `Operand.Value` should usually not be modified directly. * Add fast path to `ArenaAllocator` * Assembly for `ArenaAllocator.Allocate(ulong)`: .L0: mov rax, [rcx+0x18] lea r8, [rax+rdx] cmp r8, [rcx+0x10] ja short .L2 .L1: mov rdx, [rcx+8] add rax, [rdx+8] mov [rcx+0x18], r8 ret .L2: jmp ArenaAllocator.AllocateSlow(UInt64) A few variable/field had to be changed to ulong so that RyuJIT avoids emitting zero-extends. * Implement a new heuristic to free pooled pages. If an arena is used often, it is more likely that its pages will be needed, so the pages are kept for longer (e.g: during PPTC rebuild or burst sof compilations). If is not used often, then it is more likely that its pages will not be needed (e.g: after PPTC rebuild or bursts of compilations). * Address riperiperi's feedback * Use `EqualityComparer<T>` in `IntrusiveList<T>` Avoids a potential GC hole in `Equals(T, T)`.
2021-08-17 18:08:34 +00:00
using static ARMeilleure.IntermediateRepresentation.Operand.Factory;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
namespace ARMeilleure.Translation
{
public class Translator
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
private static readonly AddressTable<ulong>.Level[] Levels64Bit =
new AddressTable<ulong>.Level[]
{
new(31, 17),
new(23, 8),
new(15, 8),
new( 7, 8),
new( 2, 5)
};
private static readonly AddressTable<ulong>.Level[] Levels32Bit =
new AddressTable<ulong>.Level[]
{
new(31, 17),
new(23, 8),
new(15, 8),
new( 7, 8),
new( 1, 6)
};
private readonly IJitMemoryAllocator _allocator;
private readonly ConcurrentQueue<KeyValuePair<ulong, TranslatedFunction>> _oldFuncs;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
private readonly Ptc _ptc;
internal TranslatorCache<TranslatedFunction> Functions { get; }
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
internal AddressTable<ulong> FunctionTable { get; }
internal EntryTable<uint> CountTable { get; }
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
internal TranslatorStubs Stubs { get; }
2022-09-08 23:14:08 +00:00
internal TranslatorQueue Queue { get; }
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
internal IMemoryManager Memory { get; }
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
private volatile int _threadCount;
// FIXME: Remove this once the init logic of the emulator will be redone.
public static readonly ManualResetEvent IsReadyForTranslation = new(false);
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
public Translator(IJitMemoryAllocator allocator, IMemoryManager memory, bool for64Bits)
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
_allocator = allocator;
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
Memory = memory;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
_oldFuncs = new ConcurrentQueue<KeyValuePair<ulong, TranslatedFunction>>();
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
_ptc = new Ptc();
2022-09-08 23:14:08 +00:00
Queue = new TranslatorQueue();
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
JitCache.Initialize(allocator);
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
CountTable = new EntryTable<uint>();
Functions = new TranslatorCache<TranslatedFunction>();
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
FunctionTable = new AddressTable<ulong>(for64Bits ? Levels64Bit : Levels32Bit);
Stubs = new TranslatorStubs(this);
FunctionTable.Fill = (ulong)Stubs.SlowDispatchStub;
POWER - Performance Optimizations With Extensive Ramifications (#2286) * Refactoring of KMemoryManager class * Replace some trivial uses of DRAM address with VA * Get rid of GetDramAddressFromVa * Abstracting more operations on derived page table class * Run auto-format on KPageTableBase * Managed to make TryConvertVaToPa private, few uses remains now * Implement guest physical pages ref counting, remove manual freeing * Make DoMmuOperation private and call new abstract methods only from the base class * Pass pages count rather than size on Map/UnmapMemory * Change memory managers to take host pointers * Fix a guest memory leak and simplify KPageTable * Expose new methods for host range query and mapping * Some refactoring of MapPagesFromClientProcess to allow proper page ref counting and mapping without KPageLists * Remove more uses of AddVaRangeToPageList, now only one remains (shared memory page checking) * Add a SharedMemoryStorage class, will be useful for host mapping * Sayonara AddVaRangeToPageList, you served us well * Start to implement host memory mapping (WIP) * Support memory tracking through host exception handling * Fix some access violations from HLE service guest memory access and CPU * Fix memory tracking * Fix mapping list bugs, including a race and a error adding mapping ranges * Simple page table for memory tracking * Simple "volatile" region handle mode * Update UBOs directly (experimental, rough) * Fix the overlap check * Only set non-modified buffers as volatile * Fix some memory tracking issues * Fix possible race in MapBufferFromClientProcess (block list updates were not locked) * Write uniform update to memory immediately, only defer the buffer set. * Fix some memory tracking issues * Pass correct pages count on shared memory unmap * Armeilleure Signal Handler v1 + Unix changes Unix currently behaves like windows, rather than remapping physical * Actually check if the host platform is unix * Fix decommit on linux. * Implement windows 10 placeholder shared memory, fix a buffer issue. * Make PTC version something that will never match with master * Remove testing variable for block count * Add reference count for memory manager, fix dispose Can still deadlock with OpenAL * Add address validation, use page table for mapped check, add docs Might clean up the page table traversing routines. * Implement batched mapping/tracking. * Move documentation, fix tests. * Cleanup uniform buffer update stuff. * Remove unnecessary assignment. * Add unsafe host mapped memory switch On by default. Would be good to turn this off for untrusted code (homebrew, exefs mods) and give the user the option to turn it on manually, though that requires some UI work. * Remove C# exception handlers They have issues due to current .NET limitations, so the meilleure one fully replaces them for now. * Fix MapPhysicalMemory on the software MemoryManager. * Null check for GetHostAddress, docs * Add configuration for setting memory manager mode (not in UI yet) * Add config to UI * Fix type mismatch on Unix signal handler code emit * Fix 6GB DRAM mode. The size can be greater than `uint.MaxValue` when the DRAM is >4GB. * Address some feedback. * More detailed error if backing memory cannot be mapped. * SetLastError on all OS functions for consistency * Force pages dirty with UBO update instead of setting them directly. Seems to be much faster across a few games. Need retesting. * Rebase, configuration rework, fix mem tracking regression * Fix race in FreePages * Set memory managers null after decrementing ref count * Remove readonly keyword, as this is now modified. * Use a local variable for the signal handler rather than a register. * Fix bug with buffer resize, and index/uniform buffer binding. Should fix flickering in games. * Add InvalidAccessHandler to MemoryTracking Doesn't do anything yet * Call invalid access handler on unmapped read/write. Same rules as the regular memory manager. * Make unsafe mapped memory its own MemoryManagerType * Move FlushUboDirty into UpdateState. * Buffer dirty cache, rather than ubo cache Much cleaner, may be reusable for Inline2Memory updates. * This doesn't return anything anymore. * Add sigaction remove methods, correct a few function signatures. * Return empty list of physical regions for size 0. * Also on AddressSpaceManager Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-05-24 20:52:44 +00:00
if (memory.Type.IsHostMapped())
{
NativeSignalHandler.InitializeSignalHandler();
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
}
public IPtcLoadState LoadDiskCache(string titleIdText, string displayVersion, bool enabled)
{
_ptc.Initialize(titleIdText, displayVersion, enabled, Memory.Type);
return _ptc;
}
public void PrepareCodeRange(ulong address, ulong size)
{
if (_ptc.Profiler.StaticCodeSize == 0)
{
_ptc.Profiler.StaticCodeStart = address;
_ptc.Profiler.StaticCodeSize = size;
}
}
public void Execute(State.ExecutionContext context, ulong address)
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
if (Interlocked.Increment(ref _threadCount) == 1)
{
IsReadyForTranslation.WaitOne();
if (_ptc.State == PtcState.Enabled)
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
{
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
Debug.Assert(Functions.Count == 0);
_ptc.LoadTranslations(this);
_ptc.MakeAndSaveTranslations(this);
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
}
_ptc.Profiler.Start();
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
_ptc.Disable();
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
// Simple heuristic, should be user configurable in future. (1 for 4 core/ht or less, 2 for 6 core + ht
// etc). All threads are normal priority except from the last, which just fills as much of the last core
// as the os lets it with a low priority. If we only have one rejit thread, it should be normal priority
// as highCq code is performance critical.
//
// TODO: Use physical cores rather than logical. This only really makes sense for processors with
// hyperthreading. Requires OS specific code.
Use a Jump Table for direct and indirect calls/jumps, removing transitions to managed (#975) * Implement Jump Table for Native Calls NOTE: this slows down rejit considerably! Not recommended to be used without codegen optimisation or AOT. - Does not work on Linux - A32 needs an additional commit. * A32 Support (WIP) * Actually write Direct Call pointers to the table That would help. * Direct Calls: Rather than returning to the translator, attempt to keep within the native stack frame. A return to the translator can still happen, but only by exceptionally bubbling up to it. Also: - Always translate lowCq as a function. Faster interop with the direct jumps, and this will be useful in future if we want to do speculative translation. - Tail Call Detection: after the decoding stage, detect if we do a tail call, and avoid translating into it. Detected if a jump is made to an address outwith the contiguous sequence of blocks surrounding the entry point. The goal is to reduce code touched by jit and rejit. * A32 Support * Use smaller max function size for lowCq, fix exceptional returns When a return has an unexpected value and there is no code block following this one, we now return the value rather than continuing. * CompareAndSwap (buggy) * Ensure CompareAndSwap does not get optimized away. * Use CompareAndSwap to make the dynamic table thread safe. * Tail call for linux, throw on too many arguments. * Combine CompareAndSwap 128 and 32/64. They emit different IR instructions since their PreAllocator behaviour is different, but now they just have one function on EmitterContext. * Fix issues separating from optimisations. * Use a stub to find and execute missing functions. This allows us to skip doing many runtime comparisons and branches, and reduces the amount of code we need to emit significantly. For the indirect call table, this stub also does the work of moving in the highCq address to the table when one is found. * Make Jump Tables and Jit Cache dynmically resize Reserve virtual memory, commit as needed. * Move TailCallRemover to its own class. * Multithreaded Translation (based on heuristic) A poor one, at that. Need to get core count for a better one, which means a lot of OS specific garbage. * Better priority management for background threads. * Bound core limit a bit more Past a certain point the load is not paralellizable and starts stealing from the main thread. Likely due to GC, memory, heap allocation thread contention. Reduce by one core til optimisations come to improve the situation. * Fix memory management on linux. * Temporary solution to some sync problems. This will make sure threads exit correctly, most of the time. There is a potential race where setting the sync counter to 0 does nothing (counter stays at what it was before, thread could take too long to exit), but we need to find a better way to do this anyways. Synchronization frequency has been tightened as we never enter blockwise segments of code. Essentially this means, check every x functions or loop iterations, before lowcq blocks existed and were worth just as much. Ideally it should be done in a better way, since functions can be anywhere from 1 to 5000 instructions. (maybe based on host timer, or an interrupt flag from a scheduler thread) * Address feedback minus CompareAndSwap change. * Use default ReservedRegion granularity. * Merge CompareAndSwap with its V128 variant. * We already got the source, no need to do it again. * Make sure all background translation threads exit. * Fix CompareAndSwap128 Detection criteria was a bit scuffed. * Address Comments.
2020-03-12 03:20:55 +00:00
int unboundedThreadCount = Math.Max(1, (Environment.ProcessorCount - 6) / 3);
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
int threadCount = Math.Min(4, unboundedThreadCount);
Use a Jump Table for direct and indirect calls/jumps, removing transitions to managed (#975) * Implement Jump Table for Native Calls NOTE: this slows down rejit considerably! Not recommended to be used without codegen optimisation or AOT. - Does not work on Linux - A32 needs an additional commit. * A32 Support (WIP) * Actually write Direct Call pointers to the table That would help. * Direct Calls: Rather than returning to the translator, attempt to keep within the native stack frame. A return to the translator can still happen, but only by exceptionally bubbling up to it. Also: - Always translate lowCq as a function. Faster interop with the direct jumps, and this will be useful in future if we want to do speculative translation. - Tail Call Detection: after the decoding stage, detect if we do a tail call, and avoid translating into it. Detected if a jump is made to an address outwith the contiguous sequence of blocks surrounding the entry point. The goal is to reduce code touched by jit and rejit. * A32 Support * Use smaller max function size for lowCq, fix exceptional returns When a return has an unexpected value and there is no code block following this one, we now return the value rather than continuing. * CompareAndSwap (buggy) * Ensure CompareAndSwap does not get optimized away. * Use CompareAndSwap to make the dynamic table thread safe. * Tail call for linux, throw on too many arguments. * Combine CompareAndSwap 128 and 32/64. They emit different IR instructions since their PreAllocator behaviour is different, but now they just have one function on EmitterContext. * Fix issues separating from optimisations. * Use a stub to find and execute missing functions. This allows us to skip doing many runtime comparisons and branches, and reduces the amount of code we need to emit significantly. For the indirect call table, this stub also does the work of moving in the highCq address to the table when one is found. * Make Jump Tables and Jit Cache dynmically resize Reserve virtual memory, commit as needed. * Move TailCallRemover to its own class. * Multithreaded Translation (based on heuristic) A poor one, at that. Need to get core count for a better one, which means a lot of OS specific garbage. * Better priority management for background threads. * Bound core limit a bit more Past a certain point the load is not paralellizable and starts stealing from the main thread. Likely due to GC, memory, heap allocation thread contention. Reduce by one core til optimisations come to improve the situation. * Fix memory management on linux. * Temporary solution to some sync problems. This will make sure threads exit correctly, most of the time. There is a potential race where setting the sync counter to 0 does nothing (counter stays at what it was before, thread could take too long to exit), but we need to find a better way to do this anyways. Synchronization frequency has been tightened as we never enter blockwise segments of code. Essentially this means, check every x functions or loop iterations, before lowcq blocks existed and were worth just as much. Ideally it should be done in a better way, since functions can be anywhere from 1 to 5000 instructions. (maybe based on host timer, or an interrupt flag from a scheduler thread) * Address feedback minus CompareAndSwap change. * Use default ReservedRegion granularity. * Merge CompareAndSwap with its V128 variant. * We already got the source, no need to do it again. * Make sure all background translation threads exit. * Fix CompareAndSwap128 Detection criteria was a bit scuffed. * Address Comments.
2020-03-12 03:20:55 +00:00
for (int i = 0; i < threadCount; i++)
{
Use a Jump Table for direct and indirect calls/jumps, removing transitions to managed (#975) * Implement Jump Table for Native Calls NOTE: this slows down rejit considerably! Not recommended to be used without codegen optimisation or AOT. - Does not work on Linux - A32 needs an additional commit. * A32 Support (WIP) * Actually write Direct Call pointers to the table That would help. * Direct Calls: Rather than returning to the translator, attempt to keep within the native stack frame. A return to the translator can still happen, but only by exceptionally bubbling up to it. Also: - Always translate lowCq as a function. Faster interop with the direct jumps, and this will be useful in future if we want to do speculative translation. - Tail Call Detection: after the decoding stage, detect if we do a tail call, and avoid translating into it. Detected if a jump is made to an address outwith the contiguous sequence of blocks surrounding the entry point. The goal is to reduce code touched by jit and rejit. * A32 Support * Use smaller max function size for lowCq, fix exceptional returns When a return has an unexpected value and there is no code block following this one, we now return the value rather than continuing. * CompareAndSwap (buggy) * Ensure CompareAndSwap does not get optimized away. * Use CompareAndSwap to make the dynamic table thread safe. * Tail call for linux, throw on too many arguments. * Combine CompareAndSwap 128 and 32/64. They emit different IR instructions since their PreAllocator behaviour is different, but now they just have one function on EmitterContext. * Fix issues separating from optimisations. * Use a stub to find and execute missing functions. This allows us to skip doing many runtime comparisons and branches, and reduces the amount of code we need to emit significantly. For the indirect call table, this stub also does the work of moving in the highCq address to the table when one is found. * Make Jump Tables and Jit Cache dynmically resize Reserve virtual memory, commit as needed. * Move TailCallRemover to its own class. * Multithreaded Translation (based on heuristic) A poor one, at that. Need to get core count for a better one, which means a lot of OS specific garbage. * Better priority management for background threads. * Bound core limit a bit more Past a certain point the load is not paralellizable and starts stealing from the main thread. Likely due to GC, memory, heap allocation thread contention. Reduce by one core til optimisations come to improve the situation. * Fix memory management on linux. * Temporary solution to some sync problems. This will make sure threads exit correctly, most of the time. There is a potential race where setting the sync counter to 0 does nothing (counter stays at what it was before, thread could take too long to exit), but we need to find a better way to do this anyways. Synchronization frequency has been tightened as we never enter blockwise segments of code. Essentially this means, check every x functions or loop iterations, before lowcq blocks existed and were worth just as much. Ideally it should be done in a better way, since functions can be anywhere from 1 to 5000 instructions. (maybe based on host timer, or an interrupt flag from a scheduler thread) * Address feedback minus CompareAndSwap change. * Use default ReservedRegion granularity. * Merge CompareAndSwap with its V128 variant. * We already got the source, no need to do it again. * Make sure all background translation threads exit. * Fix CompareAndSwap128 Detection criteria was a bit scuffed. * Address Comments.
2020-03-12 03:20:55 +00:00
bool last = i != 0 && i == unboundedThreadCount - 1;
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
2022-09-08 23:14:08 +00:00
Thread backgroundTranslatorThread = new Thread(BackgroundTranslate)
Use a Jump Table for direct and indirect calls/jumps, removing transitions to managed (#975) * Implement Jump Table for Native Calls NOTE: this slows down rejit considerably! Not recommended to be used without codegen optimisation or AOT. - Does not work on Linux - A32 needs an additional commit. * A32 Support (WIP) * Actually write Direct Call pointers to the table That would help. * Direct Calls: Rather than returning to the translator, attempt to keep within the native stack frame. A return to the translator can still happen, but only by exceptionally bubbling up to it. Also: - Always translate lowCq as a function. Faster interop with the direct jumps, and this will be useful in future if we want to do speculative translation. - Tail Call Detection: after the decoding stage, detect if we do a tail call, and avoid translating into it. Detected if a jump is made to an address outwith the contiguous sequence of blocks surrounding the entry point. The goal is to reduce code touched by jit and rejit. * A32 Support * Use smaller max function size for lowCq, fix exceptional returns When a return has an unexpected value and there is no code block following this one, we now return the value rather than continuing. * CompareAndSwap (buggy) * Ensure CompareAndSwap does not get optimized away. * Use CompareAndSwap to make the dynamic table thread safe. * Tail call for linux, throw on too many arguments. * Combine CompareAndSwap 128 and 32/64. They emit different IR instructions since their PreAllocator behaviour is different, but now they just have one function on EmitterContext. * Fix issues separating from optimisations. * Use a stub to find and execute missing functions. This allows us to skip doing many runtime comparisons and branches, and reduces the amount of code we need to emit significantly. For the indirect call table, this stub also does the work of moving in the highCq address to the table when one is found. * Make Jump Tables and Jit Cache dynmically resize Reserve virtual memory, commit as needed. * Move TailCallRemover to its own class. * Multithreaded Translation (based on heuristic) A poor one, at that. Need to get core count for a better one, which means a lot of OS specific garbage. * Better priority management for background threads. * Bound core limit a bit more Past a certain point the load is not paralellizable and starts stealing from the main thread. Likely due to GC, memory, heap allocation thread contention. Reduce by one core til optimisations come to improve the situation. * Fix memory management on linux. * Temporary solution to some sync problems. This will make sure threads exit correctly, most of the time. There is a potential race where setting the sync counter to 0 does nothing (counter stays at what it was before, thread could take too long to exit), but we need to find a better way to do this anyways. Synchronization frequency has been tightened as we never enter blockwise segments of code. Essentially this means, check every x functions or loop iterations, before lowcq blocks existed and were worth just as much. Ideally it should be done in a better way, since functions can be anywhere from 1 to 5000 instructions. (maybe based on host timer, or an interrupt flag from a scheduler thread) * Address feedback minus CompareAndSwap change. * Use default ReservedRegion granularity. * Merge CompareAndSwap with its V128 variant. * We already got the source, no need to do it again. * Make sure all background translation threads exit. * Fix CompareAndSwap128 Detection criteria was a bit scuffed. * Address Comments.
2020-03-12 03:20:55 +00:00
{
Name = "CPU.BackgroundTranslatorThread." + i,
Priority = last ? ThreadPriority.Lowest : ThreadPriority.Normal
};
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Use a Jump Table for direct and indirect calls/jumps, removing transitions to managed (#975) * Implement Jump Table for Native Calls NOTE: this slows down rejit considerably! Not recommended to be used without codegen optimisation or AOT. - Does not work on Linux - A32 needs an additional commit. * A32 Support (WIP) * Actually write Direct Call pointers to the table That would help. * Direct Calls: Rather than returning to the translator, attempt to keep within the native stack frame. A return to the translator can still happen, but only by exceptionally bubbling up to it. Also: - Always translate lowCq as a function. Faster interop with the direct jumps, and this will be useful in future if we want to do speculative translation. - Tail Call Detection: after the decoding stage, detect if we do a tail call, and avoid translating into it. Detected if a jump is made to an address outwith the contiguous sequence of blocks surrounding the entry point. The goal is to reduce code touched by jit and rejit. * A32 Support * Use smaller max function size for lowCq, fix exceptional returns When a return has an unexpected value and there is no code block following this one, we now return the value rather than continuing. * CompareAndSwap (buggy) * Ensure CompareAndSwap does not get optimized away. * Use CompareAndSwap to make the dynamic table thread safe. * Tail call for linux, throw on too many arguments. * Combine CompareAndSwap 128 and 32/64. They emit different IR instructions since their PreAllocator behaviour is different, but now they just have one function on EmitterContext. * Fix issues separating from optimisations. * Use a stub to find and execute missing functions. This allows us to skip doing many runtime comparisons and branches, and reduces the amount of code we need to emit significantly. For the indirect call table, this stub also does the work of moving in the highCq address to the table when one is found. * Make Jump Tables and Jit Cache dynmically resize Reserve virtual memory, commit as needed. * Move TailCallRemover to its own class. * Multithreaded Translation (based on heuristic) A poor one, at that. Need to get core count for a better one, which means a lot of OS specific garbage. * Better priority management for background threads. * Bound core limit a bit more Past a certain point the load is not paralellizable and starts stealing from the main thread. Likely due to GC, memory, heap allocation thread contention. Reduce by one core til optimisations come to improve the situation. * Fix memory management on linux. * Temporary solution to some sync problems. This will make sure threads exit correctly, most of the time. There is a potential race where setting the sync counter to 0 does nothing (counter stays at what it was before, thread could take too long to exit), but we need to find a better way to do this anyways. Synchronization frequency has been tightened as we never enter blockwise segments of code. Essentially this means, check every x functions or loop iterations, before lowcq blocks existed and were worth just as much. Ideally it should be done in a better way, since functions can be anywhere from 1 to 5000 instructions. (maybe based on host timer, or an interrupt flag from a scheduler thread) * Address feedback minus CompareAndSwap change. * Use default ReservedRegion granularity. * Merge CompareAndSwap with its V128 variant. * We already got the source, no need to do it again. * Make sure all background translation threads exit. * Fix CompareAndSwap128 Detection criteria was a bit scuffed. * Address Comments.
2020-03-12 03:20:55 +00:00
backgroundTranslatorThread.Start();
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
}
Statistics.InitializeTimer();
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
NativeInterface.RegisterThread(context, Memory, this);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
if (Optimizations.UseUnmanagedDispatchLoop)
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
Stubs.DispatchLoop(context.NativeContextPtr, address);
}
else
{
do
{
address = ExecuteSingle(context, address);
}
while (context.Running && address != 0);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
}
NativeInterface.UnregisterThread();
if (Interlocked.Decrement(ref _threadCount) == 0)
{
ClearJitCache();
2022-09-08 23:14:08 +00:00
Queue.Dispose();
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
Stubs.Dispose();
FunctionTable.Dispose();
CountTable.Dispose();
_ptc.Close();
_ptc.Profiler.Stop();
_ptc.Dispose();
_ptc.Profiler.Dispose();
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
}
}
private ulong ExecuteSingle(State.ExecutionContext context, ulong address)
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
TranslatedFunction func = GetOrTranslate(address, context.ExecutionMode);
Statistics.StartTimer();
ulong nextAddr = func.Execute(context);
Statistics.StopTimer(address);
return nextAddr;
}
public ulong Step(State.ExecutionContext context, ulong address)
{
TranslatedFunction func = Translate(address, context.ExecutionMode, highCq: false, singleStep: true);
address = func.Execute(context);
EnqueueForDeletion(address, func);
return address;
}
internal TranslatedFunction GetOrTranslate(ulong address, ExecutionMode mode)
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
if (!Functions.TryGetValue(address, out TranslatedFunction func))
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
func = Translate(address, mode, highCq: false);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
TranslatedFunction oldFunc = Functions.GetOrAdd(address, func.GuestSize, func);
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
if (oldFunc != func)
{
JitCache.Unmap(func.FuncPtr);
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
func = oldFunc;
}
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
if (_ptc.Profiler.Enabled)
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
{
_ptc.Profiler.AddEntry(address, mode, highCq: false);
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
}
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
RegisterFunction(address, func);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
}
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
return func;
}
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
internal void RegisterFunction(ulong guestAddress, TranslatedFunction func)
{
if (FunctionTable.IsValid(guestAddress) && (Optimizations.AllowLcqInFunctionTable || func.HighCq))
{
Volatile.Write(ref FunctionTable.GetValue(guestAddress), (ulong)func.FuncPtr);
}
}
internal TranslatedFunction Translate(ulong address, ExecutionMode mode, bool highCq, bool singleStep = false)
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
var context = new ArmEmitterContext(
Memory,
CountTable,
FunctionTable,
Stubs,
address,
highCq,
_ptc.State != PtcState.Disabled,
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
mode: Aarch32Mode.User);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Logger.StartPass(PassName.Decoding);
Block[] blocks = Decoder.Decode(Memory, address, mode, highCq, singleStep ? DecoderMode.SingleInstruction : DecoderMode.MultipleBlocks);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Logger.EndPass(PassName.Decoding);
Logger.StartPass(PassName.Translation);
EmitSynchronization(context);
if (blocks[0].Address != address)
{
context.Branch(context.GetLabel(address));
}
ControlFlowGraph cfg = EmitAndGetCFG(context, blocks, out Range funcRange, out Counter<uint> counter);
ulong funcSize = funcRange.End - funcRange.Start;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Reduce JIT GC allocations (#2515) * Turn `MemoryOperand` into a struct * Remove `IntrinsicOperation` * Remove `PhiNode` * Remove `Node` * Turn `Operand` into a struct * Turn `Operation` into a struct * Clean up pool management methods * Add `Arena` allocator * Move `OperationHelper` to `Operation.Factory` * Move `OperandHelper` to `Operand.Factory` * Optimize `Operation` a bit * Fix `Arena` initialization * Rename `NativeList<T>` to `ArenaList<T>` * Reduce `Operand` size from 88 to 56 bytes * Reduce `Operation` size from 56 to 40 bytes * Add optimistic interning of Register & Constant operands * Optimize `RegisterUsage` pass a bit * Optimize `RemoveUnusedNodes` pass a bit Iterating in reverse-order allows killing dependency chains in a single pass. * Fix PPTC symbols * Optimize `BasicBlock` a bit Reduce allocations from `_successor` & `DominanceFrontiers` * Fix `Operation` resize * Make `Arena` expandable Change the arena allocator to be expandable by allocating in pages, with some of them being pooled. Currently 32 pages are pooled. An LRU removal mechanism should probably be added to it. Apparently MHR can allocate bitmaps large enough to exceed the 16MB limit for the type. * Move `Arena` & `ArenaList` to `Common` * Remove `ThreadStaticPool` & co * Add `PhiOperation` * Reduce `Operand` size from 56 from 48 bytes * Add linear-probing to `Operand` intern table * Optimize `HybridAllocator` a bit * Add `Allocators` class * Tune `ArenaAllocator` sizes * Add page removal mechanism to `ArenaAllocator` Remove pages which have not been used for more than 5s after each reset. I am on fence if this would be better using a Gen2 callback object like the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right now if a large translation happens, the pages will be freed only after a reset. This reset may not happen for a while because no new translation is hit, but the arena base sizes are rather small. * Fix `OOM` when allocating larger than page size in `ArenaAllocator` Tweak resizing mechanism for Operand.Uses and Assignemnts. * Optimize `Optimizer` a bit * Optimize `Operand.Add<T>/Remove<T>` a bit * Clean up `PreAllocator` * Fix phi insertion order Reduce codegen diffs. * Fix code alignment * Use new heuristics for degree of parallelism * Suppress warnings * Address gdkchan's feedback Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that `Operand.Value` should usually not be modified directly. * Add fast path to `ArenaAllocator` * Assembly for `ArenaAllocator.Allocate(ulong)`: .L0: mov rax, [rcx+0x18] lea r8, [rax+rdx] cmp r8, [rcx+0x10] ja short .L2 .L1: mov rdx, [rcx+8] add rax, [rdx+8] mov [rcx+0x18], r8 ret .L2: jmp ArenaAllocator.AllocateSlow(UInt64) A few variable/field had to be changed to ulong so that RyuJIT avoids emitting zero-extends. * Implement a new heuristic to free pooled pages. If an arena is used often, it is more likely that its pages will be needed, so the pages are kept for longer (e.g: during PPTC rebuild or burst sof compilations). If is not used often, then it is more likely that its pages will not be needed (e.g: after PPTC rebuild or bursts of compilations). * Address riperiperi's feedback * Use `EqualityComparer<T>` in `IntrusiveList<T>` Avoids a potential GC hole in `Equals(T, T)`.
2021-08-17 18:08:34 +00:00
Logger.EndPass(PassName.Translation, cfg);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Logger.StartPass(PassName.RegisterUsage);
RegisterUsage.RunPass(cfg, mode);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Logger.EndPass(PassName.RegisterUsage);
var retType = OperandType.I64;
var argTypes = new OperandType[] { OperandType.I64 };
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
var options = highCq ? CompilerOptions.HighCq : CompilerOptions.None;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
if (context.HasPtc && !singleStep)
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
{
options |= CompilerOptions.Relocatable;
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
}
CompiledFunction compiledFunc = Compiler.Compile(cfg, argTypes, retType, options, RuntimeInformation.ProcessArchitecture);
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
if (context.HasPtc && !singleStep)
{
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
Hash128 hash = Ptc.ComputeHash(Memory, address, funcSize);
_ptc.WriteCompiledFunction(address, funcSize, hash, highCq, compiledFunc);
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
GuestFunction func = compiledFunc.Map<GuestFunction>();
Reduce JIT GC allocations (#2515) * Turn `MemoryOperand` into a struct * Remove `IntrinsicOperation` * Remove `PhiNode` * Remove `Node` * Turn `Operand` into a struct * Turn `Operation` into a struct * Clean up pool management methods * Add `Arena` allocator * Move `OperationHelper` to `Operation.Factory` * Move `OperandHelper` to `Operand.Factory` * Optimize `Operation` a bit * Fix `Arena` initialization * Rename `NativeList<T>` to `ArenaList<T>` * Reduce `Operand` size from 88 to 56 bytes * Reduce `Operation` size from 56 to 40 bytes * Add optimistic interning of Register & Constant operands * Optimize `RegisterUsage` pass a bit * Optimize `RemoveUnusedNodes` pass a bit Iterating in reverse-order allows killing dependency chains in a single pass. * Fix PPTC symbols * Optimize `BasicBlock` a bit Reduce allocations from `_successor` & `DominanceFrontiers` * Fix `Operation` resize * Make `Arena` expandable Change the arena allocator to be expandable by allocating in pages, with some of them being pooled. Currently 32 pages are pooled. An LRU removal mechanism should probably be added to it. Apparently MHR can allocate bitmaps large enough to exceed the 16MB limit for the type. * Move `Arena` & `ArenaList` to `Common` * Remove `ThreadStaticPool` & co * Add `PhiOperation` * Reduce `Operand` size from 56 from 48 bytes * Add linear-probing to `Operand` intern table * Optimize `HybridAllocator` a bit * Add `Allocators` class * Tune `ArenaAllocator` sizes * Add page removal mechanism to `ArenaAllocator` Remove pages which have not been used for more than 5s after each reset. I am on fence if this would be better using a Gen2 callback object like the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right now if a large translation happens, the pages will be freed only after a reset. This reset may not happen for a while because no new translation is hit, but the arena base sizes are rather small. * Fix `OOM` when allocating larger than page size in `ArenaAllocator` Tweak resizing mechanism for Operand.Uses and Assignemnts. * Optimize `Optimizer` a bit * Optimize `Operand.Add<T>/Remove<T>` a bit * Clean up `PreAllocator` * Fix phi insertion order Reduce codegen diffs. * Fix code alignment * Use new heuristics for degree of parallelism * Suppress warnings * Address gdkchan's feedback Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that `Operand.Value` should usually not be modified directly. * Add fast path to `ArenaAllocator` * Assembly for `ArenaAllocator.Allocate(ulong)`: .L0: mov rax, [rcx+0x18] lea r8, [rax+rdx] cmp r8, [rcx+0x10] ja short .L2 .L1: mov rdx, [rcx+8] add rax, [rdx+8] mov [rcx+0x18], r8 ret .L2: jmp ArenaAllocator.AllocateSlow(UInt64) A few variable/field had to be changed to ulong so that RyuJIT avoids emitting zero-extends. * Implement a new heuristic to free pooled pages. If an arena is used often, it is more likely that its pages will be needed, so the pages are kept for longer (e.g: during PPTC rebuild or burst sof compilations). If is not used often, then it is more likely that its pages will not be needed (e.g: after PPTC rebuild or bursts of compilations). * Address riperiperi's feedback * Use `EqualityComparer<T>` in `IntrusiveList<T>` Avoids a potential GC hole in `Equals(T, T)`.
2021-08-17 18:08:34 +00:00
Allocators.ResetAll();
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
return new TranslatedFunction(func, counter, funcSize, highCq);
}
2022-09-08 23:14:08 +00:00
private void BackgroundTranslate()
{
while (_threadCount != 0 && Queue.TryDequeue(out RejitRequest request))
{
TranslatedFunction func = Translate(request.Address, request.Mode, highCq: true);
Functions.AddOrUpdate(request.Address, func.GuestSize, func, (key, oldFunc) =>
{
EnqueueForDeletion(key, oldFunc);
return func;
});
if (_ptc.Profiler.Enabled)
2022-09-08 23:14:08 +00:00
{
_ptc.Profiler.UpdateEntry(request.Address, request.Mode, highCq: true);
2022-09-08 23:14:08 +00:00
}
RegisterFunction(request.Address, func);
}
}
private readonly struct Range
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
public ulong Start { get; }
public ulong End { get; }
public Range(ulong start, ulong end)
{
Start = start;
End = end;
}
}
private static ControlFlowGraph EmitAndGetCFG(
ArmEmitterContext context,
Block[] blocks,
out Range range,
out Counter<uint> counter)
{
counter = null;
ulong rangeStart = ulong.MaxValue;
ulong rangeEnd = 0;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
for (int blkIndex = 0; blkIndex < blocks.Length; blkIndex++)
{
Block block = blocks[blkIndex];
if (!block.Exit)
{
if (rangeStart > block.Address)
{
rangeStart = block.Address;
}
if (rangeEnd < block.EndAddress)
{
rangeEnd = block.EndAddress;
}
}
if (block.Address == context.EntryAddress)
{
if (!context.HighCq)
{
EmitRejitCheck(context, out counter);
}
context.ClearQcFlag();
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
context.CurrBlock = block;
context.MarkLabel(context.GetLabel(block.Address));
if (block.Exit)
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
// Left option here as it may be useful if we need to return to managed rather than tail call in
// future. (eg. for debug)
bool useReturns = false;
InstEmitFlowHelper.EmitVirtualJump(context, Const(block.Address), isReturn: useReturns);
}
else
{
for (int opcIndex = 0; opcIndex < block.OpCodes.Count; opcIndex++)
{
OpCode opCode = block.OpCodes[opcIndex];
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
context.CurrOp = opCode;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
bool isLastOp = opcIndex == block.OpCodes.Count - 1;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
if (isLastOp)
{
context.SyncQcFlag();
if (block.Branch != null && !block.Branch.Exit && block.Branch.Address <= block.Address)
{
EmitSynchronization(context);
}
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Reduce JIT GC allocations (#2515) * Turn `MemoryOperand` into a struct * Remove `IntrinsicOperation` * Remove `PhiNode` * Remove `Node` * Turn `Operand` into a struct * Turn `Operation` into a struct * Clean up pool management methods * Add `Arena` allocator * Move `OperationHelper` to `Operation.Factory` * Move `OperandHelper` to `Operand.Factory` * Optimize `Operation` a bit * Fix `Arena` initialization * Rename `NativeList<T>` to `ArenaList<T>` * Reduce `Operand` size from 88 to 56 bytes * Reduce `Operation` size from 56 to 40 bytes * Add optimistic interning of Register & Constant operands * Optimize `RegisterUsage` pass a bit * Optimize `RemoveUnusedNodes` pass a bit Iterating in reverse-order allows killing dependency chains in a single pass. * Fix PPTC symbols * Optimize `BasicBlock` a bit Reduce allocations from `_successor` & `DominanceFrontiers` * Fix `Operation` resize * Make `Arena` expandable Change the arena allocator to be expandable by allocating in pages, with some of them being pooled. Currently 32 pages are pooled. An LRU removal mechanism should probably be added to it. Apparently MHR can allocate bitmaps large enough to exceed the 16MB limit for the type. * Move `Arena` & `ArenaList` to `Common` * Remove `ThreadStaticPool` & co * Add `PhiOperation` * Reduce `Operand` size from 56 from 48 bytes * Add linear-probing to `Operand` intern table * Optimize `HybridAllocator` a bit * Add `Allocators` class * Tune `ArenaAllocator` sizes * Add page removal mechanism to `ArenaAllocator` Remove pages which have not been used for more than 5s after each reset. I am on fence if this would be better using a Gen2 callback object like the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right now if a large translation happens, the pages will be freed only after a reset. This reset may not happen for a while because no new translation is hit, but the arena base sizes are rather small. * Fix `OOM` when allocating larger than page size in `ArenaAllocator` Tweak resizing mechanism for Operand.Uses and Assignemnts. * Optimize `Optimizer` a bit * Optimize `Operand.Add<T>/Remove<T>` a bit * Clean up `PreAllocator` * Fix phi insertion order Reduce codegen diffs. * Fix code alignment * Use new heuristics for degree of parallelism * Suppress warnings * Address gdkchan's feedback Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that `Operand.Value` should usually not be modified directly. * Add fast path to `ArenaAllocator` * Assembly for `ArenaAllocator.Allocate(ulong)`: .L0: mov rax, [rcx+0x18] lea r8, [rax+rdx] cmp r8, [rcx+0x10] ja short .L2 .L1: mov rdx, [rcx+8] add rax, [rdx+8] mov [rcx+0x18], r8 ret .L2: jmp ArenaAllocator.AllocateSlow(UInt64) A few variable/field had to be changed to ulong so that RyuJIT avoids emitting zero-extends. * Implement a new heuristic to free pooled pages. If an arena is used often, it is more likely that its pages will be needed, so the pages are kept for longer (e.g: during PPTC rebuild or burst sof compilations). If is not used often, then it is more likely that its pages will not be needed (e.g: after PPTC rebuild or bursts of compilations). * Address riperiperi's feedback * Use `EqualityComparer<T>` in `IntrusiveList<T>` Avoids a potential GC hole in `Equals(T, T)`.
2021-08-17 18:08:34 +00:00
Operand lblPredicateSkip = default;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
2022-02-17 22:39:45 +00:00
if (context.IsInIfThenBlock && context.CurrentIfThenBlockCond != Condition.Al)
{
lblPredicateSkip = Label();
InstEmitFlowHelper.EmitCondBranch(context, lblPredicateSkip, context.CurrentIfThenBlockCond.Invert());
}
if (opCode is OpCode32 op && op.Cond < Condition.Al)
{
lblPredicateSkip = Label();
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
InstEmitFlowHelper.EmitCondBranch(context, lblPredicateSkip, op.Cond.Invert());
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
if (opCode.Instruction.Emitter != null)
{
opCode.Instruction.Emitter(context);
}
else
{
throw new InvalidOperationException($"Invalid instruction \"{opCode.Instruction.Name}\".");
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Reduce JIT GC allocations (#2515) * Turn `MemoryOperand` into a struct * Remove `IntrinsicOperation` * Remove `PhiNode` * Remove `Node` * Turn `Operand` into a struct * Turn `Operation` into a struct * Clean up pool management methods * Add `Arena` allocator * Move `OperationHelper` to `Operation.Factory` * Move `OperandHelper` to `Operand.Factory` * Optimize `Operation` a bit * Fix `Arena` initialization * Rename `NativeList<T>` to `ArenaList<T>` * Reduce `Operand` size from 88 to 56 bytes * Reduce `Operation` size from 56 to 40 bytes * Add optimistic interning of Register & Constant operands * Optimize `RegisterUsage` pass a bit * Optimize `RemoveUnusedNodes` pass a bit Iterating in reverse-order allows killing dependency chains in a single pass. * Fix PPTC symbols * Optimize `BasicBlock` a bit Reduce allocations from `_successor` & `DominanceFrontiers` * Fix `Operation` resize * Make `Arena` expandable Change the arena allocator to be expandable by allocating in pages, with some of them being pooled. Currently 32 pages are pooled. An LRU removal mechanism should probably be added to it. Apparently MHR can allocate bitmaps large enough to exceed the 16MB limit for the type. * Move `Arena` & `ArenaList` to `Common` * Remove `ThreadStaticPool` & co * Add `PhiOperation` * Reduce `Operand` size from 56 from 48 bytes * Add linear-probing to `Operand` intern table * Optimize `HybridAllocator` a bit * Add `Allocators` class * Tune `ArenaAllocator` sizes * Add page removal mechanism to `ArenaAllocator` Remove pages which have not been used for more than 5s after each reset. I am on fence if this would be better using a Gen2 callback object like the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right now if a large translation happens, the pages will be freed only after a reset. This reset may not happen for a while because no new translation is hit, but the arena base sizes are rather small. * Fix `OOM` when allocating larger than page size in `ArenaAllocator` Tweak resizing mechanism for Operand.Uses and Assignemnts. * Optimize `Optimizer` a bit * Optimize `Operand.Add<T>/Remove<T>` a bit * Clean up `PreAllocator` * Fix phi insertion order Reduce codegen diffs. * Fix code alignment * Use new heuristics for degree of parallelism * Suppress warnings * Address gdkchan's feedback Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that `Operand.Value` should usually not be modified directly. * Add fast path to `ArenaAllocator` * Assembly for `ArenaAllocator.Allocate(ulong)`: .L0: mov rax, [rcx+0x18] lea r8, [rax+rdx] cmp r8, [rcx+0x10] ja short .L2 .L1: mov rdx, [rcx+8] add rax, [rdx+8] mov [rcx+0x18], r8 ret .L2: jmp ArenaAllocator.AllocateSlow(UInt64) A few variable/field had to be changed to ulong so that RyuJIT avoids emitting zero-extends. * Implement a new heuristic to free pooled pages. If an arena is used often, it is more likely that its pages will be needed, so the pages are kept for longer (e.g: during PPTC rebuild or burst sof compilations). If is not used often, then it is more likely that its pages will not be needed (e.g: after PPTC rebuild or bursts of compilations). * Address riperiperi's feedback * Use `EqualityComparer<T>` in `IntrusiveList<T>` Avoids a potential GC hole in `Equals(T, T)`.
2021-08-17 18:08:34 +00:00
if (lblPredicateSkip != default)
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
context.MarkLabel(lblPredicateSkip);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
}
2022-02-17 22:39:45 +00:00
if (context.IsInIfThenBlock && opCode.Instruction.Name != InstName.It)
{
context.AdvanceIfThenBlockState();
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
}
}
}
range = new Range(rangeStart, rangeEnd);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
return context.GetControlFlowGraph();
}
internal static void EmitRejitCheck(ArmEmitterContext context, out Counter<uint> counter)
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
{
const int MinsCallForRejit = 100;
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
counter = new Counter<uint>(context.CountTable);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Operand lblEnd = Label();
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
Operand address = !context.HasPtc ?
Const(ref counter.Value) :
Const(ref counter.Value, Ptc.CountTableSymbol);
Operand curCount = context.Load(OperandType.I32, address);
Operand count = context.Add(curCount, Const(1));
context.Store(address, count);
context.BranchIf(lblEnd, curCount, Const(MinsCallForRejit), Comparison.NotEqual, BasicBlockFrequency.Cold);
context.Call(typeof(NativeInterface).GetMethod(nameof(NativeInterface.EnqueueForRejit)), Const(context.EntryAddress));
context.MarkLabel(lblEnd);
}
internal static void EmitSynchronization(EmitterContext context)
{
long countOffs = NativeContext.GetCounterOffset();
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Operand lblNonZero = Label();
Operand lblExit = Label();
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Operand countAddr = context.Add(context.LoadArgument(OperandType.I64, 0), Const(countOffs));
Operand count = context.Load(OperandType.I32, countAddr);
context.BranchIfTrue(lblNonZero, count, BasicBlockFrequency.Cold);
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
Operand running = context.Call(typeof(NativeInterface).GetMethod(nameof(NativeInterface.CheckSynchronization)));
context.BranchIfTrue(lblExit, running, BasicBlockFrequency.Cold);
Use a Jump Table for direct and indirect calls/jumps, removing transitions to managed (#975) * Implement Jump Table for Native Calls NOTE: this slows down rejit considerably! Not recommended to be used without codegen optimisation or AOT. - Does not work on Linux - A32 needs an additional commit. * A32 Support (WIP) * Actually write Direct Call pointers to the table That would help. * Direct Calls: Rather than returning to the translator, attempt to keep within the native stack frame. A return to the translator can still happen, but only by exceptionally bubbling up to it. Also: - Always translate lowCq as a function. Faster interop with the direct jumps, and this will be useful in future if we want to do speculative translation. - Tail Call Detection: after the decoding stage, detect if we do a tail call, and avoid translating into it. Detected if a jump is made to an address outwith the contiguous sequence of blocks surrounding the entry point. The goal is to reduce code touched by jit and rejit. * A32 Support * Use smaller max function size for lowCq, fix exceptional returns When a return has an unexpected value and there is no code block following this one, we now return the value rather than continuing. * CompareAndSwap (buggy) * Ensure CompareAndSwap does not get optimized away. * Use CompareAndSwap to make the dynamic table thread safe. * Tail call for linux, throw on too many arguments. * Combine CompareAndSwap 128 and 32/64. They emit different IR instructions since their PreAllocator behaviour is different, but now they just have one function on EmitterContext. * Fix issues separating from optimisations. * Use a stub to find and execute missing functions. This allows us to skip doing many runtime comparisons and branches, and reduces the amount of code we need to emit significantly. For the indirect call table, this stub also does the work of moving in the highCq address to the table when one is found. * Make Jump Tables and Jit Cache dynmically resize Reserve virtual memory, commit as needed. * Move TailCallRemover to its own class. * Multithreaded Translation (based on heuristic) A poor one, at that. Need to get core count for a better one, which means a lot of OS specific garbage. * Better priority management for background threads. * Bound core limit a bit more Past a certain point the load is not paralellizable and starts stealing from the main thread. Likely due to GC, memory, heap allocation thread contention. Reduce by one core til optimisations come to improve the situation. * Fix memory management on linux. * Temporary solution to some sync problems. This will make sure threads exit correctly, most of the time. There is a potential race where setting the sync counter to 0 does nothing (counter stays at what it was before, thread could take too long to exit), but we need to find a better way to do this anyways. Synchronization frequency has been tightened as we never enter blockwise segments of code. Essentially this means, check every x functions or loop iterations, before lowcq blocks existed and were worth just as much. Ideally it should be done in a better way, since functions can be anywhere from 1 to 5000 instructions. (maybe based on host timer, or an interrupt flag from a scheduler thread) * Address feedback minus CompareAndSwap change. * Use default ReservedRegion granularity. * Merge CompareAndSwap with its V128 variant. * We already got the source, no need to do it again. * Make sure all background translation threads exit. * Fix CompareAndSwap128 Detection criteria was a bit scuffed. * Address Comments.
2020-03-12 03:20:55 +00:00
context.Return(Const(0L));
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
context.MarkLabel(lblNonZero);
count = context.Subtract(count, Const(1));
context.Store(countAddr, count);
context.MarkLabel(lblExit);
}
public void InvalidateJitCacheRegion(ulong address, ulong size)
{
ulong[] overlapAddresses = Array.Empty<ulong>();
int overlapsCount = Functions.GetOverlaps(address, size, ref overlapAddresses);
if (overlapsCount != 0)
{
// If rejit is running, stop it as it may be trying to rejit a function on the invalidated region.
ClearRejitQueue(allowRequeue: true);
}
for (int index = 0; index < overlapsCount; index++)
{
ulong overlapAddress = overlapAddresses[index];
if (Functions.TryGetValue(overlapAddress, out TranslatedFunction overlap))
{
Functions.Remove(overlapAddress);
Volatile.Write(ref FunctionTable.GetValue(overlapAddress), FunctionTable.Fill);
EnqueueForDeletion(overlapAddress, overlap);
}
}
// TODO: Remove overlapping functions from the JitCache aswell.
// This should be done safely, with a mechanism to ensure the function is not being executed.
}
internal void EnqueueForRejit(ulong guestAddress, ExecutionMode mode)
{
2022-09-08 23:14:08 +00:00
Queue.Enqueue(guestAddress, mode);
}
private void EnqueueForDeletion(ulong guestAddress, TranslatedFunction func)
{
_oldFuncs.Enqueue(new(guestAddress, func));
}
private void ClearJitCache()
{
// Ensure no attempt will be made to compile new functions due to rejit.
ClearRejitQueue(allowRequeue: false);
List<TranslatedFunction> functions = Functions.AsList();
foreach (var func in functions)
{
JitCache.Unmap(func.FuncPtr);
func.CallCounter?.Dispose();
}
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
Functions.Clear();
while (_oldFuncs.TryDequeue(out var kv))
{
JitCache.Unmap(kv.Value.FuncPtr);
kv.Value.CallCounter?.Dispose();
}
}
private void ClearRejitQueue(bool allowRequeue)
{
2022-09-08 23:14:08 +00:00
if (!allowRequeue)
{
Queue.Clear();
return;
}
2022-09-08 23:14:08 +00:00
lock (Queue.Sync)
{
2022-09-08 23:14:08 +00:00
while (Queue.Count > 0 && Queue.TryDequeue(out RejitRequest request))
{
Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out of range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. dotnet-counters monitor -n Ryujinx --counters ARMeilleure * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before everytime a test was ran, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged for longer and skips a ConcurrentDictionary look up when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and change unit of AddressTableEventSource from KB to MB. * Use 64-bit function table Improves codegen for direct branches: mov qword [rax+0x408],0x10603560 - mov rcx,sub_10603560_OFFSET - mov ecx,[rcx] - mov ecx,ecx - mov rdx,JIT_CACHE_BASE - add rdx,rcx + mov rcx,sub_10603560 + mov rdx,[rcx] mov rcx,rax Improves codegen for dispatch stub: and rax,byte +0x1f - mov eax,[rcx+rax*4] - mov eax,eax - mov rcx,JIT_CACHE_BASE - lea rax,[rcx+rax] + mov rax,[rcx+rax*8] mov rcx,rbx * Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameter to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped
2021-05-29 21:06:28 +00:00
if (Functions.TryGetValue(request.Address, out var func) && func.CallCounter != null)
{
Volatile.Write(ref func.CallCounter.Value, 0);
}
}
}
}
Add a new JIT compiler for CPU code (#693) * Start of the ARMeilleure project * Refactoring around the old IRAdapter, now renamed to PreAllocator * Optimize the LowestBitSet method * Add CLZ support and fix CLS implementation * Add missing Equals and GetHashCode overrides on some structs, misc small tweaks * Implement the ByteSwap IR instruction, and some refactoring on the assembler * Implement the DivideUI IR instruction and fix 64-bits IDIV * Correct constant operand type on CSINC * Move division instructions implementation to InstEmitDiv * Fix destination type for the ConditionalSelect IR instruction * Implement UMULH and SMULH, with new IR instructions * Fix some issues with shift instructions * Fix constant types for BFM instructions * Fix up new tests using the new V128 struct * Update tests * Move DIV tests to a separate file * Add support for calls, and some instructions that depends on them * Start adding support for SIMD & FP types, along with some of the related ARM instructions * Fix some typos and the divide instruction with FP operands * Fix wrong method call on Clz_V * Implement ARM FP & SIMD move instructions, Saddlv_V, and misc. fixes * Implement SIMD logical instructions and more misc. fixes * Fix PSRAD x86 instruction encoding, TRN, UABD and UABDL implementations * Implement float conversion instruction, merge in LDj3SNuD fixes, and some other misc. fixes * Implement SIMD shift instruction and fix Dup_V * Add SCVTF and UCVTF (vector, fixed-point) variants to the opcode table * Fix check with tolerance on tester * Implement FP & SIMD comparison instructions, and some fixes * Update FCVT (Scalar) encoding on the table to support the Half-float variants * Support passing V128 structs, some cleanup on the register allocator, merge LDj3SNuD fixes * Use old memory access methods, made a start on SIMD memory insts support, some fixes * Fix float constant passed to functions, save and restore non-volatile XMM registers, other fixes * Fix arguments count with struct return values, other fixes * More instructions * Misc. fixes and integrate LDj3SNuD fixes * Update tests * Add a faster linear scan allocator, unwinding support on windows, and other changes * Update Ryujinx.HLE * Update Ryujinx.Graphics * Fix V128 return pointer passing, RCX is clobbered * Update Ryujinx.Tests * Update ITimeZoneService * Stop using GetFunctionPointer as that can't be called from native code, misc. fixes and tweaks * Use generic GetFunctionPointerForDelegate method and other tweaks * Some refactoring on the code generator, assert on invalid operations and use a separate enum for intrinsics * Remove some unused code on the assembler * Fix REX.W prefix regression on float conversion instructions, add some sort of profiler * Add hardware capability detection * Fix regression on Sha1h and revert Fcm** changes * Add SSE2-only paths on vector extract and insert, some refactoring on the pre-allocator * Fix silly mistake introduced on last commit on CpuId * Generate inline stack probes when the stack allocation is too large * Initial support for the System-V ABI * Support multiple destination operands * Fix SSE2 VectorInsert8 path, and other fixes * Change placement of XMM callee save and restore code to match other compilers * Rename Dest to Destination and Inst to Instruction * Fix a regression related to calls and the V128 type * Add an extra space on comments to match code style * Some refactoring * Fix vector insert FP32 SSE2 path * Port over the ARM32 instructions * Avoid memory protection races on JIT Cache * Another fix on VectorInsert FP32 (thanks to LDj3SNuD * Float operands don't need to use the same register when VEX is supported * Add a new register allocator, higher quality code for hot code (tier up), and other tweaks * Some nits, small improvements on the pre allocator * CpuThreadState is gone * Allow changing CPU emulators with a config entry * Add runtime identifiers on the ARMeilleure project * Allow switching between CPUs through a config entry (pt. 2) * Change win10-x64 to win-x64 on projects * Update the Ryujinx project to use ARMeilleure * Ensure that the selected register is valid on the hybrid allocator * Allow exiting on returns to 0 (should fix test regression) * Remove register assignments for most used variables on the hybrid allocator * Do not use fixed registers as spill temp * Add missing namespace and remove unneeded using * Address PR feedback * Fix types, etc * Enable AssumeStrictAbiCompliance by default * Ensure that Spill and Fill don't load or store any more than necessary
2019-08-08 18:56:22 +00:00
}
Add Profiled Persistent Translation Cache. (#769) * Delete DelegateTypes.cs * Delete DelegateCache.cs * Add files via upload * Update Horizon.cs * Update Program.cs * Update MainWindow.cs * Update Aot.cs * Update RelocEntry.cs * Update Translator.cs * Update MemoryManager.cs * Update InstEmitMemoryHelper.cs * Update Delegates.cs * Nit. * Nit. * Nit. * 10 fewer MSIL bytes for us * Add comment. Nits. * Update Translator.cs * Update Aot.cs * Nits. * Opt.. * Opt.. * Opt.. * Opt.. * Allow to change compression level. * Update MemoryManager.cs * Update Translator.cs * Manage corner cases during the save phase. Nits. * Update Aot.cs * Translator response tweak for Aot disabled. Nit. * Nit. * Nits. * Create DelegateHelpers.cs * Update Delegates.cs * Nit. * Nit. * Nits. * Fix due to #784. * Fixes due to #757 & #841. * Fix due to #846. * Fix due to #847. * Use MethodInfo for managed method calls. Use IR methods instead of managed methods about Max/Min (S/U). Follow-ups & Nits. * Add missing exception messages. Reintroduce slow path for Fmov_Vi. Implement slow path for Fmov_Si. * Switch to the new folder structure. Nits. * Impl. index-based relocation information. Impl. cache file version field. * Nit. * Address gdkchan comments. Mainly: - fixed cache file corruption issue on exit; - exposed a way to disable AOT on the GUI. * Address AcK77 comment. * Address Thealexbarney, jduncanator & emmauss comments. Header magic, CpuId (FI) & Aot -> Ptc. * Adaptation to the new application reloading system. Improvements to the call system of managed methods. Follow-ups. Nits. * Get the same boot times as on master when PTC is disabled. * Profiled Aot. * A32 support (#897). * #975 support (1 of 2). * #975 support (2 of 2). * Rebase fix & nits. * Some fixes and nits (still one bug left). * One fix & nits. * Tests fix (by gdk) & nits. * Support translations not only in high quality and rejit. Nits. * Added possibility to skip translations and continue execution, using `ESC` key. * Update SettingsWindow.cs * Update GLRenderer.cs * Update Ptc.cs * Disabled Profiled PTC by default as requested in the past by gdk. * Fix rejit bug. Increased number of parallel translations. Add stack unwinding stuffs support (1 of 2). Nits. * Add stack unwinding stuffs support (2 of 2). Tuned number of parallel translations. * Restored the ability to assemble jumps with 8-bit offset when Profiled PTC is disabled or during profiling. Modifications due to rebase. Nits. * Limited profiling of the functions to be translated to the addresses belonging to the range of static objects only. * Nits. * Nits. * Update Delegates.cs * Nit. * Update InstEmitSimdArithmetic.cs * Address riperiperi comments. * Fixed the issue of unjustifiably longer boot times at the second boot than at the first boot, measured at the same time or reference point and with the same number of translated functions. * Implemented a simple redundant load/save mechanism. Halved the value of Decoder.MaxInstsPerFunction more appropriate for the current performance of the Translator. Replaced by Logger.PrintError to Logger.PrintDebug in TexturePool.cs about the supposed invalid texture format to avoid the spawn of the log. Nits. * Nit. Improved Logger.PrintError in TexturePool.cs to avoid log spawn. Added missing code for FZ handling (in output) for fp max/min instructions (slow paths). * Add configuration migration for PTC Co-authored-by: Thog <me@thog.eu>
2020-06-16 18:28:02 +00:00
}