ryujinx/Ryujinx

mirror of https://github.com/Ryujinx/Ryujinx.git synced 2026-05-07 17:13:32 +00:00

Author	SHA1	Message	Date
Wunk	45ce540b9b	ARMeilleure: Add `gfni` acceleration (#3669 ) * ARMeilleure: Add `GFNI` detection This is intended for utilizing the `gf2p8affineqb` instruction * ARMeilleure: Add `gf2p8affineqb` Not using the VEX or EVEX-form of this instruction is intentional. There are `GFNI`-chips that do not support AVX(so no VEX encoding) such as Tremont(Lakefield) chips as well as Jasper Lake. `13df339fe7/GenuineIntel/GenuineIntel00806A1_Lakefield_LC_InstLatX64.txt (L1297-L1299)` `13df339fe7/GenuineIntel/GenuineIntel00906C0_JasperLake_InstLatX64.txt (L1252-L1254)` * ARMeilleure: Add `gfni` acceleration of `Rbit_V` Passes all `Rbit_V` unit tests on my `i9-11900k` ARMeilleure: Add `gfni` acceleration of `S{l,r}i_V` Also added a fast-path for when the shift amount is greater than the size of the element. * ARMeilleure: Add `gfni` acceleration of `Shl_V` and `Sshr_V` * ARMeilleure: Increment InternalVersion * ARMeilleure: Fix Intrinsic and Assembler Table alignment `gf2p8affineqb` is the longest instruction name I know of. It shouldn't get any wider than this. * ARMeilleure: Remove SSE2+SHA requirement for GFNI * ARMeilleure Add `X86GetGf2p8LogicalShiftLeft` Used to generate GF(2^8) 8x8 bit-matrices for bit-shifting for the `gf2p8affineqb` instruction. * ARMeilleure: Append `FeatureInfo7Ecx` to `FeatureInfo`	2022-10-02 11:17:19 +02:00
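Note on 45ce540b9b: the message above describes using 8x8 bit-matrices over GF(2) with `gf2p8affineqb` to reverse and shift bits. The following is a minimal Python sketch of that idea, emulating the per-byte behaviour of the instruction as documented by Intel. The helper names (`gf2p8affine_byte`, `logical_shift_left_matrix`) are hypothetical and are not taken from the Ryujinx source; the actual `X86GetGf2p8LogicalShiftLeft` implementation is not shown on this page and may differ.

```python
def gf2p8affine_byte(matrix: int, x: int, imm8: int = 0) -> int:
    """Emulate gf2p8affineqb for one byte: result bit i is the parity of
    (matrix byte [7 - i] AND x), XORed with bit i of imm8."""
    result = 0
    for i in range(8):
        row = (matrix >> (8 * (7 - i))) & 0xFF
        parity = bin(row & x).count("1") & 1
        result |= (parity ^ ((imm8 >> i) & 1)) << i
    return result


# Matrix whose row for output bit i selects input bit 7 - i (bit reversal).
BIT_REVERSE_MATRIX = 0x8040201008040201


def logical_shift_left_matrix(shift: int) -> int:
    """Hypothetical helper: build the matrix that maps x to (x << shift) & 0xFF.
    Output bit i takes input bit i - shift; lower output bits stay zero."""
    matrix = 0
    for i in range(shift, 8):
        matrix |= (1 << (i - shift)) << (8 * (7 - i))
    return matrix


reversed_one = gf2p8affine_byte(BIT_REVERSE_MATRIX, 0b00000001)
shifted = gf2p8affine_byte(logical_shift_left_matrix(3), 0b00010011)
```

A shift amount of 8 or more yields an all-zero matrix, so every byte maps to zero, which is consistent with the fast path the message mentions for shifts larger than the element size.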
mageven	96bf7f8522	Avoid allocating unmanaged string per shader (#3730 ) * Avoid reallocating same unmanaged string per shader * Address PR feedback * Rename to _disposed	2022-10-02 10:59:34 +02:00
Ac_K	33e673ceb8	fatal: Implement Service (#3573 ) * fatal: Implement Service This PR adds a basic implementation of the fatal service, which guest processes call when something goes wrong. Since all of that information is already available through debugging, it is not really useful; in any case, it avoids an unimplemented service exception. Structs/Enum are based on Atmosphère source code. After logging the error report, I call SvcBreak. Feedback is welcome on this: some guests call it right after the fatal service, so I can remove it if needed. * Addresses gdkchan feedback	2022-10-02 10:30:46 +02:00
gdkchan	9c2500de5f	Fix incorrect tessellation inputs/outputs (#3728 ) * Fix incorrect tessellation inputs/outputs * Shader cache version bump	2022-10-01 02:35:52 -03:00
gdkchan	dbe43c1719	Fix SSL GetCertificates with certificate ID set to All (#3727 ) * Fix SSL GetCertificates with certificate ID set to All * Fix last entry status value	2022-09-29 12:45:25 -03:00
riperiperi	f502cfaf62	Vulkan: Zero blend state when disabled or write mask is 0 (#3719 ) * Zero blend state when disabled or write mask is 0 Any difference in the blend state when blend is disabled is meaningless, but Ryujinx would compare different disabled blends and compile them as separate pipelines. This change ensures that all pipelines where blend state is meaningless record it as such, which avoids compiling a bunch of pipelines that are essentially identical. The NVIDIA driver is pretty forgiving when it comes to silly pipeline misses like this, but other drivers don't offer the same level of kindness. This should reduce stuttering on those drivers, and might improve overall performance very slightly due to less pipeline variants being in the hash table. * Fix blend possibly being wrong when an attachment is unmasked	2022-09-29 12:32:49 -03:00
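Note on f502cfaf62: the message above explains that blend state is meaningless when blending is disabled or the write mask is 0, so such states should compare equal when used as a pipeline cache key. A minimal Python sketch of that normalization follows. The `BlendState` fields and the `normalize_blend` helper are hypothetical illustrations, not the Ryujinx types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlendState:
    # Hypothetical, simplified blend description used as part of a pipeline key.
    enabled: bool
    write_mask: int
    src_factor: int = 0
    dst_factor: int = 0
    op: int = 0


def normalize_blend(state: BlendState) -> BlendState:
    """Zero every field that cannot affect the output, so equivalent
    states hash to the same pipeline key."""
    if not state.enabled or state.write_mask == 0:
        return BlendState(enabled=False, write_mask=state.write_mask)
    return state


# Two disabled states that differ only in leftover factor values.
a = BlendState(enabled=False, write_mask=0xF, src_factor=1, dst_factor=5)
b = BlendState(enabled=False, write_mask=0xF, src_factor=3, dst_factor=2)

pipeline_cache = {normalize_blend(a): "pipeline-0"}
cache_hit = normalize_blend(b) in pipeline_cache
```

Without the normalization step, `a` and `b` would be distinct keys and would each trigger a separate pipeline compile.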
gdkchan	1fd5cf2b4a	Fix ListOpenContextStoredUsers and stub LoadOpenContext (#3718 ) * Fix ListOpenContextStoredUsers and stub LoadOpenContext * Remove nonsensical comment	2022-09-27 21:24:52 -03:00
LDj3SNuD	814f75142e	Fpsr and Fpcr freed. (#3701 ) * Implemented in IR the managed methods of the Saturating region ... ... of the SoftFallback class (the SatQ ones). The need to natively manage the Fpcr and Fpsr system registers is still a fact. Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Ptc.InternalVersion = 3665 * Addressed PR feedback. * Implemented in IR the managed methods of the ShlReg region of the SoftFallback class. It also includes the last two SatQ ones (following up on https://github.com/Ryujinx/Ryujinx/pull/3665). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Fpsr and Fpcr freed. Handling/isolation of Fpsr and Fpcr via register for IR and via memory for Tests and Threads, with synchronization to context exchanges (explicit for SoftFloat); without having to call managed methods. Thanks to the inlining work of the previous two PRs and others in this. Tests performed locally in both release and debug modes, in both lowcq and highcq, with FastFP to true and false (explicit FP tests included). Tested with the title Tony Hawk's PS. Depends on shlreg. * Update InstEmitSimdHelper.cs * De-magic Masks. Remove the Stride and Len flags; Fpsr.NZCV are A32 only, then moved to Fpscr: this leads to emitting less IR in reference to Get/Set Fpsr/Fpcr/Fpscr methods in reference to Mrs/Msr (A64) and Vmrs/Vmsr (A32) instructions. * Addressed PR feedback.	2022-09-20 18:55:13 -03:00
riperiperi	4c0eb91d7e	Convert Quads to Triangles in Vulkan (#3715 ) * Add Index Buffer conversion for quads to Vulkan Also adds a reusable repeating pattern index buffer to use for non-indexed draws, and generalizes the conversion cache for buffers. * Fix some issues * End render pass before conversion * Resume transform feedback after we ensure we're in a pass. * Always generate UInt32 type indices for topology conversion * No it's not. * Remove unused code * Rely on TopologyRemap to convert quads to tris. * Remove double newline * Ensure render pass ends before stride or I8 conversion	2022-09-20 18:38:48 -03:00
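Note on 4c0eb91d7e: the message above mentions converting quads to triangles through an index buffer, including a repeating pattern for non-indexed draws. A minimal Python sketch of one common way to do this is shown here. The `0, 1, 2, 0, 2, 3` pattern and the function name are assumptions made for illustration; the page does not state which vertex order the commit actually uses.

```python
# Assumed per-quad pattern: split quad (v0, v1, v2, v3) into
# triangles (v0, v1, v2) and (v0, v2, v3).
QUAD_PATTERN = (0, 1, 2, 0, 2, 3)


def quads_to_triangle_indices(count, indices=None):
    """Return triangle-list indices for a quad-list draw.

    count   -- number of vertices (non-indexed) or indices (indexed)
    indices -- original index buffer, or None for a non-indexed draw
    """
    out = []
    for quad in range(count // 4):  # incomplete trailing quads are dropped
        base = quad * 4
        for offset in QUAD_PATTERN:
            i = base + offset
            out.append(indices[i] if indices is not None else i)
    return out


non_indexed = quads_to_triangle_indices(8)
indexed = quads_to_triangle_indices(4, [10, 11, 12, 13])
```

Because the non-indexed result depends only on the vertex count, a single buffer of this pattern can be generated once and reused across draws, which matches the "reusable repeating pattern index buffer" described in the message.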
gdkchan	da75a9a6ea	OpenGL: Fix blit from non-multisample to multisample texture (#3596 ) * OpenGL: Fix blit from non-multisample to multisample texture * New approach for multisample copy using compute shaders	2022-09-19 16:12:56 -03:00
MutantAura	41790aa743	Avalonia - Misc changes to UX (#3643 ) * Change navbar from compact to default and force text overflow globally * Fix settings window * Fix right stick control alignment * Initialize value and add logging for SDL IDs * Fix alignment of setting text and improve borders * Clean up padding and size of buttons on controller settings * Fix right side trigger alignment and correct styling * Revert axaml alignment * Fix alignment of volume widget * Fix timezone autocompletebox dropdown height * MainWindow: Line up volume status bar item * Remove margins and add padding to volume widget * Make volume text localizable. Co-authored-by: merry <git@mary.rs>	2022-09-19 16:04:22 -03:00
gdkchan	0cb1e926b5	Allow bindless textures with handles from unbound constant buffer (#3706 )	2022-09-19 15:35:47 -03:00
Emmanuel Hansen	6f0395538b	Avalonia - Use embedded window for avalonia (#3674 ) * wip * use embedded window * fix race condition on opengl Windows * fix glx issues on prime nvidia * fix mouse support win32 * clean up * addressed review * addressed review * fix warnings * fix software keyboard dialog * Update Ryujinx.Ava/Ui/Applet/SwkbdAppletDialog.axaml.cs Co-authored-by: gdkchan <gab.dark.100@gmail.com> * remove double semi Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2022-09-19 15:05:26 -03:00
LDj3SNuD	b9f1ff3c77	Implemented in IR the managed methods of the ShlReg region of the SoftFallback class. (#3700 ) * Implemented in IR the managed methods of the Saturating region ... ... of the SoftFallback class (the SatQ ones). The need to natively manage the Fpcr and Fpsr system registers is still a fact. Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Ptc.InternalVersion = 3665 * Addressed PR feedback. * Implemented in IR the managed methods of the ShlReg region of the SoftFallback class. It also includes the last two SatQ ones (following up on https://github.com/Ryujinx/Ryujinx/pull/3665). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Update InstEmitSimdHelper.cs	2022-09-19 14:49:10 -03:00
TSRBerry	a77af4c5e9	Readme: Fix broken shell image (#3708 )	2022-09-19 14:06:00 +02:00
merry	fbcf802fbc	A32/T32/A64: Implement Hint instructions (CSDB, SEV, SEVL, WFE, WFI, YIELD) (#3694 ) * OpCodeTable: Implement Hint instructions (CSDB, SEV, SEVL, WFE, WFI, YIELD) * A64: Remove catch-all Hint instruction * T16: Handle unallocated hint instructions Some thumb tests execute these assuming that they're nops. * T32: Fill out other Hint instructions * A32: Fill out other hint instructions	2022-09-14 18:18:15 -03:00
riperiperi	c3c41fa4bb	Periodically Flush Commands for Vulkan (#3689 ) * Periodically Flush Commands for Vulkan NVIDIA's OpenGL driver has a built-in mechanism to automatically flush commands to GPU when a lot have been queued. It's also pretty inconsistent, but we'll ignore that for now. Our Vulkan implementation only submits a command buffer (flush equivalent) when it needs to. This is typically when another command buffer needs to be sequenced after it, presenting a frame, or an edge case where we flush around GPU queries to get results sooner. This difference in flush behaviour causes a notable difference between Vulkan and OpenGL when we have to wait for commands. In the worst case, we will wait for a sync point that has just been created. In Vulkan, this sync point is created by flushing the command buffer, and storing a waitable fence that signals its completion. Our command buffer contains _every command that we queued since the last submit_, which could be an entire frame's worth of draws. This has a huge effect on CPU <-> GPU latency. The more commands in a command buffer, the longer we have to wait for it to complete, which results in wasted time. Because we don't know when the guest will force us to wait, we always want the smallest possible latency. By periodically flushing, we ensure that each command buffer takes a more consistent, smaller amount of time to execute, and that the back of the GPU queue isn't as far away when we need to wait for something to happen. This also might reduce time that the GPU is left inactive while commands are being built. The main affected game is Pokemon Sword, which got significantly faster in overworld areas due to reduced waiting time when it flushes a shadow map from the main GPU thread. Another affected game is BOTW, which gets faster depending on the area. This game flushes textures/buffers from its game thread, which is the bottleneck. Flush latency and throughput may be improved on other games that are inexplicably slower than OpenGL. It's possible that certain games could have their performance _decreased_ slightly due to flushes not being free, but it is unlikely. Also, flushing to get query results sooner has been tweaked to improve the number of full draw skips that can be done. (tested in SMO) * Remove unused variable * Fix possible issue with early query flush	2022-09-14 13:48:31 -03:00
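Note on c3c41fa4bb: the message above argues that submitting smaller command buffers at regular intervals keeps the wait for any single sync point short. A minimal Python sketch of a count-based periodic flush is given here. The `CommandQueue` class and the fixed threshold are hypothetical; the page does not say which heuristic the commit uses to decide when to flush.

```python
class CommandQueue:
    """Hypothetical command recorder that submits in bounded batches."""

    def __init__(self, flush_threshold: int):
        self.flush_threshold = flush_threshold
        self.pending = []      # commands recorded since the last submit
        self.submitted = []    # one list per submitted command buffer

    def record(self, command: str) -> None:
        self.pending.append(command)
        # Periodic flush: never let a single submission grow unbounded.
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.submitted.append(list(self.pending))
            self.pending.clear()


queue = CommandQueue(flush_threshold=3)
for n in range(7):
    queue.record(f"draw{n}")

# A forced wait (for example, a guest sync point) only has to flush and
# wait on the small remainder instead of on all seven draws.
pending_before_wait = len(queue.pending)
queue.flush()
batch_sizes = [len(batch) for batch in queue.submitted]
```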
gdkchan	356e480bf5	Fix partial unmap reprotection on Windows (#3702 )	2022-09-14 17:46:37 +02:00
gdkchan	8e119a1e96	Implement PLD and SUB (imm16) on T32, plus UADD8, SADD8, USUB8 and SSUB8 on both A32 and T32 (#3693 )	2022-09-13 19:51:40 -03:00
merry	e05bf90af6	T32: Implement Asimd instructions (#3692 )	2022-09-13 18:25:37 -03:00
gdkchan	66f16f4392	Fix bindless 1D textures having a buffer type on the shader (#3697 ) * Fix bindless 1D textures having a buffer type on the shader * Shader cache version bump	2022-09-13 08:53:55 +02:00
gdkchan	729ff5337c	Fix increment on Arm32 NEON VLDn/VSTn instructions with regs > 1 (#3695 ) * Fix increment on Arm32 NEON VLDn/VSTn instructions with regs > 1 * PPTC version bump * PR feedback	2022-09-13 08:24:09 +02:00
gdkchan	2492e7e808	Fix R4G4B4A4 format on Vulkan (#3696 )	2022-09-13 07:59:38 +02:00
riperiperi	36172ab43b	Scale SamplesPassed counter by RT scale on report (#3680 ) * Scale SamplesPassed counter by RT scale on report Adds a scale factor for samples passed counter report based on the render target scale at the time. This ensures that when a game reads this counter, it appears similar to the result at 1x. This doesn't cover cases where the render target scale changes during the queried draws, though that might be better to handle along with other scope related issues in a future rework of counters. Games generally don't count for occlusion queries over render target changes anyways. Fixes an issue in the Splatoon games where the special charge would scale too quickly at high res, points at the end of the game would be broken (but still provide a correct winner), and playing at a low res would make it impossible to swim in ink. May also affect LOD scaling in The Witcher 3. * Update Ryujinx.Graphics.Gpu/Engine/Threed/SemaphoreUpdater.cs Co-authored-by: gdkchan <gab.dark.100@gmail.com> Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2022-09-11 15:58:15 +00:00
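Note on 36172ab43b: the message above says the reported SamplesPassed value is scaled so that it looks similar to the 1x result. A minimal Python sketch of such a correction follows, assuming the render target scale applies equally to both axes so the sample count grows with its square. That formula and the function name are assumptions; the page does not give the exact factor the commit applies.

```python
def scale_samples_passed(raw_samples: int, rt_scale: float) -> int:
    """Convert a samples-passed count measured at rt_scale back to an
    approximate 1x-resolution count (assumes uniform scaling on both axes)."""
    if rt_scale <= 0:
        raise ValueError("render target scale must be positive")
    return round(raw_samples / (rt_scale * rt_scale))


# At 2x resolution a draw covers roughly four times as many samples.
reported = scale_samples_passed(400, 2.0)
```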
gdkchan	4d69286a9c	Implement VRINT (vector) Arm32 NEON instructions (#3691 )	2022-09-11 15:44:27 +00:00
merry	1529e6cf0d	T32: Add Vfp instructions (#3690 )	2022-09-10 23:03:14 -03:00
gdkchan	f468db7602	Implement Thumb (32-bit) memory (ordered), multiply, extension and bitfield instructions (#3687 ) * Implement Thumb (32-bit) memory (ordered), multiply and bitfield instructions * Remove public from interface * Fix T32 BL immediate and implement signed and unsigned extend instructions	2022-09-10 22:51:00 -03:00
gdk	c5f1d1749a	Revert address space mirror changes	2022-09-10 16:23:49 +02:00
gdk	7dd69f2d0e	Allocation free tree lookup	2022-09-10 16:23:49 +02:00
gdk	c646638680	Update several methods to use GetNode directly and avoid array allocations	2022-09-10 16:23:49 +02:00
gdk	65f2a82b97	Optimize PlaceholderManager.UnreserveRange	2022-09-10 16:23:49 +02:00
gdk	93dd6d525a	Fix potential issue with partial unmap We must also do the unmap operation with the RWLock, otherwise faults on the unmapped region will cause crashes and the whole thing becomes pointless	2022-09-10 16:23:49 +02:00
gdk	96d4ad952c	Fix reprotection regression	2022-09-10 16:23:49 +02:00
gdk	6a07f80b76	Make RBTree node fields internal again Prevents someone from accidentally messing with them and leaving the tree in an invalid state	2022-09-10 16:23:49 +02:00
gdk	22214ac664	Delete unused code	2022-09-10 16:23:49 +02:00
gdk	45e520a27c	Rewrite PlaceholderManager4KB to use intrusive RBTree, and to coalesce free placeholders Also make the other placeholder manager use intrusive RBTree, allows the IntervalTree that was added just for this to be deleted	2022-09-10 16:23:49 +02:00
gdk	5b5810a46a	Defer address space mirror mapping and use it only if strictly needed	2022-09-10 16:23:49 +02:00
gdkchan	619ac86bd0	Do not output ViewportIndex on SPIR-V if GPU does not support it (#3644 ) * Do not output ViewportIndex on SPIR-V if GPU does not support it * Bump shader cache version	2022-09-10 13:20:23 +00:00
EmulationFanatic	7a1ab71c73	Update README.MD verbiage and compatibility	2022-09-10 15:07:37 +02:00
riperiperi	dc4ba3993b	Rebind textures if format changes or they're buffer textures	2022-09-10 14:12:50 +02:00
gdkchan	81f1a4dc31	Allocate work buffer for audio renderer instead of using guest supplied memory (#3276 ) * Allocate work buffer for audio renderer instead of using guest supplied memory * Typo * Use GC.AllocateArray to allocate pinned array	2022-09-10 01:16:24 +00:00
gdkchan	c64524a240	Add ADD (zx imm12), NOP, MOV (rs), LDA, TBB, TBH, MOV (zx imm16) and CLZ thumb instructions (#3683 ) * Add ADD (zx imm12), NOP, MOV (register shifted), LDA, TBB, TBH, MOV (zx imm16) and CLZ thumb instructions, fix LDRD, STRD, CBZ, CBNZ and BLX (reg) * Bump PPTC version	2022-09-09 22:09:11 -03:00
gdkchan	db45688aa8	Implement VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON instructions (#3677 ) * Implement VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON instructions * PPTC version * Fix VQADD/VQSUB * Improve MRC/MCR handling and exception messages In case data is being recompiled as code, we don't want to throw at emit stage, instead we should only throw if it actually tries to execute	2022-09-09 21:47:38 -03:00
riperiperi	c6d82209ab	Restride vertex buffer when stride causes attributes to misalign in Vulkan. (#3679 ) * Vertex Buffer Alignment part 1 * Update CacheByRange * Add Stride Change compute shader, fix storage buffers in helpers * An AMD exclusive * Reword * Change rules - stride conversion when attrs misalign * Fix stupid mistake * Fix background pipeline compile * Improve a few things. * Fix some feedback * Address Feedback (the shader binary didn't change when i changed the source to use the subgroup size) * Fix bug where rewritten buffer would be disposed instantly.	2022-09-08 20:30:19 -03:00
FICTURE7	ee1825219b	Clean up rejit queue (#2751 )	2022-09-08 20:14:08 -03:00
LDj3SNuD	7baa08dcb4	Implemented in IR the managed methods of the Saturating region ... (#3665 ) * Implemented in IR the managed methods of the Saturating region ... ... of the SoftFallback class (the SatQ ones). The need to natively manage the Fpcr and Fpsr system registers is still a fact. Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Ptc.InternalVersion = 3665 * Addressed PR feedback.	2022-09-08 19:40:41 -03:00
gdkchan	408bd63b08	Transform shader LDC into constant buffer access if offset is constant (#3672 ) * Transform shader LDC into constant buffer access if offset is constant * Shader cache version bump	2022-09-07 20:25:22 -03:00
Mary	df99257d7f	bsd: improve socket poll We should report errors even when not requested. This also ensures we only clear the bits that were requested on the output. Finally, this fixes the case where input events is 0.	2022-09-07 22:58:41 +02:00
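Note on df99257d7f: the message above states that poll errors should be reported even when the caller did not request them, which matches POSIX `poll` behaviour for `POLLERR`, `POLLHUP` and `POLLNVAL`. A minimal Python sketch of that rule is shown here. The constants use the Linux values, and `compute_revents` is a hypothetical helper that covers only the error-reporting part of the commit, not the Ryujinx bsd service code.

```python
# Linux values for the poll event bits.
POLLIN = 0x001
POLLOUT = 0x004
POLLERR = 0x008
POLLHUP = 0x010
POLLNVAL = 0x020

# Conditions that poll reports regardless of the requested events.
ALWAYS_REPORTED = POLLERR | POLLHUP | POLLNVAL


def compute_revents(requested: int, ready: int) -> int:
    """Return the output events for one descriptor: requested events that
    are ready, plus any error condition even if it was not requested."""
    return ready & (requested | ALWAYS_REPORTED)


# Error is reported even when the input events mask is 0.
error_only = compute_revents(0, POLLERR)
```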
Mary-nyan	f3835dc78b	bsd: implement SendMMsg and RecvMMsg (#3660 ) * bsd: implement sendmmsg and recvmmsg * Fix wrong increment of vlen	2022-09-07 22:37:15 +02:00
EmulationFanatic	51bb8707ef	Update bug report template (#3676 ) Adds some verbiage to indicate that game-specific issues should be posted instead on the game compatibility list, unless it is a provable regression.	2022-09-06 22:30:07 +02:00


2266 commits