Commit graph

760 commits

Author SHA1 Message Date
Nguyen Dac Nam 96a4abe12d
shader_decode: implement BREV on BFE
Implement reverse parallel follow: https://graphics.stanford.edu/~seander/bithacks.html#ReverseParallel
2020-03-13 14:13:31 +07:00
Nguyen Dac Nam 911c56ccef
node_helper: add IBitfieldExtract case 2020-03-13 12:50:32 +07:00
Nguyen Dac Nam 465ba30d08
shader_decode: Reimplement BFE instructions 2020-03-13 12:48:01 +07:00
ReinUsesLisp eb5861e0a2 engines/maxwell_3d: Add TFB registers and store them in shader registry 2020-03-09 18:40:53 -03:00
ReinUsesLisp b1acb4f73f shader/registry: Address feedback 2020-03-09 18:40:53 -03:00
ReinUsesLisp 66a8a3e887 shader/registry: Cache tessellation state 2020-03-09 18:40:07 -03:00
ReinUsesLisp 0528be5c92 shader/registry: Store graphics and compute metadata
Store information GLSL forces us to provide but it's dynamic state in
hardware (workgroup sizes, primitive topology, shared memory size).
2020-03-09 18:40:07 -03:00
ReinUsesLisp e8efd5a901 video_core: Rename "const buffer locker" to "registry" 2020-03-09 18:40:06 -03:00
ReinUsesLisp bd8b9bbcee gl_shader_cache: Rework shader cache and remove post-specializations
Instead of pre-specializing shaders and then post-specializing them,
drop the later and only "specialize" the shader while decoding it.
2020-03-09 18:40:06 -03:00
bunnei 0361aa1915
Merge pull request #3451 from ReinUsesLisp/indexed-textures
vk_shader_decompiler: Implement indexed textures
2020-03-05 11:42:46 -05:00
Nguyen Dac Nam 85a4222a8c
nit: move comment to right place. 2020-02-29 13:50:10 +07:00
Nguyen Dac Nam 6c0c2dfabc shader_decode: Fix LD, LDG when track constant buffer 2020-02-28 13:11:19 +07:00
Nguyen Dac Nam db2f547434
shader: FMUL switch to using LUT (#3441)
* shader: add FmulPostFactor LUT table

* shader: FMUL apply LUT

* Update src/video_core/engines/shader_bytecode.h

Co-Authored-By: Mat M. <mathew1800@gmail.com>

* nit: mistype

* clang-format & add missing import

* shader: remove post factor LUT.

* shader: move post factor LUT to function and fix incorrect order.

* clang-format

* shader: FMUL: add static to post factor LUT

* nit: typo

Co-authored-by: Mat M. <mathew1800@gmail.com>
2020-02-27 11:14:25 -05:00
bunnei 1f57f679a4
Merge pull request #3440 from namkazt/patch-6
shader: implement LOP3 fast replace for old function
2020-02-26 10:24:35 -05:00
ReinUsesLisp 1dda77d392 shader: Simplify indexed sampler usages 2020-02-24 01:26:07 -03:00
ReinUsesLisp 7dc488a375 shader/texture: Fix illegal 3D texture assert
Fix typo in the illegal 3D texture assert logic. We care about catching
arrayed 3D textures or 3D shadow textures, not regular 3D textures.
2020-02-21 15:57:27 -03:00
Nguyen Dac Nam 10d8afb302
nit: add const to where it need. 2020-02-21 21:16:45 +07:00
Nguyen Dac Nam 1956a34ee5
shader: implement LOP3 fast replace for old function
ref: https://devtalk.nvidia.com/default/topic/1070081/cuda-programming-and-performance/reverse-lut-for-lop3-lut/
2020-02-21 19:08:07 +07:00
bunnei bf0c929d4c
Merge pull request #3415 from ReinUsesLisp/texture-code
shader/texture: Allow 2D shadow arrays and simplify code
2020-02-19 20:06:14 -05:00
Nguyen Dac Nam 1b6308727c
shader_conversion: I2F : add Assert for case src_size is Short 2020-02-19 11:40:35 +07:00
Nguyen Dac Nam a2c2c5768f
fix warning 2020-02-19 11:10:26 +07:00
Nguyen Dac Nam a8508f2bc0
clang-format fix 2020-02-19 11:02:59 +07:00
Nguyen Dac Nam 556f3a6e9a
shader_conversion: add conversion I2F for Short 2020-02-19 10:54:37 +07:00
ReinUsesLisp 6910ade146 shader/texture: Allow 2D shadow arrays and simplify code
Shadow sampler 2D arrays are supported on OpenGL, so there's no reason
to forbid these. Enable textureLod usage on these.

Minor style changes.
2020-02-15 02:36:28 -03:00
bunnei 63a59b9935
Merge pull request #3379 from ReinUsesLisp/cbuf-offset
shader/decode: Fix constant buffer offsets
2020-02-14 13:22:53 -05:00
bunnei 90df4b8e2b
Merge pull request #3369 from ReinUsesLisp/shf
shader/shift: Implement SHF
2020-02-07 22:06:57 -05:00
ReinUsesLisp bf9a822b87 shader/decode: Fix constant buffer offsets
Some instances were using cbuf34.offset instead of cbuf34.GetOffset().
This returned the an invalid offset. Address those instances and rename
offset to "shifted_offset" to avoid future bugs.
2020-02-05 12:19:09 -03:00
bunnei 08c508b1c4
Merge pull request #3357 from ReinUsesLisp/bfi-rc
shader/bfi: Implement register-constant buffer variant
2020-02-04 15:14:13 -05:00
bunnei bf21aacc74
Merge pull request #3356 from ReinUsesLisp/fcmp
shader/arithmetic: Implement FCMP
2020-02-04 11:36:59 -05:00
bunnei c31ec00d67
Merge pull request #3337 from ReinUsesLisp/vulkan-staged
yuzu: Implement Vulkan frontend
2020-02-03 16:56:25 -05:00
ReinUsesLisp 223a89a19f shader: Remove curly braces initializers on shared pointers 2020-02-01 22:52:10 -03:00
bunnei b5bbe7e752
Merge pull request #3282 from FernandoS27/indexed-samplers
Partially implement Indexed samplers in general and specific code in GLSL
2020-02-01 20:41:40 -05:00
ReinUsesLisp 729ca120e3 shader/shift: Implement SHIFT_RIGHT_{IMM,R}
Shifts a pair of registers to the right and returns the low register.
2020-02-01 21:20:02 -03:00
ReinUsesLisp 017474c3f8 shader/shift: Implement SHF_LEFT_{IMM,R}
Shifts a pair of registers to the left and returns the high register.
2020-02-01 21:19:44 -03:00
bunnei c593e45dbd
Merge pull request #3347 from ReinUsesLisp/local-mem
shader/memory: Implement LDL.S16, LDS.S16, STL.S16 and STS.S16
2020-01-30 10:59:52 -05:00
ReinUsesLisp 9f0162e4b5 shader/other: Fix skips for SYNC and BRK 2020-01-29 17:53:11 -03:00
ReinUsesLisp 270177f38a shader/other: Stub S2R LaneId 2020-01-29 17:53:11 -03:00
ReinUsesLisp 137a8aa55c shader/bfi: Implement register-constant buffer variant
It's the same as the variant that was implemented, but it takes the
operands from another source.
2020-01-27 01:20:38 -03:00
ReinUsesLisp e3fc3459c8 shader/arithmetic: Implement FCMP
Compares the third operand with zero, then selects between the first and
second.
2020-01-27 01:15:44 -03:00
ReinUsesLisp d95d4ac843 shader/memory: Implement ATOM.ADD
ATOM operates atomically on global memory. For now only add ATOM.ADD
since that's what was found in commercial games.

This asserts for ATOM.ADD.S32 (handling the others as unimplemented),
although ATOM.ADD.U32 shouldn't be any different.

This change forces us to change the default type on SPIR-V storage
buffers from float to uint. We could also alias the buffers, but it's
simpler for now to just use uint. While we are at it, abstract the code
to avoid repetition.
2020-01-26 01:54:24 -03:00
Fernando Sahmkow bb8eb15d39 Shader_IR: Address feedback. 2020-01-25 09:04:59 -04:00
ReinUsesLisp d26e74f0a3 shader/memory: Implement STL.S16 and STS.S16 2020-01-25 03:16:10 -03:00
ReinUsesLisp 9a2cdf8520 shader/memory: Implement unaligned LDL.S16 and LDS.S16 2020-01-25 03:16:10 -03:00
ReinUsesLisp 531f25a037 shader/memory: Move unaligned load/store to functions 2020-01-25 03:16:10 -03:00
ReinUsesLisp 96638f57c9 shader/memory: Implement LDL.S16 and LDS.S16 2020-01-25 03:15:55 -03:00
Fernando Sahmkow 806f569143 Shader_IR: Change name of TrackSampler function so it does not confuse with the type. 2020-01-24 16:44:48 -04:00
Fernando Sahmkow 3919b7b8a9 Shader_IR: Corrections, styling and extras. 2020-01-24 16:44:48 -04:00
Fernando Sahmkow 7c530e0666 Shader_IR: Propagate bindless index into the GL compiler. 2020-01-24 16:44:47 -04:00
Fernando Sahmkow 3c34678627 Shader_IR: Implement Injectable Custom Variables to the IR. 2020-01-24 16:43:31 -04:00
Fernando Sahmkow 037ea431ce Shader_IR: deduce size of indexed samplers 2020-01-24 16:43:31 -04:00
Fernando Sahmkow f4603d23c5 Shader_IR: Setup Indexed Samplers on the IR 2020-01-24 16:43:30 -04:00
Fernando Sahmkow 603c861532 Shader_IR: Implement initial code for tracking indexed samplers. 2020-01-24 16:43:30 -04:00
Fernando Sahmkow 64496f2456 Shader_IR: Address Feedback 2020-01-24 16:43:30 -04:00
Fernando Sahmkow b97608ca64 Shader_IR: Allow constant access of guest driver. 2020-01-24 16:43:30 -04:00
Fernando Sahmkow dc5cfa8d28 Shader_IR: Address Feedback 2020-01-24 16:43:29 -04:00
Fernando Sahmkow 1e4b6bef6f Shader_IR: Store Bound buffer on Shader Usage 2020-01-24 16:43:29 -04:00
Fernando Sahmkow c921e496eb GPU: Implement guest driver profile and deduce texture handler sizes. 2020-01-24 16:43:29 -04:00
bunnei a104b985a8
Merge pull request #3273 from FernandoS27/txd-array
Shader_IR: Implement TXD Array.
2020-01-24 14:02:40 -05:00
ReinUsesLisp 63ba41a26d shader/memory: Implement ATOMS.ADD.U32 2020-01-16 17:30:55 -03:00
Lioncash a1eee1749e control_flow: Silence -Wreorder warning for CFGRebuildState
Organizes the initializer list in the same order that the variables
would actually be initialized in.
2020-01-14 13:28:48 -05:00
bunnei 55f95e7f26
Merge pull request #3287 from ReinUsesLisp/ldg-stg-16
shader_ir/memory: Implement u16 and u8 for STG and LDG
2020-01-14 09:57:08 -05:00
ReinUsesLisp 13021b534c shader_ir/texture: Simplify AOFFI code 2020-01-09 03:50:37 -03:00
ReinUsesLisp e2a2a556b9 shader_ir/memory: Implement u16 and u8 for STG and LDG
Using the same technique we used for u8 on LDG, implement u16.

In the case of STG, load memory and insert the value we want to set
into it with bitfieldInsert. Then set that value.
2020-01-09 02:12:29 -03:00
bunnei cd0a7dfdbc
Merge pull request #3258 from FernandoS27/shader-amend
Shader_IR: add the ability to amend code in the shader ir.
2020-01-04 14:05:17 -05:00
Fernando Sahmkow 3dd6b55851 Shader_IR: Address Feedback 2020-01-04 14:40:57 -04:00
Fernando Sahmkow a1667a7b46 Shader_IR: Implement TXD Array.
This commit extends the compilation of TXD to support array samplers on
TXD.
2020-01-04 13:28:02 -04:00
bunnei 028b2718ed
Merge pull request #3239 from ReinUsesLisp/p2r
shader/p2r: Implement P2R Pr
2019-12-31 20:37:16 -05:00
Fernando Sahmkow b3371ed09e Shader_IR: add the ability to amend code in the shader ir.
This commit introduces a mechanism by which shader IR code can be
amended and extended. This useful for track algorithms where certain
information can derived from before the track such as indexes to array
samplers.
2019-12-30 15:31:48 -04:00
bunnei 8a76f816a4
Merge pull request #3228 from ReinUsesLisp/ptp
shader/texture: Implement AOFFI and PTP for TLD4 and TLD4S
2019-12-26 21:43:44 -05:00
bunnei 16dcfacbfc
Merge pull request #3235 from ReinUsesLisp/ldg-u8
shader/memory: Implement LDG.U8 and unaligned U8 loads
2019-12-21 22:50:28 -05:00
ReinUsesLisp 38d3a48873
shader/p2r: Implement P2R Pr
P2R dumps predicate or condition codes state to a register. This is
useful for unit testing.
2019-12-20 18:02:41 -03:00
ReinUsesLisp cf27b59493
shader/r2p: Refactor P2R to support P2R 2019-12-20 17:55:42 -03:00
bunnei 7be65c6a68
Merge pull request #3234 from ReinUsesLisp/i2f-u8-selector
shader/conversion: Implement byte selector in I2F
2019-12-19 22:36:26 -05:00
ReinUsesLisp ae8d4b6c0c
shader/memory: Implement LDG.U8 and unaligned U8 loads
LDG can load single bytes instead of full integers or packs of integers.
These have the advantage of loading bytes that are not aligned to 4
bytes.

To emulate these this commit gets the byte being referenced (by doing
"address & 3" and then using that to extract the byte from the loaded
integer:

result = bitfieldExtract(loaded_integer, (address % 4) * 8, 8)
2019-12-18 01:21:46 -03:00
ReinUsesLisp a7d6bd1ef1
shader/conversion: Implement byte selector in I2F
I2F's byte selector is used to choose what bytes to convert to float.
e.g. if the input is 0xaabbccdd and the selector is ".B3" it will
convert 0xaa. The default (when it's not shown in nvdisasm) is ".B0", in
that example the default would convert 0xdd to float.
2019-12-18 00:41:22 -03:00
ReinUsesLisp 15a753b9a5
shader/texture: Properly shrink unused entries in size mismatches
When a image format mismatches we were inserting zeroes to the texture
itself. This was not handling cases were the mismatch uses less
coordinates than the guest shader code. Address that by resizing the
vector.
2019-12-17 23:38:10 -03:00
ReinUsesLisp e09c1fbc1f
shader/texture: Implement TLD4.PTP 2019-12-16 04:09:24 -03:00
ReinUsesLisp 844e4a297b
shader/texture: Enable arrayed TLD4 2019-12-16 02:37:21 -03:00
ReinUsesLisp 3d2c44848b
shader/texture: Implement AOFFI for TLD4S 2019-12-16 02:06:42 -03:00
ReinUsesLisp 3d9fff82c0
shader/texture: Remove unnecesary parenthesis 2019-12-16 01:52:33 -03:00
Fernando Sahmkow c0ee0aa1a8 Shader_IR: Correct TLD4S Depth Compare. 2019-12-11 19:53:17 -04:00
Fernando Sahmkow af89723fa3 Shader_Ir: Correct TLD4S encoding and implement f16 flag. 2019-12-11 19:53:17 -04:00
Fernando Sahmkow 271a3264f3 Shader_Ir: default failed tracks on bindless samplers to null values. 2019-12-11 19:53:16 -04:00
ReinUsesLisp 425a254fa2
shader: Implement MEMBAR.GL
Implement using memoryBarrier in GLSL and OpMemoryBarrier on SPIR-V.
2019-12-10 16:45:03 -03:00
ReinUsesLisp 0b5b93053d
shader_ir/other: Implement S2R InvocationId 2019-12-09 23:52:28 -03:00
ReinUsesLisp 9ad6327fbd
shader: Keep track of shaders using warp instructions 2019-12-09 23:40:41 -03:00
ReinUsesLisp 6233b1db08
shader_ir/memory: Implement patch stores 2019-12-09 23:25:21 -03:00
bunnei e36814d6d5
Merge pull request #3109 from FernandoS27/new-instr
Implement FLO & TXD Instructions on GPU Shaders
2019-12-06 18:18:16 -05:00
Lioncash 9403979c22 video_core/const_buffer_locker: Make use of std::tie in HasEqualKeys()
Tidies it up a little bit visually.
2019-11-27 05:53:43 -05:00
Lioncash 930e311526 video_core/const_buffer_locker: Remove unused includes 2019-11-27 05:51:13 -05:00
Lioncash 9341ca7979 video_core/const_buffer_locker: Remove #pragma once from cpp file
Silences a compiler warning.
2019-11-27 05:50:51 -05:00
ReinUsesLisp c8a48aacc0
video_core: Unify ProgramType and ShaderStage into ShaderType 2019-11-22 21:28:48 -03:00
ReinUsesLisp dc9961f341
shader/texture: Handle TLDS texture type mismatches
Some games like "Fire Emblem: Three Houses" bind 2D textures to offsets
used by instructions of 1D textures. To handle the discrepancy this
commit uses the the texture type from the binding and modifies the
emitted code IR to build a valid backend expression.

E.g.: Bound texture is 2D and instruction is 1D, the emitted IR samples
a 2D texture in the coordinate ivec2(X, 0).
2019-11-22 21:28:47 -03:00
ReinUsesLisp 32c1bc6a67
shader/texture: Deduce texture buffers from locker
Instead of specializing shaders to separate texture buffers from 1D
textures, use the locker to deduce them while they are being decoded.
2019-11-22 21:28:47 -03:00
ReinUsesLisp 24f4198cee
shader/other: Reduce DEPBAR log severity
While DEPBAR is stubbed it doesn't change anything from our end. Shading
languages handle what this instruction does implicitly. We are not
getting anything out fo this log except noise.
2019-11-19 21:26:40 -03:00
Fernando Sahmkow c8473f399e Shader_IR: Address Feedback 2019-11-18 07:34:34 -04:00
Fernando Sahmkow cd0f5dfc17 Shader_IR: Implement TXD instruction. 2019-11-14 11:15:27 -04:00
Fernando Sahmkow f3d1b370aa Shader_IR: Implement FLO instruction. 2019-11-14 11:15:27 -04:00
Fernando Sahmkow b6f6733131
Merge pull request #3081 from ReinUsesLisp/fswzadd-shuffles
shader: Implement FSWZADD and reimplement SHFL
2019-11-14 10:27:27 -04:00
Rodrigo Locatti cf770a68a5
Merge pull request #3084 from ReinUsesLisp/cast-warnings
video_core: Treat implicit conversions as errors
2019-11-13 02:16:22 -03:00
ReinUsesLisp 096f339a2a video_core: Silence implicit conversion warnings 2019-11-08 22:48:50 +00:00
ReinUsesLisp 56e237d1f9
shader_ir/warp: Implement FSWZADD 2019-11-07 20:08:41 -03:00
ReinUsesLisp 08b2b1080a
gl_shader_decompiler: Reimplement shuffles with platform agnostic intrinsics 2019-11-07 20:08:41 -03:00
bunnei b6ae48966d
Merge pull request #3032 from ReinUsesLisp/simplify-control-flow-brx
shader/control_flow: Abstract repeated code chunks in BRX tracking
2019-11-07 01:30:01 -05:00
Rodrigo Locatti ff5a0f370c
shader/control_flow: Specify constness on caller lambdas
Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>

Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>

Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>

Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>

Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>

Update src/video_core/shader/control_flow.cpp

Co-Authored-By: Mat M. <mathew1800@gmail.com>
2019-11-07 01:44:09 -03:00
ReinUsesLisp 7b069252f8
shader/control_flow: Use callable template instead of std::function 2019-11-07 01:44:08 -03:00
ReinUsesLisp 46c3047283
shader/control_flow: Abstract repeated code chunks in BRX tracking
Remove copied and pasted for cycles into a common templated function.
2019-11-07 01:44:08 -03:00
ReinUsesLisp ae7dfa93be
shader/control_flow: Silence Intellisense cast warnings 2019-11-07 01:44:08 -03:00
ReinUsesLisp deb1b54eed
shader/control_flow: Remove brace initializer in std containers
These containers have a default constructor.
2019-11-07 01:44:08 -03:00
ReinUsesLisp 39c66abd91
shader/decode: Reduce severity of arithmetic rounding warnings 2019-11-07 01:43:38 -03:00
ReinUsesLisp c4374d0d41
shader/arithmetic: Reduce RRO stub severity 2019-11-07 01:43:38 -03:00
ReinUsesLisp 35d40b74b3
shader/texture: Remove NODEP warnings
These warnings don't offer meaningful information while decoding
shaders. Remove them.
2019-11-07 01:43:38 -03:00
Rodrigo Locatti 654b77d2ec
Merge pull request #3039 from ReinUsesLisp/cleanup-samplers
shader/node: Unpack bindless texture encoding
2019-11-06 04:54:11 +00:00
Fernando Sahmkow 23cabc98db Shader_IR: Fix regression on TLD4
Originally on the last commit I thought TLD4 acted the same as TLD4S and 
didn't have a mask. It actually does have a component mask. This commit 
corrects that.
2019-10-30 21:14:57 -04:00
Fernando Sahmkow 9293c3a0f2 Shader_IR: Fix TLD4 and add Bindless Variant.
This commit fixes an issue where not all 4 results of tld4 were being
written, the color component was defaulted to red, among other things.
It also implements the bindless variant.
2019-10-30 12:02:03 -04:00
ReinUsesLisp a993df1ee2
shader/node: Unpack bindless texture encoding
Bindless textures were using u64 to pack the buffer and offset from
where they come from. Drop this in favor of separated entries in the
struct.

Remove the usage of std::set in favor of std::list (it's not std::vector
to avoid reference invalidations) for samplers and images.
2019-10-29 20:53:48 -03:00
Rodrigo Locatti 26f3e18c5c
Merge pull request #2976 from FernandoS27/cache-fast-brx-rebased
Implement Fast BRX, fix TXQ and addapt the Shader Cache for it
2019-10-26 16:56:13 -03:00
Fernando Sahmkow be856a38d6 Shader_IR: Address Feedback. 2019-10-26 15:38:30 -04:00
Rodrigo Locatti a0d79085c4
Merge pull request #3027 from lioncash/lookup
shader_ir: Use std::array with std::pair instead of std::unordered_map
2019-10-26 05:49:15 -03:00
Rodrigo Locatti d52598173d
Merge pull request #3013 from FernandoS27/tld4s-fix
Shader_Ir: Fix TLD4S from using a component mask.
2019-10-25 20:06:26 -03:00
ReinUsesLisp 78f3e8a757 gl_shader_cache: Implement locker variants invalidation 2019-10-25 09:01:32 -04:00
ReinUsesLisp ec85648af3 gl_shader_disk_cache: Store and load fast BRX 2019-10-25 09:01:31 -04:00
ReinUsesLisp fa2c297f3e const_buffer_locker: Minor style changes 2019-10-25 09:01:31 -04:00
ReinUsesLisp 7b81ba4d8a gl_shader_decompiler: Move entries to a separate function 2019-10-25 09:01:31 -04:00
Fernando Sahmkow 1244f2d368 Shader_IR: Implement Fast BRX and allow multi-branches in the CFG. 2019-10-25 09:01:31 -04:00
Fernando Sahmkow a05120ec0b Shader_IR: Correct typo in Consistent method. 2019-10-25 09:01:30 -04:00
Fernando Sahmkow 33fcec3502 Shader_IR: allow lookup of texture samplers within the shader_ir for instructions that don't provide it 2019-10-25 09:01:30 -04:00
Fernando Sahmkow 8909f52166 Shader_IR: Implement Fast BRX and allow multi-branches in the CFG. 2019-10-25 09:01:30 -04:00
Fernando Sahmkow acd6441134 Shader_Cache: setup connection of ConstBufferLocker 2019-10-25 09:01:29 -04:00
Fernando Sahmkow 1a58f45d76 VideoCore: Unify const buffer accessing along engines and provide ConstBufferLocker class to shaders. 2019-10-25 09:01:29 -04:00
Fernando Sahmkow 2ef696c85a Shader_IR: Implement BRX tracking. 2019-10-25 09:01:29 -04:00
Lioncash 382717172e shader_ir: Use std::array with pair instead of unordered_map
Given the overall size of the maps are very small, we can use arrays of
pairs here instead of always heap allocating a new map every time the
functions are called. Given the small size of the maps, the difference
in container lookups are negligible, especially given the entries are
already sorted.
2019-10-24 00:25:38 -04:00
Lioncash 1f5401c89c video_core/shader: Resolve instances of variable shadowing
Silences a few -Wshadow warnings.
2019-10-23 23:00:31 -04:00
Fernando Sahmkow 1509d2ffbd Shader_Ir: Fix TLD4S from using a component mask.
TLD4S always outputs 4 values, the previous code checked a component 
mask and omitted those values that weren't part of it. This commit 
corrects that and makes sure all 4 values are set.
2019-10-22 10:59:07 -04:00
ReinUsesLisp 1ea07954fb shader_ir/memory: Ignore global memory when tracking fails
Ignore global memory operations instead of invoking undefined behaviour
when constant buffer tracking fails and we are blasting through asserts,
ignore the operation.

In the case of LDG this means filling the destination registers with
zeroes; for STG this means ignore the instruction as a whole.

The default behaviour is still to abort execution on failure.
2019-10-22 02:49:17 -03:00
Lioncash 074b38b7a9 video_core/shader/ast: Make ShowCurrentState() and SanityCheck() const member functions
These can also trivially be made const member functions, with the
addition of a few consts.
2019-10-17 20:59:48 -04:00
Lioncash 222f4b45eb video_core/shader/ast: Make ASTManager::Print a const member function
Given all visiting functions never modify the nodes, we can trivially
make this a const member function.
2019-10-17 20:56:39 -04:00
Lioncash 7831e86c34 video_core/shader/ast: Make ExprPrinter members private
This member already has an accessor, so there's no need for it to be
public.
2019-10-17 20:39:36 -04:00
Lioncash a2eccbf075 video_core/shader/ast: Make Indent() return a string_view
The returned string is simply a substring of our constexpr tabs
string_view, so we can just use a string_view here as well, since the
original string_view is guaranteed to always exist.

Now the function is fully non-allocating.
2019-10-17 20:29:00 -04:00
Lioncash 15d177a6ac video_core/shader/ast: Make Indent() private
It's never used outside of this class, so we can narrow its scope down.
2019-10-17 20:26:13 -04:00
Lioncash 7f6a8a33d4 video_core/shader/ast: Rename Ident() to Indent()
This can be confusing, given "ident" is generally used as a shorthand
for "identifier".
2019-10-17 20:26:13 -04:00
Lioncash 081530686c video_core/shader/ast: Make use of fmt where applicable
Makes a few strings nicer to read and also eliminates a bit of string
churn with operator+.
2019-10-17 20:26:10 -04:00
bunnei 9fe8072c67
Merge pull request #2980 from lioncash/warn
maxwell_3d: Silence truncation warnings
2019-10-17 14:02:16 -04:00
Lioncash 77b4916b33 control_flow: Silence truncation warnings
This can be trivially fixed by making the input size a size_t.
CFGRebuildState's constructor parameter is already a std::size_t, so
this just makes the size type fully conform with it.
2019-10-15 19:10:28 -04:00
Lioncash 67658dd6e8 shader/node: std::move Meta instance within OperationNode constructor
Allows usages of the constructor to avoid an unnecessary copy.
2019-10-15 18:21:59 -04:00
ReinUsesLisp 3d0f357307
shader/half_set_predicate: Fix HSETP2 for constant buffers
HSETP2 when used with a constant buffer parses the second operand type
as F32. This is not configurable.
2019-10-07 14:49:47 -03:00
ReinUsesLisp 632c9e4ee3
shader/half_set_predicate: Reduce DEBUG_ASSERT to LOG_DEBUG 2019-10-07 14:48:58 -03:00
Lioncash f883cd4f0e video_core/control_flow: Eliminate variable shadowing warnings 2019-10-05 09:14:27 -04:00
Lioncash 25702b6256 video_core/control_flow: Eliminate pessimizing moves
These can inhibit the ability of a compiler to perform RVO.
2019-10-05 09:14:27 -04:00
Lioncash d82b181d44 video_core/ast: Unindent most of IsFullyDecompiled() by one level 2019-10-05 09:14:27 -04:00
Lioncash 6c41d1cd7e video_core/ast: Make ShowCurrentState() take a string_view instead of std::string
Allows the function to be non-allocating in terms of the output string.
2019-10-05 09:14:27 -04:00
Lioncash 3c54edae24 video_core/ast: Eliminate variable shadowing warnings 2019-10-05 09:14:26 -04:00
Lioncash 5a0a9c7449 video_core/ast: Replace std::string with a constexpr std::string_view
Same behavior, but without the need to heap allocate
2019-10-05 09:14:26 -04:00
Lioncash 3a20d9734f video_core/ast: Default the move constructor and assignment operator
This is behaviorally equivalent and also fixes a bug where some members
weren't being moved over.
2019-10-05 09:14:26 -04:00
Lioncash 43503a69bf video_core/{ast, expr}: Organize forward declaration
Keeps them alphabetically sorted for readability.
2019-10-05 09:14:26 -04:00
Lioncash 50ad745585 video_core/expr: Supply operator!= along with operator==
Provides logical symmetry to the interface.
2019-10-05 09:14:26 -04:00
Lioncash 8eb1398f8d video_core/{ast, expr}: Use std::move where applicable
Avoids unnecessary atomic reference count increments and decrements.
2019-10-05 09:14:23 -04:00
Lioncash 8e0c80f269 video_core/ast: Supply const accessors for data where applicable
Provides const equivalents of data accessors for use within const
contexts.
2019-10-05 08:22:03 -04:00
Fernando Sahmkow e6eae4b815 Shader_ir: Address feedback 2019-10-04 18:52:57 -04:00
Fernando Sahmkow 3c09d9abe6 Shader_Ir: Address Feedback and clang format. 2019-10-04 18:52:57 -04:00
Fernando Sahmkow 7c756baa77 Shader_IR: clean up AST handling and add documentation. 2019-10-04 18:52:55 -04:00
Fernando Sahmkow 5ea740beb5 Shader_IR: Correct OutwardMoves for Ifs 2019-10-04 18:52:54 -04:00
Fernando Sahmkow b3c46d6948 Shader_IR: corrections and clang-format 2019-10-04 18:52:53 -04:00
Fernando Sahmkow 2e9a810423 Shader_IR: allow else derivation to be optional. 2019-10-04 18:52:52 -04:00
Fernando Sahmkow ca9901867e vk_shader_compiler: Implement the decompiler in SPIR-V 2019-10-04 18:52:51 -04:00
Fernando Sahmkow 0366c18d87 Shader_IR: mark labels as unused for partial decompile. 2019-10-04 18:52:51 -04:00
Fernando Sahmkow 47e4f6a52c Shader_Ir: Refactor Decompilation process and allow multiple decompilation modes. 2019-10-04 18:52:50 -04:00
Fernando Sahmkow 38fc995f6c gl_shader_decompiler: Implement AST decompiling 2019-10-04 18:52:50 -04:00
Fernando Sahmkow 6fdd501113 shader_ir: Declare Manager and pass it to appropiate programs. 2019-10-04 18:52:49 -04:00
Fernando Sahmkow 8be6e1c522 shader_ir: Corrections to outward movements and misc stuffs 2019-10-04 18:52:48 -04:00
Fernando Sahmkow 4fde66e609 shader_ir: Add basic goto elimination 2019-10-04 18:52:48 -04:00
Fernando Sahmkow c17953978b shader_ir: Initial Decompile Setup 2019-10-04 18:52:47 -04:00
bunnei 376f1a4432
Merge pull request #2869 from ReinUsesLisp/suld
shader/image: Implement SULD and fix SUATOM
2019-09-23 21:47:03 -04:00
David 9d69206cd0
Merge pull request #2870 from FernandoS27/multi-draw
Implement a MME Draw commands Inliner and correct host instance drawing
2019-09-22 23:13:02 +10:00
Rodrigo Locatti 9286976948
Merge pull request #2878 from FernandoS27/icmp
shader_ir: Implement ICMP
2019-09-21 18:06:07 -03:00
ReinUsesLisp 44000971e2
gl_shader_decompiler: Use uint for images and fix SUATOM
In the process remove implementation of SUATOM.MIN and SUATOM.MAX as
these require a distinction between U32 and S32. These have to be
implemented with imageCompSwap loop.
2019-09-21 17:33:52 -03:00
ReinUsesLisp 675f23aedc
shader/image: Implement SULD and remove irrelevant code
* Implement SULD as float.
* Remove conditional declaration of GL_ARB_shader_viewport_layer_array.
2019-09-21 17:32:48 -03:00
Fernando Sahmkow 527b841c15 Shader_IR: ICMP corrections and fixes 2019-09-21 14:28:03 -04:00
bunnei 88d857499b
Merge pull request #2855 from ReinUsesLisp/shfl
shader_ir/warp: Implement SHFL for Nvidia devices
2019-09-20 17:10:42 -04:00
Fernando Sahmkow 4b81d19a1a Shader_IR: Implement ICMP. 2019-09-19 20:56:29 -04:00
Fernando Sahmkow 7606da5611 VideoCore: Corrections to the MME Inliner and removal of hacky instance management. 2019-09-19 11:41:29 -04:00
bunnei b31880dc5e
Merge pull request #2784 from ReinUsesLisp/smem
shader_ir: Implement shared memory
2019-09-18 16:26:05 -04:00
ReinUsesLisp 0526bf1895 shader_ir/warp: Implement SHFL 2019-09-17 17:44:07 -03:00
ReinUsesLisp 36abf67e79 shader/image: Implement SUATOM and fix SUST 2019-09-10 20:22:31 -03:00
bunnei 34b2c60f95
Merge pull request #2823 from ReinUsesLisp/shr-clamp
shader/shift: Implement SHR wrapped and clamped variants
2019-09-10 11:56:17 -04:00
ReinUsesLisp 1f43e5296f gl_shader_decompiler: Keep track of written images and mark them as modified 2019-09-05 23:26:05 -03:00
ReinUsesLisp 3a450c1395 kepler_compute: Implement texture queries 2019-09-05 20:35:51 -03:00
ReinUsesLisp 4de04eba39 shader_ir: Implement LD_S
Loads from shared memory.
2019-09-05 01:38:37 -03:00
ReinUsesLisp f17415d431 shader_ir: Implement ST_S
This instruction writes to a memory buffer shared with threads within
the same work group. It is known as "shared" memory in GLSL.
2019-09-05 01:38:37 -03:00
ReinUsesLisp 77ef4fa907 shader/shift: Implement SHR wrapped and clamped variants
Nvidia defaults to wrapped shifts, but this is undefined behaviour on
OpenGL's spec. Explicitly mask/clamp according to what the guest shader
requires.
2019-09-04 01:55:24 -03:00
ReinUsesLisp dfae2d141a half_set_predicate: Fix predicate assignments 2019-09-04 01:54:23 -03:00
bunnei 81fbc5370d
Merge pull request #2812 from ReinUsesLisp/f2i-selector
shader_ir/conversion: Implement F2I and F2F F16 selector
2019-09-03 22:35:33 -04:00
bunnei d4f33b822b
Merge pull request #2811 from ReinUsesLisp/fsetp-fix
float_set_predicate: Add missing negation bit for the second operand
2019-09-03 22:34:34 -04:00
Rodrigo Locatti 4d4f9cc104 video_core: Silent miscellaneous warnings (#2820)
* texture_cache/surface_params: Remove unused local variable

* rasterizer_interface: Add missing documentation commentary

* maxwell_dma: Remove unused rasterizer reference

* video_core/gpu: Sort member declaration order to silent -Wreorder warning

* fermi_2d: Remove unused MemoryManager reference

* video_core: Silent unused variable warnings

* buffer_cache: Silent -Wreorder warnings

* kepler_memory: Remove unused MemoryManager reference

* gl_texture_cache: Add missing override

* buffer_cache: Add missing include

* shader/decode: Remove unused variables
2019-08-30 14:08:00 -04:00
bunnei f8cc5668f8
Merge pull request #2758 from ReinUsesLisp/packed-tid
shader/decode: Implement S2R Tic
2019-08-29 12:58:43 -04:00
ReinUsesLisp e3534700d7 shader_ir/conversion: Split int and float selector and implement F2F H1 2019-08-28 16:09:33 -03:00
ReinUsesLisp b13fbc25b8 shader_ir/conversion: Implement F2I F16 Ra.H1 2019-08-27 23:40:40 -03:00
ReinUsesLisp 6207751b00 float_set_predicate: Add missing negation bit for the second operand 2019-08-27 21:57:43 -03:00
ReinUsesLisp 4e35177e23 shader_ir: Implement VOTE
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics

Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.

To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:

* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true

ballotARB, also known as "uint64_t(activeThreadsNV())", emits

VOTE.ANY Rd, PT, PT;

on nouveau's compiler. This doesn't match exactly to Nvidia's code

VOTE.ALL Rd, PT, PT;

Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
2019-08-21 14:50:38 -03:00
bunnei dfdd20142e
Merge pull request #2777 from ReinUsesLisp/hsetp2-fe3h-fix
half_set_predicate: Fix HSETP2_C constant buffer offset
2019-08-21 10:29:17 -04:00