Commit graph

33 commits

Author SHA1 Message Date
Ben Avison 8425d9d5d0 SDL_blit: use a named enum for required hardware bits in dispatch tables 2019-10-24 21:15:09 -04:00
Sam Lantinga 715e070d29 SDL_blit_N.c: Correct vec_perm() application on little-endian 64-bit PowerPC
The LE transformation for vec_perm has an implicit assumption that the
permutation is being used to reorder vector elements (in this case 4-byte
integer word elements), not to reorder bytes within those elements.  Although
this is legal behavior, it is not anticipated by the transformation performed
by the compilers.

This causes pygame-1.9.1 test failure on PPC64LE because blitted pixmaps are
corrupted there due to how SDL uses vec_perm().

From RedHat / Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1392465
Original patch was provided by: Menanteau Guy <menantea@linux.vnet.ibm.com>
2019-09-06 08:50:19 -07:00
Sylvain Becker 2fd4aee149 Un-activate some routine on mips because they are slowers (Bug 4503) 2019-02-23 09:36:56 +01:00
Sylvain Becker 47fb781b94 BlitNtoN BlitNtoNKey: remove non-aligned word read/store (bpp 3<->4) (Bug 4503)
Mips and (old) ARM doesn't allow word read/write when adress isn't 4bytes
aligned. So just remove that.
2019-02-22 09:30:45 +01:00
Sylvain Becker 90a075d75f Fix windows build 2019-02-18 22:48:14 +01:00
Sylvain Becker e9a7b6973a Fix bug 4053: Blit issues on Big Endian CPU 2019-02-18 22:06:53 +01:00
Sylvain Becker afd1b3dae4 Fix invalid memory access and optimise Blit_3or4_to_3or4__*
Fix invalid write at last pixel of the surface:
  when surface has no padding (pitch == w * bpp) and bpp is 3
  with Blit, no colorkey, and NO_ALPHA same or inverse rgb triplet

Optimise by using int32 access:

BGR24 -> ARGB8888 :  faster x1.897875   (362405 -> 190953)
RGB24 -> ABGR8888 :  faster x1.660416   (363304 -> 218803)

ABGR8888 -> RGB24 :  faster x1.686319   (334962 -> 198635)
ARGB8888 -> BGR24 :  faster x1.691868   (324524 -> 191814)
BGR24 -> RGB888 :  faster x1.678459   (326811 -> 194709)
BGR888 -> RGB24 :  faster x1.731772   (327724 -> 189242)
RGB24 -> BGR888 :  faster x1.690989   (328916 -> 194511)
RGB888 -> BGR24 :  faster x1.698333   (326175 -> 192056)
2019-02-17 16:20:23 +01:00
Sylvain Becker 1aa2ad2fe8 Better naming for the blit permutation variables 2019-02-09 17:40:32 +01:00
Sylvain Becker f6a2ae6007 Faster blit colorkey or not, applied to bpp: 3->4 and 4->3
===== BlitNtoNKey ========
ABGR8888 -> BGR24 :  faster x3   (2168709 -> 562738)
ABGR8888 -> RGB24 :  faster x3   (2165055 -> 567458)

ARGB8888 -> BGR24 :  faster x3   (2169109 -> 564338)
ARGB8888 -> RGB24 :  faster x3   (2165266 -> 567081)

BGR24 -> ABGR8888 :  faster x3   (2997675 -> 891636)
BGR24 -> ARGB8888 :  faster x3   (2985449 -> 892028)
BGR24 -> BGR888 :  faster x3   (2961611 -> 891913)
BGR24 -> BGRA8888 :  faster x3   (3116305 -> 891534)
BGR24 -> BGRX8888 :  faster x3   (3179654 -> 896978)
BGR24 -> RGB888 :  faster x3   (2968191 -> 895112)
BGR24 -> RGBA8888 :  faster x3   (2998428 -> 893147)
BGR24 -> RGBX8888 :  faster x3   (2976529 -> 914853)

BGR888 -> BGR24 :  faster x3   (2161906 -> 563921)
BGR888 -> RGB24 :  faster x3   (2168228 -> 566634)

BGRA8888 -> BGR24 :  faster x4   (2270501 -> 561873)
BGRA8888 -> RGB24 :  faster x3   (2163179 -> 567330)

BGRX8888 -> BGR24 :  faster x3   (2162911 -> 562322)
BGRX8888 -> RGB24 :  faster x3   (2169617 -> 570927)

RGB24 -> ABGR8888 :  faster x3   (2977061 -> 925975)
RGB24 -> ARGB8888 :  faster x3   (2978148 -> 923680)
RGB24 -> BGR888 :  faster x3   (3001413 -> 935074)
RGB24 -> BGRA8888 :  faster x3   (2959003 -> 924096)
RGB24 -> BGRX8888 :  faster x3   (2965240 -> 927100)
RGB24 -> RGB888 :  faster x3   (2983921 -> 926063)
RGB24 -> RGBA8888 :  faster x3   (2963908 -> 925457)
RGB24 -> RGBX8888 :  faster x3   (2967957 -> 931700)

RGB888 -> BGR24 :  faster x3   (2173299 -> 563226)
RGB888 -> RGB24 :  faster x3   (2218374 -> 566164)

RGBA8888 -> BGR24 :  faster x3   (2166355 -> 561381)
RGBA8888 -> RGB24 :  faster x3   (2170322 -> 566729)

RGBX8888 -> BGR24 :  faster x3   (2168524 -> 564072)
RGBX8888 -> RGB24 :  faster x3   (2163680 -> 566956)

===== BlitNtoN ========

BGR24 -> BGRA8888 :  faster x3   (2458958 -> 797557)
BGR24 -> BGRX8888 :  faster x3   (2486085 -> 797745)
BGR24 -> RGBA8888 :  faster x3   (2422116 -> 797637)
BGR24 -> RGBX8888 :  faster x3   (2454426 -> 799085)

BGRA8888 -> BGR24 :  faster x4   (2468206 -> 524486)
BGRA8888 -> RGB24 :  faster x4   (2463581 -> 525561)

BGRX8888 -> BGR24 :  faster x4   (2583355 -> 524468)
BGRX8888 -> RGB24 :  faster x4   (2477242 -> 524284)

RGB24 -> BGRA8888 :  faster x2   (2453414 -> 818415)
RGB24 -> BGRX8888 :  faster x3   (2414915 -> 800863)
RGB24 -> RGBA8888 :  faster x3   (2461114 -> 798148)
RGB24 -> RGBX8888 :  faster x3   (2400922 -> 799203)

RGBA8888 -> BGR24 :  faster x4   (2494472 -> 526428)
RGBA8888 -> RGB24 :  faster x4   (2462260 -> 526791)

RGBX8888 -> BGR24 :  faster x4   (2541115 -> 524390)
RGBX8888 -> RGB24 :  faster x4   (2469059 -> 525416)
2019-02-09 17:20:53 +01:00
Sylvain Becker 604b44f20f Fix wrong access and simplify 2019-02-08 17:15:30 +01:00
Sylvain Becker 5ed30f844d Some simplification of previous commit 2019-02-07 22:45:50 +01:00
Sylvain Becker 5fd228921c Faster blit with CopyAlpha, no ColorKey
Applied to following formats:

ABGR8888 -> BGRA8888 :  faster x3   (2727179 -> 704761)
ABGR8888 -> RGBA8888 :  faster x3   (2707808 -> 705309)

ARGB8888 -> BGRA8888 :  faster x3   (2745371 -> 712437)
ARGB8888 -> RGBA8888 :  faster x3   (2746230 -> 705236)

BGRA8888 -> ABGR8888 :  faster x3   (2745026 -> 707045)
BGRA8888 -> ARGB8888 :  faster x3   (2752760 -> 727373)
BGRA8888 -> RGBA8888 :  faster x3   (2769544 -> 704607)

RGBA8888 -> ABGR8888 :  faster x3   (2725058 -> 706669)
RGBA8888 -> ARGB8888 :  faster x3   (2704866 -> 707132)
RGBA8888 -> BGRA8888 :  faster x3   (2710351 -> 704615)
2019-02-07 22:03:30 +01:00
Sylvain Becker 704e62bbf4 Code factorization of the pixel format permutation 2019-02-07 21:49:24 +01:00
Sylvain Becker 0a007a9bea Fix wrong comment 2019-02-07 18:52:49 +01:00
Sylvain Becker e5192384d0 Faster blit with no ColorKey
Applied to following formats:

ABGR8888 -> BGRX8888 :  faster x5   (3177493 -> 630439)
ABGR8888 -> RGBX8888 :  faster x5   (3178104 -> 628925)

ARGB8888 -> BGRX8888 :  faster x4   (3141089 -> 629448)
ARGB8888 -> RGBX8888 :  faster x5   (3216413 -> 630465)

BGR888 -> BGRA8888 :  faster x4   (3145403 -> 637701)
BGR888 -> BGRX8888 :  faster x4   (3142106 -> 630144)
BGR888 -> RGBA8888 :  faster x4   (3202685 -> 649384)
BGR888 -> RGBX8888 :  faster x4   (3170617 -> 658670)

BGRA8888 -> BGR888 :  faster x4   (3203308 -> 657697)
BGRA8888 -> RGB888 :  faster x5   (3201475 -> 631747)
BGRA8888 -> RGBX8888 :  faster x5   (3274544 -> 630409)

BGRX8888 -> ABGR8888 :  faster x4   (3149753 -> 638682)
BGRX8888 -> ARGB8888 :  faster x5   (3164101 -> 631273)
BGRX8888 -> BGR888 :  faster x4   (3144454 -> 630712)
BGRX8888 -> RGB888 :  faster x4   (3160490 -> 638047)
BGRX8888 -> RGBA8888 :  faster x5   (3308988 -> 631232)
BGRX8888 -> RGBX8888 :  faster x5   (3216775 -> 638065)

RGB888 -> BGRA8888 :  faster x4   (3143135 -> 655146)
RGB888 -> BGRX8888 :  faster x4   (3141790 -> 653771)
RGB888 -> RGBA8888 :  faster x5   (3214402 -> 637001)
RGB888 -> RGBX8888 :  faster x4   (3143082 -> 630009)

RGBA8888 -> BGR888 :  faster x3   (3157048 -> 920375)
RGBA8888 -> BGRX8888 :  faster x5   (3196692 -> 632996)
RGBA8888 -> RGB888 :  faster x4   (3141570 -> 652151)

RGBX8888 -> ABGR8888 :  faster x5   (3175401 -> 631218)
RGBX8888 -> ARGB8888 :  faster x4   (3144690 -> 639440)
RGBX8888 -> BGR888 :  faster x4   (3144250 -> 630171)
RGBX8888 -> BGRA8888 :  faster x5   (3220321 -> 630731)
RGBX8888 -> BGRX8888 :  faster x4   (3178453 -> 637445)
RGBX8888 -> RGB888 :  faster x5   (3203623 -> 632596)
2019-02-07 18:51:14 +01:00
Sylvain Becker 7372295ec9 Faster blit when using No Alpha or Set Alpha, + ColorKey
Applied to following formats:

ABGR8888 -> BGRX8888 :  faster x4   (2794295 -> 610587)
ABGR8888 -> RGB888 :  faster x4   (2835693 -> 615561)
ABGR8888 -> RGBX8888 :  faster x4   (2880475 -> 610479)

ARGB8888 -> BGR888 :  faster x4   (2802718 -> 610702)
ARGB8888 -> BGRX8888 :  faster x4   (2792481 -> 606311)
ARGB8888 -> RGBX8888 :  faster x4   (2821621 -> 624745)

BGR888 -> ARGB8888 :  faster x4   (2791705 -> 637889)
BGR888 -> BGRA8888 :  faster x4   (2793195 -> 652299)
BGR888 -> BGRX8888 :  faster x4   (2800713 -> 609326)
BGR888 -> RGB888 :  faster x4   (2812260 -> 610471)
BGR888 -> RGBA8888 :  faster x4   (2792327 -> 629288)
BGR888 -> RGBX8888 :  faster x4   (2799224 -> 607073)

BGRA8888 -> BGR888 :  faster x4   (2800520 -> 606897)
BGRA8888 -> RGB888 :  faster x4   (2825274 -> 616156)
BGRA8888 -> RGBX8888 :  faster x4   (2812530 -> 610340)

BGRX8888 -> ABGR8888 :  faster x4   (2793940 -> 628596)
BGRX8888 -> ARGB8888 :  faster x4   (2822686 -> 638899)
BGRX8888 -> BGR888 :  faster x4   (2818141 -> 613659)
BGRX8888 -> RGB888 :  faster x4   (2929017 -> 611794)
BGRX8888 -> RGBA8888 :  faster x4   (2799709 -> 629750)
BGRX8888 -> RGBX8888 :  faster x4   (2911010 -> 605640)

RGB888 -> ABGR8888 :  faster x4   (2800671 -> 631542)
RGB888 -> BGR888 :  faster x4   (2802644 -> 604461)
RGB888 -> BGRA8888 :  faster x4   (2801919 -> 628729)
RGB888 -> BGRX8888 :  faster x4   (2938244 -> 604135)
RGB888 -> RGBA8888 :  faster x4   (2912447 -> 642185)
RGB888 -> RGBX8888 :  faster x4   (2831676 -> 634293)

RGBA8888 -> BGR888 :  faster x4   (2928896 -> 614960)
RGBA8888 -> BGRX8888 :  faster x4   (2821422 -> 608146)
RGBA8888 -> RGB888 :  faster x4   (2825927 -> 617184)

RGBX8888 -> ABGR8888 :  faster x4   (2803852 -> 654129)
RGBX8888 -> ARGB8888 :  faster x4   (2923615 -> 642644)
RGBX8888 -> BGR888 :  faster x4   (2806523 -> 610447)
RGBX8888 -> BGRA8888 :  faster x4   (2813388 -> 630305)
RGBX8888 -> BGRX8888 :  faster x4   (2800052 -> 607881)
RGBX8888 -> RGB888 :  faster x4   (2807722 -> 610263)
2019-02-07 17:52:28 +01:00
Sylvain Becker bb9a9080dc Fix pointer warnings 2019-02-07 16:13:25 +01:00
Sylvain Becker 3543a44ae4 Faster blit when using CopyAlpha + ColorKey
Applied to following formats:

ABGR8888 -> ARGB8888 :  faster x7   (3959672 -> 537227)
ABGR8888 -> BGRA8888 :  faster x7   (4008716 -> 532064)
ABGR8888 -> RGBA8888 :  faster x7   (3998576 -> 530964)

ARGB8888 -> ABGR8888 :  faster x7   (3942420 -> 532503)
ARGB8888 -> BGRA8888 :  faster x7   (3995382 -> 527722)
ARGB8888 -> RGBA8888 :  faster x7   (4259330 -> 543033)

BGRA8888 -> ABGR8888 :  faster x7   (4110411 -> 529402)
BGRA8888 -> ARGB8888 :  faster x7   (4071906 -> 538393)
BGRA8888 -> RGBA8888 :  faster x6   (4038320 -> 585141)

RGBA8888 -> ABGR8888 :  faster x7   (3937018 -> 534127)
RGBA8888 -> ARGB8888 :  faster x7   (3979577 -> 537810)
RGBA8888 -> BGRA8888 :  faster x7   (3975656 -> 528355)
2019-02-07 15:12:17 +01:00
Sylvain Becker 7b8bac5958 Add fast paths in BlitNtoNKey
All following conversions are faster (with colorkey, but no blending).
(ratio isn't very accurate)

ABGR8888 -> BGR888 :  faster x9   (2699035 -> 297425)

ARGB8888 -> RGB888 :  faster x8   (2659266 -> 296137)

BGR24 -> BGR24 :  faster x5   (2232482 -> 445897)
BGR24 -> RGB24 :  faster x4   (2150023 -> 448576)

BGR888 -> ABGR8888 :  faster x8   (2649957 -> 307595)

BGRA8888 -> BGRX8888 :  faster x9   (2696041 -> 297596)

BGRX8888 -> BGRA8888 :  faster x8   (2662011 -> 299463)
BGRX8888 -> BGRX8888 :  faster x9   (2733346 -> 295045)

RGB24 -> BGR24 :  faster x4   (2154551 -> 485262)
RGB24 -> RGB24 :  faster x4   (2149878 -> 484870)

RGB888 -> ARGB8888 :  faster x8   (2762877 -> 324946)

RGBA8888 -> RGBX8888 :  faster x8   (2657855 -> 297753)

RGBX8888 -> RGBA8888 :  faster x8   (2661360 -> 296655)
RGBX8888 -> RGBX8888 :  faster x8   (2649287 -> 308268)
2019-01-30 22:50:20 +01:00
Sylvain Becker a052d81bdf Add explicit unsigned int and char types in (for bug 4290) 2019-01-30 15:31:07 +01:00
Sylvain Becker 1128d57316 Fixed bug 4290 - add fastpaths for format conversion in BlitNtoN
All following conversion are faster (no colorkey, no blending).
(ratio isn't very accurate)

ABGR8888 -> ARGB8888 :  faster x6   (2655837 -> 416607)
ABGR8888 -> BGR24 :  faster x7   (2470117 -> 325693)
ABGR8888 -> RGB24 :  faster x7   (2478107 -> 335445)
ABGR8888 -> RGB888 :  faster x9   (3178524 -> 333859)

ARGB8888 -> ABGR8888 :  faster x6   (2648366 -> 406977)
ARGB8888 -> BGR24 :  faster x7   (2474978 -> 327819)
ARGB8888 -> BGR888 :  faster x9   (3189072 -> 326710)
ARGB8888 -> RGB24 :  faster x7   (2473689 -> 324729)

BGR24 -> ABGR8888 :  faster x6   (2268763 -> 359946)
BGR24 -> ARGB8888 :  faster x6   (2306393 -> 359213)
BGR24 -> BGR888 :  faster x6   (2231141 -> 324195)
BGR24 -> RGB24 :  faster x4   (1557835 -> 322033)
BGR24 -> RGB888 :  faster x6   (2229854 -> 323849)

BGR888 -> ARGB8888 :  faster x8   (3215202 -> 363137)
BGR888 -> BGR24 :  faster x7   (2474775 -> 347916)
BGR888 -> RGB24 :  faster x7   (2532783 -> 327354)
BGR888 -> RGB888 :  faster x9   (3134634 -> 344987)

RGB24 -> ABGR8888 :  faster x6   (2229486 -> 358919)
RGB24 -> ARGB8888 :  faster x6   (2271587 -> 358521)
RGB24 -> BGR24 :  faster x4   (1530913 -> 321149)
RGB24 -> BGR888 :  faster x6   (2227284 -> 327453)
RGB24 -> RGB888 :  faster x6   (2227125 -> 329061)

RGB888 -> ABGR8888 :  faster x8   (3163292 -> 362445)
RGB888 -> BGR24 :  faster x7   (2469489 -> 327127)
RGB888 -> BGR888 :  faster x9   (3190526 -> 326022)
RGB888 -> RGB24 :  faster x7   (2479084 -> 324982)
2019-01-30 15:23:33 +01:00
Sam Lantinga 5e13087b0f Updated copyright for 2019 2019-01-04 22:01:14 -08:00
Sam Lantinga 6e35e42145 Working on bug 3921 - Add some Fastpath to BlitNtoNKey and BlitNtoNKeyCopyAlpha
Sylvain

I did various benches. with clang 6.0.0 on linux, and ndk-r16b on android (NDK_TOOLCHAIN_VERSION=clang).

- still see a x10 speed factor.
- with duff_loops, it does not use vectorisation (but doesn't seem to be a problem).

on linux my patch is already at full speed on -O2, whereas the duff_loops need -O3 (200 ms at -03, and 300ms at -02).

I realized that on Android, I had a slight variation which fits best.
both on linux with -O2 and -O3, and on android with 02/03 and armeabi-v7a/arm64.

Here's the patch.
2018-10-01 14:43:03 -07:00
Ozkan Sezer 922623e1b6 SDL_blit_N.c (BlitNtoNKeyCopyAlpha): fix -Wshadow warnings by adding _
suffix to the temp Pixel local in the DUFFS_LOOP.
SDL_blit.h (ASSEMBLE_RGB):  add _ prefix to temp Pixel locals to avoid
  any possible shadowings.


The warnings were like the following:

In file included from src/video/SDL_blit_N.c:26:0:
src/video/SDL_blit_N.c: In function 'BlitNtoNKeyCopyAlpha':
src/video/SDL_blit_N.c:2421:24: warning: declaration of 'Pixel' shadows a previous local [-Wshadow]
                 Uint32 Pixel = ((*src32 & rgbmask) == ckey) ? *dst32 : *src32;
                        ^
src/video/SDL_blit.h:475:21: note: in definition of macro 'DUFFS_LOOP8'
     case 0: do {    pixel_copy_increment; /* fallthrough */             \
                     ^
src/video/SDL_blit_N.c:2419:13: note: in expansion of macro 'DUFFS_LOOP'
             DUFFS_LOOP(
             ^
src/video/SDL_blit_N.c:2399:12: warning: shadowed declaration is here [-Wshadow]
     Uint32 Pixel;
            ^
2018-10-01 21:29:11 +03:00
Sam Lantinga 7df0f4fdac Fixed bug 4277 - warnings patch
Sylvain

Patch a few warnings when using:
-Wmissing-prototypes -Wdocumentation -Wdocumentation-unknown-command

They are automatically enabled with -Wall
2018-09-27 14:56:29 -07:00
Sam Lantinga e3cc5b2c6b Updated copyright for 2018 2018-01-03 10:03:25 -08:00
Philipp Wiesemann e5d9b25d8c Fixed comment style. 2017-02-26 21:20:39 +01:00
Sam Lantinga 45b774e3f7 Updated copyright for 2017 2017-01-01 18:33:28 -08:00
Sam Lantinga 818d1d3e80 Fixed bug 1646 - Warnings from clang with -Weverything 2016-11-15 01:30:08 -08:00
Sam Lantinga 39ba2ab835 Fixed NULL pointer dereference, thanks Ozkan Sezer 2016-10-22 17:53:03 -07:00
Sam Lantinga 5b14a943a8 Fixed bug 3466 - Can't build 2.0.5 on ppc64
/home/fedora/SDL2-2.0.5/src/video/SDL_blit_N.c: In function 'calc_swizzle32':
/home/fedora/SDL2-2.0.5/src/video/SDL_blit_N.c:127:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
     const vector unsigned char plus = VECUINT8_LITERAL(0x00, 0x00, 0x00, 0x00,
     ^
2016-10-22 11:01:55 -07:00
Sam Lantinga 42065e785d Updated copyright to 2016 2016-01-02 10:10:34 -08:00
Philipp Wiesemann 0e45984fa0 Fixed crash if initialization of EGL failed but was tried again later.
The internal function SDL_EGL_LoadLibrary() did not delete and remove a mostly
uninitialized data structure if loading the library first failed. A later try to
use EGL then skipped initialization and assumed it was previously successful
because the data structure now already existed. This led to at least one crash
in the internal function SDL_EGL_ChooseConfig() because a NULL pointer was
dereferenced to make a call to eglBindAPI().
2015-06-21 17:33:46 +02:00