This is the one that splits the "left wing" into two for loops to
bubble out the conditional that decides if it should read from the
left padding or the input buffer.
I still believe the optimization is good, but the basic logic of it
was incorrect, and needs to be reexamined and fixed before going
back into revision control.
- Calculate `j * RESAMPLER_SAMPLES_PER_ZERO_CROSSING` once per loop
iteration since we use it multiple times.
- Do the left-wing loop in two sections: while `srcframe < 0` and then
the remaining calculations when `srcframe >= 0`. This bubbles a conditional
out of every iteration of a tight loop, giving us a boost. We could
_probably_ do this to the right-wing loop too, but it's less straightforward
there.
- The real win: Use floats instead of doubles. This almost doubles the speed
of the entire function on Intel CPUs, and for embedded things without
hardware-level support for doubles, the speedup is enormous. This in
theory might reduce audio quality, though, and I had to put a check in
place to avoid a division-by-zero that we avoided at higher precision, but
this is likely to be worth keeping for at least the Sony PSP and other
smaller platforms, if not everyone.
- reorganize the loop which checks for the right wave-format
- use the return value of UpdateAudioStream
- ensure SetError is called in SDL_NewAudioStream
Don't rely on checking __clang_major__ since it is not comparable
between different vendors. Don't use "#pragma clang attribute" since it
is only available in relatively recent versions, there's no obvious way
to check if it's supported, and just using __attribute__ directly (for
gcc as well) results in simpler code anyway.
This is needed to support CHERI, and thus Arm's experimental Morello
prototype, where pointers are implemented using unforgeable capabilities
that include bounds and permissions metadata to provide fine-grained
spatial and referential memory safety, as well as revocation by sweeping
memory to provide heap temporal memory safety.
On most systems (anything with a flat memory hierarchy rather than using
segment-based addressing), size_t and uintptr_t are the same type.
However, on CHERI, size_t is just an integer offset, whereas uintptr_t
is still a capability as described above. Casting a pointer to size_t
will strip the metadata and validity tag, and casting from size_t to a
pointer will result in a null-derived capability whose validity tag is
not set, and thus cannot be dereferenced without faulting.
The audio and cursor casts were harmless as they intend to stuff an
integer into a pointer, but using uintptr_t is the idiomatic way to do
that and silences our compiler warnings (which our build tool makes
fatal by default as they often indicate real problems). The iconv and
egl casts were true positives as SDL_iconv_t and iconv_t are pointer
types, as is NativeDisplayType on most OSes, so this would have trapped
at run time when using the round-tripped pointers. The gles2 casts were
also harmless; the OpenGL API defines this argument to be a pointer type
(and uses the argument name "pointer"), but it in fact represents an
integer offset, so like audio and cursor the additional idiomatic cast
is needed to silence the warning.
On modern CPUs, there's no penalty for using the unaligned instruction on
aligned memory, but now it can vectorize unaligned data too, which even if
it's not optimal, is still going to be faster than the scalar fallback.
Fixes#4532.
janisozaur
There are many cases which are not able to be handled by SDL's audio conversion routines, including too low (negative) rate, too high rate (impossible to allocate).
This patch aims to report such issues early and handle others in a graceful manner. The "INT32_MAX / RESAMPLER_SAMPLES_PER_ZERO_CROSSING" value is the conservative approach in terms of what can _technically_ be supported, but its value is 4'194'303, or just shy of 4.2MHz. I highly doubt any sane person would use such rates, especially in SDL2, so I would like to drive this limit further down, but would need some assistance to do that, as doing so would have to introduce an arbitrary value. Are you OK with such approach? What would a good value be? Wikipedia (https://en.wikipedia.org/wiki/High-resolution_audio) lists 96kHz as the highest sampling rate in use, even if I quadruple it for a good measure, to 384kHz it's still an order of magnitude lower than 4MHz.