We don't compile in the assembly code if compiler optimisations are disabled as
the number of registers used in the assembly code doesn't work with the -O0
option. Also anyone select -O0 probably doesn't want to compile in the assembly
code anyway.
This fix adds the ebx register to the clobber list for the i386 inline assembly
for the multiply helper function.
ebx was used but not listed, so when the compiler chose to also use it, ebx was
getting corrupted. I'm surprised this wasn't spotted sooner.
Fixes Github issues #1550.
On x32, pointers are only 4-bytes wide and need to be loaded using the "movl"
instruction instead of "movq" to avoid loading garbage into the register.
The MULADDC routines for x86-64 are adjusted to work on x32 as well by getting
gcc to load all the registers for us in advance (and storing them later) by
using better register constraints. The b, c, D and S constraints correspond to
the rbx, rcx, rdi and rsi registers respectively.