On x32, pointers are only 4-bytes wide and need to be loaded using the "movl"
instruction instead of "movq" to avoid loading garbage into the register.
The MULADDC routines for x86-64 are adjusted to work on x32 as well by getting
gcc to load all the registers for us in advance (and storing them later) by
using better register constraints. The b, c, D and S constraints correspond to
the rbx, rcx, rdi and rsi registers respectively.