This results in smaller and likely more efficient code. It does require some architecture-specific code for each supported architecture, but I've kept that code as small as possible.