Some source code wasn't part of `FMT_PATHS` so wasn't checked for
correct formatting. This change includes all this source code and
excludes cgo/testdata because it contains files that can't be parsed.
This results in smaller and likely more efficient code. It does require
some architecture specific code for each architecture, but I've kept the
amount of code as small as possible.