This is for consistency with Clang, which always adds a CPU flag even if
it's not specified in CFLAGS.
This commit also adds some tests to make sure the Clang target-cpu
matches the CPU property in the JSON files.
This does have an effect on the generated binaries. The effect is very
small though: on average just 0.2% increase in binary size, apparently
because Cortex-M3 and Cortex-M4 are compiled a bit differently. However,
when rebased on top of https://github.com/tinygo-org/tinygo/pull/2218
(minsize), the difference drops to -0.1% (a slight decrease on average).