I found that when I enable ThinLTO, a miscompilation triggers that had
been hidden all the time previously. The bug appears to happen as
follows:
1. TinyGo generates a function with a runtime.trackPointer call, but
without an alloca (or the alloca gets optimized away).
2. LLVM sees that no alloca needs to be kept alive across the
runtime.trackPointer call, and therefore it adds the 'tail' flag.
One of the effects of this flag is that it makes it undefined
behavior to keep allocas alive across the call (which is still safe
at that point).
3. The GC lowering pass adds a stack slot alloca and converts
runtime.trackPointer calls into alloca stores.
The last step triggers the bug: the compiler inserts an alloca where
there was none before but that's not valid as long as the 'tail' flag is
present.
This patch fixes the bug in a somewhat dirty way, by always creating a
dummy alloca so that LLVM won't do the optimization in step 2 (and
possibly other optimizations that rely on there being no alloca
instruction).
This implements the block-based GC as a partially precise GC. This means
that for most heap allocations it is known which words contain a pointer
and which don't. This should in theory make the GC faster (because it
can skip non-pointer object) and have fewer false positives in a GC
cycle. It does however use a bit more RAM to store the layout of each
object.
Right now this GC seems to be slower than the conservative GC, but
should be less likely to run out of memory as a result of false
positives.
LLVM wasn't aware that runtime.stackChainStart must be kept live and
can't be optimized away. With this hack, it is forced to consider
stackChainStart live at the time of the stack scan.
This fixes the corruption in https://github.com/tinygo-org/tinygo/issues/3277
This reverts commit 0b3a7280fa and updates
the documentation a little bit to explain the purpose of -gc=none. (I'm
thinking about the attiny10 by the way where defaulting to -gc=none
makes sense).
Scanning of allocas was entirely broken on WebAssembly. The code
intended to do this was never run. There were also no tests.
Looking into this further, I found that it is actually not really
necessary to do that: the C stack can be scanned conservatively and in
fact this was already done for goroutine stacks (because they live on
the heap and are always referenced). It wasn't done for the system stack
however.
With these fixes, I believe code should be both faster *and* more
correct.
I found this in my work to get opaque pointers supported in LLVM 15,
because the code that was never reached now finally got run and was
actually quite buggy.
The extalloc collector has been broken for a while, and it doesn't seem reasonable to fix right now.
In addition, after a recent change it no longer compiles.
In the future similar functionality can hopefully be reintroduced, but for now this seems to be the most reasonable option.
This change swaps the stack chain when switching goroutines, ensuring that the chain is maintained consistently.
This is only really currently necessary with asyncify on wasm.
The wasm build tag together with GOARCH=arm was causing problems in the
internal/cpu package. In general, I think having two architecture build
tag will only cause problems (in this case, wasm and arm) so I've
removed the wasm build tag and replaced it with tinygo.wasm.
This is similar to the tinygo.riscv build tag, which is used for older
Go versions that don't yet have RISC-V support in the standard library
(and therefore pretend to be GOARCH=arm instead).
The only architecture that actually needs special support for scanning
the stack is WebAssembly. All others allow raw access to the stack with
a small bit of assembly. Therefore, don't manually keep track of all
these objects on the stack manually and instead just use conservative
stack scanning.
This results in a massive code size decrease in the affected targets
(only tested linux/amd64 for code size) - sometimes around 33%. It also
allows for future improvements such as using proper stackful goroutines.