Debugging/Fixing a Third-Party Go Package Bug
MongoDB's product, MMS, uses Go for all of the agents that run on MMS (Monitoring, Backup, Automation) as well as for some small utilities. All of these binaries use the gowin32 package in some form or another. I have recently begun working on a feature in the Automation agent that requires gowin32 and I noticed when writing code like the following that my binary would crash:
package main
import (
_ "github.com/winlabs/gowin32"
)
func main() {
_ = 0
}
The fatal error that occurred was:
fatal error: invalid heap pointer```
On the other hand, adding a print statement suppressed the crashing behavior:
```go
package main
import (
_ "github.com/winlabs/gowin32"
)
func main() {
_ = 0
fmt.Println("Stop crashing")
}
Adding the Println
was essentially a red
herring and threw me off for a bit. This was the first time
that I encountered an issue like this in Go and as such I
didn't apply much common sense as to what that error message
meant.
A key hint as to what's going on here is the fact that I was using Go 1.4. All of the agent code is currently using Go 1.3 and the error never occurred there.
The Poor Man's Debugger
My first thought was that if I'm importing a package and
it crashes without actually using anything from it, then
there must be some kind of init()
or side-effect
that causes the invalid heap pointer. Of course gowin32 has
no init()
s but it has global (package-scoped)
variables all over the place. The problem was finding out
which var
was screwing me over.
The best idea I could think of at the time was to simply start deleting source files until the crash disappeared. It took a while to find it, but winuser.go was the culprit. Thankfully it was a small file and the global variables were causing the crash.
Realizing the Bug
According to the Go 1.4 release notes, as of Go
1.3, the runtime is supposed to crash when it finds a memory
word that should hold a valid pointer within the Go runtime.
This means that a value like 7 in a *uint16
should crash the runtime. Looking at
MAKEINTRESOURCE
in winuser.go, it was
taking a uint16
and literally converting it into
a *uint16
. The Windows predefined RT_* resources
range from 1-24 which clearly would count as invalid
pointers. But why did this work in Go 1.3 if the runtime
should crash during garbage collection? It turns out the
garbage collector in Go 1.3 ignored invalid pointers that are
located on the heap (see
https://github.com/golang/go/issues/9454#issuecomment-68169315).
Reproducing the Bug
After realizing what was going on, I just wanted to prove I was right. Coming up with the following reproduction was simple and nice to explain:
package main
import (
"runtime"
"unsafe"
)
func MAKEINTRESOURCE(integer uint16) *uint8 {
return (*uint8)(unsafe.Pointer(uintptr(integer)))
}
var (
test1 = MAKEINTRESOURCE(1)
)
func main() {
runtime.GC()
}
Forcing the garbage collector to run would always crash the runtime with an invalid heap pointer in this example.
Fixing the Bug
This fix was nice and short. I swapped out the
*uint16
for a uintptr which is designed to hold
the memory address value as an integral value. Being an
integral type and not a pointer, it doesn't get treated as a
pointer in the runtime. This enables us to pass around the
Windows Resource
references all over the code
and all the way down to the syscalls. Thankfully this fix was
also backwards compatible with existing code due to type
aliasing of the original *uint16
from
MAKEINTRESOURCE
.
For the actual pull request, see https://github.com/winlabs/gowin32/pull/38.
Retrospective
It turns out that this bug was keeping us from upgrading to Go 1.4 on the binaries that were using gowin32. I wasn't aware of this since I stumbled on the bug in my free time. Another engineer came to the same conclusion as to what was wrong but arrived at the answer in a different way. In retrospect, I would have solved this faster had I actually looked for the JIRA ticket; but it was a nice exercise in debugging and I learned a lot more about Go internals.