Playing with Go's Abstract Syntax Tree and code generation

Note: There is a project link at the end of the article if you just want to see the code!

This week I came across a not-very-serious issue in Go: suppose you have a series of types that implement a given interface. They have arbitrary fields, and you want to serialize and deserialize them to and from JSON. That’s easy enough when you know which struct you are marshaling and unmarshaling, but here we are talking about JSON items where we don’t know beforehand which struct they map to.

What we need to do in this situation is add some information about the type to the JSON, then switch on that type and unmarshal into the matching struct. Simple enough. Here’s an example:

type Doctor interface {
    Diagnose(Person) (Disease, error)
}

type FamilyPhysician struct {
    SubSpecialty string `json:"subSpecialty"`
}

type Pediatrician struct {
    HasStickers bool `json:"hasStickers"`
}

[
    {
        "type": "FamilyPhysician",
        "entity": {
            "subSpecialty": "bureaucracy"
        }
    },
    {
        "type": "Pediatrician",
        "entity": {
            "hasStickers": true
        }
    }
]
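For reference, the type-switch approach described above could look roughly like this. It’s a minimal sketch assuming the wrapper JSON shown above; Person and Disease are hypothetical placeholder types (they are never defined in this post), and envelope and decodeDoctors are names made up for illustration. The trick is json.RawMessage, which delays decoding the entity until we know which struct it belongs to.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person and Disease are hypothetical placeholders so the example compiles.
type Person struct{ Name string }
type Disease string

type Doctor interface {
	Diagnose(Person) (Disease, error)
}

type FamilyPhysician struct {
	SubSpecialty string `json:"subSpecialty"`
}

func (FamilyPhysician) Diagnose(Person) (Disease, error) { return "paperwork", nil }

type Pediatrician struct {
	HasStickers bool `json:"hasStickers"`
}

func (Pediatrician) Diagnose(Person) (Disease, error) { return "sniffles", nil }

// envelope mirrors the JSON wrapper: a type name plus the raw entity,
// whose decoding is deferred until we know the concrete type.
type envelope struct {
	Type   string          `json:"type"`
	Entity json.RawMessage `json:"entity"`
}

// decodeDoctors unmarshals the wrapped list, switching on the "type" field.
// Every new Doctor implementation needs its own case here.
func decodeDoctors(data []byte) ([]Doctor, error) {
	var envs []envelope
	if err := json.Unmarshal(data, &envs); err != nil {
		return nil, err
	}

	doctors := make([]Doctor, 0, len(envs))
	for _, env := range envs {
		var d Doctor
		switch env.Type {
		case "FamilyPhysician":
			d = &FamilyPhysician{}
		case "Pediatrician":
			d = &Pediatrician{}
		default:
			return nil, fmt.Errorf("unknown doctor type %q", env.Type)
		}
		// d holds a pointer, so Unmarshal can fill in the concrete struct.
		if err := json.Unmarshal(env.Entity, d); err != nil {
			return nil, err
		}
		doctors = append(doctors, d)
	}
	return doctors, nil
}
```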

What I really don’t want to do is keep a list of all the types that implement the given interface. With a switch, that would probably be two lines for each new type I create, and, coming from a Python background, that is simply too much work.

For a while I have wanted to delve deeper into Go’s AST (Abstract Syntax Tree) package, and this gave me the perfect excuse to spend some time getting to know it better.

Let’s look at our basic requirements:

1. We need to parse a package to find a specific interface and its methods;
2. We need to parse a package to find all structs that implement that interface;
3. We need to pass that list into a code generator that produces some code with all those types;
There is no step 4.

Step 0 could be: be somewhat familiar with abstract syntax trees. The tl;dr is that an AST is a tree representation of your file/package/program: your code turned into structured nodes for all its declarations, functions, statements, expressions, etc.
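To make that a bit more concrete, here is a small sketch (not from the original project) that parses a Go source string and walks the tree with ast.Inspect, noting every type and function declaration it finds along the way. The declSummary name and the example.go file name are made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// declSummary parses src as a single Go file and walks its AST with
// ast.Inspect, recording one line per type spec and function declaration.
func declSummary(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}

	var out []string
	ast.Inspect(file, func(n ast.Node) bool {
		switch node := n.(type) {
		case *ast.TypeSpec:
			// %T shows which node type sits under the name.
			out = append(out, fmt.Sprintf("type %s (%T)", node.Name.Name, node.Type))
		case *ast.FuncDecl:
			out = append(out, fmt.Sprintf("func %s", node.Name.Name))
		}
		return true // keep descending into child nodes
	})
	return out, nil
}
```

Fed a file declaring a Doctor interface, a Pediatrician struct and a Diagnose method on it, this returns type Doctor (*ast.InterfaceType), type Pediatrician (*ast.StructType) and func Diagnose, in source order.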

Parse a package… any package

We start with our imports, all courtesy of Go’s standard library:

import (
    "go/ast"
    "go/parser"
    "go/token"
)

You shouldn’t need to add these by hand, as your LSP (such as gopls) or GoLand should do that work for you. If you aren’t using one, either you don’t need to read this anymore or you shouldn’t be reading this yet.

Given a directory ./somedir where our worst ever named package lives, we can load everything up like this:

dir := "./somedir"
fset := token.NewFileSet()
pkgs, err := parser.ParseDir(fset, dir, nil, parser.SkipObjectResolution)
if err != nil {
    return nil, err
}

ParseDir returns a map with the package name as key and an *ast.Package as value for each package found in our directory; each *ast.Package holds the ASTs of its files in its Files map. We return the err if one happened because we are not animals, but we don’t give it more context because this is our day off.

We don’t really care about the package name, only the ASTs, which come as values of type *ast.File (one per source file). Everything is a pointer in this library; start getting used to it.

for _, pkg := range pkgs {
    for _, file := range pkg.Files {
        parseFile(file)  // cryptically named function to ensure job security
    }
}
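Putting the loading steps together, a helper could look like the sketch below. The loadFiles name is hypothetical; it also sorts the files by path, since ranging over a Go map gives a random order and generated code should come out the same on every run.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// loadFiles parses every Go file in dir and returns the resulting ASTs,
// sorted by file path so the order is stable (map iteration is random).
// The shared FileSet is returned too; ast.Print and position lookups need it.
func loadFiles(dir string) (*token.FileSet, []*ast.File, error) {
	fset := token.NewFileSet()
	pkgs, err := parser.ParseDir(fset, dir, nil, parser.SkipObjectResolution)
	if err != nil {
		return nil, nil, err
	}

	var names []string
	byName := make(map[string]*ast.File)
	for _, pkg := range pkgs { // package name -> *ast.Package
		for name, file := range pkg.Files { // file path -> *ast.File
			names = append(names, name)
			byName[name] = file
		}
	}
	sort.Strings(names)

	files := make([]*ast.File, 0, len(names))
	for _, name := range names {
		files = append(files, byName[name])
	}
	return fset, files, nil
}
```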

Intermission - If you should take something away from this post, it’s this

I can’t stress this enough: this is my second time using Go’s AST packages, and this next line is a game changer:

ast.Print(fset, file)

This prints the file’s AST to the standard output in a readable format. Readable by humans like you and me. You can then study it and see how everything is represented, so that you can then design your code to fetch the data you want.

Fetch that interface!

We are making some basic assumptions here, mainly that our interface is declared at package level and not nested inside some function (methods can’t be declared inside functions in Go anyway). Nested declarations would only serve to give us headaches, and we don’t want any.

Our *ast.File has a nice field called Decls, which stores a slice of, you guessed it, ast.Decl, an interface that is implemented by several types. The one we expect is *ast.GenDecl. From the documentation:

A GenDecl node (generic declaration node) represents an import, constant, type or variable declaration.

We will have to traverse the slice to find our interface. We also need to check if the specific declaration we’re analyzing from the slice is actually a type.

genDecl, ok := decl.(*ast.GenDecl)
if !ok {
    return
}

if genDecl.Tok != token.TYPE {
    return
}

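Now we know it’s a type, but is it an interface? A GenDecl can group several declarations (think of a type ( ... ) block), so we go through each spec in genDecl.Specs; for a type declaration, each one is an *ast.TypeSpec whose Type field holds the actual type expression:

typeSpec, ok := spec.(*ast.TypeSpec) // spec ranges over genDecl.Specs
if !ok {
    return
}

// note (notadoctor): at this time I decided to stop being reasonable and
// give decent names
iFaceType, ok := typeSpec.Type.(*ast.InterfaceType)
if !ok {
    return
}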

After this we are sure we’re dealing with an interface. We’ll skip checking whether it’s our target interface and just create a structure with the data we need, then return it up for someone else to deal with.

// ...some time ago

type interfaceData struct {
    name    string
    methods []string
}

// so now we can do...

var data interfaceData

data.name = typeSpec.Name.Name

for _, method := range iFaceType.Methods.List {
    // Embedded interfaces (e.g. io.Reader) have no names; skip them.
    if len(method.Names) == 0 {
        continue
    }

    data.methods = append(data.methods, method.Names[0].Name)
}

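Disclaimer here: I wasn’t sure whether method.Names could hold more than one name in an interface’s method list. It turns out each method gets its own entry with exactly one name, so taking the first one (or the last, same thing) is safe, and the first is some nanoseconds faster. An entry with no names at all is an embedded interface, like io.Reader inside io.ReadWriter; it’s worth skipping those (or expanding them, if you’re feeling brave).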
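Here are the pieces from this section assembled into one function, as a sketch under the same assumptions (package-level declarations only, methods compared by name). findInterfaces and parseSource are hypothetical helper names.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
)

type interfaceData struct {
	name    string
	methods []string
}

// findInterfaces collects every package-level interface in file along with
// its method names. Embedded interfaces are skipped, not expanded.
func findInterfaces(file *ast.File) []interfaceData {
	var result []interfaceData
	for _, decl := range file.Decls {
		genDecl, ok := decl.(*ast.GenDecl)
		if !ok || genDecl.Tok != token.TYPE {
			continue
		}
		// A single GenDecl can group several types in a type ( ... ) block.
		for _, spec := range genDecl.Specs {
			typeSpec, ok := spec.(*ast.TypeSpec)
			if !ok {
				continue
			}
			iFaceType, ok := typeSpec.Type.(*ast.InterfaceType)
			if !ok {
				continue
			}

			data := interfaceData{name: typeSpec.Name.Name}
			for _, method := range iFaceType.Methods.List {
				if len(method.Names) == 0 {
					continue // embedded interface or type constraint
				}
				data.methods = append(data.methods, method.Names[0].Name)
			}
			result = append(result, data)
		}
	}
	return result
}

// parseSource is a small helper for trying findInterfaces on a string.
func parseSource(src string) (*ast.File, error) {
	return parser.ParseFile(token.NewFileSet(), "src.go", src, parser.SkipObjectResolution)
}
```

On a file with a type ( ... ) block declaring a Barker interface (Bark, an embedded io.Reader, Wag) next to a Pomeranian struct, this yields Barker with methods Bark and Wag and ignores the struct.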

Match that Method!

Matching the methods is actually easier than matching the interface. Methods are functions, and functions are Decls of type *ast.FuncDecl; the methods are the ones that have a receiver (if they didn’t, they’d be normal functions). To match them all, we collect every receiver type along with the methods it defines, then check whether it has all the methods in the interface.

What’s missing here is, of course, stricter checking of methods and interfaces. Real Go also compares parameter and return types to decide whether a type actually implements an interface, but this is probably enough for our use case. If not, you, the reader, can take it as a personal challenge to enforce the real interface rules.
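If you do take up that challenge, one route that avoids reimplementing the rules is the standard go/types package, which type-checks the code and answers the question directly with types.Implements. This is a different technique from the name matching in this post, shown only as a sketch: implementersOf is a hypothetical name, and since no Importer is configured, it only works on source without imports.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// implementersOf type-checks a single self-contained source file (no imports,
// since no Importer is configured) and returns the names of package-level
// types whose value or pointer type implements the named interface.
func implementersOf(src, ifaceName string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}

	var conf types.Config
	pkg, err := conf.Check("example", fset, []*ast.File{file}, nil)
	if err != nil {
		return nil, err
	}

	obj, ok := pkg.Scope().Lookup(ifaceName).(*types.TypeName)
	if !ok {
		return nil, fmt.Errorf("%s is not a type in this package", ifaceName)
	}
	iface, ok := obj.Type().Underlying().(*types.Interface)
	if !ok {
		return nil, fmt.Errorf("%s is not an interface", ifaceName)
	}

	var names []string
	for _, name := range pkg.Scope().Names() { // Names() is already sorted
		tn, ok := pkg.Scope().Lookup(name).(*types.TypeName)
		if !ok || types.IsInterface(tn.Type()) {
			continue
		}
		// Pointer-receiver methods only belong to *T's method set, so check both.
		if types.Implements(tn.Type(), iface) || types.Implements(types.NewPointer(tn.Type()), iface) {
			names = append(names, name)
		}
	}
	return names, nil
}
```

Checking both T and *T matters: a type like Pediatrician whose Diagnose method has a pointer receiver implements Doctor only through its pointer, while a type with a Diagnose method of the wrong signature is rejected.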

Onwards to match that method, given an ast.Decl!

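fnDecl, ok := decl.(*ast.FuncDecl)
if !ok {
    return
}

// Plain functions have no receiver; only methods do.
if fnDecl.Recv == nil || len(fnDecl.Recv.List) == 0 {
    return
}

// A method has exactly one receiver. Unwrap pointer receivers, as in
// func (p *Pediatrician) ..., so they map to the plain type name.
recvType := fnDecl.Recv.List[0].Type
if star, ok := recvType.(*ast.StarExpr); ok {
    recvType = star.X
}

ident, ok := recvType.(*ast.Ident)
if !ok {
    return // e.g. generic receivers like (l *List[T]), not handled here
}

method.receiverType = ident.Name
method.methodName = fnDecl.Name.Name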

That’s it! We now have each method’s receiver type and name, which we can cross-reference with our interface’s method list to check whether the receiver is one of the types we intend to use.
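The cross-referencing step could look like the following sketch. methodData and implementers are hypothetical names standing in for whatever structure you collect the receiver type and method name into (the method variable above); you would pass the interface’s method names as want. Like the rest of this approach, it compares names only, not signatures.

```go
package main

import "sort"

// methodData is a hypothetical record of one method found in the source:
// the receiver's type name and the method's name.
type methodData struct {
	receiverType string
	methodName   string
}

// implementers returns, sorted, the receiver types that define every
// method named in want.
func implementers(want []string, methods []methodData) []string {
	// receiver type -> set of method names it defines
	byType := make(map[string]map[string]bool)
	for _, m := range methods {
		if byType[m.receiverType] == nil {
			byType[m.receiverType] = make(map[string]bool)
		}
		byType[m.receiverType][m.methodName] = true
	}

	var result []string
	for typeName, have := range byType {
		ok := true
		for _, name := range want {
			if !have[name] {
				ok = false
				break
			}
		}
		if ok {
			result = append(result, typeName)
		}
	}
	sort.Strings(result) // map iteration order is random; keep output stable
	return result
}
```

For an interface with Bark and Wag, a Pomeranian with Bark and Wag matches, a Husky with an extra Howl also matches, while a Chihuahua with only Bark does not.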

Next steps

Passing a list of types to a template generator should be easy (and out of scope for this post). The major point of our little expedition is to be able to find what we are looking for in our code so that we can generate more code from that.

The great advantage of generating code is having errors at build time, rather than runtime.

For this case, we can add a generator comment to the file where we have our interface, like this:

//go:generate go run ./gen -input=./packageofimplementers -interface=Barker -template=switch.tmpl -output=loader.go

Then we add a generate step to our Makefile:

build:
	go generate ./...
	go build .

And the generated code would be refreshed every time we run make.

Another very useful trick with the AST is reading struct tags and generating code guided by their existence. It’s a way to do away with some uses of reflection.
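As a sketch of that idea, the function below finds struct fields carrying a given tag key, straight from the AST. taggedFields is a hypothetical name; it borrows reflect.StructTag purely as a parser for the tag string (field.Tag.Value is the raw literal, backquotes included, hence strconv.Unquote), so no runtime reflection on values is involved.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"reflect"
	"strconv"
)

// taggedFields parses src and returns struct name -> field name -> tag value
// for every named field whose tag contains key.
func taggedFields(src, key string) (map[string]map[string]string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "src.go", src, parser.SkipObjectResolution)
	if err != nil {
		return nil, err
	}

	result := make(map[string]map[string]string)
	ast.Inspect(file, func(n ast.Node) bool {
		typeSpec, ok := n.(*ast.TypeSpec)
		if !ok {
			return true
		}
		structType, ok := typeSpec.Type.(*ast.StructType)
		if !ok {
			return true
		}
		for _, field := range structType.Fields.List {
			if field.Tag == nil {
				continue
			}
			// Strip the backquotes from the raw tag literal.
			raw, err := strconv.Unquote(field.Tag.Value)
			if err != nil {
				continue
			}
			value, ok := reflect.StructTag(raw).Lookup(key)
			if !ok {
				continue
			}
			for _, name := range field.Names { // embedded fields have no names
				if result[typeSpec.Name.Name] == nil {
					result[typeSpec.Name.Name] = make(map[string]string)
				}
				result[typeSpec.Name.Name][name.Name] = value
			}
		}
		return true
	})
	return result, nil
}
```

With key json, a field declared as SubSpecialty string with the tag json:"subSpecialty" maps to subSpecialty, and untagged fields are left out, which is exactly the kind of signal a generator can act on.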

Both reflection and code generation are valid approaches, and both tend to produce code that is not the nicest to read. But you will probably find your code generation bugs at compile time, and your reflection bugs in staging or production logs.

Conclusion

With a little bit of studying Go’s wonderful ast package and the AST printouts, you can generate code at build time that saves you from wasting countless seconds doing stuff by hand!

I wrote a working example, which you can check out in my GitLab, here. It has Pomeranians because a toy project should use a toy breed of dog.

I hope you enjoyed my writeup and learned something!