API bindings are not always officially generated for every language. Generating code from an API definition makes writing a binding much simpler than translating it field by field and function by function. When there is no public API definition, one way around the problem is to translate from the official code definitions.
Background
This blog is currently written in Notion and deployed automatically every night by a GitHub Action. See my last project to learn more about this.
That project currently uses a library written in Go to communicate with Notion’s API. However, the library is maintained by contributors who do not work for Notion, which makes it hard to keep up with the official API, and copying definitions by hand introduces many bugs that are easy to miss.
Proposed Solution
One possible solution I read about was using Wasmer to create bindings to the TypeScript API, so that any language supported by Wasmer could build bindings on top of that WebAssembly runtime. The main strength of this solution is portability. I immediately discarded this option because, as you may have guessed, it carries the heavy overhead of depending on a WebAssembly runtime, which is a lot for such a small API.
Another solution, proposed by one of the library contributors, was generating the structs from the official implementation of the API. This would solve all of these problems and also significantly reduce the time needed to stay up to date.
This proposed solution mentioned a really interesting method used by gopls, the LSP implementation for Go. To keep up with the definition of the Language Server Protocol (written in TypeScript), they use the TypeScript compiler API to inspect the definitions and generate Go code from them.
At first, I thought it was a crazy idea (well, I still do). But I couldn’t stop myself from trying to implement it, so here is how it went.
Implementation
There are a lot of open questions about how to translate TypeScript type definitions into usable code of the same quality as a handmade translation.
Type aliases
Notion’s SDK uses TypeScript’s type aliases to define the model types exchanged between the client and the API.
type IdRequest = string | string
type TextRequest = string
type RichTextItemRequest =
  | {
      text: { content: string; link?: { url: TextRequest } | null }
      type?: "text"
      annotations?: {
        bold?: boolean
        italic?: boolean
        strikethrough?: boolean
        underline?: boolean
        code?: boolean
        color?:
          | "default"
          | "gray"
          | "brown"
          | "orange"
          | "yellow"
          | "green"
          | "blue"
          | "purple"
          | "pink"
          | "red"
          | "gray_background"
          | "brown_background"
          | "orange_background"
          | "yellow_background"
          | "green_background"
          | "blue_background"
          | "purple_background"
          | "pink_background"
          | "red_background"
      }
    }
...
We can easily loop over every type alias in a file using the compiler API.
The problem comes with getting the type representation of each alias. To simplify the AST types, I created two structures: one to hold a type definition and another for attribute definitions.
interface TypeDef {
  id?: string;
  // type will be defined if it is a basic type.
  type?: string;
  // value will be defined if it is a literal type
  value?: any;
  // attributes will be defined if it is a type literal.
  attributes?: AttribDef[];
  level?: number;
  isInterface?: boolean;
}
...
interface AttribDef {
  id: string;
  optional?: boolean;
  type: TypeDef;
  jsonName?: string;
}
First of all, sorry for all the optional properties; I don’t know the idiomatic way to do this. These structures store some useful information about the types: where they are defined, their identifier, and even a value if they are a constant.
The next step seems easy: just walk the AST recursively, storing the types in these structures. But reality strikes when you get deeper into TypeScript type alias declarations.
type BlockObjectRequest =
  | {
      heading_1: { text: Array<RichTextItemRequest> }
      type?: "heading_1"
      object?: "block"
    }
  | {
      heading_2: { text: Array<RichTextItemRequest> }
      type?: "heading_2"
      object?: "block"
    }
  | {
      heading_3: { text: Array<RichTextItemRequest> }
      type?: "heading_3"
      object?: "block"
    }
  | {
      embed: { url: string; caption?: Array<RichTextItemRequest> }
      type?: "embed"
      object?: "block"
    }
The code above is an example of a type alias used to define blocks in Notion. For anyone who has never touched TypeScript (like me until 5 days ago), those vertical bars mean it is a union type.
Union types
A union type is a type that can be any of the sub-types in the union. Expressing the concept of a union type in Go is difficult.
The Go version of the API defines an interface type Block and implements a concrete type for each of the sub-types of the union, such as Heading1, Heading2… Those concrete types implement the interface.
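As a rough illustration of that pattern, here is a minimal sketch. The method names GetType and GetObject, the RichText stand-in and the describe helper are assumptions made up for this example; they are not taken from the actual Go library.

```go
package main

import "fmt"

// Block plays the role of the TypeScript union: every concrete block type
// satisfies it. The method names are hypothetical.
type Block interface {
	GetType() string
	GetObject() string
}

// RichText is a simplified stand-in for RichTextItemRequest.
type RichText struct {
	Content string
}

// Heading1 is one concrete member of the union.
type Heading1 struct {
	Text []RichText
}

func (Heading1) GetType() string   { return "heading_1" }
func (Heading1) GetObject() string { return "block" }

// Embed is another concrete member of the union.
type Embed struct {
	URL     string
	Caption []RichText
}

func (Embed) GetType() string   { return "embed" }
func (Embed) GetObject() string { return "block" }

// describe uses a type switch to recover the concrete member behind a Block.
func describe(b Block) string {
	switch v := b.(type) {
	case Heading1:
		return fmt.Sprintf("%s with %d text items", v.GetType(), len(v.Text))
	case Embed:
		return fmt.Sprintf("%s pointing at %s", v.GetType(), v.URL)
	default:
		return "unknown block"
	}
}

func main() {
	blocks := []Block{
		Heading1{Text: []RichText{{Content: "Title"}}},
		Embed{URL: "https://example.com"},
	}
	for _, b := range blocks {
		fmt.Println(describe(b))
	}
}
```

The type switch is the price paid for not having real union types: the compiler will not warn if a new member is added and a case is forgotten.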
The important point is that we have to extract an interface with the attributes common to all members of the union (in the previous example, the type and object attributes). Later, we will also have to decide the names of the concrete types.
Creating the interface
In order to create the interface, we iterate over all the child AST nodes of the union type to build our own representation of a type tree.
Once we have our own representation of the union subtypes, we can intersect the array of attributes so we keep the common ones.
The common attributes will be in all concrete types and they will also be part of the interface as methods to be implemented by the concrete types.
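The intersection step could look roughly like the sketch below. It is written in Go for consistency with the generated code, although the real tool would do this on the TypeScript side. The AttribDef here is a trimmed-down, hypothetical version of the structure shown earlier, and attributes are matched by identifier only, on the assumption that literal values such as "heading_1" and "embed" differ between members.

```go
package main

import "fmt"

// AttribDef is a reduced, hypothetical version of the AttribDef structure:
// only the fields needed for the intersection are kept.
type AttribDef struct {
	ID       string
	Optional bool
}

// hasAttrib reports whether attrs contains an attribute with the given ID.
func hasAttrib(attrs []AttribDef, id string) bool {
	for _, a := range attrs {
		if a.ID == id {
			return true
		}
	}
	return false
}

// commonAttribs returns the attributes of the first member that also appear,
// by identifier, in every other member of the union.
func commonAttribs(members [][]AttribDef) []AttribDef {
	if len(members) == 0 {
		return nil
	}
	var common []AttribDef
	for _, candidate := range members[0] {
		inAll := true
		for _, other := range members[1:] {
			if !hasAttrib(other, candidate.ID) {
				inAll = false
				break
			}
		}
		if inAll {
			common = append(common, candidate)
		}
	}
	return common
}

func main() {
	heading1 := []AttribDef{{ID: "heading_1"}, {ID: "type", Optional: true}, {ID: "object", Optional: true}}
	embed := []AttribDef{{ID: "embed"}, {ID: "type", Optional: true}, {ID: "object", Optional: true}}
	for _, a := range commonAttribs([][]AttribDef{heading1, embed}) {
		fmt.Println(a.ID)
	}
}
```

For the two members used in main, only type and object survive, which matches the attributes the interface is expected to expose.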
Embedding types
If you look closely at the official API, the types contain a lot of embedded structs that have no name. My first implementation simply embedded the types as the official API did, giving them no name. This worked very well and I was able to generate most of the content easily.
But, as you may have imagined, it wasn’t perfect. Yes, the types followed the API definitions. However, keeping the codebase small and reusing common structs is a good practice that I couldn’t just ignore.
Embedding the definition of a RichText (text with formatting, links, etc.) 100 times is probably the worst decision you can make. Keep in mind that you wouldn’t be able to write any function that processes a RichText, and creating an interface would be impossible because of the unnamed types.
So… I had to follow another path, which I called context naming.
Context naming
Context naming is a way of assigning names to unnamed embedded structs. To do so, we use the closest name we can find that is related to the type. For example:
type BlockEmbed struct {
	Embed struct {
		URL string
	}
	...
}
Could be easily converted into:
type Embed struct {
	Url string
}
type BlockEmbed struct {
	Embed Embed
}
If we found a previously defined type with the same name, we could compare the two and check whether they contain the same fields. If the existing type can be used as-is, or updated with optional attributes, we don’t need to create a new type.
If the existing type could not be reused or merged, we could add more context to the name, such as BlockEmbedEmbed.
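A minimal sketch of that lookup follows, under a few assumptions: Field, Registry and Resolve are hypothetical names invented for this example, types are compared field by field with reflect.DeepEqual, and the merge with optional attributes is left out to keep it short.

```go
package main

import (
	"fmt"
	"reflect"
)

// Field is a simplified field description: a Go name and a Go type.
type Field struct {
	Name string
	Type string
}

// Registry keeps the fields of every named struct generated so far.
type Registry map[string][]Field

// Resolve picks a name for an unnamed struct found in attribute attr of the
// type parent. An identical existing type is reused; on a collision, more
// context is added by prefixing the parent name.
func (r Registry) Resolve(parent, attr string, fields []Field) string {
	name := attr
	for {
		existing, taken := r[name]
		if !taken {
			r[name] = fields
			return name
		}
		if reflect.DeepEqual(existing, fields) {
			return name // same shape: reuse the existing type
		}
		name = parent + name // collision: add more context
	}
}

func main() {
	r := Registry{}
	url := []Field{{Name: "URL", Type: "string"}}
	caption := []Field{{Name: "Caption", Type: "string"}}

	fmt.Println(r.Resolve("BlockEmbed", "Embed", url))     // first use of the name
	fmt.Println(r.Resolve("PageEmbed", "Embed", url))      // identical fields
	fmt.Println(r.Resolve("BlockEmbed", "Embed", caption)) // different fields
}
```

The third call cannot reuse Embed because the fields differ, so it falls back to the longer contextual name.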
If we follow Go’s naming conventions, we also need to convert the generated names to Go ones: from snake_case, as the API uses, to CamelCase. Also, some names such as Id or Url are usually written in all uppercase, like ID or URL.
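The name conversion can be sketched as follows. The goName function and the short initialisms set are assumptions for illustration; a real generator would need a longer list, and the sketch only handles ASCII identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// initialisms lists words that Go code conventionally writes in upper case.
// This small set is an assumption, not an exhaustive list.
var initialisms = map[string]bool{"id": true, "url": true, "api": true, "json": true}

// goName converts a snake_case identifier into an exported Go identifier.
// It assumes ASCII input.
func goName(snake string) string {
	var b strings.Builder
	for _, word := range strings.Split(snake, "_") {
		if word == "" {
			continue // skip empty parts from leading or doubled underscores
		}
		lower := strings.ToLower(word)
		if initialisms[lower] {
			b.WriteString(strings.ToUpper(lower))
			continue
		}
		b.WriteString(strings.ToUpper(lower[:1]) + lower[1:])
	}
	return b.String()
}

func main() {
	for _, s := range []string{"heading_1", "url", "page_id", "gray_background"} {
		fmt.Println(s, "->", goName(s))
	}
}
```

The original snake_case name still has to be kept somewhere, for example in the jsonName attribute, so the generated struct tags match what the API sends.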
Bad news 😿
Okay… so this is where I decided to leave this project. I didn’t expect that choosing names for the types would become the most tedious task. Go has its naming conventions, and we also had to follow the style of the API. We could have written a new API implementation using our own naming conventions, but the purpose of this project was to automate and simplify (it’s okay, you can laugh) the existing API implementation instead of starting from zero.
It wasn’t because the task was too difficult: I could have solved name collisions by splitting parent names and taking more context whenever names collided. It was because I realized that an automatic script wouldn’t come up with names as good as a human would.
Think of the last embed example, but now using the official API definitions. This will generate:
type Block interface {}
...
type Embed struct {
	Url string
	...
}
type BlockEmbed struct {
	Embed Embed
	...
}
If the types didn’t match, we would end up with names like BlockEmbedEmbed, and trust me, this one is not the worst that could be generated.
Still, this is not the main reason I’m not continuing this project. The main reason is that Notion’s original intention was to publish a public OpenAPI definition so that the model types, server and client could be generated automatically. This could be used to generate the Go bindings with a single command using existing tools, such as oapi-codegen.
Conclusion
The TypeScript compiler API is an excellent resource for building any kind of inspection tool for the language, such as looking for bad practices, generating documentation or even generating code.
However, language translation is not an easy task when dealing with totally different languages and, even worse, paradigms. The translation could be made to suit the Notion API by implementing many more heuristics and special cases, but waiting for the “promised” OpenAPI definition may be the smartest solution.
It was a great topic to learn about, and I hope more languages provide this kind of language inspection tooling.