I Ditched Jekyll and Built a Static Site Generator Based on Go and SQLite

Sunday, December 29, 2024

I've got quite the yak shave to share (have you shaved a yak before? If not, I highly recommend it).

It goes like this:

You start with the desire to wax your car.
To wax your car, you need a water hose. Only, your water hose is busted so you need to go down to the hardware store to get a new hose.
To get to the hardware store, you have to drive across a bridge. The bridge requires a pass or ticket. You can't find your pass, but you know your neighbor has one.
However, your neighbor won't lend you his pass until you return a pillow that you borrowed. The reason you haven't returned it is because the pillow is missing some stuffing.
The pillow was originally stuffed with yak hair. In order to re-stuff the pillow you need to get some new yak hair.
And that's how you end up shaving a yak, when all you really wanted to do was wax your car.

In summary: I had an interesting idea I wanted to write about in long form (not on Twitter!), but so many years had passed since my last blog post that I couldn't even get Jekyll to work. So, naturally, I ended up building a static site generator.

But First, Some History

My first blog was based on WordPress. It served as my journal while I studied abroad in China in 2007.

It worked pretty well! However, it used a gnarly cPanel-based "infrastructure" and was hosted on a fairly iffy host called midPhase (which is still around, somehow?).

For one reason or another, I decided to not renew my hosting plan with them, so they shut down my account and deleted all of my content. Years later, I regretted this decision.

Fortunately, I had been syndicating all of the content of that blog through Feedburner, and I was able to extract all of the posts I made using their RSS feed. I lost my photos, though.

Aside: I'm not really sure what happened to Feedburner, but whatever it is now is not what it used to be. I just logged in and don't really understand what it is.

Several years later (2009/2010), I started blogging again, albeit with some infrequency, on Blogger. I posted only a few times there, but it still lives. I haven't had the heart, or desire, to take it down.

The Age of Jekyll

At some point soon thereafter (2010-2011ish), I must have gotten annoyed with having all of my content owned and hosted by a third party (having been burned by my original WordPress blog). I also wanted to try out a cool new static site generator called Jekyll, written by Tom Preston-Werner. This post from Chris Parsons is emblematic of the Jekyll zeitgeist in late 2009 (HN submission here).

At the time, Jekyll was also the only static site generator that GitHub Pages supported out of the box, which made the setup process very straightforward.

For the next several years, writing new posts was a breeze: create a new markdown document. Save. Commit. Push. Published.

About a year later, I decided to add a gem or two that weren't compatible with GitHub Pages, so I migrated everything to CloudFront and S3. That was fun.

And then I took a little bit of a blogging break.

Oops.

What I hadn't considered was that Ruby is not a platform or language that cares too much about backward compatibility. I also didn't consider that Jekyll was early-stage software and would change a lot (anyone could have predicted this 😬).

To be fair, this wasn't a problem for quite a while. But as the years piled on, the Ruby ecosystem's penchant for breaking changes slowly started to bite me: Jekyll 4 was released in 2019, then Ruby 3 came out in 2020, and soon enough I realized I literally couldn't publish a new blog post without upgrading more than 10 gems (iykyk).

So I just stopped!

Which, when you think about it, is kind of wild: I stopped blogging because I literally couldn't run bundle exec jekyll build, and migrating everything to the latest versions just wasn't going to be worth the time. I kept pushing it off, and before I knew it, several years had passed.

Durable, Portable, and Easy to Understand

In the last two weeks, I decided enough was enough. I needed to decide whether to nuke my entire setup or try to figure out a migration strategy to a new system. Jekyll just wasn't going to cut it; even if I fixed the issues now, in a few years the whole problem would resurface, like a game of whack-a-mole.

Not to mention, I also have a few other static sites running Jekyll, all with the same issues, and they would need to be moved too. But what system to migrate to? Obviously, it's never anyone's immediate thought to build their own static site generator, so I wasn't going to do that. I'm not that unhinged (or am I?).

So I came up with a list of good alternatives, and I literally ported my blog to each one of them. Aside: I still kind of can't believe I did that.

With some human assistance, ChatGPT threw together this feature matrix, which does a fairly good job of summarizing the pros/cons between the options I landed on:

Feature	Hugo	Vite / React Router	Astro
Speed	🟢 Fastest	🟠 Moderate (JavaScript-heavy)	🟢 Fast (optimized builds)
Mental Overhead	🟢 Low (simple templates)	🟠 Moderate (React concepts, routing)	🟢 Low (modern, intuitive workflow)
Interactivity	🔴 Limited	🟢 Full React SPA/MPA	🟢 Hybrid (JS islands)
Flexibility	🟢 Content-focused	🟢 Component-based SPA	🟢 Best of both worlds
Community Support	🟢 Mature	🟢 Mature (React ecosystem)	🟠 Growing
Future Proof	🟢 Stable and reliable	🟠 Moderate (npm/Node issues)	🟠 Moderate (npm/Node issues)
SEO Optimization	🟢 Excellent (purely static)	🟠 Requires extra effort (SSR or hydration)	🟢 Excellent (static-first with flexibility)

So that feature matrix is maybe helpful, but my actual experiences went like this:

Hugo

Converting templates was frustrating since error messages were hard to understand.
Documentation was extensive but poorly organized. I often found myself digging through pages of search results to uncover how to do simple things, like how to display an image.
Tons of mental overhead with respect to how templates are actually rendered. See their documentation on template lookup order as an example.
Unclear when to use shortcodes versus variables / other alternatives.

Tons of boilerplate to do simple things, such as including a CSS file. E.g.:

{{ $opts := dict "transpiler" "libsass" "targetPath" "css/style.css" }}
{{ with resources.Get "sass/main.scss" | toCSS $opts | minify | fingerprint }}
  <link rel="stylesheet" href="{{ .RelPermalink }}" integrity="{{ .Data.Integrity }}" crossorigin="anonymous">
{{ end }}

Vite / React Router

I'm super familiar with this stack, so getting set up was quick.
Had to roll my own post management system, since there wasn't anything off the shelf that would work as a CMS.
The site was fast to compile and run.
Even though I could prerender all of the pages, my prediction is that SEO would take a hit due to the sloppy HTML.
Based on the Node stack so I would expect the entire thing to require an upgrade in a few years. :(

Astro

Heard tons of good things about this setup lately, so I figured: why the heck not?
I thought the entire frontmatter architecture, where TypeScript code lives at the top of .astro files, was very weird.
Didn't love the custom file format (".astro"). Requires an editor plugin to read them correctly and none of my auto formatting tools worked out of the box.
It was fast-ish.
Also built on the Node stack, and a newer technology, so inevitably I'd run into some issues in the future.

Recap

In the end, Hugo's documentation and verbosity scared me away. Vite/React Router was nice, but it wouldn't be great for SEO and I'd have to deal with Node.js. Astro was a nice experience, but I found the .astro file format and the somewhat proprietary "feeling" of the stack to be a turnoff. And since it's based on the JavaScript ecosystem as well, I'd still have to deal with the future-proofing problem.

Maybe I'd have to settle. But something I realized while trying out all of these options led me down an unexpected path. First, let's review how I ported things over.

Porting

Trying out a new static site generator involved several steps:

Read the docs for 15+ minutes to learn the basics.
Set up the stack.
Copy all markdown files from the old Jekyll setup to the new location.
Copy all templates from the old Jekyll setup to the new location.
Copy static files.
Run the dev server and iterate for several hours until all posts rendered somewhat correctly.
- This involved a lot of searching and replacing, over and over. Even with AI assistance, it was a slog.
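
For a flavor of that searching and replacing, here's a hypothetical Go sketch of one common Jekyll-specific rewrite: converting Liquid {% highlight %} blocks into fenced markdown code blocks. The convertHighlights function and its regular expressions are illustrative only; they handle the simple tag forms, not every Liquid variant.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fence is a markdown code fence, built with strings.Repeat so this snippet
// never contains three literal backticks in a row.
var fence = strings.Repeat("`", 3)

var (
	// Matches opening tags such as {% highlight go %}, {% highlight ruby linenos %}
	// or {%- highlight go -%}, capturing the language name.
	highlightOpen = regexp.MustCompile(`\{%-?\s*highlight\s+(\w+)[^%]*%\}`)
	// Matches the closing {% endhighlight %} tag (optionally with - trim markers).
	highlightClose = regexp.MustCompile(`\{%-?\s*endhighlight\s*-?%\}`)
)

// convertHighlights rewrites Liquid highlight blocks into fenced code blocks,
// keeping the language name after the opening fence.
func convertHighlights(src string) string {
	out := highlightOpen.ReplaceAllString(src, fence+"${1}")
	return highlightClose.ReplaceAllString(out, fence)
}

func main() {
	post := "Some text.\n{% highlight go %}\nfmt.Println(\"hi\")\n{% endhighlight %}\n"
	fmt.Print(convertHighlights(post))
}
```

Multiply that by every plugin, include, and filter a theme uses, and the hours add up quickly.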

So that was fun. At some point while porting to Astro, I realized it could plug into a custom loader and read from a database. "That's interesting," I thought. "Could I just port all of my blog posts to a database, and then make text edits simply by running a SQL query?" Well, yes. I could.

And then, if I ever needed to move to a different static-site generator in the future, I could just keep using the same database file to regenerate the files or read them from a loader. Sweet. Maybe Astro is the move!

So I wrote a little script that converted all of my markdown files to a SQLite database, and started writing some code in Astro to read those posts. But after about an hour of wrangling with the docs, I still couldn't get it to work! It was just too confusing. At this point I realized this was not sustainable: if I took a month-long break from blogging, I'd never remember how any of it worked.

I felt lost and frustrated. What was I to do?

But then I stepped back, and I realized that my entire mental model of how a static site generator should work was off. I had been thinking about it completely wrong.

Logic != Data

Blog posts are data! Static pages are data! Templates, however, are code. They contain "business logic". Business logic belongs in code.

Data belongs in a database. Why am I storing my blog posts in files, and treating them like code? They should live in a database that any static site generator can read from. That way my data is permanent and not tied to my static site generator. That's how it should be.

Weirdly, though, not many (any?) popular static site generators use a database as a backend. Pretty much everyone uses flat files. Which is kinda nice (you can see diffs!) but honestly, I don't really care about diffs. I just want my blog to work, be stable, and my data to be portable.

At this point I realized where I was headed: I was going to need to build a static site generator.

It would need to be built on technology that would stay future-proof, with a data backend that would never become outdated.

A Wild Static Site Generator Appears

The fact that Hugo was built on Go gave me a hint about which direction to take, and I'd built plenty of software in Go that still worked without changes more than a decade later. So the programming language choice was clear: I would build it in Go.

For the data side, the choice was also quite clear since I already had a SQLite database from my Astro experiment with all of my blog posts and pages. I didn't even need to do any additional work there. There was some schema and data modification, sure, but it just involved a few SQL queries and I was set.

Some Notes on SQLite

First of all, SQLite is just awesome. It's incredible software. It can literally do anything. Ok, maybe not, but it's really flexible and solid.

One thing you may not know is that SQLite is a great "filesystem". It can store files super efficiently, and there's even a tool written by the SQLite authors called sqlar which is effectively equivalent to ZIP in terms of performance and space requirements, with the added benefit that you can treat the entire archive as a database!

You might be wondering: "Dan, can you really store a SQLite database in a Git repo?" The answer is yes.

I had an intuition that Git's delta compression would store SQLite databases quite efficiently, and based on my own research and that of others, it turns out I was right: Git excels at storing SQLite databases in version control. It's even on par with plain text.

Add a custom diff driver, use .gitattributes to specify which file types it applies to, and we have a setup that shows readable diffs and is as space-efficient as storing raw markdown files.
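
One common way to wire this up is a textconv diff driver that dumps the database as SQL text before Git diffs it. This fragment assumes the database files end in .db (a hypothetical choice) and that the sqlite3 command-line shell is installed; the driver name "sqlite3" is arbitrary.

```ini
# .gitattributes: send *.db files through a diff driver named "sqlite3"
*.db diff=sqlite3

# .git/config (or ~/.gitconfig): turn the database into SQL text for diffs
[diff "sqlite3"]
	binary = true
	textconv = "echo .dump | sqlite3"
```

Git appends the file path to the textconv command, so sqlite3 opens the database and runs .dump from standard input. This only changes how diffs are displayed; Git still stores the binary file itself.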

Go

Go is remarkably solid. Programs I wrote in Go over a decade ago still run without modification to anything, even the build system.

And that's expected, as one of Go's core design principles is backwards compatibility. From the linked document:

In the quoted text from “Go 1 and the Future of Go Programs” at the top of this post, the ellipsis hid the following qualifier:
At some indefinite point, a Go 2 specification may arise, but until that time, [… all the compatibility details …].
That raises an obvious question: when should we expect the Go 2 specification that breaks old Go 1 programs?
The answer is never. Go 2, in the sense of breaking with the past and no longer compiling old programs, is never going to happen. Go 2 in the sense of being the major revision of Go 1 we started toward in 2017 has already happened.
There will not be a Go 2 that breaks Go 1 programs. Instead, we are going to double down on compatibility, which is far more valuable than any possible break with the past. In fact, we believe that prioritizing compatibility was the most important design decision we made for Go 1.
So what you will see over the next few years is plenty of new, exciting work, but done in a careful, compatible way, so that we can keep your upgrades from one toolchain to the next as boring as possible.

How fricking awesome is that? Whatever you write in Go will continue to work, forever.

Go does have its own warts, but so does every language. You have to prioritize your needs when making software design decisions, and in my case, reliability and future-proofing trumped almost everything else.

Go is also very simple. Simple is good!

Doing The Thing

So yeah, I built a static site generator in Go, with SQLite as the database stored in version control, and it's running this very blog right now. I wrote a GitHub Action that generates the pages (it takes <20s to build and deploy the entire site!), and the result is hosted on GitHub Pages.

I even wrote a little editor that runs when I start the site locally with go run, so I can edit posts in the browser instead of in a database management tool (I got tired of that quite quickly!).
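
An editor like that doesn't need much machinery. Here's a hypothetical Go sketch using only net/http and html/template: a map stands in for the posts table, and the -edit flag, the /edit route, and the port are all assumptions for illustration. It binds to localhost because there's no authentication.

```go
package main

import (
	"flag"
	"fmt"
	"html/template"
	"log"
	"net/http"
	"sync"
)

// store stands in for the posts table; a real version would run SELECT and
// UPDATE queries against the SQLite file instead of using a map.
type store struct {
	mu    sync.Mutex
	posts map[string]string // slug -> markdown body
}

func (s *store) get(slug string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	body, ok := s.posts[slug]
	return body, ok
}

func (s *store) save(slug, body string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.posts[slug] = body
}

// editForm escapes the body, so post content can't break the page markup.
var editForm = template.Must(template.New("edit").Parse(
	`<form method="post"><textarea name="body">{{.}}</textarea><button>Save</button></form>`))

// editHandler shows a form on GET /edit?slug=... and saves it on POST.
func editHandler(s *store) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		slug := r.URL.Query().Get("slug")
		body, ok := s.get(slug)
		if !ok {
			http.NotFound(w, r)
			return
		}
		switch r.Method {
		case http.MethodGet:
			if err := editForm.Execute(w, body); err != nil {
				log.Print(err)
			}
		case http.MethodPost:
			s.save(slug, r.FormValue("body"))
			// Redirect back to the form so a refresh doesn't resubmit.
			http.Redirect(w, r, r.URL.String(), http.StatusSeeOther)
		default:
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		}
	}
}

func main() {
	edit := flag.Bool("edit", false, "serve the local post editor on localhost:8080")
	flag.Parse()
	if !*edit {
		fmt.Println("editor disabled; run with -edit")
		return
	}
	s := &store{posts: map[string]string{"yak-shave": "It goes like this:"}}
	mux := http.NewServeMux()
	mux.Handle("/edit", editHandler(s))
	log.Fatal(http.ListenAndServe("localhost:8080", mux))
}
```

Running it with go run . -edit and opening /edit?slug=yak-shave would show the form for that post.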

I'm super happy with how everything turned out. Dependency management is a breeze, and the entire thing is blazing fast. It preprocesses SCSS and code blocks, optimizes images, and minifies CSS, HTML, and JavaScript.

There are obviously a lot of rough edges given that this is pre-alpha software, but those will be smoothed out soon enough as I port a few other websites over to the new system.

Some things I want to do next:

Allow an entire website to be bundled as a single executable file, with all static content included, so it can run as a full web server. Is there something like Cosmopolitan, but for Go?
Make it installable with go install so it can just work as a binary.
Write a standard spec for the database schema that will handle most static sites.
Turn it into a library so I can simply import it into another Go file to extend behavior.
Improve the editing experience, using something like Editor.js.
Open source it. 🎉

All in all, this was a fun learning experience and I think I built something pretty cool. I'm really excited to migrate my other websites and see my work pay off (I'm really hoping this will be the last time! 😅🤞🏻).