title: Implementing Kakoune Syntax Highlighting In Pygments
date: July 04, 2025

As a programmer, one thing I care about a lot is syntax highlighting. In fact, the main reason I created Ashen was to have more control over it. In my view, if you're going to spend all day looking at text, it should at least look pleasant. This naturally carries over to blogging as well.

Over the past few months, I've become obsessed with Kakoune. I've been customizing it extensively, writing plugins, contributing to the wiki, and participating in its small (but incredibly active and welcoming) community. And, well, when I get this into something, I want to write about it!

However, here's the problem: Kakoune doesn't have many users. It has around 10 thousand stars on GitHub, while Helix, a project that was directly inspired by it, has over 38 thousand. I don't mind that Kakoune is "unpopular". I enjoy the smaller, tighter-knit community, but I'd be lying if I said it wasn't inconvenient at times.

One such time is getting Kakoune syntax highlighting on my blog. Most SSG setups (including Zona, my home-brewed project) rely on external libraries to provide code highlighting. For example, this website uses Pygments, which is a mature Python library. Now, Pygments boasts support for "a wide range of 597 languages and other text formats".

Kakoune is not among them, which meant that if I wanted Kakoune highlighting, I'd have to do it myself. Now, perhaps unsurprisingly, Kakoune provides highlighting for its own syntax. Helpfully, this highlighting is itself implemented as Kakoune commands (sometimes referred to as Kakscript). Why is this helpful? Because Kakoune highlighters are defined with regular expressions, which saves us some mental work if we want to port the highlighting to another platform.

There's a caveat, however: the regex engine must be capable of recursion. This is thanks to the weirdness that is Kakoune's shell blocks, and how they interact with balanced delimiters.

Without getting too detailed, Kakoune's balanced strings are... complicated. This wouldn't normally be a problem, because strings that aren't wrapped in double/single quotes aren't highlighted anyways. However, that's not true for shell blocks: the contents of %sh{...} should be highlighted as POSIX shell script.

The problem? The %sh delimiter can be anything. Literally. Kakoune's standard RC itself uses ∴ as a delimiter. This means that the following two snippets are parsed exactly the same:

evaluate-commands %sh{
  printf '%s\n' "%sh{ echo 'hi' }"
}

evaluate-commands %sh∴
  printf '%s\n' "%sh{ echo 'hi' }"
∴
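To make the delimiter rule concrete, here is a minimal Python sketch of how the end of a %sh block could be located. It is not Kakoune's actual parser: the function name `sh_block_body` and the `PAIRS` table are hypothetical, and the sketch assumes that bracket-style delimiters nest while any other character simply closes itself, ignoring whatever escaping rules Kakoune applies.

```python
# Bracket pairs that nest; any other delimiter character closes itself.
# (Assumption for this sketch, not a copy of Kakoune's parser.)
PAIRS = {"{": "}", "(": ")", "[": "]", "<": ">"}


def sh_block_body(text: str, start: int = 0) -> str:
    """Return the body of the first %sh block found at or after `start`."""
    begin = text.index("%sh", start) + len("%sh")
    opener = text[begin]
    closer = PAIRS.get(opener, opener)
    depth = 1
    pos = begin + 1
    while pos < len(text):
        char = text[pos]
        if opener != closer and char == opener:
            depth += 1  # nested opener: one more closer is now required
        elif char == closer:
            depth -= 1
            if depth == 0:
                return text[begin + 1:pos]
        pos += 1
    raise ValueError("unterminated %sh block")


braces = r"""evaluate-commands %sh{
  printf '%s\n' "%sh{ echo 'hi' }"
}"""

therefore = r"""evaluate-commands %sh∴
  printf '%s\n' "%sh{ echo 'hi' }"
∴"""

# Both spellings should yield the same shell body.
print(sh_block_body(braces) == sh_block_body(therefore))
```

With the brace delimiter, the inner `%sh{ ... }` raises the depth and its closing brace lowers it again, so the outer block only ends at the final `}`. With ∴ there is nothing to nest, and the inner braces are just ordinary characters.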

All of this makes implementing a true Kakoune lexer for a library like Pygments, which doesn't natively support recursive regex, a non-trivial task. To be honest, I barely understand how it's done in Kakoune in the first place.

Luckily, a friend of mine pointed out something very interesting the other day when he sent me a Kakoune snippet over Discord with highlighting. It didn't look great, but it was actually highlighted! In Discord!

As it turns out, all he did was denote the code block as sh instead of kak. Kakoune's actual syntax (the parts outside balanced %sh strings) is visually very similar to POSIX sh. After this realization, implementing a Kakoune lexer was a much more straightforward task: all I had to do was extend the existing Bash lexer and add some keywords!
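As a rough illustration of that approach (not the actual pygments-kakoune source), a subclass of Pygments' `BashLexer` can prepend one extra rule to the `root` state and inherit everything else. The class name `KakSketchLexer`, its alias, and the short `KAK_COMMANDS` list are assumptions made up for this sketch.

```python
from pygments.lexer import inherit, words
from pygments.lexers.shell import BashLexer
from pygments.token import Keyword

# Illustrative subset of Kakoune commands, not the package's real list.
KAK_COMMANDS = (
    "add-highlighter",
    "declare-option",
    "define-command",
    "evaluate-commands",
    "set-option",
)


class KakSketchLexer(BashLexer):
    """BashLexer plus a few Kakoune command names (hypothetical sketch)."""

    name = "Kakoune (sketch)"
    aliases = ["kak-sketch"]
    filenames = []

    tokens = {
        "root": [
            # Try Kakoune commands first; the lookarounds keep a match from
            # starting or ending in the middle of a longer hyphenated word.
            (
                words(KAK_COMMANDS, prefix=r"(?<![\w-])", suffix=r"(?![\w-])"),
                Keyword,
            ),
            inherit,  # then fall through to every BashLexer root rule
        ],
    }


source = "define-command hello %{ echo hi }"
for token_type, value in KakSketchLexer().get_tokens(source):
    print(token_type, repr(value))
```

Because the extra rule lives in the same `root` state as the shell rules, it fires everywhere, inside and outside %sh blocks alike.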

Of course, the result isn't perfect. The lexer can't tell the inside of a %sh string from the outside: shell keywords are highlighted at the root level of the code, and Kakoune keywords are highlighted inside shell blocks. The correct approach would be to detect balanced %sh strings properly and delegate their contents to the Bash lexer. The following snippet (at the time of writing) is not highlighted correctly:

set buffer filetype kak
evaluate-commands %sh{
  echo define-command is-kak %< info -title is-kak 'Is Kak!' >
}

By contrast, here's how the %sh string should look:

echo define-command is-kak %< info -title is-kak 'Is Kak!' >

Properly detecting these strings isn't currently possible with Pygments' RegexLexer. I'd need to subclass the base lexer and implement my own token scanning. Is it possible? Absolutely. Do I want to do it? Absolutely not.
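For the curious, here is a deliberately simplified sketch of what such custom scanning might look like. It subclasses Pygments' base `Lexer`, overrides `get_tokens_unprocessed`, and hands the body of each block to `BashLexer`. The class name `ShBlockSketchLexer` is hypothetical, and the sketch assumes only the `%sh{...}` spelling with plain brace counting. Arbitrary delimiters, escaping, and Kakoune keyword highlighting are all left out.

```python
import re

from pygments.lexer import Lexer
from pygments.lexers.shell import BashLexer
from pygments.token import Punctuation, Text

SH_OPEN = re.compile(r"%sh\{")


class ShBlockSketchLexer(Lexer):
    """Hypothetical sketch: only %sh{...} bodies are sent to BashLexer."""

    name = "Kakoune sh-block sketch"
    aliases = ["kak-sh-sketch"]

    def get_tokens_unprocessed(self, text):
        bash = BashLexer()
        pos = 0
        while True:
            match = SH_OPEN.search(text, pos)
            if match is None:
                break
            # Walk forward, counting braces, to find the matching "}".
            depth, end = 1, match.end()
            while end < len(text) and depth:
                depth += {"{": 1, "}": -1}.get(text[end], 0)
                end += 1
            if depth:
                break  # unterminated block: treat the rest as plain text
            if match.start() > pos:
                yield pos, Text, text[pos:match.start()]
            yield match.start(), Punctuation, match.group()
            body_start, body_end = match.end(), end - 1
            body = text[body_start:body_end]
            # Delegate the body and shift its offsets back into `text`.
            for index, token, value in bash.get_tokens_unprocessed(body):
                yield body_start + index, token, value
            yield body_end, Punctuation, "}"
            pos = end
        if pos < len(text):
            yield pos, Text, text[pos:]


sample = "evaluate-commands %sh{ echo hi }\n"
for token_type, value in ShBlockSketchLexer().get_tokens(sample):
    print(token_type, repr(value))
```

Everything outside a block is emitted as plain `Text` here. A real implementation would still need its own rules for Kakoune commands, strings, and the other delimiter forms.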

For now, please enjoy the janky but functional Kakoune syntax highlighting I created. The plugin is also available as the pygments-kakoune package on sr.ht and PyPI if you want to use it in your own projects.