title | date |
---|---|
Implementing Kakoune Syntax Highlighting In Pygments | July 04, 2025 |
As a programmer, one thing I care about a lot is syntax highlighting. In fact, the main reason I created Ashen was to have more control over it. In my view, if you're going to spend all day looking at text, it should at least look pleasant. This naturally carries over to blogging as well.
Over the past few months, I've become obsessed with Kakoune. I've been customizing it extensively, writing plugins, contributing to the wiki, and participating in its small (but incredibly active and welcoming) community. And, well, when I get this into something, I want to write about it!
However, here's the problem: Kakoune doesn't have many users. It has around 10k stars on GitHub, while Helix, a project directly inspired by it, has over 38k. I don't mind that Kakoune is "unpopular". I enjoy the smaller, tighter-knit community, but I'd be lying if I said it wasn't inconvenient at times.
One such time is getting Kakoune syntax highlighting on my blog. Most SSG setups (including Zona, my home-brewed project) rely on external libraries to provide code highlighting. For example, this website uses Pygments, which is a mature Python library. Now, Pygments boasts support for "a wide range of 597 languages and other text formats".
Kakoune is not among them, meaning that if I wanted Kakoune highlighting, I'd have to do it myself. Now, perhaps unsurprisingly, Kakoune provides highlighting for its own syntax. Helpfully, this highlighting is itself implemented as Kakoune commands (sometimes referred to as Kakscript). Why is this helpful? Because Kakoune highlighters are defined with regular expressions, which saves us some mental work if we want to port highlighting to another platform.
There's a caveat, however: the regex engine must be capable of recursion. This is thanks to the weirdness that is Kakoune's shell blocks, and how they interact with balanced delimiters.
Without getting too detailed, Kakoune's balanced strings are... complicated.
This wouldn't normally be a problem, because strings that aren't wrapped in double/single quotes aren't highlighted anyway. However, that's not true for shell blocks: the contents of `%sh{...}` should be highlighted as POSIX shell script.
The problem? The `%sh` delimiter can be anything. Literally. Kakoune's standard RC itself uses `%§` as a delimiter. This means that the following two snippets are parsed exactly the same:
```kak
evaluate-commands %sh{
    printf '%s\n' "%sh{ echo 'hi' }"
}
```
```kak
evaluate-commands %sh∴
    printf '%s\n' "%sh{ echo 'hi' }"
∴
```
All of this makes implementing a true Kakoune lexer for a library like Pygments, which doesn't natively support recursive regex, a non-trivial task. To be honest, I barely understand how it's done in Kakoune in the first place.
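To make the problem concrete, here's a rough sketch using Python's built-in `re` module, which (like Pygments' regex rules) has no recursion. The pattern and variable names are mine, purely for illustration. A lazy, non-recursive pattern stops at the first closing brace it sees, which belongs to the inner `%sh{ ... }` rather than the outer block:

```python
import re

# The brace-delimited snippet from above, kept verbatim in a raw string.
SNIPPET = r"""evaluate-commands %sh{
    printf '%s\n' "%sh{ echo 'hi' }"
}
"""

# Hypothetical naive pattern: "%sh{", anything, then the nearest "}".
NAIVE_SH_BLOCK = re.compile(r"%sh\{(.*?)\}", re.DOTALL)

match = NAIVE_SH_BLOCK.search(SNIPPET)
naive_body = match.group(1)

# Index of the brace that really closes the outer block.
true_end = SNIPPET.rindex("}")

# The naive match ends on the inner brace, well before the real end.
stops_early = (match.end() - 1) < true_end
print(stops_early)
```

Making the pattern greedy would happen to work for this one snippet, but it would then swallow everything up to the last `}` in a file with several shell blocks. Counting nesting depth is the part a plain regex can't do.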
Luckily, a friend of mine pointed out something very interesting the other day when he sent me a Kakoune snippet over Discord with highlighting. It didn't look great, but it was actually highlighted! In Discord!
As it turns out, all he did was denote the code block as `sh` instead of `kak`. Kakoune's actual syntax (the parts outside balanced `%sh` strings) is visually very similar to POSIX `sh`. After this realization, implementing a Kakoune Lexer was a much more straightforward task: all I had to do was extend the existing Bash Lexer and add some keywords!
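In Pygments terms, that idea looks roughly like the sketch below. This is not the actual `pygments-kakoune` source: the class name, the alias, and the short command list are placeholders I picked for illustration. It subclasses `BashLexer`, puts one extra keyword rule at the front of the `root` state, and uses `inherit` to keep all of the Bash rules after it:

```python
from pygments.lexer import inherit, words
from pygments.lexers.shell import BashLexer
from pygments.token import Keyword

# Illustrative subset of Kakoune commands; a real lexer would list many more.
KAK_COMMANDS = (
    "define-command",
    "evaluate-commands",
    "declare-option",
    "set-option",
    "add-highlighter",
    "hook",
    "map",
)


class KakouneSketchLexer(BashLexer):
    """Hypothetical lexer: Bash highlighting plus a few Kakoune keywords."""

    name = "Kakoune (sketch)"
    aliases = ["kak-sketch"]
    filenames = []

    tokens = {
        "root": [
            # Match whole commands only; hyphens count as word characters here.
            (words(KAK_COMMANDS, prefix=r"(?<![\w-])", suffix=r"(?![\w-])"), Keyword),
            # Splice in every rule from BashLexer's own "root" state.
            inherit,
        ],
    }


kak_tokens = list(KakouneSketchLexer().get_tokens("define-command greet %{ echo hi }\n"))
sh_tokens = list(KakouneSketchLexer().get_tokens("if true; then :; fi\n"))
```

Because the new rule sits in the same `root` state as the Bash rules, both sets of keywords are active everywhere, which is exactly the limitation described next.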
Of course, the result isn't perfect. The lexer can't tell the inside of a `%sh` string from the outside: shell keywords are highlighted at the root level of the code, and Kakoune keywords are highlighted inside shell blocks. The correct approach would be to detect balanced `%sh` strings properly and delegate their contents to the Bash Lexer. The following snippet (at the time of writing) is not highlighted correctly:
```kak
set buffer filetype kak
evaluate-commands %sh{
    echo define-command is-kak %< info -title is-kak 'Not Kak!' >
}
```
By contrast, here's how the `%sh` string should look:
```sh
echo define-command is-kak %< info -title is-kak 'Not Kak!' >
```
Properly detecting these strings isn't currently possible with Pygments' `RegexLexer`. I'd need to subclass the base lexer and implement my own token scanning. Is it possible? Absolutely. Do I want to do it? Absolutely not.
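For the curious, the scanning half of that job might look roughly like this. It's a sketch in plain Python, not part of `pygments-kakoune`, and the function names are made up. It assumes that the four bracket pairs nest while any other delimiter simply closes at its next occurrence, and it ignores delimiter escaping entirely:

```python
# Assumption: these openers nest; every other delimiter closes itself.
NESTING_PAIRS = {"{": "}", "(": ")", "[": "]", "<": ">"}


def find_sh_block(text, start=0):
    """Return (body_start, body_end) of the first %sh string, or None."""
    marker = text.find("%sh", start)
    if marker == -1 or marker + 3 >= len(text):
        return None
    opener = text[marker + 3]
    body_start = marker + 4
    closer = NESTING_PAIRS.get(opener)
    if closer is None:
        # Non-nesting delimiter: the body ends at the next copy of it.
        end = text.find(opener, body_start)
        return None if end == -1 else (body_start, end)
    depth = 1
    for i in range(body_start, len(text)):
        if text[i] == opener:
            depth += 1
        elif text[i] == closer:
            depth -= 1
            if depth == 0:
                return (body_start, i)
    return None  # unbalanced input


def sh_body(text):
    """Return the text inside the first %sh string, or None."""
    span = find_sh_block(text)
    return None if span is None else text[span[0]:span[1]]


BRACES = r"""evaluate-commands %sh{
    printf '%s\n' "%sh{ echo 'hi' }"
}
"""
DOTS = r"""evaluate-commands %sh∴
    printf '%s\n' "%sh{ echo 'hi' }"
∴
"""
same_body = sh_body(BRACES) == sh_body(DOTS)
```

A custom lexer could hand each body found this way to the Bash Lexer and treat everything else as Kakoune. Wiring that into Pygments' token stream is the part I haven't attempted.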
For now, please enjoy the janky but functional Kakoune syntax highlighting I created. The plugin is also available as the `pygments-kakoune` package on sr.ht and PyPI if you want to use it in your own projects.