---
title: Implementing Kakoune Syntax Highlighting In Pygments
date: July 04, 2025
---

As a programmer, one thing I care about a _lot_ is syntax highlighting. In fact, the main reason I created [Ashen] was to have more control over it. In my view, if you're going to spend all day looking at text, _it should at least look pleasant_. This naturally carries over to blogging as well.

Over the past few months, I've become obsessed with [Kakoune]. I've been customizing it extensively, writing plugins, contributing to the wiki, and participating in its small (but _incredibly_ active and welcoming) community. And, well, when I get _this_ into something, I want to write about it!

However, here's the problem: Kakoune doesn't have many users. It has around 10k stars on GitHub, while [Helix], a project that was directly inspired by it, has over 38k. I don't mind that Kakoune is "unpopular". I enjoy the smaller, tighter-knit community, but I'd be lying if I said it wasn't inconvenient at times. One such time is _getting Kakoune syntax highlighting on my blog_.

Most SSG setups (including [Zona], my home-brewed project) rely on external libraries to provide code highlighting. For example, this website uses [Pygments], which is a mature Python library. Now, Pygments boasts support for "a wide range of 597 languages and other text formats". **Kakoune is _not_ among them**. That means that if I wanted Kakoune highlighting, I'd have to do it myself.

Perhaps unsurprisingly, Kakoune provides highlighting for its own syntax. Helpfully, this highlighting is _itself_ implemented as Kakoune commands (sometimes referred to as Kakscript). Why is this helpful? Because Kakoune highlighters are defined with regular expressions, which saves us some mental work if we want to port the highlighting to another platform. There's a caveat, however: the regex engine must be capable of recursion.
This is thanks to the weirdness that is Kakoune's shell blocks, and how they interact with balanced delimiters. Without getting too detailed, Kakoune's balanced strings are... [complicated](https://github.com/mawww/kakoune/blob/master/doc/pages/command-parsing.asciidoc). This wouldn't normally be a problem, because strings that aren't wrapped in double/single quotes aren't highlighted anyway. However, that's not true for shell blocks: the contents of `%sh{...}` should be highlighted as POSIX shell script.

The problem? The `%sh` delimiter can be _anything_. Literally. Kakoune's standard RC **itself** uses `%§` as a delimiter. This means that the following two snippets are parsed exactly the same:

```kak
evaluate-commands %sh{
    printf '%s\n' "%sh{ echo 'hi' }"
}
```

```kak
evaluate-commands %sh∴
    printf '%s\n' "%sh{ echo 'hi' }"
∴
```

All of this makes implementing a true Kakoune lexer for a library like Pygments, which doesn't natively support recursive regex, a non-trivial task. To be honest, I barely understand how it's done in Kakoune in the first place.

Luckily, a friend of mine pointed out something very interesting the other day when he sent me a Kakoune snippet over Discord _with highlighting._ It didn't look great, but it was actually highlighted! _In **Discord**!_

As it turns out, all he did was mark the code block as `sh` instead of `kak`. Kakoune's _actual_ syntax (the parts outside balanced `%sh` strings) is _visually_ very similar to POSIX `sh`.

After this realization, implementing a Kakoune Lexer was a much more straightforward task: all I had to do was extend the existing Bash Lexer and add some keywords!

Of course, the result isn't _perfect_. The lexer can't tell the difference between inside and outside `%sh` strings; shell keywords are highlighted at the root level of the code, and Kakoune keywords are highlighted inside shell blocks.
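
To make the "extend the Bash Lexer and add some keywords" idea concrete, here's a minimal sketch of what that could look like. This is _not_ the actual source of `pygments-kakoune`: the class name `KakouneSketchLexer`, the `kak-sketch` alias, and the short command list are all made up for illustration, and I'm assuming the keywords get added to `BashLexer`'s `basic` state.

```python
from pygments.lexer import inherit, words
from pygments.lexers.shell import BashLexer
from pygments.token import Keyword


class KakouneSketchLexer(BashLexer):
    """Hypothetical lexer: Bash highlighting plus a few Kakoune commands."""

    name = "KakouneSketch"
    aliases = ["kak-sketch"]  # made-up alias, chosen to avoid clashing with a real one
    filenames = ["*.kak"]

    # Illustrative subset of Kakoune commands, nowhere near exhaustive.
    KAK_COMMANDS = (
        "evaluate-commands",
        "define-command",
        "declare-option",
        "set-option",
        "add-highlighter",
        "hook",
        "map",
        "info",
    )

    tokens = {
        "basic": [
            # Match whole command names only; "-" counts as part of a word here,
            # so "my-hook" is not mistaken for "hook".
            (words(KAK_COMMANDS, prefix=r"(?<![\w-])", suffix=r"(?![\w-])"), Keyword),
            # Keep every rule BashLexer already defines for this state.
            inherit,
        ],
    }
```

Because the Kakoune rule sits in front of `inherit`, it's tried before the inherited Bash rules in the `basic` state. A command like `evaluate-commands` comes out as a keyword, and everything else falls through to regular Bash highlighting.
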
The _correct_ way would be to properly detect balanced `%sh` strings and delegate their contents to the Bash Lexer. The following snippet (at the time of writing) is **not** highlighted correctly:

```kak
set buffer filetype kak
evaluate-commands %sh{
    echo define-command is-kak %< info -title is-kak 'Not Kak!' >
}
```

Properly detecting these strings isn't currently possible with Pygments' `RegexLexer`. I'd need to subclass the base lexer and implement my own token scanning. Is it possible? Absolutely. Do I want to do it? **Absolutely not**.

For now, please enjoy the janky, _but functional_ Kakoune syntax highlighting I created. The plugin is also available as the `pygments-kakoune` package on [sr.ht](https://git.sr.ht/~ficd/pygments-kakoune) and [PyPI](https://pypi.org/project/pygments-kakoune/) if you want to use it in your own projects.

[Zona]: https://git.sr.ht/~ficd/zona
[Ashen]: https://sr.ht/~ficd/ashen
[Kakoune]: https://kakoune.org
[Helix]: https://github.com/helix-editor/helix
[Pygments]: https://pygments.org/