Daniel Fichtinger 5a77fb4ee1
2025-07-13 01:43:09 -04:00


---
title: Implementing Kakoune Syntax Highlighting In Pygments
date: July 04, 2025
---
As a programmer, one thing I care about a _lot_ is syntax highlighting. In fact,
the main reason I created [Ashen] was to have more control over it. In my view,
if you're going to spend all day looking at text, _it should at least look
pleasant_. This naturally carries over to blogging as well.
Over the past few months, I've become obsessed with [Kakoune]. I've been
customizing it extensively, writing plugins, contributing to the wiki, and
participating in its small (but _incredibly_ active and welcoming) community.
And, well, when I get _this_ into something, I want to write about it!
However, here's the problem: Kakoune doesn't have many users. It has around 10k
stars on GitHub, while [Helix], a project directly inspired by it, has over
38k. I don't mind that Kakoune is "unpopular"; I enjoy the smaller, tighter-knit
community. But I'd be lying if I said it wasn't inconvenient at times.
One such time is _getting Kakoune syntax highlighting on my blog_. Most SSG
setups (including [zona], my home-brewed project) rely on external libraries to
provide code highlighting. For example, this website uses [Pygments], which is a
mature Python library. Now, Pygments boasts support for "a wide range of 597
languages and other text formats".
**Kakoune is _not_ among them**. This meant that if I wanted Kakoune
highlighting, I'd have to do it myself. Perhaps unsurprisingly, Kakoune provides
highlighting for its own syntax. Helpfully, this highlighting is _itself_
implemented as Kakoune commands (sometimes referred to as Kakscript). Why is
this helpful? Because Kakoune highlighters are defined in terms of regular
expressions, which saves us some mental work when porting the highlighting to
another platform.
There's a caveat, however: the regex engine must be capable of recursion. This
is thanks to the weirdness that is Kakoune's shell blocks, and how they interact
with balanced delimiters.
Without getting too detailed, Kakoune's balanced strings are...
[complicated](https://github.com/mawww/kakoune/blob/master/doc/pages/command-parsing.asciidoc).
This wouldn't normally be a problem, because strings that aren't wrapped in
double or single quotes aren't highlighted anyway. However, that's not true for
shell blocks: the contents of `%sh{...}` should be highlighted as POSIX shell
script.
The problem? The `%sh` delimiter can be _anything_. Literally. Kakoune's
standard RC **itself** uses `%§` as a delimiter. This means that the following
two snippets are parsed exactly the same way:
```kak
evaluate-commands %sh{
printf '%s\n' "%sh{ echo 'hi' }"
}
```
```kak
evaluate-commands %sh∴
printf '%s\n' "%sh{ echo 'hi' }"
∴
```
All of this makes implementing a true Kakoune lexer for a library like Pygments,
which doesn't natively support recursive regex, a non-trivial task. To be
honest, I barely understand how it's done in Kakoune in the first place.
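To make the recursion problem concrete, here's a small Python sketch. Pygments'
`RegexLexer` compiles its rules with Python's standard `re` module, which has no
recursion construct, so a pattern can only guess where a block ends. The input
below is an assumed example: the nested snippet from above, followed by a
second, unrelated `%sh` block.

```python
import re

# Assumed example input: the nested snippet from above, plus a second,
# unrelated %sh block on the last line.
source = (
    "evaluate-commands %sh{\n"
    "printf '%s\\n' \"%sh{ echo 'hi' }\"\n"
    "}\n"
    "evaluate-commands %sh{ echo 'bye' }\n"
)

# What a correct parser should extract as the body of the first block.
expected_body = "\nprintf '%s\\n' \"%sh{ echo 'hi' }\"\n"

# Lazy: stops at the first "}", which closes the *inner* %sh{...}.
lazy = re.search(r"%sh\{(.*?)\}", source, re.DOTALL)

# Greedy: runs to the last "}", swallowing the second block as well.
greedy = re.search(r"%sh\{(.*)\}", source, re.DOTALL)

print("lazy:   ", repr(lazy.group(1)))
print("greedy: ", repr(greedy.group(1)))
print("correct:", repr(expected_body))
```

The lazy match ends inside the inner string, and the greedy one overshoots into
the next block. Finding the right brace means counting nesting depth, which
needs either state outside the regex or a recursive pattern like `(?R)`
(supported by the third-party `regex` module, but not by `re`).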
Luckily, a friend of mine pointed out something very interesting the other day
when he sent me a Kakoune snippet over Discord _with highlighting._ It didn't
look great, but it was actually highlighted! _In **Discord**!_
As it turns out, all he did was mark the code block as `sh` instead of `kak`.
Kakoune's _actual_ syntax (the parts outside balanced `%sh` strings) is
_visually_ very similar to POSIX `sh`. After this realization, implementing a
Kakoune lexer became a much more straightforward task: all I had to do was
extend Pygments' existing `BashLexer` and add some keywords!
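Here's a minimal sketch of that idea. It is not the actual `pygments-kakoune`
source: the class name, alias, and the short command list are hypothetical. It
relies on Pygments' `inherit` marker, which splices the parent lexer's rules for
a state into the subclass's rule list at that position, so the Kakoune commands
are tried before Bash's own `basic` rules.

```python
from pygments.lexer import inherit, words
from pygments.lexers.shell import BashLexer
from pygments.token import Keyword

# Hypothetical sample of Kakoune commands; a real list would be much longer.
KAK_COMMANDS = (
    "add-highlighter", "declare-option", "define-command",
    "evaluate-commands", "hook", "map", "set", "set-option",
)


class KakouneSketchLexer(BashLexer):
    """BashLexer plus a few Kakoune command names highlighted as keywords."""

    name = "Kakoune (sketch)"
    aliases = ["kak-sketch"]  # hypothetical alias, chosen to avoid clashes
    filenames = ["*.kak"]

    tokens = {
        # BashLexer's "root" state includes "basic", so these rules apply
        # everywhere Bash's own keywords do (including inside %sh blocks).
        "basic": [
            # The lookarounds treat "-" as part of a word, so "set" does not
            # match the front of "set-face", and "map" does not match "unmap".
            (
                words(KAK_COMMANDS, prefix=r"(?<![\w-])", suffix=r"(?![\w-])"),
                Keyword,
            ),
            # Splice in BashLexer's original "basic" rules after ours.
            inherit,
        ],
    }


lexer = KakouneSketchLexer()
for token, value in lexer.get_tokens("set-option global tabstop 4\n"):
    print(token, repr(value))
```

Because everything else is inherited from `BashLexer`, this also reproduces the
weakness described next: the lexer has no idea whether it's inside a `%sh`
block or not.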
Of course, the result isn't _perfect_. The lexer can't distinguish code inside
`%sh` strings from code outside them: shell keywords are highlighted at the root
level of the code, and Kakoune keywords are highlighted inside shell blocks. The
_correct_ approach would be to properly detect balanced `%sh` strings and
delegate their contents to the Bash lexer. The following snippet (at the time of
writing) is **not** highlighted correctly:
```kak
set buffer filetype kak
evaluate-commands %sh{
echo define-command is-kak %< info -title is-kak 'Is Kak!' >
}
```
By contrast, here's how the `%sh` string _should_ look:
```sh
echo define-command is-kak %< info -title is-kak 'Is Kak!' >
```
Properly detecting these strings isn't currently possible with Pygments'
`RegexLexer`. I'd need to subclass the base lexer and implement my own token
scanning. Is it possible? Absolutely. Do I want to do it? **Absolutely not**.
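For the curious, the custom token scanning mentioned above might look roughly
like this. It's a hypothetical sketch, not part of `pygments-kakoune`: it
subclasses Pygments' base `Lexer`, finds each `%sh` string with a hand-written
depth counter (the part a recursion-free regex can't express), and hands only
the body to `BashLexer`. The helper `balanced_end`, the class name, and the
delimiter rules in `PAIRS` are assumptions. Code outside `%sh` strings is left
as plain `Text`; a fuller version would run it through a Kakoune keyword lexer.

```python
from pygments.lexer import Lexer
from pygments.lexers.shell import BashLexer
from pygments.token import Punctuation, Text

# Assumption: only these four pairs nest; any other delimiter (such as "§")
# closes at its next occurrence, and there is no escaping inside the string.
PAIRS = {"{": "}", "(": ")", "[": "]", "<": ">"}


def balanced_end(text, body_start, opener):
    """Return the index of the delimiter closing a %-string, or -1."""
    closer = PAIRS.get(opener, opener)
    depth = 1
    for i in range(body_start, len(text)):
        if text[i] == closer:
            depth -= 1
            if depth == 0:
                return i
        elif opener != closer and text[i] == opener:
            depth += 1
    return -1


class DelegatingKakouneLexer(Lexer):
    """Hypothetical sketch: Bash tokens inside %sh strings, plain Text outside."""

    name = "Kakoune (delegating sketch)"
    aliases = ["kak-delegating-sketch"]

    def __init__(self, **options):
        super().__init__(**options)
        self.shell = BashLexer(**options)

    def get_tokens_unprocessed(self, text):
        pos = 0  # start of the text not yet yielded
        search = 0  # where to look for the next "%sh"
        while True:
            start = text.find("%sh", search)
            if start == -1 or start + 3 >= len(text):
                break
            opener = text[start + 3]
            if opener.isalnum() or opener.isspace():
                search = start + 3  # e.g. "%shell": not a %sh string
                continue
            end = balanced_end(text, start + 4, opener)
            if end == -1:
                break  # unterminated: leave the rest as plain text
            if start > pos:
                yield pos, Text, text[pos:start]
            yield start, Punctuation, text[start:start + 4]
            # Lex only the body with BashLexer, shifting its offsets back
            # into the coordinates of the full document.
            body = text[start + 4:end]
            for offset, token, value in self.shell.get_tokens_unprocessed(body):
                yield start + 4 + offset, token, value
            yield end, Punctuation, text[end]
            pos = search = end + 1
        if pos < len(text):
            yield pos, Text, text[pos:]


src = (
    "set buffer filetype kak\n"
    "evaluate-commands %sh{\n"
    "echo define-command is-kak %< info -title is-kak 'Is Kak!' >\n"
    "}\n"
)
for index, token, value in DelegatingKakouneLexer().get_tokens_unprocessed(src):
    print(index, token, repr(value))
```

Even this toy version has to care about offsets, unterminated strings, and
false positives like `%shell`, which is a good hint at why I stopped at the
`BashLexer` subclass.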
For now, please enjoy the janky, _but functional_ Kakoune syntax highlighting I
created. The plugin is also available as the `pygments-kakoune` package on
[sr.ht](https://git.sr.ht/~ficd/pygments-kakoune) and
[PyPI](https://pypi.org/project/pygments-kakoune/) if you want to use it in your
own projects.
[zona]: https://git.ficd.sh/ficd/zona
[Ashen]: https://sr.ht/~ficd/ashen
[Kakoune]: https://kakoune.org
[Helix]: https://github.com/helix-editor/helix
[Pygments]: https://pygments.org/
*[SSG]: Static Site Generator