---
title: Implementing Kakoune Syntax Highlighting In Pygments
date: July 04, 2025
---

As a programmer, one thing I care about a _lot_ is syntax highlighting. In fact,
the main reason I created [Ashen] was to have more control over it. In my view,
if you're going to spend all day looking at text, _it should at least look
pleasant_. This naturally carries over to blogging as well.

Over the past few months, I've become obsessed with [Kakoune]. I've been
customizing it extensively, writing plugins, contributing to the wiki, and
participating in its small (but _incredibly_ active and welcoming) community.
And, well, when I get _this_ into something, I want to write about it!

However, here's the problem: Kakoune doesn't have many users. It has around 10k
stars on GitHub, while [Helix], a project that was directly inspired by it, has
over 38k. I don't mind that Kakoune is "unpopular". I enjoy the smaller,
tighter-knit community, but I'd be lying if I said it wasn't inconvenient at
times.

One such time is _getting Kakoune syntax highlighting on my blog_. Most SSG
setups (including [zona], my home-brewed project) rely on external libraries to
provide code highlighting. For example, this website uses [Pygments], which is a
mature Python library. Now, Pygments boasts support for "a wide range of 597
languages and other text formats".

**Kakoune is _not_ among them**, meaning that if I wanted Kakoune highlighting,
I'd have to do it myself. Now, perhaps unsurprisingly, Kakoune provides
highlighting for its own syntax. Helpfully, this highlighting is _itself_
implemented as Kakoune commands (sometimes referred to as Kakscript). Why is
this helpful? Because Kakoune highlighters are defined with regular expressions,
which saves us some mental work if we want to port highlighting to another
platform.

There's a caveat, however: the regex engine must be capable of recursion. This
is thanks to the weirdness that is Kakoune's shell blocks, and how they interact
with balanced delimiters.

Without getting too detailed, Kakoune's balanced strings are...
[complicated](https://github.com/mawww/kakoune/blob/master/doc/pages/command-parsing.asciidoc).
This wouldn't normally be a problem, because strings that aren't wrapped in
double/single quotes aren't highlighted anyway. However, that's not true for
shell blocks: the contents of `%sh{...}` should be highlighted as POSIX shell
script.

The problem? The `%sh` delimiter can be _anything_. Literally. Kakoune's
standard RC **itself** uses `%§` as a delimiter. This means that the following
two snippets are parsed exactly the same way:

```kak
evaluate-commands %sh{
printf '%s\n' "%sh{ echo 'hi' }"
}
```

```kak
evaluate-commands %sh∴
printf '%s\n' "%sh{ echo 'hi' }"
∴
```

All of this makes implementing a true Kakoune lexer for a library like Pygments,
which doesn't natively support recursive regex, a non-trivial task. To be
honest, I barely understand how it's done in Kakoune in the first place.

Luckily, a friend of mine pointed out something very interesting the other day
when he sent me a Kakoune snippet over Discord _with highlighting._ It didn't
look great, but it was actually highlighted! _In **Discord**!_

As it turns out, all he did was mark the code block as `sh` instead of `kak`.
Kakoune's _actual_ syntax (the parts outside balanced `%sh` strings) is
_visually_ very similar to POSIX `sh`. After this realization, implementing a
Kakoune lexer was a much more straightforward task: all I had to do was extend
the existing Bash lexer and add some keywords!

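
To make the "extend the Bash lexer" idea concrete, here is a minimal sketch. It is not the actual `pygments-kakoune` source: the class name, the `kak-sketch` alias, and the short command list are placeholders chosen for illustration. It subclasses Pygments' `BashLexer` and prepends one keyword rule to its `basic` state, using `inherit` to keep the existing Bash rules.

```python
from pygments.lexer import inherit, words
from pygments.lexers.shell import BashLexer
from pygments.token import Keyword

# Illustrative subset of Kakoune commands; the real plugin's list may differ.
KAK_COMMANDS = (
    "add-highlighter",
    "declare-option",
    "define-command",
    "evaluate-commands",
    "execute-keys",
    "hook",
    "map",
    "set-option",
)


class KakouneSketchLexer(BashLexer):
    """Hypothetical lexer: Bash rules plus a few Kakoune command keywords."""

    name = "Kakoune (sketch)"
    aliases = ["kak-sketch"]  # placeholder alias, not the plugin's
    filenames = []
    mimetypes = []

    tokens = {
        "basic": [
            # Match whole commands only: the lookarounds reject matches that
            # are glued to other word characters or hyphens.
            (words(KAK_COMMANDS, prefix=r"(?<![\w-])", suffix=r"(?![\w-])"), Keyword),
            inherit,  # keep every rule BashLexer already defines for this state
        ],
    }


if __name__ == "__main__":
    sample = "define-command hi %{ echo hi }\n"
    for token_type, value in KakouneSketchLexer().get_tokens(sample):
        print(token_type, repr(value))
```

Because the new rule sits in front of the inherited ones, Kakoune commands win over Bash's generic text rule, while everything else falls through to the Bash behavior unchanged. That is also exactly why this approach cannot tell shell code from Kakoune code.
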
Of course, the result isn't _perfect_. The lexer can't tell the difference
between the inside and outside of `%sh` strings: shell keywords are highlighted
at the root level of the code, and Kakoune keywords are highlighted inside shell
blocks. The _correct_ way would be to properly detect balanced `%sh` strings
and delegate their contents to the Bash lexer. The following snippet (at the
time of writing) is **not** highlighted correctly:

```kak
set buffer filetype kak
evaluate-commands %sh{
echo define-command is-kak %< info -title is-kak 'Is Kak!' >
}
```

By contrast, here's how the `%sh` string _should_ look:

```sh
echo define-command is-kak %< info -title is-kak 'Is Kak!' >
```

Properly detecting these strings isn't currently possible with Pygments'
`RegexLexer`. I'd need to subclass the base lexer and implement my own token
scanning. Is it possible? Absolutely. Do I want to do it? **Absolutely not**.

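
Purely as an illustration, a rough sketch of that custom scanning might look like the following. It is not part of the plugin, and it rests on a simplified reading of Kakoune's quoting rules that should be treated as an assumption: `(`, `[`, `{` and `<` nest with their mirrored closer, while any other punctuation delimiter closes on itself and is escaped by doubling. The names `find_sh_bodies` and `KakShBlockSketchLexer` are hypothetical, and text outside `%sh` bodies is emitted as plain `Text` to keep the example short.

```python
import re

from pygments.lexer import Lexer
from pygments.lexers.shell import BashLexer
from pygments.token import Text

# Assumed rule set (simplified): these openers nest with a mirrored closer.
NESTABLE = {"(": ")", "[": "]", "{": "}", "<": ">"}
# "%sh" followed by a single non-word, non-space delimiter character.
SH_OPEN = re.compile(r"%sh([^\w\s])")


def find_sh_bodies(text):
    """Return (start, end) index pairs for the body of each top-level %sh string."""
    spans, pos = [], 0
    while (match := SH_OPEN.search(text, pos)) is not None:
        opener = match.group(1)
        closer = NESTABLE.get(opener)
        start = i = match.end()
        depth = 1
        while i < len(text) and depth:
            char = text[i]
            if closer is None:
                # Non-nestable delimiter: it closes on itself.
                if char == opener:
                    if text[i + 1 : i + 2] == opener:
                        i += 1  # doubled delimiter: skip the escaped copy
                    else:
                        depth = 0
            elif char == opener:
                depth += 1
            elif char == closer:
                depth -= 1
            i += 1
        if depth:
            break  # unterminated string: stop scanning
        spans.append((start, i - 1))  # i - 1 is the closing delimiter
        pos = i
    return spans


class KakShBlockSketchLexer(Lexer):
    """Hypothetical lexer: Bash inside %sh bodies, plain Text everywhere else."""

    name = "Kakoune %sh blocks (sketch)"
    aliases = []

    def get_tokens_unprocessed(self, text):
        bash = BashLexer()
        pos = 0
        for start, end in find_sh_bodies(text):
            # Everything up to and including the opening delimiter.
            yield pos, Text, text[pos:start]
            # Delegate the body to the Bash lexer, shifting its offsets.
            for offset, token, value in bash.get_tokens_unprocessed(text[start:end]):
                yield start + offset, token, value
            pos = end
        if pos < len(text):
            yield pos, Text, text[pos:]


if __name__ == "__main__":
    sample = "evaluate-commands %sh{ echo hi }\n"
    for token_type, value in KakShBlockSketchLexer().get_tokens(sample):
        print(token_type, repr(value))
```

A complete lexer would also need to highlight the Kakoune code outside the shell bodies and handle the other `%`-string types, which is where most of the remaining work would be.
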
For now, please enjoy the janky, _but functional_, Kakoune syntax highlighting I
created. The plugin is also available as the `pygments-kakoune` package on
[sr.ht](https://git.sr.ht/~ficd/pygments-kakoune) and
[PyPI](https://pypi.org/project/pygments-kakoune/) if you want to use it in your
own projects.

[zona]: https://git.ficd.sh/ficd/zona
[Ashen]: https://sr.ht/~ficd/ashen
[Kakoune]: https://kakoune.org
[Helix]: https://github.com/helix-editor/helix
[Pygments]: https://pygments.org/

*[SSG]: Static Site Generator