From a08b48cc51cf94c8975e1467f6304dac79ea8c36 Mon Sep 17 00:00:00 2001
From: Daniel Fichtinger
Date: Fri, 4 Jul 2025 14:38:25 -0400
Subject: [PATCH] wrote kakoune lexer blog post

---
 ...mplementing-kakoune-syntax-highlighting.md | 104 ++++++++++++++++++
 content/blog/pygments-kakoune-lexer.md        |   7 --
 2 files changed, 104 insertions(+), 7 deletions(-)
 create mode 100644 content/blog/implementing-kakoune-syntax-highlighting.md
 delete mode 100644 content/blog/pygments-kakoune-lexer.md

diff --git a/content/blog/implementing-kakoune-syntax-highlighting.md b/content/blog/implementing-kakoune-syntax-highlighting.md
new file mode 100644
index 0000000..656869b
--- /dev/null
+++ b/content/blog/implementing-kakoune-syntax-highlighting.md
@@ -0,0 +1,104 @@
+---
+title: Implementing Kakoune Syntax Highlighting In Pygments
+date: July 04, 2025
+---
+
+As a programmer, one thing I care about a _lot_ is syntax highlighting. In fact,
+the main reason I created [Ashen] was to have more control over it. In my view,
+if you're going to spend all day looking at text, _it should at least look
+pleasant_. This naturally carries over to blogging as well.
+
+Over the past few months, I've become obsessed with [Kakoune]. I've been
+customizing it extensively, writing plugins, contributing to the wiki, and
+participating in its small (but _incredibly_ active and welcoming) community.
+And, well, when I get _this_ into something, I want to write about it!
+
+However, here's the problem: Kakoune doesn't have many users. It has around 10k
+stars on GitHub, while [Helix], a project that was directly inspired by it, has
+over 38 thousand. I don't mind that Kakoune is "unpopular". I enjoy the smaller,
+tighter-knit community, but I'd be lying if I said it wasn't inconvenient at
+times.
+
+One such time is _getting Kakoune syntax highlighting on my blog_. Most SSG
+setups (including [Zona], my home-brewed project) rely on external libraries to
+provide code highlighting. For example, this website uses [Pygments], which is a
+mature Python library. Now, Pygments boasts support for "a wide range of 597
+languages and other text formats".
+
+**Kakoune is _not_ among them**. That meant that if I wanted Kakoune highlighting,
+I'd have to do it myself. Now, perhaps unsurprisingly, Kakoune provides
+highlighting for its own syntax. Helpfully, this highlighting is _itself_
+implemented as Kakoune commands (sometimes referred to as Kakscript). Why is
+this helpful? Because Kakoune highlighters are defined using regular expressions,
+which saves us some mental work if we want to port highlighting to another platform.
+
+There's a caveat, however: the regex engine must be capable of recursion. This
+is thanks to the weirdness that is Kakoune's shell blocks, and how they interact
+with balanced delimiters.
+
+Without getting too detailed, Kakoune's balanced strings are...
+[complicated](https://github.com/mawww/kakoune/blob/master/doc/pages/command-parsing.asciidoc).
+This wouldn't normally be a problem, because strings that aren't wrapped in
+double/single quotes aren't highlighted anyway. However, that's not true for
+shell blocks: the contents of `%sh{...}` should be highlighted as POSIX shell
+script.
+
+The problem? The `%sh` delimiter can be _anything_. Literally. Kakoune's
+standard RC **itself** uses `%§` as a delimiter. This means that the following
+two snippets are parsed exactly the same way:
+
+```kak
+evaluate-commands %sh{
+    printf '%s\n' "%sh{ echo 'hi' }"
+}
+```
+
+```kak
+evaluate-commands %sh∴
+    printf '%s\n' "%sh{ echo 'hi' }"
+∴
+```
+
+All of this makes implementing a true Kakoune lexer for a library like Pygments,
+which doesn't natively support recursive regex, a non-trivial task. To be
+honest, I barely understand how it's done in Kakoune in the first place.
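+
+To make the nesting problem concrete, here's a rough Python sketch of a
+hand-written scanner for a single `%`-string. Treat it as an illustration under
+simplifying assumptions, not as how Kakoune (or any Pygments lexer) actually
+does it. The names `NESTABLE` and `read_balanced` are made up and quoting rules
+are ignored. It also assumes that `{`, `(`, `[` and `<` nest, while any other
+delimiter simply ends at its next occurrence. The `depth` counter is the part
+that a plain, non-recursive regex can't express for arbitrarily deep nesting.
+
+```python
+# Hypothetical: opening brackets assumed to nest, mapped to their closers.
+NESTABLE = {"{": "}", "(": ")", "[": "]", "<": ">"}
+
+
+def read_balanced(text: str, start: int) -> tuple[str, int]:
+    """Read a %-string (e.g. `%sh{...}`) whose `%` is at text[start].
+
+    Returns its contents and the index just past the closing delimiter.
+    """
+    i = start + 1
+    # Skip the optional expansion name, such as `sh` in `%sh{`.
+    while i < len(text) and text[i].isalnum():
+        i += 1
+    if i >= len(text):
+        raise ValueError("missing delimiter after %")
+    opener = text[i]
+    closer = NESTABLE.get(opener, opener)
+    body_start = i + 1
+    depth = 1
+    for j in range(body_start, len(text)):
+        char = text[j]
+        if opener in NESTABLE and char == opener:
+            depth += 1  # a nested pair of the same bracket opens
+        elif char == closer:
+            depth -= 1  # depth 0 means this is our own closing delimiter
+            if depth == 0:
+                return text[body_start:j], j + 1
+    raise ValueError("unterminated %-string")
+
+
+# The two snippets above: same shell body, different outer delimiters.
+braces = "evaluate-commands %sh{\n    printf '%s\\n' \"%sh{ echo 'hi' }\"\n}"
+custom = "evaluate-commands %sh∴\n    printf '%s\\n' \"%sh{ echo 'hi' }\"\n∴"
+body, _ = read_balanced(braces, braces.index("%"))
+assert body == read_balanced(custom, custom.index("%"))[0]
+```
+
+With `∴` as the delimiter, the inner `{` and `}` are never counted, and that is
+why both calls return the same shell body.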
+
+Luckily, a friend of mine pointed out something very interesting the other day
+when he sent me a Kakoune snippet over Discord _with highlighting._ It didn't
+look great, but it was actually highlighted! _In **Discord**!_
+
+As it turns out, all he did was mark the code block as `sh` instead of `kak`:
+Kakoune's _actual_ syntax (the parts outside balanced `%sh` strings) is
+_visually_ very similar to POSIX `sh`. After this realization, implementing a
+Kakoune lexer was a much more straightforward task: all I had to do was extend
+the existing Bash lexer and add some keywords!
+
+Of course, the result isn't _perfect_. The lexer can't tell code inside `%sh`
+strings apart from code outside them, so shell keywords are highlighted at the
+root level of the code, and Kakoune keywords are highlighted inside shell
+blocks. The _correct_ approach would be to detect balanced `%sh` strings and
+delegate their contents to the Bash lexer. The following snippet (at the
+time of writing) is **not** highlighted correctly:
+
+```kak
+set buffer filetype kak
+evaluate-commands %sh{
+    echo define-command is-kak %< info -title is-kak 'Not Kak!' >
+}
+```
+
+Properly detecting these strings isn't currently possible with Pygments'
+`RegexLexer`. I'd need to subclass the base lexer and implement my own token
+scanning. Is it possible? Absolutely. Do I want to do it? **Absolutely not**.
+
+For now, please enjoy the janky, _but functional_, Kakoune syntax highlighting
+I created. The plugin is also available as the `pygments-kakoune` package on
+[sr.ht](https://git.sr.ht/~ficd/pygments-kakoune) and
+[PyPI](https://pypi.org/project/pygments-kakoune/) if you want to use it in your
+own projects.
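+
+For the curious, here's roughly what "extend the Bash lexer and add some
+keywords" looks like in code. This is a minimal sketch, **not** the actual
+source of `pygments-kakoune`. The class name, the short command list, and the
+choice of the `Keyword` token type are illustrative assumptions. It relies on
+Pygments' `inherit` marker, which splices the parent lexer's rules for a state
+in after the new ones. It also relies on `BashLexer` defining a `basic` state,
+as it does in current Pygments releases.
+
+```python
+from pygments.lexer import inherit, words
+from pygments.lexers.shell import BashLexer
+from pygments.token import Keyword
+
+# Illustrative subset of Kakoune commands (an assumption, not a complete list).
+KAK_COMMANDS = (
+    "define-command",
+    "evaluate-commands",
+    "add-highlighter",
+    "declare-option",
+    "set-option",
+    "hook",
+    "map",
+)
+
+
+class KakouneSketchLexer(BashLexer):
+    """Hypothetical sketch: Bash highlighting plus a few Kakoune commands."""
+
+    name = "Kakoune (sketch)"
+    aliases = ["kak-sketch"]
+
+    tokens = {
+        "basic": [
+            # Try the Kakoune commands first...
+            (words(KAK_COMMANDS, suffix=r"\b"), Keyword),
+            # ...then fall back to every rule BashLexer already has in "basic".
+            inherit,
+        ],
+    }
+
+
+toks = list(KakouneSketchLexer().get_tokens("define-command hi %{ echo hi }\n"))
+assert toks[0] == (Keyword, "define-command")
+```
+
+For Pygments to find a lexer like this by name (e.g. from a Markdown code
+fence), it would also have to be registered under the `pygments.lexers` entry
+point group. The sketch leaves that step out.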
+
+[Zona]: https://git.sr.ht/~ficd/zona
+[Ashen]: https://sr.ht/~ficd/ashen
+[Kakoune]: https://kakoune.org
+[Helix]: https://github.com/helix-editor/helix
+[Pygments]: https://pygments.org/
diff --git a/content/blog/pygments-kakoune-lexer.md b/content/blog/pygments-kakoune-lexer.md
deleted file mode 100644
index a274d19..0000000
--- a/content/blog/pygments-kakoune-lexer.md
+++ /dev/null
@@ -1,7 +0,0 @@
----
-title: Implementing Kakoune Syntax In Pygments
-date: July 04, 2025
-draft: true
----
-
-Some content goes here.