update email formatting post draft

Daniel Fichtinger 2025-07-14 17:57:47 -04:00
parent ce89015228
commit b82ab92586


@ -1,10 +1,12 @@
--- ---
title: Email Formatting Is Harder Than It Looks title: Email Formatting Is Harder Than It Looks
date: 2025-07-13 date: 2025-07-14
draft: true draft: true
--- ---
*[UTF-8]: Unicode Transformation Format - 8 bit. *[UTF-8]: Unicode Transformation Format 8 bit. Text encoding standard.
*[plaintext]: Content representing only readable characters, and whitespace characters that affect the arrangement of the text.
[Kakoune]: https://kakoune.org [Kakoune]: https://kakoune.org
@ -12,10 +14,12 @@ draft: true
[TOC] [TOC]
## Plain text email
As I've [mentioned before](./email-in-kakoune.md), I like using [Kakoune] for As I've [mentioned before](./email-in-kakoune.md), I like using [Kakoune] for
reading & writing emails. Of course, Kakoune is a text editor, not a _rich text_ reading & writing emails. Of course, Kakoune is a source code editor, not a _rich
editor. It operates on UTF-8 _plain text_ --- which means that the emails I text_ editor. It operates on UTF-8 _plaintext_ --- which means that the emails
write need to be in plain text, too. I write need to be in plain text, too.
As it turns out, plain-text email (which predates HTML by decades[^html]) hasn't As it turns out, plain-text email (which predates HTML by decades[^html]) hasn't
really left a "legacy" so much as it _hasn't actually gone anywhere_. Many really left a "legacy" so much as it _hasn't actually gone anywhere_. Many
@ -26,72 +30,151 @@ developers swear by it; some are even so committed as to automatically filter
[mailing list etiquette](https://man.sr.ht/lists.sr.ht/etiquette.md) guide. [mailing list etiquette](https://man.sr.ht/lists.sr.ht/etiquette.md) guide.
As I went down the `text/plain` path, I quickly learned that I needed an **email As I went down the `text/plain` path, I quickly learned that I needed an **email
formatter**. Plain text is like source code. You can't rely on the recipient's formatter**. Why? Plain text is like source code. You can't rely on the
mail client to render it in a certain way --- most often, what you see is recipient's mail client to render it in a certain way --- you have to assume
_exactly_ what they get. that what you see is _exactly_ what _they_ get.
I eventually wrote [`mailfmt`](https://git.ficd.sh/ficd/mailfmt) to fill this On one hand, this isn't really a problem --- the whole point of plain text is
niche. It provides consistent paragraph spacing, hard-wrapping and paragraph _not_ having to bother with formatting, right? There is, however, a crucial
reflow, while preserving Markdown syntax, email headers, quotes, sign-offs, and catch: **line wrapping**.
signature blocks. Additionally, the wrapped output can be made safe for passing
to a Markdown parser. This is useful if you want to build an HTML email from ## The wrapping problem
plain-text.
For as long as we (humanity) have been _writing_ text, we've been _wrapping_ it. Pages,
after all, have finite width. At some point, an ongoing sentence needs to
continue on the line below it. This is called _wrapping_. In digital text, there
are two kinds of wrapping: **soft** and **hard**. The former is much more
common, and we often take it for granted.
**Hard-wrapped text** is the simplest: the line breaks are directly part of the
source. If you're writing a sentence that's getting too long, you simply press
`<ret>` to begin a new line. The author is responsible for all line breaks. This
guarantees that (assuming the renderer doesn't reflow text) the output will
always look _exactly_ how it does in the editor.
**Soft-wrapped text** has line breaks inserted by the _renderer_ --- they're
_not_ present in the source file. It's incredibly convenient! As the writer, we
don't need to worry at all about line breaks; only paragraph breaks. We can
trust that the text _will_ be wrapped properly whenever it's viewed.
Now... remember how I just said that, in the context of plain text email, we
can't make _any_ assumptions about how the text will be rendered? This applies
to wrapping, too. _Some_ mail clients may wrap text, **but not all of them**.
This essentially consigns us to hard-wrapping our emails.
The problem? _It's inconvenient!_ Imagine you edit a paragraph, and remove a
sentence. Well, now that entire paragraph's spacing is messed up, and you need
to manually reflow it and fix the line breaks. Yuck!
## The Markdown complication
### Standard tools
At this point, some of you may be screaming: _"but what about `fmt` and
`fold`?"_ There exist utilities meant to solve this specific problem, included
in most Linux distributions out-of-the-box! Well, you would be right. _Sort of_.
It's true that we already have excellent, composable commands for wrapping and
paragraph formatting. A simple `#!fish fmt email.txt >wrapped.txt` is enough
to cover many cases. However, there's a problem: **these tools are markup
agnostic**.
Why is that a problem when I literally [just](#plain-text-email) said we don't
care about markup? Well, there are _some_ markup formats that are delightfully
readable even in plain--text. Consider the following _unordered list_ in HTML
(HyperText **Markup** Language):
```html
<ul>
<li>Foobar</li>
<li>Barfoo</li>
</ul>
```
See, machines can read this no problem... but people? We struggle. Now, consider
the exact same list expressed in [Markdown](https://en.wikipedia.org/wiki/Markdown):
```markdown
- Foobar
- Barfoo
```
Isn't that so much nicer? As it turns out, Markdown isn't only meant to make
writing HTML easier --- it's also a great way to enhance the _semantics_ of
plain text.
**This** is where we run up against issues with `fmt` & company: because they're
not _aware_ of Markdown syntax, they have a tendency to **break** it. Consider
the unordered list example from before:
```console
$ cat list.md | fmt
- Foobar - Barfoo
```
The tool has _no idea_ this is meant to be a list. It just treats whitespace
separated tokens as words and reflows paragraphs accordingly.
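
The same failure is easy to reproduce in Python. This is a minimal sketch, assuming a naive formatter that reflows every paragraph with the standard library's `textwrap.fill`; the `naive_reflow` helper is hypothetical and only stands in for any markup-agnostic tool:

```python
import textwrap


def naive_reflow(text: str, width: int = 72) -> str:
    """Reflow each blank-line-separated paragraph, ignoring any markup."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


markdown_list = "- Foobar\n- Barfoo"

# The newline between the two items is treated as ordinary whitespace,
# so both list items end up on a single line.
print(naive_reflow(markdown_list))  # → - Foobar - Barfoo
```

The list markers survive, but the line break that gave them meaning does not.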
### Markdown formatters
My immediate next thought was to try an actual Markdown formatter. Not only do
they _also_ handle wrapping & reflow, they won't break the markup. I gave it a
shot, and to my horror, I found that they have the _opposite_ problem: they
preserve markup, but they break [signature blocks](#signature-blocks),
[sign-offs](#sign-offs), and [headers](#headers)!
## Writing `mailfmt`
I eventually wrote [`mailfmt`](https://git.ficd.sh/ficd/mailfmt) to fill the
niche of email formatting. It provides consistent paragraph spacing,
hard-wrapping and paragraph reflow, while preserving Markdown syntax, email
headers, quotes, sign-offs, and signature blocks. Additionally, the wrapped
output can be made safe for passing to a Markdown parser. This is useful if you
want to build an HTML email from plain-text.
`mailfmt` is open-source under the ISC license, and is available on `mailfmt` is open-source under the ISC license, and is available on
[PyPI](https://pypi.org/project/mailfmt/) for installation with tools like [PyPI](https://pypi.org/project/mailfmt/) for installation with tools like
`pipx` and `uv`. The source code is available on sourcehut at `pipx` and `uv`. The source code is available on sourcehut at
[git.ficd.sh/ficd/mailfmt](https://git.ficd.sh/ficd/mailfmt). [git.ficd.sh/ficd/mailfmt](https://git.ficd.sh/ficd/mailfmt).
## Target Audience
I wrote this tool primarily for myself. It's served me very well over the past I wrote this tool primarily for myself. It's served me very well over the past
few months. `mailfmt` could be helpful for anyone that prefers writing email in few months. `mailfmt` could be helpful for anyone that prefers writing email in
plain-text using text editors like Kakoune, Helix, and Vim. It can format via plain-text using text editors like Kakoune, Helix, and Vim. It can format via
`stdin`/`stdout` and read/write files, making `mailfmt` easy to configure as a `stdin`/`stdout` and read/write files, making `mailfmt` easy to configure as a
formatter for the `mail` filetype in your editor. formatter for the `mail` filetype in your editor.
I'm including a very lengthy explanation of exactly why I built this tool. You ### My requirements
may think it's overkill for such a small program — but I like to be crystal
clear about justifying my work. It reads like blog post rather than the
emoji-filled `README`/marketing style we're accustomed to seeing on this
platform. I've put a lot of thought into this, and I want to share my work. I
hope you enjoy reading about my thought process.
## Why I Built It (Comparison)
Unsurprisingly, it all started with a specific problem I was having composing
emails in plain-text format in my preferred text editor. As I searched for a
solution, I couldn't find anything that met all my needs, so I wrote it myself.
Here's what I wanted:
- A way to consistently format my outgoing emails in my text editor. - A way to consistently format my outgoing emails in my text editor.
- Paragraph reflow and automatic line wrapping. - Paragraph reflow and automatic line wrapping.
- Not all plain-text clients are capable of line-wrap. In some contexts, such - Ability to use Markdown syntax:
as mailing lists, the author is expected to wrap the text themselves.
- Inline Markdown syntax `can _still_ look great, **even** in plain-text!` Thus,
I wanted to use it:
- Without it being broken by reflow & wrap. - Without it being broken by reflow & wrap.
- While looking good and retaining the same semantics in _both_ rendered - While looking good and retaining the same semantics in _both_ rendered
**and** plain-text form — ideal for `multipart` emails. **and** plain-text form — ideal for `multipart` emails.
- Ensure signature block is formatted properly. - _Ensure_ proper formatting of [signature blocks](#signature-blocks).
- The single space after `--` and before the newline **must** be included. - _Preserve_ formatting of [sign-offs](#sign-offs).
### `fmt` and Markdown Formatters Don't Work For Email ### Wrap & reflow
The `fmt` utility provides great wrapping and reflow capabilities — I use it all It turns out that the most important part was also the easiest to implement.
the time while writing LaTeX. However, it's syntax agnostic, and breaks Python's standard library includes
Markdown. For example, it completely mangles fenced code blocks. I figured: hey, [`textwrap`](https://docs.python.org/3/library/textwrap.html), which _literally_
why not just use a Markdown formatter? It supports Markdown (obviously), _and_ just does it for you. So the _real_ challenge becomes figuring out _what to
can reflow & wrap text! Here's the problem: it turns out treating your wrap_, versus **what to ignore**.
**entire** email as a Markdown document isn't ideal.
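
To give a sense of how little code the wrapping itself needs, here is a small sketch using `textwrap.fill`. The sample text, the `reflow` helper, and the width of 40 are illustrative assumptions, not `mailfmt`'s actual defaults:

```python
import textwrap

# A paragraph whose hand-made line breaks went ragged after an edit.
ragged = (
    "This paragraph was wrapped by hand,\n"
    "but then a sentence was removed.\n"
    "Now the lines\n"
    "are uneven."
)


def reflow(paragraph: str, width: int = 40) -> str:
    """Join the paragraph's lines and re-wrap them at the given width."""
    return textwrap.fill(paragraph, width=width)


print(reflow(ragged))
```

Every word is kept in order; only the positions of the line breaks change.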
### Preserving Markdown
Getting my tool to preserve Markdown was fairly straightforward. I'm not building
a _Markdown formatter_, I'm building _a formatter that doesn't break Markdown_.
In other words, I don't need to _parse_ Markdown syntax; just recognize it,
**and ignore it**.
`mailfmt`'s approach is simple: detect when a line matches a known pattern of `mailfmt`'s approach is simple: detect when a line matches a known pattern of
Markdown block element syntax, such as leading `#` for headings, `-` for lists, Markdown block element syntax, such as leading `#` for headings, `-` for lists,
etc. If so, **leave the line untouched**. Similarly, **don't format anything etc. If so, **leave the line untouched**. Similarly, **don't format anything
inside fenced code blocks**. inside fenced code blocks**.
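
The approach described above could be sketched roughly like this. It is not `mailfmt`'s real implementation: the two regular expressions, the function name, and the default width are assumptions chosen for illustration.

```python
import re
import textwrap

# Assumed patterns: headings, list items, and block quotes are "block" lines.
BLOCK_PATTERN = re.compile(r"^\s*(#{1,6}\s|[-*+]\s|\d+\.\s|>)")
FENCE_PATTERN = re.compile(r"^\s*(```|~~~)")


def reflow_preserving_markdown(text: str, width: int = 74) -> str:
    """Reflow plain paragraphs; pass Markdown block lines through untouched."""
    output: list[str] = []
    paragraph: list[str] = []
    in_fence = False

    def flush() -> None:
        # Wrap the paragraph collected so far, if there is one.
        if paragraph:
            output.append(textwrap.fill(" ".join(paragraph), width=width))
            paragraph.clear()

    for line in text.splitlines():
        if FENCE_PATTERN.match(line):
            # A fence line toggles code-block mode and is kept verbatim.
            flush()
            in_fence = not in_fence
            output.append(line)
        elif in_fence or BLOCK_PATTERN.match(line) or not line.strip():
            # Code, Markdown block syntax, and blank lines are left alone.
            flush()
            output.append(line)
        else:
            paragraph.append(line.strip())
    flush()
    return "\n".join(output)
```

One limitation of this sketch: an indented continuation line of a long list item is treated as an ordinary paragraph, so a real tool would need a little more state than this.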
#### Sign-Offs ### Sign-offs
Consider the following sign-off: Consider the following sign-off:
@ -118,7 +201,7 @@ Best wishes,
Daniel Daniel
``` ```
> However, this empty line looks _awkward_ when viewed in plain-text. > However, this empty line looks a tad awkward when viewed in plain--text.
2. Put a backslash after the intentional line break: 2. Put a backslash after the intentional line break:
@ -129,7 +212,7 @@ Daniel
> Again, this looks bad when the Markdown isn't rendered. > Again, this looks bad when the Markdown isn't rendered.
3. Put two spaces after the intentional line break ( = space): 3. Put two spaces after the intentional line break (`•` = space):
``` ```
Best•wishes,•• Best•wishes,••
@ -146,17 +229,20 @@ uppercase letter**, then we assume these two lines are a _sign-off_, and we
don't reflow or wrap them. The heuristic matches a very simple pattern: don't reflow or wrap them. The heuristic matches a very simple pattern:
``` ```
A courteous greeting, A courteous salutation,
First Middle Last Name First Middle Last Name
``` ```
#### Signature Block ### Signature blocks
The convention for signature blocks is as follows: The [standard](https://en.wikipedia.org/wiki/Signature_block#Standard_delimiter)
for signature blocks is as follows:
1. Begins with two `-` characters followed by a single space, then a newline. 1. Begins with two `-` characters followed by a single space, then a newline.
2. Everything that follows until the EOF is part of the signature. 2. Everything that follows until the EOF is part of the signature.
*[EOF]: End of file.
Here's an example (note the • = space): Here's an example (note the • = space):
``` ```
@ -167,31 +253,44 @@ Software•Developer,•Company
email@website.com email@website.com
``` ```
As with sign-offs, such a signature block gets mangled by Markdown formatters. As with sign-offs, such a signature block gets mangled by other formatters.
Furthermore, the single space after the `--` token is important: if it's Furthermore, the single space after the `--` token is important: if it's
missing, some clients won't recognize it as a valid signature — our formatter missing, some clients won't recognize it as a valid signature.
should address this too.
`mailfmt` detects when a line's _only_ content is `--`. It adds the required `mailfmt` detects when a line's _only_ content is `--`. It adds the required
trailing space if it's missing, and it treats the rest of the input as part of trailing space if it's missing, and it treats the rest of the file as part of
the signature, leaving it completely untouched. the signature, leaving it completely untouched.
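
As a rough illustration of that rule, the detection can be written in a few lines. The `split_signature` helper below is hypothetical and only mirrors the behaviour described above: find a line containing nothing but `--`, normalize it to `-- `, and leave everything after it untouched.

```python
def split_signature(lines: list[str]) -> tuple[list[str], list[str]]:
    """Split lines into (body, signature) at the first signature delimiter."""
    for index, line in enumerate(lines):
        # Matches "--" with or without trailing whitespace.
        if line.rstrip() == "--":
            # Restore the required single trailing space on the delimiter;
            # the lines after it are returned exactly as they were.
            return lines[:index], ["-- "] + lines[index + 1:]
    return lines, []


body, signature = split_signature(
    ["Hello,", "", "--", "Daniel", "email@website.com"]
)
```

Only `body` would then be handed to the wrapping logic, while `signature` is written back unchanged.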
### Consistent Multipart Emails ## Headers
Something you may want to do is generate a `multipart` email. This means that Raw emails contain many
_both_ an HTML **and** plain-text representation of the _same_ email are [headers](https://en.wikipedia.org/wiki/Email#Message_header). Even if you're
reading/writing in plain--text, it's likely that your client strips these.
However, in some cases, you may want to insert a header or two manually.
Luckily, headers are easily matched by
[regex](https://en.wikipedia.org/wiki/Regular_expression), so `mailfmt` can
ignore them without any issues.
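
For illustration, a header check might look like the following. The pattern is an assumption loosely based on the field-name syntax in RFC 5322 (printable ASCII characters other than the colon, followed by a colon); it is not necessarily the expression `mailfmt` uses.

```python
import re

# Field name: printable ASCII except ":", then ":" and whitespace or line end.
HEADER_PATTERN = re.compile(r"^[!-9;-~]+:(\s|$)")


def is_header(line: str) -> bool:
    """Return True if the line looks like an email header field."""
    return HEADER_PATTERN.match(line) is not None
```

A sketch like this has false positives: an ordinary line such as `Note: see below` also matches, so a real formatter would likely restrict the check further, for example to the top of the message.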
## Consistent multipart emails
Something you may want to do is generate a `multipart/alternative` email. This means
that _both_ an HTML **and** plain-text representation of the _same_ email are
included in the file — leaving it up to the reader's client to pick which one to included in the file — leaving it up to the reader's client to pick which one to
display. display.
The plain-text email **must** be able to stand on its own, and _also_ render to The plain-text email **must** be able to stand on its own, and should _also_
decent-looking HTML. Essentially, you want to write your email in plain-text render to decent-looking HTML. Essentially, you want to write your email in
once, ensuring it has proper formatting, and then use a command to generate an plain-text once, ensuring it has proper formatting, and then use a command to
HTML email from it. For this, `mailfmt` provides the `--markdown-safe` flag, generate an HTML email from it.
which appends backslashes to the formatted output, making it safe for Markdown
parsing without messing up the line breaks after sign-offs and signature blocks.
For example, I use the following in [aerc](https://aerc-mail.org/) to generate For this, `mailfmt` provides the `--markdown-safe` flag, which appends
an HTML multipart email whenever I want: backslashes to the formatted output, making it safe for Markdown parsing without
messing up the line breaks after sign-offs and signature blocks.
Note that the **only** thing this does is output Markdown with hard line breaks.
It's the user's responsibility to write the pipeline for generating the email
file. For example, I use the following in [aerc](https://aerc-mail.org/) to
generate an HTML multipart email whenever I want:
```ini ```ini
[multipart-converters] [multipart-converters]
@ -201,5 +300,5 @@ text/html=mailfmt --markdown-safe | pandoc -f markdown -t html --standalone
## Conclusion ## Conclusion
If you've made it this far, thanks for sticking with me and reading to the end! If you've made it this far, thanks for sticking with me and reading to the end!
Even if you don't plan to write plain-text email or use `mailfmt` at all, I hope Even if you don't plan to write plain--text email or use `mailfmt` at all, I
you learned something interesting. hope you learned something interesting.