ficd.sh/content/blog/email-formatting.md
Daniel Fichtinger 72f267fe89
2025-07-14 18:16:12 -04:00

---
title: Email Formatting Is Harder Than It Looks
date: 2025-07-14
---
*[UTF-8]: Unicode Transformation Format 8 bit. Text encoding standard.
*[plaintext]: Content representing only readable characters, and whitespace characters that affect the arrangement of the text.
[Kakoune]: https://kakoune.org
[^html]: The first email was sent in 1971 --- HTML was specified in 1990.
[TOC]
## Plain text email
As I've [mentioned before](./email-in-kakoune.md), I like using [Kakoune] for
reading & writing emails. Of course, Kakoune is a source code editor, not a _rich
text_ editor. It operates on UTF-8 _plaintext_ --- which means that the emails
I write need to be in plain text, too.
As it turns out, plain-text email (which predates HTML by decades[^html]) hasn't
really left a "legacy" so much as it _hasn't actually gone anywhere_. Many
developers swear by it; some are even so committed as to automatically filter
`text/html` mail as spam. If you want to learn more, you can begin by reading
[Drew DeVault's post](https://drewdevault.com/2016/04/11/Please-use-text-plain-for-emails.html),
[useplaintext.email](https://useplaintext.email/#why-plaintext), and sourcehut's
[mailing list etiquette](https://man.sr.ht/lists.sr.ht/etiquette.md) guide.
As I went down the `text/plain` path, I quickly learned that I needed an **email
formatter**. Why? Plain text is like source code. You can't rely on the
recipient's mail client to render it in a certain way --- you have to assume
that what you see is _exactly_ what _they_ get.
On one hand, this isn't really a problem --- the whole point of plain text is
_not_ having to bother with formatting, right? There is, however, a crucial
catch: **line wrapping**.
## The wrapping problem
For as long as we (humanity) have been _writing_ text, we've been _wrapping_ it. Pages,
after all, have finite width. At some point, an ongoing sentence needs to
continue on the line below it. This is called _wrapping_. In digital text, there
are two kinds of wrapping: **soft** and **hard**. The former is much more
common, and we often take it for granted.
**Hard-wrapped text** is the simplest: the line breaks are directly part of the
source. If you're writing a sentence that's getting too long, you simply press
`<ret>` to begin a new line. The author is responsible for all line breaks. This
guarantees that (assuming the renderer doesn't reflow text) the output will
always look _exactly_ how it does in the editor.
**Soft-wrapped text** has line breaks inserted by the _renderer_ --- they're
_not_ present in the source file. It's incredibly convenient! As the writer, we
don't need to worry at all about line breaks; only paragraph breaks. We can
trust that the text _will_ be wrapped properly whenever it's viewed.
Now... remember how I just said that, in the context of plain text email, we
can't make _any_ assumptions about how the text will be rendered? This applies
to wrapping, too. _Some_ mail clients may wrap text, **but not all of them**.
This essentially consigns us to hard-wrapping our emails.
The problem? _It's inconvenient!_ Imagine you edit a paragraph, and remove a
sentence. Well, now that entire paragraph's spacing is messed up, and you need
to manually reflow it and fix the line breaks. Yuck!
## The Markdown complication
### Standard tools
At this point, some of you may be screaming: _"but what about `fmt` and
`fold`?"_ There exist utilities meant to solve this specific problem, included
in most Linux distributions out-of-the-box! Well, you would be right. _Sort of_.
It's true that we already have excellent, composable commands for wrapping and
paragraph formatting. A simple `#!fish fmt email.txt >wrapped.txt` is enough
to cover many cases. However, there's a problem: **these tools are markup
agnostic**.
Why is that a problem when I literally [just](#plain-text-email) said we don't
care about markup? Well, there are _some_ markup formats that are delightfully
readable even in plain-text. Consider the following _unordered list_ in HTML
(HyperText **Markup** Language):
```html
<ul>
<li>Foobar</li>
<li>Barfoo</li>
</ul>
```
See, machines can read this no problem... but people? We struggle. Now, consider
the exact same list expressed in [Markdown](https://en.wikipedia.org/wiki/Markdown):
```markdown
- Foobar
- Barfoo
```
Isn't that so much nicer? As it turns out, markup isn't only meant to make
writing HTML easier --- it's also a great way to enhance the _semantics_ of
plain text.
**This** is where we run up against issues with `fmt` & company: because they're
not _aware_ of Markdown syntax, they have a tendency to **break** it. Consider
the unordered list example from before:
```console
$ cat list.md | fmt
- Foobar - Barfoo
```
The tool has _no idea_ this is meant to be a list. It just treats whitespace
separated tokens as words and reflows paragraphs accordingly.
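The same failure is easy to reproduce in Python. This is a minimal sketch of markup-agnostic reflow using the standard library's `textwrap`; the function name `naive_reflow` and the width of 72 are my own choices for illustration, not anything `fmt` itself uses.

```python
import textwrap


def naive_reflow(text: str, width: int = 72) -> str:
    """Reflow text without any knowledge of markup.

    textwrap.fill treats every run of whitespace (newlines included)
    as a plain word separator, much like fmt does.
    """
    return textwrap.fill(text, width=width)


markdown_list = "- Foobar\n- Barfoo\n"
# The two list items are merged into a single line.
print(naive_reflow(markdown_list))
```

The list markers survive as characters, but the line break that made them a list is gone.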
### Markdown formatters
My immediate next thought was to try an actual Markdown formatter. Not only do
they _also_ handle wrapping & reflow, they won't break the markup. I gave it a
shot, and to my horror, I found that they have the _opposite_ problem: they
preserve markup, but they break [signature blocks](#signature-blocks),
[sign-offs](#sign-offs), and [headers](#headers)!
## Writing `mailfmt`
I eventually wrote [`mailfmt`](https://git.ficd.sh/ficd/mailfmt) to fill the
niche of email formatting. It provides consistent paragraph spacing,
hard-wrapping and paragraph reflow, while preserving Markdown syntax, email
headers, quotes, sign-offs, and signature blocks. Additionally, the wrapped
output can be made safe for passing to a Markdown parser. This is useful if you
want to build an HTML email from plain-text.
`mailfmt` is open-source under the ISC license, and is available on
[PyPI](https://pypi.org/project/mailfmt/) for installation with tools like
`pipx` and `uv`. The source code is available on sourcehut at
[git.ficd.sh/ficd/mailfmt](https://git.ficd.sh/ficd/mailfmt).
I wrote this tool primarily for myself. It's served me very well over the past
few months. `mailfmt` could be helpful for anyone that prefers writing email in
plain-text using text editors like Kakoune, Helix, and Vim. It can format via
`stdin`/`stdout` and read/write files, making `mailfmt` easy to configure as a
formatter for the `mail` filetype in your editor.
### My requirements
- A way to consistently format my outgoing emails in my text editor.
- Paragraph reflow and automatic line wrapping.
- Ability to use Markdown syntax:
    - Without it being broken by reflow & wrap.
    - While looking good and retaining the same semantics in _both_ rendered
      **and** plain-text form --- ideal for `multipart` emails.
- _Ensure_ proper formatting of [signature blocks](#signature-blocks).
- _Preserve_ formatting of [sign-offs](#sign-offs).
### Wrap & reflow
It turns out that the most important part was also the easiest to implement.
Python's standard library includes
[`textwrap`](https://docs.python.org/3/library/textwrap.html), which _literally_
just does it for you. So the _real_ challenge becomes figuring out _what to
wrap_, versus **what to ignore**.
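To show how little code the wrapping itself takes, here is a rough sketch of paragraph-wise reflow with `textwrap.fill`. The `reflow` helper and the default width of 72 are assumptions made for this example; this is not the actual `mailfmt` source.

```python
import re
import textwrap


def reflow(text: str, width: int = 72) -> str:
    """Hard-wrap each paragraph, treating blank lines as paragraph breaks."""
    # Split on one or more blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # textwrap.fill joins the lines of a paragraph and re-wraps them.
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


draft = "one two\nthree\n\nfour five six"
print(reflow(draft, width=20))
```

Everything inside a paragraph gets rewrapped here, which is exactly why the interesting work is deciding which lines should never reach this function.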
### Preserving Markdown
Getting my tool to preserve Markdown was fairly straightforward. I'm not building
a _Markdown formatter_, I'm building _a formatter that doesn't break Markdown_.
In other words, I don't need to _parse_ Markdown syntax; just recognize it,
**and ignore it**.
`mailfmt`'s approach is simple: detect when a line matches a known pattern of
Markdown block element syntax, such as leading `#` for headings, `-` for lists,
etc. If so, **leave the line untouched**. Similarly, **don't format anything
inside fenced code blocks**.
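The "recognize and ignore" idea can be sketched roughly as follows. The regular expressions and the function name are hypothetical and cover only a handful of block elements (headings, list items, quotes, fences); the real tool's patterns may well differ.

```python
import re
import textwrap

# Hypothetical patterns: headings, bullet/numbered list items, block quotes.
BLOCK_PATTERN = re.compile(r"^\s*(#{1,6}\s|[-*+]\s|\d+[.)]\s|>)")
# Opening or closing fence of a code block.
FENCE_PATTERN = re.compile(r"^\s*(```|~~~)")


def format_preserving_markdown(text: str, width: int = 72) -> str:
    out: list[str] = []
    paragraph: list[str] = []
    in_fence = False

    def flush() -> None:
        # Reflow the ordinary lines collected so far as one paragraph.
        if paragraph:
            out.append(textwrap.fill(" ".join(paragraph), width=width))
            paragraph.clear()

    for line in text.splitlines():
        if FENCE_PATTERN.match(line):
            flush()
            in_fence = not in_fence
            out.append(line)
        elif in_fence or BLOCK_PATTERN.match(line):
            flush()
            out.append(line)  # Markdown syntax: leave the line untouched.
        elif not line.strip():
            flush()
            out.append("")  # Keep paragraph breaks.
        else:
            paragraph.append(line.strip())
    flush()
    return "\n".join(out)
```

Only lines that fall through every check are collected and reflowed; anything that looks like Markdown passes through verbatim.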
### Sign-offs
Consider the following sign-off:
```
Best wishes,
Daniel
```
A Markdown formatter considers this to be one paragraph, and reflows it
accordingly, causing it to lose its semantic meaning:
```
Best wishes, Daniel
```
Within the confines of Markdown, I counted three ways of dealing with the
problem:
1. Put an empty line between the two parts:
    ```
    Best wishes,

    Daniel
    ```
    > However, this empty line looks a tad awkward when viewed in plain-text.
2. Put a backslash after the intentional line break:
    ```
    Best wishes, \
    Daniel
    ```
    > Again, this looks bad when the Markdown isn't rendered.
3. Put two spaces after the intentional line break (`•` = space):
    ```
    Best•wishes,••
    Daniel
    ```
    > This syntax is **ambiguous, easy to forget**, and **not supported by editors
    > that trim trailing whitespace.**
`mailfmt` detects sign-offs using a very simple heuristic. First, we check if a
line has _5 or fewer_ words, and **ends with a comma**. If we find such a line,
we check the _next_ line. If it has 5 or fewer words **that all begin with an
uppercase letter**, then we assume these two lines are a _sign-off_, and we
don't reflow or wrap them. The heuristic matches a very simple pattern:
```
A courteous salutation,
First Middle Last Name
```
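Expressed as code, the heuristic described above might look something like this. It is a sketch written from the prose description; the function name `is_signoff` and the `max_words` parameter are my own labels for illustration.

```python
def is_signoff(first: str, second: str, max_words: int = 5) -> bool:
    """Return True if two consecutive lines look like a sign-off."""
    first_words = first.split()
    second_words = second.split()
    if not first_words or not second_words:
        return False
    # First line: short, and ends with a comma.
    if len(first_words) > max_words or not first.rstrip().endswith(","):
        return False
    # Second line: short, and every word starts with an uppercase letter.
    if len(second_words) > max_words:
        return False
    return all(word[0].isupper() for word in second_words)


print(is_signoff("Best wishes,", "Daniel"))
print(is_signoff("After all that,", "we went home"))
```

Being a heuristic, it can misfire on a short clause followed by a line of capitalized words, but such pairs are rare in ordinary prose.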
### Signature blocks
The [standard](https://en.wikipedia.org/wiki/Signature_block#Standard_delimiter)
for signature blocks is as follows:
1. Begins with two `-` characters followed by a single space, then a newline.
2. Everything that follows until the EOF is part of the signature.
*[EOF]: End of file.
Here's an example (note the • = space):
```
--•
Daniel
Software•Developer,•Company
email@website.com
```
As with sign-offs, such a signature block gets mangled by other formatters.
Furthermore, the single space after the `--` token is important: if it's
missing, some clients won't recognize it as a valid signature.
`mailfmt` detects when a line's _only_ content is `--`. It adds the required
trailing space if it's missing, and it treats the rest of the file as part of
the signature, leaving it completely untouched.
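A rough sketch of that behaviour, assuming the message is already split into lines: `split_signature` is a hypothetical helper name, and the code only illustrates the rule described above rather than reproducing the real implementation.

```python
def split_signature(lines: list[str]) -> tuple[list[str], list[str]]:
    """Split a message into (body, signature) at the `-- ` delimiter."""
    for index, line in enumerate(lines):
        # Match `--` with or without its trailing whitespace.
        if line.rstrip() == "--":
            # Normalize the delimiter to include the required single space,
            # and keep every following line exactly as written.
            return lines[:index], ["-- "] + lines[index + 1:]
    return lines, []


body, signature = split_signature(["Hello", "--", "Daniel", "email@website.com"])
print(body)
print(signature)
```

Only the body would then be handed to the wrapping logic; the signature lines are emitted as they are.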
## Headers
Raw emails contain many
[headers](https://en.wikipedia.org/wiki/Email#Message_header). Even if you're
reading/writing in plain-text, it's likely that your client strips these.
However, in some cases, you may want to insert a header or two manually.
Luckily, headers are easily matched by
[regex](https://en.wikipedia.org/wiki/Regular_expression), so `mailfmt` can
ignore them without any issues.
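As an illustration, a header line can be recognized with a pattern along these lines. The exact regex is an assumption of mine, deliberately stricter than what the mail standards allow for field names, and not necessarily the one `mailfmt` uses.

```python
import re

# Hypothetical pattern: a field name made of letters, digits, and hyphens,
# immediately followed by a colon and then whitespace or end of line.
HEADER_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:(\s|$)")


def is_header(line: str) -> bool:
    return HEADER_PATTERN.match(line) is not None


print(is_header("Subject: Email formatting"))
print(is_header("This is a sentence: with a colon"))
```

A body line such as `Note: see below` would also match, so a real formatter probably wants extra context, for example only treating such lines as headers at the top of the message.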
## Consistent multipart emails
Something you may want to do is generate a `multipart/alternative` email. This means
that _both_ an HTML **and** plain-text representation of the _same_ email are
included in the file — leaving it up to the reader's client to pick which one to
display.
The plain-text email **must** be able to stand on its own, and should _also_
render to decent-looking HTML. Essentially, you want to write your email in
plain-text once, ensuring it has proper formatting, and then use a command to
generate an HTML email from it.
For this, `mailfmt` provides the `--markdown-safe` flag, which appends
backslashes to the formatted output, making it safe for Markdown parsing without
messing up the line breaks after sign-offs and signature blocks.
Note that the **only** thing this does is output Markdown with hard line breaks.
It's the user's responsibility to write the pipeline for generating the email
file. For example, I use the following in [aerc](https://aerc-mail.org/) to
generate an HTML multipart email whenever I want:
```ini
[multipart-converters]
text/html=mailfmt --markdown-safe | pandoc -f markdown -t html --standalone
```
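To give an idea of what "Markdown-safe" means for a preserved block such as a sign-off, here is a small sketch. It assumes the block's lines have already been identified, and the helper name `add_hard_breaks` is hypothetical; the flag's real behaviour may differ in its details.

```python
def add_hard_breaks(lines: list[str]) -> list[str]:
    """Mark intentional line breaks so a Markdown parser keeps them.

    A backslash at the end of a line is a hard line break in
    CommonMark-style Markdown, so every line except the last gets one.
    """
    if not lines:
        return []
    return [line.rstrip() + " \\" for line in lines[:-1]] + [lines[-1]]


for line in add_hard_breaks(["Best wishes,", "Daniel"]):
    print(line)
```

The last line is left alone because there is no following line to be merged into.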
## Conclusion
If you've made it this far, thanks for sticking with me and reading to the end!
Even if you don't plan to write plain-text email or use `mailfmt` at all, I
hope you learned something interesting.