update email formatting post draft

Daniel Fichtinger 2025-07-14 17:57:47 -04:00
parent ce89015228
commit b82ab92586


@ -1,10 +1,12 @@
---
title: Email Formatting Is Harder Than It Looks
date: 2025-07-13
date: 2025-07-14
draft: true
---
*[UTF-8]: Unicode Transformation Format - 8 bit.
*[UTF-8]: Unicode Transformation Format 8 bit. Text encoding standard.
*[plaintext]: Content representing only readable characters, and whitespace characters that affect the arrangement of the text.
[Kakoune]: https://kakoune.org
@ -12,10 +14,12 @@ draft: true
[TOC]
## Plain text email
As I've [mentioned before](./email-in-kakoune.md), I like using [Kakoune] for
reading & writing emails. Of course, Kakoune is a text editor, not a _rich text_
editor. It operates on UTF-8 _plain text_ --- which means that the emails I
write need to be in plain text, too.
reading & writing emails. Of course, Kakoune is a source code editor, not a _rich
text_ editor. It operates on UTF-8 _plaintext_ --- which means that the emails
I write need to be in plain text, too.
As it turns out, plain-text email (which predates HTML by decades[^html]) hasn't
really left a "legacy" so much as it _hasn't actually gone anywhere_. Many
@ -26,72 +30,151 @@ developers swear by it; some are even so committed as to automatically filter
[mailing list etiquette](https://man.sr.ht/lists.sr.ht/etiquette.md) guide.
As I went down the `text/plain` path, I quickly learned that I needed an **email
formatter**. Plain text is like source code. You can't rely on the recipient's
mail client to render it in a certain way --- most often, what you see is
_exactly_ what they get.
formatter**. Why? Plain text is like source code. You can't rely on the
recipient's mail client to render it in a certain way --- you have to assume
that what you see is _exactly_ what _they_ get.
I eventually wrote [`mailfmt`](https://git.ficd.sh/ficd/mailfmt) to fill this
niche. It provides consistent paragraph spacing, hard-wrapping and paragraph
reflow, while preserving Markdown syntax, email headers, quotes, sign-offs, and
signature blocks. Additionally, the wrapped output can be made safe for passing
to a Markdown parser. This is useful if you want to build an HTML email from
plain-text.
On one hand, this isn't really a problem --- the whole point of plain text is
_not_ having to bother with formatting, right? There is, however, a crucial
catch: **line wrapping**.
## The wrapping problem
Since we (humanity) have been _writing_ text, we've been _wrapping_ it. Pages,
after all, have finite width. At some point, an ongoing sentence needs to
continue on the line below it. This is called _wrapping_. In digital text, there
are two kinds of wrapping: **soft** and **hard**. The former is much more
common, and we often take it for granted.
**Hard-wrapped text** is the simplest: the line breaks are directly part of the
source. If you're writing a sentence that's getting too long, you simply press
`<ret>` to begin a new line. The author is responsible for all line breaks. This
guarantees that (assuming the renderer doesn't reflow text) the output will
always look _exactly_ how it does in the editor.
**Soft-wrapped text** has line breaks inserted by the _renderer_ --- they're
_not_ present in the source file. It's incredibly convenient! As writers, we
don't need to worry at all about line breaks; only paragraph breaks. We can
trust that the text _will_ be wrapped properly whenever it's viewed.
Now... remember how I just said that, in the context of plain text email, we
can't make _any_ assumptions about how the text will be rendered? This applies
to wrapping, too. _Some_ mail clients may wrap text, **but not all of them**.
This essentially consigns us to hard-wrapping our emails.
The problem? _It's inconvenient!_ Imagine you edit a paragraph, and remove a
sentence. Well, now that entire paragraph's spacing is messed up, and you need
to manually reflow it and fix the line breaks. Yuck!
## The Markdown complication
### Standard tools
At this point, some of you may be screaming: _"but what about `fmt` and
`fold`?"_ There exist utilities meant to solve this specific problem, included
in most Linux distributions out-of-the-box! Well, you would be right. _Sort of_.
It's true that we already have excellent, composable commands for wrapping and
paragraph formatting. A simple `#!fish fmt <email.txt >wrapped.txt` is enough
to cover many cases. However, there's a problem: **these tools are markup
agnostic**.
Why is that a problem when I literally [just](#plain-text-email) said we don't
care about markup? Well, there are _some_ markup formats that are delightfully
readable even in plain--text. Consider the following _unordered list_ in HTML
(HyperText **Markup** Language):
```html
<ul>
<li>Foobar</li>
<li>Barfoo</li>
</ul>
```
See, machines can read this no problem... but people? We struggle. Now, consider
the exact same list expressed in [Markdown](https://en.wikipedia.org/wiki/Markdown):
```markdown
- Foobar
- Barfoo
```
Isn't that so much nicer? As it turns out, markup isn't only meant to make
writing HTML easier --- it's also a great way to enhance the _semantics_ of
plain text.
**This** is where we run up against issues with `fmt` & company: because they're
not _aware_ of Markdown syntax, they have a tendency to **break** it. Consider
the unordered list example from before:
```console
$ cat list.md | fmt
- Foobar - Barfoo
```
The tool has _no idea_ this is meant to be a list. It just treats
whitespace-separated tokens as words and reflows paragraphs accordingly.
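The same failure is easy to reproduce outside of `fmt`. The following is a minimal sketch using Python's standard `textwrap` module as a stand-in for a markup-agnostic reflow; the `width` of 74 is an arbitrary choice for illustration.

```python
import textwrap

markdown_list = "- Foobar\n- Barfoo\n"

# textwrap.fill replaces every whitespace character (newlines included) with a
# single space before wrapping, so both list items end up on the same line.
reflowed = textwrap.fill(markdown_list, width=74)
print(reflowed)  # - Foobar - Barfoo
```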
### Markdown formatters
My immediate next thought was to try an actual Markdown formatter. Not only do
they _also_ handle wrapping & reflow, they won't break the markup. I gave it a
shot, and to my horror, I found that they have the _opposite_ problem: they
preserve markup, but they break [signature blocks](#signature-blocks),
[sign-offs](#sign-offs), and [headers](#headers)!
## Writing `mailfmt`
I eventually wrote [`mailfmt`](https://git.ficd.sh/ficd/mailfmt) to fill the
niche of email formatting. It provides consistent paragraph spacing,
hard-wrapping and paragraph reflow, while preserving Markdown syntax, email
headers, quotes, sign-offs, and signature blocks. Additionally, the wrapped
output can be made safe for passing to a Markdown parser. This is useful if you
want to build an HTML email from plain-text.
`mailfmt` is open-source under the ISC license, and is available on
[PyPI](https://pypi.org/project/mailfmt/) for installation with tools like
`pipx` and `uv`. The source code is available on sourcehut at
[git.ficd.sh/ficd/mailfmt](https://git.ficd.sh/ficd/mailfmt).
## Target Audience
I wrote this tool primarily for myself. It's served me very well over the past
few months. `mailfmt` could be helpful for anyone that prefers writing email in
plain-text using text editors like Kakoune, Helix, and Vim. It can format via
`stdin`/`stdout` and read/write files, making `mailfmt` easy to configure as a
formatter for the `mail` filetype in your editor.
I'm including a very lengthy explanation of exactly why I built this tool. You
may think it's overkill for such a small program — but I like to be crystal
clear about justifying my work. It reads like a blog post rather than the
emoji-filled `README`/marketing style we're accustomed to seeing on this
platform. I've put a lot of thought into this, and I want to share my work. I
hope you enjoy reading about my thought process.
## Why I Built It (Comparison)
Unsurprisingly, it all started with a specific problem I was having composing
emails in plain-text format in my preferred text editor. As I searched for a
solution, I couldn't find anything that met all my needs, so I wrote it myself.
Here's what I wanted:
### My requirements
- A way to consistently format my outgoing emails in my text editor.
- Paragraph reflow and automatic line wrapping.
- Not all plain-text clients are capable of line-wrap. In some contexts, such
as mailing lists, the author is expected to wrap the text themselves.
- Inline Markdown syntax `can _still_ look great, **even** in plain-text!` Thus,
I wanted to use it:
- Ability to use Markdown syntax:
- Without it being broken by reflow & wrap.
- While looking good and retaining the same semantics in _both_ rendered
**and** plain-text form — ideal for `multipart` emails.
- Ensure signature block is formatted properly.
- The single space after `--` and before the newline **must** be included.
- _Ensure_ proper formatting of [signature blocks](#signature-blocks).
- _Preserve_ formatting of [sign-offs](#sign-offs).
### `fmt` and Markdown Formatters Don't Work For Email
### Wrap & reflow
The `fmt` utility provides great wrapping and reflow capabilities — I use it all
the time while writing LaTeX. However, it's syntax-agnostic, and breaks
Markdown. For example, it completely mangles fenced code blocks. I figured: hey,
why not just use a Markdown formatter? It supports Markdown (obviously), _and_
can reflow & wrap text! Here's the problem: it turns out treating your
**entire** email as a Markdown document isn't ideal.
It turns out that the most important part was also the easiest to implement.
Python's standard library includes
[`textwrap`](https://docs.python.org/3/library/textwrap.html), which _literally_
just does it for you. So the _real_ challenge becomes figuring out _what to
wrap_, versus **what to ignore**.
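To give a sense of how little code the wrapping itself needs, here is a rough sketch built on `textwrap.fill`. It is not taken from `mailfmt`; the function name `reflow_paragraphs` and the default width of 74 columns are assumptions made for this example.

```python
import re
import textwrap


def reflow_paragraphs(text: str, width: int = 74) -> str:
    """Hard-wrap each blank-line-separated paragraph to `width` columns."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # fill() joins the paragraph's lines and re-breaks them at `width`.
    wrapped = [textwrap.fill(paragraph, width=width) for paragraph in paragraphs]
    return "\n\n".join(wrapped)
```

Calling `reflow_paragraphs(draft, width=72)` on an edited draft re-breaks every paragraph, which is exactly why the interesting part is deciding which lines should never be handed to it.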
### Preserving Markdown
Getting my tool to preserve Markdown was fairly straightforward. I'm not building
a _Markdown formatter_, I'm building _a formatter that doesn't break Markdown_.
In other words, I don't need to _parse_ Markdown syntax; just recognize it,
**and ignore it**.
`mailfmt`'s approach is simple: detect when a line matches a known pattern of
Markdown block element syntax, such as leading `#` for headings, `-` for lists,
etc. If so, **leave the line untouched**. Similarly, **don't format anything
inside fenced code blocks**.
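The recognize-and-ignore idea can be sketched roughly as follows. The patterns and the name `classify_lines` are hypothetical and chosen for illustration; they are not `mailfmt`'s actual rules, and they cover only a handful of block elements.

```python
import re

# Hypothetical patterns for a few Markdown block elements.
BLOCK_PATTERN = re.compile(
    r"""^\s*(
        \#{1,6}\s      # ATX heading
      | [-*+]\s        # unordered list item
      | \d+[.)]\s      # ordered list item
      | >              # block quote
    )""",
    re.VERBOSE,
)
# A fence is three or more backticks or tildes at the start of a line.
FENCE_PATTERN = re.compile(r"^\s*(`{3,}|~{3,})")


def classify_lines(lines: list[str]) -> list[tuple[str, bool]]:
    """Pair each line with True if it may be reflowed, False if it must stay verbatim."""
    result = []
    in_fence = False
    for line in lines:
        if FENCE_PATTERN.match(line):
            in_fence = not in_fence
            result.append((line, False))  # fence lines themselves are never reflowed
        elif in_fence or BLOCK_PATTERN.match(line):
            result.append((line, False))
        else:
            result.append((line, True))
    return result
```

Only the lines flagged `True` would then be grouped into paragraphs and passed to the wrapper; everything else is emitted unchanged.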
#### Sign-Offs
### Sign-offs
Consider the following sign-off:
@ -118,7 +201,7 @@ Best wishes,
Daniel
```
> However, this empty line looks _awkward_ when viewed in plain-text.
> However, this empty line looks a tad awkward when viewed in plain--text.
2. Put a backslash after the intentional line break:
@ -129,7 +212,7 @@ Daniel
> Again, this looks bad when the Markdown isn't rendered.
3. Put two spaces after the intentional line break ( = space):
3. Put two spaces after the intentional line break (`•` = space):
```
Best•wishes,••
@ -146,17 +229,20 @@ uppercase letter**, then we assume these two lines are a _sign-off_, and we
don't reflow or wrap them. The heuristic matches a very simple pattern:
```
A courteous greeting,
A courteous salutation,
First Middle Last Name
```
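A heuristic of this shape might look like the sketch below. The comma and uppercase checks follow the pattern shown above; the word limit (`max_words`) and the function name `is_sign_off` are assumptions added for illustration, not a description of `mailfmt`'s exact rule.

```python
def is_sign_off(first: str, second: str, max_words: int = 5) -> bool:
    """Guess whether two consecutive lines form a sign-off like 'Best wishes,' / 'Daniel'."""
    first, second = first.strip(), second.strip()
    if not first or not second:
        return False
    # Assumption: sign-offs are short, so long lines are treated as prose.
    short = len(first.split()) <= max_words and len(second.split()) <= max_words
    return short and first.endswith(",") and second[0].isupper()
```

A pair of lines that passes this check would be copied to the output as-is instead of being joined during reflow.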
#### Signature Block
### Signature blocks
The convention for signature blocks is as follows:
The [standard](https://en.wikipedia.org/wiki/Signature_block#Standard_delimiter)
for signature blocks is as follows:
1. Begins with two `-` characters followed by a single space, then a newline.
2. Everything that follows until the EOF is part of the signature.
*[EOF]: End of file.
Here's an example (note the • = space):
```
@ -167,31 +253,44 @@ Software•Developer,•Company
email@website.com
```
As with sign-offs, such a signature block gets mangled by Markdown formatters.
As with sign-offs, such a signature block gets mangled by other formatters.
Furthermore, the single space after the `--` token is important: if it's
missing, some clients won't recognize it as a valid signature --- our formatter
should address this too.
missing, some clients won't recognize it as a valid signature.
`mailfmt` detects when a line's _only_ content is `--`. It adds the required
trailing space if it's missing, and it treats the rest of the input as part of
trailing space if it's missing, and it treats the rest of the file as part of
the signature, leaving it completely untouched.
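The behaviour described here can be sketched in a few lines. This is an illustration rather than `mailfmt`'s source; the helper name `split_signature` is hypothetical, and lines are assumed to arrive without their trailing newline.

```python
def split_signature(lines: list[str]) -> tuple[list[str], list[str]]:
    """Split lines into (body, signature), normalizing the delimiter to '-- '."""
    for index, line in enumerate(lines):
        # Match a line whose only content is '--', with or without trailing spaces.
        if line.rstrip() == "--":
            # Re-emit the delimiter with exactly one trailing space and keep
            # everything after it verbatim.
            return lines[:index], ["-- "] + lines[index + 1:]
    return lines, []
```

Only the first element of the returned pair would go through wrapping and reflow; the second is appended to the output unchanged.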
### Consistent Multipart Emails
## Headers
Something you may want to do is generate a `multipart` email. This means that
_both_ an HTML **and** plain-text representation of the _same_ email are
Raw emails contain many
[headers](https://en.wikipedia.org/wiki/Email#Message_header). Even if you're
reading/writing in plain--text, it's likely that your client strips these.
However, in some cases, you may want to insert a header or two manually.
Luckily, headers are easily matched by
[regex](https://en.wikipedia.org/wiki/Regular_expression), so `mailfmt` can
ignore them without any issues.
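As a rough idea of what such a match could look like, here is a sketch. The pattern is an assumption for illustration (a field name made of letters, digits and hyphens, followed by a colon and whitespace) and is not necessarily the expression `mailfmt` uses.

```python
import re

# Assumed shape of a header line, e.g. "Subject: Hello" or "In-Reply-To: <id>".
HEADER_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:\s")


def is_header(line: str) -> bool:
    """Return True if the line looks like an email header field."""
    return HEADER_PATTERN.match(line) is not None
```

A pattern this loose also matches prose lines such as `Note: see below`, so a real implementation would likely restrict it further, for example to the top of the message.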
## Consistent multipart emails
Something you may want to do is generate a `multipart/alternative` email. This means
that _both_ an HTML **and** plain-text representation of the _same_ email are
included in the file — leaving it up to the reader's client to pick which one to
display.
The plain-text email **must** be able to stand on its own, and _also_ render to
decent-looking HTML. Essentially, you want to write your email in plain-text
once, ensuring it has proper formatting, and then use a command to generate an
HTML email from it. For this, `mailfmt` provides the `--markdown-safe` flag,
which appends backslashes to the formatted output, making it safe for Markdown
parsing without messing up the line breaks after sign-offs and signature blocks.
The plain-text email **must** be able to stand on its own, and should _also_
render to decent-looking HTML. Essentially, you want to write your email in
plain-text once, ensuring it has proper formatting, and then use a command to
generate an HTML email from it.
For example, I use the following in [aerc](https://aerc-mail.org/) to generate
an HTML multipart email whenever I want:
For this, `mailfmt` provides the `--markdown-safe` flag, which appends
backslashes to the formatted output, making it safe for Markdown parsing without
messing up the line breaks after sign-offs and signature blocks.
Note that the **only** thing this does is output Markdown with hard line breaks.
It's the user's responsibility to write the pipeline for generating the email
file. For example, I use the following in [aerc](https://aerc-mail.org/) to
generate an HTML multipart email whenever I want:
```ini
[multipart-converters]
@ -201,5 +300,5 @@ text/html=mailfmt --markdown-safe | pandoc -f markdown -t html --standalone
## Conclusion
If you've made it this far, thanks for sticking with me and reading to the end!
Even if you don't plan to write plain-text email or use `mailfmt` at all, I hope
you learned something interesting.
Even if you don't plan to write plain--text email or use `mailfmt` at all, I
hope you learned something interesting.