
title: Email Formatting Is Harder Than It Looks
date: 2025-07-14
description: A detailed overview of plain-text email formatting, what makes it deceptively hard, and how I wrote mailfmt.

*[UTF-8]: Unicode Transformation Format 8 bit. Text encoding standard.

*[plaintext]: Content representing only readable characters, and whitespace characters that affect the arrangement of the text.

[TOC]

Plain text email

As I've mentioned before, I like using Kakoune for reading & writing emails. Of course, Kakoune is a source code editor, not a rich text editor. It operates on UTF-8 plaintext --- which means that the emails I write need to be in plain text, too.

As it turns out, plain-text email (which predates HTML by decades[1]) hasn't really left a "legacy" so much as it hasn't actually gone anywhere. Many developers swear by it; some are even so committed as to automatically filter text/html mail as spam. If you want to learn more, you can begin by reading Drew DeVault's post, useplaintext.email, and sourcehut's mailing list etiquette guide.

As I went down the text/plain path, I quickly learned that I needed an email formatter. Why? Plain text is like source code. You can't rely on the recipient's mail client to render it in a certain way --- you have to assume that what you see is exactly what they get.

On one hand, this isn't really a problem --- the whole point of plain text is not having to bother with formatting, right? There is, however, a crucial catch: line wrapping.

The wrapping problem

For as long as we (humanity) have been writing text, we've been wrapping it. Pages, after all, have finite width. At some point, an ongoing sentence needs to continue on the line below it. This is called wrapping. In digital text, there are two kinds of wrapping: soft and hard. The former is much more common, and we often take it for granted.

Hard-wrapped text is the simplest: the line breaks are directly part of the source. If you're writing a sentence that's getting too long, you simply press <ret> to begin a new line. The author is responsible for all line breaks. This guarantees that (assuming the renderer doesn't reflow text) the output will always look exactly how it does in the editor.

Soft-wrapped text has line breaks inserted by the renderer --- they're not present in the source file. It's incredibly convenient! As writers, we don't need to worry at all about line breaks; only paragraph breaks. We can trust that the text will be wrapped properly whenever it's viewed.

Now... remember how I just said that, in the context of plain text email, we can't make any assumptions about how the text will be rendered? This applies to wrapping, too. Some mail clients may wrap text, but not all of them. This essentially consigns us to hard-wrapping our emails.

The problem? It's inconvenient! Imagine you edit a paragraph, and remove a sentence. Well, now that entire paragraph's spacing is messed up, and you need to manually reflow it and fix the line breaks. Yuck!

The Markdown complication

Standard tools

At this point, some of you may be screaming: "but what about fmt and fold?" There exist utilities meant to solve this specific problem, included in most Linux distributions out-of-the-box! Well, you would be right. Sort of.

It's true that we already have excellent, composable commands for wrapping and paragraph formatting. A simple fmt email.txt > email.tmp && mv email.tmp email.txt is enough to cover many cases (redirecting straight back into email.txt can truncate the file before fmt reads it). However, there's a problem: these tools are markup-agnostic.

Why is that a problem when I literally just said we don't care about markup? Well, there are some markup formats that are delightfully readable even in plain-text. Consider the following unordered list in HTML (HyperText Markup Language):

<ul>
  <li>Foobar</li>
  <li>Barfoo</li>
</ul>

See, machines can read this no problem... but people? We struggle. Now, consider the exact same list expressed in Markdown:

- Foobar
- Barfoo

Isn't that so much nicer? As it turns out, Markdown isn't only meant to make writing HTML easier --- it's also a great way to enhance the semantics of plain text.

This is where we run up against issues with fmt & company: because they're not aware of Markdown syntax, they have a tendency to break it. Consider the unordered list example from before:

$ cat list.md | fmt
- Foobar - Barfoo

The tool has no idea this is meant to be a list. It just treats whitespace separated tokens as words and reflows paragraphs accordingly.
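The same failure is easy to reproduce outside of fmt. The snippet below is a small illustration using Python's textwrap (not fmt itself, so treat it as an analogy): any reflow that only sees whitespace-separated words will merge the list items.

```python
import textwrap

markdown_list = "- Foobar\n- Barfoo"

# fill() treats the whole string as one paragraph: newlines become
# spaces, so the two list items end up merged on a single line.
naive = textwrap.fill(markdown_list, width=72)
print(naive)
```

The output is a single line, just like the fmt result above, because nothing in the reflow step knows that a leading "- " carries meaning.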

Markdown formatters

My immediate next thought was to try an actual Markdown formatter. Not only do they also handle wrapping & reflow, they won't break the markup. I gave it a shot, and to my horror, I found that they have the opposite problem: they preserve markup, but they break signature blocks, sign-offs, and headers!

Writing mailfmt

I eventually wrote mailfmt to fill the niche of email formatting. It provides consistent paragraph spacing, hard-wrapping and paragraph reflow, while preserving Markdown syntax, email headers, quotes, sign-offs, and signature blocks. Additionally, the wrapped output can be made safe for passing to a Markdown parser. This is useful if you want to build an HTML email from plain-text.

mailfmt is open-source under the ISC license, and is available on PyPI for installation with tools like pipx and uv. The source code is available on sourcehut at git.ficd.sh/ficd/mailfmt.

I wrote this tool primarily for myself. It's served me very well over the past few months. mailfmt could be helpful for anyone who prefers writing email in plain-text using text editors like Kakoune, Helix, and Vim. It can format via stdin/stdout and read/write files, making mailfmt easy to configure as a formatter for the mail filetype in your editor.

My requirements

  • A way to consistently format my outgoing emails in my text editor.
  • Paragraph reflow and automatic line wrapping.
  • Ability to use Markdown syntax:
    • Without it being broken by reflow & wrap.
    • While looking good and retaining the same semantics in both rendered and plain-text form — ideal for multipart emails.
  • Ensure proper formatting of signature blocks.
  • Preserve formatting of sign-offs.

Wrap & reflow

It turns out that the most important part was also the easiest to implement. Python's standard library includes textwrap, which literally just does it for you. So the real challenge becomes figuring out what to wrap, versus what to ignore.
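To give a sense of how little code the wrapping itself needs, here is a minimal sketch of paragraph reflow built on textwrap.fill. The function name reflow and the default width of 72 are assumptions made for this illustration, not mailfmt's actual code or defaults.

```python
import re
import textwrap


def reflow(text: str, width: int = 72) -> str:
    """Reflow each blank-line-separated paragraph to at most `width` columns."""
    # Split on one or more blank lines so paragraph spacing is normalized.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # fill() joins a paragraph's lines and re-breaks them at `width`.
    wrapped = [textwrap.fill(p, width=width) for p in paragraphs]
    return "\n\n".join(wrapped)
```

Calling reflow on a paragraph after deleting a sentence re-breaks every line, which is exactly the chore that is painful to do by hand. Note that this sketch reflows everything it is given, so it still has the markup problem described earlier.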

Preserving Markdown

Getting my tool to preserve Markdown was fairly straightforward. I'm not building a Markdown formatter, I'm building a formatter that doesn't break Markdown. In other words, I don't need to parse Markdown syntax; just recognize it, and ignore it.

mailfmt's approach is simple: detect when a line matches a known pattern of Markdown block element syntax, such as leading # for headings, - for lists, etc. If so, leave the line untouched. Similarly, don't format anything inside fenced code blocks.
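The recognize-and-ignore approach can be sketched roughly as follows. The patterns and the name classify are hypothetical and chosen for this example; the set of syntax that mailfmt actually recognizes may be different.

```python
import re

# Hypothetical patterns for this sketch: ATX headings, bullet and
# numbered list items, block quotes, and table rows.
BLOCK_SYNTAX = re.compile(r"^\s*(#{1,6}\s|[-*+]\s|\d+[.)]\s|>|\|)")
# Opening or closing fence of a code block (three or more backticks or tildes).
FENCE = re.compile(r"^\s*(`{3,}|~{3,})")


def classify(lines: list[str]) -> list[bool]:
    """Return, for each line, True if it may be reflowed."""
    in_fence = False
    result = []
    for line in lines:
        if FENCE.match(line):
            in_fence = not in_fence
            result.append(False)  # the fence line itself is left alone
        elif in_fence or BLOCK_SYNTAX.match(line):
            result.append(False)  # code or block syntax: do not touch
        else:
            result.append(True)  # ordinary prose: safe to reflow
    return result
```

Only the lines marked True would then be grouped into paragraphs and passed to the wrapper; everything else is copied through unchanged.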

Sign-offs

Consider the following sign-off:

Best wishes,
Daniel

A Markdown formatter considers this to be one paragraph, and reflows it accordingly, causing it to lose semantic meaning:

Best wishes, Daniel

Within the confines of Markdown, I counted three ways of dealing with the problem:

  1. Put an empty line between the two parts:
Best wishes,

Daniel

However, this empty line looks a tad awkward when viewed in plain-text.

  2. Put a backslash at the end of the line, before the intentional line break:
Best wishes, \
Daniel

Again, this looks bad when the Markdown isn't rendered.

  3. Put two spaces at the end of the line, before the intentional line break (• = space):
Best•wishes,••
Daniel

This syntax is ambiguous, easy to forget, and not supported by editors that trim trailing whitespace.

mailfmt detects sign-offs using a very simple heuristic. First, we check if a line has 5 or fewer words and ends with a comma. If we find such a line, we check the next line. If it has 5 or fewer words that all begin with an uppercase letter, then we assume these two lines are a sign-off, and we don't reflow or wrap them. The heuristic supports a very simple pattern:

A courteous salutation,
Prefix. First Middle Last, Suffix

For instance:

Sincerely,
Rev. John Apple Smith, PHD.
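The heuristic described above translates almost word for word into code. This is a sketch written from that description, with is_signoff as an assumed name, rather than a copy of mailfmt's source.

```python
def is_signoff(line: str, next_line: str) -> bool:
    """Apply the two-line sign-off heuristic to a pair of adjacent lines."""
    first = line.split()
    second = next_line.split()
    if not first or not second:
        return False
    # First line: at most 5 words, ending with a comma.
    if len(first) > 5 or not line.rstrip().endswith(","):
        return False
    # Second line: at most 5 words, each starting with an uppercase letter.
    return len(second) <= 5 and all(word[0].isupper() for word in second)
```

Both examples above pass this check, while an ordinary sentence that happens to end in a comma is rejected as soon as the following line contains a lowercase word.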

Signature blocks

The standard for signature blocks is as follows:

  1. Begins with two - characters followed by a single space, then a newline.
  2. Everything that follows until the EOF is part of the signature.

*[EOF]: End of file.

Here's an example (note the • = space):

--•
Daniel

Software•Developer,•Company
email@website.com

As with sign-offs, such a signature block gets mangled by other formatters. Furthermore, the single space after the -- token is important: if it's missing, some clients won't recognize it as a valid signature.

mailfmt detects when a line's only content is --. It adds the required trailing space if it's missing, and it treats the rest of the file as part of the signature, leaving it completely untouched.
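A rough sketch of that behaviour, assuming the message is already split into lines: find the delimiter, normalize it, and hand back everything after it untouched. The name split_signature is made up for this example.

```python
def split_signature(lines: list[str]) -> tuple[list[str], list[str]]:
    """Split lines into (body, signature), normalizing the delimiter to "-- "."""
    for i, line in enumerate(lines):
        # Accept the delimiter with or without its trailing space.
        if line.rstrip() == "--":
            # Everything after the delimiter is kept exactly as written.
            return lines[:i], ["-- "] + lines[i + 1:]
    return lines, []
```

The body half can go through the normal reflow path, and the signature half is appended back verbatim at the end.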

Headers

Raw emails contain many headers. Even if you're reading/writing in plain-text, it's likely that your client strips these. However, in some cases, you may want to insert a header or two manually. Luckily, headers are easily matched by regex, so mailfmt can ignore them without any issues.
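As an illustration of what such a regex might look like, here is a deliberately conservative pattern for lines shaped like "Name: value". It is an assumption for this example and not necessarily the expression mailfmt uses.

```python
import re

# Assumed pattern: a field name made of letters, digits, and hyphens,
# followed by a colon and then whitespace or the end of the line.
HEADER = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:(\s|$)")


def is_header(line: str) -> bool:
    """Return True if the line looks like an email header field."""
    return HEADER.match(line) is not None
```

A pattern this simple will also match prose lines such as "Note: bring snacks", so a real formatter would probably want extra context, for example only treating such lines as headers near the top of the message.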

Consistent multipart emails

Something you may want to do is generate a multipart/alternative email. This means that both an HTML and a plain-text representation of the same email are included in the message, leaving it up to the reader's client to pick which one to display.

The plain-text email must be able to stand on its own, and should also render to decent-looking HTML. Essentially, you want to write your email in plain-text once, ensuring it has proper formatting, and then use a command to generate an HTML email from it.

For this, mailfmt provides the --markdown-safe flag, which appends backslashes to the formatted output, making it safe for Markdown parsing without messing up the line breaks after sign-offs and signature blocks.

Note that the only thing this does is output Markdown with hard line breaks. It's the user's responsibility to write the pipeline for generating the email file. For example, I use the following in aerc to generate an HTML multipart email whenever I want:

[multipart-converters]
text/html=mailfmt --markdown-safe | pandoc -f markdown -t html --standalone
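To make the idea of Markdown-safe output more concrete, the sketch below shows one way to mark hard line breaks in a block whose line breaks must survive rendering, such as a sign-off or signature. It is written from the description above under my own assumptions; the function name markdown_safe and the exact placement of the backslashes are hypothetical and may differ from what the --markdown-safe flag really emits.

```python
def markdown_safe(lines: list[str]) -> list[str]:
    """Append a backslash wherever a line break must survive Markdown rendering."""
    out = []
    for i, line in enumerate(lines):
        # A hard break only makes sense if a non-blank line follows.
        has_next = i + 1 < len(lines) and lines[i + 1].strip() != ""
        if line.strip() and has_next:
            out.append(line.rstrip() + "\\")
        else:
            out.append(line)
    return out
```

A Markdown parser treats a backslash at the end of a line as a hard line break, so "Best wishes," and "Daniel" stay on separate lines in the generated HTML instead of being joined into one paragraph.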

Conclusion

If you've made it this far, thanks for sticking with me and reading to the end! Even if you don't plan to write plain-text email or use mailfmt at all, I hope you learned something interesting.


  1. The first email was sent in 1971 --- HTML was specified in 1990.