4 + 1 wrapping styles for Markdown prose and code comments

Published on in Clean code, Documentation and Markdown

A paragraph per line, a sentence per line, or hard wrapping at e.g. 80 characters? These days I prefer a fourth option: semantic line breaks.

Context

For a long time I used to hard wrap my Markdown prose and code comments at 80 characters. It looked quite neat.

When a certain colleague touched the same files, he always changed the wrapping style to a sentence per line. He didn't necessarily do it for whole documents, only for the parts that he had modified, e.g. a section in a Markdown file.

I didn't like it at first. My beautiful hard wrapped prose! I was too shy to complain about it, and it wasn't actually that big of a deal.

Eventually I started to like the new style. Now I think it was a step in the right direction.

Benefits of a consistent wrapping style

Why does it matter how you wrap Markdown prose and code comments? While pondering over something like this may seem superficial, a good wrapping style does provide benefits:

  • Readability.

    Code is read more often than it's written, so readability is important. Readability also helps with writing and editing longer pieces of text.

  • Maintainability.

    For example: if you hard wrap at 80 characters, anyone editing the text has to manually update and tweak the wrapping, which can be annoying. Or you can use a tool, but finding a suitable tool and installing and configuring it is one more thing to worry about.

  • Diffability (version controlling).

    For example: if you have a long paragraph on a single line, changing one word highlights the whole paragraph in a Git diff. If you instead have a single sentence or even half a sentence on a single line, changing one word highlights just that single sentence / half a sentence.
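Git compares files line by line, so the diffability point is easy to reproduce. Here's a minimal sketch that uses Python's standard-library difflib as a stand-in for git diff, which also works at line granularity by default. It makes one hypothetical one-word change ("natural" → "obvious") to the example paragraph used later in this post. The paragraph is stored once as a paragraph per line and once as a sentence per line. The helper names are illustrative only.

```python
import difflib

# The example paragraph from the "Switch statements" post, one sentence per item.
SENTENCES = [
    "After some thought, it's actually logical to present the game states "
    "(`switch` cases) in the order they have been presented: "
    '"waiting to start," "playing" and "dead."',
    "That's the natural flow of the game states.",
    'The initial game state is "waiting to start," so the most logical '
    "position for the `default` case is at the beginning.",
]


def removed_lines(old: str, new: str) -> list[str]:
    """Return the lines that a unified diff marks as removed ("-")."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    # Skip the "--- " file header; keep the text after the "-" marker.
    return [line[1:] for line in diff
            if line.startswith("-") and not line.startswith("---")]


def edit(text: str) -> str:
    """A hypothetical one-word change: "natural" -> "obvious"."""
    return text.replace("natural", "obvious")


paragraph_per_line = " ".join(SENTENCES)
sentence_per_line = "\n".join(SENTENCES)

for style, text in [("paragraph per line", paragraph_per_line),
                    ("sentence per line", sentence_per_line)]:
    removed = removed_lines(text, edit(text))
    highlighted = sum(len(line) for line in removed)
    print(f"{style}: {len(removed)} line(s), {highlighted} characters highlighted")
```

Both versions produce exactly one changed line. In the paragraph-per-line version, that line is 329 characters long. In the sentence-per-line version, it is 43 characters long.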

Not limited to Markdown

So far I have talked about Markdown (and code comments), but these wrapping styles can be applied to many other lightweight markup languages as well. Here's a non-exhaustive list of other compatible lightweight markup languages:

  • AsciiDoc, "a text document format for writing notes, documentation, articles, books, ebooks, slideshows, web pages, man pages and blogs."
  • Org mode, "a GNU Emacs major mode for convenient plain text markup – and much more."
  • reStructuredText, "an easy-to-read, what-you-see-is-what-you-get plaintext markup syntax and parser system."
  • Wikitext, "consists of the syntax and keywords used by the MediaWiki software to format a page." The best known MediaWiki site (or "software") is Wikipedia.

The above list is adapted from a similar list on the website of the Semantic Line Breaks specification. The descriptions are copied from the linked pages.

The 4 + 1 wrapping styles

Let's use the following examples for comparing the different wrapping styles:

  1. The following paragraph from my blog post Switch statements: default doesn't have to be the last case:

    After some thought, it's actually logical to present the game states (switch cases) in the order they have been presented: "waiting to start," "playing" and "dead." That's the natural flow of the game states. The initial game state is "waiting to start," so the most logical position for the default case is at the beginning.

  2. The following code comment from my AutoHotkey script special_characters.ahk:

    No need to define separate hotstrings for uppercase letters because hotstrings are "case-conforming" by default. Applies to the ;aring → å hotstring as well.

The Markdown paragraph is rendered the same way with all wrapping styles. The difference is in how the text looks in the source code.

(Code comments are often read only in the source code, so the above doesn't really apply to them.)

A paragraph per line

Readability

Possibly annoying to read because the lines are so long. It depends on the width of the IDE window and on whether the IDE soft wraps long lines automatically, i.e. wraps them visually without inserting line breaks.

Maintainability

Paragraphs are easy to identify, so the style is easy to follow.

On the other hand, editing long lines is cumbersome as you can't e.g. rearrange the sentences of a paragraph by rearranging lines.

Diffability

Git diffs are noisy, especially for long paragraphs, because changing a single word highlights the whole paragraph. It looks like the whole paragraph has been touched. In a sense it has, since the whole paragraph is on a single line. You have to check manually what has actually changed.

Markdown example

The line is very long (329 characters), causing a horizontal scroll bar to appear.

After some thought, it's actually logical to present the game states (`switch` cases) in the order they have been presented: "waiting to start," "playing" and "dead." That's the natural flow of the game states. The initial game state is "waiting to start," so the most logical position for the `default` case is at the beginning.

Code comment example

Same here: a wild horizontal scroll bar appears!

; No need to define separate hotstrings for uppercase letters because hotstrings are "case-conforming" by default. Applies to the `;aring` → `å` hotstring as well.

A sentence per line

Readability

Better than in the "a paragraph per line" style. The problem is that a sentence can contain multiple points and ideas, so you have to read the whole line to see what it's about.

Long sentences might be annoying to read if the IDE doesn't soft wrap them automatically.

Maintainability

Sentences are easy to identify, so the style is easy to follow.

Sentences can be easily rearranged by rearranging lines.

Might encourage shorter sentences because very long lines stand out.
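Converting a paragraph-per-line text to this style can be partly automated. The following is a minimal sketch in Python. The function name to_sentence_per_line is hypothetical. The regex is a rough assumption about where sentences end: a ".", "!" or "?", optionally followed by a closing double quote, and then whitespace. Applied to the example paragraph, it produces the three lines of the Markdown example below.

```python
import re

# Assumption: a sentence ends with ".", "!" or "?" (optionally followed by a
# closing double quote) plus whitespace. Abbreviations such as "e.g." will be
# split incorrectly, so check the result by hand.
SENTENCE_END = re.compile(r'(?<=[.!?])\s+|(?<=[.!?]")\s+')


def to_sentence_per_line(paragraph: str) -> str:
    """Put each sentence of a single-line paragraph on its own line."""
    return "\n".join(SENTENCE_END.split(paragraph.strip()))


paragraph = (
    "After some thought, it's actually logical to present the game states "
    "(`switch` cases) in the order they have been presented: "
    '"waiting to start," "playing" and "dead." That\'s the natural flow of '
    'the game states. The initial game state is "waiting to start," so the '
    "most logical position for the `default` case is at the beginning."
)
print(to_sentence_per_line(paragraph))
```

The regex can't tell an abbreviation from the end of a sentence, so someone still has to review where the line breaks end up.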

Diffability

Better than in the "a paragraph per line" style. Some lines might still be too long, making diffs noisy.

Markdown example

I like this better than putting the whole paragraph on the same line. The first and last lines are still too long, though.

After some thought, it's actually logical to present the game states (`switch` cases) in the order they have been presented: "waiting to start," "playing" and "dead."
That's the natural flow of the game states.
The initial game state is "waiting to start," so the most logical position for the `default` case is at the beginning.

Code comment example

Same here: this is better, but the first line is too long.

; No need to define separate hotstrings for uppercase letters because hotstrings are "case-conforming" by default.
; Applies to the `;aring` → `å` hotstring as well.

Hard wrapping at e.g. 80 characters

Hard wrapping used to be my favorite wrapping style. My rationale back then:

If you are hard wrapping your code at n characters, it's logical to hard wrap code comments at the same width. And why not hard wrap Markdown prose as well?

(If you are not hard wrapping your code, maybe you should. Configure Prettier or a similar tool, and you don't have to do it manually.)

Readability

Hard wrapped text often looks neat on the surface because the text has a uniform width. There are no horizontal scroll bars, and the IDE doesn't need to soft wrap anything, unless there are very long words (like links) or the window is very narrow.

On the other hand, the text is often annoying to read because hard wrapping doesn't factor in sentence structures and such. It's difficult to see where sentences start and end and what points and ideas the whole paragraph contains.

Sometimes hard wrapping can break syntax highlighting, e.g. when a Markdown link spans two lines like this:

Here's some intro text, followed by a [link with quite a long text so that it
spans on two lines](https://duck.com/).

Lastly, orphan words are annoying. These are single words that end up on their own lines. There's one in the code comment example below. Also see Widows and orphans on Wikipedia.

Maintainability

Very cumbersome to maintain manually. Change a single word in the middle of a paragraph, and you might have to adjust the wrapping of several lines.

On the other hand, hard wrapping can be done easily with certain tools, e.g. Prettier, which hard wraps Markdown prose when its `proseWrap` option is set to `"always"`.

The downside of using tools is that you also have to get your teammates to install, configure and use them.
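For a quick illustration, Python's standard-library textwrap module can do this kind of hard wrapping. This is my pick for demonstration purposes and is not necessarily a tool the post has in mind. This sketch reproduces both 80-character examples in this section: the Markdown paragraph and the AutoHotkey comment.

```python
import textwrap

PARAGRAPH = (
    "After some thought, it's actually logical to present the game states "
    "(`switch` cases) in the order they have been presented: "
    '"waiting to start," "playing" and "dead." That\'s the natural flow of '
    'the game states. The initial game state is "waiting to start," so the '
    "most logical position for the `default` case is at the beginning."
)

COMMENT = (
    "No need to define separate hotstrings for uppercase letters because "
    'hotstrings are "case-conforming" by default. '
    "Applies to the `;aring` → `å` hotstring as well."
)

# Markdown prose: hard wrap at 80 characters.
wrapped_markdown = textwrap.fill(PARAGRAPH, width=80)

# AutoHotkey comment: each line starts with "; ", and the width limit
# includes that prefix. break_on_hyphens=False keeps "case-conforming"
# from being split at the hyphen.
wrapped_comment = textwrap.fill(
    COMMENT,
    width=80,
    initial_indent="; ",
    subsequent_indent="; ",
    break_on_hyphens=False,
)

print(wrapped_markdown)
print()
print(wrapped_comment)
```

Everyone on the team would still have to rerun something like this after every edit, which is exactly the downside described above.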

Diffability

Often abysmal. Change a single word in the middle of a paragraph, and you might have to adjust the wrapping of several lines. This makes diffs very noisy.
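Here's a minimal sketch of why the diffs get so noisy. It uses Python's textwrap as a stand-in for whatever tool re-wraps the text. It applies a hypothetical edit that adds a few words near the start of the example paragraph, re-wraps at 80 characters, and counts how many of the original lines survive unchanged.

```python
import textwrap

OLD = (
    "After some thought, it's actually logical to present the game states "
    "(`switch` cases) in the order they have been presented: "
    '"waiting to start," "playing" and "dead." That\'s the natural flow of '
    'the game states. The initial game state is "waiting to start," so the '
    "most logical position for the `default` case is at the beginning."
)
# Hypothetical edit: a few words added near the start of the paragraph.
NEW = OLD.replace("After some thought", "After giving it some more thought")

old_lines = textwrap.wrap(OLD, width=80)
new_lines = textwrap.wrap(NEW, width=80)

# Lines of the old wrapping that no longer appear after re-wrapping,
# i.e. roughly what a line-based diff would show as removed.
changed = [line for line in old_lines if line not in new_lines]
print(f"{len(changed)} of {len(old_lines)} lines changed")
```

The extra words push every later line break forward, so all five lines of the paragraph change.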

Markdown example

Looks nice on the surface, but is not very nice to read. The two words inside the parentheses unfortunately end up on different lines.

Looking at the left side of the code block (the first words of the lines) doesn't make much sense: "After... cases... dead... waiting to start... the beginning."

After some thought, it's actually logical to present the game states (`switch`
cases) in the order they have been presented: "waiting to start," "playing" and
"dead." That's the natural flow of the game states. The initial game state is
"waiting to start," so the most logical position for the `default` case is at
the beginning.

Code comment example

The last line contains an orphan word. It looks lonely and makes the whole comment look unbalanced.

Looking at the second line only, the beginning of it doesn't make sense: "are 'case-conforming' by default." What is it talking about? You need to check the previous line to see that it's talking about hotstrings.

; No need to define separate hotstrings for uppercase letters because hotstrings
; are "case-conforming" by default. Applies to the `;aring` → `å` hotstring as
; well.

Semantic line breaks

The Semantic Line Breaks specification introduces the idea like this:

When writing text with a compatible markup language, add a line break after each substantial unit of thought.

Semantic line breaks are my favorite! I have been using them at least since Nov 2020, i.e. for at least half a year at the time of this writing.

Readability

Lines tend to be short, which reduces the amount of horizontal eye movement, making the text easier to read.

It's also easy to scan the text vertically for capital letters at the start of lines, which mark where new sentences begin.

Maintainability

A single line contains a "substantial unit of thought" like a clause or an idea. This makes editing text more powerful because it's easy to rearrange clauses and ideas by rearranging lines. Compare with e.g. hard wrapping: you can't just rearrange lines, because the text wouldn't make sense anymore.

Another benefit, copied from the specification's FAQ section:

Semantic line breaks make it easier to identify grammatical mistakes and find opportunities to simplify and clarify without altering original intent.

And yet another, copied from a comment by Rich Morin in Brandon Rhodes's blog post Semantic Linefeeds:

Another benefit of this technique is that it gives the author (or editor) a chance to examine the text in two very different formats: monospace with line breaks and proportional without. This helps greatly in finding errors.

One drawback is that it might be difficult to decide (or learn) what counts as a semantic line break, i.e. what counts as a "substantial unit of thought." As a result, different people might wrap the same text differently.

If you are working in a team, it might lead to slight inconsistencies.

If you are working solo, it's not a problem. You'll quickly come up with your own style.

Diffability

Excellent. Small changes tend to produce clean diffs.

For a visual example, see Brandon Rhodes's blog post Semantic Linefeeds.
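The diff benefit can also be checked with a minimal sketch. It uses Python's standard-library difflib as a stand-in for git diff, which also compares line by line. The sketch applies a hypothetical edit, adding a few words to the first clause, to the semantically wrapped Markdown example in this section.

```python
import difflib

# The semantically wrapped Markdown example from this section.
SEMANTIC = """\
After some thought,
it's actually logical
to present the game states (`switch` cases)
in the order they have been presented:
"waiting to start,"
"playing" and
"dead."
That's the natural flow of the game states.
The initial game state is "waiting to start,"
so the most logical position for the `default` case
is at the beginning."""

# Hypothetical edit: a few words added to the first clause.
EDITED = SEMANTIC.replace("After some thought,", "After giving it some more thought,")

# Keep only the removed/added lines, skipping the "---"/"+++" file headers.
diff = [
    line
    for line in difflib.unified_diff(
        SEMANTIC.splitlines(), EDITED.splitlines(), lineterm=""
    )
    if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))
]
print("\n".join(diff))
```

Only the edited clause shows up as removed and added, and the other ten lines are untouched. With 80-character hard wrapping, the same edit would re-flow every line of the paragraph.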

Markdown example

The text looks a bit unbalanced, but each line contains a single thing. I'm sure someone else would wrap this text differently, even if we both used semantic line breaks. :-)

After some thought,
it's actually logical
to present the game states (`switch` cases)
in the order they have been presented:
"waiting to start,"
"playing" and
"dead."
That's the natural flow of the game states.
The initial game state is "waiting to start,"
so the most logical position for the `default` case
is at the beginning.

Code comment example

Yay, the orphan word is gone.

The second line makes sense on its own: "(because) hotstrings are 'case-conforming' by default."

; No need to define separate hotstrings for uppercase letters
; because hotstrings are "case-conforming" by default.
; Applies to the `;aring` → `å` hotstring as well.

+1: Chaos

If you don't deliberately choose a specific style for wrapping code comments and prose, you'll eventually have lots of inconsistencies. Inconsistencies lead to chaos, and chaos means poor readability and maintainability.

Further resources

Hat tip to the Semantic Line Breaks specification for teaching me a better alternative to hard wrapping. My earlier blog post RFC 2119 in a nutshell might be helpful for interpreting the specification.

Brandon Rhodes's blog post Semantic Linefeeds from 2012 covers the same general idea as semantic line breaks. It has an example of the diffing benefits and musings on the history of semantic line breaks.