Comparing Python Markdown libraries for roundtrip capability
There are quite a few Python libraries available that deal with Markdown. Basically all of them can convert it into HTML, most can load it into some sort of AST or structured representation, and some can generate new Markdown from that representation. The latter allows you to make targeted modifications without resorting to error-prone raw string manipulation.
However, if the Markdown documents you intend to programmatically edit will also be regularly edited by humans, and especially if the documents are intended to be version controlled, e.g. via Git, then you don’t want to make disruptive changes to the file when doing your edits.
For example, some libraries will remove unnecessary whitespace. Others will replace all occurrences of _emphasized_ and *emphasized*, which are equivalent in Markdown, with one hardcoded variant (i.e. always wrap them in underscores, or always in asterisks). And some will even change the style of headings, e.g. rewriting underlined (setext) headings as #-prefixed (ATX) headings.
This is a problem, because you will introduce unrelated changes on save, leading to confused users, excessive diffs or even merge conflicts.
I have tested several of the available libraries for how faithfully they can do “round trips”, i.e. parse Markdown into Python objects and then render it back again without changing anything in the original file, including any quirks or irregularities it might have.
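The round-trip check itself is easy to express: parse, render, and compare the result character for character against the input. The following is a minimal sketch, assuming each library is wrapped in a pair of `parse` and `render` callables. These wrapper names are hypothetical and not taken from any of the tested libraries, and the toy `star_to_underscore` renderer is an illustrative stand-in for a library that normalizes emphasis, not a real library's behavior.

```python
import difflib
import re
from typing import Any, Callable, List


def roundtrip_diff(text: str,
                   parse: Callable[[str], Any],
                   render: Callable[[Any], str]) -> List[str]:
    """Parse and re-render `text`; return a unified diff of any changes.

    An empty list means the round trip reproduced the input exactly.
    """
    result = render(parse(text))
    if result == text:
        return []
    # keepends=True so that whitespace and line-ending changes show up too
    return list(difflib.unified_diff(
        text.splitlines(keepends=True),
        result.splitlines(keepends=True),
        fromfile="original", tofile="roundtrip"))


# Hypothetical stand-ins for a real library (illustration only)
def identity_parse(text: str) -> str:
    # The "AST" is simply the raw string
    return text


def identity_render(tree: str) -> str:
    return tree


def star_to_underscore(tree: str) -> str:
    # Mimics a renderer that always writes single emphasis with underscores
    return re.sub(r"(?<!\*)\*([^*\n]+)\*(?!\*)", r"_\1_", tree)


doc = "# Title\n\nSome *emphasized* text.\n\nUntouched line.\n"
print(roundtrip_diff(doc, identity_parse, identity_render))  # []
for line in roundtrip_diff(doc, identity_parse, star_to_underscore):
    print(line, end="")
```

Returning a diff rather than a bare boolean makes it easy to see exactly which quirks a library normalizes, which is the same noise that would otherwise end up in your version control history.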
Summary
If you need round-trips, you need to use mistletoe (≥ 1.1.0). It will still change unrelated parts of your file if they contain invalid Markdown, but assuming your file is valid, mistletoe should do byte-exact round-trips.
Mistune (≥ 3.0.0) is the runner-up, only removing excess whitespace and normalizing some tokens.
All other libraries I’ve tested either don’t have a Markdown renderer available or normalize your document very significantly.
Test results
library | whitespace-preserving | has Markdown renderer | notes |
---|---|---|---|
marko | yes | no | No Markdown renderer available. |
markdown-it | not tested | via mdformat | mdformat is opinionated and will normalize the document. |
mistletoe | yes | yes | Perfect round-trips, as long as the document doesn’t contain invalid Markdown. |
mistune | not quite | yes | Unnecessary whitespace is removed, and some tokens are normalized. |
pandoc | no | yes | Normalizes the document on read, and again on render (the output is very customizable, but still normalized). |
python-markdown | not tested | no | No Markdown renderer available. |
Testing code
You can find the code I used to test the libraries in markdown_roundtrip.py.
For maximum backwards compatibility, I tested with Python 3.6 or, if a library no longer supported that version, with 3.7.
If a library doesn’t have a Markdown renderer available, there’s no test class for it.
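The contents of markdown_roundtrip.py are not reproduced here. As a hedged sketch of how a “one test class per library with a renderer” layout could look, the following uses `unittest` with a hypothetical mixin `RoundtripTestBase` whose subclasses implement a single `roundtrip()` method. The sample documents and the placeholder `IdentityRoundtripTest` are made up for illustration; a real subclass would call the library’s own parser and Markdown renderer instead.

```python
import unittest

# Hypothetical sample documents with quirks a normalizing renderer might change
SAMPLES = {
    "mixed_emphasis": "Both *stars* and _underscores_.\n",
    "setext_heading": "Heading\n=======\n\nText.\n",
    "trailing_spaces": "Line with trailing spaces   \nNext line.\n",
}


class RoundtripTestBase:
    """Mixin: each subclass implements roundtrip() for one library."""

    def roundtrip(self, text: str) -> str:
        raise NotImplementedError

    def test_samples(self):
        # Every sample must survive parse + render unchanged
        for name, text in SAMPLES.items():
            with self.subTest(sample=name):
                self.assertEqual(self.roundtrip(text), text)


class IdentityRoundtripTest(RoundtripTestBase, unittest.TestCase):
    # Placeholder "library" that returns its input unchanged
    def roundtrip(self, text: str) -> str:
        return text

# Run with: python -m unittest <this_file>.py
```

Keeping the base class out of `unittest.TestCase` stops the test runner from collecting it on its own, so a library without a Markdown renderer simply gets no subclass.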
Author & license
This test has been performed by scy. The contents of this repository are licensed under CC0 1.0 Universal.