Testing several Python Markdown libraries for their round-trip capabilities.
Tim Weber 002f7b55cd
Initial commit
2023-07-16 20:16:21 +02:00
LICENSE.txt Initial commit 2023-07-16 20:16:21 +02:00
README.md Initial commit 2023-07-16 20:16:21 +02:00
markdown_roundtrip.py Initial commit 2023-07-16 20:16:21 +02:00
requirements.txt Initial commit 2023-07-16 20:16:21 +02:00
results.csv Initial commit 2023-07-16 20:16:21 +02:00


Comparing Python Markdown libraries for roundtrip capability

There are quite a few Python libraries available that deal with Markdown. Basically all of them can convert it into HTML, most can load it into some sort of AST or structured representation, and some can generate new Markdown from that representation. The latter allows you to make targeted modifications without resorting to error-prone raw string manipulation.

However, if the Markdown documents you intend to programmatically edit will also be regularly edited by humans, and especially if the documents are intended to be version controlled, e.g. via Git, then you don't want to make disruptive changes to the file when doing your edits.

For example, some libraries will remove unnecessary whitespace. Others will replace all occurrences of _emphasized_ and *emphasized*, which are equivalent in Markdown, with a single hardcoded variant (i.e. always wrap them in underscores, or always in asterisks). And some will even change the style of headings.

This is a problem, because you will introduce unrelated changes on save, leading to confused users, excessive diffs or even merge conflicts.

I have tested some of the libraries available for how perfectly they can do “round trips”, i.e. converting from Markdown into Python objects and then back again without any modification to the original file and the quirks or irregularities it might have.
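The round-trip check described above can be sketched with nothing but the standard library. This is a minimal illustration and not the code from markdown_roundtrip.py: `roundtrip_diff` and `toy_normalizer` are hypothetical names, and the normalizer is a regex-based stand-in for a normalizing library, not a real Markdown parser.

```python
import difflib
import re
from typing import Callable, List


def roundtrip_diff(original: str, roundtrip: Callable[[str], str]) -> List[str]:
    """Return unified-diff lines between a document and its round-tripped form.

    An empty list means the round trip was byte-exact.
    """
    result = roundtrip(original)
    return list(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            result.splitlines(keepends=True),
            fromfile="original.md",
            tofile="roundtrip.md",
        )
    )


def toy_normalizer(text: str) -> str:
    """Hypothetical stand-in for a library that normalizes on render."""
    # Rewrite *emphasis* as _emphasis_ (single asterisks only).
    text = re.sub(r"(?<!\*)\*([^*\n]+)\*(?!\*)", r"_\1_", text)
    # Strip trailing whitespace from every line.
    return "".join(line.rstrip() + "\n" for line in text.splitlines())


sample = "Title\n=====\n\nSome *emphasized* and _emphasized_ text.  \n"

# Print the unrelated changes a normalizing round trip would introduce.
for diff_line in roundtrip_diff(sample, toy_normalizer):
    print(diff_line, end="")
```

Any callable that takes Markdown text and returns Markdown text can be passed as `roundtrip`, so the same helper works for comparing different libraries against the same input files.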

Summary

If you need round-trips, you need to use mistletoe (≥ 1.1.0). It will still change unrelated parts of your file if they are broken Markdown, but assuming your file is okay, mistletoe should do byte-exact round-trips.
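For reference, a round trip through mistletoe might look roughly like the following. This is a sketch based on the `MarkdownRenderer` that mistletoe ships in `mistletoe.markdown_renderer` as of 1.1.0, not necessarily the exact code used in the test script; the function name `mistletoe_roundtrip` is my own.

```python
def mistletoe_roundtrip(text: str) -> str:
    """Parse Markdown with mistletoe and render it back to Markdown."""
    # Imported here so that merely defining the helper does not require
    # mistletoe to be installed (an assumption made for this sketch).
    from mistletoe import Document
    from mistletoe.markdown_renderer import MarkdownRenderer

    # The document has to be parsed while the renderer is active, because
    # the renderer adjusts how tokens are parsed so details are preserved.
    with MarkdownRenderer() as renderer:
        document = Document(text)
        # Targeted modifications to `document` would go here.
        return renderer.render(document)
```

Calling `mistletoe_roundtrip(open("README.md").read())` and comparing the result with the original text is then enough to check a single file.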

Mistune (≥ 3.0.0) is the runner-up, only removing excess whitespace and normalizing some tokens.

All other libraries I've tested either don't have a renderer available, or very significantly normalize your document.

Test results

| library | whitespace-preserving | has renderer | notes |
| --- | --- | --- | --- |
| marko | yes | no | No Markdown renderer available. |
| markdown-it | not tested | via mdformat | mdformat is opinionated and will normalize the document. |
| mistletoe | yes | yes | Perfect roundtrips, as long as the document doesn't contain invalid Markdown. |
| mistune | not quite | yes | Unnecessary whitespace will be removed. Some tokens will be normalized. |
| pandoc | no | yes | Normalized on read and (very customizable, but still) on render. |
| python-markdown | not tested | no | No Markdown renderer available. |

Testing code

You can find the code I used to test the libraries in markdown_roundtrip.py. For maximum backwards compatibility, I tested with Python 3.6 or, if the library didn't support that version anymore, with 3.7.

If the library doesn't have a renderer available, there's no test class for it.

Author & license

This test has been performed by scy. The contents of this repository are licensed under CC0 1.0 Universal.