Generate HTML from S-Expressions
Detlef Stern 55c7edf4d8 Refactor to make it more consistent 1 day ago

sxhtml - Generate HTML from S-Expressions

HTML can be represented as a symbolic expression, also called s-expression or sexpr (for short). This approach is similar to SXML, an attempt to encode XML as s-expressions.

For example, the following simple HTML text:

    <h1 id="main">Title</h1>
    <p>This is some example text.</p>
    <div class="small" id="footnote">Small text.</div>

An s-expression representation could be:

  (head (title "Example"))
    (h1 (@ (id "main")) "Title")
    (p "This is some example text.")
    (div (@ (class "small") (id "footnote")) "Small text.")

The s-expression representation has the advantage of being easier to parse than the HTML text. In addition, an s-expression can be analysed, and possibly optimized, more easily than a string representation. For example, ((p) (p)) can be simplified to ((p)). Similarly, there are circumstances where (li (p "text")) should be transformed to (li "text").
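The (li (p "text")) rewrite mentioned above could be sketched as follows. This is only an illustration: the `Node` type and the `unwrapSingleP` function are hypothetical stand-ins for an s-expression tree, not part of this library or of sxpf.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for an s-expression: either a text
// leaf (Tag == "") or an element with a tag symbol and children.
type Node struct {
	Tag      string
	Text     string
	Children []*Node
}

// unwrapSingleP rewrites every (li (p ...)) in the tree to (li ...).
func unwrapSingleP(n *Node) *Node {
	for i, c := range n.Children {
		n.Children[i] = unwrapSingleP(c)
	}
	if n.Tag == "li" && len(n.Children) == 1 && n.Children[0].Tag == "p" {
		n.Children = n.Children[0].Children
	}
	return n
}

// String prints the node in s-expression notation.
func (n *Node) String() string {
	if n.Tag == "" {
		return fmt.Sprintf("%q", n.Text)
	}
	s := "(" + n.Tag
	for _, c := range n.Children {
		s += " " + c.String()
	}
	return s + ")"
}

func main() {
	li := &Node{Tag: "li", Children: []*Node{
		{Tag: "p", Children: []*Node{{Text: "text"}}},
	}}
	fmt.Println(unwrapSingleP(li)) // prints: (li "text")
}
```

Doing the same transformation on an HTML string would require parsing the string first, which is the point of keeping the structure around.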

This library generates HTML from s-expressions.

Often, HTML is generated by using string template libraries, like Mustache (many programming languages), Jinja (Python), or html/template (Go).

One problem area is escaping certain characters that have a special meaning in various parts of the HTML text. Obviously, the less-than character "<" signals the beginning of a tag and cannot be used literally in normal text. It must be replaced by "&lt;". The ampersand character "&" now has a special meaning too, so it must be replaced with "&amp;". But this is only true for ordinary HTML content. Within HTML attributes (for example, href in <a href="...">...</a>), other characters must not occur. If you embed JavaScript in your HTML text, there is yet another set of rules.
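To make the difference between the two contexts concrete, here is a minimal sketch using only the Go standard library. The function names `escapeText` and `escapeAttr` are made up for this example, and the rule sets are deliberately simplified (the attribute variant assumes double-quoted attribute values).

```go
package main

import (
	"fmt"
	"strings"
)

// textEscaper covers ordinary HTML content: quotes may stay as they are.
var textEscaper = strings.NewReplacer("&", "&amp;", "<", "&lt;", ">", "&gt;")

// attrEscaper covers double-quoted attribute values: the quote character
// must be replaced as well, or it would end the attribute value.
var attrEscaper = strings.NewReplacer(
	"&", "&amp;", "<", "&lt;", ">", "&gt;", `"`, "&quot;")

func escapeText(s string) string { return textEscaper.Replace(s) }
func escapeAttr(s string) string { return attrEscaper.Replace(s) }

func main() {
	in := `a < b & "c"`
	fmt.Println(escapeText(in)) // a &lt; b &amp; "c"
	fmt.Println(escapeAttr(in)) // a &lt; b &amp; &quot;c&quot;
}
```

The same input needs two different outputs, so whoever calls the escaper has to know where the value will end up.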

Most string template libraries fail in certain scenarios. Mustache provides replacement characters only for HTML content, but not for HTML attributes. The same applies to Jinja. The html/template library for Go requires the developer to correctly specify the adequate escaping mode.

This is because string template libraries operate on, well, the string level. All structure of the HTML text is lost.

By using a structured representation of HTML, the HTML generator knows about the specific context and can automatically select the appropriate escape mode.


SxHTML is relatively lenient about the supported HTML language. When in doubt, it targets HTML5.

All tag and attribute names must be lowercase symbols. Do not use strings or keywords to specify a tag or an attribute. SxHTML does not check whether a symbol specifies a valid HTML tag or attribute.

Some tag and attribute symbols have a special meaning. The library specifies the list of void elements, which do not have an end tag; all other tags will have an end tag. It also associates attribute names with expected content, which results in an additional escaping mechanism for the specific content type. Currently, only URL content is recognized and escaped.

In addition to these associations, there are some heuristics for detecting the content type based on the attribute name.

  • A prefix of "data-" is stripped. For example, data-href is also treated as a URL attribute.
  • If there is no "data-" prefix, any namespace prefix is stripped. For example, svg:href is also treated as a URL attribute, but svg:data-href is not.
  • The namespace "xmlns" will always result in treating the attribute as a URL attribute, e.g. xmlns:svg.
  • If the attribute name contains one of the strings "url", "uri", or "src", it will be treated as a URL attribute.
  • If the attribute name starts with "on", it will be treated as JavaScript in future versions.
  • An attribute name "style" will treat the attribute value as CSS in the future.
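The URL-related heuristics above could be sketched like this. The function `isURLAttr` and the `knownURLAttrs` table are assumptions made for the example; the table in particular is a small placeholder, not the library's actual list, and the order of the checks is a guess.

```go
package main

import (
	"fmt"
	"strings"
)

// knownURLAttrs is an assumed, incomplete placeholder for the table of
// attribute names that are associated with URL content.
var knownURLAttrs = map[string]bool{"href": true, "src": true, "action": true}

// isURLAttr reports whether an attribute name is treated as a URL attribute.
func isURLAttr(name string) bool {
	// The xmlns namespace always marks a URL attribute.
	if strings.HasPrefix(name, "xmlns:") {
		return true
	}
	if strings.HasPrefix(name, "data-") {
		// Strip the "data-" prefix.
		name = strings.TrimPrefix(name, "data-")
	} else if i := strings.IndexByte(name, ':'); i >= 0 {
		// No "data-" prefix: strip a namespace prefix instead.
		name = name[i+1:]
	}
	if knownURLAttrs[name] {
		return true
	}
	for _, part := range []string{"url", "uri", "src"} {
		if strings.Contains(name, part) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{
		"data-href", "svg:href", "svg:data-href", "xmlns:svg", "class",
	} {
		fmt.Println(name, isURLAttr(name))
	}
}
```

Note that svg:data-href is not recognized: only one prefix is stripped, so the remaining name data-href matches neither the table nor the substrings.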

SxHTML defines some additional symbols, all starting with "@":

  • @ specifies the attribute list of an HTML tag. It must immediately follow the tag symbol and contains a list of pairs, where the first component is a symbol and the second component is a string, a keyword, or a number.
  • @C marks some content that should be written as <![CDATA[...]]>.
  • @H specifies some HTML content that must not be escaped. For example, (@H "&amp;") is transformed to &amp;, not to &amp;amp;.
  • @@ specifies an HTML comment, e.g. (@@ "comment") is transformed to <!-- comment -->.
  • @@@ specifies a multiline HTML comment, e.g. (@@@ "line1" "line2") is transformed to \n<!--\nline1\nline2\n-->\n.
  • @@@@ specifies the doctype statement, e.g. (@@@@ (html ...)) is transformed to <!DOCTYPE html>\n<html>...</html>.
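As a rough illustration of the output shapes listed above, the following sketch maps the first four symbols to strings. `renderSpecial` is a hypothetical helper, not a function of this library. How several arguments are joined for @C, @H, and @@ is an assumption, content is not escaped here, and @@@@ is left out because it wraps a whole element.

```go
package main

import (
	"fmt"
	"strings"
)

// renderSpecial is a hypothetical helper that shows the output shape of
// the special symbols; it does no escaping of the given strings.
func renderSpecial(sym string, args ...string) string {
	switch sym {
	case "@C":
		return "<![CDATA[" + strings.Join(args, "") + "]]>"
	case "@H":
		// Raw HTML: written exactly as given.
		return strings.Join(args, "")
	case "@@":
		return "<!-- " + strings.Join(args, " ") + " -->"
	case "@@@":
		return "\n<!--\n" + strings.Join(args, "\n") + "\n-->\n"
	}
	return ""
}

func main() {
	fmt.Println(renderSpecial("@H", "&amp;"))    // &amp;
	fmt.Println(renderSpecial("@@", "comment"))  // <!-- comment -->
	fmt.Printf("%q\n", renderSpecial("@@@", "line1", "line2"))
	// "\n<!--\nline1\nline2\n-->\n"
}
```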


HTML defines some tags as void elements. A void element has no content and consists of a start tag only. End tags must not be specified, and SxHTML will not generate them. Any content except attributes is ignored. Void elements are: area, base, br, col, embed, hr, img, input, link, meta, source, track, and wbr.
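A generator can handle this with a simple lookup. The sketch below uses the list of void elements given above; `renderTag` is a hypothetical helper that ignores attributes and assumes the content has already been escaped.

```go
package main

import "fmt"

// voidElements lists the tags that have a start tag only.
var voidElements = map[string]bool{
	"area": true, "base": true, "br": true, "col": true, "embed": true,
	"hr": true, "img": true, "input": true, "link": true, "meta": true,
	"source": true, "track": true, "wbr": true,
}

// renderTag writes a tag without attributes. For void elements the
// content is ignored and no end tag is written.
func renderTag(tag, escapedContent string) string {
	if voidElements[tag] {
		return "<" + tag + ">"
	}
	return "<" + tag + ">" + escapedContent + "</" + tag + ">"
}

func main() {
	fmt.Println(renderTag("br", "ignored")) // <br>
	fmt.Println(renderTag("p", "text"))     // <p>text</p>
}
```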


Attributes are always in the second position of a list containing a tag symbol. For example, (a (@ (href . "")) "SxHTML") specifies a link to the page of this library. It will be transformed to <a href="">SxHTML</a>.

The syntax for attributes is as follows:

  • The first element of the attribute list must be the symbol @.
  • Each remaining element must be a list whose first element is a symbol that names the attribute.
  • If there is no second element in the list, the attribute is an empty attribute. For example, (input (@ (disabled))) will be transformed to <input disabled>.
  • If there is a second element in the list, it must be an atomic value, preferably a string. For example, (input (@ (disabled "yes"))) will be transformed to <input disabled="yes">.
  • If the list contains more elements, they are ignored.
  • If the list is really a cons cell, the second element of the cons cell must be an atomic value, preferably a string. For example, (input (@ (disabled . "yes"))) will be transformed to <input disabled="yes">.

Since the attribute list is just a list, it may contain duplicate symbols as attribute names. Only the first occurrence of a symbol creates an attribute. For example, (input (@ (disabled "no") (disabled . "yes"))) will be transformed to <input disabled="no">. This allows you to extend the list of attributes at the front if you later want to override the value of an attribute.

If you want to suppress the generation of an attribute while still extending the list of attributes at the front, use the boolean value False as the value of the attribute. For example, (input (@ (disabled False) (disabled . "yes"))) will be transformed to <input>.
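The rules for empty, duplicate, and suppressed attributes can be sketched as follows. The `Attr` type and `renderAttrs` function are hypothetical. In this sketch a nil value models an empty attribute and the Go value false models the boolean False. How a value of true would be handled is not stated above, so the sketch simply prints it as a string.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Attr is a hypothetical attribute pair. Value is nil for an empty
// attribute, false to suppress the attribute, or any atomic value.
type Attr struct {
	Name  string
	Value any
}

// renderAttrs writes the attributes in list order; only the first
// occurrence of each name is considered.
func renderAttrs(attrs []Attr) string {
	seen := map[string]bool{}
	var sb strings.Builder
	for _, a := range attrs {
		if seen[a.Name] {
			continue // a later duplicate never creates an attribute
		}
		seen[a.Name] = true
		if b, ok := a.Value.(bool); ok && !b {
			continue // False suppresses this and all later occurrences
		}
		if a.Value == nil {
			sb.WriteString(" " + a.Name) // empty attribute
			continue
		}
		fmt.Fprintf(&sb, ` %s="%s"`, a.Name, html.EscapeString(fmt.Sprint(a.Value)))
	}
	return sb.String()
}

func main() {
	fmt.Println("<input" + renderAttrs([]Attr{{"disabled", nil}}) + ">")
	// <input disabled>
	fmt.Println("<input" + renderAttrs([]Attr{{"disabled", "no"}, {"disabled", "yes"}}) + ">")
	// <input disabled="no">
	fmt.Println("<input" + renderAttrs([]Attr{{"disabled", false}, {"disabled", "yes"}}) + ">")
	// <input>
}
```

Recording the name as seen before checking for False is what lets a suppressed attribute at the front shadow a later value.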