3 Parsers
Moritz Strohm edited this page 6 months ago

Parsers

A parser in MoeNavigatorEngine is a class that parses a document written in a particular syntax and outputs a DOM (Document Object Model). Parsers read data in chunks and keep their parsing state between chunks, so the result does not depend on how large the chunks are or where the input is split.
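
The following is a minimal sketch of chunk-independent parsing, not code from MoeNavigatorEngine. The class `ChunkedTagScanner` and the helper `scan` are hypothetical names invented for this illustration. The scanner only collects the text between `<` and `>`, but it shows the idea: all state lives in the object, so a tag that is split across two chunks is still recognised.

```python
class ChunkedTagScanner:
    """Hypothetical scanner that collects the content of '<...>' tokens."""

    def __init__(self):
        self.in_tag = False  # True while we are between '<' and '>'
        self.buffer = ""     # partial tag content carried across chunks
        self.tags = []       # completed tags, in document order

    def feed(self, chunk: str) -> None:
        """Consume one chunk of any size, continuing from the saved state."""
        for ch in chunk:
            if self.in_tag:
                if ch == ">":
                    self.tags.append(self.buffer)
                    self.buffer = ""
                    self.in_tag = False
                else:
                    self.buffer += ch
            elif ch == "<":
                self.in_tag = True


def scan(document: str, chunk_size: int) -> list:
    """Feed the document to a fresh scanner in chunks of chunk_size characters."""
    scanner = ChunkedTagScanner()
    for start in range(0, len(document), chunk_size):
        scanner.feed(document[start:start + chunk_size])
    return scanner.tags
```

Calling `scan` on the same document with a chunk size of 1, 3 or 1000 returns the same list, which is the property described above.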

MarkupParser

Parsers implementing the MarkupParser interface parse markup languages such as (X)HTML and Markdown. Each parser must implement three methods, one for each of these tasks:

  • Reset the parsing state to the state it had directly after initialisation.
  • Read a data packet and parse its content.
  • Output a DOM node tree that has been created from the parsed data.
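
The three methods listed above could be sketched as an abstract base class. This is an assumption-laden illustration: the method names `reset`, `parse_chunk` and `get_dom`, the `Node` type, and the toy `LineParser` are all invented here and are not taken from the MoeNavigatorEngine source.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, very small stand-in for a DOM node."""
    name: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


class MarkupParser(ABC):
    """Assumed shape of the interface: one method per task listed above."""

    @abstractmethod
    def reset(self) -> None:
        """Return to the state the parser had after initialisation."""

    @abstractmethod
    def parse_chunk(self, data: str) -> None:
        """Read one data packet and parse its content."""

    @abstractmethod
    def get_dom(self) -> Node:
        """Return the node tree built from the data parsed so far."""


class LineParser(MarkupParser):
    """Toy implementation: one 'line' node per line of input."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self._lines: list[Node] = []
        self._pending = ""  # unterminated line carried over to the next chunk

    def parse_chunk(self, data: str) -> None:
        self._pending += data
        # Everything before the last newline is complete; the rest stays pending.
        *complete, self._pending = self._pending.split("\n")
        self._lines.extend(Node("line", text=line) for line in complete)

    def get_dom(self) -> Node:
        # Include a trailing unterminated line without changing parser state.
        children = list(self._lines)
        if self._pending:
            children.append(Node("line", text=self._pending))
        return Node("document", children=children)
```

A caller would typically call `reset()`, pass each incoming packet to `parse_chunk()`, and call `get_dom()` once the input is exhausted.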

MoeNavigatorEngine contains the following markup parser classes:

HTMLParser

This parser handles HTML documents.

PlaintextParser

Anything that is considered plain text can be "parsed" with this parser. Here, parsing means reading all the text and placing it in a DOM text node, which is inserted into a DOM node called "PlainText". This allows plain text documents to be styled the same way as HTML documents: a stylesheet parser reads the stylesheet for plain text, and the rules for the PlainText node are then applied.
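
As a rough sketch of this idea, the text can be wrapped and styled as follows. Only the node name "PlainText" comes from the description above. The `Node` type, the `#text` node name, the functions `parse_plain_text` and `apply_rules`, and the representation of rules as a dictionary are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DOM node carrying its computed style."""
    name: str
    text: str = ""
    style: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def parse_plain_text(chunks) -> Node:
    """Join all chunks into one text node and wrap it in a PlainText node."""
    text_node = Node("#text", text="".join(chunks))
    return Node("PlainText", children=[text_node])


def apply_rules(node: Node, rules: dict) -> None:
    """Copy the rules registered for a node's name onto it, then recurse."""
    node.style.update(rules.get(node.name, {}))
    for child in node.children:
        apply_rules(child, rules)
```

For example, `apply_rules(parse_plain_text(["Hello, ", "world"]), {"PlainText": {"font-family": "monospace"}})` sets the font on the PlainText node and leaves the text node's own style empty.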

XMLParser

The XML parser is separate from the HTML parser because XML, unlike HTML, is strict about closing tags and syntax. As a result, the XML parser is simpler and smaller than the HTML parser.
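
The difference in strictness can be demonstrated with Python's standard library, used here purely as a stand-in. This does not show MoeNavigatorEngine's own parsers, and the helper names `xml_accepts`, `html_start_tags` and `TagCollector` are made up for this sketch. An XML parser rejects a document with unclosed tags, while an HTML parser has to accept and process it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def xml_accepts(document: str) -> bool:
    """Return True only if the document is well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


def html_start_tags(document: str) -> list:
    """Return the start tags found by the HTML tokenizer, even in sloppy input."""
    collector = TagCollector()
    collector.feed(document)
    collector.close()
    return collector.start_tags
```

With the input `<p>one<p>two`, `xml_accepts` returns `False`, whereas `html_start_tags` reports both `p` elements. An HTML parser needs additional recovery rules for such input, which an XML parser can leave out.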

StylesheetParser

StylesheetParser implementations parse CSS, LESS or other stylesheet languages and output a DOM in which each node hierarchy below the root node represents a selector. These nodes contain the rules to apply to those node structures in the parsed markup document's DOM that match the selector.
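
One possible shape for such a tree is sketched below for a very small CSS subset (whitespace-separated descendant selectors, no comments, no at-rules). The `SelectorNode` type, the `parse_stylesheet` function and the choice to store one tree level per selector part are assumptions for illustration. The actual node layout in MoeNavigatorEngine may differ.

```python
from dataclasses import dataclass, field


@dataclass
class SelectorNode:
    """Hypothetical stylesheet DOM node: one part of a selector plus its rules."""
    name: str
    rules: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)


def parse_stylesheet(css: str) -> SelectorNode:
    """Build a selector tree from 'selector { prop: value; ... }' blocks."""
    root = SelectorNode("#root")
    for block in css.split("}"):
        if "{" not in block:
            continue  # trailing whitespace or text without a rule body
        selector, body = block.split("{", 1)
        # Walk or create one tree level per selector part, e.g. "div p".
        node = root
        for part in selector.split():
            node = node.children.setdefault(part, SelectorNode(part))
        # Attach the declarations to the node of the last selector part.
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                node.rules[prop.strip()] = value.strip()
    return root
```

Parsing `div p { color: red }` with this sketch yields a root node with a `div` child, which in turn has a `p` child holding the `color` rule. A matcher could then walk this hierarchy alongside the markup DOM to find the nodes each rule applies to.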