C reference parsing library for eno notation
Latest commit dbbd4e6604 by Simon Repp, 2 months ago: Assemble continuation-based values at parse time

libeno

C reference parsing library for eno notation.

Getting started

#include <eno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char *content = "Who to greet: World";
    ENOIterator iterator;

    if (eno_parse_memory(&iterator, content, strlen(content))) {
        char *ptr;
        size_t size;

        eno_iterate_next(&iterator); /* advance to first element */

        if (eno_get_value(&iterator, &ptr, &size)) {
            /* the value is handed over as pointer + size, so print exactly size bytes */
            printf("Hello %.*s!\n", (int)size, ptr);
        }
    } else {
        eno_report_error(&iterator);
    }

    eno_free_document(&iterator);

    return 0;
}

Development status

With the exception of copies, libeno already implements the full eno specification. Some parts are still interim implementations, so expect occasional full-on crashes for certain inputs or usage patterns.

The API is partially documented through doc comments; please be aware, though, that everything is still moving and changing.

In addition to the basic parser functionality you’d expect, there is also some neat functionality for printing back the AST that libeno constructs during parsing, annotated with line numbers, both in plain text and with terminal coloring.

String encoding

Eno notation is always encoded in UTF-8, and so is the content of all string buffers that are extracted from the document and provided to library consumers after parsing. The UTF-8 Everywhere Manifesto lays out a good rationale if you’re curious about the reasoning behind this design decision.

Build

You need to install icu4c (the ICU library for C), including its headers. On Linux this can be done through the package manager (e.g. libicu-dev on Ubuntu).

When the icu4c dependency is satisfied you can build libeno like this:

mkdir build
cd build
cmake ..
make

Test

Parsing correctness test suite

Inside build/ run:

./test_examples

This scans through the .eno files in test/examples/ and, for each of them, parses the document and serializes both the internally and the externally accessible representation of its abstract syntax tree. The serializations are stored in a directory named after the example, but with the suffix .spec instead of .eno.

These files are tracked with git. On subsequent runs the serializations are regenerated and compared to the previously stored snapshots; any mismatch triggers an error and prints a diff for the affected line. The specs ensure that the parser behaves according to the specification and that regressions are noticed quickly during development.

The specs also serve as an exemplified version of the specification: more complex behavior can easily be studied by looking at the .eno files and their .spec directory counterparts, which reveal how documents are tokenized, how continuations are assembled, how element hierarchies are merged during copying, etc.

To force an update of all specs you can run:

UPDATE_SPECS=yeah ./test_examples

Parsing a document, printing the AST

Inside build/ create a test document and run:

./test_parse your_document_path.eno

This debug-prints the document’s abstract syntax tree and reports an error if it encounters one in the document.

Parsing a document, obtaining a value by key

Inside build/ create a test document and run:

./test_get your_document_path.eno "some key"

This parses the document and prints the value of the field at the root level of the document that matches the supplied key. It also prints considerable noise if the document contains element types other than fields, because the public API it uses is still completely experimental.