Create a Harris Matrix from archaeological stratigraphy data using Python and Graphviz. See the Harris Matrix Data Package (https://codeberg.org/steko/harris-matrix-data-package) for a revised and updated version.
https://www.iosa.it/2008/08/27/harris-matrix-with-graphviz-a-draft-application-with-python/

Harris matrix with Python and Graphviz

This is a proof of concept for a Harris matrix created with Python and Graphviz, which I wrote in 2007-2008. A short Python script reads the stratigraphic data from a SQLite database and feeds them to Graphviz, which draws the matrix.
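As a rough illustration of that pipeline, here is a minimal sketch that reads relations from SQLite and emits Graphviz DOT text. It is not the code in harris.py: the `relations(later, earlier)` table and the `dot_from_sqlite` and `quote` functions are hypothetical, since the actual schema of example/matrix.db is not described here, and the sketch uses only the standard library rather than pygraphviz.

```python
import sqlite3


def quote(unit) -> str:
    """Quote a unit ID for DOT, escaping embedded double quotes."""
    return '"' + str(unit).replace('"', '\\"') + '"'


def dot_from_sqlite(conn: sqlite3.Connection) -> str:
    """Build DOT source from a hypothetical relations(later, earlier) table."""
    rows = conn.execute(
        "SELECT later, earlier FROM relations ORDER BY later, earlier"
    )
    lines = ["digraph matrix {"]
    for later, earlier in rows:
        # One row, one edge, read top-down: the later unit points to the earlier one
        lines.append(f"    {quote(later)} -> {quote(earlier)}")
    lines.append("}")
    return "\n".join(lines)


# Example: an in-memory database holding three relations from the Gortyna sample
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relations (later TEXT, earlier TEXT)")
conn.executemany(
    "INSERT INTO relations VALUES (?, ?)",
    [("723", "722"), ("722", "731"), ("731", "730")],
)
print(dot_from_sqlite(conn))
```

Saving the returned string to a .dot file and passing it to dot produces the drawing. Quoting every ID keeps unit names with spaces or letters valid in DOT.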

To see an example, install Graphviz, then:

pip install pygraphviz
python harris.py

This will create the matrix.png image, with the example Harris matrix described in the example/matrix.db database. Read below for more details!

What is the Harris matrix?

On most archaeological excavations adopting the methodology of single context recording, the large number of stratigraphic units makes it necessary to use some sort of representation of the relative chronological sequence to keep track of what has already been excavated (not to mention building archaeology). This conceptual tool is the Harris matrix, used on paper for decades since its inception in the late 1960s.

A textbook example of a Harris matrix

From E. C. Harris's theory, we know that all stratigraphic relations can be reduced to what I have called an A-B-C model:

  • After / later than
  • Before / earlier than
  • Contemporary

(Contemporaneity without physical equality is rather problematic, though).
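One way to see how the A-B-C model maps onto a graph is the sketch below: relations of type A and B both become a single kind of directed edge pointing from the later unit to the earlier one, while contemporary pairs are kept apart. The `Rel` enum, the function names and the units 10 and 11 are hypothetical; drawing contemporary units on the same row with Graphviz's `rank=same` is just one possible choice, not something the Harris method prescribes.

```python
from enum import Enum


class Rel(Enum):
    AFTER = "A"         # x is later than y
    BEFORE = "B"        # x is earlier than y
    CONTEMPORARY = "C"  # x and y are contemporary


def normalize(records):
    """Split (x, Rel, y) records into later->earlier edges and contemporary pairs."""
    edges, same = set(), set()
    for x, rel, y in records:
        if rel is Rel.AFTER:
            edges.add((x, y))
        elif rel is Rel.BEFORE:
            edges.add((y, x))  # flip it, so every edge points from later to earlier
        else:
            same.add(tuple(sorted((x, y))))  # the pair has no direction
    return edges, same


def to_dot(edges, same):
    """Emit DOT text; contemporary units share a rank (one possible choice)."""
    lines = ["digraph matrix {"]
    lines += [f"    {a} -> {b}" for a, b in sorted(edges)]
    lines += [f"    {{rank=same; {a}; {b}}}" for a, b in sorted(same)]
    lines.append("}")
    return "\n".join(lines)


records = [
    ("723", Rel.AFTER, "722"),       # 723 is later than 722
    ("731", Rel.BEFORE, "722"),      # 731 is earlier than 722
    ("10", Rel.CONTEMPORARY, "11"),  # hypothetical contemporary pair
]
print(to_dot(*normalize(records)))
```

Normalizing A and B into one edge direction means the same relation recorded from both sides ends up as a single edge.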

A short software history

Software applications aimed at creating a digital Harris matrix include WinBASP and ArchEd. (Win)BASP is, to my knowledge, the earliest example of a data management environment that (correctly) did away with representing the Harris matrix as a drawing and focused on the underlying stratigraphic data. The Harris matrix can be formally defined as a directed graph running from the most recent down to the oldest deposits, in which the nodes represent layers and the edges represent the stratigraphic relations connecting them.
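Because the edges run from later to earlier units, a consistent matrix cannot contain a cycle: no unit can be both later and earlier than another. A minimal sketch, assuming Python 3.9+ for the standard-library `graphlib` module (the `sequence` function is a hypothetical name), derives one earliest-to-latest sequence and reports contradictory records as a `CycleError`. The order it returns is only one of the sequences compatible with the relations.

```python
from graphlib import CycleError, TopologicalSorter


def sequence(edges):
    """Return one earliest-to-latest order of the units in (later, earlier) edges.

    Raises graphlib.CycleError if the relations contradict each other.
    """
    ts = TopologicalSorter()
    for later, earlier in edges:
        # graphlib maps node -> predecessors: the earlier unit must come first
        ts.add(later, earlier)
    return list(ts.static_order())


print(sequence([("723", "722"), ("722", "731"), ("731", "730")]))

try:
    # Contradictory records: 1 is later than 2 and 2 is later than 1
    sequence([("1", "2"), ("2", "1")])
except CycleError as err:
    print("contradiction:", err.args[1])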

In 2007 and 2008 I spent some time experimenting with Graphviz to automate the creation of the Harris matrix for the excavation of Gortyna in Crete. This repository contains the small Python application I wrote to demonstrate how Graphviz can be driven automatically to generate Harris matrix diagrams for that excavation. The application is far from complete and has no GUI, but it shows the model I had developed starting from the first examples, in which every step was performed “by hand”. I published two blog posts (2007, 2008) detailing the experiment.

In the following years, two interesting software tools have been built on the same “data first, Graphviz later” principle:

  • pyArchInit is a QGIS plugin that offers a complete data management solution for archaeology, written in Python and inspired by my experiment in using Graphviz
  • hm is a Common Lisp library that takes the analytical features and the graphic output much further; it's the most interesting project to follow

In the meantime, I'm afraid the vast majority of Harris matrices are drawn using Illustrator or Excel.

Directly using Graphviz to compose a Harris matrix

Using Graphviz directly is an instructive exercise and is easier to understand if you're just starting out. We will write a plain text file that is used for both:

  1. describing the stratigraphic relationships as data
  2. processing data to obtain the graph

Graphviz has its own native plain-text format, the DOT language, which is documented on the Graphviz website. Graphviz .dot files can be read and written with any text editor, such as Emacs, Vim or Notepad++. Keeping a file of this kind is the obvious choice for experimenting, even though a single hand-edited file is not very efficient for real-world data.

This is a sample from the final .dot file I had compiled during the excavation weeks in Gortyna:

digraph matrix {
    723->722
    505->732
    729->732
    731->730->729
    726->729
    730->726
    726->810->725
    729->810->725
    729->733->792->793
    722->731
    732->737->736->733
    733->810->725
    729->505
    736->506
    505->506
    179->759
    759->725
    759->737
    759->769->768->778
    768->303
    737->739->736->778
    736->769
    778->303
    506->303
    769->506
    769->780
    778->779
    736->773->774->779->780
    779->303
    780->303
    506->780
    505->724
}

You can save this file as harris-matrix.dot and follow along with the examples below.

Apart from the initial preamble, the syntax is ridiculously easy. The Harris matrix is read top-down, so A -> B means “A is later than B”. You can also chain multiple relations on the same line. Indenting is not mandatory, but it helps keep your file clean. Comments start with // and run to the end of the line (/* ... */ block comments work too). A line that begins with # is also ignored, but # does not start a comment in the middle of a line:

// this is a comment
A -> B -> C
A -> D -> E // this one too!
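To make the syntax concrete, the following sketch parses the small subset used in this file (one chain of -> per line, // end-of-line comments, lines starting with #) into (later, earlier) pairs. `parse_edges` is a hypothetical helper for illustration, not a DOT parser: quoted IDs, attributes, subgraphs and /* */ comments are not handled.

```python
def parse_edges(text):
    """Parse `a -> b -> c` chains into (later, earlier) pairs.

    Hypothetical helper covering only a small subset of DOT:
    one chain per line, // end-of-line comments and lines starting with #.
    """
    edges = []
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip().rstrip(";")  # drop comment and ';'
        if not line or line.startswith("#") or "->" not in line:
            continue  # blank lines, '#' lines, 'digraph matrix {', '}', attributes
        units = [u.strip() for u in line.split("->")]
        edges.extend(zip(units, units[1:]))  # a->b->c gives (a, b) and (b, c)
    return edges


sample = """digraph matrix {
    731->730->729
    A -> D -> E // this one too!
# a line starting with # is ignored
}"""
print(parse_edges(sample))
```

A chain such as 731->730->729 is just shorthand for two separate relations, which is why each consecutive pair becomes its own edge.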

It's not that difficult to keep this file updated by hand, really. One thing you might worry about is redundant relations, which can make your graph ugly and unreadable. But this is about automation, so redundant data isn't going to be a problem: we'll simply record every relation.

I mentioned above that the Harris matrix is a directed graph. Graphviz comes with many layout tools, but only one does what we need: dot, which draws directed graphs as hierarchies. From the command line we can just run

dot harris-matrix.dot -Tpng -o harris-matrix.png

and get our data compiled into a graph almost instantly. The -Tpng option selects which of the many available output formats we want, and the -o flag (that is, option) precedes the output filename.
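The same command can be run from a Python script with the standard-library `subprocess` module. This is a sketch assuming Graphviz is installed and dot is on the PATH; `dot_command` and `render` are hypothetical helper names, and the command line they build is exactly the one shown above.

```python
import shutil
import subprocess


def dot_command(src, out, fmt="png"):
    """Build the command line shown above: dot SRC -T<fmt> -o OUT."""
    return ["dot", src, f"-T{fmt}", "-o", out]


def render(src, out, fmt="png"):
    """Run dot on src; needs Graphviz installed and dot on the PATH."""
    if shutil.which("dot") is None:
        raise FileNotFoundError("Graphviz 'dot' executable not found on PATH")
    # check=True raises CalledProcessError if dot exits with an error
    subprocess.run(dot_command(src, out, fmt), check=True)


print(dot_command("harris-matrix.dot", "harris-matrix.png"))
```

Calling `render("harris-matrix.dot", "harris-matrix.png")` would then write the PNG; passing `fmt="svg"` selects another of dot's output formats. Passing the arguments as a list avoids any shell quoting issues with file names.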

So far, the result is quite good. But the redundant relations are still there, and I promised they wouldn't be a problem at all.

This is where the power of UNIX comes in handy. tred, another of the many tools provided by Graphviz, is a “transitive reduction filter for directed graphs”, so it has to run before dot reads the input file. A pipe (the | character) is the easiest way to pass data from one program to another in UNIX style. Here's how I did it:

tred harris-matrix.dot | dot -Tpng -o harris-matrix-tred.png

Note that dot reads from standard input when no input file is given, while tred writes its result to standard output by default. Many small programs, each doing a single job well: this is the core of the UNIX philosophy, and Graphviz follows it. Once you understand this concept, things get much easier. The output of this second command is slightly different from the first one:

Result of the tred command
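To show what tred removes, here is a pure-Python sketch of transitive reduction: an edge a -> b is dropped when b can still be reached from a through some other path. It assumes an acyclic graph (for which the reduction is unique) and illustrates the idea only; it is not tred's actual implementation, and `transitive_reduction` is a hypothetical name. The example edges come from the Gortyna sample above, where 729 -> 732 is already implied by 729 -> 505 -> 732.

```python
from collections import defaultdict


def transitive_reduction(edges):
    """Keep only the edges (a, b) not implied by a longer path from a to b.

    Assumes the graph is acyclic, as a consistent Harris matrix is.
    """
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)

    def reachable_indirectly(a, b):
        # Depth-first search from a's other successors, looking for b
        stack = [c for c in succ[a] if c != b]
        seen = set(stack)
        while stack:
            node = stack.pop()
            if node == b:
                return True
            for nxt in succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(a, b) for a, b in set(edges) if not reachable_indirectly(a, b)}


# From the Gortyna sample: 729 -> 732 is implied by 729 -> 505 -> 732
print(sorted(transitive_reduction([("505", "732"), ("729", "732"), ("729", "505")])))
```

The sketch runs one search per edge, which should be plenty for the size of a typical excavation; the stratigraphic information is unchanged, since every removed relation can still be read off the remaining paths.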

You can play around with some general options to change the graphic layout of your graph. These are two options I often use to get better looking Harris Matrices:

digraph matrix { // these two options go at the beginning of the graph file
    concentrate=true;
    node[shape=rect];