===============
murature.py
===============
basic usage
-----------
You need a recent version of Python. This scripts have been tested with
Python 2.4 and 2.5.
You will need also the Python OGR bindings and the RPy module.
On Debian-like systems (including Ubuntu), this is just a matter of installing
the following packages: ``python-gdal``, ``python-rpy``.
To run the script, open a terminal in the directory that contains the files,
and type::
python example.py 3.1
``3.1`` is the *magic number* that we pass to the script. If you don't specify a
magic number, the default value is ``3.0``.
The usual output is something like this::
by hand
Min. 1st Qu. 3rd Qu. Median Max. Mean
9.90 12.00 16.50 13.50 30.20 15.01
Simple Height
Min. 1st Qu. 3rd Qu. Median Max. Mean
9.093 12.780 20.230 15.170 48.190 16.860
Smart Height
Min. 1st Qu. 3rd Qu. Median Max. Mean
7.768 11.540 18.410 13.640 34.670 15.150
The image ``plotR.png`` is created in the current directory and shows a
graphical comparison between the various recording methods.
library description
-------------------
This Python library was written as an help for the study of stone walls, mainly
through the quantitative analysis of spatial dimensions of stones.
The code is still in the early stages of development and lacks a clear
structure, but the functions are documented quite well with docstrings. At the
moment you find an ``example.py`` that shows how to use the library routines,
an example dataset (made by various files), and 3 Python scripts:
- ``geometrywall.py`` has all the geometry (OGR) related functions
- ``rplots.py`` contains the ``RPlots`` class that can be used to output
summaries and graphs
- ``breaks.py`` has the code for automatically classifying stones into rows
The numerical analisys is done with R using the ``rpy`` Python module. We are
trying to use just methods and functions from the standard R library.
- the ``ogr`` module is used to import geometry data and get all the needed
parameters like centroid and boundary coordinates
- height is calculated with two different methods:
* a simple method ``max(y) - min(y)``
* `the smart method`_
- we also store a central measure (like median) to be used in the next steps as
a parameter to find stone rows
Each stone is assigned a rank number using a kernel density function
(``density`` in the R ``stats`` standard library) with a narrow bandwidth. This
works because vertical coordinates of centroids aren't distributed uniformely,
but are concentrated around the mean row's value. Thus, stones who are on the
same course will get the same rank number. This allows other kind of analyses
based on the variance of single courses and other methods still to be explored.
To best compare the measures taken by hand and the automatic ones we need a
complete and detailed case study, that has to be well drawn, with an attribute
table containing the hand-taken measures.
the *smart* method
------------------
The *basic* algorhythm works by finding the highest and the lowest Y coordinate
of the stone polygon, then a simple ``max(y) - min(y)`` subtraction gives us a
*rough* estimate of the true height of the stone. Tests carried between
hand-recorded measures and this method show this roughness is way too high for
our needs. The hand-taken measures are our reference because the expert human
operator is able to record a *significant* value for the stone height, and thus
(s)he behaves quite differently than this simple algorhythm.
--insert images and graphs here--
We can try to get a *smart* height by averaging the ``n`` highest and ``n``
lowest Y coordinates. This way our ``max(y) - min(y)`` becomes something
slightly different: a difference between the *average upper limit* and the
*average lower limit*. We use a **magic number** to express the ratio of total
points to be used in this calculation.
If ``magicNumber = 3.0`` we are going to use ``totalPoints / magicNumber``
points for the *upper limit* and ``totalPoints / magicNumber`` points for the
*lower limit*.
.. code-block:: python
magicNumber = 3.0
self.stonePointsCount = stoneBoundary.GetPointCount()
pointsRatio = int(( self.stonePointsCount / magicNumber )) + 1
The ``stoneBoundary`` object is the OGR boundary (of type ``LINESTRING``) of
the stone. The OGR ``GetPointCount()`` method simply returns the number of
points in a ``LINESTRING``.
The ``pointsRatio`` variable is used to calculate *for each stone* how many
points are needed to compute the *smart* height. This way we make sure that the
algo is consistent across all the stones. Using a fixed value doesn't make
sense here, because there will be stones with a few points and others with more
than 20 points. The numerical value ``self.stonePointsCount / magicNumber`` is
a floating point number, that we must convert to an integer in order to use it.
We should think about the different results of adding or not 1 to this
variable.
.. code-block:: python
def smartAlgo(listey,ratio):
'''`smart' algo with average coordinates.'''
listey.sort()
asd = 0
for i in listey[0:ratio]:
asd = asd + i
yMin = asd/ratio
asd = 0
for i in listey[-ratio:]:
asd = asd + i
yMax = asd/ratio
yAvg = yMax - yMin
return yAvg
.. image:: points.png
the *smart 2* method
--------------------
This second smart algorhythm works in a slightly different way from the first
one.
Instead of using a predetermined number of points for averaging the upper and
lower limits, we use a range of Y coordinates based on the extreme values.
So, if ``p_max`` is the point with the highest Y coordinate, and ``p_min`` the
one with the lowest, we first obtain the *simple* height with::
simple_height = y(p_max) - y(p_min)
Then, the points that will be used for the average *upper* limit are those
whose Y coordinate is such that::
(y(p_max) - ( simple_height / n )) < y(p) <= y(p_max)
where ``n`` expresses the range that should be used, proportional to the stone
height. Note that ``y(p)`` is *less or equal* than ``y(p_max)``, otherwise
``p_max`` itself would be excluded from the procedure, exposing to
``ZeroDivisionError`` s and other bugs. The same applies for the lower limit.
Once the points to use have been selected, the two averages are calculated and
their difference is the resulting ``smart_2_height`` of the stone.
An example should make it more clear. If ``y(p_max)`` for our stone is 410.34
and ``y(p_min) = 395.16``, it's easy to obtain::
simple_height = y(p_max) - y(p_min) = 410.34 - 395.16 = 15.18
Then, for finding the average upper limit, we iterate through all the points
in the current stone, and select only those such as that::
(y(p_max) - ( simple_height / n )) < y(p) <= y(p_max)
(410.34 - (15.18 / n)) < y(p) < 410.34
Which value should we give to ``n``? Some experiments showed that values around
7 give good results, so let's use this value for now::
(410.34 - (15.18 / 7)) < y(p) <= 410.34
408.17 < y(p) <= 410.34
So, only those points that are in that range will be included in the average
upper limit.
Performance
~~~~~~~~~~~
So far, the two algorhythms both work quite well when compared to hand-made
measurements, while the difference between the two are poorly significant.
Given this correspondance, we should choose the faster and simpler one. Another
important issue is the choice of parameters. At the moment parameters must be
specified manually by the user (or fixed in the source code), and there are no
plans to change this.
buffer analysis
---------------
This analysis is optionally based on the height values calculated with one
of the methods above.
introduction
~~~~~~~~~~~~
In 1993, Parenti and Doglioni suggested the use, among other qualitative
parameters, of a quantitative parameter which would be useful to
describe a stone wall and eventually compare two walls.
This quantitative parameter is calculated on a random area from the wall,
as the ratio between the area occupied by stones and the "empty" areas around
the stones. This value, if compared to other numeric parameters (most notably
the number of stones that fall into the same area), can be useful when
creating a tipology.
Our analysis is based on this method, pushing the same concept beyond the
barrier of the wall taken as a whole.
the analysis
~~~~~~~~~~~~
For each stone-polygon, we start creating a *buffer* area. This is the area
that contains all the points within a certain distance from the polygon
boundary. How the distance is chosen can slightly change the results, depending
on wheter a fixed value is used or a value proportional to some other value
of each polygon (area, perimeter, height, width, etc..).
After the buffer area has been created, the procedure is as follows:
* subtract the stone area from the buffer area (which includes it) to get only
the actual buffer.
* find the intersection of the buffer area with the other surrounding stones
(this is obtained by creating a multipolygon that includes all stones) and
retrieve the total area of this intersection
* the ``intersection_area / buffer_area`` ratio is the value we will use as an
indicator of (possibly hidden) groups of stones.
results
~~~~~~~
So far, this method has failed to give the expected results. Lower values are
obtained for stones that are near the wall limits, or have otherwise no stones
on one or more sides. For normal stones, there are no significant variations
in the obtained value.
Though not related with the original idea, this method could be used to find
which stones are suitable for further analyses that are based on the wall
fabric.