Documentation
Izzy edited this page 4 years ago

HistView Documentation

This short documentation covers the installation and configuration of HistView and the download class, and gives an introduction to their usage. In addition, an API Reference is available that covers all the internals of the classes and helps you develop advanced scripts with them.


Requirements

There are a few dependencies to consider in order to use HistView:

  • a Web server supporting PHP (tested only with Apache)
  • PHP version 4.1 or higher (recommended: loadable module for Apache - which again is the only tested variant), optionally compiled with support for the MySQL database to use the statistics of the Download class
  • optionally a MySQL database to store the statistics, which may be local or remote

Most (if not all) Linux distributions ship these components with their package manager, so they should be easy to install. On Windows it might be a bit trickier. To test whether you meet these requirements, create a PHP file containing the single line <?php phpinfo();?>, put it into your document root, and open it in your browser. The resulting page lists your PHP configuration and the available capabilities (e.g. whether MySQL support is present).
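If you prefer a check that you can also run from the command line, the following sketch tests the two requirements listed above. It is not part of HistView, and the function name hv_missing_requirements is made up. Treating either the legacy mysql extension or mysqli as sufficient is an assumption, since this page does not state which MySQL API the download class uses.

```php
<?php
// Hypothetical requirements check (not shipped with HistView).
// Returns a list of problems; an empty list means the checks passed.
// $needDb: true if you want the statistics feature (needs MySQL support).
function hv_missing_requirements($phpVersion, $extensions, $needDb)
{
    $problems = array();
    if (version_compare($phpVersion, "4.1.0", "<")) {
        $problems[] = "PHP $phpVersion is older than 4.1";
    }
    // Compare extension names case-insensitively; accepting "mysql" or
    // "mysqli" is an assumption about what the download class needs.
    $ext = array_map("strtolower", $extensions);
    if ($needDb && !in_array("mysql", $ext) && !in_array("mysqli", $ext)) {
        $problems[] = "no MySQL extension loaded";
    }
    return $problems;
}

// Check the PHP interpreter running this script
foreach (hv_missing_requirements(PHP_VERSION, get_loaded_extensions(), true) as $p) {
    echo "Missing: $p\n";
}
```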

Installation

The best (and most recommended) way to install HistView is to use your Linux distribution's package manager. *.deb and *.rpm packages are available in the IzzySoft APT repository, and you can even include this repository in your APT or YUM configuration (as described on the web pages there). This way, all dependencies are resolved automatically, and a single command performs the installation (apt-get install histview or yum install histview performs all necessary steps).

If you can use neither *.deb nor *.rpm packages, download the TAR archive instead. After unpacking it, make sure that all dependencies are met, then change to the directory it created and run make install.

All of these methods have at least two advantages over a manual installation:

  • all files will automatically be put to the right places
  • everything can easily be uninstalled cleanly (apt-get remove histview, yum remove histview, make uninstall)

Moreover, using the *.deb/*.rpm packages from the repository makes updates easy.

Manual Installation

If you prefer to do a manual installation, you first need to make sure all requirements are met. Then:

  1. Unpack the TAR archive to the directory of your choice. This directory must either be inside your web server's document root, be linked there (with the web server configured to follow the link, i.e. FollowSymLinks in the Apache options), or be set up as an Alias in your web server's configuration.
  2. Review class.hvconfig.inc as described in the Configuration section and create your hv-localconf.inc file (this step is required in all cases).

Updating

Depending on how you performed the installation, updating from a previous version of HistView can be done in different ways:

  • using your package manager (apt-get update followed by apt-get install histview, or yum update histview)
  • using the Makefile (make install)
  • manually copying the files from the TAR archive

In any case, you may afterwards need to review your configuration for new options.

Configuration

Starting with version 0.1.6, HistView and the download class support two configuration files. The first one, class.hvconfig.inc, is mandatory and defines the default settings for both classes. About 80-90% of these settings suit everybody, while the remaining 10-20% may need to be personalized. However, the next update would overwrite this file with a new version, and you would have to check everything for changes and apply your modifications again. To avoid that, class.hvconfig.inc automatically includes a personal configuration file named hv-localconf.inc if it exists (in other words, this file is optional). Both files have to reside in the same directory as the other class files.
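The override mechanism can be pictured like this. HvConfigDemo is a hypothetical stand-in and does not show how class.hvconfig.inc is actually written. The defaults are assigned first. The constructor then includes hv-localconf.inc if it exists, and because the included code runs in the constructor's scope, its $this-> lines replace the defaults.

```php
<?php
// Hypothetical illustration of "defaults + optional local override".
class HvConfigDemo
{
    // Shipped defaults (values from the tables in this documentation)
    public $relname = "";
    public $db = array("host" => "localhost", "database" => "webstat",
                       "user" => "guest", "pass" => "guest", "table" => "downloads");

    public function __construct($confDir)
    {
        // The personal file is optional: include it only if it exists.
        // Code in it runs inside this method, so "$this->..." refers to
        // this object and overwrites the defaults set above.
        $local = $confDir . "/hv-localconf.inc";
        if (file_exists($local)) {
            include $local;
        }
    }
}
```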

Where modifications take place

As you may already have guessed from the description above, you should not modify the shipped class.hvconfig.inc directly. Instead, copy the lines you need to change to the other file, hv-localconf.inc, and modify them there. Because this file is not shipped, you have to create it the first time. An example hv-localconf.inc may look like this:

<?php
$this->debbase = "/var/repos/ubuntu/hardy/universe/binary-all";
$this->rpmbase = "/var/repos/redhat/RPMS.mine";
$this->relname = "sample";
$this->icondir = "/icons";
$this->db["database"] = "statistics";
$this->db["user"] = "apache";
$this->db["pass"] = 'h#2$YjKT'; // single quotes: inside double quotes, $YjKT would be expanded as a variable
?>

These are probably all the settings you need to adjust (though certainly with different values). We still describe the other settings as well, but the description in this Wiki may not always be up to date with the latest version. If you miss something here, or suspect that something has changed, please refer to the API Reference shipped with the distribution. If you took the source from the repository, you can generate the API Reference yourself, e.g. with PhpDocGen.

Modification at runtime

If you need to change settings at a later point (i.e. outside the configuration files), do not set the properties discussed here directly, because only the class constructor evaluates them. Instead, use the corresponding methods described in the API Reference. The file histview.php, which comes with the distribution, contains examples.

Setting up the Database

If you want to use the statistics functions, you need to set up a database for them. The corresponding settings in the configuration file are described below, so this section only covers the database itself.

As mentioned in the requirements, HistView supports a MySQL database. We assume that you have already created a database for this purpose, so we only discuss the required table structure:

CREATE TABLE statistics (
  dl_date         DATE,
  prog            VARCHAR(30),
  newver          VARCHAR(10),
  referrer        VARCHAR(100),
  remote_addr     VARCHAR(45), -- IP address; 45 chars also fit IPv6 and the 32-char MD5 hash used below
  http_user_agent VARCHAR(255)
);

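That's it: we do not need a primary key or the like. Later on, as your collected data grows, you may decide to create some indexes to speed up your analytic queries. These depend very much on what your queries look like, so we will not discuss them here. Note that the download class uses the table named in db["table"] (default: "downloads", see the Database Settings below). Either create the table under that name, or set db["table"] accordingly in your hv-localconf.inc. Here is what is stored in which column: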

Column | Content
dl_date | The date of the download, in the usual MySQL DATE format (YYYY-MM-DD)
prog | Name of the downloaded program (not the file name, so this is an overall statistic across all file types)
newver | The downloaded version of the program
referrer | The referring page (unless the downloading user suppresses it)
remote_addr | IP address which made the request (to detect scrapers and crawlers)
http_user_agent | Name of the downloading program (the complete UA string it sent, so scrapers and crawlers can be named in the exclusion lists)
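The following sketch makes the column list more concrete by showing how a single row could be assembled from PHP's $_SERVER data. It is hypothetical: hv_stats_row is not part of the download class. Cutting the referrer and the user agent to their column widths is an assumption about sensible handling, not documented behaviour.

```php
<?php
// Hypothetical helper: assemble one row for the statistics table.
// $server is normally $_SERVER; $today is normally date("Y-m-d").
function hv_server_value($server, $key, $maxLen)
{
    if (!isset($server[$key])) {
        return null;                      // e.g. no referrer sent by the client
    }
    return $maxLen > 0 ? substr($server[$key], 0, $maxLen) : $server[$key];
}

function hv_stats_row($prog, $version, $server, $today)
{
    return array(
        "dl_date"         => $today,                      // MySQL DATE: YYYY-MM-DD
        "prog"            => substr($prog, 0, 30),        // program name, not file name
        "newver"          => substr($version, 0, 10),
        "referrer"        => hv_server_value($server, "HTTP_REFERER", 100),
        "remote_addr"     => hv_server_value($server, "REMOTE_ADDR", 0),  // 0 = no cut
        "http_user_agent" => hv_server_value($server, "HTTP_USER_AGENT", 255),
    );
}

// Usage: $row = hv_stats_row("HistView", "0.1.6", $_SERVER, date("Y-m-d"));
```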

Privacy considerations

Some of these columns may raise privacy concerns. The remote_addr and http_user_agent columns are not intended to create user profiles but to detect misbehaving agents, and the referrer column only serves to show where your programs are known. Still, the data stored here may be subject to abuse. To prevent that, you should clean it up once you have finished your analysis.

The easiest way would be to simply replace the values with NULL. This is probably fine for the remote_addr and http_user_agent columns, but not always wanted for the referrer. Here is a more appropriate approach:

-- Remove the path from remote referrers, but keep the domain. Don't touch
-- local references (path without protocol) or entries already processed.
-- Turns 'http://www.example.com/some/path' into 'www.example.com'
UPDATE statistics SET referrer=SUBSTRING_INDEX(SUBSTRING_INDEX(referrer,'/',3),'/',-1)
 WHERE referrer LIKE 'http:%' OR referrer LIKE 'https:%';
-- Replace remote addresses and user agents by their MD5 hashes
UPDATE statistics SET remote_addr=MD5(remote_addr)
 WHERE remote_addr LIKE '%.%.%.%';
UPDATE statistics SET http_user_agent=MD5(http_user_agent)
 WHERE http_user_agent LIKE '%/%';

This script would:

  • leave all local referrers intact, while replacing remote referrers by their domain name
  • replace all remote addresses and user agent strings by their MD5 hashes. The literal values are then no longer stored, but you can still compare them: you can look for identical user agents or addresses without knowing which ones they are, and count how many distinct ones you have. Keep in mind that this is pseudonymization rather than true anonymization, because anyone can reverse an unsalted MD5 hash of an IPv4 address by hashing all (about 4.3 billion) possible addresses.
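Before running the SQL on your real data, you may want to preview its effect on a few sample values. The following PHP functions are a sketch that mirrors the three UPDATE statements, and their names are made up. The http:/https: prefix is matched case-insensitively, which assumes your column uses a case-insensitive collation (MySQL's LIKE then behaves the same way).

```php
<?php
// Mirrors SUBSTRING_INDEX(SUBSTRING_INDEX(referrer,'/',3),'/',-1) for remote
// referrers; values not starting with http: or https: are returned unchanged.
function hv_clean_referrer($referrer)
{
    if (!preg_match('/^https?:/i', $referrer)) {
        return $referrer;                     // local reference or already cleaned
    }
    $parts = explode("/", $referrer, 4);      // "http:", "", host, rest of path
    $head  = array_slice($parts, 0, 3);       // everything before the 3rd "/"
    return end($head);                        // ... and its last segment
}

// Mirrors the MD5 updates: a value is hashed only if it contains $char at
// least $minCount times ('%.%.%.%' means 3 dots, '%/%' means 1 slash).
// MD5 hashes contain neither, so running the cleanup twice changes nothing.
function hv_pseudonymize($value, $char, $minCount)
{
    return substr_count($value, $char) >= $minCount ? md5($value) : $value;
}
```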

You can run this manually right after checking your data for signs of bots and the like. It does you no harm, and it respects the privacy of others, so please consider doing it.

Download Settings

The settings for the download class are also divided into subsections. As in the other sections, the $this-> prefix of the properties is omitted for better readability:

Database Settings

This is where your download counters reside. This section will very likely be extended in the future (e.g. to define the columns you want to use). For now, you can configure the following details:

Property | Description | Default
db["host"] | Host on which the database runs | "localhost"
db["database"] | Name of the database to use | "webstat"
db["user"] | The database login. This user needs permission to select, insert and update rows | "guest"
db["pass"] | The corresponding password for the database | "guest"
db["table"] | Name of the table holding the statistics data | "downloads"

Miscellaneous Settings

This covers everything else, or call them the "essential settings". They define how bots should be handled, whether you want to gather statistics at all, and which files you never want to serve, even if they are found in your download directories:

Property | Description | Default
ignorefile | File listing user agents (one per line, each by a significant substring) to be ignored by the download counter | "./histview_ignorebots"
rejectfile | Like the ignorefile, but user agents listed here are rejected completely | "./histview_reject"
rejectheader | HTTP header sent to rejected user agents | "HTTP/1.1 403 Forbidden"
rejectmsg | HTML page sent to rejected user agents (instead of the file) | "<html><head><title>403 Forbidden</title></head><body>Bots should not download files here.</body></html>"
excludes | Files which are never served, even if one of the download directories contains them | array(".","..","files.htm","files.php","index.php","index.htm","index.html")
statisticsmode | Collect statistics in a database (1) or not (0)? If you only want to protect your downloads against bots and don't need download counters etc., you can turn statistics off with this setting. | 1
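The following sketch shows how these settings could interact for a single request. The function names, the case-insensitive substring match, and the order of the checks (excluded files first, then rejected agents, then ignored agents) are assumptions for illustration, and the download class may implement this differently. Whether a counted download is actually stored also depends on statisticsmode.

```php
<?php
// Read an ignore/reject list: one significant substring per line.
// Skipping empty lines and trimming whitespace are assumptions.
function hv_read_ua_list($filename)
{
    if (!is_readable($filename)) {
        return array();
    }
    $lines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return array_values(array_filter(array_map("trim", $lines), "strlen"));
}

// True if any listed substring occurs in the user agent (case-insensitive)
function hv_ua_listed($userAgent, $patterns)
{
    foreach ($patterns as $p) {
        if (stripos($userAgent, $p) !== false) {
            return true;
        }
    }
    return false;
}

// Decide what happens to one request; $excludes = never-served file names
function hv_decide($file, $userAgent, $rejectList, $ignoreList, $excludes)
{
    if (in_array($file, $excludes, true))      return "not served";
    if (hv_ua_listed($userAgent, $rejectList)) return "reject";
    if (hv_ua_listed($userAgent, $ignoreList)) return "serve, don't count";
    return "serve and count";
}
```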

General Configuration

As already described, we do not modify the class.hvconfig.inc itself, but copy the relevant lines to a file we may need to create first: hv-localconf.inc. Since we are talking about classes, the configuration modifies their properties – hence we precede each setting here with the string $this->.

Although the download class and the HistView class both use the configuration files, the two classes share only a few settings, so this section is quite short. Because both classes use the same file, common settings only have to be made once, and there is a central place to look for them.

The following table lists the settings common to both classes. The preceding string $this-> is omitted here for better readability:

Property | Description | Default
argsep | URL argument separator. Most sites use the plain ampersand (&). That works, but it does not comply with the W3C guidelines, because in HTML an & starts a character entity (the ampersand itself has to be written as the entity &amp;). So you may need to check your web server's (and/or PHP's) configuration to decide what to set here; PHP, for example, only splits query strings on the separators listed in its arg_separator.input setting. If you are unsure, use &, since it works, but validators such as HTML Tidy may then report warnings for your pages. | ";"
dltype | How your downloads are served. You may e.g. link them directly by setting this to "direct", or use the download class by setting it to "internal". The latter has many advantages, such as a download counter or the ability to kick off bots like Yahoo Slurp when they touch your files. | "internal"
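A link to the download script could be built with the configured separator as shown below. hv_download_url is a hypothetical helper, and the parameter names prog, dir and file are the ones read by the histview.php example in the Usage section. http_build_query() takes the separator as its third argument. The helper then passes the URL through htmlspecialchars(), so an & separator is written as &amp; inside HTML attributes.

```php
<?php
// Hypothetical helper: build an HTML-safe link to the download script.
function hv_download_url($script, $prog, $dir, $file, $argsep)
{
    $query = http_build_query(
        array("prog" => $prog, "dir" => $dir, "file" => $file), "", $argsep);
    // In HTML attributes a literal "&" has to be written as "&amp;"
    return htmlspecialchars($script . "?" . $query);
}

// PHP only splits incoming query strings on the characters listed in
// arg_separator.input (default "&"); ";" needs e.g. arg_separator.input = ";&"
$sep = (strpos(ini_get("arg_separator.input"), ";") !== false) ? ";" : "&";
```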

HistView

The configuration of the HistView class itself is split into several sections, and so is this part of the documentation. Again, the leading $this-> is omitted for better readability:

Changelog formatting settings

Please see the HistView page concerning changelog formatting.

Property | Description | Default
plus | Symbol indicating a newly introduced feature | "+"
minus | Symbol indicating that something was removed from the code | "-"
change | Symbol indicating changes in behaviour, handling, etc. | "*"
bug | Symbol indicating that a bug has been fixed | "!"
ver | Introduces a version number, i.e. the following changes apply to that version | "v"
ignore | HistView ignores lines starting with any of the strings in this array | "$","#","--","=="

Icons to be used

These settings do not hold just the icon files, but the complete <IMG> tags for them. This way you can not only use your local icons, but also load them from a different server (please don't abuse other people's bandwidth this way!) or modify the ALT and TITLE attributes.

Property | Description | Default
iplus | Icon for a new feature | "<IMG SRC='/icons/burst.gif' BORDER='0' ALT='+' TITLE='New!'>"
iminus | Icon for something removed | "<IMG SRC='/icons/transfer.gif' BORDER='0' ALT='-' TITLE='Removed'>"
ichange | Icon for changes | "<IMG SRC='/icons/image1.gif' BORDER='0' ALT='*' TITLE='Changed'>"
ibug | Icon for a bug fix | "<IMG SRC='/icons/alert.red.gif' BORDER='0' ALT='!' TITLE='Bugfix'>"
itar | Icon for .tar.gz files to download | "<IMG SRC='icons/tgz.png' BORDER='0' ALT='*' TITLE='Tar Archive'>"
ideb | Icon for .deb files to download | "<IMG SRC='icons/deb.png' BORDER='0' ALT='*' TITLE='Debian Package'>"
irpm | Icon for .rpm files to download | "<IMG SRC='icons/rpm.png' BORDER='0' ALT='*' TITLE='RPM Package'>"

These default settings use icons shipped with the Apache web server (/icons/*), plus the ones shipped with HistView itself (icons/*).

CSS classes

CSS classes let you highlight the text of your changelog. With the shipped configuration, for example, new features are highlighted in green and bug fixes in red, while plain changes stay plain black. This makes it easier to focus on what you are looking for (Has this nasty bug finally been fixed? Did he fulfil my feature request?). Of course you can adjust the stylesheets to something that suits you better; that's what they are for.

Property | Description | Default
cplus | CSS class for the description of a new feature | "feature"
cminus | CSS class for the description of removed items | "removed"
cchange | CSS class for the description of simple changes | "changed"
cbug | CSS class for bug fix descriptions | "bugfix"
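The following sketch classifies a single changelog line to show how the symbol, icon and CSS class settings fit together. It assumes that the symbol stands at the very start of the line and that the ignored prefixes are checked first (otherwise "--" would be taken for the minus symbol). Version lines (ver) are left out. hv_classify_line is not HistView's actual parser; for the real changelog format, see the HistView page.

```php
<?php
// Hypothetical classifier: returns null for ignored lines, otherwise the
// icon, CSS class and text for the line.
function hv_classify_line($line, $conf)
{
    foreach ($conf["ignore"] as $prefix) {
        if (strpos($line, $prefix) === 0) {
            return null;                                   // line is dropped
        }
    }
    foreach (array("plus", "minus", "change", "bug") as $kind) {
        if (strpos($line, $conf[$kind]) === 0) {
            return array("icon"  => $conf["i" . $kind],    // e.g. iplus
                         "class" => $conf["c" . $kind],    // e.g. cplus
                         "text"  => trim(substr($line, strlen($conf[$kind]))));
        }
    }
    return array("icon" => "", "class" => "", "text" => trim($line)); // no symbol
}

// Defaults from the tables above (IMG tags shortened for readability)
$conf = array(
    "plus" => "+", "minus" => "-", "change" => "*", "bug" => "!",
    "ignore"  => array('$', '#', '--', '=='),
    "iplus"   => "<IMG SRC='/icons/burst.gif' ALT='+'>",
    "iminus"  => "<IMG SRC='/icons/transfer.gif' ALT='-'>",
    "ichange" => "<IMG SRC='/icons/image1.gif' ALT='*'>",
    "ibug"    => "<IMG SRC='/icons/alert.red.gif' ALT='!'>",
    "cplus" => "feature", "cminus" => "removed", "cchange" => "changed", "cbug" => "bugfix",
);
```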

Directory settings

These settings define your default directories. If you serve Debian and/or RPM packages via a repository (APT/YUM), several software packages usually share these directories, so it makes sense to set them up here instead of separately for each project. For the tarballs, however, it may be more useful to set them up per project if you keep a separate directory for each program (see the example histview.php file shipped with the distribution).

Property | Description | Default
basedir | Only needed when linking directly to a file instead of using the download class; it is then usually identical to your web server's DOCUMENT_ROOT. Better use the download class (which offers many more features) and leave this as-is ;) | ""
debbase | Directory where your Debian packages are stored | "/var/repo/debian"
rpmbase | Directory where your RPM packages are stored | "/var/repo/redhat.dist"
tarbase | Directory containing your .tar.gz files for download | "/var/www/downloads"
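The basedir note can be illustrated with a small hypothetical helper that builds a URL for the "direct" download type by stripping basedir from a file path. It also shows why files under the default debbase and rpmbase can only be served through the download class: those directories lie outside a typical document root such as /var/www.

```php
<?php
// Hypothetical helper for dltype "direct": file path -> URL path.
// Returns null if the file does not lie below $basedir.
function hv_direct_url($path, $basedir)
{
    $basedir = rtrim($basedir, "/");
    if ($basedir === "" || strpos($path, $basedir . "/") !== 0) {
        return null;               // not reachable by a direct link
    }
    return substr($path, strlen($basedir));
}
```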

Miscellaneous settings

As usual, this is everything that didn't fit anywhere else:

Property | Description | Default
arch | Default architecture of your packages, in Red Hat notation. For Debian, the class determines the corresponding architecture itself. | "noarch"
relname | If you tag your packages to indicate that your repository delivers them, define the tag here. | ""
max_relnum | Besides the program's version, packages also carry a release number. The class loops over this number to find the highest release available, even if there are gaps, so it needs an upper limit at which to stop, or it would loop forever. | 9
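The search controlled by max_relnum can be sketched as follows. The sketch makes two assumptions: the search starts at release 1, and a callback decides whether a given release exists, which keeps the file naming scheme out of the sketch. hv_highest_release is a made-up name, and closures require PHP 5.3 or later.

```php
<?php
// Try every release from 1 to $maxRelnum and keep the highest one found,
// without stopping at gaps. Returns 0 if no release exists.
function hv_highest_release($exists, $maxRelnum)
{
    $found = 0;
    for ($rel = 1; $rel <= $maxRelnum; $rel++) {
        if ($exists($rel)) {
            $found = $rel;         // keep going: a higher one may follow a gap
        }
    }
    return $found;
}

// Example callback for RPM files (name-version-release.arch.rpm)
$dir = "/var/repo/redhat.dist";
$rpmExists = function ($rel) use ($dir) {
    return file_exists(sprintf("%s/histview-0.1.6-%d.noarch.rpm", $dir, $rel));
};
```

The limit matters because the loop cannot know whether a higher release follows a gap; with releases 1 and 3 present, it still has to try every number up to max_relnum.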

Usage

After installing and configuring the classes, you certainly want to use them. Based on the example file histview.php, this section explains how to do that: we set up a simple page which displays your ChangeLog and provides links to download released versions.

Setting it up

We have already configured the classes for general use, but some things are better left to the individual pages such as this one, e.g. the parameters passed in the URL:

#==================================================================[ Setup ]===
#-------------------------------[ Read request vars and do some protection ]---
$prog = $_REQUEST["prog"];
# $prog may only contain letters, digits and "_"
if (!is_string($prog) || !preg_match('/^\w+$/D', $prog)) $prog = "HistView";
# $file may only contain letters, digits, "_", "-" and "." - no "/" or "\"
$file = $_REQUEST["file"];
if (!is_string($file) || !preg_match('/^[\w.-]+$/D', $file)) unset ($file);
# $dir must be one of ours - here: "tar","deb","rpm"
$dir  = $_REQUEST["dir"];
if (!empty($dir) && !in_array($dir,array("tar","deb","rpm"),true)) unset ($file,$dir);

This first copies some arguments from the URL into local variables, which are easier to access, and then sanitizes them to provide some protection against code injection and the like. The "prog" name may only contain letters, digits and the underscore. If the string contains any other character, or is empty, we assign it a default.

Likewise we protect the file name, which may only contain letters, digits, the hyphen, the underscore and the dot, but no slash ("/") or backslash ("\"). This prevents the delivery of arbitrary files from elsewhere on the system. If a file name violates this rule, it is unset and no file is delivered. As for the directories, only those we have defined are allowed (or rather, those we are about to define in the next step).
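Wrapped into functions, these checks can be tested on their own. The sketch below makes two deliberate choices. First, it whitelists the allowed characters with an anchored pattern: if preg_match() fails and returns false (as it does, for example, for invalid UTF-8 input when the /u modifier is used), the value is rejected instead of accepted. Second, it puts the hyphen last in the character class, because current PCRE versions reject a class like [^\w-_.] as an invalid range. The function names are made up.

```php
<?php
// Request checks as functions (a sketch). The D modifier makes "$" match
// only at the very end, so a trailing newline is not accepted.
function hv_clean_prog($prog, $default)
{
    return (is_string($prog) && preg_match('/^\w+$/D', $prog)) ? $prog : $default;
}

function hv_clean_file($file)
{
    // "-" is placed last in the class so it is taken literally, not as a range
    return (is_string($file) && preg_match('/^[\w.-]+$/D', $file)) ? $file : null;
}

function hv_clean_dir($dir, $allowed)
{
    return in_array($dir, $allowed, true) ? $dir : null;
}
```

Note that ".." still passes hv_clean_file(), just as with the original check; the excludes list of the download class contains "." and "..".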

To prevent PHP notices when one of these arguments is not passed, you could wrap each block in an isset() check. For "prog" this would look like:

<?php
if (isset($_REQUEST["prog"])) {
  $prog = $_REQUEST["prog"];
  if (!is_string($prog) || !preg_match('/^\w+$/D', $prog)) $prog = "HistView";
} else {
  $prog = "HistView";
}

#------------------------------------------------------[ Setup directories ]---
$dirs = array( "tar" => "/var/ftp/downloads",
               "deb" => "/var/repo/debian",
               "rpm" => "/var/repo/redhat/RPMS.dist" );
$charset = "iso-8859-15";

Finally, set up the directories and the character set, which we will use later on.

Processing a file request

Now for the real thing: before we display our ChangeLog, we need to check whether we should deliver a file instead:

#========================================================[ Process Request ]===
#----------------------------------------------[ Was a download requested? ]---
if (!empty($_REQUEST["file"])) {
  include("class.download.inc");
  if (!empty($file) && !empty($dir)) {
    $dl = new download();
    if ($dl->sendfile($file,$dirs[$dir])) exit;
  }
  $e404 = "\n<DIV CLASS='ebox'>Sorry - but the requested file was not found here.</DIV>";
} else {
  $e404 = "";
}

Note that the first line quoted here checks the $_REQUEST variable, although we already assigned it to a local variable. This is not an accident but intended: we need to check whether something was requested, not whether something has to be delivered. The next condition checks the latter. If $file or $dir is empty although a file was requested, the sanitizing code has removed it (or no directory was passed). We handle this the same way as a file that was not found: had the file been found, the script would have sent it and stopped. Since we are still here, the script sets up the error message instead.

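If you want to reference the directories configured with the classes instead, change the corresponding line as shown below. This only works if the HistView instance $hv (created further down in the example, when displaying the ChangeLog) already exists at this point. It also relies on the values of $dir ("tar", "deb", "rpm") matching the properties tarbase, debbase and rpmbase:

if ($dl->sendfile($file,$hv->{$dir."base"})) exit;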

Display the ChangeLog

If the file request failed, or there was none, the script is still running, so now we display the ChangeLog:

#-----------------------------------------------[ Display the history file ]---
require_once("class.histview.inc");
$file = $dirs["tar"]."/".strtolower($prog).".hist"; // ChangeLog to parse
# Simple method, no download links to provide:
#$hv = new histview($file);
# Providing download links:
$hv = new histview($file,strtolower($prog));
# Setting up the directories
$hv->set_basedir("tar", $dirs["tar"]);

First we include the class and create an instance of it, which automatically applies the configuration we made before. We then only have to provide the missing directory for our .tar.gz files, and we are done so far.

# Process the page
$hv->process();
$history = $hv->out();
$title = "History for $prog";
echo "<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN'>\n";
echo "<HTML><HEAD>\n"
   . " <META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=$charset'>\n"
   . ' <LINK REL="stylesheet" TYPE="text/css" HREF="histview.css">'."\n"
   . " <TITLE>$title</TITLE>\n"
   . "</HEAD><BODY>\n<H2>$title</H2>\n";
echo $e404;
echo $history."\n</BODY></HTML>\n";

Here we tell the class to process our ChangeLog and also check for all related files, and then we obtain the well-formatted ChangeLog and assign it to the local variable $history. The following lines output the HTML header and start of the page. We also place our error message (which may be empty), and finally all the content will be sent to the browser, and the page will be closed – task completed!