parse Changelog ("History") files and generate HTML pages

Create Download Statistics
==========================

You may have noticed that HistView now also (optionally) provides download links.
Moreover, if you enable the download class, you can record download statistics
in your MySQL database (see the example files provided). Here is an example of
the table you need for that:

CREATE TABLE statistics (
  dl_date DATE,
  prog VARCHAR(30),
  newver VARCHAR(10),
  referrer VARCHAR(100),
  remote_addr VARCHAR(15), -- IPv4
  http_user_agent VARCHAR(255)
);

No primary key is needed for this. If you really want one, add a column "id" and
make it auto-increment, but it's really not needed.

As the name "newver" suggests, you can also have a column "oldver" (and many
others), but these are not (yet?) used by HistView and/or the download class.
However, if you want to use more columns for something: no problem, the
download class will simply ignore them.

Some of the columns may look like a "data collector" in the sense of
collecting personal data to create profiles, but that is not the intention.
Here is what the columns are intended for:

* dl_date + prog + newver: To see which version of which program has been
  downloaded how often per day. With these columns, no personal data is stored.
* referrer: To see who brought the downloader here.
* remote_addr: If you have an unusually high number of downloads on one day, you
  can be quite sure some crawler/bot was downloading all it could get.
  With this column you can identify which downloads you may want to drop from
  the stats (DELETE...WHERE dl_date=.. AND prog=.. AND remote_addr=..).
* http_user_agent: This helps you identify bots (together with the remote_addr
  column) so you can add them to either the ignore or the reject file, i.e. to
  prevent them from increasing your DL counter (or from downloading at all)
  in the future. You may also want to compare the agent given here with the
  source address from remote_addr, as some crawlers (especially from MS)
  tend to "mask" themselves, pretending to be "human" browsers. Those are
  then candidates for the crawlernets file.

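The per-day counts and the bot cleanup described above can be tried out with a small sketch. It is only an illustration under assumptions: an in-memory SQLite database stands in for MySQL so the example runs on its own, the sample rows are made up, and the helper names (downloads_per_day, busiest_addresses, drop_downloads) are hypothetical and not part of HistView or its download class. The SQL statements are plain enough that they should carry over to MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE statistics (
        dl_date DATE,
        prog VARCHAR(30),
        newver VARCHAR(10),
        referrer VARCHAR(100),
        remote_addr VARCHAR(15),
        http_user_agent VARCHAR(255)
    )
""")

# Made-up sample rows: two ordinary downloads and three from one address.
rows = [
    ("2024-01-01", "histview", "1.0", "", "192.0.2.10", "Mozilla/5.0"),
    ("2024-01-01", "histview", "1.0", "", "192.0.2.11", "Mozilla/5.0"),
    ("2024-01-01", "histview", "1.0", "", "198.51.100.7", "ExampleBot/1.0"),
    ("2024-01-01", "histview", "0.9", "", "198.51.100.7", "ExampleBot/1.0"),
    ("2024-01-01", "histview", "1.0", "", "198.51.100.7", "ExampleBot/1.0"),
]
conn.executemany("INSERT INTO statistics VALUES (?, ?, ?, ?, ?, ?)", rows)


def downloads_per_day(conn):
    """Count downloads per day, program and version."""
    return conn.execute(
        "SELECT dl_date, prog, newver, COUNT(*) FROM statistics "
        "GROUP BY dl_date, prog, newver ORDER BY dl_date, prog, newver"
    ).fetchall()


def busiest_addresses(conn, day):
    """List addresses and agents by download count for one day, busiest first."""
    return conn.execute(
        "SELECT remote_addr, http_user_agent, COUNT(*) AS hits FROM statistics "
        "WHERE dl_date = ? GROUP BY remote_addr, http_user_agent "
        "ORDER BY hits DESC, remote_addr",
        (day,),
    ).fetchall()


def drop_downloads(conn, day, prog, addr):
    """Delete one address's downloads of a program on a given day."""
    cur = conn.execute(
        "DELETE FROM statistics WHERE dl_date = ? AND prog = ? AND remote_addr = ?",
        (day, prog, addr),
    )
    return cur.rowcount  # number of rows removed
```

A typical session would call busiest_addresses() for a suspicious day, check the agent string of the top entry, and then pass that address to drop_downloads() before looking at downloads_per_day() again.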
Having said that, after you have applied the wanted changes (deleting wrong counters,
adding bots to the ignore/reject/crawlernets file), you may want to "anonymize"
the columns. You could:

* truncate the referrer down to the domain
* replace the remote_addr by a hash (or even by NULL)
* replace the http_user_agent by a hash (or even by NULL)

This is not done automatically by HistView, so you have to take care of that
yourself. Sample queries could look like:

-- Remove the path from remote referrers, but keep the domain. Don't touch
-- local references (path without protocol) or entries already processed.
-- Turns 'http://www.example.com/some/path' into 'www.example.com'
UPDATE statistics SET referrer=SUBSTRING_INDEX(SUBSTRING_INDEX(referrer,'/',3),'/',-1)
 WHERE referrer LIKE 'http:%' OR referrer LIKE 'https:%';

-- Remove all information about the remote site
UPDATE statistics SET remote_addr=NULL;
UPDATE statistics SET http_user_agent=NULL;
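
The queries above cover the NULL variant only. For the hash variant, one possible approach is sketched below. It is an assumption, not something HistView provides: the function name pseudonymize, the salt parameter and the choice of SHA-256 are all hypothetical. The digest is cut to 15 characters because a full SHA-256 hex digest (64 characters) would not fit into remote_addr VARCHAR(15) as defined above.

```python
import hashlib


def pseudonymize(value, salt, length=15):
    """Return a truncated, salted SHA-256 hex digest; None for empty input."""
    if value is None or value == "":
        return None
    # Prefix a secret salt so the short IPv4 address space cannot simply
    # be hashed and compared by someone who sees the table.
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    # Truncate so the result fits the VARCHAR(15) column from the example table.
    return digest[:length]


if __name__ == "__main__":
    # Example call with a made-up address and salt.
    print(pseudonymize("192.0.2.10", "my-secret-salt"))
```

Equal inputs still map to equal values, so repeated downloads from one address remain countable, while the address itself is no longer stored. Keep the salt outside the database, otherwise the hashes are easy to reverse by trying all addresses.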