Limit Downloads by IP+Prog
If you find your bandwith wasted by crawlers downloading everything (especially every single version of a certain file), and you cannot catch them with one of the other algorithms (i.e. no specific user agent and IP), here's a different approach to block them off: Limit the amount of downloads per prog and IP. This requires an additional table in the database, and is thus disabled by default. If you want to use this feature, you will first have to create this table in the configured database:
CREATE TABLE hvlimits ( dl_date DATETIME, -- 'YYYY-MM-DD HH:MM:SS' prog VARCHAR(30), remote_addr VARCHAR(15), -- IPv4 http_user_agent VARCHAR(255) );
The columns use the same names as in the statistics table (see stats.txt), but the purpose is different: With this feature enabled (i.e.
$dlFileLimit > 0), a new record will be inserted on each download request. After this, HistView checks the table: If for the requesting IP and the requested program there are already more than $dlFileLimit download attempts recorded within the last $dlTimeLimit seconds, the download will be refused, and $rejectmsg is instead displayed to the downloader.
An example to visualize this. Imagine the following settings:
$dlFileLimit = 5; $dlTimeLimit = 300;
Plus your program "myprog" available as RPM, DEB and TGZ. Now some user tries to download all 3 types of the latest version, each one time. And let's assume these downloads are finished in less than 300s. So at the end, there are 3 entries for him in our table for "myprog" (since we cound all 3 requests belonging to the same file). The user succeeded with all 3 downloads.
Now he decides to download the previous version as well – and will succeed for the next 2 requests. On the 3rd, the following would happen (assumed we are still in the 300s frame):
- a new entry would be made into our table
- the check finds 6 entries for this IP and prog within the last 300s
6 > 5, i.e. our configured limit has been exceeded
- this download will be cancelled
If he now tries again and again, he will still be rejected – until pausing for 300s (our configured window). After that, he may try again and succeed.
Conclusion: Scripted mass downloads would have trouble, and stop after reaching our limit – which is exactly what we wanted. If we find the window to small, we can increase
Since the data in our table is of no use once it is older than
$interval seconds, HistView will automatically purge older values if you set
TRUE (recommended setting). Of course this requires some privileges on the table used – with the example configuration, this would be:
GRANT DELETE ON hvlimits TO guest@localhost IDENTIFIED BY 'guest';
You may wish to temporarily set
FALSE in order to check for abusing IPs (and agents). But to protect the privacy of your users, you shouldn't do this longer than necessary or set up other purge jobs to be run after you evaluated what you needed to.
If for some other reasons you cannot use the AutoPurge functionality (e.g. you have security concerns giving the web process the DELETE permission), you may want to setup a cron task (or run a manual purge). The statement you would need for this probably looks like this:
DELETE FROM hvlimits WHERE dl_date < SYSDATE() - INTERVAL 300 SECOND;
Where you may want to replace the "300" by the age of the oldest entry you wish to keep in the table.