Python scripts to discover new accounts and instances on the Mastodon Fediverse.
discover_accounts.py - Crawls the /api/v1/accounts/ endpoint on Mastodon instances for known users.
discover_instances.py - Discovers new Mastodon instances by looking through the home instances of discovered accounts.
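As a rough illustration of the two ideas above, the sketch below builds a lookup URL for the /api/v1/accounts/ endpoint and derives an account's home instance from the `acct` field that Mastodon returns (local accounts appear as "name", remote ones as "name@domain"). The function names are hypothetical and not taken from the scripts, and how the scripts actually pick account IDs is an assumption.

```python
def account_url(instance: str, account_id: str) -> str:
    """Build the URL for one account lookup on a Mastodon instance."""
    return f"https://{instance}/api/v1/accounts/{account_id}"


def home_instance(acct: str, crawled_instance: str) -> str:
    """Return the domain an account lives on.

    Mastodon reports local accounts as "name" and remote ones as
    "name@domain", so a missing "@" means the crawled instance itself.
    """
    _, sep, domain = acct.rpartition("@")
    return domain.lower() if sep else crawled_instance.lower()
```

Collecting the distinct results of `home_instance` over all discovered accounts would give the candidate rows for the instances table.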
How to set up?
- Run create_db.sql to create the database in MySQL
- Manually add the first instance to crawl to the instances table
- Copy config.json.example to config.json and add your configuration parameters
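For orientation, reading the configuration could look roughly like the sketch below. `max_threads` is the only key this README names; the `load_config` helper and the fallback to 24 threads when the key is missing are assumptions for illustration, not the scripts' actual behaviour.

```python
import json

DEFAULT_MAX_THREADS = 24  # worker count mentioned in this README


def load_config(path="config.json"):
    """Read config.json and fall back to the default thread count (assumed)."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    config.setdefault("max_threads", DEFAULT_MAX_THREADS)
    return config
```

All other keys (for example database credentials) should be taken from config.json.example rather than from this sketch.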
How to start crawling?
- Run discover_accounts.py to discover the first 200 known accounts of the first instance
- Run discover_instances.py to add home instances of the first 200 known accounts to the instances table
- Run discover_accounts.py again to use multi-threaded crawling on multiple instances at once
- Run discover_instances.py again from time to time to add more instances to the instances table
Things to look out for:
- Both Python scripts will use 24 worker threads to optimize throughput. If you run this on a smaller machine, adjust max_threads in config.json.
- Mastodon APIs are usually set up with a default request limit of 350 per 5 minutes. The script discover_accounts.py is aware of this limitation and will stop crawling an instance for at least 5 minutes after it has crawled 200 accounts.
- The script discover_accounts.py will crawl a maximum of 200 accounts per instance per run. After it has crawled 200 accounts for each known instance, it will quit. The next time you start the script, it will continue for each instance where it left off. If you want to crawl continuously (e.g. because you want to get all users of the Mastodon Fediverse), you will have to call the scripts repeatedly or build some kind of loop around them.
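The per-instance pause described above can be pictured with the following minimal sketch. It is not the code from discover_accounts.py; the `InstanceBudget` class and its method names are hypothetical, and only the numbers (200 accounts, 5 minutes) come from this README.

```python
import time

BATCH_SIZE = 200            # accounts crawled per instance before pausing
COOLDOWN_SECONDS = 5 * 60   # pause length, one rate-limit window


class InstanceBudget:
    """Track how many accounts were crawled per instance and when to pause."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._crawled = {}        # instance -> accounts crawled in current batch
        self._blocked_until = {}  # instance -> time at which crawling may resume

    def can_crawl(self, instance):
        """True if the instance is not in its cool-down period."""
        return self._clock() >= self._blocked_until.get(instance, float("-inf"))

    def record(self, instance):
        """Count one crawled account; start a cool-down after a full batch."""
        count = self._crawled.get(instance, 0) + 1
        if count >= BATCH_SIZE:
            self._blocked_until[instance] = self._clock() + COOLDOWN_SECONDS
            count = 0
        self._crawled[instance] = count
```

Keeping the batch at 200 leaves headroom below the request limit, so a few extra requests per window do not trigger throttling.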
ToDo:
- Make notification terms configurable through config.json
- Add another Python script with sensible logic for endless crawling
- Add a requirements.txt for the virtual environment
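Until such an endless-crawling script exists, a simple wrapper loop around the two scripts might look like the sketch below. The `crawl_forever` function, the 300-second pause between rounds, and the injectable `run` and `sleep` parameters are assumptions made for illustration; only the two script names come from this repository.

```python
import subprocess
import sys
import time

SCRIPTS = ("discover_accounts.py", "discover_instances.py")


def run_script(name):
    """Run one crawler script with the current Python interpreter."""
    subprocess.run([sys.executable, name], check=True)


def crawl_forever(rounds=None, pause_seconds=300, run=run_script, sleep=time.sleep):
    """Alternate between both scripts; rounds=None means loop until interrupted."""
    completed = 0
    while rounds is None or completed < rounds:
        for script in SCRIPTS:
            run(script)
        completed += 1
        # Pause between rounds, but not after the final one.
        if rounds is None or completed < rounds:
            sleep(pause_seconds)
    return completed
```

Calling `crawl_forever()` from the repository directory would run both scripts in turn until interrupted with Ctrl+C; `crawl_forever(rounds=3)` stops after three rounds.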
PS: What does "Stasi" mean?
"Stasi" is short for the German word "Staatssicherheit" (state security), the common name of the Ministry for State Security, the secret police and intelligence agency of East Germany. By 1989 it had roughly 90,000 full-time employees and well over 100,000 unofficial informants in a country with a total population of about 16 million. The agency mainly spied on its own people and kept files on millions of citizens.
A lot more info can be found here: https://en.wikipedia.org/wiki/Stasi