
scrapliga

Scrapy Spider to crawl through https://oehb-handball.liga.nu/.

https://codeberg.org/gue/scrapliga

Status

This project is in an experimental state.

Scraped Data

The following items are scraped:

  • Championship
  • ChampionshipGroup
  • Club
  • Team
  • Court
  • Match
  • Player (not yet implemented)

See scrapliga.items for details.
Caveat: The information on https://oehb-handball.liga.nu/ is presented in a rather complicated way, which leads to a somewhat unintuitive item structure.
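Scrapy accepts plain dataclass instances as items in addition to scrapy.Item subclasses. The following is a stripped-down, purely hypothetical model of two of the listed item types, showing how items can refer to each other. All field names (including noncompetitive for the team's noncompetitive indicator) are assumptions; the actual definitions are in scrapliga.items.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Club:
    # Hypothetical fields; the real definitions live in scrapliga.items.
    name: str
    url: Optional[str] = None


@dataclass
class Team:
    # Hypothetical fields; a team refers to its club by name here.
    name: str
    club: str
    # Assumed name for the team's noncompetitive indicator.
    noncompetitive: bool = False


team = Team(name="Example Team", club="Example Club", noncompetitive=True)
print(asdict(team))
```

Referring to the club by a plain value instead of nesting a Club object keeps each item flat, so it can be written as a single JSON record.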

Execute

Crawl the site (currently only the WHV parts are processed) and store the results as JSON files in the feeds directory.

[ -d feeds ] && rm -f feeds/*
python3 main.py
ls feeds/
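After a crawl, a quick way to check what ended up in feeds is to load each file and count its records. This sketch assumes the files were written by Scrapy's json feed exporter (one JSON array per file, not JSON Lines) and use a .json extension. The function name summarize_feeds is hypothetical and not part of scrapliga.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_feeds(feed_dir: str = "feeds") -> Counter:
    """Count the records in every *.json file found in feed_dir.

    Assumes each file contains a single JSON array of item objects,
    as produced by Scrapy's "json" feed format.
    """
    counts: Counter = Counter()
    for path in sorted(Path(feed_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            records = json.load(fh)
        # Key the count by file name (e.g. one file per item type).
        counts[path.name] = len(records)
    return counts


if __name__ == "__main__":
    for name, count in summarize_feeds().items():
        print(f"{name}: {count} records")
```

Running it from the project root after python3 main.py prints one line per feed file.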

Development Setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt