Determine and publish measures for the non-js usability of web pages.
 
 
Anselm Flügel 0ed6f5401b HTML Validity fixes 2023-02-08 17:19:51 +01:00


The Crapicity Machine

The javascriptification of the Web means, among other, perhaps worse, problems, that on today's commercial internet many pages (as in: their HTML source) are entirely empty, except perhaps for some lame order to switch on Javascript if you're lucky. That's sad, if only because things like the Parallelweb (that could, for instance, translate the entire Web to Swebian) will be very boring on such pages.

There's a simple measure for how much pages suck by such a standard: the ratio of the total size of the document to the amount of text that can be extracted from it, so higher is worse. That's what I call crapicity (or c7y for short; and yes, you can of course construct examples where that'll be grossly misleading). Using BeautifulSoup, it's easily computed:

from bs4 import BeautifulSoup


def compute_crapicity(doc):
    """returns the crapicity of html in doc.

    doc really should be a str -- but if len() and BeautifulSoup() return
    something sensible with it, you can get away with something else, too.
    """
    parsed = BeautifulSoup(doc, "html.parser")
    content_length = max(len(parsed.text), 1)
    return len(doc)/content_length
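
To get a feeling for the numbers without installing anything, here is a rough sketch that uses only Python's standard library. It is a stand-in, not what c7yops does: TextExtractor, approx_crapicity and the two toy documents are made up for this illustration, and the text extraction will differ in detail from BeautifulSoup's.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def approx_crapicity(doc):
    """Total document length divided by extractable text length (at least 1)."""
    extractor = TextExtractor()
    extractor.feed(doc)
    extractor.close()  # flushes any text the parser is still buffering
    content_length = max(len("".join(extractor.chunks)), 1)
    return len(doc) / content_length


# A page that is mostly text, and a page that is mostly script.
texty = "<html><body><p>" + "Plain words for everyone. " * 20 + "</p></body></html>"
shell = ("<html><body><script>" + "render();" * 50
         + "</script><noscript>Enable Javascript</noscript></body></html>")

print(round(approx_crapicity(texty), 2))
print(round(approx_crapicity(shell), 2))
```

The text-heavy page ends up close to 1, while the script-only shell scores far higher, which is the whole point of the measure.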

The program distributed here is built around this function; there's a command line interface, a SQLite-based result storage, and a web interface.

Installation

c7yops is a one-module program; you can run it in place or put it somewhere in your path:

$ sudo apt install git python3-bs4 python3-requests # or your distro's equivalent
$ git clone https://codeberg.org/AnselmF/crapicity.git
$ cd crapicity
$ chmod +x c7yops
$ ./c7yops compute <some html file>
# perhaps
$ sudo cp c7yops /usr/local/bin

Usage

c7yops has a few subcommands; in the synopses below, {x} means zero or more of x, [x] means zero or one of x, and a|b means a or b.

compute {filename|URL}

computes the crapicity of the local files or remote resources you name on the command line (by file name or URL) and prints the results. Note that these results are not recorded in the database.
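
How a program might tell file names from URLs can be sketched as follows. This is an assumption about the general technique, not a description of c7yops' actual logic; looks_like_url and read_document are hypothetical names, and the sketch uses urllib from the standard library where c7yops itself depends on requests.

```python
from urllib.parse import urlsplit


def looks_like_url(argument):
    """True if argument has an http or https scheme and a host part."""
    parts = urlsplit(argument)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def read_document(argument):
    """Returns the document behind a local file name or a URL as a str."""
    if looks_like_url(argument):
        # Imported here so the local-file branch needs no network code.
        from urllib.request import urlopen
        with urlopen(argument, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")
    with open(argument, encoding="utf-8", errors="replace") as f:
        return f.read()
```

Decoding with errors="replace" keeps the sketch from falling over on documents that are not valid UTF-8, at the price of slightly distorted lengths.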

update {URL}

computes the crapicity of documents at URLs and records these results in the database.
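
Recording a score in SQLite could look roughly like this. The table name scores, its columns, and the functions open_store and record_score are all assumptions for the sake of illustration; in particular, whether c7yops overwrites an earlier score for the same URL (as this sketch does) is something you would have to check in the source.

```python
import sqlite3


def open_store(path=":memory:"):
    """Opens (and, if necessary, initialises) a hypothetical score database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores ("
        " url TEXT PRIMARY KEY,"
        " c7y REAL NOT NULL)")
    return conn


def record_score(conn, url, c7y):
    """Stores c7y for url, overwriting any previous score for that url."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO scores (url, c7y) VALUES (?, ?)",
            (url, c7y))
```

Making the URL the primary key is what lets INSERT OR REPLACE keep exactly one row per page.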

dump [pattern]

dumps URLs and their crapicities from the database. Optionally give a pattern for SQL LIKE (i.e., % is zero or more characters) to only dump matching entries.
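
If you have not used SQL LIKE before, this sketch shows how such patterns select entries. The scores table and the dump function are hypothetical stand-ins; the URLs and values are the ones from the example session in this README.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (url TEXT PRIMARY KEY, c7y REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [
    ("https://gnu.org", 4.29),
    ("https://microsoft.com", 20.69),
    ("http://blog.tfiu.de", 3.38),
])


def dump(conn, pattern="%"):
    """Returns (url, c7y) pairs whose url matches the LIKE pattern."""
    return conn.execute(
        "SELECT url, c7y FROM scores WHERE url LIKE ? ORDER BY url",
        (pattern,)).fetchall()


print(dump(conn, "https://%"))
# [('https://gnu.org', 4.29), ('https://microsoft.com', 20.69)]
print(dump(conn, "%gnu%"))
# [('https://gnu.org', 4.29)]
```

Besides %, LIKE also understands _, which matches exactly one character.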

serve port prefix

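runs a web server at localhost:port, with the root page exposed at prefix, that lets people rate web pages and review previous results. The examples in this document also pass a third argument, the URL of a CSS file ("" gives you the unstyled interface). See below.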

Example

$ curl -o tmp.html https://codeberg.org/AnselmF/crapicity
$ c7yops compute tmp.html https://codeberg.org/AnselmF/crapicity
tmp.html: 18.32
https://codeberg.org/AnselmF/crapicity: 18.32
$ c7yops update https://gnu.org https://microsoft.com
https://gnu.org: 4.29
https://microsoft.com: 20.69
$ c7yops dump
https://gnu.org: 4.29
https://microsoft.com: 20.69
$ c7yops update http://blog.tfiu.de
http://blog.tfiu.de: 3.38
$ c7yops dump
http://blog.tfiu.de: 3.38
https://gnu.org: 4.29
https://microsoft.com: 20.69
$ c7yops serve 1070 "" https://blog.tfiu.de/media/2023/c7y.css
# now point your browser to http://localhost:1070

Running a crapicity Server

c7yops has a built-in web server; in principle, you could make it bind to "" (see the argument to ThreadingHTTPServer in do_serve) and then expose it to the net; I'd be reasonably relaxed doing that.

However, the way it's really supposed to run is somewhere within a larger site; for instance, in some nginx site you would say:

location /c7y {
  proxy_pass http://localhost:8099;
}

and then run:

c7yops serve 8099 /c7y ""

After that, http://<site>/c7y should show the (unstyled) interface.
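
The prefix matters because a proxy_pass without a URI part hands the request path to the backend unchanged, so the backend sees /c7y rather than /. The following sketch shows a server honouring such a prefix. It is not c7yops' code; make_server and PrefixedHandler are hypothetical, and only ThreadingHTTPServer is something the README says c7yops uses.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_server(port, prefix):
    """Returns a server on localhost:port whose root page lives at prefix."""

    class PrefixedHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            path = self.path.split("?", 1)[0]
            # Treat "/c7y" and "/c7y/" alike; an empty prefix means "/".
            if path.rstrip("/") == prefix.rstrip("/"):
                status = 200
                body = b"<!DOCTYPE html><title>c7y sketch</title><p>root page</p>"
            else:
                status = 404
                body = b"not found"
            self.send_response(status)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    # Binding to "localhost" keeps the server off the public interfaces.
    return ThreadingHTTPServer(("localhost", port), PrefixedHandler)
```

Calling make_server(8099, "/c7y").serve_forever() would then answer what nginx forwards from http://<site>/c7y.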

There is some CSS coming with c7y that's really intended to be customised and served externally. Copy c7y.css to some directory that is statically served from the hosting server and pass the (URL) path to that to your c7yops serve call, for instance:

c7yops serve 8099 /c7y /media/2023/c7y.css

To make this a little less haphazard, kill the c7yops server again and create a new user to run the server; the fewer privileges the better, but it needs to be able to write to its home. If you're liberal, do:

$ sudo adduser --disabled-login c7yscorer

Become that user and clone the crapicity repo into its home:

$ sudo -u c7yscorer bash
$ cd
$ git clone https://codeberg.org/AnselmF/crapicity.git

If you have already collected a few scores as yourself, you can copy the existing database with something like:

$ mkdir -p .local/share/c7y
$ cp <your private .local/share/c7y/scores.db>  .local/share/c7y/scores.db

(ideally, the second command will have to be a bit more complicated, because your new user should not be able to look into other users' home directories).

Now create a startup script. With sysvinit, something like the following should work (adapt SCRIPT_ARGS and SCRIPT_LOCATION as necessary for your machine):

#!/bin/sh
### BEGIN INIT INFO
# Provides:          crapicity
# Required-Start:    $local_fs $network
# Required-Stop:     $local_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Crapicity server
# Description:       A web server for scoring and publishing crapicity scores
### END INIT INFO

SCRIPT_LOCATION=/home/c7yscorer/crapicity/c7yops
SCRIPT_ARGS="serve 8099 /c7y /media/2023/c7y.css"
# not called UID because bash treats UID as a read-only variable
RUN_AS=c7yscorer
DAEMONOPTS="--pidfile /run/c7yops.pid --oknodo --exec /usr/bin/python3"

do_start() {
    start-stop-daemon --start $DAEMONOPTS \
      --make-pidfile --background --chuid $RUN_AS \
      --output /var/log/c7yserver.log \
      -- $SCRIPT_LOCATION $SCRIPT_ARGS
}

do_stop() {
    start-stop-daemon --stop $DAEMONOPTS \
      --remove-pidfile \
      -- $SCRIPT_LOCATION
}


case "$1" in
  start)
    do_start
    ;;
  stop)
    do_stop
    ;;
  restart)
    do_stop
    do_start
    ;;
  *)
    echo "USAGE: $0 start|stop|restart"
    ;;
esac

exit 0

A systemd unit file should be correspondingly simpler. Assuming a Debian box and the script above installed as an executable /etc/init.d/crapicity, you should be good to go after sudo update-rc.d crapicity defaults and sudo service crapicity start.
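
For the systemd case, an untested sketch of such a unit might look like the following. It reuses the user, paths and arguments from the init script above; the file name /etc/systemd/system/crapicity.service and the Restart policy are my assumptions, so adapt them to your machine.

```ini
[Unit]
Description=Crapicity server
After=network.target

[Service]
# Same unprivileged user and command line as in the sysvinit script
User=c7yscorer
ExecStart=/usr/bin/python3 /home/c7yscorer/crapicity/c7yops serve 8099 /c7y /media/2023/c7y.css
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With that in place, sudo systemctl daemon-reload followed by sudo systemctl enable --now crapicity should start the server and keep it starting at boot.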