Source code for the analysis of the feedback on the EU chatcontrol regulation

Using the software


This software is based on python3 and scrapy. If you have python3 and pip installed, running

pip3 install -r requirements.txt

should install the required dependencies.

Downloading the data

The data is already in this repository, in feedback.jl, so you can skip this step. If you want to download it again, run either

make download

or

scrapy runspider -o feedback.jl

to download it again (the Makefile just backs up the existing data file before invoking the download command).

Using the web interface

You can then run


to start the server, after which you'll be able to visit it to load one comment. Use the buttons to navigate. Access keys are noted in brackets; how to use them depends on your browser (with Firefox: Alt+Shift+the noted key).

If you modified something, don't forget to save!

Generating plots

Simply running


will generate the four images I used in my blog post. You can also plot them separately:

  • issues.svg (python3
  • sankey.svg (python3
  • bydate.svg (python3
  • bynationality.svg (python3
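
To illustrate the kind of aggregation behind bydate.svg: feedback counts per day can be derived from the dateFeedback field with the standard library alone. This is a sketch of the idea, not the repository's plotting script:

```python
import json
from collections import Counter

def counts_by_date(lines):
    """Count feedback entries per calendar day.

    `lines` is an iterable of JSON strings, one comment per line, as
    stored in feedback.jl; dateFeedback looks like "2022/09/12 23:58:37".
    """
    counts = Counter()
    for line in lines:
        entry = json.loads(line)
        day = entry["dateFeedback"].split(" ")[0]  # keep only the date part
        counts[day] += 1
    return counts
```

Feeding it the raw file is just `counts_by_date(open("feedback.jl"))`.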

The underlying data


The feedback is directly downloaded from the API. One comment looks like this:

  {
    "language": "EN",
    "id": 3338612,
    "country": "USA",
    "organization": "Wikimedia Foundation",
    "surname": "",
    "feedback": "Please see attached document for the feedback of the Wikimedia Foundation.",
    "status": "PUBLISHED",
    "firstName": "",
    "attachments": [
      {
        "id": 27511731,
        "size": 117469,
        "documentId": "090166e5f12d1a4e",
        "isExternalizedInHrs": true,
        "ersFileName": "Wikimedia Foundation Feedback.pdf",
        "pdfSize": 120720,
        "isRendered": true,
        "pages": 3,
        "_links": {
          "self": {
            "href": "{?projection}",
            "templated": true
          },
          "commonFileContent": {
            "href": ""
          },
          "feedback": {
            "href": "{?projection}",
            "templated": true
          }
        }
      }
    ],
    "dateFeedback": "2022/09/12 23:58:37",
    "publication": "ANONYMOUS",
    "userType": "NGO",
    "companySize": "LARGE",
    "tr_number": "596597913132-95",
    "historyEventOccurs": false,
    "isMyFeedback": false,
    "referenceInitiative": "COM(2022)209",
    "publicationId": 30786148,
    "publicationStatus": "CLOSED",
    "_links": {
      "self": {
        "href": "{?projection}",
        "templated": true
      },
      "initiative": {
        "href": "{?projection}",
        "templated": true
      },
      "attachments": {
        "href": "{?projection}",
        "templated": true
      },
      "campaign": {
        "href": "{?projection}",
        "templated": true
      },
      "account": {
        "href": ""
      },
      "reports": {
        "href": ""
      }
    }
  }

Important things I extracted during analysis include

  • the unique id (id) which is used to identify a feedback
  • the country (country)
  • the actual feedback text (feedback)
  • a list of attachments (attachments), each with
    • a unique documentId, which helps to find the URL to download the document (the URL ends in {documentId})
    • the filename (ersFileName)
  • when the feedback was submitted (dateFeedback)
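
A minimal reader for exactly these fields could look like the following; the field names come from the example above, while the helper names are mine:

```python
import json

def extract(entry):
    """Pull the analysed fields out of one raw feedback record."""
    return {
        "id": entry["id"],
        "country": entry["country"],
        "feedback": entry["feedback"],
        "attachments": [
            {"documentId": a["documentId"], "filename": a["ersFileName"]}
            for a in entry.get("attachments", [])
        ],
        "date": entry["dateFeedback"],
    }

def read_feedback(path="feedback.jl"):
    """Read feedback.jl (JSON Lines, one comment per line)."""
    with open(path, encoding="utf-8") as f:
        return [extract(json.loads(line)) for line in f if line.strip()]
```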

What I didn't analyse but is still useful to know:

  • who submitted the feedback (either organization or firstName and surname)
  • the publication type (publication: ANONYMOUS or WITHINFO), indicating whether those fields actually contain useful info
  • the language (language)


The feedback I analysed is stored in feedback_annot.jl. Each entry looks like this:

  {
    "id": 3258868,
    "analysis": {
      "infavor": "yes",
      "marked": false,
      "needtranslation": true,
      "ineffective": false,
      "expansion": false,
      "accuracy": false,
      "privacy": false,
      "alternative": false,
      "notes": ""
    }
  }

It consists of an id (of the feedback that was analyzed) and the analysis results.
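
Since both feedback.jl and feedback_annot.jl key their entries by id, pairing an annotation with the comment it analysed is a dictionary lookup; a sketch (the join helper is my own, not part of the repository):

```python
import json

def index_by_id(lines):
    """Index JSON Lines records by their "id" field."""
    return {rec["id"]: rec for rec in map(json.loads, lines)}

def join_annotations(feedback_lines, annot_lines):
    """Pair each annotation's analysis with the feedback entry it refers to."""
    feedback = index_by_id(feedback_lines)
    return [
        (feedback[annot["id"]], annot["analysis"])
        for annot in map(json.loads, annot_lines)
        if annot["id"] in feedback
    ]
```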

The analysis fields have the following values:

  • infavor:
    • "yes" if a comment was clearly in favor of the proposal
    • "no" if a comment was clearly against the proposal
    • "unclear" if the position was unclear
    • "excluded" in one case to exclude a second feedback entry (3315318, posted by the same author as 3315314) from the Sankey diagram.
  • marked: If I found a feedback especially interesting, I marked it so I could find it again later
  • needtranslation: Set to true if I needed to translate something. In that case, I copied the comment into DeepL to read it
  • notes: My own notes about that feedback, often empty, sometimes summaries of the comment

The following properties are only reliably set if infavor == "no" (I sometimes added them in other cases, but not always, so be careful here):

  • ineffective: true if the commenter notes that the proposal will not help reduce child abuse or calls it disproportionate
  • expansion: true if the commenter fears an expansion of the proposal to scan for more things or is concerned about third parties abusing the scanning infrastructure
  • accuracy: true if the commenter noted issues with the accuracy of the technologies used for detection
  • privacy: true if the commenter writes that the proposal violates their privacy rights, calls the proposal mass surveillance, or fears minor breaches of privacy (while assuming good-faith use)
  • alternative: true if the commenter suggested alternatives, usually summarized/listed in the notes

Note: These criteria are somewhat vague and serve as a broad summary of the different common concerns I've found. On some edge cases you may disagree with the results (which is one reason I'm publishing them, so you can check yourself).
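
Tallying how often each concern appears can be sketched as follows, counting only comments marked clearly against the proposal, since that is where the flags are reliable (the helper is mine, not part of the repository):

```python
from collections import Counter

CONCERNS = ("ineffective", "expansion", "accuracy", "privacy", "alternative")

def tally_concerns(analyses):
    """Count each concern flag among comments clearly against the proposal."""
    counts = Counter()
    for analysis in analyses:
        if analysis.get("infavor") != "no":
            continue  # flags are not reliably set otherwise
        for key in CONCERNS:
            if analysis.get(key):
                counts[key] += 1
    return counts
```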