Using the software
Installation
This software is based on python3 and scrapy. If you have python3 and pip installed, running
pip3 install -r requirements.txt
should install the required dependencies.
Downloading the data
The data is already in this repository, in feedback.jl, so you can skip this step. If you want to download it again, run either
make download
or
scrapy runspider eufeedback.py -o feedback.jl
The Makefile just makes a backup before invoking the download command.
Using the web interface
You can then run
python3 server.py
to start the server, after which you'll be able to visit http://127.0.0.1:5000/index.html to load one comment. Use the buttons to navigate. Access keys are noted in brackets; how to use them depends on your browser (with Firefox: Alt+Shift+the noted key).
If you modified something, don't forget to save!
Generating plots
Simply running
make
will generate the four images I used in my blog post. You can also plot them separately:
- issues.svg (`python3 plot_criticisms.py`)
- sankey.svg (`python3 sankey.py`)
- bydate.svg (`python3 plot_date.py`)
- bynationality.svg (`python3 plot_bynationality.py`)
The underlying data
Feedback
The feedback is directly downloaded from the API. One comment looks like this:
{
  "language": "EN",
  "id": 3338612,
  "country": "USA",
  "organization": "Wikimedia Foundation",
  "surname": "",
  "feedback": "Please see attached document for the feedback of the Wikimedia Foundation.",
  "status": "PUBLISHED",
  "firstName": "",
  "attachments": [
    {
      "id": 27511731,
      "size": 117469,
      "documentId": "090166e5f12d1a4e",
      "isExternalizedInHrs": true,
      "ersFileName": "Wikimedia Foundation Feedback.pdf",
      "pdfSize": 120720,
      "isRendered": true,
      "pages": 3,
      "_links": {
        "self": {
          "href": "https://www.cc.cec/info/law/better-regulation/brpapi/api/feedbackAttachment/27511731{?projection}",
          "templated": true
        },
        "commonFileContent": {
          "href": "https://www.cc.cec/info/law/better-regulation/brpapi/api/feedbackAttachment/27511731/commonFileContent"
        },
        "feedback": {
          "href": "https://www.cc.cec/info/law/better-regulation/brpapi/api/feedbackAttachment/27511731/feedback{?projection}",
          "templated": true
        }
      }
    }
  ],
  "dateFeedback": "2022/09/12 23:58:37",
  "publication": "ANONYMOUS",
  "userType": "NGO",
  "companySize": "LARGE",
  "tr_number": "596597913132-95",
  "historyEventOccurs": false,
  "isMyFeedback": false,
  "referenceInitiative": "COM(2022)209",
  "publicationId": 30786148,
  "publicationStatus": "CLOSED",
  "_links": {
    "self": {
      "href": "https://www.cc.cec/info/law/better-regulation/brpapi/api/feedback/3338612{?projection}",
      "templated": true
    },
    "initiative": {
      "href": "https://www.cc.cec/info/law/better-regulation/brpapi/api/feedback/3338612/initiative{?projection}",
      "templated": true
    },
    "attachments": {
      "href": "https://www.cc.cec/info/law/better-regulation/brpapi/api/feedback/3338612/attachments{?projection}",
      "templated": true
    },
    "campaign": {
      "href": "https://www.cc.cec/info/law/better-regulation/brpapi/api/feedback/3338612/campaign{?projection}",
      "templated": true
    },
    "account": {
      "href": "https://www.cc.cec/info/law/better-regulation/brpapi/api/feedback/3338612/account"
    },
    "reports": {
      "href": "https://www.cc.cec/info/law/better-regulation/brpapi/api/feedback/3338612/reports"
    }
  }
}
Important things I extracted during analysis include
- the unique id (`id`) which is used to identify a feedback
- the country (`country`)
- the actual feedback text (`feedback`)
- a list of attachments (`attachments`), each with
  - a unique `documentId`, which helps to find the document URL to download it: https://ec.europa.eu/info/law/better-regulation/api/download/{documentId}
  - the filename (`ersFileName`)
- when the feedback was submitted (`dateFeedback`)
- the `userType`. Dataset contains `ACADEMIC_RESEARCH_INSTITTUTION`, `BUSINESS_ASSOCIATION`, `COMPANY`, `EU_CITIZEN`, `NGO`, `NON_EU_CITIZEN`, `OTHER`, `PUBLIC_AUTHORITY`
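A minimal sketch of pulling those fields out of feedback.jl, assuming the file is in JSON Lines format (one feedback object per line, which is what scrapy writes for a `.jl` output). The helper names `parse_date`, `parse_feedback_line` and `load_feedback` are made up for this example and are not part of the repository; the date format is inferred from the sample entry above.

```python
import json
from datetime import datetime

# URL pattern quoted in the list above; {documentId} is filled in per attachment.
DOWNLOAD_URL = "https://ec.europa.eu/info/law/better-regulation/api/download/{documentId}"


def parse_date(value):
    """Parse a dateFeedback string such as "2022/09/12 23:58:37" (format inferred from the sample)."""
    return datetime.strptime(value, "%Y/%m/%d %H:%M:%S")


def parse_feedback_line(line):
    """Pick the fields listed above out of one JSON line."""
    entry = json.loads(line)
    return {
        "id": entry["id"],
        "country": entry.get("country"),
        "feedback": entry.get("feedback", ""),
        "dateFeedback": parse_date(entry["dateFeedback"]),
        "userType": entry.get("userType"),
        "attachments": [
            {
                "filename": att.get("ersFileName"),
                "url": DOWNLOAD_URL.format(documentId=att["documentId"]),
            }
            # Treat a missing or null attachment list as empty.
            for att in (entry.get("attachments") or [])
        ],
    }


def load_feedback(path="feedback.jl"):
    """Read a JSON Lines file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_feedback_line(line) for line in f if line.strip()]
```

Calling `load_feedback()` from the repository root would then return one dictionary per comment, with a ready-made download URL for each attachment.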
What I didn't analyse but is still useful to know:
- Who submitted this (either `organization` or `firstName` and `surname`)
- publication type (`publication`, either `ANONYMOUS` or `WITHINFO`) indicating if the fields actually contain useful info
- the language (`language`)
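A small sketch of turning the submitter fields into a display name. It deliberately looks at which fields are non-empty instead of relying on `publication`, since the sample entry above is marked `ANONYMOUS` yet still carries an organization. The function name `submitter_name` is hypothetical.

```python
def submitter_name(entry):
    """Return the organization, else "firstName surname", else None."""
    organization = (entry.get("organization") or "").strip()
    if organization:
        return organization
    # Join whichever name parts are present and non-blank.
    parts = [
        part.strip()
        for part in (entry.get("firstName"), entry.get("surname"))
        if part and part.strip()
    ]
    return " ".join(parts) or None
```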
Annotations
The feedback I analysed is stored in feedback_annot.jl.
Each entry looks like this:
{
  "id": 3258868,
  "analysis": {
    "infavor": "yes",
    "marked": false,
    "needtranslation": true,
    "ineffective": false,
    "expansion": false,
    "accuracy": false,
    "privacy": false,
    "alternative": false,
    "notes": ""
  }
}
It consists of an id (of the feedback that was analyzed) and the analysis results.
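Because the annotation only repeats the feedback's id, combining both files is a lookup by that key. A minimal sketch, assuming both files have already been parsed into lists of dictionaries shaped like the examples above; `join_by_id` is a hypothetical helper, not a function from this repository.

```python
def join_by_id(feedback_entries, annotation_entries):
    """Attach each annotation's "analysis" to the feedback entry with the same id."""
    analysis_by_id = {a["id"]: a["analysis"] for a in annotation_entries}
    return [
        {**fb, "analysis": analysis_by_id[fb["id"]]}
        for fb in feedback_entries
        # Feedback without an annotation is skipped rather than given a default.
        if fb["id"] in analysis_by_id
    ]
```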
The analysis fields have the following values:
- infavor:
  - "yes" if a comment was clearly in favor of the proposal
  - "no" if a comment was clearly against the proposal
  - "unclear" if the position was unclear
  - "excluded" in one case, to exclude a second feedback entry (3315318, posted by the same author as 3315314) from the sankey diagram.
- marked: If I found a feedback especially interesting, I marked it to find it again later
- needtranslation: Set to true if I needed to translate something. In that case, I copied the comment into DeepL to read it
- notes: My own notes about that feedback, often empty, sometimes summaries of the comment
The following properties are only guaranteed to be set if infavor == "yes" (I sometimes added them in other cases, but not always, so be careful here)
- ineffective: true if the commenter notes that the proposal will not help reduce child abuse or calls it disproportionate
- expansion: true if the commenter fears an expansion of the proposal to scan for more things or is concerned about third parties abusing the scanning infrastructure
- accuracy: true if the commenter noted issues with the accuracy of the technologies used for detection
- privacy: true if the commenter writes that the proposal violated their privacy rights, called the proposal mass surveillance or fears minor breaches of privacy (while assuming good faith use)
- alternative: true if the commenter suggested alternatives. Usually summarized/listed in the notes

Note: These criteria are somewhat vague and serve as a broad summary of the different common concerns I've found. On some edge cases you may disagree with the results (which is one reason I'm publishing them, so you can check yourself).
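A minimal sketch of tallying these annotations, assuming feedback_annot.jl is JSON Lines with one entry per line (inferred from the `.jl` extension). The names `load_annotations` and `summarize` are hypothetical, and this is not the code behind the published plots. The concern flags are only counted for entries whose infavor value matches the one passed in, to respect the caveat above about when they are reliably set.

```python
import json
from collections import Counter

# The five concern flags described above.
CONCERNS = ("ineffective", "expansion", "accuracy", "privacy", "alternative")


def load_annotations(path="feedback_annot.jl"):
    """Read one {"id": ..., "analysis": {...}} object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize(entries, reliable_infavor):
    """Count infavor values overall, and concern flags for one infavor value only."""
    positions = Counter(e["analysis"]["infavor"] for e in entries)
    concerns = Counter()
    for e in entries:
        analysis = e["analysis"]
        if analysis["infavor"] != reliable_infavor:
            continue
        for name in CONCERNS:
            # A missing flag is treated the same as false.
            if analysis.get(name) is True:
                concerns[name] += 1
    return positions, concerns
```

For example, `summarize(load_annotations(), "yes")` applies the condition as stated in the text; pass a different value if you want to inspect the flags set in other cases.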