Draft for a text-based accessible Captcha Service. This is targeted at users who already know they want to use captchas over manual confirmation.
When evaluating existing captcha services, we see all or some of the following problems:
- Proprietary: Providing useful captcha systems is usually a for-profit business, and many services, even free software ones, integrate with solutions like reCaptcha, hCaptcha or similar. This often means they are offered as SaaS-S, cannot be self-hosted and are difficult to operate in a privacy-friendly way.
- Inaccessible: While free software solutions exist, they are often started as tiny side projects and included in other solutions, like a default captcha of a web framework. They often just implement a "minimum", and are thus inaccessible to users who cannot solve visual captchas.
- Easy: Many captcha solutions are too easy to crack. Merely distorting some letters and numbers is nowadays easily defeated by bots. After we recently switched the captcha on Codeberg to a more standard one, spam registrations rose to a multiple of their previous level due to automatic bot solving.
We know of the following solutions (feel free to suggest any other):
- FriendlyCaptcha: European and allegedly privacy-friendly solution, probably better than reCaptcha, hCaptcha and the like. There is a simple open-source server available.
- mCaptcha: another proof-of-work (PoW) captcha system, but libre. I'm personally sceptical of PoW, but let's look closer.
- VisualCaptcha (GitHub): open source and self-hosted, but no longer developed or maintained. Needs more evaluation, but likely requires some work for integration and modernization.
- OpenCaptcha: open source and self-hosted. Uses an interesting approach with graphs and is worth further investigation.
- Honeypots (Upstream discussion): Probably difficult to get right, and they do not reduce manual spam registration. Still worth further consideration.
- Captcha customizations: Curiously, in our experience even small modifications to the captcha seem to keep generic bots away (but probably won't help against targeted attacks). Ideas include inverting, rotating or mirroring the image and reverting this via CSS, or splitting it into parts or layers and recombining them client-side. This does not improve accessibility, but could act as a short-term measure.
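The mirroring idea from the last item could look roughly like the sketch below. It is only an illustration under assumptions: the image is represented as a plain list of pixel rows instead of a real image format, and the helper names `mirror_rows` and `img_tag` are hypothetical. The server would serve the flipped image, and the CSS property `transform: scaleX(-1)` flips it back in the browser.

```python
import html


def mirror_rows(pixels):
    """Flip a raster image horizontally; `pixels` is a list of pixel rows."""
    return [list(reversed(row)) for row in pixels]


def img_tag(src, alt=""):
    """Build an <img> tag whose inline CSS mirrors the image back for display."""
    return (
        f'<img src="{html.escape(src)}" alt="{html.escape(alt)}" '
        'style="transform: scaleX(-1)">'
    )


# Toy 2x3 "image": the server stores `original` but serves `served`.
original = [[1, 2, 3], [4, 5, 6]]
served = mirror_rows(original)
print(served)
print(img_tag("/captcha.png", alt="captcha"))
```

A generic bot that downloads the image file sees the mirrored version, while a browser applying the style shows the readable one. A targeted attacker can of course apply the same flip.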
In case we don't find a viable solution for Codeberg, we are considering building a simple system of our own:
- Text-based: The system focuses on being text-based with a simple question-answer system. This makes a single solution accessible both to sighted users and to users with impaired vision. In theory, it would also allow using captchas in text-only systems such as chat or SSH (e.g. asking for a captcha upon creating a repo via Git push options).
- Free and self-hosted: Of course, a development by Codeberg will be Free Software with the possibility to self-host. The system should be easy to set up and lightweight. It should also be suited for users who want to protect the contact form on their personal blog etc.
- Crowdsourced: Having only predefined text questions means the system becomes insecure as soon as attackers start mapping them to the right answers. Thus, we want to rely on users to contribute new questions, e.g. by politely asking them to make up their own question after they have successfully solved a captcha.
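To make the text-based question-answer idea more concrete, here is a minimal sketch of how a multiple-choice challenge might be rendered as plain text and checked. Nothing here is an existing implementation: the `Question` class, `render_challenge` and `check_reply` are hypothetical names, and the example question is made up.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    correct: str
    wrong: tuple  # the incorrect answers offered alongside the correct one


def render_challenge(question, rng):
    """Return (plain-text prompt, 1-based number of the correct option)."""
    options = [question.correct, *question.wrong]
    rng.shuffle(options)  # randomize the position of the correct answer
    lines = [question.text]
    lines += [f"  {i}) {opt}" for i, opt in enumerate(options, start=1)]
    return "\n".join(lines), options.index(question.correct) + 1


def check_reply(reply, expected):
    """Accept the reply only if it is exactly the expected option number."""
    return reply.strip() == str(expected)


q = Question("Which of these is an animal?", "cat", ("chair", "cloud", "spoon"))
prompt, expected = render_challenge(q, random.Random())
print(prompt)
```

Because the prompt is plain text, the same challenge could be shown in a web form, a chat message or a terminal. In a real service the expected option number would have to stay on the server, for example keyed by a challenge ID, and never be sent to the client.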
Details & Roadmap
The system should consist of the following features. Numbers in brackets represent prioritization (iteration of implementation).
- General / UI / Basic features:
- (1) Multiple choice questions for easy answering and without the need to heavily parse user input
- (2) allow users to propose new questions by writing a question and four answers
- allow recording and displaying questions in multiple languages
- (1) define hosts or access tokens which are allowed to use the API
- (1) configure difficulty of questions (e.g. how many to display)
- (1) integrate with Gitea and possibly other services, e.g. by implementing an API which is close or similar to proprietary solutions
- (2) record questions in db with correct answer and
- (2) collect stats about questions (solves + success / fail ratio)
- (3) automatically drop questions that have been answered often (so that an attacker does not learn most of the questions over time)
- (3) drop questions that were too hard to solve (and thus eliminate the attack vector of entering garbage into the question proposal to brick the system)
- weight question difficulty (via stats or via heuristic) to better match the configured difficulty (e.g. two hard or three simple questions)
- Future considerations:
- Proxy service to allow websites which do not want to maintain their own dataset to share one with a hosted service, without sharing more information than necessary
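The statistics and pruning items from the roadmap could be sketched as follows. This is only one possible reading of them: the class `QuestionStats`, the function `should_drop` and all three thresholds are assumptions chosen for illustration, not decided values.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would need tuning.
MAX_ATTEMPTS = 500       # retire a question once it has been answered this often
MIN_ATTEMPTS = 20        # do not judge difficulty on fewer attempts than this
MIN_SUCCESS_RATIO = 0.3  # below this, the question counts as too hard


@dataclass
class QuestionStats:
    solved: int = 0
    failed: int = 0

    def record(self, success):
        """Count one answered challenge for this question."""
        if success:
            self.solved += 1
        else:
            self.failed += 1

    @property
    def attempts(self):
        return self.solved + self.failed

    @property
    def success_ratio(self):
        """Share of correct answers, or None if nobody has answered yet."""
        return self.solved / self.attempts if self.attempts else None


def should_drop(stats):
    """Decide whether a question should be removed from rotation."""
    if stats.attempts >= MAX_ATTEMPTS:
        return True  # answered often: attackers may have learned it
    if stats.attempts >= MIN_ATTEMPTS and stats.success_ratio < MIN_SUCCESS_RATIO:
        return True  # too hard, or garbage submitted as a proposal
    return False


print(should_drop(QuestionStats(solved=2, failed=18)))
```

The same success ratio could also feed the difficulty weighting mentioned above, for example by treating questions with a lower ratio as harder.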
This is currently only an idea of our Infrastructure Team. If you like the idea, we are happy about contributions to this project. Feel free to reach out to us either by opening an issue, joining our Matrix Chats, or by directly contacting the maintainers of this project: