CAProck is a distributed object capability system.
Object capabilities (OCAPs) are unforgeable tokens that contain information on an object, and access rights associated with it. An OCAP system provides the primitives necessary to create, validate and transmit such tokens.
You may know something similar: JSON Web Tokens (JWT). The difference between JWT and OCAP is that a JWT makes a claim about a subject, such as "The bearer of this token is allowed in". This claim is verified by a signature that can be validated. In OCAP, tokens are primarily claims about objects, such as "The bearer of this token may do X with Y".
The 1989 paper "A Secure Identity-Based Capability System" by Li Gong outlines ICAP, a version of OCAP aimed at networked systems. CAProck has a similar purpose, and shares some design fundamentals.
The main difference between ICAP and CAProck is that the latter takes advantage of the capabilities (pun intended) of modern computers and the intervening years of research into cryptography and information security.
The name CAProck is a word play on capabilities. A caprock is a harder rock layer capping softer rock underneath, and partially protecting it from weathering. In a similar sense, an object capability system is a hardening layer for distributed object access. It works.
I have partially capitalised CAProck here. It may be good for distinction from the rock formation. It is harder to type, though. Your call.
In lieu of a formal specification, a series of articles on distributed authorization serves as a design rationale for the time being. This will need to change.
However, a token consists of the following elements:
- An issuer id. This identifies the key with which to verify the signature.
- A token type, with possible grant or revoke values.
- A strictly increasing sequence number. This helps disambiguate conflicting tokens.
- A scope field. For now, it refers to:
  - A tuple of from and until time stamps. These identify the nominal time period in which the token applies.
  - An expiry policy, with possible local or issuer values. This specifies whether the above time stamps are to be strictly evaluated as the issuer specified them, or the agent may process them according to local policy.
- A claims field. This is a list of semantic triples, each consisting of:
  - A subject, which may be a public key (hash) identifying a user, a system-defined group identifier, or a wildcard indicating "any subject".
  - A predicate, which is a system-defined string or a wildcard indicating "any predicate".
  - An object, which is a unique identifier of the object for which authorization is managed, or a wildcard indicating "any object owned by the issuer".
- Finally, a signature over all of the above made with the issuer private key.
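The element list above can be sketched as a plain data structure. This is only an illustration and not the CAProck library's API: all type and field names are hypothetical, time stamps are assumed to be seconds since the epoch, the bounds are assumed to be inclusive, and the signature is carried but not verified.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class TokenType(Enum):
    GRANT = "grant"
    REVOKE = "revoke"


class ExpiryPolicy(Enum):
    ISSUER = "issuer"  # evaluate the time stamps strictly as issued
    LOCAL = "local"    # the agent may apply its own local policy


@dataclass(frozen=True)
class Claim:
    subject: bytes    # identifier of a key or group, or a wildcard
    predicate: str    # system-defined string, or a wildcard
    obj: bytes        # identifier of the object, or a wildcard


@dataclass(frozen=True)
class Scope:
    valid_from: int   # assumed unit: seconds since the epoch
    valid_until: int
    expiry_policy: ExpiryPolicy


@dataclass(frozen=True)
class Token:
    issuer_id: bytes      # identifies the key used to verify the signature
    token_type: TokenType
    sequence_no: int      # strictly increasing per issuer
    scope: Scope
    claims: List[Claim]
    signature: bytes      # over all of the above; not verified in this sketch


def in_nominal_period(token: Token, now: int) -> bool:
    """Return True if `now` lies within the from/until time stamps."""
    return token.scope.valid_from <= now <= token.scope.valid_until
```

How an agent treats a token outside its nominal period under the local expiry policy is left out on purpose, since that is a matter of local policy.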
Crucially, this is enough that some party connecting to a service can present this information, and the service can authorize or reject the request without (necessarily) querying any other party. That is the fundamental property by which distribution is achieved: non-reliance on other, more centralized parties.
Relationship to Authentication
As we all know, AAA stands for authentication, authorization and accounting. We assume some public/private key based authentication scheme here, by which it can be verified that a party making a request is in possession of some private key. If the corresponding public key then is referenced in a token, authorization can proceed based on the token contents.
CAProck itself is not involved in this (though that may possibly change). Furthermore, it does not care whether, besides the above verification, other data about the party is verified, such as metadata in X.509 certificates. That is an application-defined concern.
Most authentication protocols involve a third party, an authentication service. Public key cryptography does not, strictly speaking, unless your application concerns require it. Some authentication protocols do not even require both communication parties to be active and reachable at the same time, such as Signal's X3DH.
CAProck's approach here is "do what you need, but when you've done it, you can send, receive and validate tokens whether any third party is currently available or not".
Relationship with OAuth
OAuth is a decentralized authorization framework, and therefore addresses much the same issues as caprock. It also distributes tokens, which may or may not be encoded in JWT. The OAuth protocol verifies such tokens by sending them to a central service, however.
Relationship with WebAuthn
WebAuthn is an authentication scheme involving hardware tokens for Two-Factor Authentication. It is orthogonal/complementary to authorization schemes.
Relationship with OpenID (Connect)
OpenID Connect is a layer built on top of OAuth 2.0 to provide identity services. It helps strengthen authentication, a problem that is orthogonal/complementary to authorization.
Relationship with SAML
Security Assertion Markup Language (SAML) is both a way to encapsulate authentication and authorization data that differs from caprock tokens, and a set of protocols for validating certain assertions. These protocols always rely on central services, much like OAuth.
Relationship with Encoding for Robust Immutable Storage (ERIS)
The ERIS project does not have much relationship to CAProck directly, but also contains a design for tokens for a capability-based system. The main difference here is that ERIS defines only one capability that permits reading a resource, and does so by encoding part of a cryptographic key into the token.
Intermediate peers either know the other part of the key, and are thus part of the authorized group for the resource, or not. By contrast, CAProck tokens transport enough information for intermediate peers to participate in managing access to a resource.
The similarity in both approaches is that key exchange -- that is, of the part of the key not contained in the token -- is not part of the use case, and must occur via other channels. However, CAProck should eventually implement this as well.
Use Case Architecture
Big title. Use cases for distributed authorization all follow the same rough pattern, so instead of listing tons of individual use cases, let's examine the rough architecture. It's been outlined in the blog posts (link above), but it's worth recapitulating here. It helps to start with a cast of characters, some of which are known from various other cryptographic protocols:
- Alice (standard cast) is the protagonist, owner of a file, and initiator of actions.
- Bob (standard cast) is another author contributing to the same file.
- Dave (standard cast) in this story becomes less generic and is a data server.
- Eve (standard cast) is an eavesdropper.
- Prilidiano, he who remembers things of the past, is a networked printer or print server. This is a stand-in for any other service that may wish to do something with a document, but isn't exactly representing any particular user. Prilidiano is pretty much the one character here which strays from a generic use case architecture to a specific use case, but I like how he illustrates things.
- Ted (standard cast) is a trusted arbitrator, in this case just a communications intermediary. Ted uses the pronouns she/her, for reasons to do with the blogging process. Deal with it.
The fundamental operation is for Bob to request something of Dave or Prilidiano that concerns a resource he doesn't own. In order for the other to process this request, he presents a CAProck token with it.
Alice is in possession of a document. She wants to collaborate on this with Bob. But they live in different time zones, so they're rarely awake at the same time. In order to facilitate this, Alice sends the file to Dave to keep. Alice then goes to sleep.
Bob wakes up, and wants to get to work. He must contact Dave to get the file, but Dave is a suspicious fellow. He wants to know that Bob is actually allowed to retrieve it.
In current software architectures, this is generally solved by Dave keeping a record of Bob that tells him that Alice permitted him to retrieve the file.
But wait! Alice and Bob's collaboration started after Alice sent the file to Dave. Following the same classic architecture, this is a problem, solvable only by Alice informing Dave that this should indeed happen. Alice has a lot on her plate, though, and forgets this, so now Bob has a problem.
Solution: What if "Let's collaborate" was an actual authorization token that Alice sent to Bob? What if Bob could present this to Dave, Dave could validate that it's indeed from Alice, and just proceed without waking her up or delaying the work? That's what caprock tokens permit.
The problems keep piling up, though. Because Bob's collaboration job was to make some production edits to the document and then have it printed, he needs to have his results accessed by Prilidiano. Dave is fine with accepting Bob's changes, because that's what Alice's authorization token told him. But Prilidiano is a friend of Bob's, nobody Alice would know. Dave finds that guy incredibly suspicious and refuses to hand out the modified document.
Solution: What if a "let's collaborate" token could also include information such as "by the way, you're allowed to share this read-only"? Again, caprock tokens permit this.
Now Bob has no problem with passing on a token to Prilidiano, who will then present it to Dave as his permission slip for the file. Perfect!
Oh no! Prilidiano doesn't actually know how to reach Dave! But he knows Ted, who can relay messages to Dave. Dave finds it particularly suspicious that Ted seemingly impersonates Prilidiano, and soundly rejects her communications attempts.
Solution: What if Ted could prove to Dave that she's trustworthy, because Prilidiano is and said so? That's also going to be possible with caprock tokens.
The fundamental nature of global cooperation is that not every party collaborating is going to be available for direct communications all the time. While public key cryptography allows us to authenticate any user, at least to the point of proving that they're in possession of a secret they claim to know, authorization schemes rely on central arbitrators at this time.
This does not very well reflect how people actually collaborate. In the real world, if I had to ask for permission to give my neighbour an apple from my garden, that would be weird. Why should we model the digital realm in this way, then? That, in a nutshell, is what caprock's approach to authorization tokens tries to solve.
As outlined above, a claim consists of a subject, predicate and object triplet. While CAProck does not care too much about what each of these represents to the application, it places a constraint on the subject and object fields: these must be identifiers.
Identifiers are cryptographic hashes over some identification data, such as public keys, and are preceded by a Byte that signals the start of an identifier. In version 1, this is a `0x10` Byte, and identifiers are SHA3-512 hashes. CAProck is not limited to just these identifiers, though.
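As a minimal sketch of the version 1 layout described above, an identifier can be built by hashing the identification data with SHA3-512 and prepending the `0x10` tag Byte. The function and constant names are hypothetical and not taken from the CAProck library, and the input is assumed to be the raw bytes of, for example, a public key.

```python
import hashlib

IDENTIFIER_TAG_V1 = 0x10  # from the text: version 1 identifiers start with 0x10


def make_identifier(identification_data: bytes) -> bytes:
    """Return the tag Byte followed by the SHA3-512 digest of the input."""
    digest = hashlib.sha3_512(identification_data).digest()
    return bytes([IDENTIFIER_TAG_V1]) + digest
```

The result is always 65 Bytes long: one tag Byte plus the 64 Byte digest.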
The predicate is a freeform string. As discussed in the related blog posts, such strings represent core attributes, such as whether a resource is readable or writable.
It's very likely that a full specification for core attributes needs to evolve, but in lieu of that, we'll define the following:
- Predicates shall be normalized UTF-8 strings. The ICU library provides a good introduction to normalizing Unicode. If this seems complicated to you, stick to ASCII - these are, after all, not strings meant to be read by ordinary users.
- Predicates shall be namespaced in reverse domain name notation. Since we've just specified UTF-8 as the basic encoding, and DNS does not permit all of Unicode's characters except via special encodings such as Punycode, we'll have to be more specific:
  - The string shall be subdivided into labels. Each label can use any character except those from the reserved set.
  - Empty labels are not permitted. This includes a leading or trailing empty label, that is, a predicate that starts or ends with a dot.
  - The reserved set of characters is the dot (`.`, U+002E), which serves as a label separator, and the colon (`:`, U+003A), which serves as a compression prefix. Finally, the wildcard token (`*`, U+002A) has special meaning (see below).
- Predicates in the `io.interpeer.` namespace are reserved.
- The colon prefix `:` is to be treated as an abbreviation for the reserved namespace `io.interpeer.caprock.`. That is, `io.interpeer.caprock.foo` is semantically equivalent to `:foo`. This is because it is expected that core attributes are used extensively, and compressing their namespace reduces bandwidth, compute time and power, etc.
- The `io.interpeer.caprock.core` namespace (`:core`) contains core attributes.
The CAProck library itself does not currently enforce predicate formats (#14), but is expected to do so in the near future.
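Until the library enforces these rules, an application could check predicates itself. The following is a rough sketch under stated assumptions, not the library's behaviour: the function names are hypothetical, and NFC is assumed as the normalization form because the text does not name one.

```python
import unicodedata

COMPRESSED_PREFIX = ":"
CAPROCK_NAMESPACE = "io.interpeer.caprock."


def expand_predicate(predicate: str) -> str:
    """Expand a leading colon to the reserved CAProck namespace."""
    if predicate.startswith(COMPRESSED_PREFIX):
        return CAPROCK_NAMESPACE + predicate[len(COMPRESSED_PREFIX):]
    return predicate


def normalize_predicate(predicate: str) -> str:
    """Normalize, expand and validate a predicate, or raise ValueError."""
    # Assumption: NFC normalization; the text only says "normalized".
    canonical = expand_predicate(unicodedata.normalize("NFC", predicate))
    for label in canonical.split("."):
        if label == "":
            raise ValueError("empty labels are not permitted")
        if ":" in label:
            raise ValueError("the colon is reserved as a compression prefix")
    return canonical
```

With this, `normalize_predicate(":foo")` and `normalize_predicate("io.interpeer.caprock.foo")` yield the same canonical string, which is what makes the two forms comparable.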
Subject, predicate and object can be represented by wildcards. There is, however, some difference in how wildcards should be processed.
Wildcards represent an opportunity both for more loosely defining claims and for compressing wire representations of claims. The wildcard token is the `*` character, as known from file globbing and similar applications. As identifiers are, however, cryptographic hashes with a single Byte prefix, it becomes difficult to establish how to encode a wildcard identifier. Should it be a hash over the `*` input? That would waste Bytes on the wire. Instead, we'll use a new identifier tag for wildcards.
Identifiers either match other identifiers precisely, or match a wildcard identifier. There is nothing more complicated than that. This applies for subjects as well as objects.
Note, however, that an object wildcard does not grant something to "all objects". Instead, this is scoped to the verifier. It may therefore be prudent to manage keys according to the resource scopes your application should provide.
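The matching rule for identifiers is simple enough to sketch in a few lines. Note the assumptions: the wildcard tag value below is a made-up placeholder, not the real one, and a wildcard identifier is assumed to consist of the tag Byte alone. The function names are hypothetical as well.

```python
# Hypothetical placeholder; the actual wildcard tag value is not given here.
WILDCARD_TAG = 0xFF


def is_wildcard(identifier: bytes) -> bool:
    """Assumption: a wildcard identifier is just the single tag Byte."""
    return len(identifier) == 1 and identifier[0] == WILDCARD_TAG


def identifiers_match(a: bytes, b: bytes) -> bool:
    """Identifiers match precisely, or one of them is a wildcard."""
    return is_wildcard(a) or is_wildcard(b) or a == b
```

The same function would serve for subjects and objects; the scoping of an object wildcard to the verifier is a separate check that this sketch does not cover.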
Wildcards in predicates should be handled differently (#15):
- A predicate label may either consist of only a single wildcard character, or non-wildcard characters. Mixes of both are not permitted.
- A wildcard label matches any label in that position within the predicate.
- Implementations SHOULD support matching the last label in a predicate via a wildcard.
- Implementations MAY support matching other, even multiple labels in a predicate via wildcards, e.g. `foo.*.bar.*` could match any of `foo.1.bar.1`, etc., but not a predicate whose `foo` or `bar` labels differ.
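The label rules above could be implemented roughly as follows. This is a sketch and not the library's matcher: the function name is hypothetical, and it assumes that a wildcard label matches exactly one label, so pattern and predicate must have the same number of labels.

```python
def predicate_matches(pattern: str, predicate: str) -> bool:
    """Match a predicate against a pattern, label by label."""
    pattern_labels = pattern.split(".")
    predicate_labels = predicate.split(".")

    # A label is either a lone wildcard or free of wildcard characters.
    for label in pattern_labels:
        if "*" in label and label != "*":
            raise ValueError("labels must not mix wildcard and other characters")

    # Assumption: one wildcard label stands for exactly one label.
    if len(pattern_labels) != len(predicate_labels):
        return False

    for want, have in zip(pattern_labels, predicate_labels):
        if want != "*" and want != have:
            return False
    return True
```

This covers both the trailing wildcard that implementations SHOULD support and the multi-wildcard case they MAY support.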
Core Attribute Predicates
At the time of writing, it is very unclear how core attributes are to be defined. What is clear is that such resource access modifiers as are used in computer systems can be subdivided into being able to read and modify a resource. For either fundamental operation, modifications can exist, such as append-only modifications or execution of a script.
For now, it seems that modelling these two access classes as core attributes yields the necessary, and potentially sufficient, set of core attributes.
|Canonical Name||Compressed Name||Description|
||The claim subject may read the claim object.|
||The claim subject may modify the claim object.|
Token serialization requires that each field be serialized. It is generally good practice to identify a field, as well as the entire token, so that stream decoding is easier. There is some effort in e.g. the multiformats/multicodec project to arrive at standardized representations, but for a number of reasons, we're using our own here.
- The varint format this relies on is unlike the one used in related interpeer projects.
- The multihash format, e.g. for specifying the issuer ID as a hash over a key, has drawbacks:
  - It requires that it is known that the issuer ID is in fact a hash produced by a specific digest algorithm. This knowledge serves no purpose to the library user.
  - On the other hand, it assigns no meaning of "issuer ID" to this self-describing format; the format just describes that it is a hash.
  - Even though digests of a particular algorithm are of a fixed length, it requires that the length is also encoded.
These differences aside, the notion of creating a "standard" table of self-describing microformats is sound, and adopted here. Whether this standard remains internal to caprock and compatible implementations or gains wider usage is of little relevance.
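To illustrate the idea of a tag table in which the tag alone implies both meaning and length, here is a small decoding sketch. Only the `0x10` tag for SHA3-512 identifiers comes from this document; the table layout, the field name string and the function name are assumptions for illustration and do not describe the actual wire format.

```python
# Tag -> (field name, payload length in Bytes). Only 0x10 is from the text.
FIELD_TABLE = {
    0x10: ("identifier-sha3-512", 64),
}


def decode_field(buffer: bytes):
    """Split one tagged field off the front of the buffer.

    Returns (field name, payload, remaining Bytes). No length is encoded
    on the wire, because the tag already implies it.
    """
    if not buffer:
        raise ValueError("empty buffer")
    tag = buffer[0]
    if tag not in FIELD_TABLE:
        raise ValueError(f"unknown tag 0x{tag:02x}")
    name, length = FIELD_TABLE[tag]
    end = 1 + length
    if len(buffer) < end:
        raise ValueError("truncated field")
    return name, buffer[1:end], buffer[end:]
```

Compared to a multihash, this saves the length Byte for every fixed-size field and lets a stream decoder know where the next field starts from the tag alone.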
(Not So) Frequently Asked Questions (FAQ)
Why use a new wire format when there are e.g. X.509 certificates, etc.?
Simple: a CAProck token is probably smaller. In the tests, they're about 300 Bytes each, which means it's possible to send them in a UDP packet with additional request metadata.
DTLS handshake fragmentation is a thing, because it exchanges large certificates. That introduces complexity we should not ask for.
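A back-of-the-envelope calculation shows how much room a token of that size leaves in a single datagram. The sketch assumes the IPv6 minimum link MTU of 1280 Bytes, a 40 Byte IPv6 header without extension headers and an 8 Byte UDP header; the 300 Byte token size is the rough figure quoted above.

```python
IPV6_MIN_MTU = 1280   # minimum link MTU every IPv6 path must support
IPV6_HEADER = 40      # fixed IPv6 header, no extension headers assumed
UDP_HEADER = 8
TOKEN_SIZE = 300      # approximate token size quoted above

# Payload that fits into one unfragmented datagram.
payload_budget = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER

# Space left for request metadata after one token.
remaining = payload_budget - TOKEN_SIZE

print(payload_budget, remaining)  # prints "1232 932"
```

Even on the most conservative path, a token plus a reasonable amount of request metadata stays within one packet.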
See the COPYING file.