Header-Based Patch Attestation
==============================
Author: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Status: Beta, soliciting comments
Preamble
--------
Projects participating in decentralized development continue to use
RFC-2822 (email) formatted messages for code submissions and review.
This remains the only widely accepted mechanism for code collaboration
that does not rely on centralized infrastructure maintained by a single
entity, which necessarily introduces a single point of dependency and
a single point of failure.
RFC-2822 formatted messages can be delivered via a variety of means. To
name a few of the more common ones:
- email
- usenet
- aggregated archives (e.g. public-inbox)
Among these, email remains the most widely used transport mechanism for
RFC-2822 messages, most commonly delivered via subscription-based
services (mailing lists).
Email and end-to-end attestation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are two commonly used standards for cryptographic email
attestation: PGP and S/MIME. When it comes to patches sent via email,
there are significant drawbacks to both:
- Mailing list software may modify email body contents to add
subscription information footers, causing message attestation to
fail.
- Attestation via detached MIME signatures may not be preserved by
mailing list software that aggressively quarantines attachments.
- Inline PGP attestation generally frustrates developers working with
patches due to extra surrounding content and the escaping it
performs for strings containing dashes at the start of the line for
canonicalization purposes.
- Only the body of the message is attested, leaving metadata such as
"From", "Subject", and "Date" open to tampering. Git uses this
metadata to formulate git commits, so leaving them unattested is
suboptimal (they can be duplicated into the body of the message,
but git format-patch will not do this by default).
- PGP key distribution and trust delegation remains a difficult
problem to solve. Even if PGP attestation is available, the
developer on the receiving end of the patches may not make any use
of it due to not having the sender's key in their keyring.
- S/MIME certificates are increasingly difficult to obtain for
developers not working in corporate environments. At the time of
writing, only two commercial CAs continue to provide this service --
and only one does it for free.
For these reasons, end-to-end attestation is rarely used in communities
that continue to use email as their main conduit for code submissions
and review.
Email and domain-level attestation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since unsolicited emails (SPAM) frequently forge headers in order to
appear to be coming from trusted sources, most major service providers
have adopted DKIM (RFC-6376) to provide cryptographic attestation for
header and body contents. A message that originates from gmail.com will
contain a "DKIM-Signature" header that attests the contents of the
following headers (among others):
- from
- date
- message-id
- subject
The "DKIM-Signature" header also includes a hash of the message body
(bh=) that is included in the final verification hash. When a DKIM
signature is successfully verified using a public key that is published
via gmail.com DNS records, this provides a degree of assurance that the
email message has not been modified since leaving gmail.com
infrastructure.
Just as with PGP and S/MIME attestation, DKIM has significant problems
when it comes to patches sent via mailing lists:
- ML software commonly modifies the subject header in order to insert
list identification (e.g. ``[some-topic]``). Since the "subject"
header is almost always included into the list of headers attested
by DKIM, this causes DKIM signatures to fail verification.
- ML software also routinely modifies the message body for the
purposes of stripping attachments or inserting list subscription
metadata. Since the bh= hash is included in the final signature
hash, this results in a failed DKIM signature check.
Even if all of the above does not apply and the DKIM signature is
successfully verified, body canonicalization routines mandated by the
DKIM RFC may result in a false-positive successful attestation for
patches. The "relaxed" body canonicalization collapses every run of
whitespace within a line into a single space, so for languages like
Python or GNU Make, where whitespace is syntactically significant,
patches containing different code may produce the same hash.
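The collision described above can be reproduced with a short sketch. It
is not part of the POC; it assumes the "relaxed" body rules from RFC-6376
section 3.4.4, and the names ``relaxed_body``, ``body_hash``, ``PATCH_A``
and ``PATCH_B`` are made up for illustration.

```python
import base64
import hashlib
import re


def relaxed_body(body: str) -> bytes:
    """Approximate DKIM "relaxed" body canonicalization."""
    lines = body.replace("\r\n", "\n").split("\n")
    # Collapse runs of spaces/tabs to one space, drop trailing whitespace.
    lines = [re.sub(r"[ \t]+", " ", line).rstrip(" ") for line in lines]
    # Ignore empty lines at the end of the body.
    while lines and lines[-1] == "":
        lines.pop()
    return "".join(line + "\r\n" for line in lines).encode("utf-8")


def body_hash(body: str) -> str:
    """Base64 SHA-256 of the canonicalized body, in the style of bh=."""
    digest = hashlib.sha256(relaxed_body(body)).digest()
    return base64.b64encode(digest).decode("ascii")


# Two hypothetical patch fragments: "total += x" sits inside the "if"
# in the first one and outside of it in the second one.
PATCH_A = (
    "+for x in xs:\n"
    "+    if x:\n"
    "+        n += 1\n"
    "+        total += x\n"
)
PATCH_B = (
    "+for x in xs:\n"
    "+    if x:\n"
    "+        n += 1\n"
    "+    total += x\n"
)

# Different code, identical hash after whitespace is collapsed.
print(PATCH_A != PATCH_B, body_hash(PATCH_A) == body_hash(PATCH_B))
```

Both fragments reduce to the same canonical bytes because only the
amount of leading whitespace differs between them.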
So, while DKIM works well enough for regular domain-level email
attestation, it still has significant drawbacks for attesting patches.
Similarly, it does not provide significant developer identity assurances
for patches sent via large public hosting services like Gmail, Fastmail,
or others -- at best, we have proof that the email traversed their
mail gateways (hopefully, after being properly authenticated).
Proposal
--------
The goal of this document is to propose a scheme that would provide
cryptographic attestation for all message contents necessary for trusted
distributed code collaboration. It draws on the success of the DKIM
standard in order to adapt (and adopt) it for this purpose.
X-Developer-Signature header
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We implement a compatible subset of DKIM (RFC-6376) for developer
attestation signatures, with some extra steps taken to make the workflow
fit better with patches sent via DKIM-non-compliant mailing lists.
Differences from DKIM:
- the d= field is not used (no domain signatures involved)
- the q= field is not used (end-user tooling handles key lookup)
- the c= field is not used (see below for canonicalization)
- the i= field is optional; when the sender's canonical email address
differs from the From: field, i= MUST be set to that canonical address
Canonicalization
~~~~~~~~~~~~~~~~
We use the "relaxed/simple" canonicalization as defined by the DKIM
standard, but the message is first parsed by "git-mailinfo" in order to
achieve the following:
- normalize any content-transfer-encoding modifications (convert back
from base64/quoted-printable/etc into 8-bit)
- use any encountered in-body git headers (From:, Subject:, Date:) to
rewrite the outer message headers
- perform any subject-line normalization in order to strip content not
considered by git-am when applying the patch
To achieve this, the message is passed through git-mailinfo with the
following flags::
cat orig.msg | git mailinfo --encoding=utf-8 m p > i
We then use the data found in "i" to replace the From:, Subject: and
Date: headers of the original message, and concatenate "m" and "p" back
together to form the body of the message, which is then normalized using
CRLF line endings and the DKIM "simple" body canonicalization (any
trailing blank lines are removed).
Any other headers included in signing are canonicalized using the
"relaxed" header canonicalization routines defined in the DKIM standard.
In other words, the body and some of the headers are normalized and
reconstituted using the "git-mailinfo" command, and then canonicalized
using DKIM's relaxed/simple standard.
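The last step can be sketched in a few lines of Python. This is an
illustration rather than the POC's actual code: it assumes the "simple"
body and "relaxed" header rules from RFC-6376 sections 3.4.3 and 3.4.2,
and the function names and sample strings are hypothetical.

```python
import re


def simple_body(body: str) -> bytes:
    """DKIM "simple" body canonicalization with CRLF line endings."""
    lines = body.replace("\r\n", "\n").split("\n")
    # Remove trailing blank lines; whitespace inside lines is kept as-is.
    while lines and lines[-1] == "":
        lines.pop()
    # An empty body canonicalizes to a single CRLF.
    return ("\r\n".join(lines) + "\r\n").encode("utf-8")


def relaxed_header(name: str, value: str) -> bytes:
    """DKIM "relaxed" header canonicalization."""
    # Unfold continuation lines, then collapse whitespace runs to one space.
    unfolded = re.sub(r"\r?\n", "", value)
    collapsed = re.sub(r"[ \t]+", " ", unfolded).strip(" ")
    # Lowercase the name and drop whitespace around the colon.
    return f"{name.strip().lower()}:{collapsed}\r\n".encode("utf-8")


# Stand-ins for the "m" and "p" files written by git-mailinfo.
m = "Fix the frobnicator\n\nSigned-off-by: Dev <dev@example.org>\n"
p = "---\n diff content\n\n\n"
canonical_body = simple_body(m + p)
canonical_subject = relaxed_header("Subject", " Fix  the\r\n\tfrobnicator ")
```

Unlike the "relaxed" body rules, the "simple" rules leave whitespace
inside each line untouched, which is what matters for patches.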
Algorithms
~~~~~~~~~~
The DKIM standard mostly relies on RSA signatures, though RFC 8463 extends
it to support ED25519 keys as well. Since our implementation is fully
backward compatible with the DKIM standard, it is possible to use any of
the DKIM-defined algorithms. However, for the purposes of this POC, we
only support the following two signing/hashing algorithms:
- ed25519-sha256: exactly as defined in RFC 8463
- openpgp-sha256: uses OpenPGP to create the signature
POC code
--------
The provided POC code in main.py is pretty feature-complete, though it
probably needs further improvements to properly deal with corner-cases.
You will notice that it's only a few hundred lines of Python code and
does not require any external libraries/programs except libsodium and
GnuPG for crypto, plus git for message canonicalization. All of these
are already likely to be present on a developer's workstation.
Running the code
~~~~~~~~~~~~~~~~
The POC code is written in Python and requires the PyNaCl library
in order to work. Chances are, PyNaCl is already installed on your
platform, but if it isn't, you can install it via a venv::
$ python3 -mvenv .venv
$ source .venv/bin/activate
$ pip install --upgrade pip
$ pip install -r requirements.txt
Or you can achieve the same using OS packaging::
# dnf install python3-pynacl
# apt install python3-nacl
You should also have git and gpg available as external commands in your
PATH.
ED25519 signatures
~~~~~~~~~~~~~~~~~~
ED25519 is the "nothing up my sleeve" implementation of Elliptic-Curve
Cryptography (ECC) favoured by free software enthusiasts. Its primary
benefits are algorithmic speed of all crypto operations and relative
smallness of both public/private keys and generated signatures.
To sign an email using a bundled ed25519 key, run::
$ ./main.py sign-ed25519 -k dev.key
SIGNING : ED25519 using dev.key
MSGSRC : emails/dev-unsigned.eml
--- SIGNED MESSAGE STARTS ---
[...]
X-Developer-Signature: v=1; a=ed25519-sha256; h=from:subject:date:message-id;
l=1003; bh=Pfwl/zDlAoe9nkYNQPcgDFscfSQdrGvx4kAzrnQdNQ8=;
b=WyAu9nzYMUg2ntOfnvEBpa1vLQemK7axjAVu+hhYh6VyeFmB5jKzC2TcF+2IOjfG3eGl/XNY0EWc
HUh2tF02AQwiKDVDG7mTmP1/SPpNvotD0mTWQk6LyltWKFBUpRhn
If you've ever seen email headers, you'll notice how very similar the
X-Developer-Signature is to the DKIM-Signature header.
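Because the header reuses the DKIM tag=value syntax, a verifier can
split it into fields with very little code. The sketch below is an
illustration, not the POC's parser; ``parse_signature_tags`` is a
hypothetical name, and the line folding in ``SAMPLE`` is an assumption
about how the header above is wrapped.

```python
def parse_signature_tags(value: str) -> dict:
    """Split a DKIM-style tag list ("k=v; k=v") into a dictionary."""
    tags = {}
    for part in value.split(";"):
        if "=" not in part:
            continue  # tolerate a trailing ";" or empty segments
        key, val = part.split("=", 1)
        # Folding whitespace inside values (notably b=) carries no meaning.
        tags[key.strip()] = "".join(val.split())
    return tags


SAMPLE = (
    "v=1; a=ed25519-sha256; h=from:subject:date:message-id;\n"
    " l=1003; bh=Pfwl/zDlAoe9nkYNQPcgDFscfSQdrGvx4kAzrnQdNQ8=;\n"
    " b=WyAu9nzYMUg2ntOfnvEBpa1vLQemK7axjAVu+hhYh6VyeFmB5jKzC2TcF+2IOjfG3eGl/XNY0EWc\n"
    " HUh2tF02AQwiKDVDG7mTmP1/SPpNvotD0mTWQk6LyltWKFBUpRhn"
)

tags = parse_signature_tags(SAMPLE)
signed_headers = tags["h"].split(":")
```

Splitting each part on the first "=" only keeps the base64 padding in
bh= and b= intact.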
OpenPGP signatures
~~~~~~~~~~~~~~~~~~
OpenPGP is not really an "algorithm," so this is merely an indicator
that the signature is created using an OpenPGP-compliant application.
Here it is in action, though you will need to use your own PGP key if
you want to try it::
$ ./main.py -m emails/mricon-unsigned.eml sign-pgp -k B6C41CE35664996C
SIGNING : PGP using B6C41CE35664996C
MSGSRC : emails/mricon-unsigned.eml
--- SIGNED MESSAGE STARTS ---
[...]
X-Developer-Signature: v=1; a=openpgp-sha256; h=from:subject:date:message-id;
l=1002; bh=g2Sv1ZR+jIrWukzdXbqb+aeiqyFQOBLDQY6z0BBnGg4=;
b=owGbwMvMwCG27YjM47CUmTmMp9WSGBK6vn316Z1bbjJ5DWNEgimHTc6Kx4HfTpzYcOzp9e/2jc/v
Lg7J7ChlYRDjYJAVU2Qp2xe7KajwoYdceo8pzBxWJpAhDFycAjCRBn5Ghrc/7otaV1yX6I4/sNf056
vmzjen3bn2Rk8X9GTuZd2/aQ0jw7fZJ2Pi36/X2fTK4cSnX/++nbAzsm0TObX4SpbBsrRHe/gA
OpenPGP supports ed25519 keys as well, so in reality the signature is
made with my own ed25519 subkey, but it is further wrapped in the
OpenPGP header data, which is why it is longer than the ed25519
signature in the example above. It is created using the following GnuPG
parameters::
gpg -s -u KEYID < binary-hash-to-sign
Distributing keys
-----------------
The difficult part of various PKI schemes is not really the
cryptography, but initial trust bootstrap and key distribution. In our
case, we sidestep trust bootstrap entirely and focus solely on developer
key distribution. We propose doing it via the git repository itself,
borrowing the idea from the people behind the did:git project.
Using git to track contributor keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider the workflow of a Linux kernel subsystem maintainer. While a
single maintainer may receive patches from hundreds of people, they will
likely have a fairly small subset of developers with whom they
collaborate on an ongoing basis. As their relationship trust builds, the
maintainer may wish to implement an attestation mechanism to verify that
patches submitted by trusted lieutenants are not corrupted or modified
by malicious actors en-route.
The proposed POC offers several ways of achieving this:
- tracking the keys in a regular development branch
- tracking the keys in a special dedicated branch
- tracking the keys in a dedicated git repository
Using the regular development branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Smaller projects with fewer contributors may simply choose to bundle
developer keys as part of their source code. The POC uses the toplevel
.keys directory for this purpose, with the following structure::
.keys
\- sigtype
\- domain
\- local
\- selector
So, for an ed25519 signature from dev@example.org, the public key needed
for signature verification would be contained in::
.keys
\- ed25519
\- example.org
\- dev
\- default
The "default" filename is used when there is no other s= selector
specified in the signature header.
NB: Since domain/local/selector values are taken from untrusted sources,
they should be urlencoded before attempting to locate the public key on
disk or via any commands passed to "git show".
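A minimal sketch of such a lookup is shown below. It is not the POC's
code, and ``key_path`` is a hypothetical helper. Note one assumption that
goes beyond the text: urlencoding leaves "." and ".." unchanged, so the
sketch also rejects those components outright.

```python
import posixpath
from urllib.parse import quote


def key_path(sigtype: str, identity: str, selector: str = "default",
             topdir: str = ".keys") -> str:
    """Map sigtype/identity/selector to a key location under topdir."""
    local, sep, domain = identity.rpartition("@")
    if not sep:
        raise ValueError("identity must look like local@domain")
    parts = []
    for raw in (sigtype, domain, local, selector):
        # safe="" makes sure "/" is encoded as well.
        part = quote(raw, safe="")
        # Extra precaution (assumption): dots survive urlencoding.
        if part in ("", ".", ".."):
            raise ValueError(f"unsafe path component: {raw!r}")
        parts.append(part)
    # Forward slashes work both on disk and in "git show REF:PATH".
    return posixpath.join(topdir, *parts)


default_key = key_path("ed25519", "dev@example.org")
```

The same string can be used as a relative path in a checkout or as the
path part of a ``git show`` object name.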
Using a dedicated ref
~~~~~~~~~~~~~~~~~~~~~
For a project the size of the Linux kernel, it would be too
onerous to track the keys of all contributors centrally, so individual
subsystem maintainers will likely want to track their own subsets of
keys from just the developers with whom they work on a regular basis.
Using the regular development branch would be too inconvenient in this
case, since it would interfere with upstream work, so it makes sense to
use a separate branch for this purpose, e.g. "refs/heads/keys" that
contains just the keys directory with no other content.
Participating contributors can then submit key additions and changes as
regular patches or pull requests and the maintainer merely needs to
remember to apply them to the proper key management branch.
Using a dedicated git repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Similarly, instead of using a dedicated branch, maintainers may choose
to use a wholly separate git repository for this purpose. This may be
useful if the same set of developers work on multiple projects.
Key formats for ED25519 and OpenPGP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The public keys should be in the following format:
- ed25519: base64-encoded string
- openpgp: any format that can be passed to "gpg --import", but
preferably an ascii-armored key export
In the case of verifying PGP signatures, the POC implementation will
create a temporary keyring containing just the imported key, so it
should never clash with the default keyring.
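For the ed25519 case, loading a key is mostly a matter of decoding and
sanity-checking it. The following is a sketch under the assumption that
the file holds nothing but the base64 string; ``load_ed25519_pubkey`` is
a hypothetical name, and the 32-byte check reflects the fixed size of
ed25519 public keys.

```python
import base64
import binascii


def load_ed25519_pubkey(text: str) -> bytes:
    """Decode a base64-encoded ed25519 public key and check its size."""
    try:
        raw = base64.b64decode(text.strip(), validate=True)
    except binascii.Error as exc:
        raise ValueError("public key is not valid base64") from exc
    if len(raw) != 32:
        raise ValueError(f"expected 32 bytes, got {len(raw)}")
    return raw


# A placeholder key made of zero bytes, used only to exercise the helper.
encoded = base64.b64encode(bytes(32)).decode("ascii")
pubkey = load_ed25519_pubkey(encoded + "\n")
```

The resulting bytes can then be handed to whichever ed25519 library
performs the actual verification.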
Using the default GnuPG keyring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is up to the implementation whether to fall back to the default GnuPG
keyring when checking openpgp signatures. The POC code will do so and
will additionally warn if the key has insufficient trust (this check is
meaningless for in-git bundled keys, so it is not performed).
Rotating and revoking keys
~~~~~~~~~~~~~~~~~~~~~~~~~~
Keys can be retired or replaced at any time by merely changing them in
the repository, committing, and pushing (or submitting a pull
request/patch to the maintainer with the change). Maintainers can then
pull the change or apply the patch and push it out to all other
participating co-maintainers.
Contributors can have multiple valid keys if they properly specify the
selector when adding signatures -- or the verification tooling can
simply iterate through all keys listed in the directory for that
domain/local to find the matching one.
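The second approach could look roughly like the sketch below. It is an
illustration only: ``find_matching_key`` is a hypothetical helper, and
``verify`` stands in for whatever routine checks the signature against a
single candidate key.

```python
from pathlib import Path
from typing import Callable, Optional


def find_matching_key(keydir: Path,
                      verify: Callable[[bytes], bool]) -> Optional[str]:
    """Return the selector (file name) of the first key that verifies."""
    if not keydir.is_dir():
        return None
    for keyfile in sorted(keydir.iterdir()):
        # Subdirectories are skipped, so they are never tried as keys.
        if keyfile.is_file() and verify(keyfile.read_bytes()):
            return keyfile.name
    return None
```

A caller would pass the domain/local directory for the sender and a
callable that attempts verification with the given key material.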
Revoked keys can simply be deleted, or moved into a revoked/
subdirectory, perhaps with an explanation of why they were revoked.
Verifying keys before accepting them
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As stated earlier, bootstrapping trust remains a hard problem. We do not
aim to resolve it here and will cowardly defer to the participating
maintainers to pick their preferred key verification strategy, e.g.:
- meeting up in person at a conference and exchanging keys
- holding a video session and reciting fingerprints (or entire keys, in
the case of ed25519)
- using an email round-trip as proof of key ownership
This can be as lax or as strict as maintainers choose (though if the
procedure is too lax, then the whole point of cryptographic attestation
becomes moot).
Trusting the git repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Obviously, if keys are distributed via git, then one must trust git
itself and the commit provenance. This, again, is a "bootstrapping
trust" sort of problem that we promised to side-step, but we can at
least give the following recommendations:
- the person maintaining the keyring should PGP-sign all commits
modifying public key contents
- the repository itself should initially be cloned from trusted sources
over secure protocols
We hope to provide a separate best-practices document aimed at keyring
maintainers, should this scheme become adopted.
Automating patch attestation
----------------------------
The git-send-email application supports executing a validation hook
before sending out patches. The end-user tooling should provide git hook
integration so that patches are automatically attested every time
"git-send-email" is used.
We aim to provide a lightweight attestation utility for this purpose, as
well as implement all necessary verification routines in "b4"
client-side tooling used by many Linux developers for their patch
workflow.