| Header-Based Patch Attestation |
| ============================== |
| Author: Konstantin Ryabitsev <konstantin@linuxfoundation.org> |
| Status: Beta, soliciting comments |
| |
| Preamble |
| -------- |
| Projects participating in decentralized development continue to use |
| RFC-2822 (email) formatted messages for code submissions and review. |
| This remains the only widely accepted mechanism for code collaboration |
| that does not rely on centralized infrastructure maintained by a single |
| entity, which necessarily introduces a single point of dependency and |
| a single point of failure. |
| |
| RFC-2822 formatted messages can be delivered via a variety of means. To |
| name a few of the more common ones: |
| |
| - email |
| - usenet |
| - aggregated archives (e.g. public-inbox) |
| |
| Among these, email remains the most widely used transport mechanism for |
| RFC-2822 messages, most commonly delivered via subscription-based |
| services (mailing lists). |
| |
| Email and end-to-end attestation |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| There are two commonly used standards for cryptographic email |
| attestation: PGP and S/MIME. When it comes to patches sent via email, |
| there are significant drawbacks to both: |
| |
| - Mailing list software may modify email body contents to add |
| subscription information footers, causing message attestation to |
| fail. |
| - Attestation via detached MIME signatures may not be preserved by |
| mailing list software that aggressively quarantines attachments. |
| - Inline PGP attestation generally frustrates developers working with |
| patches due to extra surrounding content and the escaping it |
| performs for strings containing dashes at the start of the line for |
| canonicalization purposes. |
| - Only the body of the message is attested, leaving metadata such as |
| "From", "Subject", and "Date" open to tampering. Git uses this |
| metadata to formulate git commits, so leaving them unattested is |
| suboptimal (they can be duplicated into the body of the message, |
| but git format-patch will not do this by default). |
| - PGP key distribution and trust delegation remains a difficult |
| problem to solve. Even if PGP attestation is available, the |
| developer on the receiving end of the patches may not make any use |
| of it due to not having the sender's key in their keyring. |
| - S/MIME certificates are increasingly difficult to obtain for |
| developers not working in corporate environments. At the time of |
| writing, only two commercial CAs continue to provide this service -- |
| and only one does it for free. |
| |
| For these reasons, end-to-end attestation is rarely used in communities |
| that continue to use email as their main conduit for code submissions |
| and review. |
| |
| Email and domain-level attestation |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| Since unsolicited emails (SPAM) frequently forge headers in order to |
| appear to be coming from trusted sources, most major service providers |
| have adopted DKIM (RFC-6376) to provide cryptographic attestation for |
| header and body contents. A message that originates from gmail.com will |
| contain a "DKIM-Signature" header that attests the contents of the |
| following headers (among others): |
| |
| - from |
| - date |
| - message-id |
| - subject |
| |
| The "DKIM-Signature" header also includes a hash of the message body |
| (bh=) that is included in the final verification hash. When a DKIM |
| signature is successfully verified using a public key that is published |
| via gmail.com DNS records, this provides a degree of assurance that the |
| email message has not been modified since leaving gmail.com |
| infrastructure. |
| |
| Just as with PGP and S/MIME attestation, this has important problems |
| when it comes to patches sent via mailing lists: |
| |
| - ML software commonly modifies the subject header in order to insert |
| list identification (e.g. ``[some-topic]``). Since the "subject" |
| header is almost always included into the list of headers attested |
| by DKIM, this causes DKIM signatures to fail verification. |
| - ML software also routinely modifies the message body for the |
| purposes of stripping attachments or inserting list subscription |
| metadata. Since the bh= hash is included in the final signature |
| hash, this results in a failed DKIM signature check. |
| |
| Even if all of the above does not apply and the DKIM signature is |
| successfully verified, body canonicalization routines mandated by the |
| DKIM RFC may result in a false-positive successful attestation for |
| patches. The "relaxed" canonicalization collapses each run of |
| whitespace into a single space, so in languages like Python or GNU |
| Make, where whitespace is syntactically significant, two patches with |
| different code may produce the same hash. |
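| |
| The collision is easy to reproduce. The following is a minimal sketch |
| and not part of the POC: "relaxed_body" and "body_hash" are |
| hypothetical helpers that approximate the relaxed body |
| canonicalization described in RFC-6376, section 3.4.4: |
| |

```python
import base64
import hashlib
import re


def relaxed_body(body: bytes) -> bytes:
    """Approximate DKIM "relaxed" body canonicalization."""
    out = []
    for line in body.replace(b"\r\n", b"\n").split(b"\n"):
        line = re.sub(rb"[ \t]+", b" ", line)  # collapse runs of whitespace
        out.append(line.rstrip(b" "))          # drop whitespace at end of line
    while out and out[-1] == b"":              # drop empty lines at end of body
        out.pop()
    return b"".join(line + b"\r\n" for line in out)


def body_hash(body: bytes) -> str:
    """Return the base64 SHA-256 digest of the canonicalized body."""
    digest = hashlib.sha256(relaxed_body(body)).digest()
    return base64.b64encode(digest).decode()


# log(x) runs inside the loop in the first snippet and after it in the
# second, so the two snippets behave differently.
patch_a = b"if ready:\n    for x in items:\n        send(x)\n        log(x)\n"
patch_b = b"if ready:\n    for x in items:\n        send(x)\n    log(x)\n"

print(body_hash(patch_a) == body_hash(patch_b))
```

| |
| Both snippets are valid Python, yet every indentation level collapses |
| to a single space, so the canonicalized bodies are identical. |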
| |
| So, while DKIM works well enough for regular domain-level email |
| attestation, it still has significant drawbacks for attesting patches. |
| Similarly, it does not provide significant developer identity assurances |
| for patches sent via large public hosting services like Gmail, Fastmail, |
| or others -- at best, we have proof that the email traversed their |
| mail gateways (hopefully, after being properly authenticated). |
| |
| Proposal |
| -------- |
| The goal of this document is to propose a scheme that would provide |
| cryptographic attestation for all message contents necessary for trusted |
| distributed code collaboration. It draws on the success of the DKIM |
| standard in order to adapt (and adopt) it for this purpose. |
| |
| X-Developer-Signature header |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| We implement a compatible subset of DKIM (RFC-6376) for developer |
| attestation signatures, with some extra steps taken to make the |
| workflow fit better with patches sent via DKIM-non-compliant mailing |
| lists. |
| |
| Differences from DKIM: |
| |
| - the d= field is not used (no domain signatures involved) |
| - the q= field is not used (end-user tooling handles key lookup) |
| - the c= field is not used (see below for canonicalization) |
| - the i= field is optional, but MUST be the canonical email address of |
| the sender, if not the same as the From: field |
| |
| Canonicalization |
| ~~~~~~~~~~~~~~~~ |
| We use the "relaxed/simple" canonicalization as defined by the DKIM |
| standard, but the message is first parsed by "git-mailinfo" in order to |
| achieve the following: |
| |
| - normalize any content-transfer-encoding modifications (convert back |
| from base64/quoted-printable/etc into 8-bit) |
| - use any encountered in-body git headers (From:, Subject:, Date:) to |
| rewrite the outer message headers |
| - perform any subject-line normalization in order to strip content not |
| considered by git-am when applying the patch |
| |
| To achieve this, the message is passed through git-mailinfo with the |
| following flags:: |
| |
| cat orig.msg | git mailinfo --encoding=utf-8 m p > i |
| |
| We then use the data found in "i" to replace the From:, Subject: and |
| Date: headers of the original message, and concatenate "m" and "p" back |
| together to form the body of the message, which is then normalized using |
| CRLF line endings and the DKIM "simple" body canonicalization (any |
| trailing blank lines are removed). |
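| |
| As a rough illustration of the body step, the sketch below takes the |
| contents of "m" and "p" as bytes and derives the body hash. The helper |
| names are hypothetical, and treating l= as the length of the |
| canonicalized body is an assumption carried over from DKIM rather than |
| something taken from main.py: |
| |

```python
import base64
import hashlib


def simple_body(body: bytes) -> bytes:
    """Normalize line endings to CRLF, then apply DKIM "simple" body rules."""
    text = body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    text = text.replace(b"\n", b"\r\n")
    # Drop empty lines at the end of the body, keeping a single final CRLF.
    while text.endswith(b"\r\n"):
        text = text[:-2]
    return text + b"\r\n"


def body_tags(m: bytes, p: bytes) -> dict:
    """Return candidate l= and bh= values for the reconstituted body."""
    canon = simple_body(m + p)  # "m" and "p" concatenated back together
    digest = hashlib.sha256(canon).digest()
    return {"l": len(canon), "bh": base64.b64encode(digest).decode()}
```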
| |
| Any other headers included in signing are canonicalized using the |
| "relaxed" header canonicalization routines defined in the DKIM standard. |
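| |
| For reference, the relaxed header rules from RFC-6376, section 3.4.2, |
| can be approximated in a few lines. This is a sketch with a |
| hypothetical function name, not the code used by the POC: |
| |

```python
import re


def relaxed_header(name: str, value: str) -> str:
    """Approximate DKIM "relaxed" header canonicalization."""
    name = name.strip().lower()            # lowercase the header name
    value = re.sub(r"\r?\n", "", value)    # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value)  # collapse runs of whitespace
    value = value.strip(" ")               # trim around the value
    return f"{name}:{value}\r\n"
```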
| |
| In other words, the body and some of the headers are normalized and |
| reconstituted using the "git-mailinfo" command, and then canonicalized |
| using DKIM's relaxed/simple standard. |
| |
| Algorithms |
| ~~~~~~~~~~ |
| The DKIM standard mostly relies on RSA signatures, though RFC 8463 extends |
| it to support ED25519 keys as well. Since our implementation is fully |
| backward compatible with the DKIM standard, it is possible to use any of |
| the DKIM-defined algorithms. However, for the purposes of this POC, we |
| only support the following two signing/hashing algorithms: |
| |
| - ed25519-sha256: exactly as defined in RFC 8463 |
| - openpgp-sha256: uses OpenPGP to create the signature |
| |
| POC code |
| -------- |
| The provided POC code in main.py is pretty feature-complete, though it |
| probably needs further improvements to properly deal with corner-cases. |
| You will notice that it's only a few hundred lines of Python code and |
| does not require any external libraries/programs except libsodium and |
| GnuPG for crypto, plus git for message canonicalization. All of these |
| are already likely to be present on a developer's workstation. |
| |
| Running the code |
| ~~~~~~~~~~~~~~~~ |
| The POC code is written in Python and requires the PyNaCl library in |
| order to work. Chances are, PyNaCl is already installed on your |
| platform, but if it isn't, you can install it via a venv:: |
| |
| $ python3 -mvenv .venv |
| $ source .venv/bin/activate |
| $ pip install --upgrade pip |
| $ pip install -r requirements.txt |
| |
| Or you can achieve the same using OS packaging:: |
| |
| # dnf install python3-pynacl |
| # apt install python3-nacl |
| |
| You should also have git and gpg available as external commands in your |
| PATH. |
| |
| ED25519 signatures |
| ~~~~~~~~~~~~~~~~~~ |
| ED25519 is the "nothing up my sleeve" implementation of Elliptic-Curve |
| Cryptography (ECC) favoured by free software enthusiasts. Its primary |
| benefits are algorithmic speed of all crypto operations and relative |
| smallness of both public/private keys and generated signatures. |
| |
| To sign an email using a bundled ed25519 key, run:: |
| |
| $ ./main.py sign-ed25519 -k dev.key |
| SIGNING : ED25519 using dev.key |
| MSGSRC : emails/dev-unsigned.eml |
| --- SIGNED MESSAGE STARTS --- |
| [...] |
| X-Developer-Signature: v=1; a=ed25519-sha256; h=from:subject:date:message-id; |
| l=1003; bh=Pfwl/zDlAoe9nkYNQPcgDFscfSQdrGvx4kAzrnQdNQ8=; |
| b=WyAu9nzYMUg2ntOfnvEBpa1vLQemK7axjAVu+hhYh6VyeFmB5jKzC2TcF+2IOjfG3eGl/XNY0EWc |
| HUh2tF02AQwiKDVDG7mTmP1/SPpNvotD0mTWQk6LyltWKFBUpRhn |
| |
| If you've ever seen email headers, you'll notice how very similar the |
| X-Developer-Signature is to the DKIM-Signature header. |
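| |
| Because the layout follows DKIM's tag=value convention, the header |
| value can be split into fields with very little code. The sketch below |
| uses a hypothetical "parse_signature" helper and the header from the |
| example above; it does no validation: |
| |

```python
def parse_signature(value: str) -> dict:
    """Split a tag=value; tag=value header into a dict of strings."""
    tags = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, val = part.partition("=")  # base64 may contain more "="
        key = key.strip()
        if key in ("b", "bh"):
            val = "".join(val.split())     # remove folding whitespace
        tags[key] = val.strip()
    return tags


header = (
    "v=1; a=ed25519-sha256; h=from:subject:date:message-id;\n"
    " l=1003; bh=Pfwl/zDlAoe9nkYNQPcgDFscfSQdrGvx4kAzrnQdNQ8=;\n"
    " b=WyAu9nzYMUg2ntOfnvEBpa1vLQemK7axjAVu+hhYh6VyeFmB5jKzC2TcF+2IOjfG3eGl/XNY0EWc\n"
    " HUh2tF02AQwiKDVDG7mTmP1/SPpNvotD0mTWQk6LyltWKFBUpRhn"
)
tags = parse_signature(header)
print(tags["a"], tags["h"].split(":"))
```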
| |
| OpenPGP signatures |
| ~~~~~~~~~~~~~~~~~~ |
| OpenPGP is not really an "algorithm," so this is merely an indicator |
| that the signature is created using an OpenPGP-compliant application. |
| Here it is in action, though you will need to use your own PGP key if |
| you want to try it:: |
| |
| $ ./main.py -m emails/mricon-unsigned.eml sign-pgp -k B6C41CE35664996C |
| SIGNING : PGP using B6C41CE35664996C |
| MSGSRC : emails/mricon-unsigned.eml |
| --- SIGNED MESSAGE STARTS --- |
| [...] |
| X-Developer-Signature: v=1; a=openpgp-sha256; h=from:subject:date:message-id; |
| l=1002; bh=g2Sv1ZR+jIrWukzdXbqb+aeiqyFQOBLDQY6z0BBnGg4=; |
| b=owGbwMvMwCG27YjM47CUmTmMp9WSGBK6vn316Z1bbjJ5DWNEgimHTc6Kx4HfTpzYcOzp9e/2jc/v |
| Lg7J7ChlYRDjYJAVU2Qp2xe7KajwoYdceo8pzBxWJpAhDFycAjCRBn5Ghrc/7otaV1yX6I4/sNf056 |
| vmzjen3bn2Rk8X9GTuZd2/aQ0jw7fZJ2Pi36/X2fTK4cSnX/++nbAzsm0TObX4SpbBsrRHe/gA |
| |
| OpenPGP supports ed25519 keys as well, so in reality the signature is |
| made with my own ed25519 subkey, but it is further wrapped in the |
| OpenPGP header data, which is why it is longer than the ed25519 |
| signature in the example above. It is created using the following GnuPG |
| parameters:: |
| |
| gpg -s -u KEYID < binary-hash-to-sign |
| |
| Distributing keys |
| ----------------- |
| The difficult part of various PKI schemes is not really the |
| cryptography, but initial trust bootstrap and key distribution. In our |
| case, we sidestep trust bootstrap entirely and focus solely on developer |
| key distribution. We propose doing it via the git repository itself, |
| borrowing the idea from the people behind the did:git project. |
| |
| Using git to track contributor keys |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| Consider the workflow of a Linux kernel subsystem maintainer. While a |
| single maintainer may receive patches from hundreds of people, they will |
| likely have a fairly small subset of developers with whom they |
| collaborate on an ongoing basis. As their relationship trust builds, the |
| maintainer may wish to implement an attestation mechanism to verify that |
| patches submitted by trusted lieutenants are not corrupted or modified |
| by malicious actors en route. |
| |
| The proposed POC offers several ways of achieving this: |
| |
| - tracking the keys in a regular development branch |
| - tracking the keys in a special dedicated branch |
| - tracking the keys in a dedicated git repository |
| |
| Using the regular development branch |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| Smaller projects with fewer contributors may simply choose to bundle |
| developer key distribution as part of their source code. The POC in |
| question uses the toplevel .keys directory as that location, with the |
| following structure:: |
| |
| .keys |
|   \- sigtype |
|     \- domain |
|       \- local |
|         \- selector |
| |
| So, for an ed25519 signature from dev@example.org, the public key needed |
| for signature verification would be contained in:: |
| |
| .keys |
|   \- ed25519 |
|     \- example.org |
|       \- dev |
|         \- default |
| |
| The "default" filename is used when there is no other s= selector |
| specified in the signature header. |
| |
| NB: Since domain/local/selector values are taken from untrusted sources, |
| they should be urlencoded before attempting to locate the public key on |
| disk or via any commands passed to "git show". |
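| |
| One way to build the lookup path is sketched below. "key_path" is a |
| hypothetical helper, not the POC's own function. Note that |
| urllib.parse.quote leaves "." unescaped, so the sketch also rejects |
| empty, "." and ".." components as an extra precaution: |
| |

```python
from urllib.parse import quote


def key_path(sigtype: str, identity: str, selector: str = "default") -> str:
    """Map a signature type, identity, and selector to a path under .keys."""
    local, _, domain = identity.rpartition("@")
    parts = [".keys"]
    for part in (sigtype, domain, local, selector):
        enc = quote(part, safe="")  # also encodes "/" as %2F
        if enc in ("", ".", ".."):
            raise ValueError(f"refusing path component: {part!r}")
        parts.append(enc)
    return "/".join(parts)


print(key_path("ed25519", "dev@example.org"))
```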
| |
| Using a dedicated ref |
| ~~~~~~~~~~~~~~~~~~~~~ |
| In the case of a project the size of the Linux kernel, it would be too |
| onerous to track the keys of all contributors centrally, so individual |
| subsystem maintainers will likely want to track their own subsets of |
| keys from just the developers with whom they work on a regular basis. |
| Using the regular development branch would be too inconvenient in this |
| case, since it would interfere with upstream work, so it makes sense to |
| use a separate branch for this purpose, e.g. "refs/heads/keys" that |
| contains just the keys directory with no other content. |
| |
| Participating contributors can then submit key additions and changes as |
| regular patches or pull requests and the maintainer merely needs to |
| remember to apply them to the proper key management branch. |
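| |
| With a dedicated ref, a key can be read without checking out the |
| branch by using "git show REF:PATH". The sketch below assumes the |
| branch uses the same .keys layout as described earlier and that the |
| path has already been sanitized; the function names are hypothetical: |
| |

```python
import subprocess
from typing import List, Optional


def git_show_args(repo: str, ref: str, path: str) -> List[str]:
    """Build the git command that prints one file from the given ref."""
    return ["git", "-C", repo, "show", f"{ref}:{path}"]


def read_key(repo: str, ref: str, path: str) -> Optional[bytes]:
    """Return the key contents, or None if git cannot find the object."""
    res = subprocess.run(git_show_args(repo, ref, path), capture_output=True)
    return res.stdout if res.returncode == 0 else None
```

| |
| For example, read_key(".", "refs/heads/keys", |
| ".keys/ed25519/example.org/dev/default") would return the bundled key |
| for dev@example.org if it exists on that branch. |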
| |
| Using a dedicated git repository |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| Similarly, instead of using a dedicated branch, maintainers may choose |
| to use a wholly separate git repository for this purpose. This may be |
| useful if the same set of developers work on multiple projects. |
| |
| Key formats for ED25519 and OpenPGP |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| The public keys should be in the following format: |
| |
| - ed25519: base64-encoded string |
| - openpgp: any format that can be passed to "gpg --import", but |
| preferably an ascii-armored key export |
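| |
| An ed25519 public key is 32 bytes long, so a key file can be checked |
| before it is handed to the crypto library. A minimal sketch with a |
| hypothetical helper name: |
| |

```python
import base64
import binascii


def load_ed25519_pubkey(text: str) -> bytes:
    """Decode a base64 ed25519 public key and check its length."""
    try:
        raw = base64.b64decode(text.strip(), validate=True)
    except binascii.Error as ex:
        raise ValueError("key is not valid base64") from ex
    if len(raw) != 32:
        raise ValueError(f"expected 32 bytes, got {len(raw)}")
    return raw
```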
| |
| In the case of verifying PGP signatures, the POC implementation will |
| create a temporary keyring containing just the imported key, so it |
| should never clash with the default keyring. |
| |
| Using the default GnuPG keyring |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| It is up to the implementation whether to fall back to the default GnuPG |
| keyring when checking openpgp signatures. The POC code will do so and |
| will additionally warn if the key has insufficient trust (this check is |
| meaningless for in-git bundled keys, so it is not performed). |
| |
| Rotating and revoking keys |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| Keys can be retired or replaced at any time by merely changing them in |
| the repository, committing, and pushing (or submitting a pull |
| request/patch to the maintainer with the change). Maintainers can then |
| pull the change or apply the patch and push it out to all other |
| participating co-maintainers. |
| |
| Contributors can have multiple valid keys if they properly specify the |
| selector when adding signatures -- or the verification tooling can |
| simply iterate through all keys listed in the directory for that |
| domain/local to find the matching one. |
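| |
| The iteration approach could look roughly like the sketch below. |
| "find_matching_key" and the "verify" callback are hypothetical; the |
| callback stands in for whatever signature check the tooling performs |
| with a candidate key: |
| |

```python
from pathlib import Path
from typing import Callable, Optional


def find_matching_key(keydir: Path, verify: Callable[[bytes], bool]) -> Optional[str]:
    """Try every key file in keydir and return the first matching selector."""
    for keyfile in sorted(p for p in keydir.iterdir() if p.is_file()):
        if verify(keyfile.read_bytes()):
            return keyfile.name  # the file name doubles as the selector
    return None
```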
| |
| Revoked keys can be simply deleted or moved into the revoked/ |
| subdirectory with perhaps an explanation why they were revoked. |
| |
| Verifying keys before accepting them |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| As stated earlier, bootstrapping trust remains a hard problem. We do not |
| aim to resolve it here and will cowardly defer to the participating |
| maintainers to pick their preferred key verification strategy, e.g.: |
| |
| - meeting up in person at a conference and exchanging keys |
| - holding a video session and reciting fingerprints (or entire keys, in |
| the case of ed25519) |
| - using an email round-trip as proof of key ownership |
| |
| This can be as lax or as strict as maintainers choose (though if the |
| procedure is too lax, then the whole point of cryptographic attestation |
| becomes moot). |
| |
| Trusting the git repository |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| Obviously, if keys are distributed via git, then one must trust git |
| itself and the commit provenance. This, again, is a "bootstrapping |
| trust" sort of problem that we promised to side-step, but we can at |
| least give the following recommendations: |
| |
| - the person maintaining the keyring should PGP-sign all commits |
| modifying public key contents |
| - the repository itself should initially be cloned from trusted sources |
| over secure protocols |
| |
| We hope to provide a separate best-practices document aimed at keyring |
| maintainers, should this scheme become adopted. |
| |
| Automating patch attestation |
| ---------------------------- |
| The git-send-email application supports executing a validation hook |
| before sending out patches. The end-user tooling should provide git hook |
| integration so that patches are automatically attested every time |
| "git-send-email" is used. |
| |
| We aim to provide a lightweight attestation utility for this purpose, as |
| well as implement all necessary verification routines in "b4" |
| client-side tooling used by many Linux developers for their patch |
| workflow. |