| .. _code-provenance: |
| |
| Code provenance |
| =============== |
| |
| Certifying patch submissions |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| The QEMU community **mandates** that all contributors certify the provenance |
| of the patches they submit to the project. To put it another way, |
| contributors must indicate that they are legally permitted to contribute to |
| the project. |
| |
| Certification is achieved with low overhead by adding a single line to the |
| bottom of every git commit:: |
| |
| Signed-off-by: YOUR NAME <YOUR@EMAIL> |
| |
| The addition of this line asserts that the author of the patch is contributing |
| in accordance with the clauses specified in the |
| `Developer's Certificate of Origin <https://developercertificate.org>`__: |
| |
| .. _dco: |
| |
| Developer's Certificate of Origin 1.1 |
| |
| By making a contribution to this project, I certify that: |
| |
| (a) The contribution was created in whole or in part by me and I |
| have the right to submit it under the open source license |
| indicated in the file; or |
| |
| (b) The contribution is based upon previous work that, to the best |
| of my knowledge, is covered under an appropriate open source |
| license and I have the right under that license to submit that |
| work with modifications, whether created in whole or in part |
| by me, under the same open source license (unless I am |
| permitted to submit under a different license), as indicated |
| in the file; or |
| |
| (c) The contribution was provided directly to me by some other |
| person who certified (a), (b) or (c) and I have not modified |
| it. |
| |
| (d) I understand and agree that this project and the contribution |
| are public and that a record of the contribution (including all |
| personal information I submit with it, including my sign-off) is |
| maintained indefinitely and may be redistributed consistent with |
| this project or the open source license(s) involved. |
| |
| The name used with "Signed-off-by" does not need to be your legal name, nor |
| birth name, nor appear on any government ID. It is the identity you choose to |
| be known by in the community, but should not be anonymous, nor misrepresent |
| who you are. |
| |
| It is generally expected that the name and email address used in one of the |
| ``Signed-off-by`` lines match those of the git commit ``Author`` field. |
| It's okay if you subscribe or contribute to the list via more than one |
| address, but using multiple addresses in one commit just confuses |
| things. |
| |
| If the person sending the mail is not one of the patch authors, they are |
| nonetheless expected to add their own ``Signed-off-by`` to comply with the |
| DCO clause (c). |
| |
| Multiple authorship |
| ~~~~~~~~~~~~~~~~~~~ |
| |
| It is not uncommon for a patch to have contributions from multiple authors. In |
| this scenario, git commits will usually be expected to have a ``Signed-off-by`` |
| line for each contributor involved in creation of the patch. Some edge cases: |
| |
| * The non-primary author's contributions were so trivial that they can be |
| considered not subject to copyright. In this case the secondary author |
| need not include a ``Signed-off-by``. |
| |
| This case most commonly applies where QEMU reviewers give short snippets |
| of code as suggested fixes to a patch. The reviewers don't need to have |
| their own ``Signed-off-by`` added unless their code suggestion was |
| unusually large, but it is common to add ``Suggested-by`` as a credit |
| for non-trivial code. |
| |
| * Both contributors work for the same employer and the employer requires |
| copyright assignment. |
| |
| It can be said that in this case a ``Signed-off-by`` is indicating that |
| the person has permission to contribute from their employer who is the |
| copyright holder. It is nonetheless still preferable to include a |
| ``Signed-off-by`` for each contributor, as in some countries employees are |
| not able to assign copyright to their employer, and it also covers any |
| time invested outside working hours. |
| |
| When multiple ``Signed-off-by`` tags are present, they should be strictly kept |
| in order of authorship, from oldest to newest. |
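| |
| As an illustrative sketch, a patch started by one developer and later extended |
| by a second might end with the following tags. Both names and addresses are |
| hypothetical:: |
| |
| Signed-off-by: Alice Author <alice.author@example.com> |
| Signed-off-by: Bob Builder <bob.builder@example.org> |
| |
| Here Alice's tag comes first because her work on the patch came first. |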
| |
| Other commit tags |
| ~~~~~~~~~~~~~~~~~ |
| |
| While the ``Signed-off-by`` tag is mandatory, there are a number of other tags |
| that are commonly used during QEMU development: |
| |
| * **``Reviewed-by``**: when a QEMU community member reviews a patch on the |
| mailing list, if they consider the patch acceptable, they should send an |
| email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who |
| review a patch should add this even if they are also adding their |
| ``Signed-off-by`` to the same commit. |
| |
| * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that |
| touches their subsystem, but intends to allow a different maintainer to |
| queue it and send a pull request, they would send a mail containing an |
| ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by`` |
| only implies review of the maintainer's own areas of responsibility. If a |
| maintainer wants to indicate they have done a full review they should use |
| a ``Reviewed-by`` tag. |
| |
| * **``Tested-by``**: when a QEMU community member has functionally tested the |
| behaviour of the patch in some manner, they should send an email reply |
| containing a ``Tested-by`` tag. |
| |
| * **``Reported-by``**: when a QEMU community member reports a problem via the |
| mailing list, or some other informal channel that is not the issue tracker, |
| it is good practice to credit them by including a ``Reported-by`` tag on |
| any patch fixing the issue. When the problem is reported via the GitLab |
| issue tracker, however, it is sufficient to just include a link to the |
| issue. |
| |
| * **``Suggested-by``**: when a reviewer or other 3rd party makes non-trivial |
| suggestions for how to change a patch, it is good practice to credit them |
| by including a ``Suggested-by`` tag. |
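| |
| Put together, the tag block at the end of a commit message might look like the |
| following sketch. The names and addresses are hypothetical, the ordering shown |
| is only one plausible arrangement, and a real patch carries only the tags that |
| actually apply to it:: |
| |
| Reported-by: Rita Reporter <rita.reporter@example.com> |
| Suggested-by: Sam Suggester <sam.suggester@example.org> |
| Signed-off-by: Cory Contributor <cory.contributor@example.com> |
| Reviewed-by: Rhea Reviewer <rhea.reviewer@example.net> |
| Tested-by: Tom Tester <tom.tester@example.org> |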
| |
| Subsystem maintainer requirements |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| When a subsystem maintainer accepts a patch from a contributor, in addition to |
| the normal code review points, they are expected to validate the presence of |
| suitable ``Signed-off-by`` tags. |
| |
| At the time they queue the patch in their subsystem tree, the maintainer |
| **must** also then add their own ``Signed-off-by`` to indicate that they have |
| done the aforementioned validation. This is in addition to any of their own |
| ``Reviewed-by`` tags the subsystem maintainer may wish to include. |
| |
| When the maintainer modifies the patch after pulling it into their tree, they |
| should record their contribution. This is typically done via a note in the |
| commit message, just prior to the maintainer's ``Signed-off-by``:: |
| |
| Signed-off-by: Cory Contributor <cory.contributor@example.com> |
| [Comment rephrased for clarity] |
| Signed-off-by: Mary Maintainer <mary.maintainer@mycorp.test> |
| |
| |
| Tools for adding ``Signed-off-by`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| There are a variety of ways tools can support adding ``Signed-off-by`` tags |
| for patches, avoiding the need for contributors to manually type in this |
| repetitive text each time. |
| |
| git commands |
| ^^^^^^^^^^^^ |
| |
| When creating or amending a commit, the ``-s`` flag to ``git commit`` will |
| append a suitable line matching the configured git author details. |
| |
| If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can |
| be used to append a suitable line in the emails it creates, without modifying |
| the local commits. Alternatively, to modify all the local commits on a branch:: |
| |
| git rebase master -x 'git commit --amend --no-edit -s' |
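| |
| As a possible convenience, git can also be configured so that |
| ``git format-patch`` behaves as if ``-s`` were always given. The following is |
| a minimal sketch of such a setting in ``$HOME/.gitconfig``. Whether to enable |
| it is a personal choice, since adding a sign-off is meant to be a conscious |
| act:: |
| |
| [format] |
| signOff = true |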
| |
| emacs |
| ^^^^^ |
| |
| In the file ``$HOME/.emacs.d/abbrev_defs`` add: |
| |
| .. code:: elisp |
| |
| (define-abbrev-table 'global-abbrev-table |
| '( |
| ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1) |
| ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1) |
| ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1) |
| ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1) |
| )) |
| |
| With this change, if you type (for example) ``8rev`` followed by ``<space>`` |
| or ``<enter>`` it will expand to the whole phrase. |
| |
| vim |
| ^^^ |
| |
| In the file ``$HOME/.vimrc`` add:: |
| |
| iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr> |
| iabbrev 8ack Acked-by: YOUR NAME <your@email.addr> |
| iabbrev 8test Tested-by: YOUR NAME <your@email.addr> |
| iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr> |
| |
| With this change, if you type (for example) ``8rev`` followed by ``<space>`` |
| or ``<enter>`` it will expand to the whole phrase. |
| |
| Re-starting abandoned work |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| For a variety of reasons there are some patches that get submitted to QEMU but |
| never merged. An unrelated contributor may decide (months or years later) to |
| continue working from the abandoned patch and re-submit it with extra changes. |
| |
| The general principles when picking up abandoned work are: |
| |
| * Continue to credit the original author for their work, by maintaining their |
| original ``Signed-off-by`` |
| * Indicate where the original patch was obtained from (mailing list, bug |
| tracker, author's git repo, etc) when sending it for review |
| * Acknowledge the extra work of the new contributor by including their |
| ``Signed-off-by`` in the patch in addition to the original author's |
| * Indicate who is responsible for what parts of the patch. This is typically |
| done via a note in the commit message, just prior to the new contributor's |
| ``Signed-off-by``:: |
| |
| Signed-off-by: Some Person <some.person@example.com> |
| [Rebased and added support for 'foo'] |
| Signed-off-by: New Person <new.person@mycorp.test> |
| |
| In complicated cases, or if otherwise unsure, ask for advice on the project |
| mailing list. |
| |
| It is also recommended to attempt to contact the original author to let them |
| know you are interested in taking over their work, in case they still intend |
| to return to the work, or have any suggestions about the best way to continue. |
| |
| Inclusion of generated files |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Files in patches contributed to QEMU are generally expected to be provided |
| only in the preferred format for making modifications. The implication of |
| this is that the output of code generators or compilers is usually not |
| appropriate to contribute to QEMU. |
| |
| For reasons of practicality there are some exceptions to this rule, where |
| generated code is permitted, provided it is also accompanied by the |
| corresponding preferred source format. This is done where it is impractical |
| to expect those building QEMU to run the code generation or compilation |
| process. A non-exhaustive list of examples is: |
| |
| * Images: where a bitmap image is created from a vector file it is common |
| to include the rendered bitmaps at desired resolution(s), since subtle |
| changes in the rasterization process / tools may affect quality. The |
| original vector file is expected to accompany any generated bitmaps. |
| |
| * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest |
| firmwares. When such binary ROMs are contributed, the corresponding source |
| must also be provided, either directly, or through a git submodule link. |
| |
| * Dockerfiles: the majority of the dockerfiles are automatically generated |
| from a canonical list of build dependencies maintained in tree, together |
| with the libvirt-ci git submodule link. The generated dockerfiles are |
| included in tree because it is desirable to be able to directly build |
| container images from a clean git checkout. |
| |
| * eBPF: QEMU includes some generated eBPF machine code, since the required |
| eBPF compilation tools are not broadly available on all targeted OS |
| distributions. The corresponding eBPF C code for the binary is also |
| provided. This is a time-limited exception until the eBPF toolchain is |
| sufficiently broadly available in distros. |
| |
| In all cases above, the existence of generated files must be acknowledged |
| and justified in the commit that introduces them. |
| |
| Tools which perform changes to existing code with deterministic algorithmic |
| manipulation, driven by user specified inputs, are not generally considered |
| to be "generators". |
| |
| For instance, using Coccinelle to convert code from one pattern to another |
| pattern, or fixing documentation typos with a spell checker, or transforming |
| code using sed / awk / etc, are not considered to be acts of code |
| generation. Where an automated manipulation is performed on code, however, |
| this should be declared in the commit message. |
| |
| At times contributors may use or create scripts/tools to generate an initial |
| boilerplate code template which is then filled in to produce the final patch. |
| The output of such a tool would still be considered the "preferred format", |
| since it is intended to be a foundation for further human authored changes. |
| Such tools are acceptable to use, provided there is clearly defined copyright |
| and licensing for their output. Note in particular the caveats applying to AI |
| content generators below. |
| |
| Use of AI content generators |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| TL;DR: |
| |
| **Current QEMU project policy is to DECLINE any contributions which are |
| believed to include or derive from AI generated content. This includes |
| ChatGPT, Claude, Copilot, Llama and similar tools.** |
| |
| The increasing prevalence of AI-assisted software development results in a |
| number of difficult legal questions and risks for software projects, including |
| QEMU. Of particular concern is content generated by `Large Language Models |
| <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs). |
| |
| The QEMU community requires that contributors certify their patch submissions |
| are made in accordance with the rules of the `Developer's Certificate of |
| Origin (DCO) <dco>`. |
| |
| To satisfy the DCO, the patch contributor has to fully understand the |
| copyright and license status of content they are contributing to QEMU. With AI |
| content generators, the copyright and license status of the output is |
| ill-defined with no generally accepted, settled legal foundation. |
| |
| Where the training material is known, it is common for it to include large |
| volumes of material under restrictive licensing/copyright terms. Even where |
| the training material is all known to be under open source licenses, it is |
| likely to be under a variety of terms, not all of which will be compatible |
| with QEMU's licensing requirements. |
| |
| How contributors could comply with DCO terms (b) or (c) for the output of AI |
| content generators commonly available today is unclear. The QEMU project is |
| not willing or able to accept the legal risks of non-compliance. |
| |
| The QEMU project thus requires that contributors refrain from using AI content |
| generators on patches intended to be submitted to the project, and will |
| decline any contribution if use of AI is either known or suspected. |
| |
| This policy does not apply to other uses of AI, such as researching APIs or |
| algorithms, static analysis, or debugging, provided their output is not to be |
| included in contributions. |
| |
| Examples of tools impacted by this policy include GitHub's Copilot, OpenAI's |
| ChatGPT, Anthropic's Claude, and Meta's Code Llama, as well as code/content |
| generation agents which are built on top of such tools. |
| |
| This policy may evolve as AI tools mature and the legal situation is |
| clarified. In the meantime, requests for exceptions to this policy will be |
| evaluated by the QEMU project on a case by case basis. To be granted an |
| exception, a contributor will need to demonstrate clarity of the license and |
| copyright status for the tool's output in relation to its training model and |
| code, to the satisfaction of the project maintainers. |