A large part of the auditing effort revolves around provenance. The word is not common in the software industry; it means discovering the origin and history of the changes (and their authors) that were applied to a file before it reached our hands.

Provenance practice is critical for asserting that a given piece of work belongs (in terms of copyright) to a given person. In turn, it is the copyright holder who decides, in most cases, which license terms apply to the work.

On this page we share some provenance tips. The list is by no means complete, exhaustive or authoritative. It is a set of friendly practices that make provenance work considerably easier.

Source code snippets

If any developer reading this page has never copied a few lines of code from the Internet into their own work, please raise your hand! The truth is that we all do this quite often. It helps to keep track of where the snippets come from.

Why?
Because later, when some tool compares your code against a giant database of code, it will reveal that the same snippet is present in half a dozen projects completely unrelated to your work. This happens because other people found the same snippet useful and added it to their own code. The big problem is that some of these projects will be licensed under strong copyleft terms, which might not fit the terms you chose for your work. The code auditor, in turn, has no idea that you copied the code from the Internet, nor can they tell where the code came from in the first place. The fact is that foreign code is present in your work, and some care should be taken to prevent confusion about its origin.

How?
Add information such as:

  • Where the snippet was copied from (add for example the URL)
  • When it was copied (simple date in ISO format such as YYYY-MM-DD)
  • Who copied the snippet
  • Who you see as the author of the snippet (name and link to their profile or email when possible)

Whenever possible, place this information in the header of the method (function, procedure or block of code) where the snippet is used, as close to the snippet as you can. This makes it very clear that part of the code derives from another origin, and it identifies where inside the source code this has happened (in case we need to remove it one day).

I would recommend that you apply a consistent way of marking snippets in your code. For example, in Java it is possible to use tags in the documentation comment, which makes such snippets easier to identify in the future.

It is not always possible, but try to determine which license applies. For example, contributions on Stack Overflow are published under a Creative Commons license (CC BY-SA). To describe the license in a consistent manner, consider using the identifiers from the SPDX license list as reference.

Example

/**
 * When given an input, replace all http references on text with HTML link
 * representations.
 *
 * @param text the plain text containing the links to be converted
 * @return formatted HTML text, links are now clickable
 * @origin http://stackoverflow.com/questions/1909534/java-replacing-text-url-with-clickable-html-link
 * @copyright Paul Croarkin – http://stackoverflow.com/users/18995/paul-croarkin
 * @license CC-BY-SA-3.0
 * @retrieved 2013-12-14 by Nuno Brito
 */
public static String textToHtmlConvertingURLsToLinks(String text) {
    if (text == null) {
        return text;
    }
    return text.replaceAll("(\\A|\\s)((http|https|ftp|mailto):\\S+)(\\s|\\z)",
            "$1<a href=\"$2\">$2</a>$4");
}
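
Once snippets are marked with consistent tags such as the ones in the example above (@origin, @copyright, @license, @retrieved), finding them again can be automated. What follows is only a minimal sketch of that idea, not an existing tool: the class name SnippetTagScanner and the method findProvenanceTags are hypothetical, and it assumes each tag sits on its own line of a documentation comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of any library): lists the provenance tags
// found in a piece of Java source text.
public class SnippetTagScanner {

    // Matches comment lines such as " * @origin http://..." and captures
    // the tag name (group 1) and the rest of the line (group 2).
    private static final Pattern TAG = Pattern.compile(
            "^[ \\t]*\\*[ \\t]*@(origin|copyright|license|retrieved)[ \\t]+(.+?)[ \\t]*$",
            Pattern.MULTILINE);

    /** Returns one "tag: value" entry per provenance tag, in order of appearance. */
    public static List<String> findProvenanceTags(String source) {
        List<String> found = new ArrayList<>();
        Matcher matcher = TAG.matcher(source);
        while (matcher.find()) {
            found.add(matcher.group(1) + ": " + matcher.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample input assumed for illustration; a real run would read files from disk.
        String source = String.join("\n",
                "/**",
                " * @param text the plain text containing the links to be converted",
                " * @origin http://stackoverflow.com/questions/1909534/java-replacing-text-url-with-clickable-html-link",
                " * @license CC-BY-SA-3.0",
                " * @retrieved 2013-12-14 by Nuno Brito",
                " */");
        for (String entry : findProvenanceTags(source)) {
            System.out.println(entry);
        }
    }
}
```

Ordinary documentation tags such as @param are ignored, so the output contains only the lines that describe where a snippet came from. The same pattern could be run over a whole source tree to produce an inventory of foreign code.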

Using code from snippets is not a bad thing in itself. The problem is snippets that are not identified, because they force everyone to invest a substantial amount of effort in reconstructing their provenance, and only rarely do we reach a good level of certainty. So, please mark your snippets to ensure no misunderstandings happen.