Provenance of source code snippets
A large part of the auditing effort revolves around provenance. The word is not common in the software industry, but it refers to discovering the origin of a file and the history of changes (and authors) applied to it before it reached our hands.
Provenance practice is critical for asserting that a given piece of work belongs (in terms of copyright) to a given person. In turn, it is the copyright holder who decides, in most cases, which license terms apply to their work.
On this page we share some provenance tips. It is by no means a complete, exhaustive or authoritative list. It is a set of friendly actions that considerably improve provenance tracking.
If any developer reading this page has never copied a few lines of code from the Internet into their own work, please raise your hand! The truth is that we do this far too often. It helps to keep track of where the snippets come from.
Why?
Because at some point in the future, when a tool compares your code
against a giant database of code, it will reveal that the same
snippet is present across half a dozen projects completely
unrelated to your work. This happens because other people found
the same snippet useful and added it to their own code.
The big problem is that some of these projects will be licensed
under strong copyleft terms, which might not fit the terms you
chose for your own work. A code auditor, in turn, has no way of
knowing that you copied the code from the Internet, nor where it
came from in the first place. The fact remains that foreign code
is present in your work, and some care should be taken to prevent
confusion about its origin.
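To make the comparison idea concrete, here is a toy sketch of how two pieces of source code could be checked for shared lines. It is only an illustration under simple assumptions (whitespace is ignored and very short lines are skipped); real auditing tools use far more robust matching, and the class and method names here are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class SnippetOverlap {

    /** Strips all whitespace so that indentation and spacing changes do not hide a match. */
    static String normalize(String line) {
        return line.replaceAll("\\s+", "");
    }

    /** Collects the normalized, non-trivial lines of a source text. */
    static Set<String> fingerprints(String source) {
        Set<String> result = new HashSet<>();
        for (String line : source.split("\\R")) {
            String normalized = normalize(line);
            // Skip very short lines such as "}" that appear in almost any file.
            if (normalized.length() >= 10) {
                result.add(normalized);
            }
        }
        return result;
    }

    /** Counts the distinct normalized lines that two source texts have in common. */
    public static int sharedLines(String a, String b) {
        Set<String> common = fingerprints(a);
        common.retainAll(fingerprints(b));
        return common.size();
    }

    public static void main(String[] args) {
        String mine = "int total = 0;\nfor (int i = 0; i < n; i++) {\n    total += values[i];\n}";
        String theirs = "for (int i = 0; i < n; i++) {\n\ttotal += values[i];\n}\nreturn total;";
        // The loop header and the loop body match despite different indentation.
        System.out.println(sharedLines(mine, theirs));
    }
}
```

Even a comparison this crude finds the copied loop in both files, but it cannot say who wrote it first or under which license. That missing information is exactly what a provenance note supplies.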
How?
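Add information such as the origin of the snippet (for example, its URL), the author or copyright holder, the applicable license, and the date on which it was retrieved (and by whom).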
Whenever possible, place this information as close as possible to the header of the method (function, procedure or block of code) where the snippet is applied. This makes it very clear that part of the code derives from another origin, and it identifies where inside the source code this happened (in case we need to remove it one day).
I recommend that you apply a consistent way of marking snippets in your code. For example, in Java it is possible to use tags in the comment header, which makes such snippets easier to identify in the future.
It is not always possible, but try to determine which license is applicable. For example, contributions on Stack Overflow are published under a Creative Commons license (CC BY-SA). To describe the license in a consistent manner, consider using the identifiers from the SPDX license list.
/**
 * When given an input, replace all http references on text with HTML link
 * representations.
 *
 * @param text the plain text containing the links to be converted
 * @return formatted HTML text, links are now clickable
 * @origin http://stackoverflow.com/questions/1909534/java-replacing-text-url-with-clickable-html-link
 * @copyright Paul Croarkin - http://stackoverflow.com/users/18995/paul-croarkin
 * @license CC-BY-SA-3.0
 * @retrieved 2013-12-14 by Nuno Brito
 */
public static String textToHtmlConvertingURLsToLinks(String text) {
    if (text == null) {
        return text;
    }
    return text.replaceAll("(\\A|\\s)((http|https|ftp|mailto):\\S+)(\\s|\\z)",
            "$1<a href=\"$2\">$2</a>$4");
}
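One benefit of consistent tags like the ones above is that they can be collected mechanically later on. The following is a minimal sketch of such a scanner, assuming the four tag names used in the example (`@origin`, `@copyright`, `@license`, `@retrieved`). The class name, the method name and the example.org URL are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProvenanceTagScanner {

    // Matches comment lines such as " * @origin http://example.org/page".
    // Only the four tag names from the example above are recognised (an assumption).
    private static final Pattern TAG_LINE = Pattern.compile(
            "^[ \\t]*\\*?[ \\t]*@(origin|copyright|license|retrieved)[ \\t]+(.+?)[ \\t]*$",
            Pattern.MULTILINE);

    /** Returns the first value of each provenance tag found in the source text, in order of appearance. */
    public static Map<String, String> extractTags(String source) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher matcher = TAG_LINE.matcher(source);
        while (matcher.find()) {
            tags.putIfAbsent(matcher.group(1), matcher.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "/**",
                " * @origin http://example.org/some-answer",
                " * @license CC-BY-SA-3.0",
                " * @retrieved 2013-12-14 by Nuno Brito",
                " */",
                "public static String helper(String text) { return text; }");
        // Print each tag with its value, one per line.
        extractTags(source).forEach((tag, value) -> System.out.println(tag + " = " + value));
    }
}
```

A scanner like this only works when the tags are written the same way everywhere, which is the practical reason for picking one convention and sticking to it.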
Using code from snippets is not a bad thing in itself. The bad part is having snippets that are not identified, because they force everyone to invest a substantial amount of effort in reconstructing their provenance, and only rarely do we reach a good level of certainty. So please mark your snippets to ensure that no misunderstandings happen.