Conservancy Blog
Displaying posts tagged licensing
Fighting for the right to repair your electronics - we need your help
by
on May 2, 2022
Defending your right to modify and repair the software on your electronics has been a cornerstone of Software Freedom Conservancy since its inception. We defend these rights in a variety of ways: petitioning the Copyright Office to return our repair and modification rights, investigating reports people send us where companies are using our member projects' code but aren't providing the source or repair and modification information that the project's license requires, contacting those companies to remind them of the license requirements, and (eventually, in rare cases after companies ignore our gentle reminders for many months) filing lawsuits against intransigent companies who refuse to give you the complete source and instructions you deserve (and that they are required to provide by the licenses of the software they freely choose to use).
In the rare cases where Software Freedom Conservancy has been forced to move its enforcement actions from gentle reminders to filing lawsuits, we have used a variety of approaches. Our lawsuit filed in 2007 against several manufacturers (such as Westinghouse) used copyright law (specifically copyrights in the BusyBox project) to compel those manufacturers to comply with the GPL. The lawsuit we filed last year against Vizio takes an approach more appropriate for widely marketed and available consumer devices. Namely, the claim in Vizio is a contract claim for third-party beneficiary rights under the GPL, which will allow us (and all other customers who bought Vizio TVs) to receive the repair and modification instructions to the software more directly.
Since we began enforcing the GPL fifteen years ago, the landscape of GPL violations has deteriorated: GPL'd software now appears in nearly every consumer device smarter than a toaster, and very rarely do the manufacturers even bother to offer source code to users, and almost never does the source release meet the requirements of the GPL. As a result, we at Software Freedom Conservancy continue to dedicate more time and resources to our enforcement efforts. We seek to ensure that the situation does not get even worse, and we believe that we can improve it.
The best approach, in our view, is to continue to bring a variety of different types of actions against intransigent violators. As always, we use litigation and litigation-like means as a last resort, but we've reached that point with dozens of companies. There are a variety of types of actions we could take and lawsuits that we could bring, and different ways we can go about preparing for them. But, to have the full scope of options, we need your help.
As a contributor to copyleft projects, one way that you can help us right now is to assign the copyrights of your software freedom works to Software Freedom Conservancy. As the Vizio suit shows, copyright-based claims will not be the sole focus of our enforcement. However, there are some key types of products where copyright claims are ideal. By assigning your copyrights to us, you can give us the ability to stand up for your software freedom and rights and, more importantly, the rights of your users. While we understand the FOSS community has some aversions to copyright assignment, we also know that, right now, many developers automatically assign their copyrights to their employers without demanding that their employers stand up for the copyleft rights of their users. We ask the community to reconsider this common practice. We request that those who haven't already assigned copyright to their employer assign their copyrights to us, and we urge those who have entered work-for-hire arrangements with employers to ask those employers to give them back their copyrights immediately. (See our ContractPatch project for more information on how to do this.)
Today, we launch our self-service Copyright Assignment form. This new form, carefully vetted by our lawyers, allows you to quickly and easily assign your rights in your code, documentation, and other copyrightable works to Software Freedom Conservancy. We will use these copyrights to ensure companies follow the copyleft licenses that they use. You can assign copyrights for projects that are not members of Software Freedom Conservancy too. We will always enforce them in accordance with our Principles, and we will welcome you onto an internal mailing list and regular meetings to discuss our enforcement efforts.
Through the various software freedom lawsuits we have filed over the years, along with the lawsuits we've helped fund, Software Freedom Conservancy has established a track record of tangible enforcement actions.
We are very happy for all the support we've received from software freedom activists, developers, and other community members over the years in our software freedom enforcement actions. We hope you will continue to support us, and encourage others to do so, in whatever ways you can and, if it makes sense for you, by assigning your software freedom works to us so we can ensure the repairability of your electronics (and everyone else's!) going forward.
If Software is My Copilot, Who Programmed My Software?
by
on February 3, 2022
Software freedom is our goal. Copyleft is a strategy to reach that goal. That tenet is oft forgotten by activists. Copyleft is even abused to advance proprietary goals. We too often see concern about the future of copyleft overshadow the necessary fundamental question: does a particular behavior or trend (and the inevitable outcomes of those behaviors and trends) increase or decrease users’ rights to copy, share, modify, and reinstall modified versions of their software? That question remains paramount as we face new challenges.
Introduced first by Microsoft’s GitHub in their Copilot product, computer-assisted software authorship by way of machine learning models presents a formidable challenge to software freedom’s future. Yet, we can, in fact, imagine a software freedom utopia that embodies this technology. Imagine that all software authors have access to the global archive of machine learning models, and that those models are fully reproducible. Everyone has equal rights to fork these models and train them further with their own datasets, provided that they release new models (and the input code) freely in the global archive. All code produced by these models is also made freely available under copyleft. All code that builds the models, all historical input sets, and all trained models are also made available to everyone under copyleft licenses.
While activists might quibble about minor details to optimize this imagined utopia, this thought experiment shows computer-assisted software authorship does not inherently negate software freedom. Rather, the rules, requirements, and policies that apply will determine whether software freedom is respected. To paraphrase Hamlet: there is nothing either good or bad, but the policy makes it so.
What’s the Worst That Could Happen?
[They are] not a good [person] who, without a protest, allows wrong to be committed … with the means which [they] help to supply.
John Stuart Mill, University of St. Andrews, 1 February 1867
Obviously, ignoring machine learning for computer-assisted software authorship will not usher in this software freedom utopia. Copyleft activists cannot stand idly by in this situation, but we must temper our attention by considering the likelihood of dystopian and problematic outcomes, and the options available to prevent them.
In response to Copilot’s announcement, pundits voiced, without evidence, a prevailing feeling of “Free Software had a good run, but I guess that’s over now”. Such predictions seem consistent with the well-documented overoptimism about artificial intelligence’s success. Rapid replacement of traditional software development methodologies seems unlikely. As such, we should not overestimate the likelihood that these new systems will accelerate proprietary software development while we simultaneously fail to prevent copylefted software from enabling that activity. The former may not come to pass, so we should not unduly fret about the latter, lest we misdirect resources. In short, AI is usually slow-moving, and produces incremental change far more often than it produces radical change. The problem is thus not imminent, nor is the damage irreversible. However, we must respond deliberately with all due celerity, and begin that work immediately.
Currently, there are two factors that influence the timing of our response. First, if GitHub’s Copilot becomes a non-beta product available to the programming public, that would indicate the necessity of an urgent response. Microsoft and GitHub are unlikely to share their product plans, so we cannot know for sure when this will occur. However, in the seven months since the first beta was made available, we’ve consistently heard anecdotally that more and more developers (particularly, FOSS developers!) have received beta invitations. Based on these (admittedly incomplete) facts, we must assume that a move from private beta to public deployment is imminent in 2022. This indicates some urgency of the problem.
Second, we already know that some of our worst fears are definitely true. Namely, that Microsoft and GitHub used copylefted software as part of Copilot’s training set.
Copilot was trained on “billions of lines of public code … written by others”. While GitHub has refused requests to release even a list of repositories included in the training set, the use of the word “public” indicates that only software with source-available licenses (even if not FOSS licenses) was input into Copilot. Furthermore, GitHub admits that during training, the system encountered a copy of the GPL more than 700,000 times. This effectively confirms that copylefted public code appears in the training set.
When questioned, former GNOME developer and GitHub CEO[0], Nat Friedman, declared publicly “(1) training ML systems on public data is fair use (2) the output belongs to the operator”. Friedman himself, as well as Microsoft and GitHub’s other executives and lawyers, have ignored Software Freedom Conservancy’s requests for clarification and/or evidence supporting these statements.
Meanwhile, GitHub continues to improve this system, trained only on publicly source-available software, and seeks to market it to new users, including those who otherwise use FOSS development tools. Users continue to report gaining access to the beta and are noticing improvements. Microsoft and GitHub’s public position is meanwhile clear: they claim to have no copyleft obligations for training the model, the model itself, and deploying the service. They also believe there are no licensing obligations for the output.
While Friedman ignored the community’s requests publicly, we inquired privately with Friedman[0] and other Microsoft and GitHub representatives in June 2021, asking for solid legal references for GitHub’s public legal positions of (1) and (2) above. They provided none, and reiterated, without evidence, that they believed the model does not contain copies of the software, and output produced by Copilot can be licensed under any license. We further asked: if there are no licensing concerns on either side, why did Microsoft not also train the system on its large proprietary codebases, such as Office? They had no immediate answer. Microsoft and GitHub promised to get back to us, but have not.
This secrecy and non-cooperativeness is expected from a proprietary software company and its subsidiary, but leaves us only with speculative conclusions to inform a strategy for copyleft here. We can reliably guess that the companies will claim “fair use” as their primary justification for creating the model and offering the service, and will argue that both the output and the trained model are not “work[s] based on the Program” (GPLv2) nor do they “copy from or adapt all or part of the work[s] in a fashion requiring copyright permission” (GPLv3/AGPLv3). Furthermore, we can reliably conclude, given the continuing product promotion, that the companies have at least a medium-term commitment to Copilot.
In short, they have already hunkered down for a protracted disagreement. Their positions are now incumbent: they are using their resources and power to successfully charge copyleft activists to “prove them wrong”. But we do not have to accept their unsubstantiated arguments at face value. In fact, these areas are so substantially novel that almost no issue has a definitive answer, but we must nevertheless begin to formulate our position and our response to Microsoft and GitHub’s assault on copyleft.
Trained Models, Fair Use, and Copyright Infringement
Consider GitHub’s claim that “training ML systems on public data is fair use”. We have not found any case of note — at least in the USA — that truly contemplates that question. The only legal case in the USA to look near this question is Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015). The Supreme Court denied certiorari on this case; it is not legal precedent in all jurisdictions where Microsoft and GitHub operate.
Even more, that case considered a fact pattern centered around search, not authorship of new/derived works. Google had made copies of entire copyrighted books, not for the purpose of displaying them, but so users could (1) run search queries, and (2) see a “snippet” of the search hits (i.e., to see the search hit in context). The Second Circuit held Google’s copying of the books was “fair use” because searching and providing context added value exceeding what a user could obtain from their own copies, and Google’s product did not serve as a market substitute for the books.
The analogous fact pattern for code is obvious: GitHub could offer a search tool that assists users in finding key public repositories (and specific lines of code within those repositories) that seemed to solve tasks of interest. Developers could then easily utilize those codebases in the usual, license-compliant ways. The actual Copilot fact pattern is not this one.
Meanwhile, the Authors Guild case begins and ends the list of major cases regarding machine learning systems and “fair use”. We should simply ignore GitHub’s risible claim that the “fair use question” on machine learning is settled.
Perhaps most importantly, in the USA, “fair use” is an affirmative defense to answer copyright infringement. In concrete terms, that means (particularly in cases where the circumstances are novel) that a copyright holder brings an infringement lawsuit and then the alleged infringer shows in court that their actions met the relevant factors for “fair use” sufficiently. Frankly, we refuse to do these companies’ job for them. Copyleft activists need not tell Microsoft and GitHub why this isn’t “fair use”; rather, they need to tell us why training the model with copylefted code is “fair use” and prove that the trained model itself is not a “work based on” the GPL’d software.
GitHub has meanwhile artfully avoided the question of whether the trained model is a “work based on” the input. We contend that it probably is. However, given that “fair use” is an affirmative defense to copyright infringement, they are obviously anticipating a claim that the trained model is, in fact, a “work based on” the inputs to the model. Why else would they even bring up “fair use”, rather than simply say their use is fully non-infringing? Anyway, we have no way to even explore these questions authoritatively without examining the model, fully affixed in its tangible medium. We don’t expect GitHub to produce that unless compelled by a third party.
Indeed, discussion of these questions outside of a courtroom is moot. For this novel and contentious fact pattern, only a court decision can settle the matter adequately. As a strategic matter, copyleft activists should keep their own counsel about what we anticipate in the opposition’s “fair use” and/or non-infringement defenses, and the counter-arguments that we plan.
Copilot Users Should Worry
GitHub’s position does a great disservice to Copilot users. Their claim that “the output belongs to the operator” creates a false sense of legal justification. Users have already shown that Copilot can generate a substantial amount of unique, GPL’d code, and then (rather ironically, given GitHub’s claim that they removed the text of the GPL from the training set) also suggest a license that is non-copyleft. Friedman’s statement surely does not qualify as an indemnity for Copilot users who might face GPL enforcement actions. Users almost surely must construct their own “fair use” or “not copyrightable” defenses for Copilot’s output.
The length and detail of what Copilot can generate for users seems unbounded. The glaring example above appears prima facie to be copyright infringement; we expect further such problems. Consider the sheer amount that a fully functional and successful Copilot would generate. Surely, AI researchers seek the ability for Copilot to “figure out” that you are trying to solve some specific task when programming. The better Copilot gets at handing ready-made solutions to its users, the more likely it becomes that its output may offer the user copylefted software.
Copilot leaves copyleft compliance as an exercise for the user. Users likely face growing liability that only increases as Copilot improves. Users currently have no methods besides serendipity and educated guesses to know whether Copilot’s output is copyrighted by someone else. Proprietary software companies such as Synopsys provide so-called “scanning tools” that can search your proprietary codebase and find hidden copylefted software. However, the FOSS tools for that job are in their infancy and unlikely to develop quickly, since historically those who want those tools are companies that primarily develop proprietary software and seek to avoid copylefted software.
We recommend users who wish to avoid infringing the copyrights of others simply avoid Copilot.
On Copyleft Maximalism and Unilateral Capitulation
Draconian copyright law generally horrifies software freedom activists for good reason. Nearly all copyleft activists would prefer a true, multilateral rewriting of copyright rules that prioritized the interest of the general public and software rights. Copyleft exists primarily because of the long-standing political non-viability of a copyright law reboot. Nothing has changed in this regard; if anything, changing legislation has become an even more expensive lobbying proposition than it was at copyleft’s advent. Copyleft activists should expect, indefinitely, for proprietary software companies and media oligarchs to control copyright legislation.
Fortunately, copyleft was designed specifically for this eventuality. Activists have called copyleft the “judo move” of software freedom, since copyleft uses the powerful copyright force (invented primarily by our opposition) against itself. That realization leads to a painful, but pragmatically necessary, awkwardness.
The issues herein (from training of machine learning models, to the copyright questions about those models, to the derivation questions about their output) are novel copyright questions. As software freedom activists, we are uniquely qualified to invent an ideal copyright structure for these technologies. But, without a path to promulgate such replacement copyright rules into the incumbent system, that exercise is futile. Furthermore, systems outside of copyright, including but not limited to EULAs, business agreements and patents, have long been used to proprietarize software without the need of copyright. The reality of the facts on the ground dictates that we not concede the only wedge we have to compel software freedom; that wedge is copyleft.
Meanwhile, proprietary software companies regularly exploit any unilateral concessions on weakening of copyleft that FOSS projects make, while continuing to pursue copyright maximalism for their works. Particularly in novel areas, we must assume a copyleft maximalist approach — until courts or the legislature disarm all mechanisms to control users’ rights with regard to software. That adversarial process will frustrate us, but ultimately by choosing copyright as our primary tool, we already chose the courts as our battleground for contentious issues.
We all surely have our opinions about how copyleft should operate in these novel situations. We have even expressed some such opinions herein. But, ultimately, strong copyleft licenses do not defer the “what’s covered?” question to one individual or organization. The “judo” power comes from strong copyleft reaching to all of what copyright governs. When those issues are novel — and companies flaunt that novel manipulation of copylefted works — only a court can answer definitively.
A Community-Led Response
While these companies will likely not succeed in their efforts to disarm copyleft, they have nevertheless attacked the entire copyleft infrastructure. We must mount an effective response.
Software Freedom Conservancy has spent the last six months in deep internal discussions about this novel threat to the very efficacy of copyleft. We have a few ideas: a mix of short-term, medium-term and long-term strategies to address the problem. However, we recognize that a community (rather than the traditional BDFL) approach is needed, at least for this problem. Thus, putting first things first, we realized that we should gather the best minds in the software freedom community with direct experience in copyleft theory and practice. We will convene these individuals to a committee specifically chartered by Software Freedom Conservancy to publish, as quickly as reasonably possible, a series of recommendations to the community on how we should respond to the immediate threat to copyleft found in Copilot, and to analyze (long-term) the more general threat that AI-assisted programming techniques pose to the strategy of copyleft.
While we are not actively seeking applications for this committee, we do welcome anyone whom we have not yet solicited to participate to contact us and inquire. We will surely be unable to include everyone who is interested on the committee — either due to Conflicts of Interest or due to simple logistics of creating too large a committee. However, we will carefully consider anyone who expresses bona fide interest to participate.
Finally, to the extent possible during the pandemic using the FOSS tools available, we will attempt to convene public discussions. We will publish the committee’s minutes contemporaneously. If you’d like to get involved today in public discussions about this issue, please join the mailing list we launched today for this topic.
[0] In November 2021, Nat Friedman was replaced by Thomas Dohmke as GitHub’s CEO. However, to our knowledge, Dohmke has not retracted or clarified Friedman's comments, and at the time of writing, no one from GitHub or Microsoft that we spoke to had responded to our requests for clarification.
First Update on the Vizio lawsuit
by
on November 30, 2021
Yesterday, we received from Vizio their first official response in our pending litigation against Vizio for their copyleft license violations. So, what was their response?
Did Vizio release the source code, as the GPL and LGPL require, for the modified versions of Linux, alsa-utils, GNU bash, GNU awk, BusyBox, dmesg, findutils, dmsetup, GNU tar, mount and selinux found in their TVs’ firmware? No.
Did Vizio propose a CCS (complete, corresponding source) candidate for us to review and give feedback on, so that we could help them get consumers who bought their TVs the source code they deserve? Nope.
Did Vizio argue that we had erred, and in fact, none of those programs we list above appear in their firmware? Not that either. (Unlikely though — after all, they surely know those programs are in their firmware!)
Instead, Vizio filed a request to “remove” the case from California State Court (into US federal court), which indicates Vizio's belief that consumers have no third-party beneficiary rights under copyleft! In other words, Vizio’s answer to this complaint is not to comply with the copyleft licenses, but instead to imply that Software Freedom Conservancy (and all other purchasers of the devices who might want to assert their right under GPL and LGPL to complete, corresponding source) have no right to even ask for that source code.
That’s right: Vizio’s filing implies that only copyright holders, and no one else, have a right to ask for source code under the GPL and LGPL. While we expected Vizio held this position (since they ultimately ignored us during our discussions with them in years past), Vizio has gone a disturbing step further and asked the federal United States District Court for the Central District of California to agree to the idea that not only do you as a consumer have no right to ask for source code, but that Californians have no right to even ask their state courts to consider the question!
Vizio’s strategy is to deny consumers their rights under copyleft licenses, and we intend to fight back.
We believe in complete transparency of the copyleft compliance process, and so encourage everyone to read the filings. We’ve even paid the PACER fees and used the RECAP browser plugin, so that all the documents in the case are freely available via the RECAP project archives.
Software Freedom Conservancy’s annual fundraiser is happening right now! Please help us continue our work by becoming a Sustainer. Donate now and have your donation matched by a group of generous individuals who care deeply about software freedom.
Trump's Social Media Platform and the Affero General Public License (of Mastodon)
by
on October 21, 2021
An analysis: Trump's Group has 30 days to remedy the violation, or their rights in the software are permanently terminated
In 2002, we used phrases like “Web 2.0” and “AJAX” to describe the revolution that was happening in web technology for average consumers. This was just before names like Twitter and Facebook became famous worldwide. Web 2.0 was the groundwork infrastructure of the “social media” to come.
As software policy folks, my colleagues and I knew that these technologies were catalysts for change. Software applications, traditionally purchased on media and installed explicitly, were now implicitly installed through web browsers — delivered automatically, or even sometimes run on the user's behalf on someone else's computer. As copyleft activists specifically, we knew that copyleft licensing would have to adjust, too.
In late 2001, I sat and read and reread section 2(c) of the GPLv2. After much thought, I saw how it could be adapted, using the geeky computer science concept called a quine — a program that has a feature to print its own source code for the user. A similar section to GPLv2§2(c) could be written that would assure that every user of a copylefted program on the Internet would be guaranteed the rights and freedoms to copy, modify, redistribute and/or reinstall their software — which was done by offering a source-code provision feature to every user on the network. The key concept behind the Affero GPL (AGPL) version 1 was born. Others drafted and released AGPLv1 based on my idea. Five years later, I was proudly in the “room where it happened” when Affero GPL version 3 was drafted. Some of the words in that section are ones I suggested.
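For readers unfamiliar with quines, here is a minimal sketch in Python. It only illustrates the computer science concept and is not code from any AGPL'd program; the names TEMPLATE and own_source are hypothetical. The comments are annotations: the self-reproducing core is the TEMPLATE assignment plus the function definition.

```python
# Illustrative quine sketch (hypothetical names; not from any real project).
# TEMPLATE holds the program text with a %r placeholder where the template's
# own quoted form belongs; "%%" stands for a literal percent sign.
TEMPLATE = 'TEMPLATE = %r\n\ndef own_source():\n    return TEMPLATE %% TEMPLATE\n'


def own_source():
    # Fill the placeholder with repr(TEMPLATE) to rebuild the program text.
    return TEMPLATE % TEMPLATE


if __name__ == "__main__":
    # Print the comment-free core of this program.
    print(own_source(), end="")
```

Running the printed text as a program yields a function that returns that same text again, which is the fixed-point property that makes it a quine. The network provision described above is analogous in spirit: the software itself offers its source to each user who interacts with it.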
We were imagining a lot about the future in those days; the task of copyleft license drafting requires trying to foresee how others might attempt to curtail the software rights and freedoms of others. Predicting the future is difficult and error-prone. Today, a piece of Affero GPLv3's future came to pass that I would not have predicted back in November 2007 at its release.
I invented that network source code disclosure provision of the AGPL — the copyleft license later applied to the Mastodon software — in 2002 in light of that very problem: parties who don't share our values might use (or even contribute to) software written by the FOSS community. The license purposefully treats everyone equally (even people we don't like or agree with), but they must operate under the same rules of the copyleft licenses that apply to everyone else.
Today, we saw the Trump Media and Technology Group ignoring those important rules — which were designed for the social good. Once caught in the act, Trump's Group scrambled and took the site down.
Early evidence strongly supports that Trump's Group publicly launched a so-called “test site” of their “Truth Social” product, based on the AGPLv3'd Mastodon software platform. Many users were able to create accounts and use it, briefly. However, when you put any site on the Internet that runs software licensed under AGPLv3, the AGPLv3 requires that you provide (to every user) an opportunity to receive the entire Corresponding Source for the website based on that code. These early users did not receive that source code, and Trump's Group is currently ignoring their very public requests for it. To comply with this important FOSS license, Trump's Group needs to immediately make that Corresponding Source available to all who used the site today while it was live. If they fail to do this within 30 days, their rights and permissions in the software are automatically and permanently terminated. That's how AGPLv3's cure provision works, with no exceptions, even if you're a real estate mogul, reality television star, or even a former POTUS.
My colleagues at Software Freedom Conservancy and I are experts at investigating non-compliance with copyleft licenses and enforcing those licenses once we confirm the violations. We will be following this issue very closely and insisting that Trump's Group give the Corresponding Source to all who use the site.
Finally, it's worth noting that we could find no evidence that someone illegally broke into the website. All the evidence available on the Internet (as of 2021-10-22) indicates that the site was simply deployed live early as a test, and without proper configuration (such as pre-reserving some account names). Once discovered, people merely used the site legitimately to register accounts and use its features.
Update (2021-10-22): Some have asked us how this situation relates to our Principles of Community-Oriented GPL Enforcement, since we are publicly analyzing a copyleft violation. Historically, we did similarly with the Canonical, Ltd., Cambium, Ubiquiti, and Tesla (twice!) violations. We do believe that “confidentiality can increase receptiveness and responsiveness”, but once a story is already made widely known to the public by a third party, confidentiality is no longer possible, since the public already knows the details. At that moment, the need to educate the public supersedes any value in non-disclosure.