Conservancy Blog

Displaying posts tagged licensing

Give Up GitHub: The Time Has Come!

by Bradley M. Kühn and Denver Gingerich on June 30, 2022

Those who forget history often inadvertently repeat it. Some of us recall that twenty-one years ago, the most popular code hosting site, a fully Free and Open Source (FOSS) site called SourceForge, proprietarized all their code — never to make it FOSS again. Major FOSS projects slowly left SourceForge since it was now, itself, a proprietary system, and antithetical to FOSS. FOSS communities learned that it was a mistake to allow a for-profit, proprietary software company to become the dominant FOSS collaborative development site. SourceForge slowly collapsed after the DotCom crash, and today, SourceForge still refuses to solve these problems⁰. We learned a valuable lesson that was a bit too easy to forget — especially when corporate involvement manipulates FOSS communities to its own ends. We now must learn the SourceForge lesson again with Microsoft's GitHub.

GitHub has, in the last ten years, risen to dominate FOSS development. They did this by building a user interface and adding social interaction features to the existing Git technology. (For its part, Git was designed specifically to make software development distributed without a centralized site.) In the central irony, GitHub succeeded where SourceForge failed: they have convinced us to promote and even aid in the creation of a proprietary system that exploits FOSS. GitHub profits from those proprietary products (sometimes from customers who use it for problematic activities). Specifically, GitHub profits primarily from those who wish to use GitHub tools for in-house proprietary software development. Yet, GitHub comes out again and again seeming like a good actor — because they point to their largess in providing services to so many FOSS endeavors. But we've learned from the many gratis offerings in Big Tech: if you aren't the customer, you're the product. The FOSS development methodology is GitHub's product, which they've proprietarized and repackaged with our active (if often unwitting) help.

FOSS developers have been for too long the proverbial frog in slowly boiling water. GitHub's behavior has gotten progressively worse, and we've excused, ignored, or otherwise acquiesced to cognitive dissonance. We at Software Freedom Conservancy have ourselves been part of the problem; until recently, even we'd become too comfortable, complacent, and complicit with GitHub. Giving up GitHub will require work, sacrifice and may take a long time, even for us: we at Software Freedom Conservancy historically self-hosted our primary Git repositories, but we did use GitHub as a mirror. We urged our member projects and community members to avoid GitHub (and all proprietary software development services and infrastructure), but this was not enough. Today, we take a stronger stance. We are ending all our own uses of GitHub, and announcing a long-term plan to assist FOSS projects to migrate away from GitHub. While we will not mandate our existing member projects to move at this time, we will no longer accept new member projects that do not have a long-term plan to migrate away from GitHub. We will provide resources to support any of our member projects that choose to migrate, and help them however we can.

There are so many good reasons to give up on GitHub, and we list the major ones on our Give Up On GitHub site. We were already considering this action ourselves for some time, but last week's event showed that this action is overdue.

Specifically, we at Software Freedom Conservancy have been actively communicating with Microsoft and their GitHub subsidiary about our concerns with “Copilot” since they first launched it almost exactly a year ago. Our initial video chat call (in July 2021) with Microsoft and GitHub representatives resulted in several questions which they said they could not answer at that time, but would “answer soon”. After six months of no response, Bradley published his essay, If Software is My Copilot, Who Programmed My Software? — which raised these questions publicly. Still, GitHub did not answer our questions. Three weeks later, we launched a committee of experts to consider the moral implications of AI-assisted software, along with a parallel public discussion. We invited Microsoft and GitHub representives to the public discussion, and they ignored our invitation. Last week, after we reminded GitHub of (a) the pending questions that we'd waited a year for them to answer and (b) of their refusal to join public discussion on the topic, they responded a week later, saying they would not join any public nor private discussion on this matter because “a broader conversation [about the ethics of AI-assisted software] seemed unlikely to alter your [SFC's] stance, which is why we [GitHub] have not responded to your [SFC's] detailed questions”. In other words, GitHub's final position on Copilot is: if you disagree with GitHub about policy matters related to Copilot, then you don't deserve a reply from Microsoft or GitHub. They only will bother to reply if they think they can immediately change your policy position to theirs. But, Microsoft and GitHub will leave you hanging for a year before they'll tell you that!

Nevertheless, we were previously content to leave all this low on the priority list — after all, for its first year of existence, Copilot appeared to be more research prototype than product. Facts changed last week when GitHub announced Copilot as a commercial, for-profit product. Launching a for-profit product that disrespects the FOSS community in the way Copilot does simply makes the weight of GitHub's bad behavior too much to bear.

Our three primary questions for Microsoft/GitHub (i.e., the questions they had been promising answers to us for a year, and that they now formally refused to answer) regarding Copilot were:

What case law, if any, did you rely on in Microsoft & GitHub's public claim, stated by GitHub's (then) CEO, that: “(1) training ML systems on public data is fair use, (2) the output belongs to the operator, just like with a compiler”? In the interest of transparency and respect to the FOSS community, please also provide the community with your full legal analysis on why you believe that these statements are true.
We think that we can now take Microsoft and GitHub's refusal to answer as an answer of its own: they obviously stand by their former CEO's statement (the only one they've made on the subject), and simply refuse to justify their unsupported legal theory to the community with actual legal analysis.
If it is, as you claim, permissible to train the model (and allow users to generate code based on that model) on any code whatsoever and not be bound by any licensing terms, why did you choose to only train Copilot's model on FOSS? For example, why are your Microsoft Windows and Office codebases not in your training set?
Microsoft and GitHub's refusal to answer also hints at the real answer to this question, too: While GitHub gladly exploits FOSS inappropriately, they value their own “intellectual property” much more highly than FOSS, and are content to ignore and erode the rights of FOSS users but not their own.
Can you provide a list of licenses, including names of copyright holders and/or names of Git repositories, that were in the training set used for Copilot? If not, why are you withholding this information from the community?
We can only wildly speculate as to why they refuse to answer this question. However, good science practices would mean that they could answer that question in any event. (Good scientists take careful notes about the exact inputs to their experiments.) Since GitHub refuses to answer, our best guess is that they don't have the ability to carefully reproduce their resulting model, so they don't actually know the answer to whose copyrights they infringed and when and how.

As a result of GitHub's bad actions, today we call on all FOSS developers to leave GitHub. We acknowledge that answering that call requires sacrifice and great inconvenience, and will take much time to accomplish. Yet, refusing GitHub's services is the primary power developers have to send a strong message to GitHub and Microsoft about their bad behavior. GitHub's business model has always been “proprietary vendor lock-in”. That's the very behavior FOSS was founded to curtail, and it's why quitting incumbent proprietary software in favor of a FOSS solution is often difficult. But remember: GitHub needs FOSS projects to use their proprietary infrastructure more than we need their proprietary infrastructure. Alternatives exist, albeit with less familiar interfaces and on less popular websites — but we can also help improve those alternatives. And, if you join us, you will not be alone. We've launched a website, GiveUpGitHub.org, where we'll provide tips, ideas, methods, tools and support to those that wish to leave GitHub with us. Watch that site and our blog throughout 2022 (and beyond!) for more.

Most importantly, we are committed to offering alternatives to projects that don't yet have another place to go. We will be announcing more hosting instance options, and a guide for replacing GitHub services in the coming weeks. If you're ready to take on the challenge now and give up GitHub today, we note that CodeBerg, which is based on Gitea implements many (although not all) of GitHub. Thus, we're also going to work on even more solutions, continue to vet other FOSS options, and publish and/or curate guides on (for example) how to deploy a self-hosted instance of the GitLab Community Edition.

Meanwhile, the work of our committee continues to carefully study the general question of AI-assisted software development tools. One recent preliminary finding was that AI-assisted software development tools can be constructed in a way that by-default respects FOSS licenses. We will continue to support the committee as they explore that idea further, and, with their help, we are actively monitoring this novel area of research. While Microsoft's GitHub was the first mover in this area, by way of comparison, early reports suggest that Amazon's new CodeWhisperer system (also launched last week) seeks to provide proper attribution and licensing information for code suggestions¹.

This harkens to long-standing problems with GitHub, and the central reason why we must together give up on GitHub. We've seen with Copilot, with GitHub's core hosting service, and in nearly every area of endeavor, GitHub's behavior is substantially worse than that of their peers. We don't believe Amazon, Atlassian, GitLab, or any other for-profit hoster are perfect actors. However, a relative comparison of GitHub's behavior to those of its peers shows that GitHub's behavior is much worse. GitHub also has a record of ignoring, dismissing and/or belittling community complaints on so many issues, that we must urge all FOSS developers to leave GitHub as soon as they can. Please, join us in our efforts to return to a world where FOSS is developed using FOSS.

We expect this particular blog post will generate a lot of discussion. We welcome you to interact with SFC staff on our public mailing list about this effort.

Footnotes

⁰SourceForge is now built as a (apparently proprietary) fork of a different FOSS system (called Allura). SourceForge's CEO ignored our multiple inquiries asking if SourceForge really is running upstream Allura (i.e., has no proprietary modifications), and our repeated requests for a link that explains how a project can leave SourceForge for self-hosted Allura. The responses from SourceForge management were quite similar to those received since 2001 — when they first went proprietary.

¹However, we have not analyzed CodeWhisperer in depth so we cannot say for sure if Amazon's implementation is compliant with the respective licenses. Nevertheless, Amazon's behavior here shows sharp contrast with Microsoft's GitHub: Amazon acknowledges the obvious fact that there are license obligations that deserve attention and care when building AI-assisted programming solutions.

[permalink]

Tags: conservancy, GPL, Git, licensing, FOSS Sustainability

A Federal Hearing about Rights under GPL

by Bradley M. Kühn on May 11, 2022

Possible Opportunity for the Public To Hear Oral Arguments in Key GPL Enforcement Case

In our previous update regarding our copyleft enforcement lawsuit against Vizio, we talked about how Vizio “removed” the case to USA federal court (namely, the Central District of California), and how we filed a motion to “remand” the case back to state court. While this all seems like minor legal wrangling early in a case, this very first skirmish in our case goes to the very heart of the right for software repair for consumers. While it won't be a final decision in the case, this motion will be the first indication whether the federal courts view the GPL as purely a copyright license, or as a contract, or as both. That question has been central to legal debate about the GPL for decades, and, thanks to our case, for the first time, a federal Court will directly consider this question.

Our view (and the view of many attorneys whose opinions we trust) and which is supported by substantial case law, is that the GPL functions as both a copyright license and a contract, and that third parties who receive distribution of GPL'd (and LGPL'd) software are third-party beneficiaries. We've done both copyright-based and contract-based enforcement, and both have their advantages. Contract-based enforcement as a third-party has advantages that are central to the GPL's policy goals. Consumers are the first to discover violations in the first place. Consumers are the most likely to utilize complete, corresponding source code (CCS) to enhance their use of the products they have purchased. Third-party, contractual based enforcement gives consumers legal authority when they ask companies for access to the source code that should be available to them. In other words, this approach gives consumers the ability to ask the Court directly for the most important thing that copyleft assures: a right to receive the CCS and “the scripts used to control compilation and installation of the executable”. Indeed, in our suit we have asked only for access to the source code, not for any money.

Our case now is the first of its kind to adjudicate the third-party beneficiary contractual theory. We are excited that a federal district Court is poised to give its first answer to the central question to this endeavor, namely: “Are the GPL and LGPL merely copyright licenses, and thus preempted and only subject matter for the US federal courts, or can a third-party bring a contract claim in state court?” If this question intrigues you, we encourage you to read our motion for remand, Vizio's reply to that motion and our rebuttal reply.

Most importantly, clear your calendar for this Friday 13 May 2022 at 10:30 US/Pacific! While Judge Staton may chose to rule on this motion strictly based on those paper filings, the judge has scheduled a hearing for that date and time. What's more, anyone in the world can attend this hearing to listen! Instructions for how to attend are found on Judge Staton's website ⁰.

While, as FOSS activists, we're very sad that the Judge has chosen to use a proprietary videochat platform, we're glad that PSTN dial-in is provided, and we'll be dialing in and encourage you to do so as well. Watch our microblog for live updates!

⁰ Please take careful note of the warning on the Judge's website: Recording, copying, photographing and rebroadcasting of court proceedings is prohibited by federal law. Remember: you can take as many notes as you like, and even live blog/microblog what you hear, but take great care to follow the directives on Judge Staton's website.

[permalink]

Tags: conservancy, GPL, licensing

Fighting for the right to repair your electronics - we need your help

by Denver Gingerich on May 2, 2022

Defending your right to modify and repair the software on your electronics has been a cornerstone of Software Freedom Conservancy since its inception. We defend these rights in a variety of ways: petitioning the Copyright Office to return our repair and modification rights, investigating reports people send us where companies are using our member projects' code but aren't providing the source or repair and modification information that the project's license requires, contacting those companies to remind them of the license requirements, and (eventually, in rare cases after companies ignore our gentle reminders for many months) filing lawsuits against intransigent companies who refuse to give you the complete source and instructions you deserve (and that they are required to provide by the licenses of the software they freely choose to use).

In the rare cases where Software Freedom Conservancy has been forced to move its enforcement actions from gentle reminders to filing lawsuits, we have used a variety of approaches. Our lawsuit filed in 2007 against several manufacturers, used copyright law (specifically copyrights in the BusyBox project) to compel those manufacturers to comply with the GPL (such as Westinghouse). The lawsuit we filed last year against Vizio takes an approach more appropriate for widely marketed and available consumer devices. Namely, the claim in Vizio is a contract claim for third-party beneficiary rights under the GPL, which will allow us (and all other customers who bought Vizio TV's) to receive the repair and modification instructions to the software more directly.

Since we began enforcing the GPL fifteen years ago, the landscape of GPL violations has deteriorated: GPL'd software now appears in nearly every consumer device smarter than a toaster, and very rarely do the manufacturers even bother to offer source code to users — and almost never does the source release meet the requirements of the GPL. As a result, we at Software Freedom Conservancy continue to dedicate more time and resources to our enforcement efforts. We seek to ensure that the situation does not get even worse, and we believe that we can improve the situation even more.

The best approach, in our view, is to continue to bring a variety of different types of actions against intransigent violators. As always, we use litigation and litigation-like means as a last resort, but we've reached that point with dozens of companies. There are a variety of types of actions we could take and lawsuits that we could bring, and different ways we can go about preparing for them. But, to have the full scope of options, we need your help.

As a contributor to copyleft projects, one way that you can help us right now is to assign the copyrights of your software freedom works to Software Freedom Conservancy. As the Vizio suit shows, copyright-based claims will not be the sole focus of our enforcement. However, there are some key types of products where copyright claims are ideal. By assigning your copyrights to us, you can give us the ability to stand up for your software freedom and rights and, more importantly, the rights of your users. While we understand the FOSS community has some aversions to copyright assignment, we also know that, right now, many developers automatically assign their copyrights to their employers without demanding that their employers stand up for the copyleft rights of their users. We ask the community to reconsider this common practice, and request those who haven't already assigned copyright to their employer to assign their copyrights to us, and we urge those who have entered work-for-hire arrangements with employers ask those employers to give them back their copyrights immediately. (See our ContractPatch project for more information on how to do this.)

Today, we launch our self-service Copyright Assignment form. This new form, carefully vetted by our lawyers, allows you to quickly and easily assign your rights in your code, documentation, and other copyrightable works to Software Freedom Conservancy. We will use these copyrights to ensure companies follow the copyleft licenses that they use. You can assign copyrights for projects that are not members of Software Freedom Conservancy too. We will always enforce them in accordance with our Principles, and we will welcome you onto an internal mailing list and regular meetings to discuss our enforcement efforts.

Through the various software freedom lawsuits we have filed over the years, along with the lawsuits we've helped fund, Software Freedom Conservancy has established a track record of tangible enforcement actions.

We are very happy for all the support we've received from software freedom activists, developers, and other community members over the years in our software freedom enforcement actions. We hope you will continue to support us, and encourage others to do so, in whatever ways you can and, if it makes sense for you, by assigning your software freedom works to us so we can ensure the repairability of your electronics (and everyone else's!) going forward.

[permalink]

Tags: conservancy, licensing, resources

If Software is My Copilot, Who Programmed My Software?

by Bradley M. Kühn on February 3, 2022

Software freedom is our goal. Copyleft is a strategy to reach that goal. That tenet is oft forgotten by activists. Copyleft is even abused to advance proprietary goals. We too often see concern about the future of copyleft overshadow the necessary fundamental question: does a particular behavior or trend — and the inevitable outcomes of those behaviors and trends — increase or decrease users’ rights to copy, share, modify, and reinstall modified versions of their software? That question remains paramount as we face new challenges.

Introduced first by Microsoft’s GitHub in their Copilot product, computer-assisted software authorship by way of machine learning models presents a formidable challenge to software freedom’s future. Yet, we can, in fact, imagine a software freedom utopia that embodies this technology. Imagine that all software authors have access to the global archive of machine learning models — and they are fullly reproducible. Everyone has equal rights to fork these models, train them further with their own datasets, provided that they must release new models (and the input code) freely in the global archive. All code produced by these models is also made freely available under copyleft. All code that builds the models, all historical input sets, and all trained models are all also made available to everyone under copyleft licenses.

While activists might quibble about minor details to optimize imagined utopia, this thought experiment shows computer-assisted software authorship does not inherently negate software freedom. Rather, the rules, requirements, and policies that apply will determine whether software freedom is respected. To paraphrase Hamlet: there is nothing either good or bad, but the policy makes it so.

What’s the Worse That Could Happen?

[They are] not a good [person] who, without a protest, allows wrong to be committed … with the means which [they] help to supply.

— John Stewart Mill, University of St. Andrews, 1 February 1867

Obviously, ignoring machine learning for computer-assisted software authorship will not usher in this software freedom utopia. Copyleft activists cannot stand idly by in this situation, but we must temper our attention by considering the likelihood of dystopian and problematic outcomes, and the options available to prevent them.

In response to Copilot’s announcement, pundits speculated, without evidence, a prevailing feeling of “Free Software had a good run, but I guess that’s over now”. Such predictions seem consistent with the well-documented overoptimism of artificial intelligence success. Rapid replacement of traditional software development methodologies seem unlikely. As such, we should not overestimate the likelihood that these new systems will both accelerate proprietary software development, while we simultaneously fail to prevent copylefted software from enabling that activity. The former may not come to pass, so we should not unduly fret about the latter, lest we misdirect resources. In short, AI is usually slow-moving, and produces incremental change far more often than it produces radical change. The problem is thus not imminent nor the damage irreversible. However, we must respond deliberately with all due celerity — and begin that work immediately.

Currently, there are two factors that influence the timing of our response. First, if GitHub’s Copilot becomes a non-beta product available to the programming public, that would indicate necessity of an urgent response. Microsoft and GitHub are unlikely to share their product plans, so we cannot know for sure when this will occur. However, in the seven months since the first beta was made available, we’ve consistently heard anecdotally that more and more developers (particularly, FOSS developers!) have received beta invitations. Based on these (admittedly incomplete) facts, we must assume that a move from private beta to public deployment is imminent in 2022. This indicates some urgency of the problem.

Second, we already know that some of our worst fears are definitely true. Namely, that Microsoft and GitHub used copylefted software as part of Copilot’s training set.

Copilot was trained on “billions of lines of public code … written by others”. While GitHub has refused requests to release even a list of repositories included in the training set, the use of the word “public” indicates that only software with source-available licenses (even if not FOSS licenses) were input into Copilot. Furthermore, GitHub admits that during training, the system encountered a copy of the GPL more than 700,000 times. This effectively confirms that copylefted public code appears in the training set.

When questioned, former GNOME developer and GitHub CEO⁰, Nat Friedman, declared publicly “(1) training ML systems on public data is fair use (2) the output belongs to the operator”. Friedman himself, as well as Microsoft and GitHub’s other executives and lawyers, have ignored Software Freedom Conservancy’s requests for clarification and/or evidence supporting these statements.

Meanwhile, GitHub continues to improve this system, trained only on publicly source-available software, and seeks to market it to new users, including those who otherwise use FOSS development tools. Users continue to report gaining access to the beta and are noticing improvements. Microsoft and GitHub’s public position is meanwhile clear: they claim to have no copyleft obligations for training the model, the model itself, and deploying the service. They also believe there are no licensing obligations for the output.

While Friedman ignored the community’s requests publicly, we inquired privately with Friedman⁰ and other Microsoft and GitHub representatives in June 2021, asking for solid legal references for GitHub’s public legal positions of (1) and (2) above. They provided none, and reiterated, without evidence, that they believed the model does not contain copies of the software, and output produced by Copilot can be licensed under any license. We further asked if there are no licensing concerns on either side, why did Microsoft not also train the system on their large proprietary codebases such as Office? They had no immediate answer. Microsoft and GitHub promised to get back to us, but have not.

This secrecy and non-cooperativeness is expected from a proprietary software company and its subsidiary, but leaves us only with speculative conclusions to inform a strategy for copyleft here. We can reliably guess that the companies will claim “fair use” as their primary justification for creating the model and offering the service, and will argue that both the output and the trained model are not “work[s] based on the Program” (GPLv2) nor do they “copy from or adapt all or part of the work[s] in a fashion requiring copyright permission” (GPLv3/AGPLv3). Furthermore, we can reliably conclude, given the continuing product promotion, that the companies have at least a medium-term commitment to Copilot.

In short, they have already hunkered down for a protracted disagreement. Their positions are now incumbent — using their resources and power to successfully charge copyleft activists to “prove them wrong”. But we do not have to accept their unsubstantiated arguments at face value. In fact, these areas are so substantially novel that almost every issue has no definitive answers, but we must nevertheless begin to formulate our position and our response to Microsoft and GitHub’s assault on copyleft.

Trained Models, Fair Use, and Copyright Infringement

Consider GitHub’s claim that “training ML systems on public data is fair use”. We have not found any case of note — at least in the USA — that truly contemplates that question. The only legal case in the USA to look near this question is Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015). The Supreme Court denied certiorari on this case; it is not legal precedent in all jurisdictions where Microsoft and GitHub operate.

Even more, that case considered a fact pattern centered around search, not authorship of new/derived works. Google had made copies of entire copyrighted books, not for the purpose of displaying them, but so users could (1) run search queries, and (2) see a “snippet” of the search hits (i.e., to see the search hit in context). The Second Circuit held Google’s copying of the books was “fair use” because searching and providing context added value exceeding what a user could obtain from their own copies, and Google’s product did not substitute the market for the books.

The analogous fact pattern for code is obvious: GitHub could offer a search tool that assists users in finding key public repositories (and specific lines of code within those repositories) that seemed to solve tasks of interest. Developers could then easily utilitize those codebases in the usual, license-compliant ways. The actual Copilot fact pattern is not this one.

Meanwhile, the Authors Guild case begins and ends the list of major cases regarding machine learning systems and “fair use”. We should simply ignore GitHub’s risible claim that the “fair use question” on machine learning is settled.

Perhaps most importantly, in the USA, “fair use” is an affirmative defense to answer copyright infringement. In concrete terms, that means — particularly in cases where the circumstances are novel — a copyright holder brings an infringement lawsuit and then the alleged infringer shows in court that their actions met the relevant factors for “fair use” sufficiently. Frankly, we refuse to do these companies’ job for them. Copyleft activists need not tell Microsoft and GitHub why this isn’t “fair use”, rather, they need to tell us why training the model with copylefted code is “fair use” and prove that the trained model itself is not a “work based on” the GPL’d software.

GitHub has meanwhile artfully avoided the question of whether the trained model is a “work based on” the input. We contend that it probably is. However, given that “fair use” is an affirmative defense to copyright infringement, they are obviously anticipating a claim that the trained model is, in fact, a “work based on” the inputs to the model. Why else would they even bring up “fair use”, rather than simply say their use is fully non-infringing? Anyway, we have no way to even explore these questions authoritatively without examining the model, fully affixed in its tangible medium. We don’t expect GitHub to produce that unless compelled by a third party.

Indeed, discussion of these questions outside of a courtroom is moot. For this novel and contentious fact pattern, only a court decision can settle the matter adequately. As a strategic matter, copyleft activists should keep their own counsel about what we anticipate in the opposition’s “fair use” and/or non-infringement defenses, and the counter-arguments that we plan.

Copilot Users Should Worry

GitHub’s position does a great disservice to Copilot users. Their claim that “the output belongs to the operator” creates a false sense of legal justification. Users have already shown that Copilot can generate a substantial amount of unique, GPL’d code, and then (rather ironically, given GitHub’s claim that they removed the text of the GPL from the training set) also suggest a license that is non-copyleft. Friedman’s statement surely does not qualify as an indemnity for Copilot users who might face GPL enforcement actions. Users almost surely must construct their own “fair use” or “not copyrightable” defenses for Copilot’s output.

The length and detail of what Copilot can generate for users seems unbounded. The glaring example above appears primia facie to be copyright infringement; we expect further such problems. Consider the sheer amount that a fully functional and successful Copilot would generate. Surely, AI researchers seek the ability for Copilot to “figure out” that you are trying to solve some specific task when programming. The better Copilot gets at handing ready-made solutions to its users, the more likely it becomes that its output may offer the user copylefted software.

Copilot leaves copyleft compliance as an exercise for the user. Users likely face growing liability that only increases as Copilot improves. Users currently have no methods besides serendipity and educated guesses to know whether Copilot’s output is copyrighted by someone else. Proprietary software companies such as Synopsys provide so-called “scanning tools” — that can search your proprietary codebase and find hidden copylefted software. However, the FOSS tools for that job are in their infancy and unlikely to develop quickly, since historically those who want those tools are companies that primarily develop proprietary software and seek to avoid copylefted software.

We recommend users who wish to avoid infringing the copyrights of others simply avoid Copilot.

On Copyleft Maximalism and Unilateral Capitulation

Draconian copyright law generally horrifies software freedom activists for good reason. Nearly all copyleft activists would prefer a true, multilateral rewriting of copyright rules that prioritized the interest of the general public and software rights. Copyleft exists primarily because of the long-standing political non-viability of a copyright law reboot. Nothing has changed in this regard; if anything, changing legislation has become an even more expensive lobbying proposition than it was at copyleft’s advent. Copyleft activists should expect, indefinitely, for proprietary software companies and media oligarchs to control copyright legislation.

Fortunately, copyleft was designed specifically for this eventuality. Activists have called copyleft the “judo move” of software freedom, since copyleft uses the powerful copyright force (invented primarily by our opposition) against itself. That realization leads to a painful, but pragmatically necessary, awkwardness.

The issues herein — from training of machine learning models, to the copyright questions about those models, to the derivation questions about their output — are novel copyright questions. As software freedom activists, we are uniquely qualified to invent an ideal copyright structure for these technologies. But, without a path to promulgate such replacement copyright rules into the incumbent system, that exercise is futile. Furthermore, systems outside of copyright — including but not limited to EULAs, business agreements and patents — have long been used to proprietarize software without the need of copyright. Reality of facts on the ground dictate that we not concede the only wedge we have to compel software freedom; that wedge is copyleft.

Meanwhile, proprietary software companies regularly exploit any unilateral concessions on weakening of copyleft that FOSS projects make, while continuing to pursue copyright maximalism for their works. Particularly in novel areas, we must assume a copyleft maximalist approach — until courts or the legislature disarm all mechanisms to control users’ rights with regard to software. That adversarial process will frustrate us, but ultimately by choosing copyright as our primary tool, we already chose the courts as our battleground for contentious issues.

We all surely have our opinions about how copyleft should operate in these novel situations. We have even expressed some such opinions herein. But, ultimately, strong copyleft licenses do not defer the “what’s covered?” question to one individual or organization. The “judo” power comes from strong copyleft reaching to all of what copyright governs. When those issues are novel — and companies flaunt that novel manipulation of copylefted works — only a court can answer definitively.

A Community-Led Response

While these companies will likely not succeed in their efforts to disarm copyleft, they have nevertheless attacked the entire copyleft infrastructure. We must mount an effective response.

Software Freedom Conservancy has spent the last six months in deep internal discussions about this novel threat to the very efficacy of copyleft. We have a few ideas — a mix of short-term, medium-term and long-term strategies to address the problem. However, we recognize that a community (rather than the traditional BDFL) approach is needed — at least for this problem. Thus, putting first things first, we realized that we should gather the best minds in the software freedom community with direct experience in copyleft theory and practice. We will convene these individuals to a committee specifically chartered by Software Freedom Conservancy to — as quickly as reasonably possible – publish a series of recommendations to the community on how we should respond to both the immediate threat to copyleft found in Copilot, and (long-term) analyze the more general threat that AI-assisted programming techniques pose to the strategy of copyleft.

While we are not actively seeking applications for this committee, we do welcome anyone whom we have not yet solicited to participate to contact us and inquire. We will surely be unable to include everyone who is interested on the committee — either due to Conflicts of Interest or due to simple logistics of creating too large a committee. However, we will carefully consider anyone who expresses bona fide interest to participate.

Finally, as much as can be done during the pandemic using FOSS tools available, we will attempt to convene public discussions as much as possible. We will contemporaneously publish the committee’s minutes publicly. If you’d like to get involved today in public discussions about this issue, please join the mailing we launched today for this topic.

⁰In November 2021, Nat Friedman was replaced by Thomas Dohmke as GitHub’s CEO. However, to our knowledge, Dohmke has not retracted or clarified Friedman's comments, and at the time of writing, no one from GitHub or Microsoft that we spoke to had responded to our requests for clarification.

[permalink]

Tags: conservancy, law, licensing

Next page (older) » « Previous page (newer)

1 2 [3] 4 5 6 7 8