Give Up GitHub: The Time Has Come!
by
on June 30, 2022Those who forget history often inadvertently repeat it. Some of us recall that twenty-one years ago, the most popular code hosting site, a fully Free and Open Source (FOSS) site called SourceForge, proprietarized all their code — never to make it FOSS again. Major FOSS projects slowly left SourceForge since it was now, itself, a proprietary system, and antithetical to FOSS. FOSS communities learned that it was a mistake to allow a for-profit, proprietary software company to become the dominant FOSS collaborative development site. SourceForge slowly collapsed after the DotCom crash, and today, SourceForge still refuses to solve these problems0. We learned a valuable lesson that was a bit too easy to forget — especially when corporate involvement manipulates FOSS communities to its own ends. We now must learn the SourceForge lesson again with Microsoft's GitHub.
GitHub has, in the last ten years, risen to dominate FOSS development. They did this by building a user interface and adding social interaction features to the existing Git technology. (For its part, Git was designed specifically to make software development distributed without a centralized site.) In the central irony, GitHub succeeded where SourceForge failed: they have convinced us to promote and even aid in the creation of a proprietary system that exploits FOSS. GitHub profits from those proprietary products (sometimes from customers who use it for problematic activities). Specifically, GitHub profits primarily from those who wish to use GitHub tools for in-house proprietary software development. Yet, GitHub comes out again and again seeming like a good actor — because they point to their largess in providing services to so many FOSS endeavors. But we've learned from the many gratis offerings in Big Tech: if you aren't the customer, you're the product. The FOSS development methodology is GitHub's product, which they've proprietarized and repackaged with our active (if often unwitting) help.
FOSS developers have been for too long the proverbial frog in slowly boiling water. GitHub's behavior has gotten progressively worse, and we've excused, ignored, or otherwise acquiesced to cognitive dissonance. We at Software Freedom Conservancy have ourselves been part of the problem; until recently, even we'd become too comfortable, complacent, and complicit with GitHub. Giving up GitHub will require work, sacrifice and may take a long time, even for us: we at Software Freedom Conservancy historically self-hosted our primary Git repositories, but we did use GitHub as a mirror. We urged our member projects and community members to avoid GitHub (and all proprietary software development services and infrastructure), but this was not enough. Today, we take a stronger stance. We are ending all our own uses of GitHub, and announcing a long-term plan to assist FOSS projects to migrate away from GitHub. While we will not mandate our existing member projects to move at this time, we will no longer accept new member projects that do not have a long-term plan to migrate away from GitHub. We will provide resources to support any of our member projects that choose to migrate, and help them however we can.
There are so many good reasons to give up on GitHub, and we list the major ones on our Give Up On GitHub site. We were already considering this action ourselves for some time, but last week's event showed that this action is overdue.
Specifically, we at Software Freedom Conservancy have been actively communicating with Microsoft and their GitHub subsidiary about our concerns with “Copilot” since they first launched it almost exactly a year ago. Our initial video chat call (in July 2021) with Microsoft and GitHub representatives resulted in several questions which they said they could not answer at that time, but would “answer soon”. After six months of no response, Bradley published his essay, If Software is My Copilot, Who Programmed My Software? — which raised these questions publicly. Still, GitHub did not answer our questions. Three weeks later, we launched a committee of experts to consider the moral implications of AI-assisted software, along with a parallel public discussion. We invited Microsoft and GitHub representives to the public discussion, and they ignored our invitation. Last week, after we reminded GitHub of (a) the pending questions that we'd waited a year for them to answer and (b) of their refusal to join public discussion on the topic, they responded a week later, saying they would not join any public nor private discussion on this matter because “a broader conversation [about the ethics of AI-assisted software] seemed unlikely to alter your [SFC's] stance, which is why we [GitHub] have not responded to your [SFC's] detailed questions”. In other words, GitHub's final position on Copilot is: if you disagree with GitHub about policy matters related to Copilot, then you don't deserve a reply from Microsoft or GitHub. They only will bother to reply if they think they can immediately change your policy position to theirs. But, Microsoft and GitHub will leave you hanging for a year before they'll tell you that!
Nevertheless, we were previously content to leave all this low on the priority list — after all, for its first year of existence, Copilot appeared to be more research prototype than product. Facts changed last week when GitHub announced Copilot as a commercial, for-profit product. Launching a for-profit product that disrespects the FOSS community in the way Copilot does simply makes the weight of GitHub's bad behavior too much to bear.
Our three primary questions for Microsoft/GitHub (i.e., the questions they had been promising answers to us for a year, and that they now formally refused to answer) regarding Copilot were:
-
What case law, if any, did you rely on in Microsoft & GitHub's public claim, stated by GitHub's (then) CEO, that: “(1) training ML systems on public data is fair use, (2) the output belongs to the operator, just like with a compiler”? In the interest of transparency and respect to the FOSS community, please also provide the community with your full legal analysis on why you believe that these statements are true.
We think that we can now take Microsoft and GitHub's refusal to answer as an answer of its own: they obviously stand by their former CEO's statement (the only one they've made on the subject), and simply refuse to justify their unsupported legal theory to the community with actual legal analysis.
-
If it is, as you claim, permissible to train the model (and allow users to generate code based on that model) on any code whatsoever and not be bound by any licensing terms, why did you choose to only train Copilot's model on FOSS? For example, why are your Microsoft Windows and Office codebases not in your training set?
Microsoft and GitHub's refusal to answer also hints at the real answer to this question, too: While GitHub gladly exploits FOSS inappropriately, they value their own “intellectual property” much more highly than FOSS, and are content to ignore and erode the rights of FOSS users but not their own.
-
Can you provide a list of licenses, including names of copyright holders and/or names of Git repositories, that were in the training set used for Copilot? If not, why are you withholding this information from the community?
We can only wildly speculate as to why they refuse to answer this question. However, good science practices would mean that they could answer that question in any event. (Good scientists take careful notes about the exact inputs to their experiments.) Since GitHub refuses to answer, our best guess is that they don't have the ability to carefully reproduce their resulting model, so they don't actually know the answer to whose copyrights they infringed and when and how.
As a result of GitHub's bad actions, today we call on all FOSS developers to leave GitHub. We acknowledge that answering that call requires sacrifice and great inconvenience, and will take much time to accomplish. Yet, refusing GitHub's services is the primary power developers have to send a strong message to GitHub and Microsoft about their bad behavior. GitHub's business model has always been “proprietary vendor lock-in”. That's the very behavior FOSS was founded to curtail, and it's why quitting incumbent proprietary software in favor of a FOSS solution is often difficult. But remember: GitHub needs FOSS projects to use their proprietary infrastructure more than we need their proprietary infrastructure. Alternatives exist, albeit with less familiar interfaces and on less popular websites — but we can also help improve those alternatives. And, if you join us, you will not be alone. We've launched a website, GiveUpGitHub.org, where we'll provide tips, ideas, methods, tools and support to those that wish to leave GitHub with us. Watch that site and our blog throughout 2022 (and beyond!) for more.
Most importantly, we are committed to offering alternatives to projects that don't yet have another place to go. We will be announcing more hosting instance options, and a guide for replacing GitHub services in the coming weeks. If you're ready to take on the challenge now and give up GitHub today, we note that CodeBerg, which is based on Gitea implements many (although not all) of GitHub. Thus, we're also going to work on even more solutions, continue to vet other FOSS options, and publish and/or curate guides on (for example) how to deploy a self-hosted instance of the GitLab Community Edition.
Meanwhile, the work of our committee continues to carefully study the general question of AI-assisted software development tools. One recent preliminary finding was that AI-assisted software development tools can be constructed in a way that by-default respects FOSS licenses. We will continue to support the committee as they explore that idea further, and, with their help, we are actively monitoring this novel area of research. While Microsoft's GitHub was the first mover in this area, by way of comparison, early reports suggest that Amazon's new CodeWhisperer system (also launched last week) seeks to provide proper attribution and licensing information for code suggestions1.
This harkens to long-standing problems with GitHub, and the central reason why we must together give up on GitHub. We've seen with Copilot, with GitHub's core hosting service, and in nearly every area of endeavor, GitHub's behavior is substantially worse than that of their peers. We don't believe Amazon, Atlassian, GitLab, or any other for-profit hoster are perfect actors. However, a relative comparison of GitHub's behavior to those of its peers shows that GitHub's behavior is much worse. GitHub also has a record of ignoring, dismissing and/or belittling community complaints on so many issues, that we must urge all FOSS developers to leave GitHub as soon as they can. Please, join us in our efforts to return to a world where FOSS is developed using FOSS.
We expect this particular blog post will generate a lot of discussion. We welcome you to interact with SFC staff on our public mailing list about this effort.
Footnotes
0SourceForge is now built as a (apparently proprietary) fork of a different FOSS system (called Allura). SourceForge's CEO ignored our multiple inquiries asking if SourceForge really is running upstream Allura (i.e., has no proprietary modifications), and our repeated requests for a link that explains how a project can leave SourceForge for self-hosted Allura. The responses from SourceForge management were quite similar to those received since 2001 — when they first went proprietary.
1However, we have not analyzed CodeWhisperer in depth so we cannot say for sure if Amazon's implementation is compliant with the respective licenses. Nevertheless, Amazon's behavior here shows sharp contrast with Microsoft's GitHub: Amazon acknowledges the obvious fact that there are license obligations that deserve attention and care when building AI-assisted programming solutions.
Please email any comments on this entry to info@sfconservancy.org.