Conservancy Blog
Displaying posts tagged Git
Give Up GitHub: The Time Has Come!
by
on June 30, 2022
Those who forget history often inadvertently repeat it. Some of us recall that twenty-one years ago, the most popular code hosting site, a fully Free and Open Source (FOSS) site called SourceForge, proprietarized all their code — never to make it FOSS again. Major FOSS projects slowly left SourceForge since it was now, itself, a proprietary system, and antithetical to FOSS. FOSS communities learned that it was a mistake to allow a for-profit, proprietary software company to become the dominant FOSS collaborative development site. SourceForge slowly collapsed after the DotCom crash, and today, SourceForge still refuses to solve these problems[0]. We learned a valuable lesson that was a bit too easy to forget — especially when corporate involvement manipulates FOSS communities to its own ends. We now must learn the SourceForge lesson again with Microsoft's GitHub.
GitHub has, in the last ten years, risen to dominate FOSS development. They did this by building a user interface and adding social interaction features to the existing Git technology. (For its part, Git was designed specifically to make software development distributed without a centralized site.) In the central irony, GitHub succeeded where SourceForge failed: they have convinced us to promote and even aid in the creation of a proprietary system that exploits FOSS. GitHub profits from those proprietary products (sometimes from customers who use it for problematic activities). Specifically, GitHub profits primarily from those who wish to use GitHub tools for in-house proprietary software development. Yet, GitHub comes out again and again seeming like a good actor — because they point to their largess in providing services to so many FOSS endeavors. But we've learned from the many gratis offerings in Big Tech: if you aren't the customer, you're the product. The FOSS development methodology is GitHub's product, which they've proprietarized and repackaged with our active (if often unwitting) help.
FOSS developers have been for too long the proverbial frog in slowly boiling water. GitHub's behavior has gotten progressively worse, and we've excused, ignored, or otherwise acquiesced to cognitive dissonance. We at Software Freedom Conservancy have ourselves been part of the problem; until recently, even we'd become too comfortable, complacent, and complicit with GitHub. Giving up GitHub will require work and sacrifice, and may take a long time, even for us: we at Software Freedom Conservancy historically self-hosted our primary Git repositories, but we did use GitHub as a mirror. We urged our member projects and community members to avoid GitHub (and all proprietary software development services and infrastructure), but this was not enough. Today, we take a stronger stance. We are ending all our own uses of GitHub, and announcing a long-term plan to assist FOSS projects to migrate away from GitHub. While we will not require our existing member projects to move at this time, we will no longer accept new member projects that do not have a long-term plan to migrate away from GitHub. We will provide resources to support any of our member projects that choose to migrate, and help them however we can.
There are so many good reasons to give up on GitHub, and we list the major ones on our Give Up On GitHub site. We were already considering this action ourselves for some time, but last week's event showed that this action is overdue.
Specifically, we at Software Freedom Conservancy have been actively communicating with Microsoft and their GitHub subsidiary about our concerns with “Copilot” since they first launched it almost exactly a year ago. Our initial video chat call (in July 2021) with Microsoft and GitHub representatives resulted in several questions which they said they could not answer at that time, but would “answer soon”. After six months of no response, Bradley published his essay, If Software is My Copilot, Who Programmed My Software? — which raised these questions publicly. Still, GitHub did not answer our questions. Three weeks later, we launched a committee of experts to consider the moral implications of AI-assisted software, along with a parallel public discussion. We invited Microsoft and GitHub representatives to the public discussion, and they ignored our invitation. Last week, after we reminded GitHub of (a) the pending questions that we'd waited a year for them to answer and (b) of their refusal to join public discussion on the topic, they responded a week later, saying they would not join any public or private discussion on this matter because “a broader conversation [about the ethics of AI-assisted software] seemed unlikely to alter your [SFC's] stance, which is why we [GitHub] have not responded to your [SFC's] detailed questions”. In other words, GitHub's final position on Copilot is: if you disagree with GitHub about policy matters related to Copilot, then you don't deserve a reply from Microsoft or GitHub. They will only bother to reply if they think they can immediately change your policy position to theirs. But, Microsoft and GitHub will leave you hanging for a year before they'll tell you that!
Nevertheless, we were previously content to leave all this low on the priority list — after all, for its first year of existence, Copilot appeared to be more research prototype than product. Facts changed last week when GitHub announced Copilot as a commercial, for-profit product. Launching a for-profit product that disrespects the FOSS community in the way Copilot does simply makes the weight of GitHub's bad behavior too much to bear.
Our three primary questions for Microsoft/GitHub regarding Copilot (i.e., the questions they had promised for a year to answer, and have now formally refused to answer) were:
- What case law, if any, did you rely on in Microsoft & GitHub's public claim, stated by GitHub's (then) CEO, that: “(1) training ML systems on public data is fair use, (2) the output belongs to the operator, just like with a compiler”? In the interest of transparency and respect to the FOSS community, please also provide the community with your full legal analysis on why you believe that these statements are true.
We think that we can now take Microsoft and GitHub's refusal to answer as an answer of its own: they obviously stand by their former CEO's statement (the only one they've made on the subject), and simply refuse to justify their unsupported legal theory to the community with actual legal analysis.
- If it is, as you claim, permissible to train the model (and allow users to generate code based on that model) on any code whatsoever and not be bound by any licensing terms, why did you choose to only train Copilot's model on FOSS? For example, why are your Microsoft Windows and Office codebases not in your training set?
Microsoft and GitHub's refusal to answer also hints at the real answer to this question, too: While GitHub gladly exploits FOSS inappropriately, they value their own “intellectual property” much more highly than FOSS, and are content to ignore and erode the rights of FOSS users but not their own.
- Can you provide a list of licenses, including names of copyright holders and/or names of Git repositories, that were in the training set used for Copilot? If not, why are you withholding this information from the community?
We can only wildly speculate as to why they refuse to answer this question. However, good science practices would mean that they could answer that question in any event. (Good scientists take careful notes about the exact inputs to their experiments.) Since GitHub refuses to answer, our best guess is that they don't have the ability to carefully reproduce their resulting model, so they don't actually know the answer to whose copyrights they infringed and when and how.
As a result of GitHub's bad actions, today we call on all FOSS developers to leave GitHub. We acknowledge that answering that call requires sacrifice and great inconvenience, and will take much time to accomplish. Yet, refusing GitHub's services is the primary power developers have to send a strong message to GitHub and Microsoft about their bad behavior. GitHub's business model has always been “proprietary vendor lock-in”. That's the very behavior FOSS was founded to curtail, and it's why quitting incumbent proprietary software in favor of a FOSS solution is often difficult. But remember: GitHub needs FOSS projects to use their proprietary infrastructure more than we need their proprietary infrastructure. Alternatives exist, albeit with less familiar interfaces and on less popular websites — but we can also help improve those alternatives. And, if you join us, you will not be alone. We've launched a website, GiveUpGitHub.org, where we'll provide tips, ideas, methods, tools and support to those that wish to leave GitHub with us. Watch that site and our blog throughout 2022 (and beyond!) for more.
Most importantly, we are committed to offering alternatives to projects that don't yet have another place to go. We will be announcing more hosting instance options, and a guide for replacing GitHub services, in the coming weeks. If you're ready to take on the challenge now and give up GitHub today, we note that CodeBerg, which is based on Gitea, implements many (although not all) of GitHub's features. Thus, we're also going to work on even more solutions, continue to vet other FOSS options, and publish and/or curate guides on (for example) how to deploy a self-hosted instance of the GitLab Community Edition.
Meanwhile, our committee continues to carefully study the general question of AI-assisted software development tools. One recent preliminary finding was that AI-assisted software development tools can be constructed in a way that respects FOSS licenses by default. We will continue to support the committee as they explore that idea further, and, with their help, we are actively monitoring this novel area of research. While Microsoft's GitHub was the first mover in this area, by way of comparison, early reports suggest that Amazon's new CodeWhisperer system (also launched last week) seeks to provide proper attribution and licensing information for code suggestions[1].
This harkens back to long-standing problems with GitHub, and the central reason why we must together give up on GitHub. We've seen with Copilot, with GitHub's core hosting service, and in nearly every area of endeavor, GitHub's behavior is substantially worse than that of their peers. We don't believe Amazon, Atlassian, GitLab, or any other for-profit hoster is a perfect actor. However, a relative comparison of GitHub's behavior to that of its peers shows that GitHub's behavior is much worse. GitHub also has a record of ignoring, dismissing and/or belittling community complaints on so many issues that we must urge all FOSS developers to leave GitHub as soon as they can. Please, join us in our efforts to return to a world where FOSS is developed using FOSS.
We expect this particular blog post will generate a lot of discussion. We welcome you to interact with SFC staff on our public mailing list about this effort.
Footnotes
[0] SourceForge is now built as an (apparently proprietary) fork of a different FOSS system (called Allura). SourceForge's CEO ignored our multiple inquiries asking if SourceForge really is running upstream Allura (i.e., has no proprietary modifications), and our repeated requests for a link that explains how a project can leave SourceForge for self-hosted Allura. The responses from SourceForge management were quite similar to those received since 2001 — when they first went proprietary.
[1] However, we have not analyzed CodeWhisperer in depth, so we cannot say for sure if Amazon's implementation is compliant with the respective licenses. Nevertheless, Amazon's behavior here shows sharp contrast with Microsoft's GitHub: Amazon acknowledges the obvious fact that there are license obligations that deserve attention and care when building AI-assisted programming solutions.
Free Software: Behind the Scenes
by
on January 15, 2019
We wrote a few weeks ago about how Conservancy has several projects that support new people or less technical people and help bring new people into free software. We also support many projects that most folks probably don't think about very often. Many of our projects exist relatively outside of the spotlight and facilitate the creation of free software by providing tools, systems and infrastructure for developers.
Testing and Automation
Once you've got some code, how do you make sure it works everywhere you want it to -- in the way that you want it to? Testing and automation. Selenium is a suite of tools for browser automation. The W3C recommended their WebDriver tool as the best tool for the development of a more accessible and collaborative web last year. Just a few short months ago, we welcomed Reproducible Builds, a project that attests that your build is safe and uncompromised. The integrity of code is critical if you care about user safety and true software freedom and that's why each build needs to be tested and verified using a free software tool.
Interoperability and efficiency are also important. Projects that ignore this can find it hard to increase adoption. QEMU is a generic and free/open source machine emulator and virtualizer that helps developers build programs that work on different kinds of hardware. This lets developers create free software that works on all kinds of machines and with all kinds of hardware. Buildbot is a framework which enables software developers to automate software builds by scheduling different pieces of work. Both tools help developers create software that is useful to all kinds of users on all different systems.
Freedom All the Way Down the Stack
It's a little easier to explain why you want free software for the tools that users directly interact with, but what about the tools that most users never see? The bits that talk to the hardware, the pieces that turn on your machine and the code that powers the internet also need to be free. You can't mix and match free and non-free code and be sure you are getting all of the benefits of user freedom. That's why we are proud to support so many projects that live close to the bare metal and work on critical interstitial bits that don't always get a lot of press.
Samba removes barriers to interoperability and is standard on nearly all distributions of Linux. Samba is what allows GNU/Linux and Unix machines to access file and print servers that are designed with Windows users in mind. This kind of hardware to hardware level interoperability makes it easy for folks to choose a free operating system for their personal machine, when their workplace or school isn't ready to switch.
Harvey OS provides a fully free operating system with a very compact kernel in which all resources are treated as files. This provides Unix users new ways of working with permissions and applications. Coreboot is an extended firmware platform, which provides users with a lightning fast and fully free boot system for desktops, laptops, servers and tablets. Start with freedom as soon as you boot!
We must have a free software foundation to build on top of, if we ever hope to offer users a completely free computing environment, both online and off. Linux XIA is a protocol stack for Linux that uses eXpress Internet Architecture (XIA) to enable a more trustworthy and interoperable internet while also improving continuity for network users.
Metalink is dedicated to improving downloads. Metalink makes it much easier for people — especially those in areas with inferior Internet connections — to download Open Source and Free Software. Just one non-free piece in the puzzle can counteract the user freedom, privacy and security that free software developers are working to provide throughout the rest of the stack.
Nuts and Bolts
We love supporting tools that free software developers use as part of their workflow to create more free software. We host three version control systems at Conservancy: Git, Mercurial and Darcs, which is a distributed revision control system written in Haskell.
We also support projects that help developers maintain their internal code. Kallithea is a free software source code management system that we use for many of our own scripts and systems. It lets teams easily maintain different versions of internal code projects. phpMyAdmin is a free and open source web interface for the MySQL and MariaDB database systems. It's a mature project that helps folks administer their web-based MySQL instances.
Conservancy believes that everyone deserves full software freedom, without backdoors or exceptions. Developers deserve free tools and users deserve freedom all the way down to the bare metal. We don't live in that world just yet, but it's got to be built one piece at a time. Many of our projects aren't famous, but they're all important for securing full user freedom and that's why we support their work here at Conservancy.