SFC Announces Aspirational Statement on LLM-backed generative AI for Programming

October 25, 2024

In 2022, Software Freedom Conservancy (SFC) convened a committee in the wake of Microsoft's GitHub Copilot announcement to begin considering the complex questions that arise from the use of large language models (LLMs) in generative AI systems that seek to assist software developers.

Today, we announce a joint statement by this committee, entitled Machine-Learning-Assisted Programming that Respects User Freedom.

Everyone on our committee has watched as interest in this issue has grown in the FOSS community. While the committee was initially convened to consider how copyleft related to these systems, our focus changed as we considered the complex issues. With the unending influx of models, products, and projects in this area, we began to see a potential dystopia: no systems available today are reproducible by the public, and all of them seem to disrespect user rights and freedoms in some manner. Rather than despair, we turned our minds to what FOSS does best: imagining the ideal if corporate interests were not the primary force defining society's relationship with software.

In the past, the FOSS community has responded to new challenges with a race-to-the-bottom document that defines the bare minimum of user rights and freedoms that the community of activists will accept. Such a minimum lets for-profit companies legitimately claim that whatever they produce is “FOSS enough”. We have therefore avoided any process that effectively auto-endorses the problematic practices of companies whose proprietary products are already widely deployed. No system, particularly a proprietary one, should ever be "too big to fail".

While our proposal may seem unrealistic, nearly every proposal in the history of FOSS has seemed unrealistic — until it happened. We call on the FOSS community to not lament what is, but to dream and strive for what can be. The statement follows:

Machine-Learning-Assisted Programming that Respects User Freedom

There has been intense industry ballyhoo about a specific branch of Artificial Intelligence (AI): generative AI backed by large language models (LLMs). We have reached an era in computing history where input data sets for many different types of works are quite large (after decades of Internet content archiving), and hardware is powerful enough to rebuild LLMs repetitively. As FOSS (Free and Open Source Software) activists, we must turn at least a modicum of attention to the matter, lest its future be dominated by the same proprietary software companies that have curtailed user rights for so long.

LLM-backed generative AI impacts the rights of everyone — including developers, creators, and users. Software freedom, both in theory and practice, yields substantial public good. Yet, traditional, narrow FOSS analysis has boundaries and confines; it's inadequate when applied to these technologies.

We propose an aspirational vision of a FOSS, LLM-backed generative AI system for computer-assisted programming that software rights supporters would be proud to use and improve.

This narrow approach is by design. We are keenly cognizant that LLMs have been built for myriad works — from visual art, to the spoken human voice, to music, to literature, to actors' performances. However, this document focuses on systems that employ LLM-backed generative AI to assist programmers because such systems have a critical role in the future of FOSS. While the long-term impact of AI-based programming assistants on the daily life of programmers remains unclear, it seems likely that AI assistants have the potential to advance FOSS goals around the democratization of software development. For example, such systems help newcomers get started with unfamiliar codebases. We must look hopefully to these technologies and seek ways to deploy them that help everyone.

Aspirational Target for a Software-Rights-Respecting AI Assisted Programming System

The ideal system for generative-AI-assisted programming should have the following properties:

  1. The system is built using only FOSS, and is used only for the creation of FOSS, and never for proprietary software. In this manner, the system would propagate and improve interest in software freedom and rights.
  2. The system must respect the principle of “FOSS in, FOSS out, and FOSS throughout”. In detail, this means:
    1. All software and generally useful technical information (including but not limited to: user interface code and applications for generating new material from the model, data cleaning code, model architecture, hyperparameters, model weights, and the model itself) needed to create the system are freely available to the public under a FOSS license [1].
    2. All training data should be fully identified, and available freely and publicly on the Internet, under a FOSS license.
  3. The system will aid the user in adding necessary licensing notices and determining any licensing requirements [2] of the output.

As an aspirational document, this is not intended to be prescriptive nor definitional. We describe the absolute ideal LLM-backed generative AI system for FOSS that we can imagine. Articulating the ideal paves the road to understanding why common consensus remains insufficient. We must be the change we want in the world, and strive for what is right — until the politically unviable becomes viable.


  [1] It is well established that FOSS activists consider it a moral imperative to share any generally useful technical information under a FOSS license. As such, we should not tolerate any portion of the software and generally useful technical information being released under a non-FOSS license.

  [2] Since recitation (i.e., verbatim repeating of parts of the training set) is known to occur in these systems, we know they will occasionally output works based on the training set, so our ideal system would be capable of notifying the user that recitation occurred and properly marking the licensing for it.
