Conservancy Blog
Open Source AI Definition Erodes the Meaning of “Open Source”
by
on October 31, 2024
This week, the Open Source Initiative (OSI) made their new Open Source Artificial Intelligence Definition (OSAID) official with its 1.0 release. With this announcement, we have reached the moment that software freedom advocates have feared for decades: the definition of “open source” — with which OSI was entrusted — now differs in significant ways from the views of most software freedom advocates.
There has been substantial acrimony during the drafting process of OSAID, and this blog post does not summarize all the community complaints about the OSAID and its drafting process. Other bloggers and the press have covered those. The TLDR here, IMO, is simply stated: the OSAID fails to require reproducibility by the public of the scientific process of building these systems, because the OSAID fails to place sufficient requirements on the licensing and public disclosure of training sets for so-called “Open Source” systems. The OSI refused to add this requirement because of a fundamental flaw in their process; they decided that “there was no point in publishing a definition that no existing AI system could currently meet”. This fundamental compromise undermined the community process, and amplified the role of stakeholders who would financially benefit from OSI's retroactive declaration that their systems are “open source”. The OSI should not have published a definition yet, and should instead have labeled this document as “recommendations” for now.
As the publication date of the OSAID approached, I could not help but remember a fascinating statement that Donald E. Knuth, one of the founders of the field of computer science, once said: “[M]y role is to be on the bottom of things. … I try to digest … knowledge into a form that is accessible to people who don't have time for such study”. If we wish to engage in the highly philosophical (and easily politically corruptible) task of defining what terms like “software freedom” and “open source” mean, we must learn to be on the “bottom of things”. OSI made an unforced error in this regard. While they could have humbly announced this as “recommendations” or “guidelines”, they instead formalized it as a “definition” — with equivalent authority to their OSD.
Yet, OSI itself turned its attention to AI only recently, when they announced their “deep dive” — for which Microsoft's GitHub was OSI's “Thought Leader”. OSI has responded too rapidly to this industry ballyhoo. Their celerity of response made OSI an easy target for regulatory capture.
By comparison, the original OSD was first published in February 1999. That was at least twelve years after the widespread industry adoption of various FOSS programs (such as the GNU C Compiler and BSD). The concept was explored and discussed publicly (under the moniker “Free Software”) for decades before it was officially “defined”. The OSI announced itself as the “marketing department for Free Software” and based the OSD in large part on the independently developed Debian Free Software Guidelines (DFSG). The OSD was thus the culmination of decades of thought and consideration, and primarily developed by a third party (Debian) — which provided a balance on OSI's authority. (Interestingly, some folks from Debian are attempting to check OSI's authority again due to the premature publication of the OSAID.)
OSI claims that they must move quickly so that they can stop software companies from co-opting the term “open source” for their own aims. But OSI failed to pursue trademark protection for “open source” in the early days, so the OSI can't stop Mark Zuckerberg and his cronies in any event from using the “open source” moniker for his Facebook and Instagram products — let alone his new Llama product. Furthermore, OSI's insistence that the definition was urgently needed and that the definition be engineered as a retrofit to apply to an existing, available system has yielded troublesome results. Simply put, OSI has a tiny sample set to examine, in 2024, of what LLM-backed generative AI systems look like. Making a final decision about the software freedom and rights implications of such a nascent field created an automatic bias toward accepting the actions of first movers as legitimate. By making this definition official too soon, OSI has endorsed demonstrably bad LLM-backed generative AI systems as “open source” by definition!
OSI also disenfranchised the users and content creators in this process. FOSS activists should be engaging with the larger discussions with impacted communities of content creators about what “open source” means to them, and how they feel about incorporation of their data in the training sets into these third-party systems. The line between data and code is so easily crossed with these systems that we cannot rely on old, rote conclusions that the “data is separate and can be proprietary (or even unavailable), and yet the system remains ‘open source’”. That adage fails us when analyzing this technology, and we must take careful steps — free from the for-profit corporate interest of AI fervor — as we decide how our well-established philosophies apply to these changes.
FOSS activists err when we unilaterally dictate and define what is ethical, moral, open and Free in areas outside of software. Software rights theorists can (and should) make meaningful contributions in these other areas, but not without substantial collaboration with those creative individuals who produce the source material. Where were the painters, the novelists, the actors, the playwrights, the musicians, and the poets in the OSAID drafting process? The OSD was (of course) easier because our community is mostly programmers and developers (or folks adjacent to those fields); software creators knew best how to consider philosophical implications of pure software products. The OSI, and the folks in its leadership, definitely know software well, but I wouldn't name any of them (or myself) as great thinkers in these many areas outside software that are noticeably impacted by the promulgation of LLMs that are trained on those creative works. The Open Source community remains consistently in danger of excessive insularity, and the OSAID is an unfortunate example of how insular we can be.
Meanwhile, I have spent literally months of time over the last 30 years trying to make sure the coalition of software freedom & rights activists remained in basic congruence (at least publicly) with those (like OSI) who are oriented towards a more for-profit and corporate open source approach. Until today, I was always able to say: “I believe that anything the OSI calls ‘open source’ gives you all the rights and freedoms that you deserve”. I now cannot say that again unless/until the OSI revokes the OSAID. Unfortunately, that Rubicon may have now been permanently crossed! OSI has purposely made it politically unviable for them to revoke the OSAID. Instead, they plan only incremental updates to the OSAID. Once entities begin to rely on this definition as written, OSI will find it nearly impossible to later declare systems that were “open source” under 1.0 as no longer so (under later versions). So, we are likely stuck with OSAID's key problems forever. OSI undermines its position as a philosophical leader in Open Source as long as OSAID 1.0 stands as a formal definition.
I truly don't know for sure (yet) if the only way to respect user rights in an LLM-backed generative AI system is to only use training sets that are publicly available and licensed under Free Software licenses. I do believe that's the ideal and preferred form for modification of those systems. Nevertheless, a generally useful technical system that is built by collapsing data (in essence, via highly lossy compression) into a table of floating point numbers is philosophically much more complicated than binary software and its Corresponding Source. So, having studied the issue myself, I believe the Socratic Epiphany currently applies. Perhaps there is an acceptable spot for compromise regarding the issues of training set licensing, availability and similar reproducibility issues. My instincts, after 25 years as a software rights philosopher, lead me to believe that it will take at least a decade for our best minds to find a reasonable answer on where the bright line is of acceptable behavior with regard to these AI systems. While OSI claims their OSAID is humble, I beg to differ. The humble act now is to admit that it was just too soon to publish a “definition” and rebrand the OSAID 1.0 as “current recommendations”. That might not grab as many headlines or raise as much money as the OSAID did, but it's the moral and ethical way out of this bad situation.
Finally, rather than merely be a pundit on this matter, I am instead today putting myself forward to try to be part of the solution. I plan to run for the OSI Board of Directors at the next elections on a single-issue platform: I will work arduously for my entire term to see the OSAID repealed, and republished not as a definition, but merely recommendations, and to also issue a statement that OSI published the definition sooner than was appropriate. I'll write further about the matter as the next OSI Board election approaches. I also call on other software rights activists to run with me on a similar platform; the OSI has myriad seats that are elected by different constituents, so there is opportunity to run as a ticket on this issue. (Please contact me privately if you'd like to be involved with this ticket at the next OSI Board election. Note, though, that election results are not actually binding, as OSI's by-laws allow the current Board to reject results of the elections.)
Excitement for GPL enforcement at Linux Plumbers
by
on October 3, 2024
We were excited and very happy to participate in Linux Plumbers Conference this year, which happened last month (Sep 18-20) in Vienna. As one of the premier programs using a software right to repair license (GPLv2), Linux is crucial for the future of software freedom in our devices, from those we use to develop and write new code, to the phones many of us carry with us, to the many appliances and even cars that bring conveniences to our lives. And so we were delighted to discuss Linux and its role in our connected future with Linux kernel developers and other enthusiasts who attended this technical conference.
We hosted a BoF, Let's talk about GPL and LGPL enforcement!, which brought dozens of developers together to discuss the hard questions of how we can ensure that Linux's license is enforced so people can get the code they're entitled to, and the current state of GPL and LGPL enforcement across the board. After some discussion of how often companies use software under the GPL and LGPL without honoring the license terms (it's unfortunately very very common), we fielded some questions about source candidates that people had received. The first example that a participant provided as a positive example of a company meeting its obligations turned out to actually be from a company that SFC had sued in the past, showing that SFC's prior enforcement efforts were helping to change behavior, causing companies to provide GPL/LGPL source code when they hadn't before.
The discussion moved on to how we can bring the next generation of developers into the Linux community, so they can keep improving the Linux kernel in the coming decades. It was noted that a lot of new computer users aren't getting the same computing environment that most Linux developers grew up with. In particular, most Linux developers today started computing with desktop or laptop computers that gave them a wide range of software options, and easy ways to switch operating systems and other key software. However, today most new computer users are getting less capable devices, not because they are less powerful, but because the devices don't have the same malleability and accessibility as they did two decades ago, which is due in part to GPL violations where the user is prevented from reinstalling modified Linux or other software onto their device.
This really struck me, as I had many conversations in the "hallway track" where I asked people how they got into FOSS, and the responses were invariably a version of "to do more interesting things with my computer". It was clear that the computing devices of the 90s and early 2000s really promoted this developer mindset, and that we would have to keep the momentum going to ensure that new developers would have the same opportunities. This leaves us with a mission to make sure that as computing platforms change, we retain the freedoms that enabled the current generation of technology to flourish.
While GPL enforcement isn't the only factor in ensuring people can access developer tools and make meaningful changes to their devices, it is certainly an important piece of the puzzle, given everything we heard at Plumbers this year. With large percentages of Linux devices still distributed without giving users the freedoms that Linux's license is designed to give them, GPL enforcement is immensely important, as our discussions at Plumbers and elsewhere remind us.
The feedback from the BoF was overwhelmingly positive, and we were so happy to be able to take questions, share information, connect with longtime contributors and meet newcomers with such a keen interest in copyleft and enforcement. As always, we invite feedback about this work. You can email us anytime at compliance@sfconservancy.org, and we'll be scheduling some synchronous sessions later in the year.
In the meantime, we are proud to continue the work to ensure that everyone can repair and modify the software on their Linux devices, and everything else using software right-to-repair licenses, for current and future generations of software users and developers.
Prioritizing software right to repair: engaging corporate response teams
by
on February 3, 2024
Across organizations that develop and deploy software, a wide range of time-sensitive concerns arise. Perhaps the most diligent team that responds to such time-sensitive concerns is the cybersecurity team. It is crucial for them to quickly understand the security concern, patch it without introducing any regressions, and deploy the fix. In extreme cases this is all done within a few hours — a monumental task crammed into less time than a dinner party (and often replacing such a social event at the last minute; these teams are truly dedicated).
Many other teams exist across organizations for different levels of risk and concern. In our experience, on average among many companies, the team that receives among the lowest priorities is the team that responds to concerns about a company's copyleft compliance. Now we can think of some reasons for this: the team is often not connected to the team that collated the software containing copylefted code, or that latter team was not given proper instruction for how to comply with the licenses (and/or does not read the licenses themselves). So the team responding when someone notes a copyleft compliance deficiency is ill-equipped to handle it, and is often stonewalled by developer teams when they ask them for help, so the requests for correct source code under copyleft licenses usually languish.
With this in mind, we at SFC are helping prioritize the copyleft compliance concerns an organization may face due to some of the above. To reflect the importance of teams responding to copyleft compliance concerns, we recommend that companies create a team that we are calling a "Copyleft Compliance Incident Response Team" (CCIRT). This will help convey to management the importance of properly staffing the team, but also how it must be taken seriously by other teams that the CCIRT relies on to respond to incidents. Where companies employ Compliance Officers, they will likely be obvious leaders for this team.
Now some companies may not need a CCIRT. Unlike security vulnerabilities, failing to comply with copyleft licenses is entirely preventable. If you know your company already has policies and procedures that yield compliant results (of the same form as compliant source candidates that we praise in the comments on Use The Source), then there is no need for a CCIRT. However, our experience shows that most companies do not have such policies and procedures, in which case a CCIRT is necessary until such policies and procedures can reliably produce compliant source candidates from the start.
We recently launched Use The Source (alluded to above), which helps device owners and companies see whether source code candidates (the most important part of copyleft compliance) are giving users their software right to repair, i.e. whether they comply with the copyleft licenses they use. We realize companies may be concerned about SFC publishing their source candidates before they have had a chance to double-check them for compliance, due to some of the issues with policies and procedures mentioned above. As a result, we are giving companies the opportunity to be notified before we post a source candidate of theirs, so that they can take up to 7 days to update the candidate with any fixes they feel may be necessary before we post it. And the sooner a company contacts us, the better, as we are offering up to 37 days from the launch of Use The Source before we publish candidates we receive. See our CCIRT notification timeline for details. For historical purposes, the additional grace period that we provided at launch time is detailed here.
We hope that this new terminology will help organizations prioritize copyleft compliance appropriately, and that everyone can benefit from the shared discussions of source candidates and their compliance with copyleft licenses. We look forward to working with companies and device owners to promote exceptional examples of software right to repair (through our comments on Use The Source) as we find them.
Supporter Interview with Elijah (and Oliver!) Voigt
by
on January 15, 2024
CC-BY-NA 4.0 Lucy Voigt
Thanks so much to one of our matching supporters, The Voigt Family! We're so happy to highlight a young family involved in free software and hear about what they think of our work and the future. Read on to hear from Eli in a quick interview we did!
SFC: Tell us a bit about yourself! Where are you from, what are some of your hobbies? Social media?
Eli: I moved from Chicago to Portland as a tween. I have since adopted many Pacific Northwest hobbies like hiking, camping, and enjoying microbrews.
SFC: Why do you care about software freedom? How long have you been involved?
Eli: In college (almost 10 years ago? Oh no.) I helped run the Oregon State University Linux Users Group (OSU LUG) where we ran InstallFests and gave talks on different Open Source tools. Prior to that I used open source software like Linux and Blender to produce 3D art.
Software Freedom is important to me because world class software tools should be accessible to everybody. Growing up middle class I had the privilege of a computer and free time, but I couldn't afford expensive 3D software like Adobe. Thankfully I got into Blender because it was free but also because it was good!
I definitely think of Software Freedom as a spectrum. For example: using Blender on Windows is a win compared with using Adobe products.
SFC: How do you use free software in your life?
Eli: I use Linux and free software whenever I can. I also run a physical server in my basement which hosts instances of open source services like Gitea for friends and family. Being a nights-and-weekends Sysadmin isn't for everybody but I love it!
SFC: On the spectrum from developer to end user, where do you lie? And how do you think we could do better bridging that divide?
Eli: I am definitely more of a Developer, and I struggle with bringing co-workers, friends, and family into the fold of Free Software. When a tool is Free, Convenient, and Good people are more than happy to use it. Beyond that though I have no idea!
SFC: What's got you most excited from the past year of our work?
Eli: I was a huge fan of FOSSY! I could only make the first day because we had a BABY during the conference. The one day I went I got to speak to Andrew Kelley (of Ziglang) and I learned about running AI models on my laptop which was enlightening and fun! I also volunteered and got to see so many community folks for the first time since COVID.
SFC: What issues happened this past year that you were happy we spoke about?
Eli: I think the work you're doing with Right to Repair is really meaningful. It's the kind of thing every consumer agrees with and wants but we still need to fight for!
SFC: Do you think we are doing a good job reaching a wider audience and do you see us at places you expect?
Eli: I am sure running a conference like FOSSY, especially in a post-COVID-lockdown world, is challenging but really helped me feel connected to the SF Conservancy and the community around your work. I can't wait to see it grow over the coming years.
SFC: Have you been involved with any of our member projects in the past?
Eli: I am a huge fan of Busybox! When I put on my system administrator hat (at work and for fun) I use it every day.
SFC: What other organizations are you supporting this year? charities, local, non-tech, etc
Eli: A few of my recurring donations I want to plug:
- My local public broadcasting channel: Oregon Public Broadcasting
- The Wayback Machine
- My go-to for Climate Change stories: Grist
SFC: Did you have the first FOSSY Baby?
Eli: Yes! His name is Oliver and he just turned 6 months old (as of January 15)!