Conservancy Blog
Displaying posts tagged licensing
Open Source AI Definition Erodes the Meaning of “Open Source”
by
on October 31, 2024
This week, the Open Source Initiative (OSI) made their new Open Source Artificial Intelligence Definition (OSAID) official with its 1.0 release. With this announcement, we have reached the moment that software freedom advocates have feared for decades: the definition of “open source” — with which OSI was entrusted — now differs in significant ways from the views of most software freedom advocates.
There has been substantial acrimony during the drafting process of OSAID, and this blog post does not summarize all the community complaints about the OSAID and its drafting process. Other bloggers and the press have covered those. The TLDR here, IMO, is simply stated: the OSAID fails to require reproducibility by the public of the scientific process of building these systems, because the OSAID fails to place sufficient requirements on the licensing and public disclosure of training sets for so-called “Open Source” systems. The OSI refused to add this requirement because of a fundamental flaw in their process; they decided that “there was no point in publishing a definition that no existing AI system could currently meet”. This fundamental compromise undermined the community process, and amplified the role of stakeholders who would financially benefit from OSI's retroactive declaration that their systems are “open source”. The OSI should not have published a definition yet, and should instead have labeled this document as “recommendations” for now.
As the publication date of the OSAID approached, I could not help but remember a fascinating statement that Donald E. Knuth, one of the founders of the field of computer science, once said: “[M]y role is to be on the bottom of things. … I try to digest … knowledge into a form that is accessible to people who don't have time for such study”. If we wish to engage in the highly philosophical (and easily politically corruptible) task of defining what terms like “software freedom” and “open source” mean, we must learn to be on the “bottom of things”. OSI made an unforced error in this regard. While they could have humbly announced this as “recommendations” or “guidelines”, they instead formalized it as a “definition” — with equivalent authority to their OSD.
Yet OSI itself turned its attention to AI only recently, when they announced their “deep dive” — for which Microsoft's GitHub was OSI's “Thought Leader”. OSI has responded too rapidly to this industry ballyhoo. Their celerity of response made OSI an easy target for regulatory capture.
By comparison, the original OSD was first published in February 1999. That was at least twelve years after the widespread industry adoption of various FOSS programs (such as the GNU C Compiler and BSD). The concept was explored and discussed publicly (under the moniker “Free Software”) for decades before it was officially “defined”. The OSI announced itself as the “marketing department for Free Software” and based the OSD in large part on the independently developed Debian Free Software Guidelines (DFSG). The OSD was thus the culmination of decades of thought and consideration, and primarily developed by a third party (Debian) — which provided a balance on OSI's authority. (Interestingly, some folks from Debian are attempting to check OSI's authority again due to the premature publication of the OSAID.)
OSI claims that they must move quickly so that they can prevent software companies from co-opting the term “open source” for their own aims. But OSI failed to pursue trademark protection for “open source” in the early days, so the OSI can't stop Mark Zuckerberg and his cronies in any event from using the “open source” moniker for his Facebook and Instagram products — let alone his new Llama product. Furthermore, OSI's insistence that the definition was urgently needed and that the definition be engineered as a retrofit to apply to an existing, available system has yielded troublesome results. Simply put, OSI has a tiny sample set to examine, in 2024, of what LLM-backed generative AI systems look like. Making a final decision about the software freedom and rights implications of such a nascent field led to an automatic bias to accept the actions of first movers as legitimate. By making this definition official too soon, OSI has endorsed demonstrably bad LLM-backed generative AI systems as “open source” by definition!
OSI also disenfranchised the users and content creators in this process. FOSS activists should be engaging with the larger discussions with impacted communities of content creators about what “open source” means to them, and how they feel about incorporation of their data into the training sets of these third-party systems. The line between data and code is so easily crossed with these systems that we cannot rely on old, rote conclusions that the “data is separate and can be proprietary (or even unavailable), and yet the system remains ‘open source’”. That adage fails us when analyzing this technology, and we must take careful steps — free from the for-profit corporate interest of AI fervor — as we decide how our well-established philosophies apply to these changes.
FOSS activists err when we unilaterally dictate and define what is ethical, moral, open and Free in areas outside of software. Software rights theorists can (and should) make meaningful contributions in these other areas, but not without substantial collaboration with those creative individuals who produce the source material. Where were the painters, the novelists, the actors, the playwrights, the musicians, and the poets in the OSAID drafting process? The OSD was (of course) easier because our community is mostly programmers and developers (or folks adjacent to those fields); software creators knew best how to consider philosophical implications of pure software products. The OSI, and the folks in its leadership, definitely know software well, but I wouldn't name any of them (or myself) as great thinkers in these many areas outside software that are noticeably impacted by the promulgation of LLMs that are trained on those creative works. The Open Source community remains consistently in danger of excessive insularity, and the OSAID is an unfortunate example of how insular we can be.
Meanwhile, I have spent literally months of time over the last 30 years trying to make sure the coalition of software freedom & rights activists remained in basic congruence (at least publicly) with those (like OSI) who are oriented towards a more for-profit and corporate open source approach. Until today, I was always able to say: “I believe that anything the OSI calls ‘open source’ gives you all the rights and freedoms that you deserve”. I now cannot say that again unless/until the OSI revokes the OSAID. Unfortunately, that Rubicon may have now been permanently crossed! OSI has purposely made it politically unviable for them to revoke the OSAID. Instead, they plan only incremental updates to the OSAID. Once entities begin to rely on this definition as written, OSI will find it nearly impossible to later declare systems that were “open source” under 1.0 as no longer so (under later versions). So, we are likely stuck with OSAID's key problems forever. OSI undermines its position as a philosophical leader in Open Source as long as OSAID 1.0 stands as a formal definition.
I truly don't know for sure (yet) if the only way to respect user rights in an LLM-backed generative AI system is to only use training sets that are publicly available and licensed under Free Software licenses. I do believe that's the ideal and preferred form for modification of those systems. Nevertheless, a generally useful technical system that is built by collapsing data (in essence, via highly lossy compression) into a table of floating point numbers is philosophically much more complicated than binary software and its Corresponding Source. So, having studied the issue myself, I believe the Socratic Epiphany currently applies. Perhaps there is an acceptable spot for compromise regarding the issues of training set licensing, availability and similar reproducibility issues. My instincts, after 25 years as a software rights philosopher, lead me to believe that it will take at least a decade for our best minds to find a reasonable answer on where the bright line is of acceptable behavior with regard to these AI systems. While OSI claims their OSAID is humble, I beg to differ. The humble act now is to admit that it was just too soon to publish a “definition” and rebrand the OSAID 1.0 as “current recommendations”. That might not grab as many headlines or raise as much money as the OSAID did, but it's the moral and ethical way out of this bad situation.
Finally, rather than merely be a pundit on this matter, I am instead today putting myself forward to try to be part of the solution. I plan to run for the OSI Board of Directors at the next elections on a single-issue platform: I will work arduously for my entire term to see the OSAID repealed, and republished not as a definition, but merely as recommendations, and to also issue a statement that OSI published the definition sooner than was appropriate. I'll write further about the matter as the next OSI Board election approaches. I also call on other software rights activists to run with me on a similar platform; the OSI has myriad seats that are elected by different constituents, so there is opportunity to run as a ticket on this issue. (Please contact me privately if you'd like to be involved with this ticket at the next OSI Board election. Note, though, that election results are not actually binding, as OSI's by-laws allow the current Board to reject results of the elections.)
Excitement for GPL enforcement at Linux Plumbers
by
on October 3, 2024
We were excited and very happy to participate in Linux Plumbers Conference this year, which happened last month (Sep 18-20) in Vienna. As one of the premier programs using a software right to repair license (GPLv2), Linux is crucial for the future of software freedom in our devices, from those we use to develop and write new code, to the phones many of us carry with us, to the many appliances and even cars that bring conveniences to our lives. And so we were delighted to discuss Linux and its role in our connected future with Linux kernel developers and other enthusiasts who attended this technical conference.
We hosted a BoF, Let's talk about GPL and LGPL enforcement!, which brought dozens of developers together to discuss the hard questions of how we can ensure that Linux's license is enforced so people can get the code they're entitled to, and the current state of GPL and LGPL enforcement across the board. After some discussion of how often companies use software under the GPL and LGPL without honoring the license terms (it's unfortunately very very common), we fielded some questions about source candidates that people had received. The first example that a participant provided as a positive example of a company meeting its obligations turned out to actually be from a company that SFC had sued in the past, showing that SFC's prior enforcement efforts were helping to change behavior, causing companies to provide GPL/LGPL source code when they hadn't before.
The discussion moved on to how we can bring the next generation of developers into the Linux community, so they can keep improving the Linux kernel in the coming decades. It was noted that a lot of new computer users aren't getting the same computing environment that most Linux developers grew up with. In particular, most Linux developers today started computing with desktop or laptop computers that gave them a wide range of software options, and easy ways to switch operating systems and other key software. However, today most new computer users are getting less capable devices, not because they are less powerful, but because the devices don't have the same malleability and accessibility as they did two decades ago, which is due in part to GPL violations where the user is prevented from reinstalling modified Linux or other software onto their device.
This really struck me, as I had many conversations in the "hallway track" where I asked people how they got into FOSS, and the responses were invariably a version of "to do more interesting things with my computer". It was clear that the computing devices of the 90s and early 2000s really promoted this developer mindset, and that we would have to keep the momentum going to ensure that new developers would have the same opportunities. This leaves us with a mission to make sure that as computing platforms change, we retain the freedoms that enabled the current generation of technology to flourish.
While GPL enforcement isn't the only factor in ensuring people can access developer tools and make meaningful changes to their devices, it is certainly an important piece of the puzzle, given everything we heard at Plumbers this year. With large percentages of Linux devices still distributed without giving users the freedoms that Linux's license is designed to give them, GPL enforcement is immensely important, as our discussions at Plumbers and elsewhere remind us.
The feedback from the BoF was overwhelmingly positive, and we were so happy to be able to take questions, share information, connect with longtime contributors and meet newcomers with such a keen interest in copyleft and enforcement. As always, we invite feedback about this work. You can email us anytime at compliance@sfconservancy.org, and we'll be scheduling some synchronous sessions later in the year.
In the meantime, we are proud to continue the work to ensure that everyone can repair and modify the software on their Linux devices, and everything else using software right-to-repair licenses, for current and future generations of software users and developers.
Prioritizing software right to repair: engaging corporate response teams
by
on February 3, 2024
Across organizations that develop and deploy software, there is a wide range of time-sensitive concerns that arise. Perhaps the most diligent team that responds to such time-sensitive concerns is the cybersecurity team. It is crucial for them to quickly understand the security concern, patch it without introducing any regressions, and deploy it. In extreme cases this is all done within a few hours — a monumental task crammed into less time than a dinner party (and often replacing such a social event at the last minute; these teams are truly dedicated).
Many other teams exist across organizations for different levels of risk and concern. In our experience, on average among many companies, the team that receives one of the lowest priorities is the team that responds to concerns about a company's copyleft compliance. We can think of some reasons for this: the team is often not connected to the team that collated the software containing copylefted code, or that latter team was not given proper instruction for how to comply with the licenses (and/or does not read the licenses themselves). As a result, the team responding when someone notes a copyleft compliance deficiency is ill-equipped to handle it and is often stonewalled by developer teams when it asks them for help, so the requests for correct source code under copyleft licenses usually languish.
With this in mind, we at SFC are helping prioritize the copyleft compliance concerns an organization may face due to some of the above. To reflect the importance of teams responding to copyleft compliance concerns, we recommend that companies create a team that we are calling a "Copyleft Compliance Incident Response Team" (CCIRT). This will help convey to management the importance of properly staffing the team, but also how it must be taken seriously by other teams that the CCIRT relies on to respond to incidents. Where companies employ Compliance Officers, they will likely be obvious leaders for this team.
Now some companies may not need a CCIRT. Unlike security vulnerabilities, failing to comply with copyleft licenses is entirely preventable. If you know your company already has policies and procedures that yield compliant results (of the same form as compliant source candidates that we praise in the comments on Use The Source), then there is no need for a CCIRT. However, our experience shows that most companies do not have such policies and procedures, in which case a CCIRT is necessary until such policies and procedures can reliably produce compliant source candidates from the start.
We recently launched Use The Source (alluded to above), which helps device owners and companies see whether source code candidates (the most important part of copyleft compliance) are giving users their software right to repair, i.e. whether they comply with the copyleft licenses they use. We realize companies may be concerned about SFC publishing their source candidates before they have had a chance to double-check them for compliance, due to some of the issues with policies and procedures mentioned above. As a result, we are giving companies the opportunity to be notified before we post a source candidate of theirs, so that they can take up to 7 days to update the candidate with any fixes they feel may be necessary before we post it. And the sooner a company contacts us, the better, as we are offering up to 37 days from the launch of Use The Source before we publish candidates we receive. See our CCIRT notification timeline for details. For historical purposes, the additional grace period that we provided at launch time is detailed here.
We hope that this new terminology will help organizations prioritize copyleft compliance appropriately, and that everyone can benefit from the shared discussions of source candidates and their compliance with copyleft licenses. We look forward to working with companies and device owners to promote exceptional examples of software right to repair (through our comments on Use The Source) as we find them.
How I watched a Motion for Summary Judgment hearing
by
on October 12, 2023
In SFC's ongoing lawsuit against Vizio asking to receive the source code for the copylefted components on their TVs, last week we had a hearing with the judge to discuss the Motion for Summary Judgment that Vizio filed (requesting that the court reject our case before it even went to trial). A couple of our staff attended in person (in an Orange County courthouse in Southern California) while others, like myself, watched remotely.
I was hoping to be able to use a standard interface to view the proceedings (such as streaming video provided to a <video/> element on a webpage), but unfortunately that was not available. The only way to view hearings in this court remotely is via Zoom, which SFC has talked about recently. This presented me with a conundrum - do I join via Zoom to see what was said? Or am I prevented from accessing this civic discourse because the court chooses not to use a standard video sharing method, preventing a large segment of society from taking part? As part of their normal practice, the court does not record proceedings (nor allow recording, except through an official court reporter that can be hired by the parties to take a textual transcript), so I needed to decide with some urgency how to proceed, as failing to join now would mean I couldn't see the hearing at all, neither now nor in the future.
I am not sure how other countries approach this problem, and maybe it is no different elsewhere, but it did concern me deeply how this technical decision to demand the use of proprietary software could leave so many people disenfranchised, both with respect to their legal system, and other public services as well.
As part of SFC's policy to allow the use of proprietary software if it is critical to our mission, I decided that it was more important for me to be able to view the proceedings (and avoid charging many hundreds of dollars to SFC for an international flight and hotel). Note that SFC would never require this of me, and would gladly pay for me to attend in person to avoid the proprietary software, but I felt personally it was the right decision for me to make in this context.
Once this dilemma was resolved (for better or worse), I went through the technical steps required to join the Zoom call for the court hearing, where I was presented with this text:
By clicking "Join", you agree to our {0} and {1}.
Now there were no links to {0} or {1}, so I made some guesses as to what I was agreeing to. In the best case, I was agreeing to nothing, and in the worst case I was agreeing that 0 and 1 provided the foundation for all humanity which, while potentially troubling, did have a certain appeal as a technologist. In any case, I clicked Join (possibly leaving an indelible mark on the future of the universe) and was at last able to observe the hearing, after dialing in by (SIP) phone for the audio, to reduce the amount of proprietary code being run for me to view the hearing.
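The {0} and {1} in that text look like positional placeholders from a message template that was shown without its substitutions. The following is only a minimal Python sketch of how that can happen, not a claim about how Zoom's client is written: the function name `render_notice` and the labels "Document A" and "Document B" are hypothetical, since the real link targets were never shown.

```python
# Hypothetical illustration only: the names and link labels below are
# assumptions, not anything taken from Zoom's software.
TEMPLATE = 'By clicking "Join", you agree to our {0} and {1}.'


def render_notice(template: str, *link_texts: str) -> str:
    """Fill positional placeholders such as {0} and {1}.

    If the caller supplies no values (the suspected bug), the template is
    returned untouched, so the raw placeholders reach the user.
    """
    if not link_texts:
        return template
    return template.format(*link_texts)


# Substitution step skipped: the user sees the literal {0} and {1}.
broken = render_notice(TEMPLATE)

# Substitution step performed with stand-in labels for the two documents.
fixed = render_notice(TEMPLATE, "Document A", "Document B")

print(broken)
print(fixed)
```

Either way, the person clicking Join has no means of telling from the rendered text alone which two documents the placeholders were meant to name.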
The hearing event itself was familiar to those who have attended such court proceedings - there were many other cases heard that day, that touched on issues such as whether you could get a DUI while riding a horse (answer: yes), to much more serious and unfortunate clear instances of DARVO tactics in domestic disputes (which we hope will not ultimately sway the judge). It appeared the judge wanted to save our hearing for last, possibly due to its complexity or novelty. The lawyers in most of the other matters appeared remotely.
Once the other cases were heard, the judge turned to us, with both our lawyers and Vizio's lawyer physically present in the courtroom. She asked Vizio to go first (since it was Vizio's motion), and their lawyer went over the points from their Motion for Summary Judgment, eventually clarifying seven specific objections Vizio had made to our case in its motion - the judge had clearly read our brief and wanted to know more on these seven topics given how we addressed them.
It was a bit jarring to hear my own name mentioned in court, as one of the objections was to an email I had sent to Vizio when we informed them they were violating the GPL. While not a problem for our case, it reminded me of the need to be extra careful, since anything we say to a company who violates the GPL can end up in court. But it also reminded me of why it is important we do this: if people feel scared to file lawsuits when companies fail to comply with the software freedom licenses they choose to use, then we at SFC must step up and use our resources and substantial experience to make sure the unfounded claims by companies of how they should be able to get away with violating are firmly rebuffed.
After Vizio's lawyer had finished, the judge turned to our lawyers for a response. Our lawyers presented an excellent litany of reasons why SFC's case is not preempted by copyright (for example, there is an extra element, provision of source code, that copyright remedies do not provide), and why we have rights as a third party to the GPL contract between Vizio and the developers of the software that Vizio chose to use (as an example, the GPL itself clearly states, "You [Vizio] must make sure that they [third-party recipients such as SFC], too, receive or can get the source code").
Our lawyers finished with some examples of how contract law works, where if you agree to make some copies, but don't pay the money required in the contract, then that's a contract claim, not a copyright claim. In that case, a party has stiffed the beneficiary on the money. And in our case, as our lawyer so eloquently ended the hearing: "Vizio has stiffed us on the code".
We are extremely proud of our lawyers in this case, especially the two lawyers who argued in person for us on Thursday: Naomi Jane Gray and Don Thompson, as well as our General Counsel Rick Sanders. Whether companies are held accountable for following the software right to repair licenses they choose to use is immensely important - they need to give us the same rights they have, and we're incredibly happy that our legal team are so laser-focused on this.
We look forward to hearing the judge's decision on this motion when it comes out (in the meantime, you can read the hearing transcript if you like). Whatever the result, we will keep fighting for your software rights, everywhere software is used, using the legal mechanisms available (when required), to make sure everyone can control their technology.