Parsing GitHub’s data on queer participation in FOSS communities
byon June 6, 2017
Earlier this year, GitHub conducted a broad survey of “those who use, build, and maintain open source software.” They just released the results, and for those of us who care deeply about the inclusiveness of FOSS communities, it’s a lot of sobering reading. There’s still a dearth of women participating. The results also put numbers on incidents of bad behavior, and on the impact those incidents have on our communities.
There’s potentially one bright spot in the demographic data, though—and you get the sense the authors were happy to find it, too, since they call it out themselves. They note:
Along other dimensions [than gender], representation is stronger: 1% of respondents identify as transgender (including 9% of women…), and 7% identify as lesbian, gay, bisexual, asexual, or another minority sexual orientation.
As far as I know, this is the first attempt to broadly quantify queer participation in the FOSS community, and I’m really grateful GitHub made it. As discussions about diversity in our communities have come to the fore, I’ve been frustrated that it’s been hard to include queer identities in them, because we didn’t have basic information like whether or not we’re even underrepresented in the first place. GitHub’s results start to help us answer those questions.
I say start to help us answer them, because no one survey will ever answer them authoritatively. Before people run out to declare we’re succeeding at building queer-inclusive communities, I want to contextualize these results a little to help people better understand what they do and don’t tell us.
One limit in this data is in the audience surveyed. GitHub “collected responses from 5,500 randomly sampled respondents sourced from over 3,800 open source repositories on GitHub.com, and over 500 responses from a non-random sample of communities that work on other platforms.” This skews the audience towards relatively technical participants in FOSS communities in a couple of different ways. First, surveying people who are active on GitHub or comparable development platforms means we’re only surveying people who work directly with the tools of developing software. This survey doesn’t collect responses from people who draw and post UI mockups, e-mail in suggested revisions to documentation, or answer other users’ questions on external forums. More than anything, I would love to see a similar survey conducted with a more expansive view of who participates in FOSS.
Second, including more respondents from GitHub biases the audience toward people who are working on newer or more modernized projects. Think of all the FOSS projects that predate GitHub and still host their development elsewhere: GNU, GNOME, Firefox, LibreOffice… when you think about the applications individuals use every day, these are a lot of the most popular ones. Including 500 responses from other communities helps mitigate that, but it’s not clear that’s a representative ratio, and the fact that they were chosen non-randomly is less than ideal too (although I recognize it’s not obvious how to incorporate those responses in a way that would be both random and fair).
Another reason these results aren’t authoritative is that sexual orientation and gender identity are complex. A single question about each on a survey will never be sufficient to accurately capture the community’s full diversity. Most survey results are sensitive to how their questions are worded, and this is famously true for these sorts of questions about identity. Wikipedia’s article “Demographics of sexual orientation” provides a good primer on these issues if you want to learn more. Briefly, it matters a lot whether you ask whether the respondent identifies themselves a certain way, versus whether others would identify them that way, or whether they’ve engaged in activities that could be classified that way. Words like “gay” are also categories that were invented in the west, so people from other countries and cultures may not recognize or identify themselves with them. I think GitHub’s survey asks the two most useful yes/no questions you can ask to inform discussions about queer participation in FOSS, but there’s lots of room for other surveys to dig deeper on these topics.
None of this is to say the survey is flawed or should’ve been done differently. There are many trade-offs involved in designing a survey like this, and I think the trade-offs GitHub made are both clear and justifiable. The best way to understand where we truly stand is not to try to craft a single perfect survey, but to have many surveys with different structures. Then we can learn the most by comparing and contrasting their results. I hope more surveys follow GitHub’s lead to ask about sexual orientation and gender identity, whether they’re small projects surveying their users, large cross-community surveys like this, or anywhere in between.
All that said, the numbers in these results seem to be on the high end when compared against similar surveys of large general populations. I think the authors are right to call them out as a bright spot, and I’m personally encouraged by them too.
Let’s be optimistic for a moment and assume these results mean that queers are at least proportionally represented in FOSS communities. Does that mean we’re queer-friendly?
Not necessarily. Just like a workplace can have both gender balance and rampant sexism—in wage gaps, in promotion choices, in who does and doesn’t get heard at meetings—our communities can have both proportional queer participation and hostility toward us. Identity policing, bi erasure, transphobia—we see these problems in spaces that are explicitly or even exclusively queer. Of course they can arise in FOSS communities too.
While we work on getting more numbers, we should also be working to defend against these problems. There are a couple of concrete things we can do.
First, we should be working to adopt strong codes of conduct in more of our communities. Any code of conduct worth its salt, like Geek Feminism’s, already prohibits harassment based on gender identity and sexual orientation. We should be joining this work, both out of self-interest and to help our allies who have been looking out for us in turn.
(An aside to my fellow queer men: this goes extra for us, because this is one of those times when we can wield our male privilege as a force for good. Since it’s mostly been women leading the charge for codes of conduct so far, it’s easy for opponents to try to minimize this work as women “just” advocating for themselves. Tell your community you want a code of conduct, tell them you want it for your own wellbeing, and shut down that train of thought before it even leaves the station.)
Second, us queers need to be out more in our communities, to build personal networks that can identify, discuss, and resolve these issues when they arise. This is easier said than done. Most of our interactions in the community happen on channels focused on getting work done: planning development, putting together documentation, reviewing changes. There’s rarely a good time to say “hey, I’m queer” in these spaces. It’s easy for it not to come up until the annual conference after-party.
We have to be more out than that, for the sake of new or occasional participants. When queers are considering getting involved in a project, seeing people like them already invested can help demonstrate to them that this is a place where they’re welcome. If they’re harassed, they’re more likely to report it if it’s easy to find someone they feel will understand and be receptive.
We can’t wait to come out until the big meetup. We have to be out on the mailing lists, in the chat rooms, on social media. I’m not saying you have to make a dedicated coming out thread, but try putting a sign in your avatars, e-mail signatures, or personal bios. (Personally I love to paste the rainbow flag emoji 🏳️‍🌈 anywhere I can get away with it, but I know that symbol doesn’t work for everyone. Here’s to more representation in future Unicode standards.) Some people may ask you not to bring “politics” or “sexuality” into the community, but being out is more fundamental than that: it’s about making sure queer people can be in the space at all. If a straight person complains that you bring up your queerness too much, that means you’re undeniably out, and that’s the goal.
How the TC Heartland decision helps free and open source software
byon May 23, 2017
Yesterday, the United States Supreme Court published a decision that is likely to make it harder for patent holders to use frivolous infringement lawsuits to extort settlement fees. In the TC Heartland LLC v. Kraft Foods Group Brands LLC case, the Court ruled that, for purposes of the patent venue statute, a domestic corporation "resides" only in its state of incorporation, which sharply narrows where patent holders can file suit. Prior to TC Heartland, US patent holders had more flexibility to file suit in multiple jurisdictions, and as a result would often select seemingly unrelated jurisdictions for strategic reasons.
The Eastern District Court in Texas is, by far, the most popular venue in the United States for patent holders to file suit, due to its reputation for plaintiff-friendly judges and aggressively brisk (and, therefore, cheaper) litigation schedules. The United States federal court system has ninety-four district courts, yet over a third of all patent litigations filed in the United States in the first quarter of 2017 were filed in the Eastern District. And, traditionally, the overwhelming majority of such cases filed in the Eastern District have been brought by non-practicing entities ("NPEs"; unaffectionately known as "patent trolls"), that is, patent holders who enforce patents without being engaged in the business of selling the inventions disclosed in the patents. The media has covered the remarkable growth of a cottage industry centered around patent litigation in Marshall, Texas, the small town where the Eastern District is located. Many NPEs have built their business models around the economies of scale and efficiencies of pushing frivolous suits through this single venue. Hopefully, the fresh burden of having to file suit on a defendant's "home turf" will reduce the volume of nuisance patent litigation and disrupt the business models that fund it.
As a public charity, Conservancy is not a traditional target for NPEs: we don't generate the kind of product-related revenue streams that NPEs typically hold for ransom in exchange for quick settlement payments. That said, we acknowledge that the threat of NPE litigation casts a shadow on the entire technology sector, including on free and open source communities. We believe that community-vetted free and open source licenses are sufficient to create a pool of explicit and/or implied patent licenses between contributors and users. But that hasn't stopped many a nervous in-house counsel from using layers of extraneous paperwork to reduce the patent exposure they think participating in a free and open source software project may create. We hope that the TC Heartland decision sends a signal to would-be NPEs that the US judiciary will no longer be as complicit in facilitating nuisance patent litigation. We also hope that software developers and users of all types are encouraged by the decision, and are less likely to allow fear, uncertainty, and doubt around NPE patent exposure to chill their participation in free and open source software communities.
FSF's Stallman Applauds Conservancy's Linux Enforcement
byon May 11, 2017
In his statement, Stallman reiterates the importance of the Principles of Community-Oriented GPL Enforcement and the need for lawsuits, but only as a last resort.
We thank RMS for his support of our work and for asking more people to become Conservancy Supporters.
Why GPL Compliance Tutorials Should Be Free as in Freedom
byon April 25, 2017
I am honored to be a co-author and editor-in-chief of the most comprehensive, detailed, and complete guide on matters related to compliance with copyleft software licenses such as the GPL. This book, Copyleft and the GNU General Public License: A Comprehensive Tutorial and Guide (which we often call the Copyleft Guide for short) is 155 pages filled with useful material to help everyone understand copyleft licenses for software, how they work, and how to comply with them properly. It is the only document to fully incorporate esoteric material such as the FSF's famous GPLv3 rationale documents directly alongside practical advice, such as the pristine example, which is the only freely published compliance analysis of a real product on the market. The document explains in great detail how that product manufacturer made good choices to comply with the GPL. The reader learns by both real-world example and abstract explanation.
However, the most important fact about the Copyleft Guide is not its useful and engaging content. More importantly, the license of this book gives freedom to its readers in the same way the license of the copylefted software does. Specifically, we chose the Creative Commons Attribution Share-Alike 4.0 license (CC BY-SA) for this work. We believe that not just software, but any generally useful technical information that teaches people should be freely sharable and modifiable by the general public.
The reasons these freedoms are necessary seem so obvious that I'm surprised I need to state them. Companies who want to build internal training courses on copyleft compliance for their employees need to modify the materials for that purpose. They then need to be able to freely distribute them to employees and contractors for maximum effect. Furthermore, as with all documents and software, there are always “bugs”, which (in the case of written prose) usually means there are sections that fail to communicate to maximum effect. Those who find better ways to express the ideas need the ability to propose patches and write improvements. Perhaps most importantly, everyone who teaches should avoid NIH syndrome. Education and science work best when we borrow and share (with proper license-compliant attribution, of course!) the best material that others develop, and augment our works by incorporating them.
These reasons are akin to those that led Richard M. Stallman to write his Software Should Be Free. Indeed, if you reread that essay now (as I just did), you'll see that much of the damage and many of the same problems to the advancement of software that RMS documents in that essay also occur in the world of tutorial documentation about FLOSS licensing. As too often happens in the Open Source community, though, folks seek ways to proprietarize, for profit, any copyrighted work that doesn't already have a copyleft license attached. In the field of copyleft compliance education, we see the same behavior: organizations who wish to control the dialogue and profit from selling compliance education seek to proprietarize the meta-material of compliance education, rather than sharing freely like the software itself. This yields an ironic exploitation, since the copyleft license documented therein exists as a strategy to assure the freedom to share knowledge. These educators tell their audiences with a straight face: "Sure, the software is free as in freedom, but if you want to learn how its license works, you have to license our proprietary materials!" This behavior uses legal controls to curtail the sharing of knowledge, limits the advancement and improvement of those tutorials, and emboldens silos of know-how that only wealthy corporations have the resources to access and afford. The educational dystopia that these organizations create is precisely what I sought to prevent by advocating for software freedom for
While Conservancy's primary job is to provide non-profit infrastructure for Free Software projects, we also do a bit of license compliance work. But we practice what we preach: we release all the educational materials that we produce as part of the Copyleft Guide project under CC BY-SA. Other Open Source organizations are currently hypocrites on this point; they tout the values of openness and sharing of knowledge through software, but they take their tutorial materials and lock them up under proprietary licenses. I hereby publicly call on such organizations (including but not limited to the Linux Foundation) to license materials such as those under CC BY-SA.
I did not make this public call for liberation of such materials without first trying friendly diplomacy. Conservancy has been in talks with individuals and staff who produce these materials for some time. We urged them to join the Free Software community and share their materials under free licenses. We even offered volunteer time to help them improve those materials if they would simply license them freely. After two years of that effort, it's now abundantly clear that public pressure is the only force that might work.[0] Ultimately, like all proprietary businesses, the training divisions of Linux Foundation and other entities in the compliance industrial complex (such as Black Duck) realize they can make much more revenue by making materials proprietary and choosing legal restrictions that forbid their students from sharing and improving the materials after they complete the course. While the reality of this impasse regarding freely licensing these materials is probably an obvious outcome, multiple sources inside these organizations have also confirmed for me that liberation of the materials for the good of the general public won't happen without a major paradigm shift, specifically because such educational freedom will reduce the revenue stream around those materials.
Of course, I can attest first-hand that freely liberating tutorial materials curtails revenue. Karen Sandler and I have regularly taught courses on copyleft licensing based on the freely available materials for a few years, most recently in January 2017 at LinuxConf Australia, and we will do so again at OSCON in a few weeks. These conferences do kindly cover our travel expenses to attend and teach the tutorial, but compliance education is not a revenue stream for Conservancy. (By contrast, Linux Foundation generates US$3.8 million/year using proprietary training materials, per their 2015 Form 990, page 9, line 2c.) While, in an ideal world, we'd get revenue from education to fund our other important activities, we believe that there is value in doing this education as currently funded by our individual Supporters; these education efforts fit with our charitable mission to promote the public good. We furthermore don't believe that locking up the materials and refusing to share them with others fits a mission of software freedom, so we never considered that a viable option. Finally, given the institutionally-backed FUD that we continue to witness, we seek to draw specific attention to the fundamental difference in approach that Conservancy (as a charity) takes toward this compliance education work. (My recent talk on compliance covered on LWN includes some points on that matter, if you'd like further reading.)
[0] One notable exception to these efforts was the success of my colleague Karen Sandler (and others) in convincing the OpenChain project to choose CC-0 licensing. However, OpenChain has released only 68 presentation slides and a 12-page specification, and some of the slides simply encourage people to go buy an LF proprietary training course!