This Copyright Lawsuit Could Shape the Future of Generative AI

Algorithms that create art, text, and code are spreading fast—but legal challenges could throw a wrench in the works.
Photo collage of famous painters' portraits pieced together with bits of code. Illustration: Jacqui VanLiew; Getty Images

The tech industry might be reeling from a wave of layoffs, a dramatic crypto crash, and ongoing turmoil at Twitter, but despite those clouds some investors and entrepreneurs are already eyeing a new boom—built on artificial intelligence that can generate coherent text, captivating images, and functional computer code. But that new frontier has a looming cloud of its own.

A class-action lawsuit filed in a federal court in California this month takes aim at GitHub Copilot, a powerful tool that automatically writes working code when a programmer starts typing. The coder behind the suit argues that GitHub infringes copyright because Copilot does not provide attribution when it reproduces open-source code covered by a license requiring it.

The lawsuit is at an early stage, and its prospects are unclear because the underlying technology is novel and has not faced much legal scrutiny. But legal experts say it may have a bearing on the broader trend of generative AI tools. AI programs that generate paintings, photographs, and illustrations from a prompt, as well as text for marketing copy, are all built with algorithms trained on previous work produced by humans. 

Visual artists have been the first to question the legality and ethics of AI that incorporates existing work. Some people who make a living from their visual creativity are upset that AI art tools trained on their work can then produce new images in the same style. The Recording Industry Association of America, a music industry group, has signaled that AI-powered music generation and remixing could be a new area of copyright concern.

“This whole arc that we're seeing right now—this generative AI space—what does it mean for these new products to be sucking up the work of these creators?” says Matthew Butterick, a designer, programmer, and lawyer who brought the lawsuit against GitHub.

Copilot is a powerful example of the creative and commercial potential of generative AI technology. The tool was created by GitHub, a subsidiary of Microsoft that hosts the code for hundreds of millions of software projects. GitHub made it by training a code-generating algorithm from AI startup OpenAI on the vast collection of code it stores, producing a system that can preemptively complete large pieces of code after a programmer makes a few keystrokes. A recent study by GitHub suggests that coders can complete some tasks in less than half the time normally required when using Copilot as an aid.
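
In practice, the interaction looks something like the sketch below: a programmer writes a comment and a function signature, and the assistant proposes a plausible body. (This is an illustrative mock-up, not an actual Copilot transcript.)

```python
# What the programmer types:
#
#   # return the n-th Fibonacci number iteratively
#   def fib(n: int) -> int:
#
# The kind of completion a tool like Copilot might then suggest:

def fib(n: int) -> int:
    """Return the n-th Fibonacci number iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```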

But as some coders quickly noticed, Copilot will occasionally reproduce recognizable snippets of code cribbed from the millions of lines in public code repositories. The lawsuit filed by Butterick and others accuses Microsoft, GitHub, and OpenAI of infringing copyright because this reproduced code does not include the attribution required by the open-source licenses covering it.
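
To see what is at stake, consider a hedged illustration of what a permissive license such as MIT actually requires: copies or substantial portions of the code are supposed to carry the original copyright and permission notice, something an autocompleted snippet typically lacks. (The author and function below are hypothetical.)

```python
# Hypothetical example: a small utility copied from an MIT-licensed project.
# The MIT license requires that the copyright notice and permission notice
# below travel with copies or substantial portions of the software --
# the attribution that the lawsuit says Copilot's output omits.
#
# Copyright (c) 2021 Example Author
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction... (standard MIT text continues)

def clamp(value: float, lo: float, hi: float) -> float:
    """Clamp a value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))
```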

Programmers have, of course, always studied, learned from, and copied each other's code. But not everyone is sure it is fair for AI to do the same, especially if AI can then churn out tons of valuable code itself, without respecting the source material’s license requirements. “As a technologist, I'm a huge fan of AI,” Butterick says. “I'm looking forward to all the possibilities of these tools. But they have to be fair to everybody.”

Thomas Dohmke, the CEO of GitHub, says that Copilot now comes with a feature designed to prevent copying from existing code. “When you enable this, and the suggestion that Copilot would make matches code published on GitHub—not even looking at the license—it will not make that suggestion,” he says.
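
Dohmke did not describe how the filter works under the hood. As a rough sketch of the general idea, assuming a simple fingerprint lookup against an index of public code (both the hashing scheme and the index here are hypothetical), it might resemble:

```python
import hashlib

# Hypothetical index of fingerprints of code already published in public repos.
PUBLIC_CODE_FINGERPRINTS: set[str] = set()

def fingerprint(snippet: str) -> str:
    """Normalize whitespace and hash a snippet so reformatted code still matches."""
    normalized = " ".join(snippet.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def filter_suggestion(suggestion: str) -> str | None:
    """Drop a suggestion if it matches known public code, regardless of its license."""
    if fingerprint(suggestion) in PUBLIC_CODE_FINGERPRINTS:
        return None  # suppress the suggestion entirely
    return suggestion
```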

Whether this provides enough legal protection remains to be seen, and the coming legal case may have broader implications. “Assuming it doesn’t settle, it’s definitely going to be a landmark case,” says Luis Villa, a coder turned lawyer who specializes in cases related to open source. 

Villa, who knows GitHub cofounder Nat Friedman personally, does not believe it is clear that tools like Copilot go against the ethos of open source and free software. “The free software movement in the ’80s and ’90s talked a lot about reducing the power of copyrights in order to increase people’s ability to code,” he says. “I find it a little bit frustrating that we're now in a position where some people are running around saying we need maximum copyright in order to protect these communities.”

Whatever the outcome, Villa says the Copilot case could shape the destiny of other areas of generative AI. If the case hinges on how similar AI-generated code is to its training material, there could be implications for systems that reproduce images or music matching the style of material in their training data.

Anil Dash, the CEO of Glitch and a board member of the Electronic Frontier Foundation, says that the legal debate is just one part of a bigger adjustment set in train by generative AI. “When people see AI creating art, creating writing, and creating code, they think ‘What is all this, what does it mean to my business, and what does it mean to society?’” he says. “I don't think every organization has thought deeply about it, and I think that's sort of the next frontier.” As more people begin to ponder and experiment with generative AI, there will probably be more lawsuits too.