GitHub Copilot, the company’s service that intelligently suggests lines of code, is now available in a plan for enterprises, months after launching for individual users and educators.
Called GitHub Copilot for Business, the new plan, which costs $19 per user per month, comes with all the features of the single-license Copilot tier along with corporate licensing and policy controls. Those include a toggle that lets IT admins prevent suggestions matching public code on GitHub from being shown to developers, a likely response to the intellectual property controversies brewing around Copilot.
Available as a downloadable extension for development environments including Microsoft Visual Studio, Neovim and JetBrains IDEs, Copilot is powered by an AI model called Codex, developed by OpenAI, that’s trained on billions of lines of public code to suggest additional lines of code and functions given the context of existing code. Copilot, which had over 400,000 subscribers as of August, can surface a programming approach or solution in response to a description of what a developer wants to accomplish (e.g., “Say hello world”), drawing on its knowledge base and the current context.
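To illustrate that flow, consider this invented example (not an actual Copilot capture; the function name and suggested body are hypothetical): a developer types a descriptive comment or stub, and the tool proposes a completion consistent with the surrounding code.

```python
# Hypothetical illustration of the workflow described above; the prompt
# and the suggested completion are invented, not captured from Copilot.

# Developer writes a descriptive stub...
def say_hello(name: str) -> str:
    """Return a hello-world style greeting."""
    # ...and the assistant might suggest a body like this:
    return f"Hello, {name}!"

print(say_hello("world"))  # Hello, world!
```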
At least a portion of the code on which Codex was trained is copyrighted or under a restrictive license, a fact with which some advocacy groups have taken issue. Users have been able to prompt Copilot to generate code from Quake, code snippets in personal codebases and example code from books like “Mastering JS” and “Think JavaScript”; GitHub itself admits that, about 1% of the time, Copilot suggestions contain code snippets longer than ~150 characters that match the training data.
GitHub claims that fair use, the doctrine in U.S. law that permits the use of copyrighted material without first having to obtain permission from the rightsholder, protects it in the event that Copilot was knowingly or unknowingly developed against copyrighted code. But not everyone agrees. The Free Software Foundation, a nonprofit that advocates for the free software movement, has called Copilot “unacceptable and unjust.” And Microsoft, GitHub and OpenAI are being sued in a class action lawsuit that accuses them of violating copyright law by allowing Copilot to regurgitate sections of licensed code without providing credit.
GitHub’s liability aside, some legal experts have argued that Copilot could put companies at risk if they were to unwittingly incorporate copyrighted suggestions from the tool into their production software. As Elaine Atwell notes in a piece on Kolide’s corporate blog, because Copilot strips code of its licenses, it’s difficult to tell which code is permissible to deploy and which might have incompatible terms of use.
GitHub’s attempt at rectifying this is a filter, first introduced to the Copilot platform in June, that checks code suggestions, along with about 150 characters of surrounding code, against public GitHub code and hides suggestions if there’s a match or “near match.” But it’s an imperfect measure. Tim Davis, a computer science professor at Texas A&M University, found that even with the filter enabled, Copilot emitted large chunks of his copyrighted code, stripped of all attribution and license text.
@github copilot, with “public code” blocked, emits large chunks of my copyrighted code, with no attribution, no LGPL license. For example, the simple prompt “sparse matrix transpose, cs_” produces my cs_transpose in CSparse. My code on left, github on right. not ok pic.twitter.com/sqpOThi8nf
— Tim Davis (@DocSparse) October 16, 2022
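GitHub hasn’t published the mechanics of that duplication filter, so the following is a purely illustrative sketch of how a “match or near match” check against indexed public code might be structured; every name and design choice here is an assumption, not GitHub’s actual implementation.

```python
# Purely illustrative: GitHub has not disclosed how its duplication filter
# works. This sketch invents one plausible structure for a "match or near
# match" check against an index of public code; all names are hypothetical.
import hashlib
import re

WINDOW = 150  # approximate suggestion-plus-context span the article describes

def normalize(code: str) -> str:
    """Collapse whitespace so trivial reformatting doesn't defeat the check
    (a crude stand-in for whatever "near match" actually means)."""
    return re.sub(r"\s+", " ", code).strip()

def fingerprint(chunk: str) -> str:
    """Hash a normalized chunk for cheap set-membership tests."""
    return hashlib.sha256(chunk.encode()).hexdigest()

def should_hide(suggestion: str, context: str, public_index: set[str]) -> bool:
    """Return True if the suggestion plus its surrounding context matches
    a fingerprint of indexed public code, so the suggestion is suppressed."""
    span = normalize(context[-WINDOW:] + suggestion)
    # Slide a fixed-size window over the span; any hit counts as a match.
    return any(
        fingerprint(span[i : i + WINDOW]) in public_index
        for i in range(max(1, len(span) - WINDOW + 1))
    )
```

As Davis’ example suggests, any heuristic of this kind can miss: matching normalized windows only catches verbatim or near-verbatim spans that the index happens to contain, which may be why copyrighted code still slips through.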
GitHub plans to introduce additional features in 2023 aimed at helping developers make informed decisions about whether to use Copilot’s suggestions, including the ability to identify strings matching public code with a reference to those repositories. And for GitHub Copilot for Business customers, GitHub claims it won’t retain code snippets for training or share code, regardless of whether the data comes from public repositories, private repositories, non-GitHub repositories or local files.
But it’s unclear whether those steps will be enough to allay companies’ fears over legal challenges.