July 9, 2025 - by Pamela Langham

Copyright Law and the Training of AI Models

Kadrey v. Meta Platforms, ___ F.Supp.3d ___, 2025 WL 1752484, Case No. 23-cv-03417 (N.D. Cal. June 25, 2025).

The Copyright Act of 1976 (17 U.S.C. §§ 101 et seq.) (the “Act”) grants creators exclusive rights over their work to encourage creativity and give creators control over their work, including the right to financially benefit from their intellectual labor. However, copyright protection is not absolute, as the Act includes exceptions like fair use. Recently, copyright has been at the forefront of debates surrounding the use of copyrighted material by tech companies to train generative artificial intelligence (AI) large language models (LLMs). In June, the U.S. District Court for the Northern District of California ruled in favor of Meta Platforms in Kadrey v. Meta, concluding that Meta’s use of copyrighted books to train its LLM called Llama constituted fair use. This article summarizes the court's opinion.

Case Background

The plaintiffs in Kadrey v. Meta were a group of 13 authors. They filed a motion for partial summary judgment, alleging that Meta unlawfully copied their copyrighted books without permission to train Llama. They argued that this reproduction did not qualify as fair use. Meta also filed a motion for partial summary judgment, asserting its fair use defense.  

Facts

AI creates new content by identifying patterns in training data, with outputs limited by the data’s scope. LLMs, a type of AI, are trained on vast text datasets to understand and generate text by predicting word sequences. They can perform tasks like drafting emails, summarizing documents, or coding. LLMs benefit from quality training data, especially books, which enhance their “memory” and ability to process longer prompts and maintain coherent conversations. Books are ideal for training due to their consistent style, structure, and proper grammar. Developers can fine-tune LLMs to improve task-specific performance or prevent offensive outputs.

Meta trained its Llama AI model using the plaintiffs’ copyrighted books, but implemented “mitigations,” a training method designed to prevent Llama from regurgitating the plaintiffs' works. “Neither party’s expert opined that Llama was able to regurgitate more than 50 words from any of the plaintiffs’ books, even in response to ‘adversarial’ prompting” designed specifically to make LLMs regurgitate the language they are trained on.

Copyright Law and Fair Use

Embedded in the U.S. Constitution (U.S. Const., Art. I, § 8, cl. 8), copyright law has evolved over time, but its aim remains to encourage and promote creativity by granting the author of an original work “a bundle of exclusive rights.” 1 Even though that “bundle” may include the right to reproduce copyrighted work, to create derivative works, and to issue licenses, it is not absolute. “These rights. . . are ‘subject to a list of statutory exceptions, including the exception for fair use provided in 17 U.S.C. § 107.’” 2 Fair use is a “complete defense” to copyright infringement. 3 That is to say, “the fair use of a copyrighted work. . . is not an infringement of copyright.” 4

In 1976, Congress codified the common law doctrine of fair use in 17 U.S.C. § 107, which provides: “[T]he fair use of a copyrighted work. . . for purposes such as criticism, comment, news reporting, teaching. . . scholarship, or research, is not an infringement of copyright.” To determine whether a particular use is “fair use,” the statute sets out four factors to be considered: “(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.” 17 U.S.C. § 107.

All four factors must be examined on the facts and circumstances of each particular case. The factors are not rigidly applied, and one particular factor “may prove more important in some contexts than in others.” 5 In applying the fair use doctrine, courts take a holistic approach, requiring judicial balancing depending on relevant circumstances. Fair use is an affirmative defense, placing the burden of proof on the party invoking it. Although highly fact-dependent, fair use is a mixed question of law and fact.

When ruling on fair use, the courts have considered whether a copier’s use adds something new to the original work with a different purpose or character, a concept known as transformative use. Additionally, the fourth factor requires a court to consider the financial effects of the copying in the market. “It can require a court to consider the amount of money that the copyright owner might lose,” due to the copying. 6 Financial losses to the copyright owner because of a copier’s use directly conflict with the purpose of the Act: providing authors and creators with exclusive rights and allowing them to benefit financially from their creativity.

The court’s decision in Kadrey v. Meta turned on factors one and four. The court reiterated that factor one focuses primarily on whether Meta’s secondary use of the books was transformative - that is, whether the new work adds something new, with a further purpose or different character. The court determined that the plaintiffs likely wrote the books for education or entertainment, whereas Meta used the copyrighted books to train its LLMs to generate new text, assisting users with drafting documents or creative ideation. These end uses, the court said, “are different from the use to which the plaintiffs’ books are generally put,” namely education or entertainment. Therefore, the court reasoned, copying the books “to develop a tool that can perform those functions is a use with a different purpose and character than the books themselves.” Consequently, the court ruled that factor one favored Meta.

Citing the U.S. Supreme Court case of Harper & Row, the court stated that the fourth factor is “undoubtedly the single most important element of fair use.” 7 The court identified three ways that a copyright plaintiff could demonstrate that the use of copyrighted works to train AI models “harmed the market,” or was financially detrimental to the copyright holder. First, a plaintiff could show that the LLM tool is able to regurgitate their copyrighted work at no cost. Second, a plaintiff could show that their copyrighted work should be licensed if a tech company wants to train its AI model on their books, arguing that allowing tech companies to use copyrighted works without a license “precludes the development of that market.” Lastly, a plaintiff could demonstrate that a defendant’s AI tool output is so similar to the original work that the output is, in essence, a substitute. The court found that Llama could not regurgitate a substantial portion of the plaintiffs' books; it could reproduce only a small portion of them. Therefore, the court reasoned Llama “does not threaten to have a ‘meaningful or significant effect’ upon the potential market for or value of the plaintiffs’ books.”

Key Legal Findings

The court found that Meta’s training of Llama constituted fair use of the copyrighted books. However, the court also strongly suggested (although in dicta) that the outcome might have been different if the plaintiffs had submitted a stronger evidentiary record on the fourth factor of the fair use test - economic harm.

Conclusion

Fair use is never mechanically applied; rather, it is a fact-specific principle requiring a case-by-case analysis. Here, the court determined that Meta’s use of the copyrighted books to train its LLM was highly transformative. Consequently, the court stated that the plaintiffs “need to win decisively on the fourth factor” of the fair use doctrine in order to defeat Meta’s affirmative defense. The plaintiffs presented no meaningful evidence of market dilution and thus failed to create a genuine issue of material fact as to the fourth factor. Absent that evidence, the court determined that the fourth factor also favored Meta. Therefore, Meta was granted partial summary judgment on its fair use defense. The ruling is a technical win for Meta, but it also signals to all current and future plaintiffs that they should develop a robust evidentiary record demonstrating market harm in these types of cases to defeat an affirmative defense of fair use.

_____________

1 Harper & Row, Publishers, Inc. v. Nation Enterprises, 471 U.S. 539, 546 (1985); see also 17 U.S.C. § 106. 
2 Bouchat v. Balt. Ravens Ltd. P’ship, 619 F.3d 301, 307 (4th Cir. 2010). 
3 Id.
4 Id.
5 Google LLC v. Oracle Am., Inc., 593 U.S. 1 (2021).
6 Id. at 35.
7 Harper & Row, 471 U.S. at 566. 

For further reading on the topic please see the following:

Copyright Triumphs Over AI Training: Court Decision Sends Strong Message to Tech Industry, Langham, Pamela, MSBA Blog, Feb. 25, 2025. 

U.S. Supreme Court Warhol Foundation Decision on Copyright Fair Use - SCOTUS: Transformation Alone Is Not Fair Use of a Copyrighted Work, Alderman, Elliott, MSBA Blog, Nov. 28, 2023. 

Current Status of the Intersection of Artificial Intelligence and Copyright Law, Langham, Pamela, MSBA Blog, Nov. 9, 2023.

 

The banner image was created by Genius Pro - Google's paid subscription generative artificial intelligence tool.