How Fair Use Favors OpenAI in the ANI Lawsuit

INTRODUCTION AND CONTEMPORARY RELEVANCE

Ever since ANI filed a lawsuit against OpenAI for alleged infringement of its copyright, gaps in the legal framework governing the permissible use of copyrighted material by generative AI have come to the fore, and policymakers everywhere hope the verdict will bring much-needed clarity. Fair use is one of the principles being advanced in OpenAI’s defense: the argument is that OpenAI’s use of ANI’s copyrighted content falls within fair use thresholds and is therefore justifiable. Under this principle, a use of copyrighted content that qualifies as ‘fair’ is treated as an exception to infringement and meets legal standards.

RELEVANT LAWS

While the fair use doctrine has been codified under Section 52 of the Indian Copyright Act, 1957, Indian courts frequently assess the facts and circumstances of each case by referring to the four doctrinal factors laid down in Section 107 of the US Copyright Act. Thus, a defense built on the fair use principle must, in order to succeed, show that these factors weigh in its favor: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon potential markets. [1] This article assesses these four factors to show that the alleged copying fits within fair use, rendering ANI’s challenge devoid of merit.

TRANSFORMATIVE NON-COMMERCIAL CHARACTER OF AUTOMATED RESPONSES FULFILLS FIRST FACTOR REQUIREMENT

Courts assess two key questions under this head: first, whether the copyrighted material has been used for commercial benefit or for non-profit educational purposes; and second, whether the resulting work differs so much from the original in context, purpose, and use that it can be classified as ‘transformative’ and therefore has no significant economic impact on the owner.

As regards the first question, it is imperative to understand that ChatGPT, a generative AI based on a Large Language Model (LLM), builds upon knowledge acquired through its training processes to produce automated responses to user prompts. Even where that knowledge is drawn from copyrighted content, the training process itself is far removed from a ‘commercial benefit’, even if it does not strictly qualify as a non-profit educational purpose. As regards the second question, the concept of ‘transformative work’ turns on whether the additions to the original content are substantial enough to outweigh the other factors. Here, the rationale behind the US Supreme Court’s adoption of the lowest threshold for deriving originality from a work, the Minimal Degree of Creativity Test, offers some insight: ChatGPT draws on multifarious data sources to generate its responses, so it does not regurgitate any single source, and it supplements that material with insights developed through its understanding of those sources. Moreover, it can be argued that both the training process and the responses qualify as “Purposive Transformation even without Physical Transformation”: the former because the training data is not made accessible to the public, and the latter because a multifarious digital corpus is used to produce them. This reasoning aligns with the US decision in Authors Guild v. Google, Inc., 804 F.3d 202, 229 (2d Cir. 2015), also known as the Google Books Case. [2] [3]

NATURE OF COPYRIGHTED WORK FULFILLS SECOND FACTOR REQUIREMENT

As a corollary of the general rule that copyright protects expression rather than ideas per se, the second factor applies the fair use doctrine more narrowly to unpublished works than to published ones. Although this factor carries the least weight of the four, its aim is to preserve for the owners of unpublished works the first rights over publication and distribution. Because ChatGPT uses already published material in its training processes, this factor weighs heavily in its favor.

GENERATIVE AI NEGATES AMOUNT AND SUBSTANTIALITY TO FULFILL THIRD FACTOR REQUIREMENT

The proportion of copyrighted content used and the underlying purpose play a central role in determining fair use: to claim this defense, the portion used must be justifiable, proportional to the purpose, or minimal. As long as OpenAI can prove that its responses, or the derivative work produced, are not ‘substantially similar’ to the original copyrighted content used in training, the claim of copyright infringement through derivative work would not hold water. At the heart of OpenAI lies the concept of generative AI, which builds upon existing content only to produce a modified output that is transformative enough to fall within the scope of fair use without damaging the market for the original work. [4]

UNVIABILITY OF MARKET DISRUPTION CONCERNS

For a use to be classified as unfair, it must adversely affect either the direct or the derivative market of the original work. This impact can take the form of a loss of target audience, reduced revenue, or other economic damage. An AI model that uses copyrighted content merely to learn the patterns and structures of that body of data is unlikely, if not unable, to disrupt the existing market for that data. Additionally, generative content that draws on innumerable data sources rather than a single one can never become a perfect substitute for any individual source. [5] Therefore, the concern of market disruption is not sufficient to vitiate the defense of fair use.

CONCLUSION

The essence of the fair use defense lies in proving the existence of the four sine qua non factors discussed above. The analysis shows that OpenAI’s undertakings are justified under the fair use defense: its use is neither commercial nor a replica of any single document; it builds upon already published works; the amount and substantiality of the copyrighted content used is meager; and its operations are unlikely to lead to market disruption in journalism. Thus, the established norm of protecting expression, not ideas in themselves, lays the ground for OpenAI’s defense against ANI’s challenge.

Whether the courts will accept this defense remains an open question. However, the author believes that, in a rapidly evolving digital landscape, technological progress warrants a preference for the ‘permissionless innovation principle’ over a precautionary approach to every breakthrough invention. A balance must be struck between copyright protection and fair use, but not at the cost of innovation, and a decision favoring generative AI would align with this idea.

Author: Shubhanjali Dwivedi. In case of any queries, please contact or write back to us at support@ipandlegalfilings.com or IP & Legal Filing.

REFERENCES

  [1] ‘Copyright and Fair Use: A Guide for the Harvard Community’ (Harvard University, 31 May 2016) <https://ogc.harvard.edu/files/ogc/files/ogc_copyright_and_fair_use_guide_5-31-16.pdf> accessed 13 December 2024
  [2] Rich Stim, ‘Summaries of Fair Use Cases’ (Stanford Libraries, October 2019) <https://fairuse.stanford.edu/overview/fair-use/cases/> accessed 13 December 2024
  [3] Jiarui Liu, ‘An Empirical Study of Transformative Use in Copyright Law’ (Stanford Technology Law Review, 2019) <https://law.stanford.edu/wp-content/uploads/2019/02/Liu_20190203.pdf> accessed 13 December 2024
  [4] Ilan Rakhmanov, ‘Generative AI won’t replace Journalists, But It can enhance Journalism’ (Forbes, 23 May 2024) <https://www.forbes.com/councils/forbestechcouncil/2024/05/23/generative-ai-wont-replace-journalists-but-it-can-enhance-journalism/> accessed 13 December 2024
  [5] ‘OpenAI and Journalism’ (OpenAI, 8 January 2024) <https://openai.com/index/openai-and-journalism/> accessed 13 December 2024