A digital news outlet, Raw Story Media, brought an infringement claim in the United States (US) District Court, Southern District of New York in 2024, alleging that OpenAI scraped its copyrighted articles from the web to train ChatGPT, an artificial intelligence (AI) large language model that generates text-based responses based on large datasets. Raw Story Media claims that OpenAI did not obtain permission to use its content and that the AI-generated outputs which paraphrase or summarise news articles amount to derivative works that infringe Raw Story Media’s copyright.
The case primarily hinges on whether OpenAI’s use of copyrighted materials for training purposes falls under fair use – a legal doctrine in US copyright law that allows for limited use of copyrighted material without permission for purposes such as research, commentary, education, or news reporting – or whether such use amounts to a violation of copyright law (see William W Collins ‘Generative AI in Legal Tech: Navigating Current Litigation and Legal Frameworks for SaaS Companies’ (www.linkedin.com, accessed 2-12-2024)).
Although judgment has been passed dismissing Raw Story Media’s claim in favour of OpenAI, the case is still in its preliminary stages and may have many years of legal debate ahead of it. It nevertheless raises important legal questions regarding the scope of fair use and what amounts to a derivative work.
OpenAI argues that its use of publicly available data (including Raw Story Media’s articles) for training its AI models falls within ‘fair use’ because it does not reproduce the articles directly but rather uses them to train the model in a way that is transformative and not a direct copy of the original content (see Collins (op cit)).
OpenAI puts forward that training AI models with publicly available data is transformative because it serves educational and research purposes, which is what the doctrine of fair use seeks to promote.
Raw Story Media, on the other hand, asserts that the content generated (paraphrased and/or rephrased) by ChatGPT constitutes derivative works (as opposed to sufficiently distinct, original works) because the outputs closely resemble its articles and merely reproduce their content in a new form, which infringes its exclusive rights as a copyright holder.
The court’s ruling will have far-reaching implications for the AI industry, as it could establish important legal precedents regarding how AI companies use copyrighted content for training (ie, either with or without obtaining explicit licences from the content creators) and the limits of fair use (see Collins (op cit)).
In South Africa, copyright law is governed by the Copyright Act 98 of 1978, which protects original literary, musical, and artistic works, and prescribes copyright principles similar to those found in international copyright regimes. Unfortunately, South Africa’s copyright laws face unforeseen challenges when applied to new technologies like AI.
Unlike US copyright law, which explicitly includes a fair use doctrine, South Africa’s Copyright Act provides a fair dealing exception. Fair dealing allows the use of copyrighted works for certain purposes such as research, teaching, and private study, but is narrower than fair use. The defence of transformative use in the AI context thus cannot be put forward as easily before the South African courts, since the law does not provide for broad exceptions such as transformative works.
The question of AI-generated content and ownership is less clear under South African law. If an AI model generates content based on copyrighted materials, it raises questions about whether the AI is creating a derivative work or whether the output is sufficiently original to be considered a new work in its own right (see Brian Glassman ‘Complete Beginner’s Guide to Generative AI’ (www.dreamhost.com, accessed 2-12-2024)). South African law will need to evolve to address these concerns, particularly as AI becomes more integrated into sectors like journalism, entertainment, and education.
South African content creators and businesses could benefit from stronger protection of their works against unauthorised use by AI companies. On the other hand, AI developers need access to large datasets to improve the performance of their models. A legal framework that balances these interests will be necessary to foster innovation while respecting the rights of content creators.
The Raw Story Media case raises fundamental questions about copyright law in the digital age, especially as it pertains to emerging AI technologies and the use of large-scale datasets that include publicly available content.
As AI technologies evolve, copyright holders (such as news outlets and content creators) are increasingly concerned that their intellectual property is being exploited without permission, recognition or compensation, especially when the AI outputs are monetised or lead to products that directly compete with the content creators’ own work.
On the other hand, AI companies argue that training AI models on publicly available data is analogous to traditional research uses, such as using books or articles to develop academic work or software.
In South Africa, the principles of fair dealing and copyright protection for digital works are similarly underdeveloped in relation to new technologies. The South African legislature may need to update the Copyright Act to better address the challenges posed by AI, ensuring that there is a balance between encouraging technological innovation and protecting the economic interests of content creators.
The case underscores the need for a modernised copyright framework that accounts for the realities of the digital economy and AI development, ensuring that both creators and innovators can benefit from the growing role of technology in creative industries.
Celine Bakker BA (Law) LLB (Stell) is a legal practitioner at SL Law in Cape Town.
This article was first published in De Rebus in 2025 (March) DR 48.