AI & Copyright: Navigating the Uncharted Waters

Jun 21, 2023

In 1847, French composer Ernest Bourget sparked a copyright controversy when he refused to pay for his drink at a café, arguing that because the café’s musicians were playing his music, the café owed him. Every era brings new copyright questions, from photography and recorded music to radio, home video, and now generative AI.

AI models such as ChatGPT and Midjourney learn patterns from massive datasets. Though OpenAI has not disclosed the exact data sources for GPT-4, similar projects draw on varied resources, from Common Crawl’s public web archive to Project Gutenberg’s books and archives of scientific papers. While much of this content is freely accessible, much of it is also copyrighted.

Unlike traditional data retrieval, these AI systems don’t merely reproduce a single piece of data from their training set. Instead, they match patterns from vast arrays of references. As Tim O’Reilly once stated, ‘data isn’t oil – data is sand,’ painting a picture of the individual data point’s insignificance in the grand scheme.

Current copyright law struggles to accommodate these models, because they don’t reproduce individual works but generate new content from an aggregate. As with the Ernest Bourget incident, a new use of creative work raises complex issues about AI, copyright, and even brand identity when AI recreates or imitates known styles or personas.

The ‘freely available’ nature of this training data adds another layer of complexity. As reported by the FT, OpenAI and Google are discussing paying newspaper publishers for access to training data. While news publishers may not have a valid argument for payment based on appearing in search results, they have a point if AI can synthesize news from multiple sources and so divert traffic away from the original sites. This could extend to other specialized domains as well.

In essence, OpenAI is using massive amounts of data not to build a database but to try to automate the creation of intelligence. However, framing this solely as a copyright issue may be overly simplistic. The uncharted waters of AI and copyright are yet to be fully navigated, leaving us with more questions than answers.
