OpenAI and Microsoft Face New York Times Copyright Lawsuit
Media Giant Alleges 'Billions of Dollars in Statutory and Actual Damages'
The New York Times is suing OpenAI and its major backer Microsoft for alleged copyright infringement.
The Times, which counts over 10 million subscribers to its print and digital editions, alleged in a complaint filed in the Southern District of New York that OpenAI had used without permission "millions" of the daily newspaper's copyrighted articles to train the large language models that power generative artificial intelligence chatbots such as ChatGPT.
Microsoft's "intimate involvement in OpenAI's operations and, therefore, its copyright infringement" makes it a party to the lawsuit, as does the Redmond, Washington-based technology giant's use of OpenAI's technology in its own products, which helped boost Microsoft's market capitalization by $1 trillion over the past year, the Times said.
The LLMs used by Microsoft also "provide infringing content and, at times, misinformation to users of its products and online services" including via Bing Chat, aka Copilot, the Times alleged.
The newspaper is owned by The New York Times Company, which is controlled by the Ochs-Sulzberger family through a trust. The company reported $2.4 billion in revenue for 2022.
Microsoft's cumulative investment in OpenAI has exceeded $13 billion, although in an apparent move to mitigate antitrust concerns, the company doesn't own a direct stake in the AI firm. In January, Microsoft negotiated a deal via which it would receive 75% of OpenAI's profits until its investment is repaid, after which Microsoft will own a 49% stake in the firm.
The Times said its legal move followed failed negotiations with OpenAI. The media giant wanted to reach an agreement via which OpenAI would pay to license its content.
Other media firms have struck deals with OpenAI. Axel Springer, the publisher of Business Insider, recently hammered out a deal allowing OpenAI to use its data for three years in exchange for "tens of millions" of euros. In April, The Associated Press signed a two-year deal allowing OpenAI to use select news content dating back to 1985 to train its algorithms.
Generative AI tools use large language models to handle users' queries, and the better the quality of data used to train an LLM, the better the potential results. By working more closely with reputable news outlets, generative AI tools such as ChatGPT could in theory deliver more accurate and trustworthy responses. To date, numerous instances have come to light of chatbots misattributing information or attributing factually incorrect information to a source that did not publish it.
The Times said in its complaint that it had analyzed the web crawlers AI companies use to gather training data for their LLMs, as well as ChatGPT's query responses. It found that the chatbot was returning content nearly identical to Times articles, while at other times inaccurately attributing its responses to information sourced from the Times.
The unauthorized use of the Times' content to create AI products that directly compete with the Times "threatens" the newspaper's business and gives companies that use AI a "free-ride on The Times's massive investment in its journalism by using it to build substitutive products without permission or payment," according to the Times' complaint. While OpenAI and Microsoft "engaged in widescale copying from many sources," they placed "particular emphasis" on using Times' content, it said.
"This action seeks to hold them responsible for the billions of dollars in statutory and actual damages that they owe for the unlawful copying and use of The Times's uniquely valuable works," according to the complaint. "If The Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill."
Creators Seek Compensation
The Times' lawsuit is the latest to be filed by organizations, artists and writers from across the United States who are seeking legal remedies to stop AI companies from using their intellectual property without recompense or permission. Social media platform X's owner Elon Musk recently pursued four firms, including Israel's Bright Data, for scraping user-generated data from his platform.
Dozens of print and digital news executives lobbied members of Congress in October, seeking stronger copyright protection, mandatory disclosure and transparency about the training of AI models, and liability and accountability for AI firms.
The U.S. Copyright Office is studying the legal and policy issues posed by AI. Responding to its call for comment, the News Media Alliance, which comprises more than 18,000 members, in October recommended requiring "strong and enforceable recordkeeping obligations" for all AI firms as part of a mandatory "rights-based licensing framework" to compensate authors and publishers.
How online platforms can use news content - and the rights of authors and publishers - remains an open legal question. One long-standing battle centers on Google and Meta displaying news content on their sites. In 2021, Australia proposed legislation that would force such platforms to compensate domestic news publishers. A similar law in Canada led Meta to block all news from Facebook's Canadian site.
Media Outlets Experiment
News organizations have not been sitting idly by as chatbot capabilities surge. Outlets such as the AP, the Guardian and Insider have been experimenting with the technology. The Times hired an editorial director to lead its artificial intelligence initiatives.
Some media firms have already gotten into hot water over their use of AI. Earlier this month, Sports Illustrated publisher Arena Group fired multiple executives after they allegedly oversaw the use of AI to generate content attributed to fake bylines. In November 2022, technology news site CNET began publishing AI-written stories, only to later find errors in more than half of them.