Google becomes the first AI company to be fined over training data

Guests attend the inauguration of a Google Artificial Intelligence (AI) hub in Paris on February 15, 2024.
Alain Jocard—AFP/Getty Images

Nearly five years ago, I reported on how France’s news publishers had gone to war with Google over the issue of “ancillary copyright” fees—payments for including snippets of article text in Google’s search results. I wrote that the media houses were unlikely to win. Well, they just did. And as with everything else these days, AI suddenly became part of the equation.

This morning, the French competition authority fined Google €250 million ($271 million) for failing to comply with commitments it made a couple of years ago regarding how it would negotiate the fees with the French news outlets. Google had already received a €500 million fine over the matter in 2021. It’s not arguing back this time, which has probably spared it an even higher fine.

I’m not going to go into the details of the revised negotiation process to which Google has agreed—it’s as dull as it sounds and, if you really want to dive in, here’s Google’s statement in French and English. The interesting part of this episode is about Google’s Bard AI, which these days goes by the name of Gemini.

As far as I’m aware, this is the first fine to be levied on an AI company at least partially because of its freewheeling incorporation of everything it can grab into its training data.

According to the French Competition Authority (FCA), the fine takes account of the fact that Google “used content from press agencies and publishers to train its [Bard] foundation model, without notifying either them or the [FCA].”

As Google tells it, the FCA “does not challenge the way web content is used to improve newer products like generative AI.” Google claimed this issue is “already addressed” in Article 4 of the EU Copyright Directive, which provides an exception for text and data mining, and in the upcoming EU AI Act, which tells AI companies to respect the Copyright Directive (and to publish “sufficiently detailed” summaries of their training data).

But according to the FCA, the question of whether article-scraping AI companies qualify for that text and data mining exception “has not yet been settled.” It said Google had “at the very least” broken its commitment to be transparent in its commercial dealings with French news publishers, making this a fining matter.

Now, Google did recently launch a new control in the robots.txt file that web publishers use to send signals to Google’s web crawlers. The setting is called Google-Extended, and it’s supposed to let publishers opt out of having their data become Bard/Gemini training fodder without also having their articles disappear from Google Search and Google News.

But it only added that control at the end of September, more than two months after Bard’s European launch. During that period, French publishers effectively had to allow the unrestricted hoovering of their output into Bard if they also wanted it to appear in Search and News—which, remember, is how they then get to claim ancillary copyright fees from Google. That broke another of Google’s commitments, again contributing to today’s fine. The FCA also told Google to explain to publishers how the opt-out mechanism works.
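
For readers wondering what that opt-out looks like in practice, here is a minimal sketch. It assumes a hypothetical publisher at news.example whose robots.txt blocks the Google-Extended token while leaving every other crawler alone. The file contents, the URL, and the helper name build_parser are illustrative assumptions, not anything taken from Google or the FCA. Python's standard-library parser only models the generic robots.txt matching rules; how Google itself acts on the token is Google's business.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site (an assumption for illustration):
# refuse the AI-training token, leave ordinary crawling untouched.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Made-up article URL on a made-up domain.
ARTICLE_URL = "https://news.example/politics/some-article"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The training token is refused; the regular search crawler falls
# through to the wildcard group and is still allowed.
for agent in ("Google-Extended", "Googlebot"):
    verdict = "allowed" if parser.can_fetch(agent, ARTICLE_URL) else "blocked"
    print(f"{agent}: {verdict}")
```

Under those assumptions, the publisher keeps its articles crawlable for Search and News while signalling that they should stay out of the training pile, which is the choice French publishers did not have between Bard’s European launch and the end of September.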

So in summary, Google’s new control for publishers has belatedly fixed one of the issues with the unpaid incorporation of news articles into its AI training material, but the overall legality of this practice under EU copyright law remains an open question. It’s not hard to see why Google’s big rival, OpenAI, has started cutting licensing deals with European press publishers like Axel Springer and Le Monde.

One final note while we’re on the subject of Big Tech and European antitrust regulators: Margrethe Vestager, the European Commission’s competition chief, just told Reuters that Apple may face a probe over the €0.50-per-app-installation “core technology fee” it’s levying on developers who dare to use the third-party iOS app stores that Apple must allow under the new EU Digital Markets Act. “There are things that we take a keen interest in, for instance, if the new Apple fee structure will de facto not make it in any way attractive to use the benefits of the DMA,” she said.

Given that Microsoft and Meta have both complained to Vestager that the fee makes third-party app stores unviable, there’s a storm a-coming, and I do not hate to say I told you so. More news below—and by the way, the Data Sheet team offers our red-cheeked apologies for misspelling “Nvidia” in yesterday’s email subject line…

David Meyer


NEWSWORTHY

Apple’s AI. Apple quietly published a research paper about its new generative AI model, which is called MM1, Wired reports. It can work with text and images and, while it’s relatively small compared to the likes of Gemini, experts quoted in the piece say its sophistication suggests Apple isn’t completely at sea on the AI-training front. The paper’s lead author also said Apple is “hard at work” on a successor.

Threads approaches fediverse. Meta’s integration of its Threads X rival into the fediverse, a system enabling interoperability with distributed social networks like Mastodon, is taking shape. The Verge reports on a video demo in which Instagram engineering director Peter Cottle shows off the mechanism through which a Threads user could enable fediverse sharing of their posts and explains risks like the impossibility of deleting a post on other fediverse platforms. Meanwhile, Meta CEO Mark Zuckerberg announced yesterday that Threads is now rolling out a trending topics feature in the U.S.

EU Galileo SpaceX deal. The EU and U.S. have agreed on the security aspects of a deal allowing the European Space Agency to use SpaceX’s Falcon 9 rocket to launch Galileo navigation satellites (Europe’s alternative to the U.S. GPS system). According to Politico, ESA staff will get constant launchpad access and first dibs on retrieving any debris from a failed launch. Europe has to use SpaceX to expand the Galileo constellation because the next generation of its own Ariane rocket system is delayed, and because longtime launch partner Russia is no friend these days.

ON OUR FEED

“I don’t see how America will be any more secure if the next owner of TikTok is a MAGA Trump crony backed by Saudi Arabia’s sovereign wealth fund.”

Sen. Ron Wyden (D-Ore.) tells Semafor that he’s not so keen on the idea of former Treasury Secretary Steven Mnuchin buying TikTok from ByteDance if legislation forcing a sale of the Chinese-owned social media company to a U.S.-based business becomes law. Mnuchin says he’s assembled a “combination of U.S. investors” for his bidding group, but Wyden notes that much of Mnuchin’s $2.5 billion investment fund is Gulf money.

IN CASE YOU MISSED IT

Why Microsoft’s surprise deal with $4 billion startup Inflection is the most important non-acquisition in AI, by Kylie Robison

What Nvidia’s new Blackwell chip says about AI’s carbon footprint problem, by Jeremy Kahn

Sam Altman is over GPT-4: ‘I think it kind of sucks’, by Chris Morris

Sam Bankman-Fried says he’s painted as a ‘depraved super-villain’ by prosecutors in FTX case and argues 50 years imprisonment is a death sentence, by Christiaan Hetzner

Intel wins $19.5 billion in CHIPS Act funding as ‘historic’ semiconductor spending spree heats up, by Dylan Sloan

The choice behind hotel CIO Brian Kirkland’s all-in bet on cloud, by John Kell

BEFORE YOU GO

Apple’s patent lobbying. After the fiasco around the Apple Watch’s blood oxygen-measuring capabilities—which Apple removed due to a patent violation that briefly saw U.S. imports of the devices blocked—the company has begun lobbying for changes to U.S. patent-enforcement rules. According to the New York Times, Apple wants the relevant enforcer, the U.S. International Trade Commission, to “put the public interest of a product ahead of a ban.” But intellectual property expert Adam Mossoff warned that “they’re trying to neuter a well-functioning court by closing its doors to Americans who have had their rights infringed.”

This is the web version of Data Sheet, a daily newsletter on the business of tech. Sign up to get it delivered free to your inbox.