How Tech Giants Cut Corners to Harvest Data for A.I.



The race to lead AI has become a desperate hunt for the digital data needed to advance the technology. To get this data, tech companies like OpenAI, Google and Meta cut corners, ignored company policies and discussed bending the law, according to a New York Times investigation.

At Meta, which owns Facebook and Instagram, managers, lawyers and engineers last year discussed buying publishing house Simon & Schuster to source long-form works, according to records of internal meetings obtained by The Times. They also advocated collecting copyrighted data from the Internet, even if doing so would result in lawsuits. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.

Like OpenAI, Google has been transcribing YouTube videos to collect text for its AI models, said five people with knowledge of the company’s practices. That may have violated the copyrights to the videos, which belong to their creators.

Last year, Google also expanded its terms of service. One motivation for the change, according to members of the company’s privacy team and an internal message seen by The Times, was to let Google use publicly available Google Docs, restaurant reviews on Google Maps and other online material for more of its AI products.

The companies’ actions illustrate how online information – news, works of fiction, message board posts, Wikipedia articles, computer programs, photos, podcasts and film clips – has increasingly become the lifeblood of the booming AI industry. Developing innovative systems depends on having enough data to teach technologies to instantly produce text, images, sounds and videos similar to what a human creates.




2024-04-06 21:00:20

www.nytimes.com