Technology Essay: ‘The Unbelievable Scale Of AI’s Pirated-Books Problem’

THE ATLANTIC (March 20, 2025):

When employees at Meta started developing their flagship AI model, Llama 3, they faced a simple ethical question. The program would need to be trained on a huge amount of high-quality writing to be competitive with products such as ChatGPT, and acquiring all of that text legally could take time. Should they just pirate it instead?

Meta employees spoke with multiple companies about licensing books and research papers, but they weren’t thrilled with their options. This “seems unreasonably expensive,” wrote one research scientist on an internal company chat, in reference to one potential deal, according to court records. A Llama-team senior manager added that this would also be an “incredibly slow” process: “They take like 4+ weeks to deliver data.” In a message found in another legal filing, a director of engineering noted another downside to this approach: “The problem is that people don’t realize that if we license one single book, we won’t be able to lean into fair use strategy,” a reference to a possible legal defense for using copyrighted books to train AI.

‘…generative-AI chatbots are presented as oracles that have “learned” from their training data and often don’t cite sources (or cite imaginary sources). This decontextualizes knowledge, prevents humans from collaborating, and makes it harder for writers and researchers to build a reputation and engage in healthy intellectual debate.’

————————————–

One of the biggest questions of the digital age is how to manage the flow of knowledge and creative work in a way that benefits society the most. LibGen and other such pirated libraries make information more accessible, allowing people to read original work without paying for it. Yet generative-AI companies such as Meta have gone a step further: Their goal is to absorb the work into profitable technology products that compete with the originals. Will these be better for society than the human dialogue they are already starting to replace?

Alex Reisner is a contributing writer at The Atlantic.

