A study revealed that nearly 50,000 Spanish works had been used illegally to train AI.
Photo Credit: Emiliano Vittoriosi via Unsplash
Along with the globalisation of AI come many new problems and situations that individuals and governing bodies alike will have to learn to navigate. When text, images, videos, and music can be generated at the push of a button, by anyone, from anywhere, the implications for concepts such as copyright law, originality, and plagiarism extend across a wide array of creative fields, and the publishing industry is no exception. In particular, for AI to continue developing and improving, it must learn from human-made media and text; in other words, it must be trained. But in Spain, a recent study found that books were being used to train AI, and illegally pirated books at that.
CEDRO, the Spanish Intellectual Property Rights Management Entity, has found that nearly 50,000 books and works, belonging to at least 41,000 authors and 1,100 different publishers, were being used to train AI. According to representatives of CEDRO, that amounts to practically the entire Spanish publishing industry.
This claim is based on a report by the Danish Rights Alliance published in September 2024. According to this report, pirated Spanish works available on Libgen (a Russian piracy site) have been used to train AI models developed by companies such as OpenAI (the maker of ChatGPT) and Meta. Libgen's domain has been suspended throughout Europe and the website has been shut down; however, users can still reach it through mirror sites or through VPNs, which make them appear to be in another country. The report's data was collected from publicly available documents and from statements made during lawsuits brought in the United States against large tech companies. Any company that uses Libgen material for a service may be violating copyright.
Among the affected authors are Almudena Grandes, Arturo Pérez-Reverte, Fernando Aramburu, Dolores Redondo, Lorenzo Silva, María Dueñas, and Eduardo Mendoza. Affected Spanish publishers include Grupo Planeta, Acantilado, Anagrama, Libros del Asteroide, and the RAE.
Daniel Fernández, President of the Federation of Spanish Publishers' Guilds, said, “They have been stealing from us for a long time.” Fernández added that the integration of AI could be particularly dangerous for the publishing industry, as AI-generated pieces could be passed off as human-made if regulations are not put in place soon. Jorge Corrales, the Director of CEDRO, called the current government regulations on AI “totally deficient” and stated, “I don’t believe that legislation can’t keep up with technology. I think what we have here is a failure to try.”
In spite of this, the EU and the Spanish government have taken some legislative steps in response to the globalisation of AI. In May of last year, the first law regulating AI was approved. Hopefully, more will follow. It is important to find a solution to copyright licensing that benefits publishers and other creatives. Such a solution would not mean turning a blind eye to international AI usage, but rather adapting to it.
CEDRO spokesperson Carmen Cuartero said that the government should give authors and publishers tools and resources so they can manage their intellectual property. This will help them avoid the “copying culture”, which is growing by the minute.