I’ve complained that various publishing industry groups have been slow to respond to recent developments in AI, like LLMs. Over the last week, I’ve been tinkering with a series of “belief” statements that other industry folks could sign onto. I’m finally ready to start sharing a draft. Feedback welcome, but if you are here to discourage, move along. Not interested in a fight.
Where We Stand on AI in Publishing
We believe that AI technologies will likely create significant breakthroughs in a wide range of fields, but that those gains should be earned through the ethical use and acquisition of data.
We believe that “fair use” exceptions (and similar exceptions in other countries) with regard to authors’, artists’, translators’, and narrators’ creative output should not apply to the training of AI technologies, such as LLMs, and that explicit consent to use those works should be required.
We believe that the increased speed of progress achieved by acquiring AI training data without consent is not an adequate or legitimate excuse to continue employing those practices.
We believe that AI technologies also have the potential to create significant harm and that, to help mitigate some of that damage, the companies producing these tools should be required to provide easily available, inexpensive (or subsidized), and reliable detection tools.
We believe that detection and detection-avoidance will be locked in a never-ending struggle similar to that seen in computer virus and anti-virus development, but that it is critically important that detection not continue to be downplayed or treated as a lesser priority than the development of new or improved LLMs.
We believe that publishers do not have the right to use contracted works in the training of AI technologies unless their contracts include clauses explicitly granting those rights.
We believe that submitting a work for consideration does not entitle a publisher, agent, or the developers of submission software to use it in the training of AI technologies.
We believe that publishers or agents that utilize AI in the evaluation of submitted works should indicate that in their submission guidelines.
We believe that publishers or agents should clearly state their position on AI or AI-assisted works in their submission guidelines.
We believe that publishers should make reasonable efforts to prevent third-party use of contracted works as training data for AI technologies.
We believe that authors should acknowledge that there are limits to what a publisher can do to prevent the use of their work as training data by a third party that does not respect their right to say “no.”
We believe that the companies and individuals sourcing data for the training of AI technologies should be transparent about their methods of data acquisition, clearly identify the user-agents of all data scraping tools, and honor robots.txt and other standard bot-blocking technologies.
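To make the robots.txt point concrete, here is a minimal sketch of what “honoring robots.txt” looks like in practice, using Python’s standard-library parser. The user-agent string, the rules, and the URLs are hypothetical placeholders; a real crawler would fetch each site’s live robots.txt rather than a hard-coded string.

```python
from urllib import robotparser

# A scraper should identify itself with a distinct, documented user-agent
# so site owners can allow or block it by name. (Hypothetical name.)
USER_AGENT = "ExampleDataBot/1.0"

def allowed(robots_txt: str, url: str) -> bool:
    """Return True only if the given robots.txt rules permit USER_AGENT
    to fetch url. A bot that honors robots.txt checks this before every
    request and skips any URL for which this returns False."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(USER_AGENT, url)

# Example rules: a site that blocks this bot from /books/ but nothing else.
rules = """\
User-agent: ExampleDataBot
Disallow: /books/
"""

print(allowed(rules, "https://example.com/books/novel.html"))  # False
print(allowed(rules, "https://example.com/about.html"))        # True
```

The point of the belief statement is that this check is cheap and standardized; a scraper that skips it, or that hides behind a generic browser user-agent, is making a choice, not facing a technical obstacle.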
We believe that copyright holders should be able to request the removal of their works from databases that have been created from digital or analog sources without their consent.
We believe that the community should not be disrespectful of people who choose to use AI tools for their own entertainment or productive purposes.
We believe that individuals using AI tools should be transparent about those tools’ involvement in their process when the resulting works are shared, distributed, or made otherwise available to a third party.
We believe that publishing contracts should include statements regarding all parties’ use, intended use, or decision not to use AI in the development of the work.
We believe that authors should have the right to decline the use of AI-generated or -assisted cover art or audiobooks on previously contracted works.
We believe that individuals using AI tools should be respectful of publishers, booksellers, editors, and readers that do not wish to engage with AI-generated or -assisted works.
We believe that publishers and booksellers should clearly label works generated by or with the assistance of AI tools as a form of transparency to their customers.
We believe that copyright should not be extended to generated works.
We believe that governments should craft meaningful legislation that protects the rights of individuals, promotes the promise of this technology, and specifies consequences for those who seek to abuse it.
We believe that governments should be seeking advice on this legislation from a considerably wider range of people than just those who profit from this technology.
We believe that publishers and agents need to recognize that current detection tools for generated art and text can produce both false positives and false negatives, and should not be relied upon as the sole basis for a determination.
Some notes in response to comments:
AI-generated and -assisted are not specifically defined because the technology is evolving and any definition would likely create loopholes. Broad terms are meant to encourage transparency and to let individual parties decide whether such uses cross their own lines. For example, one’s attitude toward AI-assisted research may differ between non-fiction and fiction.
In suggesting that AI developers should be required to provide reliable detection tools, and in emphasizing that detection should be a priority, we are stating that the people best equipped to understand and tackle the problem of detection (which can be approached in several different ways) are the ones developing the AI tools that generate the output. (Some have even suggested that all output should be saved for detection purposes, but we’re not qualified to tell them how to solve the problem.) Detection is a problem of the same scale as generation, or an even more complex one. Yet industry professionals have been quoted saying things like “get used to it” and have put insufficient effort into addressing the problems they are creating. The pricing concern is to make sure that anyone can access detection, because the impact will reach well beyond the publishing world.
The statement on fair use has been amended to include equivalents in other countries, since fair use is not an international standard. Court cases are still pending in the US that debate whether fair use even applies to training data. We’re saying that we don’t think it should. We are not advocating the end of fair use, just saying it shouldn’t apply in this specific case. Anyone who says that ending fair use protection for training data means the end of this technology is lying. There are already vendors producing “ethically-sourced” datasets. Removing the exception may slow data acquisition, and by extension the development of some tools, but it would not end them.