To some authors it sounds scary, to some mysterious, to others, it’s boring. However we think about it, what’s ever more clear is that Artificial Intelligence (AI) has real implications for authors. Can indie authors harness the power of AI to make better books and reach more readers? Or does AI herald the death of our most fundamental right in law: author copyright? That is the question Orna Ross and the Alliance of Independent Authors aim to answer in the final part of our “Is Copyright Broken?” series: Artificial Intelligence and Author Copyright.
We’ve all seen too many movies about AI that give a completely erroneous impression, so first off let’s eliminate any idea we may have of an AI being a physical entity that has limbs, talks in a funny, robotic voice and is less emotional than us. Think of AI, instead, as being like an app on your phone: a kind of software that can analyze data and perform an action.
Traditional computer programs are tools that support the creative process. A word processing app is not so different to pen and paper, and just as pen manufacturers don’t own the copyright to words written using their pens, Microsoft does not own every essay or book produced in Microsoft Word. The copyright lies with the author who used the program to do the creative, generative work.
But an effective AI goes beyond being a tool in this sense. By virtue of the vast amounts of data it can absorb and process, it recognizes patterns and creates output that, until recently, required human creative brains to produce: creative writing (story plotting, character creation, sensory description, lyrical language), creative publishing (book categorization) and much more.
As an indie author, you are already living with AI, if you use Amazon algorithms to help your book buying, if you use your phone app’s automatic replies (“OK.” “See you then.” “That’s fine!”), if you use auto-text guesses by your word processor or email provider.
Many authors have already dived in much further, using AI translation tech to create first drafts of a foreign-language text for a translator to work on, or investigating voice AI to generate their audiobooks.
Our partners too are on the case. For example, book distributor PublishDrive uses an AI tool, Savant, for category determination, and intends to introduce more AI tools for authors.
AI is overtaking humans in its ability to absorb and process information and data. This has implications for author copyright.
Artificial Intelligence and Content Generation
Take poetry, often considered the ultimate in skill for a wordsmith. Poetry is now being produced by AI poet bots all the time. The Spotify app used an AI to turn song titles into Valentine’s Day poetry, with results that satisfied lots of lovers. At BotPoet.com, @scarschwartz and @benjaminlaird run what they call a “Turing Test for poetry” where, as a reader, you have to guess whether the poem you’re reading was written by a human or by a computer and choose “Bot or Not?” (The AskALLi team got an average score of 5 out of 10.)
The site was created to challenge our preconceptions of what poetry is, what creativity is, and whether it is, in fact, a uniquely human attribute. It seems not.
And it’s not just poetry. Screenplays have been written by an AI, and films have already been made from them. An AI-penned novel made it onto the shortlist for a literary award in Japan. Artificial intelligence is already being used to generate works of journalism. At Jeff Bezos’s The Washington Post and other newspapers, AI is increasingly used to generate first-draft sports journalism (no need now to send reporters to every little league game) and other kinds of journalism where reports are formulaic.
AI is still some way off writing a coherent novel, as witnessed by the surreal “Harry Potter and the Portrait of What Looked Like a Large Pile of Ash”, the story the team at Botnik got when they fed the seven Harry Potter novels through their predictive text keyboard. JK Rowling need not panic, just yet.
The future is closer in Hollywood, where data analytics is now being used to predict hits and eliminate flops far more accurately than humans can, and where companies are developing screenwriting AI. Nadira Azermai, founder of one such company, ScriptBook, says Deepstory, their AI, “really is a co-creator.” Speaking to The Guardian newspaper, she envisioned an AI-supported “next-generation writers’ room”.
Whenever they don’t know where to head to for the next scene, they would have Deepstory create it. The engine takes into account everything that you’ve written, and it will deliver you the next scene, or the next 10 pages, or write it to the end…. [At this point in time] the consistency in writing stays for another 10 pages and then the AI becomes a bit crazy – sometimes it kills the protagonist for some reason – but it’s improving. Within five years we’ll have scripts written by AI that you would think are better than human writing.
With AI accuracy improving at an astonishing pace, it seems likely that Azermai is right, and it’s easy to see how this capability could be helpful to an indie novelist too.
But how we get there raises many issues from a copyright perspective.
In order to generate text, AI has first to be fed somebody else’s words, so who owns the copyright to the ensuing works? Or to the process that creates them?
To take an example: ALLi could feed the content of this blog, which has been running since 2012 and has many millions of words on the topic of self-publishing, into an AI and instruct it to generate all kinds of works. If we did this, who would own the copyright? The members who’ve guest posted on the blog over the years or been interviewed by Howard Lovy in our Inspirational Author interviews? The podcast presenters whose words have been transcribed in our weekly podcast posts? ALLi? Or the person who created the AI that made the new posts possible?
The state of copyright law in the UK, where we are located, means such words are possibly copyright-free right now (see below), so we could sell or otherwise use them. Is this fair to our contributors? And what about the company whose AI allowed us to create the works? If you invest millions creating a system that generates story drafts which a large band of people find delightful, shouldn’t you get paid, along with any writers who use the tool?
And what effect does all this have on human authors, slaving over their first drafts, while the machines can spew out passable text at the press of a button?
Producing AI-generated text just got easier with the arrival of GPT-3 in June this year. GPT is the text generation system developed by OpenAI, a research and deployment lab based in San Francisco, California, dedicated to ensuring that “artificial general intelligence benefits all of humanity” (see their OpenAI Charter) and backed by Elon Musk and other high-profile tech entrepreneurs.
Like its predecessor GPT-2, GPT-3 creates new text based on the texts fed into it. But while earlier language models such as GPT-2 and BERT required a large training dataset (thousands or tens of thousands of examples) to learn a given task, GPT-3 can do its language tasks with far less input. Dale, a coder and writer at Google Cloud AI, explains:
IT’S REALLY BIG. I mean really big. With 175 billion parameters, it’s the largest language model ever created (an order of magnitude larger than its nearest competitor!), and was trained on the largest dataset of any language model. This, it appears, is the main reason GPT-3 is so impressively “smart” and human-sounding.
But here’s the really magical part. As a result of its humongous size, GPT-3 can do what no other model can do (well): perform specific tasks without any special tuning. You can ask GPT-3 to be a translator, a programmer, a poet, or a famous author, and it can do it with its user (you) providing fewer than 10 training examples.
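To make “fewer than 10 training examples” concrete, here is a minimal sketch in Python of how a handful of examples might be packed into a single text prompt. It does not call GPT-3 or any real service. The function name `build_few_shot_prompt`, the “Input:/Output:” labels and the sample phrases are all illustrative assumptions, not part of OpenAI’s tooling.

```python
# Hypothetical helper for illustration only: it builds the text of a
# few-shot prompt and does not contact GPT-3 or any other AI service.
def build_few_shot_prompt(instruction, examples, new_input):
    """Combine an instruction, a few worked examples and a new input."""
    lines = [instruction, ""]
    for source, target in examples:
        # Each worked example shows the model the pattern to imitate.
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The final entry is left unfinished for the model to complete.
    lines.append(f"Input: {new_input}")
    lines.append("Output:")
    return "\n".join(lines)


# Three made-up examples stand in for the "fewer than 10" Dale mentions.
examples = [
    ("Good morning", "Bonjour"),
    ("Thank you", "Merci"),
    ("See you soon", "À bientôt"),
]
prompt = build_few_shot_prompt(
    "Translate English to French.", examples, "Good night"
)
print(prompt)
```

The point of the sketch is that the “training” here is nothing more than a few lines of example text placed ahead of the request. The model is left to continue the pattern.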
GPT-3 is currently in beta, but it won’t be long before this giant automated plagiarism machine is unleashed. Author, podcaster and ALLi’s Enterprise Advisor Joanna Penn, a leading voice on the copyright implications of AI for authors, is both excited about its potential and concerned about the ability of copyright law to keep up with developments.
Existing law is not fit for purpose in a world of Artificial Intelligence and tools like Open AI’s GPT3. We need to ensure that creators are rewarded for their original work when it is used to train future machine learning systems.
WIPO, the World Intellectual Property Organization, says “how the law tackles new types of machine-driven creativity could have far-reaching commercial implications.” We agree.
The first question is legal: deciding who owns the copyright in AI-produced work. The second is practical: how will we enforce whatever the law decides, in a digital world that’s global, when copyright is local?
Artificial Intelligence and Author Copyright: The Law
Conferring copyright in works generated by artificial intelligence was obviously not in the minds of those who originally standardized copyright law. As the law currently stands, creative works qualify for copyright protection if they are original–and most definitions of originality assume a human author.
While non-human copyright is not (yet) specifically prohibited, most countries are resistant to the idea. Spain and Germany state that only works created by a human can be protected by copyright. In the United States, the Copyright Office registers an original work of copyrightable authorship “provided that the work was created by a human being”, relying on case law finding that copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the [human] mind.” A recent Australian case (Acohs Pty Ltd v Ucorp Pty Ltd) came to similar conclusions. In Europe, the Court of Justice (CJEU) has also declared on various occasions that “originality” must reflect the “author’s own intellectual creation.” This is usually assumed to be a human author.
Recognizing the work that goes into creating a program or machine capable of generating other artistic and creative works, a few countries, such as India, Ireland, New Zealand and the UK, have taken the approach that authorship belongs to the programmer. UK copyright law says: “In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.”
WIPO agrees, saying that “granting copyright to the person who made the operation of artificial intelligence possible seems to be the most sensible approach, with the UK’s model looking the most efficient. Such an approach will ensure that companies keep investing in the technology, safe in the knowledge that they will get a return on their investment.”
But what about the authors whose work has fed the AI?
Joanna Penn paints the scenario:
Suppose I could feed the works of Stephen King, Dan Brown, John Connolly, Jonathan Maberry and other favorite authors alongside my own books into an AI, then have it generate a first draft that is a sum of everybody’s words. I give it some prompts to come up with a new work and that generates a good first draft. I rewrite and edit. Who owns that book?
There’s no law against this right now, but ethically I would think that Stephen King et al. deserve some credit and money.
If there is no license, and no law to prevent such use, then natural language generation systems could write the next prize-winning or bestselling book based on the works of existing authors, for example, a mash-up of Margaret Atwood, Marlon James, and poet Raymond Antrobus, without those creators receiving recognition or financial reward.
Copyright lies in the expression, the use of words, not in ideas. Shakespeare took his plots from stories that everybody knew, or from real-life events, but what made them Shakespeare was the genius of his expression. By the time he had written up the story, it was something completely different from the original, and today lines from Hamlet or A Midsummer Night’s Dream have entered everyday language while their source material is forgotten.
Right now, there is no copyright or licensing arrangement for using an author’s original writing as part of a machine learning model. We lack a consensus on how to judge the originality of a work essentially composed of random snippets of thousands, or even millions, of input works. Creative Commons advocates a cautious approach to copyright legislation, arguing that AI needs to be properly understood before any copyright implications can be addressed.
Meanwhile, words are already being generated, and the ability of the machines to produce content “autonomously”, with no direct human involvement, is increasing.
Is Artificial Intelligence Intelligent?
Steven Poole, author of Rethink: The Surprising History of New Ideas, argues that AI is not intelligent, not “if by ‘intelligence’ we mean what we sometimes encounter in our fellow humans. [It’s] just using methods of statistical analysis, trained on huge amounts of human-written text… It can be eerily good, but it is not as intelligent as, say, a bee.”
But does that matter, if the results please the reader?
What Does It All Mean for Indie Authors?
AI is the single most important development facing humanity in the first half of the 21st century. It is now, already, our most powerful technology, and we need to understand it so we can surf the change rather than drowning in the deluge.
It all begins with how we talk about it. On one hand, we have commentators like Poole arguing that AI will never overtake human creators because it is not human.
Writing is not data. It is a means of expression, which implies that you have something to express. A non-sentient computer program has nothing to express, quite apart from the fact that it has no experience of the world to tell it that fires don’t happen underwater. Training it on a vast range of formulaic trash can, to be sure, enable it to reshuffle components and create some more formulaic trash. (Topics “highly represented in the data” of GPT2’s training database were Brexit, Miley Cyrus, and Lord of the Rings.) All well and good. But until robots have rich inner lives and understand the world around them, they won’t be able to tell their own stories.
This writer-centric argument seems to miss the point from a reader’s perspective (one man’s formulaic trash is another’s dearly beloved book) and it certainly misses the copyright point. At a practical level, this debate is not about authors versus AI so much as about what happens when authors and other publishers have AI at their disposal.
We wrote about the plagiarism scandals that have rocked the author community in the second part of this series. When a human writer commits plagiarism, it is currently considered a serious matter. It is equally serious, but far harder to police, when humans get together to write a computer program that deliberately commits plagiarism. Or when, as we see too often in tech announcements, AI rhetoric and hyperbole overlook the human beings behind the works.
To AI evangelists, all the creations of human brainpower can be reduced to data, which can be combined and reproduced faster, better and in mind-blowing quantity. It’s fast-paced, exciting stuff, but as we admire the doings of what looks like super-human technology, we need to remember that the ability to produce and reproduce this data depends on human labor and intelligence: those writing the works that form the training sets, and those creating the tech.
Copyright law is what is known as passive law. It’s rarely invoked but the fact that it’s there is what allows publishing to continue and authors to make their living. Yes, copyright is broken if we think of it as a way of controlling AI, and keeping it in a box. But if we understand that it’s a guiding set of principles that allow trade to happen, it still has a role to play, even though there are those who will push the boundaries and break the rules.
It’s important for authors, especially indie authors, to have our voices heard, to be informed and to get involved in copyright decisions which could have a serious impact on our livelihoods and autonomy.