It’s Authors vs. OpenAI: Self-Publishing News Podcast with Dan Holloway and Howard Lovy

Today on the Self-Publishing News podcast: It's authors vs. OpenAI as lawsuits and petitions seek compensation for content scraped by ChatGPT. But, as News Editor Dan Holloway and News and Podcast Producer Howard Lovy report, the issue is not that simple. Also, USA Today relaunches its bestseller list, which is good news for indie authors. Howard and Dan discuss these and other stories making the news this month in indie publishing. 

About the Hosts

Dan Holloway is a novelist, poet and spoken word artist. He is the MC of the performance arts show The New Libertines. Earlier this year he competed at the National Poetry Slam final at the Royal Albert Hall. His latest collection, The Transparency of Sutures, is available on Kindle.

Howard Lovy has been a journalist for more than 30 years, and has spent the last eight years amplifying the voices of independent publishers and authors. He works with authors as a book editor to prepare their work to be published. Find Howard at howardlovy.com, LinkedIn, and Twitter.

Read the Transcripts: Authors vs. OpenAI

Howard Lovy: Hello and welcome to the July 2023 edition of Self-Publishing News from the Alliance of Independent Authors. I'm Howard Lovy, ALLi's News and Podcast Producer, and Book Editor at HowardLovy.com. Joining me is ALLi News Editor, Dan Holloway. Hello Dan, how are you?

Dan Holloway: Hi, I'm good. Here in Oxford, we seem to have escaped the heat that's over the rest of Europe. So, that's good.

Howard Lovy: Let's move directly to the news, and first I want to say, I guess we'll use just this disclaimer every time, I am real, I am not AI, this is my real voice talking to you. How about you, Dan, are you real?

Dan Holloway: I am, as I'm recording this. What you put out, given that you're the podcast editor, I cannot similarly vouch for necessarily, but the input is mine.

Howard Lovy: Right, got it, and that brings us to our first topic, which are lawsuits against OpenAI, the company that runs ChatGPT.

Let me quote from the great Dan Holloway in his latest column. “Protecting the livelihoods of rights holders is a worthy agenda. As an author, I wholly support the notion that I should be able to make a living, should enough people want to buy my work. The assumption that it is the job of the law to protect my living from technology, an industry that provides a living for others, of course, sits less comfortably.”

So, that's a strong statement, and we'll get to that, but first let's talk about the lawsuits against OpenAI and what they mean.

Dan Holloway: Yeah, where to start?

So, there's all sorts that's happening against AI. I'll start with the FTC investigation, because that's the most recent thing. I guess this is of interest to start with, because the FTC obviously isn't an industry body. So, it's more neutral and people might see it's not just authors going on about AI again, as I'm sure some people might think.

The FTC has complaints about the way that AI is allegedly leaking data and giving misinformation.

So, those are two concerns that we see a lot about in the media. It seems that there are cases where people's personal data are being included in answers provided by ChatGPT, and this suggests that it is actually being trained on personal sensitive data.

Howard Lovy: By sensitive personal data, is that stuff that's already out there on the internet or is it somehow hacking into things that aren't supposed to be out there?

Dan Holloway: I don't think there's any suggestion that it's hacking. There's all sorts of questions about what we mean by the public domain again, isn't there? This is one of the oldest debates in the book, when is something out there in the public domain and what is out there in the public domain?

Yeah, it's giving answers. If someone were to ask about, I don't know, a particular health condition, for example, then it might give answers based on someone's personal blog of their personal experiences and identify them to a wide audience that they might not have intended, for example.

There are concerns over data security. There are also obviously concerns over misinformation and the lack of checks and balances. So, that's what the FTC is looking at.

But the first of those, which is what it's been trained on, that goes to the heart of what authors are bringing lawsuits against OpenAI in relation to.

So, two authors in particular, Mona Awad and Paul Tremblay, are alleging that ChatGPT demonstrates by its answers that it must have been trained on their books, and they haven't given permission for it to be trained on their books. So, it's a very similar issue about what constitutes something that you can legitimately train an AI on. What constitutes a dataset that's okay to use in this way?

Howard Lovy: Now, can they prove that it actually trained on their specific work? Is ChatGPT spitting out their exact words?

Dan Holloway: Yeah. So, this comes to the heart of a really tricky question, which seems to be based on what kind of summary ChatGPT is able to give of novels. So, it is able to summarize their novels in ways that they claim it couldn't do if it's only looked at things that are what they would say are, again, in the public domain. So, it can produce really detailed accounts of their stories in a way that you couldn't get off someone's essay, for example, or off someone's blog, or someone's review. It would have to have read the books themselves, and yet they haven't given permission to read the books themselves.

So, one of the things this leads to is the question of, how do you know what's a reasonable summary to be able to give of something you have only read secondary materials on.

So, as you know, I work in the university, so this is essentially a vast part of my job, and as a student, it was one of the things we faced all the time.

I remember this was, for want of a gross generalization, one of the differences between the kind of essay you write as an undergraduate and the kind of essay you write as a graduate. It's one that's based on secondary materials as opposed to one that's based on primary materials. You can write really good essays based on secondary materials that really make it look as though you've read the actual work.

As someone who spent a lot of my undergraduate life doing exactly that, I am sure if I didn't, either my tutors were too polite to say anything or they didn't notice, or they just got fed up because everyone did the same thing. But you can write really good essays based on not having access to the original material.

So, it's strange, but it's also going to be very interesting to see what the courts decide.

Howard Lovy: Not only that, but do they have to prove that their livelihood was impacted by this?

Dan Holloway: No, they just have to prove that it was stolen. They just have to prove that something was used that they didn't give permission to, that should have sought permission.

Howard Lovy: Right, interesting. I was concerned with something that I was writing that it might be too similar to another book. So, I asked ChatGPT to write me a summary of this other book, and it did it, along with the ending. So, now I'm wondering, did ChatGPT do that legally, or what did it base it on?

Dan Holloway: Did it stitch things together from hundreds of other things to come up with this overall picture, or had it read the book without permission?

Howard Lovy: Yeah, I don't know. It's what happens in that black box that nobody's certain what happens.

Dan Holloway: Yeah. I mean, the people feeding the information are sure, and this leads on to the really interesting bit, which is that ChatGPT claims that book knowledge comes from the corpora of publicly available, unpublished books. So, this is a really interesting claim. There's, I think they call them Corpus 1 and Corpus 2, and they're unpublished titles, and they are used to train ChatGPT in all the kinds of things that you would get from just giving it access to generic manuscripts, so idiom and plot, and so on.

And one of the people who worked on producing those corpora has claimed that a lot of the books involved come from Smashwords, which is obviously of interest to us because a lot of indies use Smashwords, and they claim that the Smashwords free books make up a large part of that training data set.

So, there's all sorts of issues there around if something is free, then what does that mean for copyright? Is it in the public domain, because anyone could have downloaded it for free, and then what can they do with it? So, it's really interesting.

The UK media have slightly got the wrong end of the stick, as always. They claim that these books on Smashwords equate to unpublished works. That because they were available for free, that means that they are unpublished, which is, as far as I'm aware, a strange definition of the word published. It's clearly interesting that things that are freely available or available for free are being treated as the same.

Howard Lovy: Right. It's available for free for you and me, but not for OpenAI. It doesn't make a lot of sense.

Dan Holloway: Yeah, so it comes down to this question of what copyright is again. Is it, I'm making it freely available for you to read, but I'm not necessarily making it freely available for you to do other things with.

I mean, it's enshrined in the principle of creative commons, for example, the non-commercial attribution, whether you can alter it, and so on. So, it's a really interesting question, what can books that are available for free, or have been downloaded, or have been bought, what can you do with them?

But if they've been using books from Smashwords, then that's quite a serious allegation.

Howard Lovy: Right. Now, meanwhile mainstream authors are up in arms and are signing petitions, urging AI companies to stop using their work without permission. There seems to be a consensus among many authors that this is all a bad thing, but that's not necessarily the whole story.

I don't know, I'm of two minds about the whole thing. What's your take on it?

Dan Holloway: Yeah. So, the Writers Guild of Great Britain have just published a think-piece on this as well, which I'm reporting on in this week's column. It's full of things. Yes, the demands they're making make sense; they're exactly what you would expect in this context. They want transparency about what is being used to feed AI. They want human checks and balances to make sure that it's being used properly. The central argument of it seems to be, and this is another thing that strikes me as slightly odd, that the AI will never be able to replicate the truly human, and that always strikes me as a strange thing to put into an argument piece like this. That AI can do very important things and very valuable things, but it will never be able to replicate human originality. That doesn't seem to fit with, therefore we ought to control it and we ought to stop it doing various things. It seems like there's a bit of a muddle with what people are saying about it.

Howard Lovy: I don't know if you've listened to Joanna Penn and Orna Ross who had a recent AskALLi Advanced Self-Publishing podcast talking about nothing but AI, and they talked about it, it becomes a writer's tool. It's not going to write your book for you, but it's going to prompt you. It's going to help you.

I've used it in a similar way to get things going, but ultimately, what comes out the other end is something original from me not from AI.

Dan Holloway: I understand that, but the problem then though, is what happens when AI does produce something truly original?

I mean, it may, or it may not, but if it does, then it seems that by making your central argument that it can never be truly original you're holding a massive hostage to fortune. Because if it does then come up with something original, that's cut your argument from underneath you. So, it seems like we want a better reason to say that AI needs to be controlled than simply saying it's never going to do what humans can do.

Howard Lovy: Right, exactly. Yeah.

Dan Holloway: So, if we're talking about human careers, then that's fine. By all means, we need to argue about the value of what humans do. Again, I'm not sure that law courts are the way to do it, but that should be where the focus of the argument is, that there is something essentially valuable about a human contribution, but I'm not sure that we can say that's down to the content produced.

Howard Lovy: Oh, I see. Interesting. You sound a lot like my son who's 19 years old. He's in art school now and he's arguing the same thing. It may not be able to write your book for you yet, but future generations probably will.

Dan Holloway: Yeah, I think future generations, in a couple of months’ time. I don't think it's, like, decades away. I think it's really quite close to the time when it will be able to do that, and if we've been relying on the argument that it could never do that.

The problem is that we've seen it with art already. Literally within the space of a year, we went from, this is rubbish, it'll never do anything of any artistic value to, oh my goodness, this is amazing. And we want to be careful that the argument still stands at the end of the process.

Howard Lovy: Oh, I'm old enough to remember when digital cameras first came out and the photography community said, no, that'll never replace a real film.

So, it's a matter of incorporating the technology. So, do you think all these thousands of writers signing these petitions are a little behind the times, or what do you think?

Dan Holloway: People are obviously going to be worried about their livelihoods, and there is something valuable about having a society full of humans that do creative things. How we protect that is the question. I wouldn't argue that it shouldn't be protected, it's how we protect it that becomes the question.

I mean, this is one of the things that's behind the Hollywood writers’ strike, a very interesting piece I report on this week as well, suggesting that's really going to shake things up, and find out what happens legally because obviously there is so much more money in Hollywood than there is in publishing.

Howard Lovy: Right, but strangely the actors and writers don't seem to be making a lot of the money. They call it Hollywood accounting; every movie seems to be losing money.

Dan Holloway: Yeah, I don't think that's any different from publishing, is it? Somehow, it's always the people who produce the stuff that make the money.

Howard Lovy: All right. Well, is there anything else we want to say about AI before we move on to our next topic?

Dan Holloway: No, I will leave it there. It's very interesting to see what the lawsuits do. But yeah, I think that there is an infinite value in what humans do, and having a creative society, but we need to be more creative about how we protect creativity, rather than just holding back a flood that isn't going to be held back.

Howard Lovy: Right. So, let's move on to another top story today, which is on the recent revival of the USA Today Bestseller list, which is of significant interest to independent authors and booksellers. The list's return was influenced by groups such as ALLi and may present a shift toward more inclusive data sources and better representation of indie authors.

Dan Holloway: Yeah, it's just a nice, good news story, I think. ALLi were part of the move to have it reinstated, and what's really nice is it collects data from all over the place. It's online sales, it's bookshop sales, and they're doing all sorts of partnerships. So, there is a partnership with an actual local bookshop and they're partnering also with bookshop.org, which is the platform that lets people buy books online and support their local indie bookstore. So, you can buy books through the USA Today Bestseller list and support your local bookstore while doing so, which doesn't feel like a bad news story.

Howard Lovy: Right, and indie authors can legitimately claim that they are on a bestseller list and not, like you mentioned in your column, a sub-sub-sub list on Amazon.

Dan Holloway: Yeah, it's a big list and it would be great to see it filled with indie authors.

Howard Lovy: Well, I hear from the background music that it's time for a tech corner, and we're not going to talk about AI this time. We're going to talk about Threads, which is the new social media offering from Meta.

I am among the Twitter users who grew fed up with the glitchiness of Twitter, not to mention the questionable opinions of Elon Musk. So, I opened up a Threads account but then I left it there, largely unused, because I just don't have the time or the patience to devote to another social media platform.

So, tell me more, Dan, about what Threads is, and also the Zuckerberg versus Musk, figurative or literal, cage match.

Dan Holloway: Yeah, have we talked about the cage fight before?

Howard Lovy: I don't think we have, no.

Dan Holloway: Yeah, so Mark Zuckerberg and Elon Musk are apparently getting in a cage fight. I mean, it's the sort of thing that is classic Silicon Valley, large ego. In the UK, it's Oxbridge, there's a certain kind of Oxbridge politician who does this stuff. In America, it seems to be a certain kind of Silicon Valley entrepreneur, and they just operate by different rules that seem to be involved.

Howard Lovy: Yeah, they haven't grown up, they're like little boys.

Dan Holloway: Yeah, somebody should write a parody, except you couldn't parody it, which is exactly the same as politics has become in this country.

So, Facebook started its own, sort of Twitter-killer, with Threads, a hundred million users signed up within the first few days. Twitter have got very angry about this; they claim it's a rip off and they have threatened to sue Meta for starting it unless they make it demonstrably different. They claim that they've hired Twitter engineers to produce it, that a lot of the functionality is the same.

Howard Lovy: They hired former Twitter engineers because Elon fired a lot of them when he first took over.

Dan Holloway: Yeah, so they were all looking for jobs.

I've never used it, but I'm also finding that Twitter has become increasingly difficult to use. The increasing emphasis on Twitter Blue, on subscribers, just feels slightly stifling, and that leads to, oh, yes, sorry.

Howard Lovy: Also, the racist and antisemitic content has really increased since Elon took over, and that's the problem I have with Twitter, is I'm very much involved with Jewish Twitter, which, of course, is just infested with antisemites interrupting our conversations, and there's no guarantee that Threads won't turn into that, too. So, I'm devoting more time to my Substack newsletter, where we can discuss these issues with those who want to have a serious discussion.

Dan Holloway: Yeah, I think this subscription model where you choose, in a very limited way, with whom you have these conversations is going to become more popular. It might make it harder to find an audience, but you'll get much more fruitful discussions out of them.

Howard Lovy: Right. So, yeah, I've been trying to figure out personally where to go. I know I want to de-emphasize Twitter because it's just turning into a cesspool, but I'm not sure what to do about it.

Dan Holloway: You will have seen this week as well that, interestingly they are now going to start revenue sharing from adverts with creators. So, people who sign up to Twitter Blue, this is the latest move to try and get people to sign up to it, will get to share in ad revenue. Whether that will compensate for the subscription fee, who knows. This comes the same week that there was a story on the BBC website saying that ad revenue has halved on Twitter, and ad industry representatives have said that they find the whole thing about restricting tweets to be really rather odd. As an advertiser, you don't necessarily want to limit the number of people who can see your advert.

Howard Lovy: To me, it feels like an excuse. I think they're having some serious engineering problems there, and they're using that as an excuse to scale back everybody.

Dan Holloway: Yeah, it feels like an odd use of 44 billion dollars, I have to say.

Howard Lovy: So, what about you, Dan? How are you filling your social media needs? Are you still heavily on Twitter or are you moving to something else?

Dan Holloway: I'm not, no, I'm not really very much at all. It's hard to know where to go, and it's hard to know how to build an audience without also attracting the kind of attention that you, to be honest, you don't want to spend all your time getting embroiled in.

Howard Lovy: Right, exactly. It's a waste of time.

I know that in addition to being an author you're also an advocate for disability rights, and so you need some sort of public platform.

Dan Holloway: You do, but you also get, like you say on Twitter, you get the kind of people that you really don't want to be interacting with and that makes it really hard. So, I end up, I guess I spend time on LinkedIn these days, which isn't really necessarily what I want to be doing. I don't know what the answer is. If someone gives me 44 million or billion dollars, I'm sure I'd come up with a better answer than the ones we've got.

Howard Lovy: They can do that. Just look you up at selfpublishingadvice.org, and I'm sure you can tell them your bank account information.

Well, thank you as always, Dan, for your insight into the news this month. Of course, we can always catch up on the latest in your column at selfpublishingadvice.org, and I'll see you next month, Dan.

Dan Holloway: See you next month. Take care.


Author: Howard Lovy

Howard Lovy is a novelist, nonfiction author, developmental book editor, and journalist. He is also the news and podcast producer for the Alliance of Independent Authors. You can learn more about him at https://howardlovy.com/


