Remember last week when I virtually popped a champagne cork at having gone a whole seven days without an AI story? Well, er, the reverie is over, with this OpenAI paper. But I hope the absence has left you wanting because this is a story you really need to read.
There are several foundational issues in the way large generative AI models are trained. One is consent. One is attribution of rights to the resulting work. But, given the financial impact, the recent Society of Authors survey showed AI is already having on creators’ incomes, a really big one is compensation. If you are going to train your AI on my work, how are you going to pay me?
OpenAI has entered into a series of financial agreements with publishers in exchange for the right to use their creative material to train AI models such as ChatGPT. But at the end of last month, they published a paper hinting at a longer-term model for an ongoing payment model with creators.
The paper is called “An Economic Solution to Copyright Challenges of AI.”
I have ended this piece with the paper’s abstract, quoted in full. It captures a lot about technology’s understanding of the problem, as well as the mechanism of the proposed solution. TechCrunch breaks down what the cooperative game theory model means in essence. The long and short of it is that, for any output, the aim is to calculate which inputs contributed most to it and split any compensation between them.
What does the OpenAI model really mean?
The simplistic way I think of this model is like the paint mixers you get in hardware stores. You ask for a certain color of paint. The mixer has a set of base colors (variants of red, blue, and yellow or, digitally, RGB/CMYK) from which it mixes your request. Different colors are made by mixing different amounts of each base element. At a very simplistic level, we can think of the creative inputs to an AI model in terms of those base colors, and the compensation package being proposed as being based, for each output, on the proportion of each base ingredient in that output.
Of course, you might say (I am but a humble reporter. I never would!) that thinking of creative work in the same way as primary paint colors doesn’t quite capture the intrinsic value of the artistic process.
Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of training data contributors. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners proportionally to their contributions to the creation of AI-generated content. The metric for contributions is quantitatively determined by leveraging the probabilistic nature of modern generative AI models and using techniques from cooperative game theory in economics. This framework enables a platform where AI developers benefit from access to high-quality training data, thus improving model performance. Meanwhile, copyright owners receive fair compensation, driving the continued provision of relevant data for generative model training. Experiments demonstrate that our framework successfully identifies the most relevant data sources used in artwork generation, ensuring a fair and interpretable distribution of revenues among copyright owners.
Find Out More:
That’s still not a good solution, though better than not having one at all. Essentially the AI company is monetizing MY content on my behalf and giving me a share. But how would they know how much more I could have made with real visitors on my content? I could potentially make much more than what they would compensate me with.