Harvard and Google Release AI Training Dataset with Public Domain Books, Raising Copyright Questions: Self-Publishing News with Dan Holloway
Any process that improves by being trained on a set of materials can only ever get as good as those materials allow. That’s as true of machine learning algorithms as it is of human beings. This week’s news that Harvard will release a dataset of 1 million volumes for AI training highlights efforts to address inequities caused by the need for high-quality training data.