By Jamie Dobson, founder of Container Solutions and author of ‘Visionaries, Rebels and Machines’
Since James Watt’s steam engine and its pivotal role in the Industrial Revolution, technology after technology (the harnessing of electricity, Henry Ford’s moving assembly line, the microprocessor, the Internet, DNA technology, the mobile phone) has kicked off a revolution of its own. But the Industrial Revolution is still the biggie: the biggest bang for our buck in shifting human civilisation.
Until now.
Government oversight for AI’s data use?
Artificial Intelligence is coming for James Watt’s crown. And AI’s revolution will be very different. Unlike previous technological revolutions that primarily transformed industries reliant on physical labour, AI’s impact extends to intellectual and creative domains previously considered uniquely human.
AI’s appetite for data
Modern AI systems learn by digesting vast quantities of human-created content. They are sophisticated pattern-recognition systems trained on billions of examples of human creativity and knowledge.
Initially, tech companies trained these models on publicly available data, but as models grew more sophisticated, they required ever more data. Companies expanded their harvesting to include copyrighted content, paywalled articles, and private repositories. That’s a problem for creators who rely on compensation for their effort, skill, and talent. Worse, not only are creators going unpaid for their existing work; that work is being used to train the very systems that could soon replace them.
Currently, most jurisdictions have no specific regulations governing how companies can use publicly available data for AI training. This regulatory vacuum has allowed AI developers to operate under a take-first-ask-later approach, creating multi-billion-dollar technology platforms using content they didn’t create or license.
As governments worldwide grapple with these challenges, several regulatory approaches are emerging:
Opt-in or Opt-out Models
The simplest solution could be to create a system for opting content in or out of AI training. In theory, this would be quick to implement with minimal complexity. Yet, given that some models have already been trained on copyrighted content (which should, in effect, already be opted out by law), it might not be particularly effective.
For businesses, an opt-out system offers fewer obstacles to AI development but creates long-term legal uncertainty. An opt-in system provides clearer legal boundaries but potentially slower access to training data.
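In practice, an opt-out system already exists in embryonic form: some publishers use robots.txt to refuse AI training crawlers (OpenAI’s GPTBot, for instance) while still admitting search engines. The sketch below, using Python’s standard-library robots.txt parser, shows how such a check works; the example robots.txt and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt from a news site that opts out of AI
# training crawlers (here, GPTBot) while allowing everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved training crawler checks the policy before harvesting.
print(parser.can_fetch("GPTBot", "https://example.com/articles/1"))    # False: opted out
print(parser.can_fetch("Googlebot", "https://example.com/articles/1"))  # True
```

The weakness is visible in the code itself: the check is voluntary. A crawler that never calls it, or lies about its user agent, faces no technical barrier, which is why opt-out alone leaves the long-term legal uncertainty described above.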
Data Rights and Compensation Models
Similar to how music and literary rights work, content creators could receive compensation when their work is used for AI training. This could be done on an ad-hoc basis, like music streaming, or through government distribution via a digital tax.
- Collective licensing: Creators register with collecting societies that negotiate with AI companies and distribute payments based on usage. This model exists in music with performing rights organisations such as PRS in the UK, ASCAP and BMI in the USA, GEMA in Germany or SACEM in France.
- Data dividend: A tax or fee on AI companies based on their data usage, with proceeds distributed to creators. This resembles public lending rights systems in countries like the UK, Canada, and Australia, where authors receive payments when libraries lend their books.
- Direct licensing: Individual negotiations between major content producers and AI companies, with standardised terms for smaller creators.
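The data-dividend and collective-licensing models above both reduce to the same arithmetic as streaming royalties: a pool of money split pro rata by usage. A minimal sketch, assuming a regulator could obtain per-creator usage counts from a training-data audit (the counts and names below are invented):

```python
def distribute_pool(pool_pence: int, usage: dict[str, int]) -> dict[str, int]:
    """Split a licensing pool pro rata by how often each creator's work
    appeared in a training corpus. Working in integer pence avoids
    floating-point rounding disputes; any leftover pence go to the
    biggest contributors first (largest-remainder method)."""
    total = sum(usage.values())
    shares = {creator: pool_pence * n // total for creator, n in usage.items()}
    remainder = pool_pence - sum(shares.values())
    for creator in sorted(usage, key=usage.get, reverse=True)[:remainder]:
        shares[creator] += 1
    return shares

# Hypothetical audit: a £1,000 pool, three creators, usage counts 600/300/100.
payouts = distribute_pool(100_000, {"author_a": 600, "author_b": 300, "author_c": 100})
print(payouts)  # {'author_a': 60000, 'author_b': 30000, 'author_c': 10000}
```

The hard part is not this division but the input to it: without the transparency measures discussed below, nobody can produce trustworthy usage counts in the first place.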
AI as a Public Resource
Some experts advocate treating advanced AI systems like public utilities or natural monopolies. The model here is electricity: the national grid is treated as a natural monopoly, and the government sets standards and expectations for managing it as a public resource. Applied to AI, that would mean:
- Private companies would continue developing AI, but under enhanced regulatory oversight
- Transparency requirements would include regular audits and public reporting
- Universal access provisions would ensure broad distribution of benefits
- Price controls or licensing requirements would prevent monopolistic practices
This approach draws from how telecommunications, electricity, and other essential services are regulated in many countries. It acknowledges both the innovation potential of private enterprise and the public interest in fair, accessible AI systems.
Transparency and Technical Safeguards
Any potential regulatory framework will require some level of transparency and technical safeguards to ensure that AI is not operating as a black box. We need to know how the algorithms are fed and on what data to ensure that creators are fairly compensated and we aren’t introducing systemic biases into what will become ubiquitous technology. In other words, whatever system is chosen, it needs to be tracked and policed to ensure compliance.
The publishing industry could offer a useful approach. Copyright registration systems and ISBN standards create a framework for tracking and attributing written works. Similar systems could be developed for AI training data, creating both accountability and the technical infrastructure for fair compensation.
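To make the ISBN analogy concrete: each registered work would get a stable identifier that auditors can check a training corpus against. The sketch below uses a SHA-256 fingerprint of the text as that identifier; the registry class, its methods, and the example works are all hypothetical, and a real system would need far more (fuzzy matching for edited copies, a governance body, legal standing).

```python
import hashlib
from typing import Optional


class TrainingDataRegistry:
    """Hypothetical ISBN-style registry for training data: each work is
    recorded under a content fingerprint so an auditor can ask whether a
    piece of training text is registered, and to whom it belongs."""

    def __init__(self) -> None:
        self._records: dict[str, str] = {}  # fingerprint -> creator

    @staticmethod
    def fingerprint(text: str) -> str:
        # A stable identifier derived from the content itself.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def register(self, text: str, creator: str) -> str:
        fp = self.fingerprint(text)
        self._records[fp] = creator
        return fp

    def attribute(self, text: str) -> Optional[str]:
        # Returns the registered creator, or None if the work is unknown.
        return self._records.get(self.fingerprint(text))


registry = TrainingDataRegistry()
registry.register("Call me Ishmael.", "Herman Melville")
print(registry.attribute("Call me Ishmael."))   # Herman Melville
print(registry.attribute("Unregistered text"))  # None
```

Exact-hash matching is deliberately the simplest possible design choice: it makes compliance checkable, which is the point of the section above, even though real audits would need to tolerate reformatting and excerpting.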
Conclusion
The current regulatory vacuum around AI’s use of data cannot persist indefinitely. Whether through government regulation, industry self-regulation, or landmark legal cases, new frameworks for managing AI’s relationship with human creativity must emerge.
The businesses that thrive won’t be those extracting maximum short-term value from unregulated data harvesting, but those building sustainable models that respect and reinforce the creative ecosystem upon which AI ultimately depends.
ABOUT THE AUTHOR
Jamie Dobson is the founder of Container Solutions and has been helping companies across industries move to cloud native ways of working for over ten years. Container Solutions develops a strategy, a clear plan, and a step-by-step implementation to help companies achieve a smooth digital transformation. With services including Internal Developer Platform Enablement, Cloud Modernisation, DevOps/DevSecOps, Site Reliability Engineering (SRE) Consultancy, Cloud Optimisation, and full Cloud Native Strategy, companies get much more than just engineering know-how. Jamie is also the author of ‘The Cloud Native Attitude’ and the recently published ‘Visionaries, Rebels and Machines: The story of humanity’s extraordinary journey from electrification to cloudification’. Both are available from Amazon and good bookstores.
https://www.container-solutions.com/