OpenAI’s data hunger raises privacy concerns

Last month, OpenAI came out against a yet-to-be-enacted Californian law that aims to set basic safety standards for developers of large artificial intelligence (AI) models. This was a change of posture for the company, whose chief executive Sam Altman has previously spoken in support of AI regulation.

The former nonprofit organisation, which shot to prominence in 2022 with the release of ChatGPT, is now valued at up to US$150 billion. It remains at the forefront of AI development, with the release last week of a new “reasoning” model designed to tackle more complex tasks.

The company has made several moves in recent months suggesting a growing appetite for data acquisition. This isn’t just the text or images used for training current generative AI tools, but may also include intimate data related to online behaviour, personal interactions and health.

There is no evidence OpenAI plans to bring these different streams of data together, but doing so would offer strong commercial benefits. Even the possibility of access to such wide-ranging information raises significant questions about privacy and the ethical implications of centralised data control.

Media deals

This year, OpenAI has signed multiple partnerships with media companies including Time magazine, the Financial Times, Axel Springer, Le Monde, Prisa Media, and most recently Condé Nast, owner of the likes of Vogue, The New Yorker, Vanity Fair and Wired.

The partnerships grant OpenAI access to large amounts of content. OpenAI’s products may also be used to analyse user behaviour and interaction metrics such as reading habits, preferences, and engagement patterns across platforms.

If OpenAI gained access to this data, it could build a comprehensive understanding of how users engage with various types of content, which could be used for in-depth user profiling and tracking.

Video, biometrics and health

OpenAI has also invested in a webcam startup called Opal. The aim is to enhance the cameras with advanced AI capabilities.

Video footage collected by AI-powered webcams could translate to more sensitive biometric data, such as facial expressions and inferred psychological states.

In July, OpenAI and Thrive Global launched Thrive AI Health. The company says it will use AI to “hyper-personalise and scale behaviour change” in health.

While Thrive AI Health says it will have “robust privacy and security guardrails”, it is unclear what these will look like.

Previous AI health projects have involved extensive sharing of personal data, such as a partnership between Microsoft and Providence Health in the United States and another between Google DeepMind and the Royal Free London NHS Foundation Trust in the United Kingdom. In the latter case, DeepMind faced legal action for its use of private health data.

Sam Altman’s eyeball-scanning side project

Altman also has investments in other data-hungry ventures, most notably a controversial cryptocurrency project called Worldcoin (which he cofounded). Worldcoin aims to create a global financial network and identification system using biometric identification, specifically iris scans.

The company claims it has already scanned the eyeballs of more than 6.5 million people across almost 40 countries. Meanwhile, more than a dozen jurisdictions have either suspended its operations or scrutinised its data processing.

Bavarian authorities are currently deliberating on whether Worldcoin complies with European data privacy regulations. A negative ruling could see the company barred from operating in Europe.

The main concerns being investigated include the collection and storage of sensitive biometric data.

Why does this matter?

Existing AI models such as OpenAI’s flagship GPT-4o have largely been trained on publicly available data from the internet. However, future models will need more data – and it’s getting harder to come by.

Last year, the company said it wanted AI models “to deeply understand all subject matters, industries, cultures, and languages”, which would require “as broad a training dataset as possible”.

In this context, OpenAI’s pursuit of media partnerships, investments in biometric and health data collection technologies, and the CEO’s links to controversial projects such as Worldcoin, begin to paint a concerning picture.

By gaining access to vast amounts of user data, OpenAI is positioning itself to build the next wave of AI models – but privacy may be a casualty.

The risks are multifaceted. Large collections of personal data are vulnerable to breaches and misuse, as in the MediSecure data breach, in which almost half of Australians had their personal and medical data stolen.

The potential for large-scale data consolidation also raises concerns about profiling and surveillance. Again, there is no evidence that OpenAI currently plans to engage in such practices.

However, OpenAI’s privacy policies have been less than perfect in the past. Tech companies more broadly also have a long history of questionable data practices.

It is not difficult to imagine a scenario in which centralised control over many kinds of data would let OpenAI exert significant influence over people, in both personal and public domains.

Will safety take a back seat?

OpenAI’s recent history does little to assuage safety and privacy concerns. In November 2023, Altman was temporarily ousted as chief executive, reportedly due to internal conflicts over the company’s strategic direction.

Altman has been a strong advocate for the rapid commercialisation and deployment of AI technologies. He has reportedly often prioritised growth and market penetration over safety measures.

Altman’s removal from the role was brief, followed by a swift reinstatement and a significant shakeup of OpenAI’s board. This suggests the company’s leadership now endorses his aggressive approach to AI deployment, despite potential risks.

Against this backdrop, the implications of OpenAI’s recent opposition to the California bill extend beyond a single policy disagreement. The anti-regulation stance suggests a troubling trend.

OpenAI did not respond to The Conversation’s request for comment before deadline.