I came across this article on New Scientist (link here) about using AI to predict when people will die, their personality traits, and their job suitability. I have strong objections to the use case, but, regrettably if unsurprisingly, the researchers demonstrate strong predictive power.
The opportunity for businesses to leverage equivalent capability to manipulate and discriminate against individuals is vast, and I am not confident that regulation will be effective in stopping widespread misuse.
If we consider the data we share, or that is collected about us (LinkedIn, rewards programs, digital purchases, search history, social media, and so on), the insights that can be extracted are powerful and staggering in their potential for misuse.
Even if it is not misused by those who collect it, the potential for it to be hacked and exploited by bad actors will never go away. This points to a need for scalable strategies to counteract the collection of sensitive personal data, and to hinder the growth of AI use cases built on personally predictive outcomes.
I think there may be an opportunity to deploy data obfuscation at scale: dispersing fake but plausibly realistic data into these data sets. If the fake data is indistinguishable from the real data, the real data becomes worthless for training AI.
For an enterprise use case, imagine a product/service that uses AI to generate highly accurate, plausible, but ultimately fake data, and disseminates it publicly before any hack occurs. This reduces the company's attractiveness as a target for hackers, because any real data that was exfiltrated would be impossible to distinguish from the fake data already in circulation. It would look identical to 1,000 other copies of the data out there, and no one would know which was real and which was fake.
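As a rough illustration of the idea (a sketch, not a production design), the snippet below generates plausible fake records and mixes them with real rows so that nothing about position or format gives the real data away. All field names, value pools, and functions here are invented for the example.

```python
import random
import string

# Small hypothetical value pools; a real system would use a generative
# model trained to match the statistical profile of the genuine data.
FIRST_NAMES = ["Alice", "Ben", "Chen", "Dina", "Elias", "Farah"]
SUBURBS = ["Parramatta", "Fitzroy", "Subiaco", "Newstead"]

def fake_record(rng: random.Random) -> dict:
    """Generate one plausible but entirely synthetic customer record."""
    name = rng.choice(FIRST_NAMES)
    return {
        "name": name,
        "email": f"{name.lower()}{rng.randint(1, 999)}@example.com",
        "suburb": rng.choice(SUBURBS),
        # Follows the local mobile format but is randomly generated.
        "phone": "04" + "".join(rng.choice(string.digits) for _ in range(8)),
    }

def poison(real_records: list[dict], copies: int, seed: int = 0) -> list[dict]:
    """Mix `copies` fakes per real record into one pool, then shuffle so
    real rows are indistinguishable by position."""
    rng = random.Random(seed)
    pool = list(real_records)
    for _ in range(copies * len(real_records)):
        pool.append(fake_record(rng))
    rng.shuffle(pool)
    return pool
```

Publishing the output of `poison` (rather than holding it privately) is what creates the deterrent: an attacker who later steals the real table cannot prove which rows are genuine.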
I think products could be developed for individuals as well as for enterprises.
Update
The financial impact of data loss is huge for large enterprises. Optus saw their brand value drop by $1.2B and set aside $140M to cover the cost of the breach. (link 1, link 2) Medibank saw their brand value drop by $1.8B and incurred $46M in expenses. (link 1, link 2)
This product could significantly reduce the loss of value for shareholders and the cost to customers.
Potentially it could be bundled with a cyber insurance product, as a way of reducing premiums? Perhaps partnering with an insurance provider is a way to acquire customers.
Update 16 Jan 2023
I have come across some great resources on Row Conditional Tabular Generative Adversarial Networks. These models aim to replicate entire databases, maintaining the relationships across tables and parent/child keys.
There are two repositories in particular with great tools capable of training on entire tables and databases with very little pre-processing: ydata-synthetic (link) and SDV (link).