The accelerated adoption of artificial intelligence (AI) in business has heightened data risks: greater volumes of data are in transit, and black hat actors are now AI-assisted, among other factors. Interestingly, the primary risk is not intricate hacking techniques but human error.
The 2022 Cost of a Data Breach Report from IBM and the Ponemon Institute finds that human error accounts for 21% of data breaches. Human error covers a wide range of mistakes, from misconfigured systems to simple mishaps like sending sensitive information to the wrong person.
Imagine if we could secure data by making it unreadable to humans.
It's time to look at modern solutions that go beyond traditional encryption or firewall protection. A determined bad actor can use advanced analytics to defeat conventional encryption, tokenization and data masking. For AI to run on data, that data must be sent to different parties, which means relinquishing ownership once it crosses your firewall. Moreover, AI models are distinctly susceptible to attacks, mounted by AI-assisted black hat parties, that are designed to extract sensitive data.
One solution gaining momentum in the realm of AI data security is Randomized Re-Representations, a new concept that ensures data remains indecipherable to humans, significantly reducing the risk of breaches caused by human errors.
The human factor in data security
Despite the best defensive mechanisms in place, unintentional mishandling or deliberate leaks pose a constant threat to data security. This is particularly problematic when data needs to be shared with third parties for purposes such as machine learning.
In such cases, sensitive data often leaves the realm of the owner's control in plain-text format, making it vulnerable to potential breaches. For instance, human data labelers in 2022 inadvertently leaked sensitive user images from test robot vacuums, highlighting the need for solutions that can reduce the impact of such leaks.
Taking control with randomized re-representations
Randomized Re-Representation is a promising technology developed to tackle this issue head-on. It transforms raw data into a randomized format that is intelligible only to a specific AI model, rendering the data useless to any other human or machine if leaked. The transformation is one-way and irreversible: there is no key, so there is nothing to decrypt.
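The article does not specify how such a transformation is built, but the idea can be illustrated with a minimal sketch: project raw feature vectors through a private random matrix. The projection collapses dimensions, so the original record cannot be uniquely reconstructed, yet a model trained on the projected features can still use them. All names and parameters here (the seed, the dimensions, `re_represent`) are illustrative assumptions, not a real implementation.

```python
import numpy as np

# Illustrative sketch of a one-way randomized re-representation.
# A private random projection matrix acts as the model's "codebook":
# the projected vector is unreadable to humans, and because the mapping
# from 64 dimensions to 16 is many-to-one, it cannot be inverted exactly.

rng = np.random.default_rng(seed=1234)   # the seed stands in for a private key
RAW_DIM, PROJ_DIM = 64, 16               # project 64 raw features down to 16
projection = rng.standard_normal((RAW_DIM, PROJ_DIM))

def re_represent(record: np.ndarray) -> np.ndarray:
    """Map a raw feature vector to its randomized re-representation."""
    return record @ projection

raw = rng.standard_normal(RAW_DIM)       # stands in for a sensitive record
shared = re_represent(raw)               # this is what leaves your firewall
# `shared` contains no human-readable fields; leaking it does not leak `raw`.
```

The key design point is that only the holder of the projection matrix (here, the model owner) can produce consistent re-representations, while possession of a leaked vector alone recovers nothing.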
For instance, consider the infamous Equifax data breach that exposed the personal information of millions due to human error. If the exposed data had been in the form of Randomized Re-Representations, the breach's impact would have been drastically diminished.
Insisting on non-plain-text data
To effectively mitigate the impact of data breaches, it's crucial that data never appears in plain-text form, even momentarily while it is being processed. As an increasing number of businesses use Large Language Models (LLMs) or foundation models, maintaining data ownership while sharing data to train or fine-tune those models is essential. Some third-party solutions employ Randomized Re-Representation, ensuring your data remains under your control and never appears in plain text. This novel concept protects your data even in transit, providing an added layer of security.
AI and machine learning: Minimizing human involvement
You may ask, "Don't humans need to interpret the data eventually?" Yes, perhaps when a regulator or authority needs to inspect it. Otherwise, AI and machine learning play pivotal roles in reducing human involvement in data interpretation. These systems can be trained to decipher and analyze data directly, minimizing the human touchpoints where leaks occur. For example, some companies are already leveraging AI to create Randomized Re-Representations of sensitive data, enabling safe and secure data analysis.
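To make the "no human ever reads the data" claim concrete, here is a toy sketch, under the same random-projection assumption as before, of a model learning from re-represented records only. The classes, labels, and the nearest-centroid classifier are synthetic stand-ins chosen for brevity, not a description of any real product.

```python
import numpy as np

# Sketch: a model trained purely on randomized re-representations, so no
# analyst ever inspects raw records. We fit a nearest-centroid classifier
# on randomly projected synthetic data (labels are synthetic, for illustration).

rng = np.random.default_rng(7)
projection = rng.standard_normal((8, 4))   # private matrix, known only to the model owner

# Two synthetic classes of "sensitive" raw records
class_a = rng.normal(loc=0.0, size=(50, 8))
class_b = rng.normal(loc=3.0, size=(50, 8))

# Only the projected (unreadable) form is ever shared for training
proj_a, proj_b = class_a @ projection, class_b @ projection
centroid_a, centroid_b = proj_a.mean(axis=0), proj_b.mean(axis=0)

def classify(raw_record: np.ndarray) -> str:
    z = raw_record @ projection            # re-represent at the data owner's edge
    dist_a = np.linalg.norm(z - centroid_a)
    dist_b = np.linalg.norm(z - centroid_b)
    return "A" if dist_a < dist_b else "B"

example = class_a.mean(axis=0)             # a raw record at class A's center
print(classify(example))                   # prints "A": its projection sits on centroid A
```

The point of the sketch is the workflow, not the classifier: training, inference, and analysis all operate on the projected vectors, so a leak anywhere in that pipeline exposes only the unreadable representation.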
Adopting advanced data security models like Randomized Re-Representation is not without challenges, but combined with existing data governance and controls, businesses can create a robust data security environment that addresses the human risk factor effectively.
Human error represents a significant data security concern. By converting data into a format that is indecipherable to humans but readable by specific AI, we can address a large share of data security threats. This technology's implementation hints at a promising shift in data security approaches — one that acknowledges and tackles the human risk factor head-on.