Every day, all sorts of businesses you might not even know exist are scooping up information on you. In an increasingly digitalized world, your private data has become a precious commodity: data is the fuel that powers profitability.

To better regulate the use of personal data and protect citizens, the European Union adopted the General Data Protection Regulation (GDPR), which came into force on 25 May 2018. In the UK, the GDPR is tailored by the Data Protection Act 2018. Non-EU businesses with offices in Europe, or that hold or process data coming from Europe, also need to be fully apprised of the GDPR.

The digital revolution has made it easier for companies to collect insights on their markets to better understand their clientele's behavior. But it has also paved the way for potential abuses, creating a climate of suspicion. How can AI earn the public’s trust?


Two years of GDPR: Tension between AI and data protection laws

GDPR impacts AI R&D because machine learning depends on larger and more complex datasets than other technologies. Big data means big privacy problems. The new regulatory constraints have disrupted the mindset of data scientists and design engineers. In the past, their mission was solely to find innovative ways to harness big data using the most appropriate algorithms. Now, questions of transparency and of the representativeness of the datasets used to train AI have become more challenging.

Data management is not just a B2C concern. In B2B, it is a far-reaching challenge because it impacts the entire supply chain. For instance, a software firm spots a need: businesses want to understand customer behavior and adapt accordingly. And so the software maker develops an artificial intelligence solution. They feed the AI massive amounts of customer data and teach the machine to recognize patterns in the way customers act. But where does this data come from? Is it reliable? Is it representative of the target population? If not, the machine will learn the wrong lessons and make biased recommendations. In turn, the software user will make misguided decisions, and at the end of the chain, the affected individuals may file complaints for discrimination or breach of the GDPR.

This is precisely what happened with Spanish supermarket chain Mercadona, currently under investigation for using an AI facial recognition system to keep out undesirable shoppers, such as people who have been convicted of shoplifting. Here the B2B relationship between the AI vendor AnyVision and the supermarket has affected the B2C relationship between the supermarket and its customers.

With AI, predictive analytics can go beyond describing customer behavior, to predicting how customers will behave in the future. Curating the data to accomplish this is a crucial job, especially under GDPR. Develop a unique, representative, high-quality dataset, and you deliver added value, edge out the competition, and build customer trust. But use a distorted dataset and opaque processes, and there will be a knock-on effect of lost trust throughout the supply chain, with reputational, legal, and financial consequences.

Compliance is a great opportunity to increase trust with your users

What good is brilliant R&D, if AI product roll-out is held up by a GDPR compliance review? To develop and deploy an AI application that is GDPR-compliant, you need to integrate the regulatory requirements at every stage of R&D.

In fact, an optimal AI pipeline has three stages:

  1. collect, annotate, and harmonize data;
  2. explore, select, and validate AI models;
  3. test and monitor models.

Beyond gathering the most suitable data to train the AI platform, and understanding the limitations of the AI models, this process is intended to ensure GDPR compliance.

In the UK, the GDPR regulatory body is the Information Commissioner's Office (ICO). The ICO has set out seven GDPR principles, of which fairness and transparency are the most important challenges for the R&D pipeline.

Facial recognition technology, for example, is widely regarded as inherently unfair, because it often proves to be biased in terms of race and gender. AI learns from the good or bad data we feed it (so-called garbage in, garbage out). In other words, if R&D teams train the machine predominantly on photos of white male faces, it gets better at recognizing white men than at recognizing dark-skinned women.

In business-to-government (B2G), IBM and Amazon decided to withdraw their AI facial recognition systems over concerns of unethical use by law enforcement during the Black Lives Matter protests. The companies feared losing the public’s trust.

In B2B, biased outcomes could occur where non-representative data is used by an AI tool to recommend more favorable discounts to certain types of enterprises, or deny a business loan based on skewed credit data.

The way ahead is to better explore and understand the data used to train artificial intelligence systems. R&D teams need to ask the right questions. “What are the distributions of the data subsets?” For personal data, “How representative is the data in terms of gender, race, and age?” For corporate data, “How representative is the sampling in terms of industries and business size?”

Furthermore, can the same AI model be used for all data subsets? Or should each identified subset have its own model? These questions are crucial to determine whether AI models can be trusted by all stakeholders, and which models need to be further refined.
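Measuring those distributions is the easy, concrete first step. A minimal sketch might look like the following, where the `size` field, the toy dataset, and the 20% threshold are all illustrative assumptions:

```python
from collections import Counter

def subset_distribution(records, field):
    """Share of each value of `field` across the dataset."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def underrepresented(records, field, threshold=0.2):
    """Values of `field` whose share falls below `threshold`."""
    return [value
            for value, share in subset_distribution(records, field).items()
            if share < threshold]

# Hypothetical B2B training set, heavily skewed toward SMEs
training = [{"size": "SME"}] * 85 + [{"size": "large"}] * 15

print(subset_distribution(training, "size"))  # {'SME': 0.85, 'large': 0.15}
print(underrepresented(training, "size"))     # ['large']
```

A flagged subset is a prompt for a decision, not an automatic fix: collect more data for that group, or consider training it a separate model.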

Under Article 12 of the GDPR, information about how personal data is processed must be provided in a concise, transparent, and easily accessible form, in clear and plain language. Under Article 21, users have a right to object if they do not approve of the purpose for which their personal information is being used.

This poses a quandary for data scientists and design engineers accustomed to thinking of AI as a black box, a system whose inputs and operations are invisible to the user. In AI, algorithms analyze millions of data points in ways that users cannot comprehend, and data scientists struggle to explain.

The lesson learned for R&D teams is: build transparency into your processes, and prove your organization’s commitment to accountability. This leads to a new way of thinking about technical choices. For instance, linear regression, Bayesian, and tree-based models behave in ways that are far more traceable and transparent than deep learning or multi-layer neural network algorithms.
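To make that traceability concrete, here is a toy linear scoring model. The weights and feature names are invented for illustration; the point is that each feature's contribution to the decision can be read off directly, which is the kind of explanation a deep network cannot offer.

```python
# Toy linear credit-scoring model; weights and features are invented.
WEIGHTS = {"payment_history": 0.6, "revenue_growth": 0.3, "sector_risk": -0.4}
BIAS = 0.1

def score(features):
    """Overall score: a weighted sum, nothing hidden."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())

def explain(features):
    """Per-feature contribution to the score, readable by a non-specialist."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}

applicant = {"payment_history": 1.0, "revenue_growth": 0.5, "sector_risk": 1.0}
print(round(score(applicant), 2))  # 0.45
print(explain(applicant))
```

With this structure, "your sector risk lowered your score by 0.4" is a sentence a data protection officer can actually write; no equivalent sentence exists for a multi-layer neural network.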

Complying with strict regulations is always daunting. But GDPR constraints are also a business opportunity, if built into the design process early on. The AI pipeline should facilitate thorough testing before deployment, and monitoring after deployment. Better quality data; greater protection of personal information; and fairer, more transparent processes all add up to a compliant pipeline. This leads to greater trust and greater business opportunities: winning outcomes for all stakeholders!