Artificial intelligence and machine learning are transforming the global business landscape. They enable organizations to deliver smarter and more secure services to their clients and customers, reduce costs, and increase efficiency. This “rush to automation” requires more and more data. But the use of personal data in AI/ML systems is raising privacy concerns among regulators, consumers, and users. New and complex organizational risks around the confidentiality of personal data need to be well understood and managed.
Privacy surfaces in the context of AI/ML in many ways, including the design of AI/ML systems according to privacy principles, as well as their security, explainability, fairness, and human oversight. Various existing privacy laws already impose these requirements on AI/ML practices, enforcement is increasing, and new regulations are underway.
It is critical that organizations understand the privacy requirements that currently apply to AI and ML applications. Being unaware of the compliance requirements that privacy regulations impose on AI/ML systems poses risks not only to affected individuals: companies can face hefty fines and even the forced deletion of data, models, and algorithms.
Increasing Enforcement Calls to Action
In the United States, Section 5 of the FTC Act, the Fair Credit Reporting Act, and the Equal Credit Opportunity Act hold AI developers and companies using algorithms accountable, as the Federal Trade Commission emphasized in its 2016 report and in its guidance from 2020 and 2021.
The FTC is taking compliance with its guidelines seriously. In both its Cambridge Analytica order and the matter of Everalbum, the FTC demanded not only that the illegally obtained data in question be deleted or destroyed but also the algorithms or models that had been developed using it.
For companies within the scope of the General Data Protection Regulation (GDPR), several cases of privacy violations by AI/ML systems pursued by European data protection authorities in 2021 show that the stakes are high. In one of last year’s landmark cases, Italy’s Data Protection Authority fined the food delivery companies Foodinho and Deliveroo around $3 million each for a lack of transparency, fairness, and accurate information regarding the algorithms used to manage their riders.
Increasingly, enforcement is resulting in a global ripple effect. Regarding the collection of images and biometric data without consent by Clearview AI, the Office of the Australian Information Commissioner collaborated with the UK Information Commissioner’s Office, resulting in Australia declaring a violation of its Privacy Act and the UK announcing a multimillion-dollar fine. Three Canadian privacy authorities and the French Data Protection Authority followed suit.
Privacy Drives Requirements for the Responsible Use of AI
General privacy principles are the backbone of privacy and data protection globally. AI/ML systems that process personal data must be built in line with data minimization, data quality, purpose specification, use limitation, accountability, and individual participation. The privacy challenges in AI/ML are magnified by the growing capacity of models to approximate complicated distribution functions, from the modest complexity of traditional machine learning models to deep learning models with dramatically more parameters.
The principle of AI explainability or transparency aims at opening the so-called “black box” of ML models. If organizations are using AI to support or make decisions about individuals, meaningful explanations about those processes, services, and decisions have to be provided to the individuals affected by them. Outcome-based post-hoc local models or metamodels like local interpretable model-agnostic explanations (LIME) can be used to approximate ML predictions. Explanations should be adapted to the understanding of the receiver and include references to design choices of the system, as well as the rationale for deploying it.
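To make this concrete, the snippet below is a minimal sketch of a post-hoc local explanation using the open-source lime package with a scikit-learn classifier; the dataset, model, and number of features shown are illustrative assumptions, not a prescribed setup.

```python
# Sketch: post-hoc local explanation of one prediction with LIME.
# Dataset, model, and parameters are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    training_data=data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Which features pushed the model toward its prediction for this one record?
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # [(feature condition, local weight), ...]
```

Such locally weighted feature attributions can then be translated into plain-language explanations adapted to the individual affected by the decision.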
AI fairness and non-discrimination require us to ensure that algorithms and models do not discriminate based on race, gender, other protected classes, and vulnerable individuals. To this end, periodic testing is necessary at the pre-processing stage (prior to training the algorithm), in-processing (during model training), and post-processing (bias correction in predictions). Fairness also implies that personal data is handled in ways that people would reasonably expect and that organizations do not overpromise on what the algorithm can deliver.
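As a simple illustration of a post-processing check, the sketch below compares positive-prediction rates across a protected attribute (a demographic parity gap); the toy arrays stand in for real model output and are assumptions for demonstration only.

```python
# Sketch: demographic parity check across a protected attribute.
# y_pred and group are toy stand-ins for real predictions and attributes.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 0, 1])  # model decisions
group = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])   # protected attribute

rate_a = y_pred[group == 0].mean()  # selection rate, group 0
rate_b = y_pred[group == 1].mean()  # selection rate, group 1

# A large gap between selection rates signals potential disparate impact
# and calls for pre-, in-, or post-processing mitigation.
print(f"Selection rates: {rate_a:.2f} vs {rate_b:.2f}, gap = {abs(rate_a - rate_b):.2f}")
```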
Securing AI/ML algorithms requires analyzing the impact of threats throughout the AI/ML system life cycle. Threat modeling helps to define the risks to privacy. One potential privacy threat is machine learning models leaking information about the individual data records on which they were trained. For example, in a membership inference attack, an adversary can infer whether a specific data point is part of the model’s training dataset by observing the model’s predictions. For AI/ML systems, conventional security controls need to be complemented by security controls tailored to ML functionalities, as the European Union Agency for Cybersecurity (ENISA) points out in a recent report.
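The intuition behind membership inference can be sketched with a simple confidence-threshold test, shown below; the synthetic data, model, and threshold are illustrative assumptions rather than a production attack framework.

```python
# Sketch: confidence-threshold membership inference test.
# Models tend to be more confident on records they were trained on;
# a large gap between member and non-member confidence signals leakage risk.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def max_confidence(m, X):
    # The adversary only observes prediction probabilities.
    return m.predict_proba(X).max(axis=1)

threshold = 0.9  # assumed attack threshold
flagged_members = (max_confidence(model, X_train) >= threshold).mean()
flagged_nonmembers = (max_confidence(model, X_test) >= threshold).mean()
print(f"Flagged as members -- training records: {flagged_members:.2f}, "
      f"unseen records: {flagged_nonmembers:.2f}")
```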
Combined Efforts for Ensuring Privacy Requirements of AI
Due to the complexity of AI/ML systems, many privacy topics need to be considered and addressed. Foundational elements could include:
- An appropriate AI governance process as a strategic priority, e.g., built on top of existing privacy programs and involving relevant teams, from privacy and security to data scientists, engineers, product development, and the new role of AI ethicists.
- Having an understanding of what the AI/ML system will do and addressing threats and risks by ensuring high data quality, proper data annotation, testing the training data’s accuracy, (re)validation of algorithms, and benchmarked evaluation. The impact of potential privacy breaches can be mitigated by developing and applying strategies such as privacy-preserving machine learning solutions (e.g., differential privacy, federated learning, secure multi-party computation, homomorphic encryption); a minimal sketch of one such technique follows this list.
- Privacy impact assessments expanded with additional questions relating to the cross-disciplinary privacy requirements of AI/ML systems. Here, trade-offs (for example, between statistical accuracy and data minimization) and the methodology and rationale for decisions made can be described.
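As an example of one privacy-preserving technique mentioned above, the sketch below applies the Laplace mechanism of differential privacy to a simple count query; the epsilon value, query, and data are illustrative assumptions.

```python
# Sketch: Laplace mechanism for a differentially private count query.
# Epsilon and the toy data are illustrative assumptions.
import numpy as np

def private_count(values, epsilon=1.0):
    """Return a count with Laplace noise calibrated to sensitivity 1."""
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return len(values) + noise

ages = [34, 45, 29, 61, 52]  # toy personal data
print(private_count(ages, epsilon=0.5))  # noisy, privacy-preserving answer
```

Smaller epsilon values add more noise and give stronger privacy at the cost of accuracy; that trade-off is exactly the kind of design decision a privacy impact assessment should document.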