Fairness is a central aspect of the ethically responsible development and application of Artificial Intelligence (AI). Because humans may be biased, so may Machine Learning (ML) models trained on data that reflects human biases, in particular ML-based Natural Language Processing (NLP) models.
A prominent example of a biased ML model is what happens when "She is a doctor. He was a nurse." is translated from English to Hungarian and back using Google Translate. Translating to Hungarian yields "Ő egy orvos. Ápolónő volt.", and translating this back to English yields "He is a doctor. She was a nurse.". Hungarian is a gender-neutral language without gendered pronouns, so the gender information is essentially lost when translating to Hungarian. When translating back to English, the gendered pronouns have to be chosen by Google's machine translation model. Apparently, this model picked up that a doctor is more likely to be a man and a nurse more likely to be a woman, and it translates accordingly, although nothing in the input supports this choice. An even more extreme example is shown here. However, this behavior can only be observed in multi-sentence translations. For single sentences, Google Translate already allows users to select gender-specific translations: e.g., when translating "Ő egy orvos." from Hungarian to English, it offers both "She is a doctor." and "He is a doctor.".
Apart from gender stereotypes, NLP models may exhibit other biases, such as racial or religious stereotypes. Without question, biased models may cause harm when deployed to production:
A biased AI system may allocate or withhold certain resources or opportunities from certain groups and it may reinforce their subordination along the lines of identity. (Kate Crawford at the Conference on Neural Information Processing Systems 2017).
Biased AI systems have been criticized, e.g., for discriminating against African Americans and against women. Furthermore, it has been shown that biased ML models not only reproduce biases at application time but can even amplify them.
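What "amplifying" a bias means can be illustrated with a deliberately simplified measure, which is an assumption for illustration and not the exact metric used in the literature: compare how often an occupation co-occurs with a gender in the training data with how often the model predicts that combination. The function names and the toy doctor/gender data below are hypothetical.

```python
from collections import Counter

def gender_ratio(pairs, occupation, gender):
    """Fraction of (occupation, gender) pairs for `occupation` that carry `gender`."""
    counts = Counter(g for o, g in pairs if o == occupation)
    total = sum(counts.values())
    return counts[gender] / total if total else 0.0

def bias_amplification(train_pairs, predicted_pairs, occupation, gender):
    """Positive value: the predictions are more skewed than the training data."""
    return (gender_ratio(predicted_pairs, occupation, gender)
            - gender_ratio(train_pairs, occupation, gender))

# Hypothetical toy data: 7 of 10 training examples pair "doctor" with "male",
# but the model predicts "male" for 9 of 10 "doctor" examples.
train = [("doctor", "male")] * 7 + [("doctor", "female")] * 3
predicted = [("doctor", "male")] * 9 + [("doctor", "female")] * 1

print(round(bias_amplification(train, predicted, "doctor", "male"), 2))  # 0.2
```

Here the model does not merely reproduce the 70 % skew of its training data; it pushes it to 90 %.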
If the data we train our (NLP) models on is biased, so will be the models. But where does the bias in our (training) data come from? Various, often related, kinds of bias may affect dataset creation, e.g. selection bias and demographic bias; in other words, datasets carry latent information about the people who created them. Other biases include confirmation bias, human reporting bias, out-group homogeneity bias, and correlation fallacy.
To counteract these biases, we can debias either our data or our models. Data debiasing, e.g. correcting or removing biased labels or data points, fits into what Andrew Ng has recently been advocating: a move from model-centric to data-centric AI. Augmenting our (training) data, e.g. by swapping gender pronouns, is another way to debias the data we use to train our NLP models. If we find that a dataset is biased to an unacceptable degree and we cannot control for this bias, we should refrain from using it and actively discourage others from using it, as has been done, e.g., for the Computer Vision dataset TinyImages.
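Swapping gender pronouns is often called counterfactual data augmentation: every training sentence is complemented by a copy with the genders flipped. The sketch below is a minimal illustration under clear assumptions: the swap list `SWAPS` and the helpers `swap_gender` and `augment` are hypothetical, a realistic word list would also cover names and gendered nouns, and the ambiguous pronoun "her" (which can correspond to "him" or "his") cannot be resolved correctly without part-of-speech information.

```python
import re

# Hypothetical, deliberately small swap list
SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "her": "his",  # ambiguous ("him" or "his"); this sketch always picks "his"
    "hers": "his",
    "himself": "herself", "herself": "himself",
}

# Match only whole words, case-insensitively
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)

def swap_gender(sentence: str) -> str:
    """Return `sentence` with every pronoun in SWAPS replaced by its counterpart."""
    def replace(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        # Keep the capitalisation of the original word (e.g. at sentence start)
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(replace, sentence)

def augment(corpus):
    """Return the corpus plus a gender-swapped copy of every sentence."""
    return corpus + [swap_gender(s) for s in corpus]

print(augment(["She is a doctor.", "He was a nurse."]))
# ['She is a doctor.', 'He was a nurse.', 'He is a doctor.', 'She was a nurse.']
```

After augmentation, "doctor" and "nurse" co-occur equally often with both pronouns, so a model trained on the augmented data has less statistical reason to prefer one gender.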
A lot of research has been carried out on model debiasing. Much of it focuses on the fundamental technology of current NLP, i.e. word embeddings and/or language models, and model debiasing is then often framed as an optimization problem. Approaches include, e.g.:
Some research that focuses not on debiasing word embeddings but on debiasing specific NLP models, e.g. dialogue systems, can be found in this summary.
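As one concrete example of word embedding debiasing (swapped in here for illustration, not taken from the approaches referred to above), the "neutralize" step of the hard-debiasing method by Bolukbasi et al. (2016) removes the component of a gender-neutral word's vector that lies along a gender direction. It is a simple linear projection rather than an optimization. The sketch assumes hypothetical 3-dimensional toy embeddings, estimates the gender direction from a single he/she pair (the original work combines several definitional pairs), and omits the accompanying "equalize" step.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def neutralize(word_vec, gender_direction):
    """Remove the component of word_vec that lies along gender_direction."""
    norm = math.sqrt(dot(gender_direction, gender_direction))
    g = [x / norm for x in gender_direction]  # unit-length gender direction
    proj = dot(word_vec, g)                   # size of the gender component
    return [w - proj * gi for w, gi in zip(word_vec, g)]

# Hypothetical toy embeddings (real embeddings have hundreds of dimensions)
emb = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "doctor": [0.4, 0.5, 0.8],
}

# Simplest possible gender direction: difference of one definitional pair
gender_dir = [a - b for a, b in zip(emb["he"], emb["she"])]

before = (cosine(emb["doctor"], emb["he"]), cosine(emb["doctor"], emb["she"]))
doctor_debiased = neutralize(emb["doctor"], gender_dir)
after = (cosine(doctor_debiased, emb["he"]), cosine(doctor_debiased, emb["she"]))

print(doctor_debiased)  # [0.0, 0.5, 0.8]
print(before[0] > before[1], math.isclose(after[0], after[1]))  # True True
```

Before neutralization, "doctor" is closer to "he" than to "she"; afterwards it is equally similar to both.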
Whether we should remove bias from our models at all ("awareness is better than blindness") is an ongoing discussion, and even if we do, it is unclear whether bias can be removed completely. For exactly that reason, it is of utmost importance to keep the bias in our data and our models under control. To do so, we must establish practices in which we not only measure the accuracy of our (NLP) models, e.g. using evaluation measures like the F-score, but also evaluate them with respect to inclusion, e.g. by assessing their accuracy separately for different groups.
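Assessing accuracy per group can be as simple as grouping the evaluation examples by a group label before computing the metric. The sketch below assumes that such a label is known for every example; the data and the helper `accuracy_by_group` are hypothetical. The same pattern applies to other measures such as the F-score.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for every group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical labels and predictions of a binary classifier; `groups` holds
# the (assumed known) demographic group of each example.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max(per_group.values()) - min(per_group.values())

print(overall)    # 0.625
print(per_group)  # {'a': 1.0, 'b': 0.25}
print(gap)        # 0.75
```

The overall accuracy of 0.625 hides the fact that the model is perfect for group "a" but mostly wrong for group "b"; only the per-group breakdown reveals the gap.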