AI’s Ethical Dilemma – An Unexpectedly Urgent Problem
Summary: In just the last 10 months, deep learning has been used to predict who is a criminal and who is gay based only on facial characteristics. These are rigorous, peer-reviewed studies published in academic journals. How should this knowledge be used, and how will the public react?
I am a solid technology optimist, but in just the last 10 months data scientists have used deep learning AI to predict not only sexual orientation but also whether someone is a criminal.
These two studies, conducted by well-meaning data scientists, are exactly the uses of AI that will cause the public to wonder how much leeway data science should be given for applications that potentially impact everyone in society. In brief, those two studies are:
- Predicting Criminality: A peer-reviewed study released out of China last November that reported 89.51% accuracy in distinguishing criminals from non-criminals based on facial recognition alone. (Automated Inference on Criminality Using Face Images, Xiaolin Wu, McMaster Univ. and Xi Zhang, Shanghai Jiao Tong Univ. Nov. 21, 2016).
- Predicting Sexual Orientation: A peer-reviewed study from Stanford just this month reporting 91% accuracy in distinguishing between gay and heterosexual men (83% for women), also based solely on facial recognition. (Deep Neural Networks Can Detect Sexual Orientation from Faces, Yilun Wang, Michal Kosinski, Stanford University, Sept. 12, 2017).
It should require no explanation why these applications of deep learning are socially controversial.
Are There Other Areas of AI About Which We Should Be Worried?
If you follow the popular press and read comments on the public’s concerns about AI, they tend to cluster around two thoughts.
- AI enabled systems will develop ‘opinions’ that cause them to be bigoted or biased in some way not to our advantage.
- AI enabled systems will exceed our own grasp of knowledge of the world and take actions, once again not to our advantage.
Those of us here on the wizard’s side of the curtain know these are not legitimate concerns. AIs cannot ‘know’ more than the information that we give them. Their grasp of systems, or more broadly of ‘reality’, cannot exceed their training data.
Take for example the debacle of Microsoft’s early chatbot Tay. In 2016 the version implemented in Japan was wildly successful. In the US, some practical jokers (to give them the benefit of the doubt) started sending Tay pro-Hitler and wildly permissive sexual comments, from which she ‘learned’ that this was the correct way to interpret the world. Microsoft had to take Tay down within 16 hours of her introduction. Tay was a victim of her training data, not a bot formulating an independent opinion.
Similarly, reinforcement learning combined with image recognition has resulted in AIs that win at Go and can regularly beat humans at a full range of Atari games (not to mention this is the technology behind your self-driving car). It is true that reinforcement learning can result in algorithms that perform better than humans in certain systems, but those AIs cannot reach out independently to learn beyond the realms we provide for training. In fact, change their sensors or their actuators and they can’t adapt. They have no imagination.
What Is the Real Source of Risk?
These commonly misunderstood memes are not a source of risk or ethical conflict. Where ethical conflict actually arises is from the true and productive capabilities of deep learning in image and speech recognition.
In a sense this is like atomic energy in the 1950s. There is the promise of so much good that can result from speech and image recognition AI systems, but now we see that they can also be weaponized. A society that uses deep learning systems to identify and penalize individuals for their supposed criminality or their sexual orientation could almost instantly change the public’s opinion about the value of our most promising areas of innovation.
It is easy to project forward to other potential societal abuses, such as making hiring decisions or allocating health care resources based on as-yet-unachieved deep learning analysis of our DNA.
About These Specific Studies
The Criminality Study
This is a peer reviewed study from a major institution conducted by well qualified researchers. The data science and techniques utilized appear sound. Read our original analysis of this experiment here.
It was based on facial image recognition from 1,856 ID photos that satisfy the following criteria: Chinese, male, between ages of 18 and 55, no facial hair, no facial scars or other markings. The photos of people known to be convicted criminals, covering both violent and non-violent crime, were compared to ID photos of 1,126 non-criminals with similar socio-economic profiles.
Wu and Zhang built four classifiers using supervised logistic regression, KNN, SVM, and CNNs, with all four techniques returning strong results, the best from the CNN and SVM versions.
Since it is widely understood that CNNs can sometimes be deceived and may focus on factors not intended, the researchers also introduced Gaussian noise into the images and saw only about a 3% falloff in accuracy.
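The noise check described above can be illustrated with a minimal sketch. It does not use the study's data or models: the two synthetic Gaussian clusters, the k-nearest-neighbor classifier, and every parameter value (`separation`, `sigma`, `k`) are assumptions chosen only for illustration. The idea is to score a classifier on clean test inputs, then on the same inputs with Gaussian noise added, and compare the two accuracies.

```python
import random

def make_data(rng, n_per_class, n_features, separation):
    """Synthetic stand-in for image features: two Gaussian clusters (hypothetical data)."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            point = [rng.gauss(label * separation, 1.0) for _ in range(n_features)]
            data.append((point, label))
    return data

def knn_predict(train, point, k=5):
    """Majority vote among the k training points nearest to `point`."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )
    votes = sum(label for _, label in by_distance[:k])
    return 1 if votes * 2 > k else 0

def accuracy(train, test, k=5):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)

def add_gaussian_noise(rng, test, sigma):
    """Return a copy of the test set with zero-mean Gaussian noise on every feature."""
    return [([v + rng.gauss(0.0, sigma) for v in x], y) for x, y in test]

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(rng, n_per_class=100, n_features=10, separation=1.5)
test = make_data(rng, n_per_class=50, n_features=10, separation=1.5)

clean_acc = accuracy(train, test)
noisy_acc = accuracy(train, add_gaussian_noise(rng, test, sigma=0.5))
print(f"clean accuracy: {clean_acc:.3f}")
print(f"noisy accuracy: {noisy_acc:.3f}")
print(f"falloff:        {clean_acc - noisy_acc:.3f}")
```

A small falloff suggests the classifier is not relying on fragile, pixel-level quirks, although it says nothing about whether the signal it does rely on is meaningful.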
In summary, the study was conducted with rigor. Wu and Zhang make no comment on the implications of the study beyond the data science.
The Sexual Orientation Study
Jokes abound about the general population’s ability to determine the sexual orientation of others by sight, but in this carefully controlled study Yilun Wang and Michal Kosinski created a control group showing that human judges could determine orientation by sight only 61% of the time for men and 54% for women, not much better than a coin toss.
However, their deep neural net classifier, when shown five images for each individual, could correctly classify men 91% of the time and women 83% of the time.
The paper shows a rigorous approach using 130,741 images of 36,630 men and 170,360 images of 38,593 women between the ages of 18 and 40, obtained from dating web sites, who self-reported their orientation as gay or heterosexual. The study was limited to US-located Caucasians.
The deep neural net utilized was a program called VGG-Face that had previously been trained on 2.6 million faces to recognize unique individuals based on 4,096 unsupervised attributes. The classifier was a simple logistic regression with dimensionality reduction via singular value decomposition (SVD).
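The general shape of the pipeline described above (fixed feature vectors, SVD-based dimensionality reduction, then logistic regression) can be sketched as follows. This is a hedged illustration and not the authors' code: the random "embeddings", the 64-feature width (the text cites 4,096), the injected signal, the number of retained components, and the hand-written gradient-descent fit are all assumptions, and nothing here involves real faces or VGG-Face itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for embedding vectors: random features in which a few
# directions carry a class signal. Real embeddings would come from a
# pretrained network.
n_samples, n_features, n_components = 400, 64, 10
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_features))
X[:, :5] += 1.5 * y[:, None]  # inject signal into the first five features

train, test = slice(0, 300), slice(300, None)

# Dimensionality reduction: center on the training mean, then project onto the
# top right-singular vectors of the centered training matrix.
mean = X[train].mean(axis=0)
_, _, vt = np.linalg.svd(X[train] - mean, full_matrices=False)
components = vt[:n_components]
Z_train = (X[train] - mean) @ components.T
Z_test = (X[test] - mean) @ components.T

def fit_logistic(Z, labels, lr=0.1, steps=2000):
    """Plain batch gradient descent on the logistic loss (no regularization)."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted probabilities
        grad = p - labels                        # gradient of the loss w.r.t. the logits
        w -= lr * (Z.T @ grad) / len(labels)
        b -= lr * grad.mean()
    return w, b

w, b = fit_logistic(Z_train, y[train])
pred = (Z_test @ w + b > 0).astype(int)
test_accuracy = (pred == y[test]).mean()
print(f"held-out accuracy: {test_accuracy:.3f}")
```

The design point is that the heavy lifting sits in the pretrained feature extractor; the classifier on top can be very simple, which is part of what makes this kind of application easy to reproduce.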
Unlike the Chinese team, Wang and Kosinski specifically conclude that their study “exposes a threat to the privacy and safety of gay men and women”.
Privacy versus Convenience
One of the great emerging conversations in this area is about privacy versus convenience. There is a small but vocal minority of citizens who object to giving their data away online or to being tracked in the real world by camera, phone, or other mechanism. The fact, though, is that this huge amount of data provides a level of convenience for all of us never before imaginable.
Not only do our applications understand and show us only what is statistically likely to please us, but this also dramatically increases the effectiveness of advertising, reducing the cost of goods sold. The great majority of us would miss this if it were gone.
But on two counts we need to have a deeper public conversation about this.
Correlation versus Causation: Particularly in the criminality study there is no attempt to discriminate between correlation and causation. We do not live in the Matrix. We are not going to arrest people based on this correlation. But it is fair to ask how our policing agencies will respond if they have this capability.
The Importance of Error Rates: Statistical methods will always have error rates that we on the practitioner side can quantify pretty readily. If the error results in us seeing an ad in which we’re not interested, then no harm is done. If a false positive causes us to be classified as a potential criminal or discriminated against for sexual orientation, that is quite another matter.
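A short worked example shows how quickly false positives pile up when a classifier is applied to a whole population. The numbers are assumptions for illustration only: the 89.51% figure is treated as both the true-positive and the true-negative rate, and the 1% base rate is hypothetical rather than taken from either study.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Share of flagged people who truly belong to the target group (Bayes' rule)."""
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)

# Assumptions for illustration only:
# - the classifier is right 89.51% of the time for both groups
# - 1% of the screened population belongs to the target group (hypothetical)
accuracy = 0.8951
base_rate = 0.01

ppv = positive_predictive_value(accuracy, accuracy, base_rate)
flagged_per_100k = 100_000 * (accuracy * base_rate + (1 - accuracy) * (1 - base_rate))
wrongly_flagged_per_100k = 100_000 * (1 - accuracy) * (1 - base_rate)

print(f"chance a flagged person is a true positive: {ppv:.1%}")
print(f"flagged per 100,000 screened: {flagged_per_100k:.0f}")
print(f"of which false positives: {wrongly_flagged_per_100k:.0f}")
```

Under these assumptions, more than nine out of ten people flagged would be false positives, even though the headline accuracy sounds high.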
How Pervasive Tracking Has Become: Most of us know and accept that our browsing is tracked and that our phone also provides our physical location stored by data services. Not many of us are equally aware of the vast amount of facial recognition video footage that is created daily by private and government sources.
Private companies and police agencies are also using license plate scanners, including those located on Ubers, taxis, and other non-official vehicles, that average a dozen scans per day for each of the 250 million cars in the US. That data is then sold to law enforcement as well as private data marketers, providing a uniquely accurate picture of our physical travels during each day.
Where Do We Go From Here?
As with the scientists on the Manhattan Project, no one is suggesting that these rigorous studies should not have been conducted. Like all scientific studies, they will need to be confirmed by other researchers.
What we are learning is that there are features hidden in our faces which our deep learning techniques are better at detecting than humans are. In fact it is the accuracy and insight of deep learning that we value.
For the time being we need to have a conversation about how much we are tracked, particularly without our knowledge or agreement. In the near future we may also need to have a conversation with our government about which applications of that data are acceptable and which are not. Otherwise we risk a public backlash that could derail the use of our best new techniques.
About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001. He can be reached at: