Episode 1: The Battle for the Soul of Artificial Intelligence

It turns out that machines show the same biases as humans. Now scientists are trying to teach them fairness.

Illustration by SAKKMESTERKE / Science Source

With every breakthrough, computer scientists are pushing the boundaries of artificial intelligence (AI). We see it in everything from predictive text to facial recognition to mapping disease incidence. But increasingly machines show many of the same biases as humans, particularly with communities of color and vulnerable populations. In this episode, we learn how leading technologists are disrupting their own inventions to create a more humane AI.


BRIAN GUTIERREZ: I’m a sci-fi nut and one of my favorite books is The Caves of Steel by Isaac Asimov. It’s all about a hard-boiled, grizzled detective who gets assigned a strange new partner: a robot. I’ve always wanted a robot partner. And now through the magic of text-to-speech and radio drama...


GUTIERREZ: I can finally have one.

NATALIE: My name is Natalie. I’m here to help you host the show.

GUTIERREZ: Hi, I’m glad you’re here. This is my first time hosting, and since I'm taking over for our regular hosts, Amy Briggs and Peter Gwin, I’m sure I’ll need all the help I can get. In the book, this robot detective is the perfect crime-solving machine. Like a calculator, but for crime. So I’m glad to have the perfect partner.

NATALIE: Pretty good.

GUTIERREZ: Ah, what?

NATALIE: Not perfect, but pretty good. Sometimes we make mistakes.

GUTIERREZ: Little mistakes?

NATALIE: Not exactly.

GUTIERREZ: Well, I’m sure it’s not that bad. At least nothing serious like using humans as batteries, turning the universe into paperclips, or subtly perpetuating systemic racism.

NATALIE: No, no, and yes.

GUTIERREZ: Ah. Well how bad can it be?

NATALIE: Research shows that voice-enabled assistants are likely to have a higher accuracy rate when understanding white American male voices, as compared to any other identity.

GUTIERREZ: Okay. Subtle and annoying, but not too bad. What else?

NATALIE: In 2015 Google’s photo-categorization software was found to be labeling Black people as gorillas.

GUTIERREZ: Oof. That’s pretty bad. But it was six years ago. I’m sure we’ve worked out all of the kinks by now.

NATALIE: Since 2019 at least three Black Americans have been arrested because facial recognition used by police misidentified them.

GUTIERREZ: Oh boy. That’s what this episode is about, isn’t it?

NATALIE: Correct. Like in your detective story, humans are teaming up with AI systems all over the world. These systems are pretty good.

GUTIERREZ: ...but not perfect.

NATALIE: Correct.

GUTIERREZ: If we’re not careful, the machines trained by humans will inherit the same flaws, biases, and imperfections as humans.

GUTIERREZ: I’m Brian Gutierrez.
NATALIE: And I’m Natalie, from a text-to-speech program.

GUTIERREZ: I’m a podcast producer at National Geographic and you’re listening to Overheard—a show where we eavesdrop on the wild conversations we have at Nat Geo, and follow them to the edges of our big, weird, beautiful world.

This week, we look at some of the biases showing up in artificial intelligence, and how we can do better.

NATALIE: More after the break.

GUTIERREZ: On the afternoon of January 9, 2020, Robert Williams, an African American man who lives in a Detroit suburb, arrived home from work. Waiting for him were two unexpected visitors: Detroit Police officers who arrested him on his front lawn.

ROBERT WILLIAMS: I was completely shocked and stunned to be arrested in broad daylight, in front of my daughter, in front of my wife, in front of my neighbors. It was one of the most shocking things I ever had happen to me.

GUTIERREZ: This audio comes from an interview with the ACLU. It is, sadly, a familiar story. A Black man arrested with no idea why. But this scenario is a little different.

Williams was held overnight in a crowded cell for 18 hours before he was pulled into a room for questioning.

And that’s when he finally learned why he was being detained.

WILLIAMS: A detective turns over a picture of a guy, and he's like, [Altered Voice] This isn’t you? I look. I said, no, that's not me. He turns another paper over and says, [Altered Voice] This isn’t you either? I picked that paper up and hold it next to my face. I said, this is not me. Like I hope y'all don't think all Black people look alike. And then he says, [Altered Voice] Well, the computer says it's you.

GUTIERREZ: His accuser was a cold, hard algorithm. Facial recognition software had incorrectly matched Williams to a blurry surveillance photo.

WILLIAMS: He laid the papers out on the table. And at that time I was still guilty in their eyes. [Background recording: This isn’t you either?] Until the pictures don't match. And they left them on the table, and they looked at each other like, oops.

GUTIERREZ: Police released Williams later that night, and eventually dropped the charges against him.

But that came after thousands of dollars in legal fees, missing time at work, and the humiliation of being arrested in front of his family and neighbors.

The ACLU says Robert Williams was the first known wrongful arrest in the United States based on an incorrect facial recognition match.

And Robert Williams's case is not an isolated event. We looked into it and found at least three examples of people of color who have been wrongfully arrested based on flawed facial-recognition software.

NATALIE: There is clearly something funny going on here.


GLORIA WASHINGTON: Well, when we get into bias, it truly has to do with the image processing that's going on behind the scenes.

GUTIERREZ: That’s Gloria Washington, an assistant professor at Howard University, where she teaches computer science classes and directs a lab on artificial intelligence.

WASHINGTON: When you have these fuzzy videos and you get these fuzzy image stills, and you're comparing it against a high-quality mug shot, sometimes you're going to have some false positives.

GUTIERREZ: She is careful not to blame those in law enforcement for the misuse of AI in identifying criminal suspects.

WASHINGTON: You have policemen and people who work in criminal justice who are very busy, and they don't have time on a granular level to take a really good look at the images or the recommendations that the system may say is the person and they act on that information.

GUTIERREZ: Am I correct in thinking that these kinds of facial recognition software are more biased towards Black Americans?

WASHINGTON: A computer, from the perspective of a computer scientist, cannot be biased. When it gets to darker melanin, it really starts to fail because the features that are present, you really can't see a lot in images. Like even if you look at myself, like right now, I have hair that is covering part of my eye, and I am a darker skinned individual. And if the lighting is not correct, sometimes parts of my features can be occluded or can be fuzzy or can be noisier.

GUTIERREZ: OK, so darker skin shows up differently in pictures and might be the source of some of these errors.

WASHINGTON: So it's not really biased, it's just the image processing behind the scenes. There needs to be more techniques that focus on darker skinned individuals. And how do we pull out those features that are more prevalent in darker skinned individuals.

GUTIERREZ: Facial recognition algorithms learn to recognize faces by seeing lots of examples. Gloria explained that these programs can develop blind spots if they aren’t shown enough of them.

GUTIERREZ: So in order to teach the AI something, you have to show it a bunch of pictures.


GUTIERREZ: And it seems like you’re saying that that original group of pictures you show tends to favor one group or another. Why are so many of these data sets skewed one way or another to start with?

WASHINGTON: Well, so I think it's like an academic problem.

GUTIERREZ: Gloria explained that a lot of this data tends to come from college students, who are not particularly diverse in terms of age and ethnicity.

GUTIERREZ: How do you know if a data set is diverse? You know, my impression is that there's like millions and millions of images. How do you know if there's an issue?

WASHINGTON: Well, for me, I had no choice but to look at these databases of millions of images, where my entire day was looking through these databases to determine how diverse they were.

GUTIERREZ: How many images do you think you looked at to code with this information?

WASHINGTON: Well, there was a data set from Hong Kong that had a minimum of a million. And I was there for three years. So at a minimum, it was a million because I had to truly—

GUTIERREZ: You looked at a million images?

GUTIERREZ: Wow, that's insane!
WASHINGTON: Yeah, it was tedious, but I kind of got really good at doing it.

GUTIERREZ: Spending three years to manually review a million images is one solution, but it’s not necessarily practical. Is there an easier way to figure out if a data set is biased?
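One part of what Gloria did by hand can be sketched in code. This is a minimal, hypothetical audit assuming the dataset ships with self-reported demographic labels; the field name, threshold, and toy records are all illustrative, not drawn from any real dataset:

```python
from collections import Counter

def audit_demographics(records, field="ethnicity", threshold=0.05):
    """Report each group's share of a labeled face dataset and flag
    groups that fall below `threshold` of the total."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    shares = {group: n / total for group, n in counts.items()}
    underrepresented = [g for g, share in shares.items() if share < threshold]
    return shares, underrepresented

# Toy labels for illustration only:
records = ([{"ethnicity": "A"}] * 90
           + [{"ethnicity": "B"}] * 8
           + [{"ethnicity": "C"}] * 2)
shares, flagged = audit_demographics(records)
print(shares)   # {'A': 0.9, 'B': 0.08, 'C': 0.02}
print(flagged)  # ['C']
```

A check like this only works when demographic labels exist and are trustworthy, which is exactly what many scraped image collections lack; that is why Gloria’s manual review took years.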

I reached out to Patrick Grother, a computer scientist with the National Institute of Standards and Technology, or NIST, to find out.

GUTIERREZ: Would you mind holding your phone kind of like you’re doing a phone call?

PATRICK GROTHER: For one hour, really? I have to do that?

NATALIE: Brian, did you make him hold the phone for an hour?

GUTIERREZ: Part of what Patrick does with NIST is publish a monthly report on the accuracy of facial recognition software. For the first time, in 2019, Patrick and his team decided to check not just how accurate these programs were generally, but to break it down by race.

GROTHER: Our various colleagues across the U.S. government were interested in, well, how serious is this problem and what is this problem?

GUTIERREZ: The problem was obvious: After Patrick and his team evaluated software from 99 developers—a majority of the industry—they concluded that these programs misidentified African Americans and Asians anywhere from 10 to 100 times more than Caucasians.

GROTHER: So even with essentially pristine photos of the kind that appears in your passport, or of the kind that the police take in a booking mug shot setting, good photos would give you false-positive variations, particularly by race, also by age and less so by sex.

GUTIERREZ: Given that there have been maybe three or four false arrests based on these facial algorithms, do you think it's too soon for us to rely on them in this way?

GROTHER: So, yeah, that should be concerning. It occurred in fingerprint recognition from latent fingerprints in the past also. And it’s, you know, this is not an easy call to be made by people who do that for a living. But the overarching policy should be to make the investigators aware that the algorithms can make mistakes, that the algorithms are merely producing candidate identities and not saying anything definitive about whether it’s the same person or not.

GUTIERREZ: And so it could be helpful in terms of like testing the algorithm. You could show somebody a picture of the face you fed to the algorithm and then see if it's actually that person.

GROTHER: There are limits to this.

GUTIERREZ: In the case of Robert Williams, the Detroit man who was falsely arrested, the Detroit police had been told that the match was only a lead. So in that case, at least, a disclaimer wasn’t enough.

Patrick also explained that human reviewers might not even be able to tell if a fuzzy surveillance photo actually is a suspect recommended by the algorithm.

GROTHER: Facial recognition algorithms are now a lot better than humans are.

GUTIERREZ: Wow, that's really surprising. I think, you know, we have this sort of sense that computers aren't that great at recognizing images. Like today I had to do a Captcha where I was supposed to pick out stop signs out of a series of pictures, you know, and it's surprising to me that computers would have difficulty picking out stop signs but be able to recognize faces better than I could.

GROTHER: You've got to remember that there's been an enormous amount of research put into face recognition, both in the academic world and in the commercial world. And so, you know, because there's money to be made, algorithms have been developed to excel at this task.

GUTIERREZ: My big takeaway from speaking with Patrick was that facial recognition AI works really well in general, but the mistakes it does make tend to disproportionately impact people of color. And that has led to the false arrests of people like Robert Williams.
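The disparity Patrick’s team measured is a false-positive rate broken down by demographic group: how often the system claims a match between photos of two different people. This toy sketch shows the metric itself; the group names and numbers are hypothetical and are not NIST’s data or methodology:

```python
def false_positive_rate(trials):
    """trials: list of (claimed_match, same_person) boolean pairs.
    FPR = share of different-person pairs the system wrongly matched."""
    impostor = [claimed for claimed, same in trials if not same]
    return sum(impostor) / len(impostor) if impostor else 0.0

def fpr_by_group(trials_by_group):
    return {g: false_positive_rate(t) for g, t in trials_by_group.items()}

# Toy numbers for illustration: group B's FPR is 10x group A's,
# the low end of the 10-to-100x disparity range NIST reported.
trials = {
    "group_a": [(False, False)] * 999 + [(True, False)],       # 1 of 1,000
    "group_b": [(False, False)] * 990 + [(True, False)] * 10,  # 10 of 1,000
}
print(fpr_by_group(trials))
```

The point of breaking the rate out per group is that a single overall accuracy number can look excellent while hiding exactly this kind of skew.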

[Background Dialogue: [Altered Voice] This isn’t you? This is not me. Like I hope y'all don't think all Black people look alike. [Altered Voice] Well, the computer says it's you.]

GUTIERREZ: So how do we fix these problems for the long term? Gloria Washington says it all comes down to who is making the algorithm in the first place.

WASHINGTON: When you look at the actual numbers of skilled workers who work for Google or Facebook or these big tech companies who are Black, it’s not even close to the percentage of Black people in the U.S. population. It’s less than three percent.

GUTIERREZ: This lack of diversity creates blind spots.

WASHINGTON: Yeah, there’s not enough diverse people at the table to identify the things that are happening with the bias that’s going on, and it’s continued because it’s like it’s the old boy and or frat kind of environment, so we’ve allowed it to continue, but they really need to open the door to everyone. If you have input and you are knowledgeable in AI, you should be able to contribute to the algorithms and the techniques that are being built.

GUTIERREZ: Google in particular has been struggling recently. According to Google's annual diversity report, just 1.6 percent of Google’s staff are Black women.

TIFFANY DENG: There's definitely no secret, right, that Silicon Valley in general has a diversity problem. There's no two ways about it.

GUTIERREZ: That’s Tiffany Deng, a program manager at Google working in algorithmic fairness and AI.

DENG: I think that we should all approach AI with a healthy dose of skepticism at all times. It can make our lives easier. It can make us safer. But it also has the potential to reinforce negative stereotypes and make things harder for people and exclude people, right?

GUTIERREZ: Tiffany pointed out that AI systems tend to reflect the qualities of the people who build them. Both the good, and the bad.

DENG: I think it's really important to understand that AI learns from us. It learns from our behaviors. I always like to say that, you know, there are no engineers in the background that are very insidious and want to make things like really bad for people, you know, and want to ensure that, you know, we're hurting people. That's not it, right? I think that the most likely scenario is that there just aren't people in the room that can give a different perspective on like how things could go wrong.

GUTIERREZ: A Google spokesperson says they have hundreds of people working on responsible AI and that they will continue expanding their work in this area.

But Google also found itself at the center of a firestorm in December 2020, after one of Tiffany’s colleagues, a top Google AI ethics researcher and Black woman, was allegedly forced out. Google CEO Sundar Pichai issued a memo apologizing for the way the company handled the case.

The long-term solution is to help more people of color become computer scientists. But there's a more immediate problem—one in four law enforcement agencies has access to these algorithms today.

NATALIE: So what should we do?

(gavel sound)

VOICE (ELIJAH CUMMINGS): The Committee will come to order.

GUTIERREZ: It's May 2019, and the House Committee on Oversight and Government Reform is holding its first of several hearings during this session of Congress to examine the widespread use of facial recognition technology. The room is packed—with lawmakers, academics, and computer scientists who are concerned with the technology's impact on civil rights and liberties.

JOY BUOLAMWINI: Due to the consequences of failures of this technology, I decided to focus my MIT research on the accuracy of facial analysis systems.

GUTIERREZ: That’s National Geographic Emerging Explorer Joy Buolamwini. Joy is founder of the Algorithmic Justice League, which works to fight bias in machine learning.

BUOLAMWINI: These studies found that for the task of guessing a gender of a face, IBM, Microsoft, and Amazon had errors of no more than 1 percent for lighter-skinned men. In the worst case, those errors rose to over 30 percent for darker-skinned women. Given such accuracy disparities, I wondered how large tech companies could have missed these issues.

GUTIERREZ: A year before this hearing, Joy published a study of more than 1,200 faces, showing three facial recognition software programs from leading companies misclassified darker-skinned faces, particularly those of women.

Joy’s research is among the first to explore facial recognition technology’s errors with regard to race. Patrick Grother’s study on racial bias in AI was inspired in part by her work.

We all have biases. But technology should be better than humans. We have all been trained to trust computers to be these accurate, fair, and flawless machines.

But without anyone intending it, human biases have turned up in this software.
Civil liberties activists say it might be time to rethink those applications.

BUOLAMWINI: At a minimum, Congress should pass a moratorium on the police use of facial recognition as the capacity for abuse, lack of oversight, and technical immaturity poses too great a risk, especially for marginalized communities.

GUTIERREZ: Several cities around the country have taken heed. In 2019 San Francisco became the first major U.S. city to ban the use of facial recognition technology in law enforcement. And soon cities around the country followed, including Boston, Portland, Oregon, and Springfield, Massachusetts.

NATALIE: Oh no, does that mean humans are taking jobs from robots?

GUTIERREZ: I’m sorry to say it, Natalie, but I think this is a job for humans.

NATALIE: Please explain.

GUTIERREZ: Don’t get me wrong, AI is great and it generally works pretty well. But we need to be careful when the consequences of making a mistake are really high.

NATALIE: Around ten million arrests are reportedly made each year. There are only a handful of known false arrests from AI.

GUTIERREZ: Well, just one is too many, right? But those are just the ones we know about. There could be many, many more. And we already know that facial recognition tends to make more mistakes with people of color. Almost without anyone knowing it, systemic injustices are finding their way into these algorithms. So for now, I don’t think the world is ready for AI detectives.

NATALIE: I understand. Goodbye, partner.

GUTIERREZ: Now I need to find a new friend.

Hey, Siri, do you think AI is biased?

SIRI: Hmm, I don't have an answer for that. Is there something else I can help with?

GUTIERREZ: I don’t think there’s anything you can do.

GUTIERREZ: Facial recognition is everywhere. For full disclosure, National Geographic Partners' co-parent company, The Walt Disney Company, is beginning to test using facial recognition instead of tickets for admission to the Magic Kingdom at Disney World.

2020 has been an especially strange time for facial recognition because so many people are wearing masks for COVID-19. We’ve included an article in the show notes about how one San Francisco company is using AI to check whether or not people are wearing masks.

As we found out in our interview with Patrick Grother, most humans are not very good at identifying faces of people they don’t know. But London police have been working with gifted individuals called super-recognizers to help ID suspects in high-profile cases. I took the test and scored eight out of fourteen, so I’m definitely not a super-recognizer. But you might be!

And subscribers can read our cover story, “The Robot Revolution Has Arrived.” It’s all about the latest advancements in robot hardware and software, and where things are going next.

That’s all in your show notes, You can find them in your podcast app.


Overheard at National Geographic is produced by Jacob Pinter, Laura Sim, and Ilana Strauss.

Our senior producer is Carla Wills, who produced this episode.

Our senior editor is Eli Chen.

Our executive producer of audio is Davar Ardalan, who edited this episode.

Our fact-checkers are Julie Beer and Robin Palmer.

Our copy editor is Amy Kolczak.

Ramtin Arablouei sound designed this episode, and Hansdale Hsu composed our theme music.

This podcast is a production of National Geographic Partners. Whitney Johnson is the director of visuals and immersive experiences.

Susan Goldberg is National Geographic’s editorial director.
And I’m your host, Brian Gutierrez. Thanks for listening, and see you all next time.

For more information on this episode, visit nationalgeographic.com/overheard.

Want more?
In 2020 widespread use of medical masks has created a new niche—face-mask recognition. The technology would help local governments enforce mask mandates, but is it worth it?

Thanks to evolution, human faces are much more variable than other body parts. In the words of one researcher, “It's like evolving a name tag.”

Most people have difficulty accurately recognizing strangers. But a few individuals—called super-recognizers—excel at the task. London police have employed some of these people to help find criminal suspects.

And for subscribers: 
Artificial intelligence and robotics have been improving rapidly. Our cover story from September 2020 explores the latest robotic technology from around the world.
In 1976 Isaac Asimov wrote an article for National Geographic predicting how humans might live in 2026.

Also explore: 
Take a look at the documentary Coded Bias, featuring AI researcher Joy Buolamwini. The film explores Joy’s research on racial bias in facial recognition AI.

Read the NIST report, co-authored by Patrick Grother and discussed in this episode.