It is clear that there are moral considerations in the use and collection of data. Vin Vashishta argues that data scientists bear an ethical burden akin to that of a psychologist.
Amid the outrage and backlash about data privacy in recent months, data scientists have largely stayed out of the conversation. We remain in the background of many applications and social media platforms.
Facebook’s grey areas of user privacy and Cambridge Analytica’s psychographic profiling are airing in public while hundreds of other examples of the same behaviours carry on.
What’s our obligation to the users behind the massive datasets we analyse for insights? What are reasonable levels of security or appropriate uses of personal data?
What insights can we sell, and which ones do we have no right to even pursue? Ask 20 data scientists these questions and you’ll get 20 different answers.
In most organisations making their first strides into data science, the only people who really understand what the data is being used for are the data scientists themselves.
While model outputs flow into several different systems, where the data comes from, how it’s transformed into the raw materials of customer or user profiling and what security measures protect it from theft are typically siloed within the data science team.
Data ethics is firmly in the hands of data scientists, and that won’t change any time soon. Ethics plays a large role in the training of psychologists and sociologists.
However, data scientists, who have access to powerful tools that can dissect how people think with an eye towards influencing their behaviours, don’t get a single hour of ethics training in most programs.
That’s a problem we need to fix if we are to avoid a constant stream of breaches and questionable uses of data or models.
What does ethics mean for data scientists?
Look at the terms and conditions of most applications that collect your data. You’ll see some disclose what kinds of data they collect. You’ll see a high-level description of who they share data with. What’s glaringly absent is what they – and those who they share your data with – will use your data for.
Would you sign up for a loyalty rewards programme at the grocery store if you knew the store mined that data to determine your healthcare premiums or make credit decisions?
The people who create the terms of service are lawyers with little guidance from data scientists on the ethical consequences of data gathering and distribution. Most companies aren’t even aware that they need to consult their data science team about potential ethical issues.
That’s just one piece of the ethical tapestry around data science. Customers have more awareness around the ethical sourcing of their coffee than they do around the ethical use of their personal data. However, that’s changing, one breach and scandal at a time.
Ethics for data science answers a few key questions.
Is your data ethically sourced? Do you only collect data with the consent and full knowledge of the person providing the information?
Is the data aggregated and shared ethically? Is personal data secured? Are measures taken to verify that the data provided is complete and accurate? Do those providing personal data understand who that data is shared with and for what purpose?
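One concrete measure behind the question of whether personal data is secured and shared appropriately is pseudonymisation before records leave the organisation. The sketch below is purely illustrative (the field names, salt and record are hypothetical): it replaces a direct identifier with a salted hash, so shared records can still be joined consistently without exposing who they belong to.

```python
import hashlib

def pseudonymise(record, id_fields, salt):
    """Replace direct identifiers with a truncated, salted SHA-256 token.

    A sketch only: a real deployment also needs key management for the
    salt, contracts governing downstream use, and a re-identification
    risk review of the fields that remain in the clear.
    """
    out = dict(record)
    for field in id_fields:
        if field in out:
            digest = hashlib.sha256((salt + str(out[field])).encode()).hexdigest()
            out[field] = digest[:16]  # stable token; same input + salt -> same token
    return out

# Hypothetical loyalty-card record being prepared for sharing.
record = {"email": "jane@example.com", "postcode": "D02", "basket_total": 42.10}
shared = pseudonymise(record, id_fields=["email"], salt="per-dataset-secret")
print(shared)
```

Because the token is deterministic for a given salt, an analyst downstream can still count repeat customers, but cannot read the email address back out of the shared data.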
Are models being developed and deployed ethically? What’s the oversight process for model evaluation with regards to accuracy and fairness? What’s the process for determining fair use of models in critical decision-making?
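As one illustration of what oversight of model accuracy and fairness could look like in practice, here is a minimal sketch (all data, names and numbers are hypothetical, not a prescribed standard): it computes accuracy and positive-prediction rate per group, and the gap in selection rates between groups, a basic check for disparate treatment.

```python
# Minimal fairness-audit sketch: compare accuracy and selection
# (positive-prediction) rates across groups. Toy data throughout.

def group_metrics(y_true, y_pred, groups):
    """Per-group accuracy and selection rate."""
    metrics = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        correct = sum(y_true[i] == y_pred[i] for i in idx)
        positives = sum(y_pred[i] for i in idx)
        metrics[g] = {
            "accuracy": correct / len(idx),
            "selection_rate": positives / len(idx),
        }
    return metrics

def demographic_parity_gap(metrics):
    """Largest difference in selection rate between any two groups."""
    rates = [m["selection_rate"] for m in metrics.values()]
    return max(rates) - min(rates)

# Toy example: binary predictions for two groups, A and B.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

m = group_metrics(y_true, y_pred, groups)
gap = demographic_parity_gap(m)
print(m)
print(f"demographic parity gap: {gap:.2f}")
```

A review process would set a threshold on gaps like this one before a model is cleared for use in critical decisions; where that threshold sits, and which fairness definition applies, is exactly the kind of judgement the article argues needs ethical training behind it.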
Why aren’t we teaching AI ethics?
It’s one thing to have an algorithm deciding what ads to show you, and quite another to have one decide if you get a medical procedure or if you’re a criminal.
There are decisions we don’t want algorithms making unilaterally. Right now, the people deciding how data is used and algorithms are deployed spend little to no time discussing the potential ethical implications.
That’s mostly due to training and education. Data scientists aren’t heartless, though we’re often portrayed that way. We want to build amazing things, but we know math and code better than the ethical implications of model development.
Academia, along with the growing number of certifications and boot camps, has a large part to play in solving this problem. The next batch of data scientists needs AI ethics to be a requirement for graduation. Sending them out without a solid foundation in ethics is irresponsible, and we’ve seen where it leads.
Policy discussions start on college campuses. That’s the other role academia plays in the AI ethics puzzle. As universities decide how to shape the curriculum, they also tend to shape policy decisions at the same time.
They won’t just be teaching students, but also lawmakers who are under increasing pressure to regulate the wild west of data science.
Why aren’t we teaching AI ethics? For the same reason we don’t teach ethics for most emerging technologies: we look at tech as morally agnostic and beyond subjectivity.
While the machine doesn’t play favourites, it isn’t exempt from moral implications. As soon as it learns from or impacts a person, morality enters the equation.
It’s only responsible that we consider the implications of AI on people before we rush headlong into building the new – that’s the challenge facing this and the next batch of data scientists.
Vin Vashishta is founder and chief data scientist at V-Squared Data Strategy Consulting.