“I’m communicating with my friends on Facebook, and indicate that I love a certain kind of chocolate. And, all of a sudden, I start receiving advertisements for chocolate. What if I don’t want to receive those ads?” asked U.S. Sen. Bill Nelson, during Facebook CEO Mark Zuckerberg’s congressional hearing last April. Nelson’s question points to the lack of individual choice within data mining. But it only refers to the surface level of data exchange, not considering the much deeper and more convoluted data mining occurring in every digital move you make and have ever made.
Between the Facebook scandals of the past year and the new app MoviePass, it seems that data mining is on everyone’s minds—from the senators who grilled Zuckerberg to the random citizen (me) who’s wondering what exactly data mining is and what it means for me.
Before I researched the complexities of data mining, I imagined it as a Matrix-like system where my online information was like a collection of vapor-like green numbers floating in an infinitely large database. Turns out, I wasn’t too far off. To understand the scope of data mining, you have to start with those green numbers: your digital footprints. With every transaction or interaction you engage in, you leave a digital signature. This sounds deceptively insignificant until you consider the number of interactions you have in which you input personal data of some sort. If you have any social media profile, go to school, or have a bank account, you are consistently giving those entities your data. You also leave a digital signature whenever you Google anything, buy something on Amazon, fly on an airline, use Google Maps … you get the idea. Maybe if you never touch a computer and only buy items with inherited blocks of gold that weren’t catalogued, you might be able to evade the system. Yet even then, it’s unlikely you’ve never been to the hospital (Hello, birth certificate!) or had a friend who put you into the digital system somewhere. Basically, there’s close to a zero percent chance that you don’t have many, many, many data signatures. And that’s not even considering your phone’s microphone. (Bad news: it’s absolutely listening to you.)
Sure, you consciously use your credit card and choose what you post on Instagram. You know the data you freely give the internet, so what is the harm in other people using it? But really, what you put in is not exactly what comes out. It’s as though I tell my friend Jane that I’m unhappy with the dating scene and suddenly Jane is setting me up with her cousin in art school out of town and I have to drive an hour out to go to some new-age art show. I didn’t ask Jane to set me up, I just told her I didn’t like the dating scene. Data mining is the Jane of the virtual world; it takes what you give it and extrapolates much more from that. Data mining isn’t what you put in the system, it’s what the system guesses and connects from what you gave it to find meaning that you didn’t intend. The only difference is that data scientists are much more accurate than Jane.
Data scientists work for companies to gather large quantities of individuals’ data signatures and discover various patterns. The objective is to organize the data so as to better understand consumers and boost the company’s product. Data science is both descriptive and predictive: it has to both understand what works best and predict what will work better. There are many methods of data mining, but they all have this same overarching agenda. To discuss all of the various types of data mining would turn this article into a novel, and probably a pretty bad one since data is … well, boring, and I can barely spell the word “algorithms,” let alone understand their function.
There’s one branch of data mining that’s definitely worth thinking about right now: association learning, or what data scientists call association rule learning. This is what Jane was doing before: observing that people who like thing x are likely to like thing y, or that people who buy thing x are likely to buy thing y. When you buy a bathrobe and then a pair of fuzzy slippers pops up on your “Recommendations For You” list, it’s because Amazon knows that people who bought a robe have repeatedly bought the slippers. Netflix does the same when it recommends you movies “based on titles you’ve watched.”
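For the curious, the “people who bought x also bought y” idea can be sketched in a few lines of Python. This is only a toy illustration, not how Amazon or Netflix actually build their recommendations: the shopping baskets are invented, and the names `rule_stats`, `recommend`, and `min_confidence` are hypothetical. It measures two standard quantities: support (how often x and y show up together across all baskets) and confidence (of the baskets containing x, what share also contain y).

```python
# Hypothetical shopping baskets, invented purely for illustration.
BASKETS = [
    {"bathrobe", "slippers"},
    {"bathrobe", "slippers", "towel"},
    {"bathrobe", "towel"},
    {"slippers"},
    {"bathrobe", "slippers", "candle"},
]


def rule_stats(baskets, x, y):
    """Return (support, confidence) for the rule 'bought x -> also bought y'."""
    total = len(baskets)
    with_x = sum(1 for basket in baskets if x in basket)
    with_both = sum(1 for basket in baskets if x in basket and y in basket)
    support = with_both / total if total else 0.0
    confidence = with_both / with_x if with_x else 0.0
    return support, confidence


def recommend(baskets, item, min_confidence=0.6):
    """List other items whose rule 'item -> other' meets the confidence threshold."""
    others = set().union(*baskets) - {item}
    scored = [(other, rule_stats(baskets, item, other)[1]) for other in others]
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(
        [(other, conf) for other, conf in scored if conf >= min_confidence],
        key=lambda pair: (-pair[1], pair[0]),
    )


if __name__ == "__main__":
    print(rule_stats(BASKETS, "bathrobe", "slippers"))
    print(recommend(BASKETS, "bathrobe"))
```

In this made-up data, three of the four bathrobe buyers also bought slippers, so slippers clear the threshold and towels (two of four) do not. Real systems work with millions of baskets and far more signals, but the guesswork starts from the same kind of counting.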
Sure, it all still sounds pretty innocuous. How nice of Jeff Bezos to try to make your life easier! And instead of having to sort through movies on Netflix, you’re given options you’re more likely to enjoy. No more endless scrolling and additional thinking. But what about when the association is “People who supported candidate x are likely to vote for candidate y”? Where does the line between convenience and invasion become blurred?
The line between data mining as a tool and as an ethical breach is almost impossible to define. At this point, the common rhetoric to identify unethical data mining is “you know it when you see it,” which complicates suing a company for privacy violation or uncovering the extent to which a company tracks its consumers beyond simple retail. We’re still living with a legal system built for a pre-digital world, not the digital age we live in now.
“Our greatest problem as a society is to come up with a way of regulating this environment, that is changing really rapidly, and preserve the best of the internet and digital world while dealing with these insane excesses, because it’s really being used in tremendously malevolent ways,” remarked Political Science Professor Juan Lindau, who teaches a block entitled “Secrecy, Surveillance, and Democracy.” The course explores the evolution of the state’s surveillance of its citizens, other populations abroad, and the ways in which that surveillance is accounted and unaccounted for within our democracy.
“The government wants to know everything possible about you, but it’s not trying to manipulate you politically. These engines are trying to manipulate you; they’re very sophisticated systems of propaganda with very serious political implications,” said Professor Lindau, touching on the dangerous capacity of large private entities like social media. The potential ability social media has for swaying opinions and invading individuals’ privacy all boils down to its methods of data mining. Facebook is the best site to recognize and understand the scope of this potential.
First of all, Facebook has more data on you than you think. You already know that it has all of the profile information you input, but you probably don’t know that there is a “Facebook Pixel” which is used by sites other than Facebook. The pixel is an invisible piece of tracking code embedded in non-Facebook sites, allowing Facebook and the site in use to track a consumer’s likes, dislikes, and activities without their consent. And it isn’t a few hundred sites that have the Facebook pixel; it’s millions. Facebook takes that information to generate targeted ads for you, a marketing strategy that makes it the billion-dollar company it is. That’s why you can see ads for other sites you visited or would likely visit on Facebook’s side feed.
In 2016, Winston Smith from Missouri noticed this and sued Facebook and seven cancer institutes, accusing Facebook of violating privacy by tracking his browsing on cancer sites and research of treatment options. Since Facebook makes millions from its medical advertisements, it’s not an implausible argument in the least. Yet the judge dismissed the case, due to the data policy agreement that Facebook users agree to when creating a profile. Legally, Facebook didn’t do anything wrong. But Smith and the other plaintiffs didn’t feel it was right that Facebook profited off their medical conditions. It was legally valid, but was it ethical?
The U.S. government draws the ethical line when data mining and political influence cross wires. Zuckerberg was called in to testify in front of Congress in April in response to the Cambridge Analytica scandal, when it was discovered that the UK-based voter profiling firm had purchased the detailed personal information of up to 87 million Facebook users from a researcher who had told users he was collecting it for academic reasons. Cambridge Analytica used that information to create politically targeted ads, which experts believe may have allowed them to influence the 2016 election.
When working with Ted Cruz, Cambridge Analytica separated his potential voters into various psychological profiles and then targeted them accordingly. For example, they sent different messages to his “timid traditionalist” profiles than to his “temperamental” profiles, all based on the potential voters’ previous political leanings. Cambridge Analytica has the ability to accurately understand a user’s political tendencies, because every digital click is documented and used to create detailed voter profiles.
Mark Turnbull, managing director of Cambridge Analytica’s political division, said, “We just put information into the bloodstream of the internet … and then watch it grow, give it a little push every now and again … like a remote control. It has to happen without anyone thinking, ‘that’s propaganda,’ because the moment you think ‘that’s propaganda,’ the next question is, ‘who’s put that out?’”
Since, according to a 2016 study by the Pew Research Center, nearly half of Americans get their news from Facebook, the implications of the Cambridge Analytica scandal are huge. Zuckerberg’s congressional hearing illustrated just how much information a large company like Facebook has on its users and how easily that information can get into the wrong hands. Data mining isn’t only about personal invasion; it’s also about how efficiently that information can be employed to target individuals with misinformation.
Platforms like Facebook were never intended to be the news organizations they’ve become. Professor Lindau touched on this, explaining, “People like Zuckerberg and the holders of these other platforms, like [Jack] Dorsey at Twitter, need to acknowledge that what they have are not merely platforms. They’re not these neutral vehicles that deliver content, but that they are, increasingly, news organizations.” According to Lindau, this means that Facebook and Twitter can’t pretend to be neutral and escape the journalistic demand for verifying information and the way it’s used. As Lindau put it, “you have a responsibility to curate your content. To establish its veracity, to fact check it; you’re privy to the information that you’re imparting.”
It’s not hard to go into your Facebook settings to see the categories Facebook assigns to you based on what you’ve clicked on, at least on the surface level. Since the Cambridge Analytica scandal, Facebook has color-coded the categories to make its data settings easier to follow, although it’s still pretty overwhelming. I wasn’t surprised by what I saw when I looked into mine: it mostly consisted of liberal-leaning news organizations, Airbnb, Spotify, and other apps connected to my Facebook that I frequently use. This is what I expected to see, but that’s only part of the picture. Facebook is showing what I’ve already said yes to: the apps I gave consent to connect to Facebook. It’s not showing that it has my entire search history to sell to (or be stolen by) groups like Cambridge Analytica. And why would it? It’s out of Facebook’s hands and, furthermore, its business platform.
It’s important to note that Facebook isn’t the only large company profiting from extensive data mining; most companies do. In 2012, Target drew public scrutiny for predicting and revealing women’s pregnancies. The company would analyze customers’ purchases and send them coupons for goods they’d likely buy at certain stages of their pregnancy. One man from Minneapolis went into his nearby Target to complain that his high school daughter was getting coupons for maternity clothes, which he felt was inappropriate. The manager called a few days later to apologize, but the father apologized instead; apparently his daughter hadn’t yet told him that she was pregnant.
There’s also the new app MoviePass, where users pay $9.95 per month for three movie screenings. You’re saving money even if you view just one movie a month, so it’s no shock that it’s already up to 3 million subscribers and is expected to reach 5 million before the end of 2018. In fact, the deal is so good that it seems like the company would inevitably go bankrupt, but its executives maintain that it will turn a profit.
But the reason it’s so cheap is that MoviePass aims to profit not from individual revenue, but from consumers’ data, which it gleans from tracking their location. MoviePass tracks a member’s physical location to and from the movies so it can gather intel on what kinds of movies individuals see, at what times, and where they’re likely to go before and after. On a local scale, MoviePass can sell this data to nearby restaurants and businesses to boost sales. On a bigger scale, it can sell the data to big studios to change how they market movies to specific users. In order to use MoviePass, consumers agree to monetize themselves—but at least MoviePass acknowledges it.
When signing up for Facebook, tweeting, or entering a Google search, individuals are giving companies the right to monetize them, even if they don’t realize it. Individuals are getting a free service and, in return, giving their complete permission for companies to use and profit off of their data. Big companies like Facebook and Target track individuals’ behavior on the internet to extrapolate habits and patterns that create a very specific profile for each person who’s plugged in: a very accurate profile they use to send you targeted ads for products you’ll most likely buy, candidates you’ll most likely vote for, and news you’ll most likely believe.
Unless you were to completely detach from the internet, which is unrealistic for anyone who isn’t Ron Swanson, it’s hard to avoid being data mined. The other option would be to pay for things like social media, so one could insist on client protocols. Yet that option is also pretty improbable; Americans want to maintain their access to free commodities, and entities like Facebook want to maintain their extensive revenue. Big data isn’t going anywhere. The lack of knowledge and concern about the issue can change. But do people even care?
To find out, I asked Colorado College students whether they care that companies like Facebook target them with ads based on the large quantities of data they’ve gathered. Most people agreed that it’s a greater convenience and that the content of their own data is too inconsequential to matter.
“I mean, I’d rather get ads for things I’m interested in than not,” said Lo Wall, which seemed to be the resounding popular response to my question.
“It doesn’t bother me, because I’m not doing anything that interesting. So if it wants to collect data on me no one will use it,” said Bridget O’Neill. General opinion holds that people’s lives are too routine and trivial for companies to care about—and furthermore, if companies employ that data to target you, then maybe it’s just helpful. “I’m insignificant. Why would they care?” asked Emily Klockenbrink.
But not everyone is completely secure in that mindset; some ponder the dangerous trajectory of big data companies. Zac Schulman is worried that “companies will get to the point where they have a superior understanding than the consumers themselves; that they don’t realize they’re being manipulated.” Soon, the manipulation of who we vote for or what we buy might be so subtle that we don’t realize it’s happening.
Therein lies the frightening truth: data mining is becoming so accurate and subtle that it has the potential to know you better than you know yourself. Remember how Target could predict women’s pregnancies? That was in 2012, when Obama was running for re-election and the iPhone 5 had just dropped into the market. Today, more and more studies are coming out about data mining potentially (and already, in some cases) being used as diagnostic surveillance. For example, what if your phone could pick up your speech patterns to detect early-onset Alzheimer’s? Data mining is becoming so astute at analyzing behavior that it’s likely able to notice abnormalities before you can. In this theoretical future, my phone or computer could know I have a brain tumor before my doctor.
If one thing’s clear, it’s that data mining isn’t going anywhere. In some cases, maybe that’s fine (I like that Netflix recommends me movies), but do we care that the database is becoming more intelligent and accurate at knowing our behavior and desires than we are ourselves? Or that we may not be able to distinguish how we are being targeted, and whether we are being manipulated? We may not worry now, but we have to pay attention to the evolution of big data, or we’ll find ourselves the Sims of Zuckerberg without even knowing it.