Data is a flawed method to discern tragedy

Data is incorporated into every facet of life, from elections and sports to stock management and medical research. It can be used to compare and rate objects, from the acceptance rates of colleges to the followers and net worth of individuals. 

Clive Humby, British mathematician and Chief Data Scientist of Starcount, famously said, “Data is the new oil,” suggesting data is the new essential resource. Furthermore, data science is an incredibly attractive major to employers, as companies believe future success relies on harnessing and manipulating massive amounts of data.

The appeal of data is its connotation of truthfulness, which comes from its association with math. Arithmetic, for example, leaves no room for ambiguity: two plus two must equal four, and four must be less than five. Because data turns any amount of information into specific numbers, it applies uniform mathematical principles to measure everything. Oftentimes, data is treated as an indisputable argument, a nugget of concentrated certainty.

However, there is a hidden danger in inputting everything into an Excel spreadsheet. There’s something wrong with taking a tragedy and quantifying it. As data is the art of comparison and valuation, let’s consider two statistics that have been compared frequently: coronavirus deaths in the United States and 9/11 casualties, which stand at 402,000 and 2,977 people lost, respectively. The number of people lost to COVID-19 is roughly 135 times the number lost on 9/11, or equivalent to a 9/11-sized attack every 2.7 days of the past year.
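For readers who want to check the comparison, the arithmetic can be reproduced in a few lines. This is only a minimal sketch: the two death counts are the figures quoted above, while the variable names and the 365-day length for "the past year" are assumptions made for illustration.

```python
# Figures quoted in the article.
covid_deaths = 402_000   # U.S. coronavirus deaths
sept11_deaths = 2_977    # 9/11 casualties

# Assumption: "the past year" is treated as 365 days.
days_in_year = 365

# How many times larger the coronavirus toll is.
ratio = covid_deaths / sept11_deaths

# Average number of days between "9/11-sized" losses over that year.
days_per_event = days_in_year / ratio

print(f"Ratio: about {ratio:.0f} times")
print(f"One 9/11-sized loss every {days_per_event:.1f} days")
```

The numbers hold up, which is exactly the point: the calculation is correct, yet it says nothing about how either loss was experienced.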

Perhaps 9/11 feels more tragic because it had an external cause and a discernible organization to blame, while many coronavirus deaths are the product of mass infrastructural incompetence, which leaves no easily identifiable culprit. And while deaths from 9/11 were highly visible, tragedy from the coronavirus is largely invisible to those who haven’t experienced it directly. With this in mind, it is faulty to quantify and compare the loss of life from the two causes.

Another fault of data is how easily it can be manipulated. According to the Bureau of Justice Statistics, Black offenders committed 52% of the homicides between 1980 and 2008 while making up only 13% of the population. The pairing “13/52” is a favorite statistic of white supremacists, used to justify police brutality and racial supremacy and to portray Black people as criminals by nature. The number is referenced so frequently that it is now considered a numeric hate symbol by the Anti-Defamation League.

Technically, the statistic is correct, but it fails to address the circumstances that caused it, namely racially targeted policing, institutional discrimination, and poverty, and instead reduces an enormous issue to a single biased number. In fact, a study of Cleveland neighborhoods from 1990 to 2000 found crime to be racially invariant and instead cited poverty as the factor most substantially correlated with crime rates. Ultimately, decreasing poverty appears to reduce violent crime in white and Black neighborhoods alike.

Clearly, there are many complications with data, particularly in its creation. In her book, “Raw Data” Is an Oxymoron, Lisa Gitelman, a media studies professor at New York University, writes that data is not a natural resource but a cultural one. She remarks that even authentic data can be “cooked” in its collection and scope. Anyone has the capacity to create data and doctor the findings to suit their intentions, and no one is obligated to include the context essential to their study. Still, all data is treated identically, like gospel.

In both the 9/11-to-coronavirus comparison and Black crime statistics, context is omitted because data can’t capture the emotion of witnessing a Boeing 767 crash into a building filled with people, or the relationship between a Black man and the police in the United States.

Perhaps there’s no way for people to comprehend tragedy and inequality at such a massive scale, but is data the solution? It certainly has countless uses, but innumerable shades of context are lost when the world is transformed into a data set. Some things cannot be described by a number.