"How did data get to be so ‘big’?"
Think of a single piece of data. A piece of data, as generally understood in the analog world, is typically a small thing. How did it get so “big”? To help visualize “big data,” take those individual pieces of data that your business collects about your customers, employees and vendors…and start multiplying. By a lot. The estimate of data generated worldwide is in the neighborhood of 13 zettabytes (ZB) by 2016 and 44 ZB by 2020. That is a “big” number.
Although there is no one legal definition of “big data,” a May 2014 White House report put more definitional context around the term. The White House report defined “big data” as data that has at least one of three characteristics: data that is so large in volume, diverse in variety or moving with such velocity that “traditional modes of data capture and analysis are insufficient.” The fact is that “big data” has always existed, but with the rise in computational power and advances in technology, we have unprecedented access to it and the ability to derive meaning from it. The White House report notes that with “declining cost of collection, storage and processing of data, combined with new sources of data like sensors, cameras, geospatial and other observational technologies, [it] means that we live in a world of near-ubiquitous data collection.” When you contemplate every aspect of daily life being captured and stored, that is “big.”
The onslaught of all of this data, in and of itself, is not extraordinary. What is extraordinary is that, with the computing power now available, this data can be mined in ways previously impossible. With the application of algorithms to multiple data sets, patterns and anomalies emerge and, in the right hands, are incredibly valuable. The marketing industry can now glean shopping patterns and habits from its target customers and focus its dollars in a way that maximizes return on investment. The government can now identify potential terrorist activities from communications records and prevent potential attacks. As the White House put it, “Unprecedented computational power and sophistication make possible unexpected discoveries, innovations, and advancements in our quality of life.”
So what, then, is all the fuss about? Aren’t “advancements in our quality of life” a good thing? As with most things in life, there are trade-offs.
One concern is the lack of consumer transparency with respect to big data. With big data, a company’s ability to provide transparency is compromised. Employers and businesses generate data sets that are innocuous by themselves, but when those data sets are combined with data sets from other contexts, connections and cross-references reveal insights. The result is that data about each of us is bought and sold among data brokers every day.
The resulting data sets are constantly mined for insights into our behaviors, habits, likes and dislikes, much of which the average consumer would likely consider personal in nature. Some call this the “creep factor” of big data. Kelly Dilworth, writing for CreditCards.com, published “12 Creepy Details Data Collectors Know About You,” which details twelve common attributes that marketers often know, such as income, whether you are good with money, your love life, your race, ethnicity and religious affiliation, your hobbies and interests, your health concerns, your purchasing history and where you have vacationed. Creepy, indeed.
It is true that, under the current state of data privacy regulations, much of this data is de-identified. But what does de-identification mean? Just stripping my name and address? What about my IP address and mobile device ID? Plus, many in this field will tell you that re-identification is possible at some level. If you strip data of all information and characteristics that may potentially be used for re-identification, it then ceases to be useful. In addition, there is the concept of a “mosaic effect.” The data may not personally identify me by name, but may identify an individual who could only be me given the traits and characteristics. As a consumer, I don’t care if someone knows my name, which is already public. I may care more if a stranger knows my marital status, my children’s names, my political affiliations, my religious affiliation and my habits and patterns, which are far more personal.
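To make the mosaic effect concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the five records, the field names and the helper `unique_records` are hypothetical and do not come from any real data set or tool. The sketch simply counts how many name-free records are singled out once several ordinary traits are considered together.

```python
from collections import Counter

# Hypothetical, invented records. Names and addresses are already stripped.
records = [
    {"zip": "30301", "birth_year": 1975, "marital_status": "married", "children": 2},
    {"zip": "30301", "birth_year": 1975, "marital_status": "married", "children": 2},
    {"zip": "30301", "birth_year": 1982, "marital_status": "single", "children": 0},
    {"zip": "30305", "birth_year": 1975, "marital_status": "married", "children": 3},
    {"zip": "30305", "birth_year": 1990, "marital_status": "single", "children": 0},
]


def unique_records(rows, fields):
    """Return the rows whose combination of values for `fields` occurs exactly once."""
    def key(row):
        return tuple(row[f] for f in fields)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


# A single trait singles out nobody in this toy data set.
print(len(unique_records(records, ["zip"])))  # 0

# Four ordinary traits together single out three of the five people.
all_traits = ["zip", "birth_year", "marital_status", "children"]
print(len(unique_records(records, all_traits)))  # 3
```

No single field here is identifying, yet the combination points to one person in three of five cases. Removing enough fields to close that gap is exactly what makes the remaining data less useful.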
Another issue with big data has been called the “asymmetry of power.” The big data haves and have-nots, if you will. Companies and governments that have access to big data clearly have advantages over those that do not. Who controls access to the data? Practically speaking, it takes resources to access big data, which results in the consolidation of data among the entities with the resources required to get it. How can a small business owner compete with the company that has access to this kind of marketing tool, and how do you handle the potentially unfair market advantage that results?
The question of ownership is not far behind. From a copyright perspective, data itself is not copyrightable and in that sense cannot be “owned.” Nonetheless, given the rise in the number of data brokers and the proliferation of data licenses in the field, it is obvious that there is a proprietary element to big data.
The question gets particularly hairy when there are two parties involved in generating the data. Sensors on farm equipment used by farmers generate data. Is that data owned by the farmer or the equipment manufacturer? What about data collected by cars? Is the data from my car mine or the dealer’s or the manufacturer’s? These questions remain unresolved and will continue to be hashed out in the industry and, inevitably, the legal system.
Finally, big data creates the ability for what the White House report calls “perfect personalization.” If data brokers and marketing companies have a complete profile on the preferences and predispositions of the market segment to which I belong, those companies can target me with very specific ads for the products I am most likely to want or need. The risk is that this kind of “profiling” is just that: profiling, which opens the door for discrimination in the types of products, services and indeed pricing that I receive. Plus, what happens if my employer discovers that I have a statistically high probability of being pregnant? Can that be used to deny a promotion? Or can my insurance company raise my rates based on that data? Big data necessarily draws on statistical generalizations and presumptions that can feed into discriminatory behavior based on any characteristic or trait that an individual may possess.
Indeed, this “perfect personalization” can create a separate reality for each of us. If my profile can be perfected to the point that I am only presented with information and materials that I want to see, my own assumptions may stand unchallenged. In a society driven by democratic ideals, is my ability to sway opinions and vote with my feet (or my dollars, as is often the case) necessarily diminished by the information to which I am exposed, which someone else chooses for me?
The potential exists to create a virtual bubble for each consumer, presenting that consumer only with the information and opportunities that algorithms indicate have a statistically high probability of being valued. Although marketers will salivate at the prospect of such precise targeting, the broader social implications and potential abuses cast shadows over the benefits.
With such risk of abuse, it is not surprising that there have been calls to regulate big data. Bills have been introduced in Congress with the lofty aim of doing so. The Federal Trade Commission held hearings Sept. 15, 2014, on big data, which included calls from FTC Commissioner Julie Brill to both Congress and industry to better regulate big data.
The result is that regulation appears to be on the horizon, but it is not yet clear what it would look like. Based on draft legislation, it appears that any new rules would target the data brokerage industry, requiring more of the transparency, notice and choice that are the linchpins of present-day data privacy laws. However, given recent trends, any such legislation is more likely to come out of state legislatures than the federal government.
It also remains to be seen whether industry self-regulation will develop. The concept of big data has grown beyond the precepts of online behavioral advertising, but there is precedent within the online advertising space for self-regulation. As a result, there is fertile ground for self-regulation as well.
Whether regulation comes from state or federal governments or industry self-regulation, undoubtedly conversations will continue to take place as to the benefits and risks of big data and its proper role within modern society.