Data Protection vs Data Collection
Sometimes I feel like I’m the only CIO-type guy left on Earth who has a problem with what I call the over-collection of data, and with the resulting “data analytics” boom of the past few years. I hear about new initiatives all the time involving “data lakes” and the use of lower-cost storage to capture EVERYTHING: not just customer data, but every bit, byte, packet, and moving or static piece of information that an organization generates on a continuous basis, whether there is a need for this data or not. Collect it anyway!! What’s the downside, now that cost has been eliminated as an issue?
I find this trend disturbing, especially since my professional focus in recent years has increasingly been on cybersecurity, privacy, and data protection.
For example, universities of late have decided that if they just have enough data, or use the data they have already collected more effectively, they can model and predict student success. In my opinion, nothing could be further from the truth. Institutions tend to attract students of a certain demographic, who already have predictable rates of success. There’s nothing wrong with trying to help students raise their performance, but doing it through mass data analytics is wrong, impersonal, and does not improve things in the long run. In fact, I would surmise it makes things worse! A huge part of success in college comes from learning to navigate the process, learning how to target good courses and professors through interaction with other students, and “finding your own way” rather than following a “guided pathway”.
I’m going to go out on a limb here, because I know this is totally contrary to current beliefs in higher ed. But in a sense, the whole notion of “measuring student success”, as increasingly required by accrediting bodies and institutions of higher learning, is really an impossible task, and it is not solved by collecting and analyzing more data. We all know of students who did horribly in college, goofed around for a few years, and then bloomed into something amazing, becoming millionaires, famous artists, or just plain decent parents and citizens. The data, at the point of analysis upon graduation (or withdrawal), would have suggested otherwise. Similarly, we all know of students who aced every course, who came from great social and economic backgrounds, and who wound up as failures in life at some point after entering the real world. How do you measure those success and failure stories in a way that collecting more data can help?
Edward Snowden, whom some of you may love and others may loathe, hit the nail on the head a few months ago:
“The problem isn’t data protection; the problem is data collection”
Data is collected constantly these days, with or without permission. It doesn’t really matter whether permission is given, because granting it is often a condition of using the services an organization provides, whether that organization is a government agency, a school, or a business. These large collections of data are prime targets for criminal hackers and foreign governments to exploit, or for nefarious companies like Cambridge Analytica to use in attempts to influence national elections and the course of history.
Is this progress?
When I got into the IT field many years ago, it was called “data processing”. Yes, we processed data. We collected data, processed it, and delivered reports to a limited set of eyes in the organization. We did NOT attempt to draw inferences about customer behavior, or to influence behavior using this data. In fact, doing so would have been thought highly inappropriate. At the time there was a strong privacy movement to protect individuals from having their personal information collected and used against their will. Maybe my perspective is a bit skewed because I worked for a public university and state agencies? Perhaps corporate entities would have done this if they had known how, and simply didn’t? In any case, there was no such thing as a “data analytics” office anywhere that I am aware of, beyond the normal predictive modeling for budgets, forecasts, and performance. And certainly nothing like the analysis of customers or students en masse.
It’s hard to turn the clock back, but in my opinion those were better times, with better people in charge who had better integrity and felt a sacred duty to protect privacy. Compare that to today, when every CEO (or provost) wants to know everything they can about their customers in a bald-faced attempt to get them to buy more, do more, perform better, change behavior, etc., etc. They think it’s essential to competition and success in their market sector. However, it’s wrong, it’s dangerous, and it exposes individuals to enormous financial, physical, and societal risk.
We need to get back to the idea of minimal data collection, with intense protection of the data that is collected, and an absolute right to privacy. The EU is way ahead of the US in this regard, with the GDPR initiative and the “right to be forgotten”, especially for web search references. So maybe the pendulum is swinging back again? It will be interesting to watch how this plays out over the next three to five years as US legislation and practices accommodate the new EU standards. But as Snowden said, the EU standards are about protecting data and privacy, and they still fall short of preventing its collection in the first place, which is where the problem lies. In the case of universities, it will take a few brave actors to put a stop to the madness and get back to the core mission of a university: teaching and learning, and helping students succeed on their own merits.