Was Wired magazine right back in 2008: is big data the end of models and theory-driven science as we know it? I don’t agree with the technophile fundamentalism offered in the article, but big data almost certainly possesses a disruptive power that will blend into current scientific practice in ways no one can predict today. Article quotes:
“Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition.”
“At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn’t pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right. Google’s founding philosophy is that we don’t know why this page is better than that one: If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required.”
“‘Correlation is enough.’ We can stop looking for models. We can analyze the data without hypotheses about what it might show.”
An interesting perspective on how the data deluge could transform science. While the complete book is available for download, check out the linked review first.
What I find inspiring:
- The four-stage historical model of science: from (1) experimental to (2) theoretical to (3) computational to (4) data-driven. I don’t think the shift is as radical as proposed, but it’s a nice concept for reflecting on the new scientific opportunities emerging with technology.
- The vision of more open research embracing data and findings from various fields.
The review kicks off with a comment on the data deluge that goes right to the heart of the challenge:
Gathering data is so easy and quick that it exceeds our capacity to validate, analyze, visualize, store, and curate the information.
scientists have misled themselves into thinking that if you collect enormous amounts of data you are bound to get the right answer. You are not bound to get the right answer unless you are enormously smart.