Great study on the opportunities and challenges of Big Data. While every industry can benefit from big data analytics, some will benefit more than others:
Quotes from the article:
Over time, we believe big data may well become a new type of corporate asset that will cut across business units and function much as a powerful brand does, representing a key basis for competition. If that’s right, companies need to start thinking in earnest about whether they are organized to exploit big data’s potential and to manage the threats it can pose. Success will demand not only new skills but also new perspectives on how the era of big data could evolve—the widening circle of management practices it may affect and the foundation it represents for new, potentially disruptive business models.
Big data ushers in the possibility of a fundamentally different type of decision making. Using controlled experiments, companies can test hypotheses and analyze results to guide investment decisions and operational changes. In effect, experimentation can help managers distinguish causation from mere correlation, thus reducing the variability of outcomes while improving financial and product performance.
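To make the quote's point about experimentation concrete, here is a minimal sketch of how such a controlled experiment could be evaluated. The conversion numbers and function name are made up for illustration; a two-proportion z-test is just one of several ways to check whether a measured difference is likely causal rather than noise:

```python
from math import sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic comparing the conversion rates of two groups (H0: equal)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)               # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (conv_b / n_b - conv_a / n_a) / se

# Hypothetical experiment: control converts 500/10,000, treated group 580/10,000.
z = two_proportion_z(500, 10_000, 580, 10_000)
print(round(z, 2))  # |z| > 1.96 -> difference significant at the 5% level
```

Only with a randomized control group like this can a manager attribute the extra conversions to the change itself rather than to a correlated factor.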
MIT’s Erik Brynjolfsson shares insightful implications of big data in this 9-minute interview. My notes:
- Many revolutions in science have been preceded by revolutions in measurement.
- Big data requires a cultural change in how companies make decisions: use data to learn instead of to confirm. It takes the vulnerability of being open to the data. A different kind of confidence.
- Big data allows companies to go beyond financial measures, e.g. customer satisfaction instead of profitability.
- Big data = nano data. Not just more, but more detailed data.
- BI is the natural evolution of/add-on to an ERP system: improve operations instead of just managing them.
McKinsey recently published on the “Second Economy”. Below are two quotes on how it is defined and in what sense it acts intelligently. (This pragmatic definition of intelligence is extremely helpful because it acknowledges that a service like Google Search is intelligent even though it will - hopefully - never possess consciousness.)
Interestingly, business analytics is a much bigger player in this second economy than in the visible one. Think credit scoring, price optimization or campaign optimization.
Quotes from the article:
If I were to look for adjectives to describe this second economy, I’d say it is vast, silent, connected, unseen, and autonomous (meaning that human beings may design it but are not directly involved in running it). It is remotely executing and global, always on, and endlessly configurable. It is concurrent—a great computer expression—which means that everything happens in parallel. It is self-configuring, meaning it constantly reconfigures itself on the fly, and increasingly it is also self-organizing, self-architecting, and self-healing.
There’s a parallel in this with how biologists think of intelligence. I’m not talking about human intelligence or anything that would qualify as conscious intelligence. Biologists tell us that an organism is intelligent if it senses something, changes its internal state, and reacts appropriately. If you put an E. coli bacterium into an uneven concentration of glucose, it does the appropriate thing by swimming toward where the glucose is more concentrated. Biologists would call this intelligent behavior. The bacterium senses something, “computes” something (although we may not know exactly how), and returns an appropriate response.
Great opportunity for Social Network Analysis: according to Forrester, only 4% of European online users are responsible for 80% of all so-called influence impressions.
Predictive analytics is one of the most valuable elements of analytics. Yet, not every vendor claiming to offer analytics is strong in predictive analytics.
One example is Business Objects (BO), which was acquired by SAP to complement its analytics portfolio. Despite its claim to be a leading analytics vendor, a closer look reveals that BO has little to offer in advanced analytics. Its “Business Objects Predictive Workbench” brochure (see cover below or full document here) demonstrates that what you really get is IBM’s SPSS Modeler. This is consistent with BO’s 2007 reseller agreement, which SAP just renewed.
There is nothing wrong with bundling third-party software, and SPSS isn’t a bad choice. But I don’t think the communication is appropriate. It should be more transparent that advanced analytics with BO requires a totally different piece of software (which typically disrupts the user experience, adds integration challenges and complicates maintenance) and that “analytics” is for BO what others just call, well, reporting.
Gartner’s Technology Hype Cycle is famous for tracking the enthusiasm, disillusionment and eventual realism that accompany each new technology and innovation.
Its 2011 version features both Big Data and Predictive Analytics. Advanced Analytics is mentioned as a key technology driver:
Note that Predictive Analytics is already in a mature state while Big Data is just approaching the peak of the hype. What will the disillusionment with Big Data look like, and when will it arrive?
Stanford University is a mecca of statistical learning. They train outstanding graduates such as the Google founders and many other innovators. They also produce extremely valuable resources such as the field’s standard textbook (available online for free) and many free online training courses (check datawrangling for a slightly outdated but impressive list of science courses).
They now offer an online class on Machine Learning for students worldwide, for free. It’s more than videos: if you actively participate, you will even be certified. If you are new to the field and can invest the time, don’t miss this opportunity.
The teaser video is worth seeing. It features a lot of visual impressions from actual AI applications.
How to build a recommendation engine that scales to millions of users and items: Amazon.com researchers reveal their approach in “Amazon.com Recommendations”.
This is a worthwhile read. Linden, Smith and York list different approaches to the recommendation challenge:
- traditional collaborative filtering is fine for small to medium databases but forces you to compromise on quality as you scale up
- cluster models are computationally efficient but fail to produce relevant recommendations
- item-to-item collaborative filtering as used by amazon.com
While none of these is described in great detail, the authors do a great job of putting the approaches into practical context. And it shows: good solutions don’t need to be complicated.
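To illustrate the item-to-item idea in the spirit of the paper, here is a rough sketch. The toy purchase data and all names are mine, not Amazon’s, and the real system involves far more engineering (precomputed offline similarity tables, sampling, etc.):

```python
from collections import defaultdict
from math import sqrt

# user -> set of purchased items (toy data, purely illustrative)
purchases = {
    "u1": {"book", "dvd"},
    "u2": {"book", "cd"},
    "u3": {"dvd", "cd"},
    "u4": {"book", "dvd", "cd"},
}

# Invert to item -> set of buyers; similarity is cosine over these sets.
item_users = defaultdict(set)
for user, items in purchases.items():
    for item in items:
        item_users[item].add(user)

def similarity(i, j):
    common = len(item_users[i] & item_users[j])
    return common / sqrt(len(item_users[i]) * len(item_users[j]))

def recommend(user, top_n=1):
    owned = purchases[user]
    scores = defaultdict(float)
    for i in owned:              # only loop over the user's own items,
        for j in item_users:     # which is what keeps the online step cheap
            if j not in owned:
                scores[j] += similarity(i, j)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("u1"))  # the book+dvd buyer gets "cd" suggested
```

The key design choice is that the expensive item-item similarities can be computed offline, so the online recommendation step scales with the size of a user’s history, not with the total number of users.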
Data miners’ most popular website, kdnuggets, currently runs a poll on “What term do you currently prefer for describing the activity of analyzing data and finding useful patterns”.
The current stats show there is still a lot of confusion:
In IT the playing field is leveling, opening space for small teams and fresh talent. Only a few years ago, all of the big budgets went to the big players. For applications, “apps” have changed this: individuals can develop software within days and sell it to millions. The same is true for data mining, where affordable software and hardware have significantly lowered the entry costs for ad-hoc data mining projects (though not for productive enterprise use).
A proof point is kaggle, which features analytical competitions allowing data miners from around the world to solve analytical challenges for cash. Currently four competitions are active:
While four is a small number, the magnitude and impact of the competitions are breathtaking. The Heritage Health Prize will award $3M to the team that best predicts how many days a patient will spend in a hospital in the next year, thus addressing the problem that
71 million individuals in the United States are admitted to hospitals each year, according to the latest survey from the American Hospital Association. Studies have concluded that in 2006 well over $30 billion was spent on unnecessary hospital admissions.
Is this the next (r)evolution of both open data and open-source tools: open analytics?
Another analytics competition site is http://www.crowdanalytix.com.
Data mining is hip. There are countless books, articles, case studies etc. Not so forecasting - even though it’s absolutely critical in many business applications and a highly active research area.
Often the combination of different fields matters: I currently work with retailers on price optimization, and it’s just amazing to see what is possible when combining forecasting with statistics/data mining and operations research to generate regular, promotion and markdown prices.
Good resources for newcomers to the forecasting art are rare. An exception is the book Forecasting: Methods and Applications by Makridakis, Wheelwright and Hyndman (1998). It’s carefully organized, has many practical examples, and just the right amount of equations. No wonder it has almost 2000 citations according to Google Scholar.
This excellent resource, which used to be worth its $134 price tag, is now being made available online for free here. A great opportunity for forecasting to reach a broader audience.
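For a taste of the methods the book covers, here is a minimal sketch of simple exponential smoothing, one of the classic forecasting techniques. The sales series and the smoothing parameter are made up for illustration:

```python
def ses_forecast(series, alpha=0.3):
    """One-step-ahead forecast: the level absorbs a fraction alpha of each new observation."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

sales = [100, 104, 101, 110, 108]    # made-up weekly sales
print(round(ses_forecast(sales), 2))
```

Despite its simplicity, smoothing of this kind is a surprisingly strong baseline in practice, which is exactly the kind of lesson the book conveys.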
“Data is a vital raw material of the information economy, much as coal and iron ore were in the Industrial Revolution. But the business world is just beginning to learn how to process it all.
The current data surge is coming from sophisticated computer tracking of shipments, sales, suppliers and customers, as well as e-mail, Web traffic and social network comments. The quantity of business data doubles every 1.2 years, by one estimate.” —Mining of Raw Data May Bring New Productivity, a Study Says - NYTimes.com
When using data mining to predict customer behavior, it is common practice to measure performance by campaign “lift”: how much better the target group proposed by the model responds compared to a random selection.
This is highly suboptimal according to a recent white paper from Eric Siegel, a central figure in the global data mining community:
“Standard response modeling is designed to maximize the wrong thing: response rate. This is not an appropriate measure of a direct marketing campaign’s success since it does not match the objectives of the business. Instead, incremental impact – the additional revenue resulting from the campaign that would not have come without it – is central to evaluating the campaign’s true ROI.”
His suggested cure, uplift modeling, has been around for some years, but it still seems to be rarely used. A recommended read for everyone using predictive analytics in marketing.
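The difference between the two metrics is easy to state in code. Here is a minimal sketch with hypothetical campaign numbers: response “lift” compares the model-targeted group to a random baseline, while incremental impact compares a treated group to a held-out control:

```python
def response_lift(target_resp, target_n, random_resp, random_n):
    """Classic campaign lift: targeted response rate over random-selection rate."""
    return (target_resp / target_n) / (random_resp / random_n)

def incremental_responses(treated_resp, treated_n, control_resp, control_n):
    """Responses the campaign actually caused, i.e. beyond what the control shows."""
    return treated_resp - treated_n * (control_resp / control_n)

# Hypothetical campaign: the model's target group responds 3x better than
# random, yet most of those customers would have responded anyway.
print(round(response_lift(300, 1000, 100, 1000), 2))  # impressive lift
print(incremental_responses(300, 1000, 250, 1000))    # modest true impact: 50.0
```

A model can score well on lift simply by finding customers who were going to buy regardless; only the incremental number reflects the ROI Siegel is talking about.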