April 2013
1 post
January 2013
3 posts
Search has become strangely intimate, a trusted friend pointing you in the right direction […] We once used search engines to look for information, now we use search to find us — what once seemed transactional now seems an extension of ourselves.
[…] in the search of the future that Singhal and his masters of disambiguation are constructing in Mountain View, Google will understand that these things are not simply matching sequences but that they are “things” with an internet life and place and history of their own.
[…] “I don’t think we’ll ever get to the semantic web as it was envisioned — detailed labelling and descriptions of web pages by humans — but we are getting closer to its goal: deep descriptions and understanding of the web, through artificial intelligence and natural language understanding.”
[…] “the future of search is verbs.” People, the argument goes, want search to do things, not just suggest things. With the Knowledge Graph, Google is building a world-historical collection of nouns. But will it help book a restaurant table? Or the cheapest flight? As synonymous as search is with Google, much of our search activity now occurs on apps.
[…] As Battelle notes, “the largest issue with search is that we learned about it when the web was young. When the universe was complete, the entire web was searchable,” he says. “Now our digital lives are utterly fractured — in apps, in walled gardens such as Facebook, across clunky interfaces. Reuniting our digital lives into one platform that is searchable is, to me, the largest problem we face today.”
How can Yahoo! use modeling techniques to make me read their news? Well, the model takes some predictor variables, information about me, in order to build a profile of the user currently browsing their site. This information has been gathered by the website about its readers, and includes data such as my location, my local time and even my gender and my age if I’m a registered Yahoo! user. Other information about my online habits is also useful for the algorithm, particularly the places I’ve visited when I visited Yahoo! in the past, and the stories and news I’ve already seen today. With this information, the model builds a particular profile around me. Yahoo! can then present a “Today Page” specially fitted for me, by matching my profile with the optimized news and headlines that people like me are clicking on. Don’t forget, the point is to make you click! In order to find the perfect “Today Page” for you, Yahoo! analyzes more than 13 million different combinations of headlines, news, images, stories, photographs and even positions on their website displayed every day. They will show only one from those millions, the one that optimizes the clicks from people like you.
#socialnetworks #analytics project at the US Department of Defense. Quote:
Defense is seeking ways to predict the future by monitoring Twitter, blogs and news, and determining the “frequency of contacts between nodes or clusters.” As networks grow larger and more complex, researchers have found it harder to monitor group behavior. ONR also wants researchers to discover networks that could be hidden within networks, and how information and money flows through a community.
[…]
Officials also want tools that fuse and assimilate multiple, incomplete data sets on agriculture, weather, terrain, demographics and economic indicators to find patterns. ONR is especially interested in ways to comb text-based information to provide more nuanced views of how groups, such as terrorists, operate by extrapolating the “stated values and beliefs that motivated behaviors of interest,” “community structure and clusters of social networks” and the level of “emotional support expressed towards topics or persons.”
May 2012
1 post
January 2012
3 posts
November 2011
9 posts
Great study on the opportunities and challenges of Big Data. While every industry can benefit from big data analytics, some will benefit more than others:

Quotes from the article:
Over time, we believe big data may well become a new type of corporate asset that will cut across business units and function much as a powerful brand does, representing a key basis for competition. If that’s right, companies need to start thinking in earnest about whether they are organized to exploit big data’s potential and to manage the threats it can pose. Success will demand not only new skills but also new perspectives on how the era of big data could evolve—the widening circle of management practices it may affect and the foundation it represents for new, potentially disruptive business models.
…
Big data ushers in the possibility of a fundamentally different type of decision making. Using controlled experiments, companies can test hypotheses and analyze results to guide investment decisions and operational changes. In effect, experimentation can help managers distinguish causation from mere correlation, thus reducing the variability of outcomes while improving financial and product performance.
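To make the idea of a controlled experiment concrete, here is a minimal sketch of how one might check whether a test group really behaves differently from a control group. It uses a standard two-proportion z-test; the function name and the numbers are my own illustration, not taken from the article.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of control (a) and test (b) groups.

    Returns the z statistic and the two-sided p-value under the
    normal approximation.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical experiment: 2000 customers per group.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Because customers are assigned to the groups at random, a significant difference can be attributed to the change being tested rather than to a mere correlation.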
MIT’s Erik Brynjolfsson shares insightful implications of big data in this 9-minute interview. My notes:
- Many revolutions in science have been preceded by revolutions in measurement.
- Big data requires a cultural change in how companies make decisions: use data to learn instead of to confirm. This takes the vulnerability of being open to the data, and a different kind of confidence.
- Big data allows companies to go beyond financial measures, e.g. customer satisfaction instead of profitability.
- Big data = nano data. Not just more data, but more detailed data.
- BI is the natural evolution of, or add-on to, an ERP system: improve operations instead of just managing them.
Puzzled that decision trees and other methods top visualization and descriptive statistics: no one I know would do prediction without those two basic exploration steps.
McKinsey recently published on the “Second Economy”. Below are two quotes on how it is defined and in what sense it acts intelligently. (This pragmatic definition of intelligence is extremely helpful, because it acknowledges that a service like Google Search is intelligent even though it will, hopefully, never possess consciousness.)
Interestingly, business analytics is a much bigger player in this second economy than in the visible one. Think credit scoring, price optimization or campaign optimization.
Quotes from the article:
If I were to look for adjectives to describe this second economy, I’d say it is vast, silent, connected, unseen, and autonomous (meaning that human beings may design it but are not directly involved in running it). It is remotely executing and global, always on, and endlessly configurable. It is concurrent—a great computer expression—which means that everything happens in parallel. It is self-configuring, meaning it constantly reconfigures itself on the fly, and increasingly it is also self-organizing, self-architecting, and self-healing.
[…]
There’s a parallel in this with how biologists think of intelligence. I’m not talking about human intelligence or anything that would qualify as conscious intelligence. Biologists tell us that an organism is intelligent if it senses something, changes its internal state, and reacts appropriately. If you put an E. coli bacterium into an uneven concentration of glucose, it does the appropriate thing by swimming toward where the glucose is more concentrated. Biologists would call this intelligent behavior. The bacterium senses something, “computes” something (although we may not know exactly how), and returns an appropriate response.
Neat introduction to factor analysis. Little math, and lots of concepts and motivation for this classic tool.
September 2011
15 posts
Great opportunity for Social Network Analysis: according to Forrester, only 4% of European online users are responsible for 80% of all so-called influence impressions.

Predictive analytics is one of the most valuable elements of analytics. Yet, not every vendor claiming to offer analytics is strong in predictive analytics.
One example is Business Objects (BO), which was acquired by SAP to complement its analytics portfolio. Despite BO’s claim to be a leading analytics vendor, a closer look reveals it has little to offer in advanced analytics. Its “Business Objects Predictive Workbench” brochure (see cover below or full document here) demonstrates that what you really get is IBM’s SPSS Modeler. This is consistent with BO’s 2007 reseller agreement, which SAP just renewed.
There is nothing wrong with bundling third-party software, and SPSS isn’t a bad choice. But I don’t think the communication is appropriate. It should be more transparent that advanced analytics with BO requires a totally different piece of software (which typically disturbs the user experience, adds integration challenges and complicates maintenance) and that “analytics” is for BO what others just call, well, reporting.

Interesting (German) article about the future of BI. Key statements:
- BI is the only remaining differentiator in Enterprise IT. ERP and CRM are commodities.
- By 2012, 40% of BI spending will go to system integrators because of their experience with industry-specific requirements, according to Gartner.
- By 2012, 30% of analytical applications will run in-memory, according to Gartner.
- By 2013, more than 30% of BI functionality will be used on mobile devices, according to Gartner. Mobile BI will open new opportunities for niche players to cater to new user groups.
- Other hot topics are Big Data and Analytics.
Interesting pitch, and I truly believe combining economies of scale with greater flexibility is a key requirement for analytics. But I don’t think architecture is critical in the transition, as existing solutions are already quite capable. The harder challenge is empowering the organization to share and leverage data and knowledge in a transparent and efficient manner.
Gartner’s Technology Hype Cycle is famous for tracking the enthusiasm, disillusionment and eventual realism that accompany each new technology and innovation.
Its 2011 version features both Big Data and Predictive Analytics. Advanced Analytics is mentioned as a key technology driver:

Note that Predictive Analytics is already in a mature state while Big Data is just approaching the peak of the hype. What will trigger the disillusionment with Big Data, and when?
Source: memeburn
Predicting consumer prices: Great business idea. Can’t imagine how they analytically manage to control for all the external factors though.
Stanford University is a mecca of statistical learning. They train outstanding graduates such as the Google founders and many other innovators. They also produce extremely valuable resources such as the field’s standard textbook (available online for free) and many free online training courses (check datawrangling for a slightly outdated but impressive list of science courses).
They now offer an online class on Machine Learning for students worldwide for free. It’s more than videos: if you actively participate you will even be certified. If you are new to the field and can invest the time: don’t miss this opportunity.

Big Data is one of the trends fueling the ongoing rise of analytics, aka data science. Over the last few months the O’Reilly Radar blog has been an authoritative voice on this trend. They have now published their articles as an ebook.
How to build a recommendation engine that scales to millions of users and items: Amazon.com researchers reveal their approach in “Amazon.com Recommendations”.
This is a worthy read. Linden, Smith and York list different approaches to the recommendation challenge:
- collaborative filtering is fine for small to medium databases but forces you to compromise on quality as you scale up
- cluster models are computationally efficient but fail to produce relevant insights
- item-to-item collaborative filtering, as used by Amazon.com
While none of these is described in detail, the authors do a great job of putting these approaches into practical context. And it shows: good solutions don’t need to be complicated.
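To illustrate how simple the item-to-item idea can be, here is a minimal sketch of it, not Amazon's actual implementation. It assumes a toy purchase history, measures item similarity as the cosine between the sets of customers who bought each item, and recommends the items most similar to what a customer already owns. All names and data are made up for illustration.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(purchases):
    """Cosine similarity between items, based on shared buyers."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    sims = defaultdict(dict)
    names = sorted(buyers)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            common = len(buyers[a] & buyers[b])
            if common:
                score = common / sqrt(len(buyers[a]) * len(buyers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(customer, purchases, sims, top_n=3):
    """Rank items the customer does not own by summed similarity."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_n]


# Hypothetical purchase history.
purchases = {
    "ann": {"book", "lamp"},
    "bob": {"book", "lamp", "pen"},
    "cat": {"book", "pen"},
    "dan": {"lamp"},
}
sims = item_similarities(purchases)
print(recommend("dan", purchases, sims))
```

The similarity table can be computed offline, so serving a recommendation only needs a lookup over the handful of items a customer owns. That is the property that lets this approach scale.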
KDnuggets, the most popular website among data miners, is currently running a poll on “What term do you currently prefer for describing the activity of analyzing data and finding useful patterns”.
The current stats show there is still a lot of confusion:

Exciting NYT article about the transition of search into a decision engine (“dinner for two on Friday and movie after”), framed as a competitive battle between Microsoft and Google. Agree: Any search company making measurable progress in this massive task will fully dominate the market. Quotes:
- “Search is still essentially a Web site finder.” Mr. Lu says. “It’s all nouns. But the future of search is verbs — computationally discerning user intent to give them the knowledge to complete tasks.”
- “There is so little context in current search, and what Microsoft is trying to do is present users with context and structure, more a map of the world of information instead of just ranking it”
In IT the playing field is leveling, opening space for small teams and fresh talent. Only a few years ago all of the big budgets would go to the big players. For applications, “apps” have changed this. Individuals can develop software within days and sell it to millions. The same is true for data mining, where affordable software and hardware have significantly lowered the entry costs for ad hoc data mining projects (not for productive enterprise use though).
A proof point is Kaggle, which features analytical competitions that let data miners from around the world solve analytical challenges for cash. Currently four competitions are active:

While four is a small number, the magnitude and impact of the competitions are breathtaking. The Heritage Health Prize will award $3 million to the team that best predicts how many days a patient will spend in a hospital in the next year, thus addressing the problem that
71 million individuals in the United States are admitted to hospitals each year, according to the latest survey from the American Hospital Association. Studies have concluded that in 2006 well over $30 billion was spent on unnecessary hospital admissions.
Is this the next (r)evolution of both open data and open-source tools: open analytics?
Another analytics competition site is http://www.crowdanalytix.com.
Data Mining is hip. There are countless books, articles, case studies etc. Not so Forecasting, even though it’s absolutely critical in many business applications and a highly active research area.
Often the combination of different fields matters: I currently work with retailers on price optimization, and it’s just amazing to see what is possible when combining forecasting with statistics/data mining and operations research to generate regular, promotion and markdown prices.
This excellent resource, which was worth its $134 in the past, is now being made available online for free here. Great opportunity for forecasting to become known to a broader audience.
August 2011
1 post
Math-loving traders are using powerful computers to speed-read news reports, editorials, company Web sites, blog posts and even Twitter messages — and then letting the machines decide what it all means for the markets.
July 2011
16 posts
“Data is a vital raw material of the information economy, much as coal and iron ore were in the Industrial Revolution. But the business world is just beginning to learn how to process it all.
The current data surge is coming from sophisticated computer tracking of shipments, sales, suppliers and customers, as well as e-mail, Web traffic and social network comments. The quantity of business data doubles every 1.2 years, by one estimate.”
Source: Mining of Raw Data May Bring New Productivity, a Study Says (NYTimes.com)
A gentle reminder that maximizing forecasting accuracy may not be economical:
Accuracy is largely determined by the nature of what is being forecast — its “forecastability.” Even costly and heroic efforts may not yield the level of accuracy management desires. Instead of squandering resources in pursuit of the perfect forecast, organizations should seek forecasts as accurate as can reasonably be expected (given the nature of what is being forecast) and to do this as efficiently as possible.
A 100% non-technical motivation of big data opportunities and challenges.
The facts behind the social media hype. Inspiring presentation from Hubspot.
Bold prediction about open source big data technology expanding its influence beyond Web startups to traditional Enterprise IT:
Hadoop is the ultimate trojan horse in enterprise IT. It strikes at the heart of business — the data — in a way that adds value immediately, while setting the stage for viral growth in the future, connecting the two ecosystems and the technological and cultural levels.
Whether you agree or not, it is a great read for its many references to emerging big data technologies.
Interesting Stanford prototype for interactive online data preparation. Guesses transformation rules from your selections in the spreadsheet. Would this also work for subsequent steps of an analytics process such as training data mining models?

When using data mining to predict customer behavior, it is common practice to measure performance by campaign “lift”: how much better will the target group proposed by the model respond compared to a random selection?
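For readers new to the term, lift can be computed in a few lines. The sketch below assumes you have a model score and an observed response flag per customer. The function name and the sample data are hypothetical.

```python
def campaign_lift(scores, responded, top_fraction=0.1):
    """Response rate in the top-scored group divided by the overall rate."""
    if len(scores) != len(responded) or not scores:
        raise ValueError("scores and responded must be non-empty and aligned")
    base_rate = sum(responded) / len(responded)
    if base_rate == 0:
        raise ValueError("lift is undefined when nobody responded")

    # Rank customers by model score, best first.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    top_rate = sum(flag for _, flag in ranked[:n_top]) / n_top
    return top_rate / base_rate


# Hypothetical scores and responses (1 = responded) for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift = campaign_lift(scores, responded, top_fraction=0.2)
print(f"lift in the top 20%: {lift:.2f}")
```

A lift above 1 means the model's target group responds more often than a random selection of the same size would.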
This is highly suboptimal according to a recent white paper from Eric Siegel, a central figure in the global data mining community:

“Standard response modeling is designed to maximize the wrong thing: response rate. This is not an appropriate measure of a direct marketing campaign’s success since it does not match the objectives of the business. Instead, incremental impact – the additional revenue resulting from the campaign that would not have come without it – is central to evaluating the campaign’s true ROI.”
His suggested cure, uplift modeling, has been around for some years, but it still seems to be rarely used. Recommended read for everyone using predictive analytics in marketing.
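A toy calculation shows why response rate and incremental impact can point in opposite directions. This is only a sketch of the underlying comparison, not the modeling technique from the white paper: it assumes each segment has a randomly held-out control group that received no mailing, and the segment names and counts are invented.

```python
def uplift_by_segment(segments):
    """Treated response rate minus control response rate, per segment.

    Each value is (treated_responses, treated_size,
                   control_responses, control_size).
    """
    result = {}
    for name, (t_resp, t_n, c_resp, c_n) in segments.items():
        result[name] = t_resp / t_n - c_resp / c_n
    return result


# Hypothetical campaign results.
segments = {
    # Responds often, but would have bought anyway.
    "loyal": (300, 1000, 300, 1000),
    # Responds rarely, but the campaign makes a difference.
    "lapsed": (80, 1000, 30, 1000),
}
uplift = uplift_by_segment(segments)
best = max(uplift, key=uplift.get)
print(uplift, "-> target:", best)
```

A response model would pick the “loyal” segment for its 30% response rate, although mailing it adds nothing. The uplift view picks “lapsed”, where the campaign adds five percentage points.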
Distinguished Researcher Bishop is a highly respected contributor to the field of machine learning. While many of his texts are dense with math, his free chapter on graphical models is a clear and visual introduction to this emerging topic.
Great update from forecasting expert Paul Goodwin on Winters’ exponential smoothing method. Despite more sophisticated methods being available, I see many companies using exponential smoothing due to its simplicity and adaptability. One thing leaves me wondering after reading the article: why did it take 50 years to develop the referenced simple extensions such as winsorizing outliers?
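To show how small such an extension is, here is a sketch of simple exponential smoothing (without the trend and seasonal terms of the full Winters method) with optional winsorizing. The clipping rule, a fixed band around the current level, is my own simplification and not necessarily the one described in the article.

```python
def smoothed_forecast(series, alpha=0.3, clip=None):
    """One-step-ahead forecast from simple exponential smoothing.

    If clip is given, each observation is winsorized to lie within
    +/- clip of the current level before it updates the level.
    """
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for y in series[1:]:
        if clip is not None:
            # Pull extreme observations back to the edge of the band.
            y = min(max(y, level - clip), level + clip)
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical demand series with a single outlier.
demand = [10, 10, 10, 50, 10]
plain = smoothed_forecast(demand, alpha=0.5)
robust = smoothed_forecast(demand, alpha=0.5, clip=5)
print(plain, robust)
```

Without clipping, the single spike still distorts the forecast two periods later. With clipping, the forecast stays close to the typical level.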
April 2011
7 posts