The Rise of Big Data Spurs a Revolution in Big Analytics


Norman H. Nie

Growth in Data Volumes

"Big Data" refers to datasets more than a terabyte in size gathered by the public and private sectors. Conventional database software cannot effectively capture, store, manage, and analyze these large datasets, despite hardware-related innovations.

Organizations collect a broader variety of data in a shorter time, creating immense growth in data volume. In 2011, the McKinsey Global Institute found that organizations capture trillions of bytes of information annually; by April 2011, the U.S. Library of Congress alone had collected more than 235 terabytes of data.

We now also have the computing power and the ability to store, retrieve, and analyze such data affordably. Innovations, including the availability of commodity multi-core processors and distributed computing frameworks, enable software such as R to make effective use of this new hardware. Other technological innovations now make it possible to process huge datasets whose challenges were previously considered insurmountable.
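
As a minimal sketch of what using this new hardware can look like in practice, the snippet below uses R's bundled parallel package to spread a simulated bootstrap across the cores of a single machine. The task and the core count are illustrative assumptions, not drawn from the text.

    library(parallel)                     # ships with base R

    # Illustrative task: bootstrap the mean of a large simulated vector.
    x <- rnorm(1e6)
    boot_mean <- function(i) mean(sample(x, length(x), replace = TRUE))

    n_cores <- max(1, detectCores() - 1)  # leave one core free
    # mclapply forks workers on Unix-alikes; on Windows, use parLapply with a cluster.
    results <- mclapply(1:200, boot_mean, mc.cores = n_cores)
    summary(unlist(results))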

Advances in Analytics and Visualization Through the Evolution of Second-Generation Big Analytics Platforms

Traditionally, the limitations of existing technologies meant that statistical analysis was performed only on clustered samples of observations. Sampling compromised the accuracy and depth of the analysis.

Recent advances in second-generation Big Analytics platforms such as Revolution R Enterprise have improved analytical performance. This software is optimized for massive datasets and offers improved data visualization and analytical capabilities.

The potential impact of Big Analytics is not trivial. McKinsey estimates that analytics could reduce national spending on healthcare by 8%, increase U.S. retailers' margins by 60%, and improve European governments' efficiency by US$149 billion.

Open-Source R Plus Revolution Analytics: A Duo of Disruptive Forces in Big Analytics

Revolution R Enterprise is based on open-source R, a programming language designed expressly for statistical analysis. R has emerged as the de facto standard for computational statistics and predictive analytics, and an abundance of R-based tools can be downloaded freely.

Despite R's many advantages, one challenge is its in-memory computation model, which limits the data that can be analyzed to the amount of available RAM.
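
A rough back-of-the-envelope calculation makes the constraint concrete (the dataset dimensions are hypothetical): an all-numeric dataset stored as double-precision values needs about 8 bytes per cell, before R makes any working copies.

    rows  <- 100e6                 # 100 million observations (hypothetical)
    cols  <- 50                    # 50 numeric variables (hypothetical)
    bytes <- rows * cols * 8       # each double-precision value occupies 8 bytes
    bytes / 1024^3                 # roughly 37 GB, before any copies R makes

Because base R routinely copies objects during manipulation, the working set is often a multiple of this raw size, which is why such a dataset exceeds the RAM of most single machines.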

Revolution Analytics' Revolution R Enterprise provides a highly efficient file structure and data-chunking capabilities for processing big datasets on a single server or in a distributed environment. It also includes multi-threaded external-memory algorithms and works with large-scale data warehousing architectures.
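
The text does not show Revolution R Enterprise's actual file format or function calls; purely to illustrate the external-memory idea, the base-R sketch below computes a column mean over a large CSV one chunk at a time, so only one chunk is ever held in RAM. The file name, chunk size, and 'amount' column are assumptions.

    # Chunked (external-memory) column mean in plain base R.
    path       <- "big_data.csv"   # hypothetical file with an 'amount' column
    chunk_rows <- 1e6              # rows read per pass

    con <- file(path, open = "r")
    hdr <- strsplit(readLines(con, n = 1), ",")[[1]]   # read the header line once
    total <- 0; n <- 0
    repeat {
      chunk <- tryCatch(
        read.csv(con, header = FALSE, col.names = hdr, nrows = chunk_rows),
        error = function(e) NULL)                      # end of file reached
      if (is.null(chunk) || nrow(chunk) == 0) break
      total <- total + sum(chunk$amount)
      n     <- n + nrow(chunk)
    }
    close(con)
    total / n                      # mean of 'amount' over the whole file

Revolution R Enterprise packages this chunking pattern, together with multi-threading and distributed execution, behind its own file structure and functions.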

Business and Strategic Advantages of Analyzing Big Data

The rise of Big Data provides an opportunity to revolutionize data analytics. Google's MapReduce, a programming model designed for processing large datasets, served as a first step towards Big Data processing. Other new platforms for Big Analytics have emerged.
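
To make the MapReduce programming model concrete, here is a toy single-machine word count in R: the map step emits a (word, 1) pair for every word, and the reduce step sums the counts per key. This is only an illustration of the model, not Google's distributed implementation.

    docs <- c("big data needs big analytics", "analytics needs data")

    # Map: split each document into words and emit a count of 1 per word.
    mapped <- unlist(lapply(docs, function(d) {
      words <- strsplit(d, " ")[[1]]
      setNames(rep(1, length(words)), words)
    }))

    # Shuffle + Reduce: group the emitted pairs by key and sum the values.
    reduced <- tapply(mapped, names(mapped), sum)
    reduced
    # analytics  big  data  needs
    #         2    2     2      2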

There are five specific advantages of moving towards Big Analytics.

Move Beyond Linear Approximation

Small datasets typically limit a user's ability to make accurate assessments. Prior technologies limited users to linear approximation models, which treat the relationship between two variables as a straight line. Linear approximation is commonly used for forecasting and financial analysis, but applying linear models to non-linear relationships limits the user's analytical accuracy.

Today, more sophisticated models that are both accurate and precise are available. Breaking a continuous variable into discrete categories, for example, lets users observe how specific ranges of values influence the outcome, helping them determine the true shape of the relationship between variables. Tools like Revolution R give users the accuracy and precision to make better predictions and assessments.
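
A minimal sketch of the idea on simulated data (the variable names and the curved relationship are assumptions made for illustration): cutting a continuous predictor into categories lets the fit follow a curve that a straight-line model misses.

    set.seed(1)
    n   <- 1e5
    age <- runif(n, 18, 80)
    # True relationship is curved: spending rises and then falls with age.
    spend <- 50 + 4 * age - 0.04 * age^2 + rnorm(n, sd = 5)

    linear_fit <- lm(spend ~ age)                  # straight-line approximation
    age_band   <- cut(age, breaks = seq(10, 80, by = 10))
    banded_fit <- lm(spend ~ age_band)             # one estimate per age band

    # The banded model tracks the curvature that the linear model cannot.
    c(linear = summary(linear_fit)$r.squared,
      banded = summary(banded_fit)$r.squared)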

Data Mining and Scoring

Techniques such as data mining and statistical modeling allow for vastly improved prediction and scoring. These techniques can estimate hundreds of thousands of coefficients from millions of observations and can be applied to analyses such as creditworthiness and trading volumes. As more companies adopt cloud-computing strategies, predicting peak cloud loads will become another critical use.
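
As a hedged sketch of model-based scoring (the variables, the logistic model, and the applicant values are illustrative assumptions, not the text's techniques): a model is fit on historical observations and then used to score new cases, for example for creditworthiness.

    set.seed(2)
    n       <- 1e5
    income  <- rlnorm(n, meanlog = 10, sdlog = 0.5)   # annual income
    util    <- runif(n)                               # credit utilization, 0-1
    default <- rbinom(n, 1, plogis(-5 + 3 * util - (log(income) - 10)))
    history <- data.frame(default, income, util)

    # Fit a scoring model on historical observations ...
    score_model <- glm(default ~ log(income) + util, data = history,
                       family = binomial)

    # ... then score a batch of new applicants (hypothetical values).
    applicants <- data.frame(income = c(25000, 60000, 120000),
                             util   = c(0.9, 0.5, 0.1))
    applicants$p_default <- predict(score_model, newdata = applicants,
                                    type = "response")
    applicants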

Big Data and Rare Events

Big Data Analysis vastly improves the user's ability to locate and analyze rare events that might escape detection when working with smaller datasets. One example is casualty insurance claims analysis, which aims to predict rare events such as costly claims related to natural disasters. Newer predictive analytics can radically reduce the cost and time of data handling and help businesses more accurately predict rare events.
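
The arithmetic behind this point is worth making explicit (the incidence rate is a hypothetical figure): an event that occurs once in 10,000 records shows up only a handful of times in a modest sample, but thousands of times in the full dataset.

    rate <- 1 / 10000       # hypothetical incidence of a costly claim

    # Expected number of rare events:
    100e3 * rate            # 100,000-record sample  -> about 10 events
    100e6 * rate            # 100-million-record set -> about 10,000 events

    # Simulated check: how many rare events land in one 100,000-record sample?
    set.seed(3)
    sum(rbinom(100e3, 1, rate))

Ten-odd observations are far too few to model reliably; ten thousand can support a genuine analysis.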

Extracting and Analyzing "Low Incidence Populations"

Big Data Analysis also helps users extract and analyze cases from low-incidence populations, which can be difficult to locate within a large dataset. Examples of low-incidence populations include rare manufacturing failures or treatment outcomes for patients with rare diseases.

Locating low-incidence populations is even more difficult when working from data samples: with only a few observations, predictive assessments and forecasts are unreliable. New hardware and software systems utilizing parallel computing are capable of generating insights beyond traditional analytics; 23andMe, a personal genomics and biotechnology company, uses such data analysis to accelerate scientific discovery. Big Data Analysis gives users greater predictive and analytical power and helps ensure accuracy.
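
Purely as an illustration (the field names and the 0.05% incidence are assumptions): with the full dataset available, the rare subgroup can be pulled out in its entirety and examined on its own, rather than hoping that a sample happens to contain enough such cases.

    set.seed(4)
    n <- 5e6
    records <- data.frame(
      rare_failure = runif(n) < 0.0005,        # ~0.05% incidence
      temperature  = rnorm(n, 70, 10),
      line_speed   = rnorm(n, 100, 15)
    )

    failures <- subset(records, rare_failure)  # every rare case, not a lucky few
    nrow(failures)                             # roughly 2,500 cases to analyze
    summary(failures[, c("temperature", "line_speed")])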

Big Data and the Conundrum of Statistical Significance

Users can move towards more meaningful and accurate analyses by using Big Data Analysis. Statistical significance testing was once necessary because it was difficult to analyze entire datasets: a smaller, representative sample is tested against chance and the result is generalized to the population. The ability to analyze the entire population in a dataset eliminates the need for such tests. Second-generation analytical tools like Revolution R Enterprise enable individuals to conduct more accurate, reliable, and useful analyses for real-world decision-making.
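
A small sketch of the contrast (the population size and renewal rates are made up): with only a sample, one must test whether an observed difference could be due to sampling chance; with the full population in hand, the difference is simply computed.

    set.seed(5)
    # Hypothetical full population: did customers in two regions renew?
    pop <- data.frame(region  = rep(c("A", "B"), each = 2e6),
                      renewed = c(rbinom(2e6, 1, 0.300), rbinom(2e6, 1, 0.302)))

    # Sample-based approach: a significance test on 2,000 customers per region.
    smp <- pop[c(sample(1:2e6, 2000), sample(2e6 + 1:2e6, 2000)), ]
    prop.test(table(smp$region, smp$renewed))  # the small true difference is
                                               # unlikely to reach significance

    # Full-population approach: compute the renewal rates directly.
    tapply(pop$renewed, pop$region, mean)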

Conclusion: The Future Is Big Analytics

Big Analytics offers the opportunity for greater innovation. Access to larger datasets with powerful tools means greater accuracy, transparency, and predictive power. Data scientists and statisticians can now experiment with large datasets, discovering new opportunities for their organizations.

Increased access to Big Data, affordable high-performance hardware, and the advent of second-generation analytical tools provide greater incentives for stakeholders to invest in Big Analytics. Data analysts need no longer rely on traditional analytic methods; new technologies give organizations the means to analyze large datasets quickly and cost-effectively.

About Norman H. Nie

Norman H. Nie was president and chief executive officer of Revolution Analytics. He co-invented the Statistical Package for the Social Sciences (SPSS) while a Stanford University graduate student and co-founded a company around it in 1967. He served as president and chief executive officer of SPSS through 1992 and as chairman of its board of directors from 1992 to 2008. Nie is professor emeritus at the University of Chicago and Stanford University. Among his many other professional achievements, he was named a fellow of the American Academy of Arts and Sciences. Nie was educated at the University of the Americas in Mexico City, Washington University in St. Louis, and Stanford University.

About Revolution Analytics

Revolution Analytics delivers advanced analytics software used by leading organizations for data analysis, development, and mission-critical production needs. The company fosters the R community's growth by providing free academic licenses for Revolution R Enterprise.