Big Data Analysis with Revolution R Enterprise

Revolution R Enterprise for Big Data Analysis and Predictive Analytics

Revolution Analytics has taken the popular R language to new levels of capacity and performance for statistical analysis of very large data sets. Using the built-in RevoScaleR package, R users can process, visualize and model terabyte-class data sets in a fraction of the time required by legacy products, without expensive or specialized hardware.


RevoScaleR: Big-Data Statistical Analysis with Revolution R Enterprise

Import “Big Data”: Import your largest data sets from ASCII, SAS, SPSS, relational databases or data warehouses into R, without being constrained by memory limitations.

Powerful “Data Step”: Use the power of the R language to select records, transform variables, and sort and merge data. Thanks to scalable, out-of-memory parallel processing, there’s no need to leave the Revolution R environment to quickly prepare Big Data for analysis in R.

High-performance file-based analytics: Revolution Analytics’ scalable, high-performance XDF data file format optimizes the process of streaming data from disk to memory, dramatically reducing the time needed for statistical analysis of large data sets.
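
The idea of streaming a file through memory in blocks can be illustrated outside of R. The following Python sketch is not RevoScaleR code and says nothing about the internals of the XDF format. It assumes a plain CSV source with one hypothetical numeric column named `value`, and shows how a mean and variance might be computed by reading fixed-size chunks and merging per-chunk summaries, so that the full data set never has to fit in memory.

```python
import csv
import io
from itertools import islice


def chunk_stats(values):
    """Return (count, mean, M2) for one in-memory chunk.

    M2 is the sum of squared deviations from the chunk mean.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2


def merge_stats(a, b):
    """Combine two (count, mean, M2) summaries without revisiting raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def streaming_mean_var(handle, column="value", chunk_rows=1000):
    """Read an open CSV handle in chunks; return (n, mean, sample variance)."""
    reader = csv.DictReader(handle)
    total = None
    while True:
        # Only chunk_rows rows are held in memory at any one time.
        chunk = [float(row[column]) for row in islice(reader, chunk_rows)]
        if not chunk:
            break
        stats = chunk_stats(chunk)
        total = stats if total is None else merge_stats(total, stats)
    if total is None:
        raise ValueError("no data rows found")
    n, mean, m2 = total
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


# Small in-memory stand-in for a large file on disk (hypothetical data).
demo = io.StringIO("value\n" + "\n".join(str(v) for v in range(1, 11)))
n, mean, var = streaming_mean_var(demo, chunk_rows=3)
print(n, mean, var)
```

Because `merge_stats` needs only the per-chunk summaries, the same merge step would also work if the chunks were processed by separate threads or machines.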

Big-Data statistical algorithms make use of all available computing resources for high-performance analysis, without data size limitations. Revolution R Enterprise includes distributed, multi-threaded implementations of the following algorithms, with more planned for future updates:

Descriptive Statistics and Cross Tabs on very large data sets

  • Basic summary statistics of data
  • Quantile approximations
  • Cross tabulations (standard tables and long form)
  • Pairwise cross tabulations
  • Marginal summaries of cross tabulations
  • Chi-squared Test and Fisher's Exact Test
  • Kendall's Tau Rank Correlation Coefficient
  • Risk Ratio and Odds Ratio on two-by-two tables

Statistical Modeling on very large data sets

  • Multiple Regression
  • Stepwise Regression
  • Covariance and correlation matrices
  • Sum of squares and cross-product matrices
  • Generalized Linear Models
    • All Exponential Family Distributions including
      • Binomial
      • Gamma
      • Gaussian
      • Inverse Gaussian
      • Poisson
    • Standard Link Functions including
      • Cauchit
      • Identity
      • Log
      • Logit
      • Probit
    • Tweedie Distributions
    • User defined distributions and link functions
  • Receiver Operating Characteristic (ROC) computations
  • Predictions for fitted models
  • K-Means Clustering
  • Classification and Regression Trees

Out-of-memory, multi-threaded algorithms in Revolution R Enterprise, such as Revolution's rxLinMod, are faster and more scalable than the corresponding functions in open-source R.
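
One common way to fit a regression on data that does not fit in memory is to accumulate sums of squares and cross-products chunk by chunk and solve for the coefficients only at the end. The Python sketch below illustrates that general technique for a single predictor. It is an assumption-laden illustration, not a description of how rxLinMod is implemented, and the helper names (`partial_sums`, `combine`, `fit_line`, `chunked`) and the demo data are hypothetical.

```python
def partial_sums(chunk):
    """Cross-product terms for one chunk of (x, y) pairs.

    Returns (n, sum x, sum y, sum x*x, sum x*y).
    """
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Add two partial-sum tuples element by element."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(totals):
    """Solve the normal equations for y = intercept + slope * x."""
    n, sx, sy, sxx, sxy = totals
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has no variation; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def chunked(pairs, size):
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(pairs), size):
        yield pairs[start:start + size]


# Hypothetical data lying exactly on the line y = 2 + 3x.
data = [(x, 2.0 + 3.0 * x) for x in range(10)]

totals = (0, 0.0, 0.0, 0.0, 0.0)
for chunk in chunked(data, 4):
    # Each chunk could equally be summarized by a separate thread or node.
    totals = combine(totals, partial_sums(chunk))

intercept, slope = fit_line(totals)
print(intercept, slope)
```

Only the five running totals persist between chunks, so memory use is independent of the number of rows. With several predictors, the same pattern extends to accumulating the full cross-product matrix.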


Distributed Computing for clusters, grids and the Cloud: Deploy the power of a Windows-based Microsoft HPC Server cluster, or a Linux-based grid managed with Platform LSF. Revolution R Enterprise Server makes it easy to cut computation time for Big Data analytics simply by adding compute nodes. And with Microsoft HPC Server, you can seamlessly transition computations from local resources to the Azure Cloud.

In-Database Analytics: When data locality is critical, bring Revolution R to your data for massively scalable analytics. Use RevoConnectR for Hadoop to distribute R computations across Hadoop nodes with the power of MapReduce. Or use Revolution R Enterprise for IBM Netezza for in-database analytics using the power of the IBM Netezza data warehouse appliance.

