R Language Feature: Analytics

This document describes the capabilities of the open-source R language included with Revolution R Enterprise. Unless otherwise noted below, these capabilities are current for R 2.14.2 and Revolution R Enterprise 6.0. Additional capabilities not included in the standard open-source R distribution are indicated as follows:

* Requires Revolution R Enterprise
** Requires Revolution R Enterprise for IBM Netezza
*** Requires additional open-source community packages from CRAN  

Basic Mathematics

  • Complex arithmetic
  • Computation of orthogonal polynomials
  • Cross products
  • Cumulative products and sums
  • Exponential functions
  • Hyperbolic functions
  • Kronecker products on arrays
  • Logarithms (any base)
  • Logical operators
  • Matrix operations including
    • median polish of a matrix
    • QR decomposition
    • singular value decomposition
    • spectral decomposition
  • Symbolic and algorithmic derivatives
  • Trigonometric functions
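
As a rough illustration of several items in the list above, the following sketch uses only functions from base R. The matrix A and the expression being differentiated are made-up examples, not taken from any Revolution R documentation.

```r
# Small symmetric matrix used for the decompositions below (illustrative data)
A <- matrix(c(4, 1, 1, 3), nrow = 2)

# QR decomposition: qr.Q() and qr.R() recover the two factors
qrA  <- qr(A)
Qmat <- qr.Q(qrA)
Rmat <- qr.R(qrA)

# Singular value decomposition: A = U diag(d) V'
s <- svd(A)
A_from_svd <- s$u %*% diag(s$d) %*% t(s$v)

# Spectral (eigen) decomposition
e <- eigen(A)

# Cross product t(A) %*% A and a Kronecker product (4 x 4 result)
cp <- crossprod(A)
k  <- kronecker(diag(2), A)

# Complex arithmetic, cumulative sums/products, logarithm in base 2
z  <- (1 + 2i) * (3 - 1i)
cs <- cumsum(1:5)
cprod <- cumprod(1:5)
l2 <- log(8, base = 2)

# Symbolic derivative of sin(x)^2 with respect to x, then evaluated at pi/4
dx   <- D(expression(sin(x)^2), "x")
dval <- eval(dx, list(x = pi / 4))
```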

Basic Statistics

  • mean
  • standard deviation
  • variance
  • median
  • quantile
  • correlation
  • cross tabulations
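
A minimal sketch of these summaries in base R follows. The vectors x, y, g and h are invented purely for illustration.

```r
# Two small numeric samples (illustrative data)
x <- c(2, 4, 4, 5, 7, 9)
y <- c(1, 3, 2, 5, 6, 8)

m   <- mean(x)
s   <- sd(x)
v   <- var(x)
med <- median(x)

# Quartiles of x
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# Pearson correlation between x and y
r <- cor(x, y)

# Cross tabulation of two categorical variables
g <- c("a", "b", "a", "b", "a", "b")
h <- c("u", "u", "v", "v", "u", "v")
tab <- table(g, h)
```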

Probability Distributions

Density, quantiles, probability and simulation for:

  • Beta
  • Binomial
  • Birthday
  • Chi-squared
  • Empirical cumulative distribution
  • Exponential
  • F Distribution
  • Gamma
  • Geometric
  • Logistic
  • Lognormal
  • Negative Binomial
  • Normal
  • Poisson
  • Student’s t
  • Tukey’s studentized range distribution
  • Uniform
  • Weibull
  • Wilcoxon signed rank distribution
  • Wilcoxon rank sum distribution
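
In base R, most of these distributions follow a common naming convention: a prefix of d (density), p (cumulative probability), q (quantile) or r (random simulation) is attached to the distribution's short name. The sketch below shows the convention for the normal and binomial distributions, plus the empirical CDF and the birthday functions, which have their own interfaces. The input values are arbitrary.

```r
# Normal distribution: density, probability, quantile, simulation
d0  <- dnorm(0)            # density at 0
p   <- pnorm(1.96)         # P(Z <= 1.96)
q   <- qnorm(0.975)        # 97.5% quantile
set.seed(1)
sim <- rnorm(3)            # three simulated values

# Binomial distribution: P(X <= 3) for 10 trials with success probability 0.5
pb <- pbinom(3, size = 10, prob = 0.5)

# Empirical cumulative distribution function of a small sample
Fn  <- ecdf(c(1, 2, 3, 4))
Fn2 <- Fn(2)               # proportion of observations <= 2

# Birthday problem: probability of a shared birthday among 23 people
pbd <- pbirthday(23)
```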

Big Data Analytics *

  • Memory efficient storage
  • External memory algorithms
  • Input/Output
    • rxImport: Creates an ‘.xdf’ file or data frame from a data source (e.g. text, SAS, SPSS data files).
    • rxTextToXdf: Creates an ‘.xdf’ file from a delimited text file.
    • rxDataFrameToXdf: Creates an ‘.xdf’ file from a data frame.
    • rxXdfToText: Creates a text file from an ‘.xdf’ file.
    • rxDataStep, rxXdfToDataFrame, rxReadXdf: Reads an ‘.xdf’ file into a data frame.
    • rxGetInfo: Retrieves header information from an ‘.xdf’ file or summary information from a data frame.
    • rxSetInfo: Sets a file description in an ‘.xdf’ file or a description attribute in a data frame.
    • rxGetVarInfo: Retrieves variable information from an ‘.xdf’ file or data frame.
    • rxSetVarInfo: Modifies variable information in an ‘.xdf’ file or data frame.
    • RxXdfData: Creates an ‘.xdf’ data source object.
    • RxTextData: Creates a text data source object.
    • RxSasData: Creates a SAS data source object.
    • RxSpssData: Creates an SPSS data source object.
    • RxOdbcData: Creates an ODBC data source object.
    • rxOpen: Open a data source for reading.
    • rxReadNext: Read data from a data source.
    • rxClose: Close a data source.
  • Data Manipulations/Data Step
    • rxDataStep: Transform and subset data in ‘.xdf’ files or data frames.
    • rxSort: Multi-key sorting of the variables in an ‘.xdf’ file or data frame.
    • rxMerge: Merges two ‘.xdf’ files or data frames using a variety of merge types.
    • rxSplitXdf: Split an input ‘.xdf’ file or data frame into multiple ‘.xdf’ files or a list of data frames.
  • Descriptive Statistics and Cross Tabs
    • rxSummary: Basic summary statistics of data.
    • rxQuantile: Computes approximate quantiles for ‘.xdf’ files and data frames without sorting.
    • rxCorCov: Calculates the covariance, correlation, or sum of squares / cross-product matrix for a set of variables.
    • rxCrossTabs: Formula-based cross-tabulation of data.
    • rxCube: Alternative formula-based cross-tabulation designed for efficient representation.
    • rxMarginals: Marginal summaries of cross-tabulations.
    • as.xtabs: Converts cross tabulation results to an xtabs object.
    • rxChiSquaredTest: Performs Chi-squared Test on xtabs object.
    • rxFisherTest: Performs Fisher's Exact Test on xtabs object.
    • rxKendallCor: Computes Kendall's Tau Rank Correlation Coefficient using xtabs object.
    • rxPairwiseCrossTab: Apply a function to pairwise combinations of rows and columns of an xtabs object.
    • rxRiskRatio: Calculate the relative risk on a two-by-two xtabs object.
    • rxOddsRatio: Calculate the odds ratio on a two-by-two xtabs object.
  • Statistical Modeling
    • rxLinMod: Fits a linear model to data.
    • rxCov: Calculate the covariance matrix for a set of variables.
    • rxCor: Calculate the correlation matrix for a set of variables.
    • rxSSCP: Calculate the sum of squares / cross-product matrix for a set of variables.
    • rxLogit: Fits a logistic regression model to data.
    • rxRoc: Receiver Operating Characteristic (ROC) computations using actual and predicted values from a binary classifier system.
    • rxGlm: Fits a generalized linear model to data.
    • rxPredict: Calculates predictions for fitted models.
    • rxKmeans: Performs k-means clustering.
    • rxDTree: Fits a classification or regression tree to data.
  • Utility Functions
    • rxOptions: Gets or sets RevoScaleR-specific options.
    • rxGetOption: Retrieves a specific RevoScaleR option.
  • Basic Graphing Functions
    • rxHistogram: Creates a histogram from data.
    • rxLinePlot: Creates a line plot from data.
    • rxLorenz: Computes a Lorenz curve that can be plotted.
    • rxRocCurve: Computes and plots ROC curves from actual and predicted data.
  • Distributed Computing
    • RxComputeContext: Creates a compute context.
    • RxHpcServer: Creates a Microsoft HPC Server compute context.
    • RxAzureBurst: Creates an Azure Burst HPC Server compute context.
    • RxLsfCluster: Creates an IBM Platform Computing LSF compute context.
    • RxForeachDoPar: Creates a compute context for rxExec using one of the foreach dopar back ends.
    • RxLocalParallel: Creates a local compute context for rxExec using the 'parallel' package as back end.
    • RxLocalSeq: Creates a local compute context for rxExec using sequential computations.
    • rxSetComputeContext: Sets a compute context.
    • rxGetComputeContext: Gets the current compute context.
    • rxGetAvailableNodes: Get all the available nodes on a distributed compute context.
    • rxGetNodeInfo: Get information on nodes specified for a distributed compute context.
    • rxPingNodes: Test the round trip from the end user through computation node(s) in a cluster or cloud.
    • rxExec: Run an arbitrary R function on nodes or cores of a cluster.
    • rxGetJobStatus: Get the status of a non-waiting distributed computing job.
    • rxGetJobResults: Get the return object(s) of a non-waiting distributed computing job.
    • rxGetJobOutput: Get the console output from a non-waiting distributed computing job.
    • rxGetJobs: Get the available distributed computing job information objects.
    • rxLocateFile: Get the first occurrence of a specified input file in a set of specified paths.
  • In-database execution with IBM Netezza **
  • Hadoop connectivity

Machine Learning

  • Cluster Analysis
    • K-means
    • Hierarchical clustering
  • General tree structures
  • Neural Networks
  • Trees and Recursive Partitioning
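
The two clustering methods listed above can be tried with the kmeans and hclust functions from base R's stats package. The following is a small sketch on simulated data; the two point clouds, the seed and the choice of two clusters are assumptions made for illustration.

```r
set.seed(42)

# Two well-separated groups of 20 two-dimensional points (simulated data)
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))

# K-means with two centers and several random restarts
km <- kmeans(pts, centers = 2, nstart = 5)

# Hierarchical clustering on Euclidean distances, cut into two groups
hc     <- hclust(dist(pts), method = "complete")
groups <- cutree(hc, k = 2)

# Compare the two partitions
agreement <- table(kmeans = km$cluster, hclust = groups)
```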

Optimization and Mathematical Programming

  • General purpose optimization
  • Linear constrained optimization
  • Linear programming
  • Nonlinear programming
  • One dimensional optimization
  • Optimization using PORT routines
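
Several of these capabilities map onto functions in base R's stats package: optimize for one-dimensional problems, optim for general-purpose optimization, nlminb for the PORT routines, and constrOptim for linear inequality constraints. The sketch below applies them to made-up objective functions; the starting values and the constraint are illustrative assumptions.

```r
# One-dimensional optimization: minimum of (x - 2)^2 on [0, 5]
f    <- function(x) (x - 2)^2
opt1 <- optimize(f, interval = c(0, 5))

# General-purpose optimization of the Rosenbrock function with BFGS
rosen <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2
opt2  <- optim(c(-1.2, 1), rosen, method = "BFGS")

# The same problem using the PORT routines
opt3 <- nlminb(c(-1.2, 1), rosen)

# Linearly constrained optimization: minimise the squared distance to (2, 2)
# subject to x + y <= 2, written as ui %*% p - ci >= 0
g    <- function(p) sum((p - 2)^2)
opt4 <- constrOptim(c(0, 0), g, grad = NULL,
                    ui = matrix(c(-1, -1), nrow = 1), ci = -2)
```

With grad = NULL, constrOptim falls back to the Nelder-Mead method, so the constrained solution is approximate rather than exact.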

Signal Processing

  • Convolutions
  • Fast Discrete Fourier Transform (FFT)
  • Filters
  • Holt-Winters filtering
  • Kalman Filtering
  • Wavelets***
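
A brief sketch of the transform, convolution and filtering items using base R functions (fft, convolve and stats::filter) follows. The input sequences are arbitrary illustrative values.

```r
# Forward and inverse FFT; R's inverse transform is unnormalized,
# so divide by the length to recover the input
x      <- c(1, 2, 3, 4)
X      <- fft(x)
x_back <- Re(fft(X, inverse = TRUE)) / length(x)

# Linear convolution of two sequences. With type = "open", R expects the
# second argument reversed to match the usual definition of convolution.
cv <- convolve(c(1, 2, 3), rev(c(0, 1, 0.5)), type = "open")

# Centered three-point moving average filter; the end points are NA
y <- stats::filter(1:10, rep(1 / 3, 3), sides = 2)
```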

Simulation and Random Number Generation

  • The default RNG is the Mersenne-Twister algorithm.
  • Other generators include
    • Wichmann-Hill
    • Marsaglia-Multicarry
    • Super-Duper
    • Knuth-TAOCP
    • Knuth-TAOCP-2002, as well as
    • user-supplied RNGs.
  • Normal random number algorithms:
    • Kinderman-Ramage
    • Ahrens-Dieter
    • Box-Muller
    • Inversion (default).
  • Quasi-random sequences (randtoolbox) ***
    • the Sobol sequence
    • the Halton (hence Van der Corput) sequence
    • the Torus sequence (also known as the Kronecker sequence)
  • Latin hypercube sampling (lhs) ***
  • Quasi/pseudo-random method (mc2d) ***
  • True randomness: connection to random.org (random) ***
  • RNG tests (RDieHarder) ***
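
The built-in generators can be inspected and switched with RNGkind, and set.seed makes a simulation reproducible. The sketch below uses only base R; the seed values are arbitrary, and the CRAN packages marked *** above are not used.

```r
# Current generator settings (uniform generator, normal generator, ...)
current <- RNGkind()

# Reproducibility: the same seed and generator give the same draws
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
a <- runif(3)
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
b <- runif(3)
same <- identical(a, b)

# Switch to another uniform generator and normal algorithm;
# RNGkind() returns the previous settings so they can be restored
old <- RNGkind("Wichmann-Hill", "Box-Muller")
set.seed(1)
z <- rnorm(2)
RNGkind(old[1], old[2])
```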

Statistical Modeling

  • Analysis of Variance (ANOVA)
  • Factor Analysis
  • Design of Experiments (DoE) & Analysis of Experimental Data
  • Kernel density estimation
  • Linear Models
    • Linear Regression
    • Multiple Regression
    • Comparison of linear models
    • Gaussian mixed-effect models
    • Generalized Additive Models (mgcv) ***
    • Generalized Linear Models (GLM)
    • GLM ANOVA stats
    • Hierarchical and mixed effects models
  • Multivariate Statistics
    • Multidimensional scaling
    • Multivariate ANOVA
  • Non-linear models
    • Gaussian mixed-effect models
    • Non-linear least squares
  • Principal Components Analysis
  • Robust Statistical Methods
  • Spatial statistics
  • Survival Analysis
  • Time Series Analysis
    • Autoregressive models
    • ARIMA models
      • SARIMA
      • ARIMAX
      • Subset ARIMA models
    • ARCH and GARCH models***
    • Classical decomposition
    • Embedded time series
    • Exponential smoothing models
    • Holt-Winters forecasting
    • Moving Average models
    • Spectral Analysis
    • Vector Autoregressive models (VAR)
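
A few of the modeling capabilities above can be sketched with functions from base R's stats package and the mtcars and lh data sets that ship with R. The choice of variables and model orders is illustrative only.

```r
# Linear and multiple regression, then a comparison of the nested models
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
cmp  <- anova(fit1, fit2)

# Generalized linear model: logistic regression of transmission type on weight
glm_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Principal components analysis on standardized variables
pca <- prcomp(mtcars[, c("mpg", "wt", "hp")], scale. = TRUE)

# Time series: first-order autoregressive model fitted with arima()
ar_fit <- arima(lh, order = c(1, 0, 0))
```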

Statistical Tests

  • Bartlett test of homogeneity of variances
  • Box-Pierce and Ljung-Box tests
  • Cochran-Mantel-Haenszel Chi-Squared Test for Count Data
  • F test to compare two variances
  • Fisher’s Exact test for count data
  • Kolmogorov-Smirnov test
  • Kruskal-Wallis Rank Sum test
  • Mood Two-Sample Test of Scale
  • Pairwise t tests
  • Pairwise Wilcoxon Rank Sum tests
  • Power calculations for 1 and 2 sample t tests
  • Shapiro-Wilk Normality test
  • Student’s t-test
  • Tukey Honest Significant Differences
  • Wilcoxon Rank Sum and Signed Rank Tests
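
Most of these tests are available in base R's stats package and return an object with a p.value component. A short sketch on simulated samples follows; the sample sizes, means and seed are assumptions chosen for illustration.

```r
set.seed(7)

# Two simulated samples whose means differ by one standard deviation
a <- rnorm(30, mean = 0)
b <- rnorm(30, mean = 1)

tt <- t.test(a, b)            # Student's (Welch) two-sample t-test
wt <- wilcox.test(a, b)       # Wilcoxon rank sum test
sw <- shapiro.test(a)         # Shapiro-Wilk normality test
vt <- var.test(a, b)          # F test to compare two variances
ks <- ks.test(a, "pnorm")     # Kolmogorov-Smirnov test against N(0, 1)

# Ljung-Box test on the lh time series that ships with R
lb <- Box.test(lh, type = "Ljung-Box")

# Power calculation: per-group sample size for a two-sample t test
# with effect size 1 and 80% power
pw <- power.t.test(delta = 1, sd = 1, power = 0.8)
```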