R Language Features: Analytics
This document describes the capabilities of the open-source R language included with Revolution R Enterprise. Unless otherwise noted below, these capabilities are current for R 2.14.2 and Revolution R Enterprise 6.0. Additional capabilities not included in the standard open-source R distribution are indicated as follows:
* Requires Revolution R Enterprise
** Requires Revolution R Enterprise for IBM Netezza
*** Requires additional open-source community packages from CRAN
Basic Mathematics
- Complex arithmetic
- Computation of orthogonal polynomials
- Cross products
- Cumulative products and sums
- Exponential functions
- Hyperbolic functions
- Kronecker products on arrays
- Logarithms (any base)
- Logical operators
- Matrix operations including
- median polish of a matrix
- QR decomposition
- singular value decomposition
- spectral decomposition
- Symbolic and algorithmic derivatives
- Trigonometric functions
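As an illustration of a few of the items above, the following is a minimal sketch using only functions from the standard open-source R distribution. The matrix `A` and the variable names are made up for the example.

```r
# Small symmetric matrix used throughout the example (illustrative data).
A <- matrix(c(4, 1, 1, 3), nrow = 2)

# QR decomposition: A = Q %*% R
qr_A <- qr(A)
Q <- qr.Q(qr_A)
R <- qr.R(qr_A)

# Singular value decomposition: A = U %*% diag(d) %*% t(V)
sv <- svd(A)

# Spectral (eigen) decomposition of a symmetric matrix
eig <- eigen(A, symmetric = TRUE)

# Cross product t(A) %*% A and Kronecker product with a 2 x 2 identity
cp <- crossprod(A)
kp <- kronecker(diag(2), A)

# Cumulative sums and products
cs  <- cumsum(1:5)
cpd <- cumprod(1:5)

# Logarithm in an arbitrary base, and complex arithmetic
log_base3 <- log(81, base = 3)
z <- complex(real = 1, imaginary = 2) * complex(real = 3, imaginary = -1)

# Symbolic derivative of sin(x)^2 with respect to x
dx <- D(expression(sin(x)^2), "x")
```

The derivative `dx` is an unevaluated R expression; it can be evaluated at a point with, for example, `eval(dx, list(x = 0.3))`.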
Basic Statistics
- mean
- standard deviation
- variance
- median
- quantile
- correlation
- cross tabulations
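The statistics listed above map onto base R functions. A short sketch, with made-up vectors `x`, `y`, `g` and `h` standing in for real data:

```r
# Illustrative numeric data (not from any real data set).
x <- c(2, 4, 4, 5, 7, 9)
y <- c(1, 3, 2, 6, 8, 10)

m  <- mean(x)       # arithmetic mean
s  <- sd(x)         # sample standard deviation
v  <- var(x)        # sample variance (equals s^2)
md <- median(x)     # median
q  <- quantile(x, probs = c(0.25, 0.5, 0.75))  # quartiles
r  <- cor(x, y)     # Pearson correlation

# Cross tabulation of two categorical variables
g <- c("a", "b", "a", "b", "a", "b")
h <- c("u", "u", "v", "v", "u", "v")
tab <- table(g, h)
```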
Probability Distributions
Density, quantiles, probability and simulation for:
- Beta
- Binomial
- Birthday
- Chi-squared
- Empirical cumulative distribution
- Exponential
- F Distribution
- Gamma
- Geometric
- Logistic
- Lognormal
- Negative Binomial
- Normal
- Poisson
- Student’s t
- Tukey’s studentized range distribution
- Uniform
- Weibull
- Wilcoxon signed rank distribution
- Wilcoxon rank sum distribution
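For most of the families above, base R follows a naming convention of `d` (density), `p` (probability), `q` (quantile) and `r` (simulation) prefixed to a distribution name. The sketch below shows the convention for a few families; the specific arguments are chosen only for illustration.

```r
# Normal distribution: density, probability, quantile, simulation
d  <- dnorm(0)        # density at 0
p  <- pnorm(1.96)     # P(X <= 1.96)
qv <- qnorm(0.975)    # 97.5% quantile
set.seed(1)           # fix the seed so the draw is reproducible
r  <- rnorm(5)        # five simulated values

# The same prefixes apply to other families
pb <- pbinom(3, size = 10, prob = 0.5)  # P(X <= 3) for Binomial(10, 0.5)
qe <- qexp(0.5, rate = 2)               # median of Exponential(rate = 2)

# Birthday problem: smallest group size with >= 50% chance of a shared birthday
n_people <- qbirthday(prob = 0.5)

# Empirical cumulative distribution function built from a sample
Fn <- ecdf(c(1, 2, 2, 3))
f2 <- Fn(2)           # proportion of observations <= 2
```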
Big Data Analytics *
- Memory efficient storage
- External memory algorithms
- Input/Output
- rxImport: Creates an ‘.xdf’ file or data frame from a data source (e.g. text, SAS, SPSS data files).
- rxTextToXdf: Creates an ‘.xdf’ file from a delimited text file.
- rxDataFrameToXdf: Creates an ‘.xdf’ file from a data frame.
- rxXdfToText: Creates a text file from an ‘.xdf’ file.
- rxDataStep, rxXdfToDataFrame, rxReadXdf: Reads an ‘.xdf’ file into a data frame.
- rxGetInfo: Retrieves header information from an ‘.xdf’ file or summary information from a data frame.
- rxSetInfo: Sets a file description in an ‘.xdf’ file or a description attribute in a data frame.
- rxGetVarInfo: Retrieves variable information from an ‘.xdf’ file or data frame.
- rxSetVarInfo: Modifies variable information in an ‘.xdf’ file or data frame.
- RxXdfData: Creates an ‘.xdf’ data source object.
- RxTextData: Creates a text data source object.
- RxSasData: Creates a SAS data source object.
- RxSpssData: Creates an SPSS data source object.
- RxOdbcData: Creates an ODBC data source object.
- rxOpen: Open a data source for reading.
- rxReadNext: Read data from a data source.
- rxClose: Close a data source.
- Data Manipulations/Data Step
- rxDataStep: Transform and subset data in ‘.xdf’ files or data frames.
- rxSort: Multi-key sorting of the variables in an ‘.xdf’ file or data frame.
- rxMerge: Merges two ‘.xdf’ files or data frames using a variety of merge types.
- rxSplitXdf: Split an input ‘.xdf’ file or data frame into multiple ‘.xdf’ files or a list of data frames.
- Descriptive Statistics and Cross Tabs
- rxSummary: Basic summary statistics of data.
- rxQuantile: Computes approximate quantiles for ‘.xdf’ files and data frames without sorting.
- rxCorCov: Calculate the covariance, correlation, or sum of squares / cross-product matrix for a set of variables.
- rxCrossTabs: Formula-based cross-tabulation of data.
- rxCube: Alternative formula-based cross-tabulation designed for efficient representation.
- rxMarginals: Marginal summaries of cross-tabulations.
- as.xtabs: Converts cross tabulation results to an xtabs object.
- rxChiSquaredTest: Performs Chi-squared Test on xtabs object.
- rxFisherTest: Performs Fisher's Exact Test on xtabs object.
- rxKendallCor: Computes Kendall's Tau Rank Correlation Coefficient using xtabs object.
- rxPairwiseCrossTab: Apply a function to pairwise combinations of rows and columns of an xtabs object.
- rxRiskRatio: Calculate the relative risk on a two-by-two xtabs object.
- rxOddsRatio: Calculate the odds ratio on a two-by-two xtabs object.
- Statistical Modeling
- rxLinMod: Fits a linear model to data.
- rxCov: Calculate the covariance matrix for a set of variables.
- rxCor: Calculate the correlation matrix for a set of variables.
- rxSSCP: Calculate the sum of squares / cross-product matrix for a set of variables.
- rxLogit: Fits a logistic regression model to data.
- rxRoc: Receiver Operating Characteristic (ROC) computations using actual and predicted values from a binary classifier system.
- rxGlm: Fits a generalized linear model to data.
- rxPredict: Calculates predictions for fitted models.
- rxKmeans: Performs k-means clustering.
- rxDTree: Fits a classification or regression tree to data.
- Utility Functions
- rxOptions: Gets or sets RevoScaleR-specific options.
- rxGetOption: Retrieves a specific RevoScaleR option.
- Basic Graphing Functions
- rxHistogram: Creates a histogram from data.
- rxLinePlot: Creates a line plot from data.
- rxLorenz: Computes a Lorenz curve, which can then be plotted.
- rxRocCurve: Computes and plots ROC curves from actual and predicted data.
- Distributed Computing
- RxComputeContext: Creates a compute context.
- RxHpcServer: Creates a Microsoft HPC Server compute context.
- RxAzureBurst: Creates an Azure Burst HPC Server compute context.
- RxLsfCluster: Creates an IBM Platform Computing LSF compute context.
- RxForeachDoPar: Creates a compute context for rxExec using one of the foreach dopar back ends.
- RxLocalParallel: Creates a local compute context for rxExec using the 'parallel' package as back end.
- RxLocalSeq: Creates a local compute context for rxExec using sequential computations.
- rxSetComputeContext: Sets a compute context.
- rxGetComputeContext: Gets the current compute context.
- rxGetAvailableNodes: Get all the available nodes on a distributed compute context.
- rxGetNodeInfo: Get information on nodes specified for a distributed compute context.
- rxPingNodes: Test round trip from end user through computation node(s) in a cluster or cloud.
- rxExec: Run an arbitrary R function on nodes or cores of a cluster.
- rxGetJobStatus: Get the status of a non-waiting distributed computing job.
- rxGetJobResults: Get the return object(s) of a non-waiting distributed computing job.
- rxGetJobOutput: Get the console output from a non-waiting distributed computing job.
- rxGetJobs: Get the available distributed computing job information objects.
- rxLocateFile: Get the first occurrence of a specified input file in a set of specified paths.
- In-database execution with IBM Netezza **
- Hadoop connectivity
Machine Learning
- Cluster Analysis
- K-means
- Hierarchical clustering
- General tree structures
- Neural Networks
- Trees and Recursive Partitioning
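A minimal sketch of the two cluster analysis methods named above, using the `kmeans` and `hclust` functions from the standard `stats` package and the built-in `iris` data set. The choice of three clusters, average linkage, and scaled inputs is an assumption made for the example, not a recommendation.

```r
# Standardize the four numeric measurements of the built-in iris data.
X <- scale(iris[, 1:4])

# K-means with 3 clusters; nstart = 10 restarts from several random centers.
set.seed(1)  # make the random starts reproducible
km <- kmeans(X, centers = 3, nstart = 10)

# Hierarchical clustering on Euclidean distances with average linkage,
# then cut the tree into 3 groups.
hc <- hclust(dist(X), method = "average")
groups <- cutree(hc, k = 3)

# Cross-tabulate the two cluster assignments to compare them.
agreement <- table(kmeans = km$cluster, hclust = groups)
```

Neural networks and recursive partitioning are provided by separate packages and are not shown here.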
Optimization and Mathematical Programming
- General purpose optimization
- Linear constrained optimization
- Linear programming
- Nonlinear programming
- One dimensional optimization
- Optimization using PORT routines
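The sketch below exercises several of the optimizers above from the standard `stats` package: `optimize` (one dimensional), `optim` (general purpose), `nlminb` (PORT routines) and `constrOptim` (linear inequality constraints). The objective functions, starting values and bounds are illustrative assumptions.

```r
# One-dimensional optimization: minimize (x - 2)^2 on the interval [0, 5].
one_d <- optimize(function(x) (x - 2)^2, interval = c(0, 5))

# General-purpose optimization: Rosenbrock function, minimum at (1, 1).
rosen <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2
gen <- optim(c(-1.2, 1), rosen, method = "BFGS")

# PORT routines via nlminb, here with box constraints on both parameters.
port <- nlminb(c(-1.2, 1), rosen, lower = c(-2, -2), upper = c(2, 2))

# Linearly constrained optimization. constrOptim requires
# ui %*% theta - ci >= 0, so x + y <= 2 is written as -x - y + 2 >= 0.
# The unconstrained minimum (2, 2) is infeasible; the constrained one is (1, 1).
quad <- function(p) sum((p - 2)^2)
con <- constrOptim(c(0.5, 0.5), quad, grad = NULL,
                   ui = matrix(c(-1, -1), nrow = 1), ci = -2)
```

Each call returns a list; the located minimizer is in `one_d$minimum` for `optimize` and in the `par` element for the other three.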
Signal Processing
- Convolutions
- Fast Discrete Fourier Transform
- FFT
- Filters
- Holt-Winter filtering
- Kalman Filtering
- Wavelets***
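A few of the operations above can be sketched with the standard `stats` package. The input vectors are made up for the example; `co2` is a data set that ships with R.

```r
# Fast Fourier transform and its inverse (the inverse is unnormalized in R).
x <- c(1, 2, 3, 4)
X <- fft(x)
x_back <- Re(fft(X, inverse = TRUE)) / length(x)

# Linear filtering: centered 3-point moving average.
ma <- stats::filter(1:10, rep(1 / 3, 3), sides = 2)

# Convolution in the usual sense: reverse the second sequence, type = "open".
cv <- convolve(c(1, 2, 3), rev(c(0, 1, 0.5)), type = "open")

# Holt-Winters filtering of the built-in monthly CO2 series,
# followed by a 12-month-ahead forecast.
hw <- HoltWinters(co2)
fc <- predict(hw, n.ahead = 12)
```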
Simulation and Random Number Generation
- The default RNG is the Mersenne-Twister algorithm.
- Other generators include
- Wichmann-Hill
- Marsaglia-Multicarry
- Super-Duper
- Knuth-TAOCP
- Knuth-TAOCP-2002
- User-supplied RNGs
- Normal random number algorithms:
- Kinderman-Ramage
- Ahrens-Dieter
- Box-Muller
- Inversion (default).
- Pseudo-randomness:
- RNGs from GNU GSL: RDieHarder ***
- SF Mersenne Twister: randtoolbox ***
- WELL: randtoolbox ***
- Quasi-random sequences: randtoolbox ***
- True randomness: random (connection to random.org) ***
- RNG tests: RDieHarder ***
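The built-in generators are selected with `RNGkind` and seeded with `set.seed`. A minimal sketch follows; the seed value 42 is arbitrary.

```r
# Remember the current generator settings so they can be restored.
old <- RNGkind()

# Select the default uniform generator and the default normal algorithm.
RNGkind("Mersenne-Twister", "Inversion")

# The same seed reproduces the same stream.
set.seed(42)
a <- runif(3)
set.seed(42)
b <- runif(3)
same_stream <- identical(a, b)

# Switch to another uniform generator and another normal algorithm.
RNGkind("Wichmann-Hill", "Box-Muller")
set.seed(42)
w <- rnorm(3)

# Restore the original uniform and normal generators.
RNGkind(old[1], old[2])
```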
Statistical Modeling
- Analysis of Variance (ANOVA)
- Factor Analysis
- Design of Experiments (DoE) & Analysis of Experimental Data
- Kernel density estimation
- Linear Models
- Linear Regression
- Multiple Regression
- Comparison of linear models
- Gaussian mixed-effect models
- Generalized Additive Models: mgcv ***
- Generalized Linear Models (GLM)
- GLM ANOVA stats
- Hierarchical and mixed effects models
- Multivariate Statistics
- Multidimensional scaling
- Multivariate ANOVA
- Non-linear models
- Gaussian mixed-effect models
- Non-linear least squares
- Principal Components Analysis
- Robust Statistical Methods
- Spatial statistics
- Survival Analysis
- Time Series Analysis
- Autoregressive models
- ARIMA models
- SARIMA
- ARIMAX
- Subset ARIMA models
- ARCH and GARCH models***
- Classical decomposition
- Embedded time series
- Exponential smoothing models
- Holt-Winters forecasting
- Moving Average models
- Spectral Analysis
- Vector Autoregressive models (VAR)
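As a rough illustration of some of the modeling capabilities above, the sketch below fits a few models with the standard `stats` package on data sets that ship with R (`mtcars`, `lh`). The choice of variables is arbitrary and made only for the example.

```r
# Linear and multiple regression, then a comparison of the nested models.
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
cmp  <- anova(fit1, fit2)

# Generalized linear model: logistic regression of transmission type on weight.
logit <- glm(am ~ wt, data = mtcars, family = binomial)

# Principal components analysis on standardized variables.
pca <- prcomp(mtcars[, c("mpg", "wt", "hp")], scale. = TRUE)

# Time series: an AR(1) model fitted through the ARIMA interface.
ar_fit <- arima(lh, order = c(1, 0, 0))
```

`summary()` applied to any of the fitted objects prints the usual coefficient tables and diagnostics.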
Statistical Tests
- Bartlett test of homogeneity of variances
- Box-Pierce and Ljung-Box tests
- Cochran-Mantel-Haenszel Chi-Squared Test for Count Data
- F test to compare two variances
- Fisher’s Exact test for count data
- Kolmogorov-Smirnov test
- Kruskal-Wallis Rank Sum test
- Mood Two-Sample Test of Scale
- Pairwise t tests
- Pairwise Wilcoxon Rank Sum tests
- Power calculations for 1 and 2 sample t tests
- Shapiro-Wilk Normality test
- Student’s t-test
- Tukey Honest Significant Differences
- Wilcoxon Rank Sum and Signed Rank Tests
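Several of the tests above are sketched below using the standard `stats` package. The two samples are simulated purely for illustration, and `warpbreaks` is a data set that ships with R.

```r
# Two simulated samples with different means (illustrative data only).
set.seed(123)
a <- rnorm(30, mean = 0)
b <- rnorm(30, mean = 1)

tt <- t.test(a, b)          # Student's t-test (Welch variant by default)
wx <- wilcox.test(a, b)     # Wilcoxon rank sum test
sw <- shapiro.test(a)       # Shapiro-Wilk normality test
ks <- ks.test(a, "pnorm")   # Kolmogorov-Smirnov test against N(0, 1)
ft <- var.test(a, b)        # F test to compare two variances

# Sample size per group for a two-sample t test with 90% power.
pw <- power.t.test(delta = 1, sd = 1, power = 0.9)

# Tukey Honest Significant Differences after a one-way ANOVA.
tk <- TukeyHSD(aov(breaks ~ tension, data = warpbreaks))
```

Each test returns an object whose `p.value` element holds the p-value; `pw$n` holds the computed sample size per group.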
