R Is Hot
How Did a Statistical Programming Language Invented in New Zealand Become a Global Sensation?
By David Smith
The growing popularity of R suggests it is headed toward mainstream acceptance by the analytic community. It has even been praised by established media outlets, which is remarkable for a statistical programming language.
Why is an esoteric programming language suddenly all the rage? There are underlying economic and social factors, such as the fast pace at which data is being generated. The perceived value of data is also greater, which has led to new methods for analyzing complex datasets. The current analytic solutions are cumbersome and costly, which has prompted the creation of new, less expensive number-crunching techniques, many of which are written in R.
Norman Nie, a nationally recognized scholar in survey research, quantitative social science, and political behavior, as well as a co-founder of SPSS, calls R “the most powerful and flexible statistical programming language in the world.”
Since its release in 1996, R has dramatically changed research software. There are few things that SAS or SPSS can do that R cannot, and R can do things that those applications cannot. Because R is freely available, it is worth investigating.
More Than a Programming Language
R is a full-fledged programming language with a radically different approach to processing large and complex datasets. It is an open-source project that depends on a worldwide development community to grow and evolve. Like Linux, R is maintained and supported by those individuals who use it and contribute to its ongoing development. Mike King, a quantitative analyst at Bank of America, for example, uses R to write programs for capital adequacy modeling, decision systems design, and predictive analytics.
Critical Mass and Going Viral
R was created in 1993 by Ross Ihaka and Robert Gentleman of the University of Auckland, New Zealand. It was named R simply because its creators both have first names beginning with the letter “R.” Some believe that its name is an homage to the S language from which R is derived.
Ihaka and Gentleman intended to create a language that would enable them to more easily teach their introductory data-analysis courses. News of the new language spread quickly and they were convinced to make its source code available under the GNU General Public License. Their decision to share R freely was a seminal moment in analytic software development.
As interest in R increased, a core group of leading statisticians and computer scientists from around the world became the project’s official leadership team, which regularly oversees changes and implementations of new R features and provides support for R users.
The R user community is so large that it generates new R packages at an astonishing pace, akin to a gigantic, self-organizing virtual factory. Glenn Meyers, a vice president of research at ISO Innovative Analytics and a well-known expert in casualty actuarial science, writes about new techniques for analyzing data. He often includes the code in his column. Most of his code is written in R, which has become the lingua franca of statistical analysis. Meyers says that once R code is released, it is immediately usable.
Commercial software vendors will rarely develop new programs unless there’s a sufficiently large market to justify their development costs, and even then the process can take years. By contrast, the R community develops and releases new software continuously.
R also offers benefits to companies trying to reduce their software expenditures with enterprise software vendors such as SAS and SPSS.
Power from Elegance
The rock star of the R movement is Hadley Wickham, an assistant professor at Rice University who has written and contributed to more than 20 R packages. Most of his research focuses on making data analysis better, faster, and easier; he is also interested in data visualization techniques. He wants R to be easy to use because it was specifically designed to deal with common data problems.
Because R was created by statisticians for statisticians, it’s loaded with features required for everyday statistical analysis. Its design is frequently described as “elegant” because it is in tune with how statisticians think.
No Need to Reinvent the Wheel
Precisely because R is a programming language -- as opposed to a piece of software -- new analytic techniques written in R can be saved and reused. When R users discover something novel, they have two options not generally available to users of conventional software: they can share the new techniques with other R users or they can reproduce and reuse the new techniques they have discovered.
These options represent enormous potential value. The ability to save and reuse improvised functions means that you’re not forced to reinvent the wheel each time an analytic operation is run. New R code can be shared through repositories such as the Comprehensive R Archive Network (CRAN).
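As a rough illustration of saving and reusing an improvised function, here is a minimal sketch in base R. The function name robust_summary, its trim parameter, the choice of statistics, and the file name are all hypothetical, chosen only for this example; mtcars is a dataset that ships with R.

```r
# Hypothetical helper: a summary that a user might improvise once
# and then want to reuse. The name and statistics are assumptions.
robust_summary <- function(x, trim = 0.1) {
  x <- x[!is.na(x)]                       # drop missing values
  c(n = length(x),
    trimmed_mean = mean(x, trim = trim),  # mean after trimming each tail
    median = median(x),
    mad = mad(x))                         # median absolute deviation
}

# Save the function definition to a plain-text script so it can be
# reused in a later session or handed to a colleague.
script_path <- file.path(tempdir(), "robust_summary.R")
dump("robust_summary", file = script_path)

# Simulate a fresh session: remove the function, then load it back.
rm(robust_summary)
source(script_path)

# Reuse it on a built-in dataset.
result <- robust_summary(mtcars$mpg)
print(result)
```

Sharing more widely usually means bundling such functions into a package, which involves documentation and checks that this sketch does not cover.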
High-Quality Graphics, Made Easy
R is especially useful for quickly and easily generating charts and graphics. This is important because it enables users to see patterns and anomalies within the data.
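To give a sense of how little code a basic chart can take, the following sketch uses only base R graphics and the built-in mtcars dataset. The output file name, axis labels, and the choice of a fitted trend line are assumptions made for illustration, not a recommended workflow.

```r
# Write the chart to a temporary PDF so the sketch also runs
# on machines without a display.
chart_path <- file.path(tempdir(), "mpg_vs_weight.pdf")
pdf(chart_path, width = 6, height = 4)

# Scatter plot of fuel economy against vehicle weight.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight (mtcars)")

# Overlay a least-squares line to make the overall trend easier to see.
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red")

dev.off()  # close the device so the file is fully written
```

In an interactive session the pdf() and dev.off() calls can be dropped, and the plot appears in a window instead.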
The New York Times has been a leader in the use of charts and graphics designed to help readers understand complicated stories. Amanda Cox, a Times graphics editor, says R is particularly valuable at deadline when data is scant and time is precious.
Peter Aldhous, the San Francisco bureau chief of New Scientist magazine, has used R to generate information used by graphic designers to create charts for his articles, as well as to draw insights quickly from data for his articles. He says graphical analysis is helpful in understanding data.
R allows those who are not professional analysts to create high-quality charts and graphs. If the user experiences challenges while working with R, they are able to tap the R community’s expertise.
Building a Business
The value of R to business is illustrated by John Lucker and his team in the Advanced Analytics and Modeling practice at Deloitte Consulting LLP. When it was initially launched, the group’s prime focus was solving vexing business problems for insurance industry clients. A lack of robust analytic processes for supporting critical underwriting decisions was particularly challenging. Although the underwriting process was rigorous, Lucker says it was also subjective and based on intuition.
R allowed them to effectively communicate their analytic results in ways that were easily understood by non-technical audiences. The success of Deloitte Consulting’s Advanced Analytics and Modeling practice became a blueprint for the business’s expansion into new markets. R played an important role in growing the practice, allowing it to address clients’ specific needs.
Changing, Transforming, and Evolving
R is a truly global phenomenon. Unlike traditional, commercial data-analysis software, R is both flexible and extensible. It is also constantly changing to meet the evolving needs of the global economy.
R’s popularity is no fad. It was designed from the ground up for handling real-world, complex data sets. R-based programs are routinely used for solving real-time problems in various fields.
Acceptance of R as a statistical lingua franca is based on its ability to transform and evolve. As new statistical analysis techniques are discovered, they now emerge as R packages first, well before they are incorporated in conventional software.
R has become both ubiquitous and indispensable. The R community supports its development, innovation, and continuous improvement, and also contributes novel ideas to the field of quantitative analytics. Although the future of quantitative analysis is uncertain, it is clear that a good deal of it will be written in R.
About David Smith
David Smith is vice president of marketing at Revolution Analytics and co-author of An Introduction to R. He was an originating developer of ESS: Emacs Speaks Statistics.
About Revolution Analytics
Revolution Analytics delivers advanced analytics software used by leading organizations for their data analysis, development, and mission-critical production needs. The company is committed to fostering the growth of the R community and providing commercial applications for every type of user and budget.