Revolution Webinar: Scalable Data Analysis in R
Presented: Wednesday, October 26th, 2011
Presenter: Lee Edlefsen, Ph.D., Chief Scientist, Revolution Analytics
Click here to replay the webinar and download the presentation.
For the past several decades the rising tide of technology -- especially the increasing speed of single processors -- has allowed the same data analysis code to run faster and on bigger data sets. That happy era is ending. The size of data sets is increasing much more rapidly than the speed of single cores, of I/O, and of RAM. To deal with this, we need software that can use multiple cores, multiple hard drives, and multiple computers. That is, we need scalable data analysis software.
R is the ideal platform for scalable data analysis software. It is easy to add new functionality in the R environment, and easy to integrate it into existing functionality. R is also powerful, flexible and forgiving.
In this webinar, Dr. Edlefsen will discuss the approach to scalability Revolution Analytics has taken with its RevoScaleR package. He will discuss this approach from the point of view of:
- Storing data on disk
- Importing data from other sources
- Reading and writing chunks of data
- Handling data in memory
- Using multiple cores on single computers
- Using multiple computers
- Automatically parallelizing "external memory" algorithms
Presenter Bio
Prior to joining Revolution Analytics, Lee Edlefsen was CEO and co-founder of ExaMetrix, the producer of ExaStat, an open-source environment for analyzing huge data sets. Previously, he served as Vice President of Development for the Data Analysis Products Division at MathSoft (now TIBCO Software). Edlefsen co-founded, grew, and sold two successful software companies: Aptech Systems, the producer of the GAUSS Mathematical and Statistical System, and TriMetrix, the creator of the Axum technical graphics and data analysis package. He holds a Ph.D. from Harvard University.
