The Rise of Data Science in the Age of Big Data Analytics: Why data distillation and machine learning aren't enough

Presented: Thursday, November 1, 2012
Presenter: David Smith, VP Marketing & Community

Big Data matters because we want to use it to make sense of our world. It’s tempting to think there’s some “magic bullet” for analyzing Big Data, but simple “data distillation” often isn’t enough, and machine-learning systems left to run unsupervised can be dangerous. (Like, bringing-down-the-entire-financial-system dangerous.) Data Science is the key to unlocking insight from Big Data: by combining computer science skills with statistical analysis and a deep understanding of the data and the problem, we can not only make better predictions, but also fill in gaps in our knowledge and even find answers to questions we hadn’t thought of yet.

In this talk, David will:

  • Introduce the concept of Data Science and give examples of where Data Science succeeds with Big Data … and where automated systems have failed.
  • Describe the Data Scientists’ Toolkit: the systems and technology components Data Scientists need to explore and analyze Big Data and to create data apps from it.
  • Share some thoughts about the future of Big Data Analytics and the diverging use cases for computing grids, data appliances, and Hadoop clusters.
  • Discuss the skills needed to succeed as a Data Scientist.
  • Talk about the technology stack that a Data Scientist needs to be effective with Big Data, and describe emerging trends in the use of various data platforms for analytics: specifically, Hadoop for data storage and data “refinement”; data appliances for performance and production; and computing grids for data exploration and model development.

About the Speaker

David Smith

David Smith is the Vice President of Marketing and Community at Revolution Analytics, the leading provider of software and services for the open-source R statistical language. David writes daily about applications of R, analytics and open-source software at the Revolutions blog (blog.revolutionanalytics.com), and was named a top 10 influencer on the topic of “Big Data” by Forbes. He is the co-author (with Bill Venables) of the tutorial manual An Introduction to R, and one of the originating developers of the ESS (Emacs Speaks Statistics) project. Prior to joining Revolution Analytics, David was the director of product management for S-PLUS at Insightful, Inc. Follow David on Twitter as @revodavid.