Competition Brings Out the Best in Data Analytics: R Provides a Winning Edge for Competitive Data Scientists

Competition brings out the best in people. We all seem naturally attracted to contests.

A small Australian firm named Kaggle has taken that human proclivity and put it to work by managing competitions among the world's best data scientists. Corporations, governments, and research laboratories describe their problems to Kaggle and provide datasets. From this information, Kaggle creates contests, and the entrant with the best solution can win a cash prize. Prizes have ranged from US$100 to US$3 million. Clients range from small startups to multinational corporations, such as Ford Motor Company, and large government agencies.

Kaggle is essentially crowdsourcing analytic problems. It is a win-win scenario: contestants get access to real-world, anonymized data; sponsors benefit from the contestants' creativity.

Many Kaggle contestants use programs or packages written in R, an open-source programming language designed specifically for data analysis, which has become the lingua franca of statistical analysts. R lets analysts visualize and model data very rapidly, which has made it a favorite tool for exploring large, complex datasets.

R is also well suited to competitions such as those Kaggle manages, because they focus on prototyping and modeling rather than on deploying solutions into production.

Jeremy Howard, a highly successful Kaggle contestant, says that R is an important tool for data mining, particularly because it lets users try different approaches or techniques and quickly determine whether they work.