Northern Trust Bank Speeds Operational Risk Models with Revolution R Enterprise
Calculating operational risk is a relatively new discipline. International agreements and federal statute provide guidelines, but the guidance on how to identify and measure datasets is intentionally vague, which leaves a great deal of the decision making to analysts. Most risk analysts have settled on a Loss Distribution Approach (LDA) that incorporates four data elements: Internal Loss Data, External Loss Data, Scenario Analysis Data, and Business Environment/Internal Control Factor data. Using these elements, analysts must determine whether a given line of business is distinct from the others and requires its own risk exposure estimate for use in the LDA. The biggest challenges for analysts here are the relative youth of operational risk practices and the relative paucity of data available for measurement.
Once the heterogeneous datasets, now called "units of measure," have been identified, a Poisson distribution is used to model the frequency of operational loss events. Understanding the severity of loss events is a much more difficult task, and there are several ways to do it. The method used is dictated by the nature of the dataset being fitted. This is where the youth of the practice is especially troublesome. The rules require that organizations estimate a 1-in-1,000-year event based on fewer than 15 years of operational loss data. In many cases organizations have units of measure with only a small number of observations, which can lead to unidentified heterogeneity and/or heavily skewed loss distributions.
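The two fitting steps described above can be sketched as follows. This is a minimal illustration in Python rather than R, not Northern Trust's method. The lognormal severity model is an assumption chosen for simplicity (the text notes that the severity method depends on the dataset), and the function names and sample data are hypothetical.

```python
import math
import statistics


def fit_poisson_rate(annual_counts):
    """Maximum-likelihood Poisson rate: the mean number of loss events per year."""
    return statistics.fmean(annual_counts)


def fit_lognormal(losses):
    """Maximum-likelihood lognormal parameters (mu, sigma) from positive loss amounts.

    Assumption: severities are lognormal. Real units of measure may need a
    heavier-tailed or spliced distribution.
    """
    logs = [math.log(x) for x in losses]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)  # population std dev is the MLE
    return mu, sigma


# Hypothetical data for one unit of measure: 8 years of event counts
# and a handful of individual loss amounts.
annual_counts = [3, 5, 2, 4, 6, 3, 4, 5]
loss_amounts = [12_000.0, 8_500.0, 150_000.0, 4_200.0, 31_000.0, 9_900.0]

lam = fit_poisson_rate(annual_counts)
mu, sigma = fit_lognormal(loss_amounts)
print(f"frequency rate = {lam:.2f} events/year, severity mu = {mu:.2f}, sigma = {sigma:.2f}")
```

With so few observations, the fitted sigma is very sensitive to a single large loss, which is the small-sample problem the paragraph above describes.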
With loss frequency modeling and loss severity fitting complete for each unit of measure, the next step is to draw a set of random frequency observations and severities for use in Monte Carlo simulations. Each simulation provides a single point on the aggregate loss distribution.
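A minimal Python sketch of this simulation step is shown below. It assumes a Poisson frequency and a lognormal severity with hypothetical parameter values, and it reads the 1-in-1,000-year event as the 99.9th percentile of simulated annual aggregate losses. All function names are illustrative.

```python
import math
import random


def draw_poisson(rng, lam):
    """Draw a Poisson count with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_annual_loss(rng, lam, mu, sigma):
    """One simulated year: draw an event count, then sum that many severities."""
    n_events = draw_poisson(rng, lam)
    return sum(rng.lognormvariate(mu, sigma) for _ in range(n_events))


def simulate_aggregate_losses(n_sims, lam, mu, sigma, seed=0):
    """Each simulated year contributes one point to the aggregate loss distribution."""
    rng = random.Random(seed)
    return [simulate_annual_loss(rng, lam, mu, sigma) for _ in range(n_sims)]


def empirical_quantile(samples, q):
    """Simple empirical quantile: the order statistic at position floor(q * n)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


# Hypothetical parameters; a production run would use millions of simulations.
losses = simulate_aggregate_losses(20_000, lam=4.0, mu=9.0, sigma=1.5)
var_999 = empirical_quantile(losses, 0.999)
print(f"simulated years: {len(losses)}, 99.9th percentile annual loss: {var_999:,.0f}")
```

Only about 0.1% of simulated years fall beyond the 99.9th percentile, so the estimate is stable only when the number of simulations is very large.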
Figure 1: Each Loss Dataset requires millions of Monte Carlo simulations to be run.
Many simulations, each containing millions of iterations, must be run to observe enough losses to reasonably assess what a 1-in-1,000-year event might look like. According to David Humke, "Doing a simulation on this type of data really could take days if you’re just using base R."
Revolution Analytics and Northern Trust Come Together
"Northern Trust went to Revolution Analytics and asked if we could explore the opportunity to parallelize our Monte Carlo simulations." David Humke
In addition to the time lost waiting for results, there is the management headache of trying to run parallelized simulations across different hardware with different operating systems. This can be a considerable resource drain on an analyst group.
Knowing there had to be a better way to parallelize their Monte Carlo simulations, Northern Trust and Revolution Analytics set up a series of tests to benchmark the performance of the doRSR and doSMP parallelization packages across different hardware configurations. They put together an environment that included both 32-bit and 64-bit operating systems. They also compared a single node with multiple processors against multiple nodes with multiple processors. The test machines included a laptop with 4 cores, a server with 8 cores, and a 3-node high-performance cluster on Amazon with 8 cores per node.
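The general idea behind this kind of parallelization is to split the simulations into independent chunks, run each chunk on its own core, and merge the results. The sketch below illustrates that pattern with Python's standard `concurrent.futures` module. It is not the doRSR or doSMP API, and the function names, parameters, and seeding scheme are assumptions made for illustration.

```python
import concurrent.futures as cf
import math
import random


def simulate_chunk(task):
    """Simulate one chunk of annual aggregate losses with its own seeded RNG."""
    seed, n_sims, lam, mu, sigma = task
    rng = random.Random(seed)
    threshold = math.exp(-lam)
    totals = []
    for _ in range(n_sims):
        # Knuth's Poisson sampler for the number of loss events in the year.
        n_events, product = 0, rng.random()
        while product > threshold:
            n_events += 1
            product *= rng.random()
        # Sum lognormal severities (an assumed severity model).
        totals.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_events)))
    return totals


def run_parallel(n_sims, n_workers, lam, mu, sigma,
                 executor_cls=cf.ProcessPoolExecutor, base_seed=0):
    """Split n_sims across workers, run the chunks concurrently, merge the results."""
    base, extra = divmod(n_sims, n_workers)
    # Each worker gets a distinct seed; this is a simple scheme for a sketch.
    tasks = [(base_seed + i, base + (1 if i < extra else 0), lam, mu, sigma)
             for i in range(n_workers)]
    with executor_cls(max_workers=n_workers) as pool:
        chunks = list(pool.map(simulate_chunk, tasks))
    return [loss for chunk in chunks for loss in chunk]


if __name__ == "__main__":
    # Hypothetical parameters; one worker per core on a 4-core machine.
    losses = run_parallel(40_000, n_workers=4, lam=4.0, mu=9.0, sigma=1.5)
    print(f"simulated years: {len(losses)}")
```

Because each chunk depends only on its own seed and parameters, the same task list could in principle be sent to workers on other machines, which is the kind of scaling the benchmark set out to test.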
The metrics they used to evaluate the software and hardware in performing the simulations were: 1) Elapsed Time by Step and 2) Memory Usage.
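One way to collect these two metrics for a single step is sketched below in Python, using only the standard library. The helper name `measure_step` is hypothetical, and this is not the instrumentation used in the benchmark. Note that `tracemalloc` tracks only memory allocated by Python in the current process, so it does not account for worker processes.

```python
import time
import tracemalloc


def measure_step(label, func, *args, **kwargs):
    """Run one step and record its elapsed time and peak traced memory."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    metrics = {
        "step": label,
        "elapsed_s": elapsed,
        "peak_mib": peak_bytes / 2**20,
    }
    return result, metrics


# Hypothetical step: build a large list as a stand-in for a simulation stage.
values, metrics = measure_step("build list", lambda: [i * 0.5 for i in range(200_000)])
print(f"{metrics['step']}: {metrics['elapsed_s']:.3f} s, peak {metrics['peak_mib']:.1f} MiB")
```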
Improved Performance and Easily Scalable
Revolution Analytics’ parallelization can be scaled up easily from a laptop or server to a cluster using Revolution Analytics’ distributed computing capabilities. As expected, parallelization greatly improved simulation performance. Performance improved as the number of cores increased and scaled readily with the resources available within the cluster.
The graphs below demonstrate the group’s findings.
Figure 2: Revolution Analytics showed significant improvements in processing time.
Save Time and Resources
"Overall, the take away that we found from this work with Revolution Analytics is that they do have a good product offering for parallelization." David Humke
Parallelization allows analysts to spend less time waiting for results and more time analyzing them. For risk analysts working with products and services that are active in the marketplace and that must meet regulatory compliance requirements accurately and on time, time to result is a critical factor in success. Effective parallelization routines are just as important for resource management, ensuring that a group takes advantage of all the computing resources available. The benchmark testing with Northern Trust demonstrated that Revolution Analytics’ parallelization packages, doRSR and doSMP, are much more efficient at managing a diverse hardware environment than attempting to do so manually, and that the packages scale effectively to use all computing resources within that environment.
About Revolution Analytics
Revolution Analytics is the leading commercial provider of software and services based on the open source R project for statistical computing. The company brings high performance, productivity and enterprise readiness to R, the most powerful statistics language in the world. The company’s flagship Revolution R Enterprise product is designed to meet the production needs of large organizations in industries such as finance, life sciences, retail, manufacturing and media. Used by over two million analysts in academia and at cutting-edge companies such as Google, Bank of America and Acxiom, R has emerged as the standard of innovation in statistical analysis. Revolution Analytics is committed to fostering the continued growth of the R community through sponsorship of the Inside-R.org community site, funding worldwide R user groups and offering free licenses of Revolution R Enterprise to everyone in academia.