High-Performance Monte Carlo Simulation on Multiprocessor Computers using ParallelR™
ABSTRACT – ParallelR for Monte Carlo Portfolio Analysis
We examined the use of ParallelR on a parallel computing problem common in the financial industry: portfolio analysis. We ran a Monte Carlo simulation in parallel to estimate the risk and reward of thousands of different investment portfolios. Our results show that performance scales well as the number of CPU cores increases; with eight workers, for example, the analysis ran about seven times faster than the sequential version. Such software can be developed and run by a typical R user who is not an expert in parallel programming, an important feature because parallel programming is both difficult and time-consuming.
Introduction – Monte Carlo Analysis with ParallelR
Organizations must now process and analyze enormous amounts of data, often in real time, to remain competitive. However, many of the problems they face are too complex to be solved analytically. This has led to the use of large-scale statistical Monte Carlo simulations.
To keep pace with users’ processing needs, computing technology has also moved from single CPUs to multicore CPU chips and to clusters of nodes built from these chips. The availability of these commodity multiprocessors means that Monte Carlo simulations, which are “embarrassingly parallel,” can be implemented easily and affordably using high-level software tools and scripting languages in lieu of conventional compiled languages such as Fortran or C.
Some users have selected the R statistical computing environment for these types of tasks. ParallelR is designed specifically for parallelizing applications written in R, and it lets a user write such programs without formal training in parallel programming. The tool’s Sleigh function allows a developer to work within the familiar R environment without having to master low-level tools such as MPI.
To illustrate how simply ParallelR can be implemented, we show a Monte Carlo application created for financial services. Our results show that the system provides excellent performance and scales very well.
Example – Markowitz Portfolio Optimization with ParallelR
High-performance computing systems are commonly used within the financial sector for tasks such as credit risk assessment, portfolio optimization, and credit-card fraud detection. Our example is a simple, prototype portfolio optimization problem based on a concept originally suggested by Markowitz.
With this example, we intend to illustrate how multiprocessors can be used to accelerate general portfolio optimization. Our benchmark data shows how the performance of our parallel portfolio optimizer scales with the number of available cores.
For our example, we've created a portfolio that consists of a fixed number of investments, the prices of which are characterized by randomly selected normal distributions. We chose to use random normal distribution functions to represent investments because we are focused on the parallel computing aspects of this problem. Using a Monte Carlo approach, we evaluate the reward/risk of a portfolio chosen from these investments with only “long positions” allowed.
Each investment is assigned a weight equal to the fraction of the total portfolio value allocated to that investment. We generate a large number of random long-position portfolios and compute the reward and risk of each by Monte Carlo simulation: we randomly sample each investment in the portfolio a large number of times and compute the portfolio’s mean (the reward) and risk. When reward is plotted against risk, the upper boundary of the plotted points forms what Markowitz called “the efficient frontier.” The efficient frontier approach has been used to analyze reward/risk within the petroleum industry for many years.
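The procedure described above can be sketched in a few lines. The following is a minimal illustration in plain Python rather than the R code used for the paper, which is not reproduced here. All function names, the parameter ranges for the normal distributions, the use of standard deviation as the risk measure, and the independent sampling of assets are assumptions made for the sketch. Normalizing uniform random numbers is only one simple way to obtain long-only weights.

```python
import random
import statistics


def random_long_only_weights(n_assets, rng):
    """Draw non-negative weights that sum to 1 (long positions only)."""
    raw = [rng.random() for _ in range(n_assets)]
    total = sum(raw)
    return [x / total for x in raw]


def simulate_portfolio(weights, means, sds, n_samples, rng):
    """Monte Carlo estimate of (reward, risk) for one portfolio.

    Each asset is drawn independently from its own normal distribution
    (an assumption of this sketch). Reward is the mean of the sampled
    portfolio values; risk is their sample standard deviation.
    """
    values = [
        sum(w * rng.gauss(mu, sd) for w, mu, sd in zip(weights, means, sds))
        for _ in range(n_samples)
    ]
    return statistics.fmean(values), statistics.stdev(values)


def efficient_frontier(points):
    """Scan (risk, reward) points by increasing risk, keeping each point
    that sets a new best reward. The result approximates the upper
    boundary of the reward-versus-risk scatter."""
    frontier = []
    best_reward = float("-inf")
    for risk, reward in sorted(points):
        if reward > best_reward:
            frontier.append((risk, reward))
            best_reward = reward
    return frontier


# Small demonstration with hypothetical asset parameters.
rng = random.Random(42)
n_assets = 5
means = [rng.uniform(0.0, 0.1) for _ in range(n_assets)]
sds = [rng.uniform(0.05, 0.3) for _ in range(n_assets)]

points = []
for _ in range(100):  # outer loop over random portfolios
    weights = random_long_only_weights(n_assets, rng)
    reward, risk = simulate_portfolio(weights, means, sds, 200, rng)
    points.append((risk, reward))

frontier = efficient_frontier(points)
print(f"{len(frontier)} of {len(points)} portfolios lie on the frontier")
```

The outer loop over portfolios is the part that lends itself to parallel execution, since each iteration is independent of the others.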
Discussion – Using ParallelR and the Sleigh function
To write the parallel program, we used R together with the Sleigh function in ParallelR. The portion of the code made to run in parallel was the outer loop that constructs the random portfolios. This loop creates a large number of independent tasks, one per portfolio, and in our code each task is evaluated with a Monte Carlo simulation. The program was run on a six-node cluster in which each node contained two dual-core AMD Opteron™ CPU chips, for a total of 24 cores available for computation.
Portfolio optimization problems are typically computationally intensive. They may also need to be solved in real time so that the results take current market variables into account.
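The idea of handing the outer portfolio loop to a pool of workers can be illustrated as follows. This sketch does not use ParallelR or Sleigh, whose calls are not shown in this summary; it substitutes Python's standard concurrent.futures module as an analogous worker pool. The function names, asset parameters, and sample counts are hypothetical. Seeding each task separately is a design choice of the sketch that makes results independent of which worker runs the task.

```python
import concurrent.futures as cf
import random
import statistics

N_ASSETS = 25
N_SAMPLES = 500  # kept small here; the benchmark in the text uses 10,000

# Hypothetical asset universe shared by every portfolio.
_asset_rng = random.Random(0)
MEANS = [_asset_rng.uniform(0.0, 0.1) for _ in range(N_ASSETS)]
SDS = [_asset_rng.uniform(0.05, 0.3) for _ in range(N_ASSETS)]


def evaluate_portfolio(seed):
    """One independent task: build and score a random long-only portfolio."""
    rng = random.Random(seed)  # per-task generator, so tasks share no state
    raw = [rng.random() for _ in range(N_ASSETS)]
    total = sum(raw)
    weights = [x / total for x in raw]
    values = [
        sum(w * rng.gauss(mu, sd) for w, mu, sd in zip(weights, MEANS, SDS))
        for _ in range(N_SAMPLES)
    ]
    return statistics.fmean(values), statistics.stdev(values)


def evaluate_all(seeds, executor=None):
    """Score one portfolio per seed, sequentially or on the given executor."""
    if executor is None:
        return [evaluate_portfolio(seed) for seed in seeds]
    # Executor.map returns results in the same order as the input seeds.
    return list(executor.map(evaluate_portfolio, seeds))


def main(n_portfolios=64, workers=4):
    """Run the outer loop on a pool of worker processes."""
    with cf.ProcessPoolExecutor(max_workers=workers) as pool:
        results = evaluate_all(range(n_portfolios), pool)
    print(f"evaluated {len(results)} portfolios on {workers} workers")
```

Calling main() from a script, under the usual if __name__ == "__main__": guard, distributes the tasks across worker processes. Because each task depends only on its seed, the parallel results match those of a sequential run.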
Benchmark – Time Required for Large Scale Portfolio Analysis with R
For our benchmark, we examined 3,600 portfolios, each of which contained 25 stocks. For each portfolio, we randomly sampled 10,000 points from each stock’s distribution function and used the samples to calculate the portfolio’s reward and risk.
The time required to analyze these portfolios using varying numbers of Sleigh workers is shown in Figure 1 in the PDF. Figure 2 shows the speedup achieved with varying numbers of Sleigh workers relative to analyzing the same portfolios sequentially. With eight Sleigh workers, for example, the analysis ran seven times faster. We used Gigabit Ethernet as the interconnect for this analysis; larger datasets may require a higher-performance interconnect for optimal performance.
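To put the eight-worker figure in context, speedup can be converted into parallel efficiency, that is, speedup divided by the number of workers. The short Python sketch below applies this standard definition to the one data point quoted above. The function names are illustrative, and the timings behind Figures 1 and 2 are not reproduced here.

```python
def speedup(t_sequential, t_parallel):
    """Speedup: sequential run time divided by parallel run time."""
    return t_sequential / t_parallel


def efficiency(speedup_factor, workers):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup_factor / workers


# The text reports a speedup of 7 with 8 Sleigh workers.
print(efficiency(7, 8))  # 0.875
```

An efficiency of 0.875 means the eight workers delivered 87.5% of the ideal eightfold speedup.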
Conclusion
Multiprocessors are well suited to complex problems because they can be scaled up to the natural limits of the problem size. ParallelR allows R users to easily create and run applications that use multiprocessor hardware efficiently, so that Monte Carlo simulations can be executed quickly and affordably.
