Monte Carlo Simulation, explained based on Buffons Needle Problem

Monte Carlo simulation is a stochastic method, in which a large number of random experiments is performed. This is helpful, especially if there is no analytical solution to a problem. I will present “Buffon’s needle” problem. The idea is to throw a needle on a grid with horizontal lines. The probability of a needle intersecting a horizontal line can be used to estimate pi.

Objectives: Monte Carlo Simulation
Requirements: ggplot() for visualising the result, R Basics
Level: Advanced

Monte Carlo Simulation

Monte Carlo simulation is used

as an alternative to an analytical problem, if this is possible
to analyse distribution of random variables of an unknown distribution type
to reproduce complex processes, which are impossible to analyse directly

Enrico Fermi presented the idea of Monte Carlo simulation in 1930s. Later, Stanislaw Ulam and John von Neumann applied it in 1946. Why is it called “Monte Carlo”? John von Neumann had to assign a name for a secret project, in which he used this method. He assigned the name, referring to the casino in Monte Carlo.

Buffon’s Needle

A defined number of identical needles is required. These are thrown on a flat ground. On the ground horizontal threads are strained. Their distance equals needle length. Each intersection of needles is counted. The number of intersecting needles can then be used to estimate pi. George-Louis Leclerc de Buffon presented this idea in 1733 to Paris scientific academy.

We will use a numerical approach to throw our needles.

Throwing the needles

We set the length of a needle to 2 as well as the distance of lines. The number of needles can be modified, we will use 17 needles.

How can we create random needles with a length of two? A needle is defined by two points (x1, y1) and (x2, y2). We start by randomly defining the first point with runif(). In fact, we fix randomness by setting seed with set.seed(), so you will get exactly the same results as presented here.

For both coordinates random numbers between 0 and 6 are created. Now we need some trigonometric functions to calculate coordinates of the second point.

library(ggplot2)
set.seed(61)

length <- 2
distance <- 2
n_needles <- 17
needles <- data.frame(x1 = runif(n = n_needles, min = 0, max = 6), 
              y1 = runif(n = n_needles, min = 0, max = 6),
              angle = runif(n = n_needles, min = -180, max = 180))
needles$x2 <- sin(needles$angle * pi / 180) * length + needles$x1
needles$y2 <- cos(needles$angle * pi / 180) * length + needles$y1

Let’s see how it works. We will plot results with ggplot(). It requires some data reshaping to draw individual needles. In a for() loop for each needle a new data frame is create and plotted as a line. Additionally, horizontal lines are plotted with geom_hline(). coord_cartesian() limits shown area. scale_x_continuous() is used manually set axis labelling.

g <- ggplot()
for (i in 1: n_needles) {
    g <- g + geom_line(data = data.frame(x = c(needles$x1[i], needles$x2[i]),
                         y = c(needles$y1[i], needles$y2[i])),
               aes (x, y), size = 1, color = "blue")
}
g <- g + geom_hline(yintercept = 0, color = "red")
g <- g + geom_hline(yintercept = 2, color = "red")
g <- g + geom_hline(yintercept = 4, color = "red")
g <- g + geom_hline(yintercept = 6, color = "red")
g <- g + coord_cartesian(xlim = c(-1, 7), ylim = c(-1, 7))
g <- g + scale_x_continuous(breaks = seq(-1, 7, 1))
g <- g + scale_y_continuous(breaks = seq(-1, 7, 1))
g

I count 11 intersections. This can be calculated. First, a sequence for horizontal lines is defined. For each element in “needles” intersection is checked. Y coordinates of needles are extracted, sorted, and rounded. Result is stored in “needle_y_coord”. Now, variable “intersect” in dataframe “needles” is defined either True (there is an intersection) or False (no intersection). For this we check if any of the horizontal lines is in between needle y-coordinates.

table() delivers the number of intersections or no intersections. Pi estimation is done according to presented formula.

# order vector
seq_hor_line <- seq(0, 6, length)
needles$intersect <- NA
for (i in 1:n_needles) {
    needle_y_coord <- round(sort(c(needles$y1[i], needles$y2[i])), 2)
    seq_needle <- round(seq(needle_y_coord[1], needle_y_coord[2], .01), 2)
    needles$intersect[i] <- any(seq_hor_line %in% seq_needle)
}

intersect_no <- table(needles$intersect)[1]
intersect_yes <- table(needles$intersect)[2]
pi_estimation <- 2 * n_needles / intersect_yes * length / distance
pi_estimation

##     TRUE 
## 3.090909

Ok, for a first test, quite close to the original. You can play around with parameters to find a better estimation. If you increase number of needles, law of large numbers provides a very good pi estimate.

More Information

Monte Carlo Method https://en.wikipedia.org/wiki/Monte_Carlo_method
Buffon’s Needle Problem https://en.wikipedia.org/wiki/Buffon%27s_needle