Moment by moment – Variance

We are talking about moments, which are measurements that describe some aspect of the shape of a given set of points or values.

Last week we talked about the mean, which is known as the first moment.

Let's now carry on to the second moment, which is known as the variance. We will continue to focus on the normal distribution.

Variance

The mean shows us the central point for a distribution of values. The variance of the distribution gives us a measure of how far the values are spread out or dispersed, around the mean.

Calculating

Here is the formula to calculate the variance for a sample of values:

S^{2} \;= \; \frac{ \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}} {n-1}

One way to think about variance is as an “average difference” for the values around the mean, which we could calculate like this:

  • First we subtract the mean from each of the values, which gives us a difference from the mean for each value.
  • Then we sum all these differences.
  • Finally we divide by n-1 to get this “average difference”.

Differences can be negative

However there is a problem doing this. When we subtract the mean from a value, if the value is less than the mean – we get a negative number for the difference.

Now, when we sum the differences, all the negative ones will just be cancelling out the other positive differences.

To get around this, after we calculate each difference, we square (raise to power 2) the difference (just multiply it by itself). Now if the difference is negative, after we square it we get a positive value … as a negative times a negative gives a positive number. So now, when we do the sum, none of the differences are cancelling each other out.
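Here is a minimal sketch in R of the steps described above. The values in x are made up purely for illustration, and the variable names are my own. It shows the raw differences cancelling out, and then the squared version agreeing with R's built-in var() function:

```r
# A small made-up sample (hypothetical values, for illustration only)
x <- c(2, 4, 4, 5, 7, 8)
n <- length(x)

# Step 1: difference of each value from the mean
diffs <- x - mean(x)

# The raw differences cancel out: their sum is (numerically) zero
sum.of.diffs <- sum(diffs)

# Step 2: square each difference so that none of them are negative
squared.diffs <- diffs^2

# Step 3: sum the squared differences and divide by n - 1
variance.by.hand <- sum(squared.diffs) / (n - 1)

print(sum.of.diffs)
print(variance.by.hand)
print(var(x))  # R's built-in sample variance, for comparison
```

The last two values printed should agree, as var() also divides by n-1.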

One other thing: why do we use n-1 and not n? This is discussed here. Essentially, the mean we calculate is for a sample, not for the entire population of all values. Because the sample mean is calculated from the very same values, those values sit a little closer to it, on average, than they do to the true mean of the population, so the sum of squared differences comes out a bit too small. To adjust for this we divide by n-1 rather than n, which makes a bigger adjustment the smaller the sample is.
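A quick way to get a feel for the n-1 adjustment is to simulate it. This is only a sketch: the seed, the sample size of 5 and the number of samples are arbitrary choices of mine. We draw lots of small samples from a normal distribution whose true variance we know (sd=2, so the variance is 4), and compare dividing by n with dividing by n-1:

```r
set.seed(42)  # arbitrary seed so the run is repeatable

sample.size <- 5
num.samples <- 20000

# Draw many small samples from a Normal distribution whose
# true variance is 2^2 = 4; each row of the matrix is one sample
samples <- matrix(rnorm(sample.size * num.samples, mean = 5, sd = 2),
                  nrow = num.samples)

# Sum of squared differences from each sample's own mean
sum.sq <- apply(samples, 1, function(s) sum((s - mean(s))^2))

# Average the two candidate estimates over all the samples
avg.divide.by.n <- mean(sum.sq / sample.size)
avg.divide.by.n.minus.1 <- mean(sum.sq / (sample.size - 1))

print(avg.divide.by.n)          # tends to land below the true 4
print(avg.divide.by.n.minus.1)  # tends to land close to 4
```

Dividing by n comes out too small on average, while dividing by n-1 lands close to the true variance.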

Another fly in the ointment

There is a problem with the variance. We had to square each difference before we took their sum, in order to get rid of negative differences, so the variance is measured in squared units rather than in the units of the original values. We can undo the squaring by taking the square root of the final calculated variance.

This square root gives a new value which is called the standard deviation, and here is its formula:

S \;= \; \sqrt{ \frac{ \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}} {n-1} }
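As a small check, here is a sketch in R that takes the square root of a variance calculated by hand and compares it with R's built-in sd() function. The values in x are made up for illustration:

```r
# Hypothetical sample, made up for illustration
x <- c(2, 4, 4, 5, 7, 8)

# Sample variance: sum of squared differences divided by n - 1
variance <- sum((x - mean(x))^2) / (length(x) - 1)

# The standard deviation is the square root of the variance
std.dev <- sqrt(variance)

print(std.dev)
print(sd(x))  # R's built-in standard deviation, for comparison
```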

R programming example

I am still very much a newbie to R, but here is some R programming language code I used to plot the histogram of some values, and overlay the standard deviation to the left and to the right of the mean:

Moment_by_Moment_Variance

# Generate a random set of 10000 points from the Normal distribution
values <- rnorm(10000, mean=5, sd=2)

# Plot the histogram of the points
hist(values, breaks=100, main="Mean with standard deviation to left and right", xlab="math4uandme.com")

the.mean = mean(values)
the.sd = sd(values)

# Overlay the mean
abline(v=the.mean, col="blue", lwd=3)

# Overlay the sd to the left of the mean
abline(v=(the.mean - the.sd), col="brown3", lwd=5)

# Overlay the sd to the right of the mean
abline(v=(the.mean + the.sd), col="brown3", lwd=5)

Links

Covers population and then sample variance …

Introduction to R (PDF document)

Moment by moment – Mean

A moment is a measurement that describes some aspect of the shape of a given set of points or values.

In statistics, sets of points can form recognizable shapes, called distributions. A very popular shape or distribution is called the normal distribution. It is also known as the bell curve, as it looks a bit like the shape of a bell.

Mean

The first moment is called the mean or average. This moment describes a value the points are centered about, and is sometimes described as a measurement of “central tendency”.

A histogram is a graphical representation of a distribution of points. We can plot the value, or a range of values against their frequency.

R programming example

Here is some R programming language code I used to plot the histogram of some points, and overlay their mean:

# Generate a random set of 1000 points from the Normal distribution
x <- rnorm(1000, mean=5, sd=2)

# Plot the histogram of the points
hist(x, main="Mean", xlab="math4uandme.com")

# Overlay the mean
abline(v=mean(x), col="blue", lwd=5)

Moment_by_Moment_Mean

Calculating

There are different ways to calculate the mean. We will calculate the arithmetic mean, which is also known as the “standard” average:

\bar{x} \; = \; \frac{1}{n}\sum_{i=1}^{n}\;x_{i}

… or to put it another way …

\bar{x} \; = \; \frac{\sum_{i=1}^{n}\;x_{i}}{n}

Either way, we

  • sum (\Sigma which just means add up)
  • all our points (x),
  • and divide by the number of points (n).
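The same recipe can be sketched in a couple of lines of R. The values in x are made up for illustration, and the result is compared against R's built-in mean() function:

```r
# Hypothetical values, made up for illustration
x <- c(3, 5, 5, 6, 11)

# Sum all the points, then divide by the number of points
mean.by.hand <- sum(x) / length(x)

print(mean.by.hand)
print(mean(x))  # R's built-in mean, for comparison
```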

It works something like this:

  • When we divide a number, we cut it into equally sized parts
  • We are dividing by the number of points in the sum
  • The most frequent values contribute the most to this sum, so they have the most influence on the size of the part that the division comes up with
  • So the result will be based around the most frequent values, which, as we plot the histogram in value order, will be the center of the values

Links

Introduction to R (PDF document)

The book no one can read

Code breaking, or cryptanalysis, dates back at least to the 9th century.

There have been a number of well-known successes, such as the breaking of the Enigma code in World War II, which had been thought to be unbreakable. The famous mathematician Alan Turing played a part in this success.

Code breaking is quite a consumer of mathematical talent. According to its website, the National Security Agency (NSA) is the largest employer of mathematicians in the United States.

A very mysterious document

The Voynich manuscript is a 15th century document named after Wilfrid Voynich who bought it in 1912.

voynich_manuscript

Despite attempts by numerous code breakers including William F. Friedman of the NSA, who spent 4 decades on it in his free time – the Voynich manuscript remains unread.

Links

Pareto’s principle or the 80/20 rule

Vilfredo Pareto was an Italian economist, who in 1906 noticed that twenty percent of the population owned eighty percent of the wealth.

This has become known as Pareto’s Principle or more descriptively as the 80/20 rule.

In this article from 2002, Microsoft noted that 80 percent of the errors and crashes in Windows and Office – were being caused by 20 percent of the bugs detected.

Pareto chart

A Pareto Chart is used to highlight the most important among a (typically large) set of factors, to help improve results, for example a business's service, quality or sales.

Here is an example that I produced in Excel using the YouTube video in the Links below:

Pareto_Chart

You basically count whatever you are studying and sort by frequency and show them as a bar chart.

You also overlay their cumulative percentage, so you can identify the 80% mark – which can make for a cut-off point. This allows you to then focus on just those factors causing the 80%.
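The counting, sorting and cumulative percentage steps can also be sketched in R. The complaint categories and counts below are entirely made up for illustration, and this is just one simple way to draw the chart with base R graphics:

```r
# Hypothetical complaint counts by category, made up for illustration
counts <- c(Late = 45, Damaged = 25, Wrong = 15, Rude = 8,
            Billing = 5, Other = 2)

# Sort by frequency, largest first
sorted <- sort(counts, decreasing = TRUE)

# Cumulative percentage of the total
cum.pct <- cumsum(sorted) / sum(sorted) * 100

# Categories up to and including the one that reaches the 80% mark
cutoff <- which(cum.pct >= 80)[1]
vital.few <- names(sorted)[1:cutoff]
print(round(cum.pct, 1))
print(vital.few)

# Bar chart of the counts, with the cumulative percentage overlaid
# (rescaled to the count axis) and a dashed line at the 80% mark
bar.mids <- barplot(sorted, ylim = c(0, sum(sorted)),
                    main = "Pareto chart", ylab = "Count")
lines(bar.mids, cum.pct / 100 * sum(sorted), type = "b", col = "blue")
abline(h = 0.8 * sum(sorted), lty = 2)
```

With these made-up counts, the first three categories are enough to pass the 80% mark.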

Links

Pictorial demonstration of 80/20, there is no sound for this one

How to draw a Pareto chart in Excel 2010 by Dr Eugene O’Loughlin:

Benford’s law, lead with number 1

Benford’s law describes the distribution of the leading digit in many sets of real-life numbers, from street addresses to population figures.

The distribution is not uniform as you can see:

Benfords_Law

According to Benford’s law, the leading digit is 1 about 30% of the time, with 2 appearing between 17% and 18% of the time, and so on.
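These percentages come from a short formula: the probability that the leading digit is d is log10(1 + 1/d). Here is a minimal sketch in R that calculates it for each of the digits 1 to 9 (the variable names are my own):

```r
# Leading digits 1 to 9
d <- 1:9

# Benford's law: P(d) = log10(1 + 1/d)
benford <- log10(1 + 1 / d)
names(benford) <- d

# Show the probabilities as percentages
print(round(benford * 100, 1))

# The nine probabilities add up to 1
print(sum(benford))
```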

The law was first discovered in 1881 by mathematician Simon Newcomb. He noticed that in books of logarithm tables, the front pages were worn more than the back pages. This was because the front pages were used more often, and they contained numbers beginning with the lower digits.

The law was rediscovered in 1938 by physicist Frank Benford.

Professor Hal Varian of the University of California, Berkeley, suggested in 1972 that the law could be used to detect possible fraud.

Interesting what wear patterns can reveal huh.

Links

Monte Carlo Methods – Painting by random numbers

Monte Carlo Methods use random numbers to calculate an approximate value for something. They are used in the physical sciences, finance and engineering amongst other disciplines.

They were developed at Los Alamos in the 1940s by a group of researchers that included Nicholas Metropolis, John von Neumann and Stanislaw Ulam (of Ulam Spiral fame). The name Monte Carlo was taken from the famous casino.

Example approximating pi

Defining Monte Carlo Methods is not easy, so let's just work through a visual example that is often used: calculating the value of \pi.

What follows is based on this article which I really liked.

  • Imagine we draw a circle with a radius of length 1.0 (for some unit of measurement),
  • We then surround the circle by a square,
  • Finally we cut out a quarter of the square; this will just make the calculations easier later when we program this.

So here is the quarter square (hereafter we’ll just call it the square)  that we will be using:

Monte_Carlo_Methods

Continue imagining that we now randomly throw darts at this square. Some darts will land inside the shaded part of the square, and some will land outside it.

Now, of all the darts we toss at the square, the share landing inside the shaded part will be roughly proportional to its area:

\frac{Number \; of \;darts \; hitting \; shaded \; part}{Number \;of \;darts \; hitting \; the \; square}  =  \frac{Area \; of \; shaded \; part}{Area \; of \; the \; square}

  • The length of the square’s sides is 1.0,
  • Also the radius of the shaded part – (which is quarter of a circle) is also 1.0, which we will call r.

Digging a bit more we find:

  • The area of a circle is \pi r^2, so the area of the shaded part (which is quarter of a circle) is \frac{1}{4}\pi r^2,
  • The area of the square is r^2:

\frac{Number \; of \; darts \; hitting \; shaded \; part}{Number \; of \; darts \; hitting \; the \; square}  =  \frac{Area \; of \; shaded \; part}{Area \; of \; the \; square} = \frac{\frac{1}{4}\pi r^2}{r^2} = \frac{1}{4}\pi

Let's do a final bit of rearranging to get \pi on its own:

\pi \; = \; 4 \times \; \frac{Number \; of \; darts \; hitting \; the \; shaded \; part}{Number \; of \; darts \; hitting \; the \; square}

Cutting the pi with R programming

Okay, so much for the theory, let's give it a try in the R programming language.

I started R, and typed “pi” at the console to get its value, so we have something to compare the approximation to:

> pi
[1] 3.141593
>

Now here is the example itself. It is very inefficient, and there are packages in R for Monte Carlo methods, like mcsm and MCMCpack (see links); this is just to try and give an intuitive example.

Here we go, with 1,000,000 darts:

> # Monte Carlo simulation for 1,000,000 darts
> #
> num.darts=1000000
>
> # A hit is when a dart lands inside the shaded part
> num.hits = 0
>
> for(i in 1:num.darts) {
+     # Generate random uniform values for x and y
+     # coordinates for the dart
+     x <- runif(1, 0, 1)
+     y <- runif(1, 0, 1)
+
+     # Calculate the distance from the origin for the
+     # dart’s x and y coordinates, using the Pythagoras
+     # theorem
+     dist <- sqrt(x^2 + y^2)
+
+     # Add to the “hits” if the dart landed in the shaded
+     # part, where the distance is less than or equal to
+     # 1.0
+     if(dist <= 1.0) {
+         num.hits = num.hits + 1
+     }
+ }
>
> # Print out the approximate value for pi, multiplying by 4
> # as our calculations were done on 1/4 of a circle
> pi.approx = 4 * (num.hits / num.darts)
> print(pi.approx)
[1] 3.142956
>

So there you go, we got an approximate value for \pi by painting with random numbers.
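The loop is easy to follow but slow, because it throws one dart at a time. As a sketch of a quicker alternative, R can generate all the darts in one go and count the hits with sum(). The seed is an arbitrary choice of mine so the run is repeatable, and your value will differ a little from the one above:

```r
set.seed(1)  # arbitrary seed so the run is repeatable

num.darts <- 1000000

# Generate all the dart coordinates in one go
x <- runif(num.darts, 0, 1)
y <- runif(num.darts, 0, 1)

# A dart is a hit when its distance from the origin is at most 1.0;
# comparing squared distances avoids taking the square root
hits <- (x^2 + y^2) <= 1.0

# sum() counts the TRUE values, which gives the number of hits
pi.approx <- 4 * sum(hits) / num.darts
print(pi.approx)
```

The idea is exactly the same as the loop; only the bookkeeping is different.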

Links

Just the R packages mentioned for Monte Carlo Methods:

The earth can be flat

Most people are on board with the earth being round, or to be more precise, a 3 dimensional spheroid (as it’s not a perfect sphere).

So, how come it fits on a flat 2 dimensional map?

We have Gerardus Mercator to initially thank for this. In 1569 he presented his solution to this problem, and it is known as the Mercator projection.

A grid on the earth

The first thing we need to do is put some kind of measuring grid on the earth, so we can specify points on it. We do this with lines of longitude and latitude.

  • Latitude lines are horizontal, starting at the equator with 0 degrees, and running to 90° (North or South) at the poles. Lines of latitude are sometimes called parallels.
  • Longitude lines are vertical, starting at Greenwich in the United Kingdom. Lines of longitude are called meridians, and Greenwich is called the prime meridian.

From one grid to another

The Mercator projection is a transformation of points of latitude and longitude on the earth to an imaginary cylinder that fits over it, as seen here.

The Mercator projection converts a point’s:

  • longitude to x, by subtracting the longitude of a starting point, which is usually Greenwich.
  • latitude to y, by applying a scaling to it that grows faster and faster with latitude, as shown here. As the latitude parallels move towards the poles, they get further and further away from the enclosing cylinder. So, a greater and greater scaling factor is needed to transform the latitude onto the cylinder.

With the points of latitude and longitude transformed onto the cylinder, it can then be unrolled to form the map.
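To make the two conversions concrete, here is a minimal sketch in R. It assumes the standard formula for the Mercator projection on a sphere of radius 1, where y = ln(tan(π/4 + latitude/2)) with angles in radians. The function names deg2rad and mercator are my own:

```r
# Convert degrees to radians
deg2rad <- function(deg) deg * pi / 180

# Mercator projection on a unit sphere (radius 1); lon0.deg is the
# longitude of the starting meridian, 0 for Greenwich
mercator <- function(lat.deg, lon.deg, lon0.deg = 0) {
    x <- deg2rad(lon.deg - lon0.deg)
    y <- log(tan(pi / 4 + deg2rad(lat.deg) / 2))
    c(x = x, y = y)
}

# Equal 20 degree steps in latitude map to ever larger steps in y
lats <- c(0, 20, 40, 60, 80)
ys <- sapply(lats, function(lat) mercator(lat, 0)[["y"]])
print(round(ys, 3))
print(round(diff(ys), 3))
```

The second line printed shows the gaps between the parallels, which keep growing as we move away from the equator.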

There’s a problem

There is distortion incurred with the Mercator projection, in transforming the points of latitude and longitude, and it gets worse as the latitude moves further and further from the equator. The size of countries gets stretched.

On a Mercator map, Greenland shows as larger than Australia, as seen here. In actuality, Australia is more than three and a half times larger than Greenland, which you can check here.
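We can get a rough feel for the size of this stretching in R. On a Mercator map of a sphere, lengths at a given latitude are stretched by 1/cos(latitude), so areas are stretched by the square of that. In the sketch below the two latitudes are just rough representative values I picked for the middle of each landmass, so treat the results as ballpark figures only:

```r
# Local area inflation on a Mercator map at a given latitude:
# lengths are stretched by 1 / cos(latitude), so areas by its square
area.inflation <- function(lat.deg) 1 / cos(lat.deg * pi / 180)^2

# Rough representative latitudes (assumptions, for illustration only)
greenland.lat <- 72
australia.lat <- -25

print(area.inflation(greenland.lat))
print(area.inflation(australia.lat))
```

The inflation for Greenland comes out several times larger than for Australia, which is why it looks so big on the map.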

There are a number of alternative map transformations, such as Gall-Peters. This site shows examples for a number of these alternatives. However you can’t get away from some kind of distortion, in transforming from a 3 dimensional spheroid, to a 2 dimensional flat plane.

Despite originating in 1569, the Mercator map is still the one many people get to see.

Links

Funny clip from “The West Wing”, where a group is trying to explain the Gall-Peters projection:

Wiki page for Mercator projection