Central Limit Theorem

We will start with some observations and experiments. At the end, we will summarize what we have learned, and that summary will serve as a definition and explanation.

Let us say we have a die. The sample space (the set of all possible outcomes) for a single roll is {1, 2, 3, 4, 5, 6}.

Let us pick a random number from this set or, if you have a die, roll it. Say I rolled it and got a 6. Now let us roll it 100 times. Since rolling a die that many times by hand is tedious, we will use a computer to do it.


Let us create a digital die first and store it in the variable a, which the later code uses.

Input Code: a = Table[n, {n, 1, 6}]

Output: {1, 2, 3, 4, 5, 6}


From this digital die, we will pick a random sample of 100 values (this is like rolling a die 100 times and noting the outcomes).

Input Code: Table[RandomChoice[a], {n, 1, 100}]

Output: {4, 2, 1, 1, 5, 6, 2, 3, 1, 2, 5, 2, 2, 2, 4, 3, 1, 1, 5, 2, 2, 3, 1, 5, 4, 1, 3, 4, 1, 1, 3, 5, 4, 2, 5, 1, 2, 6, 1, 5, 6, 4, 6, 6, 6, 4, 1, 4, 4, 4, 4, 4, 2, 2, 1, 6, 4, 6, 4, 5, 5, 4, 6, 2, 4, 5, 2, 3, 5, 3, 5, 5, 2, 1, 2, 6, 6, 6, 5, 6, 3, 3, 3, 6, 3, 6, 5, 4, 6, 1, 1, 5, 5, 6, 1, 6, 2, 4, 3, 1}

The mean of all those values is 3.53
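
The mean above can be computed directly. What follows is a minimal sketch, assuming the digital die is stored in a; the name sample and the seed value are illustrative choices, not part of the original code. Because the rolls are random, a fresh run will give a value near 3.5 rather than exactly 3.53.

```mathematica
a = Range[6];                    (* digital die: {1, 2, 3, 4, 5, 6} *)
SeedRandom[1];                   (* optional: fix the seed so the run is repeatable *)
sample = RandomChoice[a, 100];   (* 100 rolls drawn in a single call *)
N[Mean[sample]]                  (* N turns the exact fraction into a decimal *)
```

RandomChoice[a, 100] is a shorter equivalent of the Table form used above; either one produces a list of 100 independent rolls.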

Let us call this one set of calculations.


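Let us make a hundred sets of these calculations. N converts each exact mean to a decimal, and the two loops use different iterator names (i and j) to keep them apart.

Input Code: Table[N[Mean[Table[RandomChoice[a], {j, 1, 100}]]], {i, 1, 100}]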

Output: {3.57, 3.12, 3.61, 3.4, 3.63, 3.67, 3.69, 3.38, 3.26, 3.56, 3.63, 3.36, 3.51, 3.52, 3.43, 3.29, 3.38, 3.84, 3.27, 3.57, 3.55, 3.11, 3.35, 3.42, 3.48, 3.45, 3.33, 3.5, 3.38, 3.56, 3.57, 3.64, 3.63, 3.46, 3.69, 3.48, 3.44, 3.41, 3.54, 2.98, 3.56, 3.29, 3.53, 3.5, 3.64, 3.6, 3.3, 3.65, 3.39, 3.55, 3.38, 3.87, 3.61, 3.44, 3.57, 3.58, 3.03, 3.53, 3.28, 3.4, 3.35, 3.31, 3.71, 3.24, 3.06, 3.66, 3.29, 3.5, 3.41, 3.56, 3.51, 3.75, 3.47, 3.39, 3.52, 3.43, 3.59, 3.9, 3.26, 3.77, 3.72, 3.56, 3.56, 3.84, 3.5, 3.31, 3.59, 3.63, 3.31, 3.33, 3.64, 3.4, 3.57, 3.24, 3.44, 3.4, 3.59, 3.63, 3.5, 3.47}


Let us look at the distribution of these 100 means using a histogram.

What do you see? Ans: We see that not all of them occur with the same frequency. Let us try a higher number of trials (that is, a higher number of averages).

Figure: Distribution of 100 Averages

Figure: Distribution of 1000 Averages

Figure: Distribution of 5000 Averages

Figure: Distribution of 10000 Averages

Figure: Distribution of 50000 Averages

Figure: Distribution of 100000 Averages

Figure: Distribution of 500000 Averages

Figure: Distribution of 1000000 Averages (A Million!)


What have we seen from this?

As the number of experiments increases (in this case, the number of averages), the distribution of the means looks more and more like a normal distribution. There is a surprise for you.
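
One way to check the bell shape is to draw a normal curve over the histogram. The sketch below is only an illustration: the names sampleSize, numMeans, means, mu and sigma are introduced here, and the curve uses the known mean (7/2) and variance (35/12) of a fair six-sided die.

```mathematica
a = Range[6];                                (* digital die *)
sampleSize = 100;                            (* rolls per average *)
numMeans = 10000;                            (* number of averages to draw *)
means = Table[N[Mean[RandomChoice[a, sampleSize]]], {numMeans}];
mu = 7/2;                                    (* mean of a fair die *)
sigma = Sqrt[35/12]/Sqrt[sampleSize];        (* standard deviation of one average *)
Show[
  Histogram[means, 50, "PDF"],               (* heights scaled so the total area is 1 *)
  Plot[PDF[NormalDistribution[mu, sigma], x], {x, 2.8, 4.2}, PlotStyle -> Red]
]
```

The "PDF" height setting puts the histogram on the same scale as the density curve, so the two can be compared by eye.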

What was the set we started the experiment with? It was {1, 2, 3, 4, 5, 6}, and the mean of all the elements in the set is 3.5.

Look at the last diagram now. Where is the peak of the distribution located? At 3.5.
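
The same comparison can be made with numbers instead of a picture. This is a rough sketch: the name means and the count of 10000 averages are assumptions chosen for illustration.

```mathematica
a = Range[6];                                           (* digital die *)
N[Mean[a]]                                              (* mean of the set: 3.5 *)
means = Table[N[Mean[RandomChoice[a, 100]]], {10000}];  (* 10000 averages of 100 rolls each *)
Mean[means]                                             (* should land close to 3.5 *)
StandardDeviation[means]                                (* should be close to Sqrt[35/12]/10, about 0.17 *)
```

The last line shows how tightly the averages cluster around 3.5, which is the width of the bell in the histograms.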

Notice that we did not need to know much about the data itself. We only looked at random samples drawn from it, yet the distribution of the sample means gave us a good estimate of the mean of the set. This is in fact the idea behind the central limit theorem: the means of sufficiently large random samples are approximately normally distributed around the mean of the set they were drawn from.
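
For reference, the standard statement of the theorem can be written as below. The symbols are notation introduced here, not taken from the text above: X_1, ..., X_n are independent rolls, mu and sigma^2 are the mean and variance of a single roll, and n is the number of rolls in each average.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \xrightarrow{\;d\;} \mathcal{N}(0, \sigma^2)
\quad \text{as } n \to \infty .
```

For the fair die, mu = 3.5 and sigma^2 = 35/12, so with n = 100 rolls per average the sample means are approximately normal with mean 3.5 and standard deviation sqrt(35/12)/10, about 0.17. Note that n is the number of rolls inside each average; drawing more averages only makes the histogram smoother.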


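The code used is

a = Table[n, {n, 1, 6}]; (* the digital die *)
numMeans = 100; (* number of averages; raise this to reproduce the later histograms *)
Histogram[Table[Mean[Table[RandomChoice[a], {j, 1, 100}]], {i, 1, numMeans}], 50, ChartStyle -> Hue[0.58], ChartBaseStyle -> {Opacity[0.2], EdgeForm[{Black}]}]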