Sheldon Ross 10: Example 3.30

sheldon-ross-10-example-3-30_population.jpg

Question: A population consists of m families. Let Xj denote the size of family j and suppose that X1, . . . ,Xm are independent random variables having the common probability mass function

p(k) = P(Xj = k) such that Σk pk = 1 with mean μ = Σkkpk. Suppose a member of the population is randomly chosen, in that the selection is equally likely to be any of the members of the population, and let Si be the event that the selected individual is from a family of size i. Argue that P(Si) → ipiμ as m→∞


Analytical Solution

The text presents two forms of arguments to prove this. The first one is the heuristic method and the second one is a more formal method. We will argue the solution using the heuristic method and then extend our understanding to the simulation. We will do this in several steps outlined below

  • Firstly, the probability that the person chosen randomly from the population belongs to the family of size ‘i’ can be calculated by dividing the following fraction
    • Numerator: This is the total number of people from the family of size ‘i’
    • Denominator: This is the total number of people in the population (aptly put, the size of the population)
  • Numerator – This is simply the total number of families of size i multiplied by i
    • total number of families of size i = m pi
    • the numerator is hence equal to i m pi
  • Denominator – This is the size of the population like we discussed above
    • Sum over all the possibilities of the family sizes and we will get this number
    • hence denominator = Σkjmpk
  • The probability = i m piΣkjmpk
    • as m→∞, we get the probability as ipiμ

Simulations

The program required is of moderate difficulty. But the results are elegant. In the following set of pictures we will see the probabilities unfold as the population size increases and becomes very evident.

  • I used a poisson distribution to define  the probabilities for the sizes of the population (λ = 25)
  • The sizes of the population have been restricted to between 1 and 50 (both numbers inclusive)
  • Since the event space of poisson distribution can be upto infinity, I have ‘unitized’ the sum of the probabilities to 1 for occurrences for 1 through 50
  • This unitization makes sure that we do not have an spaces in our probability space
  • The problem is now set up

Reading the images

  • Reading images is simple enough if we identify the two distinct elements of each of the image
    • The thick dashed red line: This is the calculated or the estimated probability (or the set of pk)
    • The green and red lines: There are a 200 of these lines per each image. These are the results of the simulations
  • What is common across all the images? In all the images the number of green lines is the same and the selected sample size from the population is also the same which is a 1000
    • In summary 200 samples with a size of a  1000 each have been picked and the probabilities of the person being from a family of a particular size have been
  • What is different from image to image? Just the number of families there is.
    • Remember from the question? We are looking how the probabilities themselves change as the number of families are increasing
    • Observe that, as the number of families are increasing, we also see the simulated probabilities are getting close to the actual probabilities
      • This is despite the fact that the sample sizes have not changed
      • It is truly amazing 😲

100 families 


150 families


200 families


500 families


1000 families


2000 families


5000 families


10000 families


20000 families


50000 families


100000 families


Million families


10 Million families


Code

For this post, I have done the code purely in mathematica. It is shown below. Lemme know if you have questions with it.

AppendTo[$Path, "Your_Path_To_Package here"];
Needs["statistics`"];

simulation[families_] :=
    Module[{probabilities =
        unitize[PDF[PoissonDistribution[25], #] & /@ Range[1, 50]] ->
            Range[1, 50], samples, plotData, plotProbabilities},
      samples =
          Table[unitize[
            Counts[RandomChoice[
              Join @@ (Table[# - 1, #] & /@
                  RandomChoice[probabilities, 2000]), families]]], 200];
      plotProbabilities =
          associateLists[Values[probabilities], Keys[probabilities]];
      ListLinePlot[samples ~ Join ~ {plotProbabilities},
        GridLines -> {Range@50, Range[0, 0.20, 0.01]}, ImageSize -> 788,
        InterpolationOrder -> 2,
        PlotStyle ->
            Table[{Thickness[0], Opacity[0.2], Green}, Length@samples] ~
                Join ~ {{Red, Thick, Dashed}}, Mesh -> All,
        MeshStyle -> {Red, PointSize[0]}, PlotRange -> {{0, 50}, {0, 0.15}}]
    ]

The unitize function from the package statistics
unitize[list_List] := N[list / Total[list]]
unitize[association_Association] := Module[{sortedAssociation = KeySort[association],keys,values},
  keys = Keys[sortedAssociation];
  values = unitize[Values[sortedAssociation]];
  Association[MapThread[{#1->#2}&,{keys,values}]]
]


End of the post 😉


.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.