**Question:** A population consists of m families. Let X_{j} denote the size of family j and suppose that X_{1}, . . . ,X_{m} are independent random variables having the common probability mass function

p(k) = P(X_{j} = k) such that Σ_{k} p_{k} = 1 with mean μ = Σ_{k}kp_{k}. Suppose a member of the population is randomly chosen, in that the selection is equally likely to be any of the members of the population, and let Si be the event that the selected individual is from a family of size i. Argue that P(S_{i}) → ^{ipi}⁄_{μ} as m→∞

**Analytical Solution**

The text presents two forms of arguments to prove this. The first one is the heuristic method and the second one is a more formal method. We will argue the solution using the heuristic method and then extend our understanding to the simulation. We will do this in several steps outlined below

- Firstly, the probability that the person chosen randomly from the population belongs to the family of size ‘i’ can be calculated by dividing the following fraction
*Numerator:*This is the total number of people from the family of size ‘i’*Denominator:*This is the total number of people in the population (aptly put, the size of the population)

*Numerator*– This is simply the total number of families of size i multiplied by i- total number of families of size i = m p
_{i} - the numerator is hence equal to i m p
_{i}

- total number of families of size i = m p
*Denominator –*This is the size of the population like we discussed above- Sum over all the possibilities of the family sizes and we will get this number
- hence denominator = Σ
_{k}jmp_{k}

- The probability =
^{i m pi}⁄_{Σkjmpk}- as m→∞, we get the probability as
^{ipi}⁄_{μ}

- as m→∞, we get the probability as

**Simulations**

The program required is of moderate difficulty. But the results are elegant. In the following set of pictures we will see the probabilities unfold as the population size increases and becomes very evident.

- I used a poisson distribution to define the probabilities for the sizes of the population (λ = 25)
- The sizes of the population have been restricted to between 1 and 50 (both numbers inclusive)
- Since the event space of poisson distribution can be upto infinity, I have ‘unitized’ the sum of the probabilities to 1 for occurrences for 1 through 50
- This unitization makes sure that we do not have an spaces in our probability space
- The problem is now set up

*Reading the images*

- Reading images is simple enough if we identify the two distinct elements of each of the image
*The thick dashed red line:*This is the calculated or the estimated probability (or the set of p_{k})*The green and red lines:*There are a 200 of these lines per each image. These are the results of the simulations

*What is common across all the images?*In all the images the number of green lines is the same and the selected sample size from the population is also the same which is a 1000- In summary 200 samples with a size of a 1000 each have been picked and the probabilities of the person being from a family of a particular size have been

*What is different from image to image?*Just the number of families there is.- Remember from the question? We are looking how the probabilities themselves change as the number of families are increasing
- Observe that, as the number of families are increasing, we also see the simulated probabilities are getting close to the actual probabilities
- This is despite the fact that the sample sizes have not changed
- It is truly amazing 😲

*100 families *

*150 families*

*200 families*

*500 families*

*1000 families*

*2000 families*

*5000 families*

*10000 families*

*20000 families*

*50000 families*

*100000 families*

*Million families*

*10 Million families*

**Code**

For this post, I have done the code purely in mathematica. It is shown below. Lemme know if you have questions with it.

AppendTo[$Path, "Your_Path_To_Package here"]; Needs["statistics`"]; simulation[families_] := Module[{probabilities = unitize[PDF[PoissonDistribution[25], #] & /@ Range[1, 50]] -> Range[1, 50], samples, plotData, plotProbabilities}, samples = Table[unitize[ Counts[RandomChoice[ Join @@ (Table[# - 1, #] & /@ RandomChoice[probabilities, 2000]), families]]], 200]; plotProbabilities = associateLists[Values[probabilities], Keys[probabilities]]; ListLinePlot[samples ~ Join ~ {plotProbabilities}, GridLines -> {Range@50, Range[0, 0.20, 0.01]}, ImageSize -> 788, InterpolationOrder -> 2, PlotStyle -> Table[{Thickness[0], Opacity[0.2], Green}, Length@samples] ~ Join ~ {{Red, Thick, Dashed}}, Mesh -> All, MeshStyle -> {Red, PointSize[0]}, PlotRange -> {{0, 50}, {0, 0.15}}] ]The unitize function from the package statistics

unitize[list_List] := N[list / Total[list]] unitize[association_Association] := Module[{sortedAssociation = KeySort[association],keys,values}, keys = Keys[sortedAssociation]; values = unitize[Values[sortedAssociation]]; Association[MapThread[{#1->#2}&,{keys,values}]] ]

End of the post 😉

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.