**Question: **An automobile insurance company classifies each of its policyholders as being of one of the types i = 1, . . . , k. It supposes that the numbers of accidents that a type i policyholder has in successive years are independent Poisson random variables with mean λ_{i}, i = 1, . . . , k. The probability that a newly insured policyholder is type i is p_{i}, such that Σ_{i} p_{i} = 1.

- Given that a policyholder had n accidents in her first year, what is the expected number that she has in her second year?
- What is the conditional probability that she has m accidents in her second year?

**Analytical Solution Part 01: **There are many subtle elements to this problem, so we would break it down slowly

- For writing simplicity, we will designate the following notation
- Probability of Poisson of rate λ and the events being k is written shortly as
- ψ ( λ, k )

- We know that the policyholder has
*n*accidents in the first year. We also know that the number of accidents in*any*given year is governed by Poisson Distribution of a rate λ_{i}- and λ
_{i}being the rate is probabilistically determined by p_{i}

- and λ
- Firstly we will start off by symbolically written as
- P = E [N
_{2 }| N_{1 }= n]- where N
_{i }is the number of events in the the i^{th}year

- where N
- Recall from the statement above that the policyholder could be of any of the type
*T*, determined by the probability p_{i} - Conditioning on the type
*T*, we get - E = E [N
_{2 }| T = j, N_{1 }= n] P{ T = j | N_{1 }= n }- Even though the overall events are defined by a Poisson Distribution, the events themselves are independent which means
- E[N
_{1}] = E[N_{2}] = E[N_{j}] = λ_{j} - This understanding is vital for the simplification which is illustrated in the next step

- E[N

- Even though the overall events are defined by a Poisson Distribution, the events themselves are independent which means
- This makes P = λ
_{j }P{ T = j | N_{1 }= n } - Calculation of P{ T = j | N
_{1 }= n }: This is sort of a tricky calculation so we will break this down further- Since this is also the conditional probability, an expansion gives us
- P{ T = j | N
_{1}= n } =^{P{T=j, N1=n}}⁄_{P{N1=n}} - Let us compute the numerator and denominator separately
*Numerator:*P{T=j, N_{1}=n} is also, P{ N_{1 }= n, T = j } further becomes, P{ N_{1 }= n | T = j } P { T = j }- ψ ( λ
_{j}, n ) p_{j}

- ψ ( λ
*Denominator:*For the calculation of this, P{ N_{1 }=n } we can use more conditioning as following- ⇒ P{ N
_{1 }=n } = Σ_{j}P{ N_{1 }=n | T = j } P{ T = j } - ⇒ P{ N
_{1 }=n } = Σ_{j}P{ N_{1 }= n | T = j } P{ T = j } - ⇒ P{ N
_{1 }=n } = Σ_{j}P{ N_{1 }= n | T = j } p_{j} - ⇒ P{ N
_{1 }=n } = Σ_{j }ψ ( λ_{j}, n ) p_{j}

- ⇒ P{ N

- P = E [N
- Putting everything together the expected value is
^{Σj ψ ( λj, n ) λj pj }⁄_{ Σj ψ ( λj, n ) pj}

**Analytical Solution Part 02:**

- We can use a similar approach to the
*Part 01*to determine the probability - Firstly the probability can be written and then we can condition on the type
- P { N
_{2 }= m | N_{1 }= n } = Σ_{j }= P { N_{2 }= m | T = j, N_{1 }= n } P { N_{1 }= n | T = j }- The first part of the above can be simplified by noting the independence N
_{1 }and N_{2}- this essentially means that the first part of the probability product is ψ ( λ
_{j}, m )

- this essentially means that the first part of the probability product is ψ ( λ
- For the second part, we can use the result from Part 01 which has ..
- a numerator of ψ ( λ
_{j}, n ) p_{j} - and a denominator of Σ
_{j }ψ ( λ_{j}, n ) p_{j}

- a numerator of ψ ( λ

- The first part of the above can be simplified by noting the independence N

- P { N
- Putting everything together, we get P { N
_{2 }= m | N_{1 }= n } = Σ_{j }^{ψ ( λj, m ) ψ ( λj, n ) pj }⁄_{ Σj ψ ( λj, n ) pj }

**Algorithm:** The methodology-algorithm has for the simulation has been outlined below

- Decide on the number of types of profiles of the policyholders. Let us call this number k
- Generate k random real numbers
- ‘Unitize’ the list of the real number
- By unitize here, I mean make the sum of the list to be equal to one.
- After unitization, you can the list as probabilities

- Generate k unique rates
- these would correspond to having
*k*distinct profiles

- these would correspond to having
- After generating two lists of equal length
*k*the rate can be associated with the probabilities - This association can be used to randomly generate a long list of rates with their odds being determined by the probability associated with it
- Then pass these rates into a Poisson Generator to get the number of incidents for the first year of the policy and the second year of the policy
- Use these lists for probability and calculations and plots
- compare results from the simulation with the calculated ones

**Simulations:**

*All the profiles: *The types of possible profile have been pre-generated and shown below. The type of profile means that the number accidents per year for that profile is determined by the corresponding value of λ_{j}

)Type of Profile (also λ_{j} |
Probability of Profile |

2 | 0.036757591 |

3 | 0.051059596 |

4 | 0.088901919 |

7 | 0.033155605 |

11 | 0.013171358 |

13 | 0.092618411 |

16 | 0.001453449 |

17 | 0.049638466 |

18 | 0.047169129 |

21 | 0.057450608 |

23 | 0.070877438 |

25 | 0.077041618 |

27 | 0.061598955 |

28 | 0.062665608 |

29 | 0.052379805 |

30 | 0.093841463 |

32 | 0.028773177 |

35 | 0.025502251 |

36 | 0.006873262 |

38 | 0.049070292 |

*Expected values:* The expected number of accidents in the second year has been derived with a conditional on the type of profile and the number accidents in the first year. The simulation of the same is shown in the plot below.

- The green dashed line is the expected probability
- The squiggly lines are the results from 200 different simulations (or put another way, 200 policyholders) with
- each line containing a sample size of a 1000 people
- this set was split into various groups based on the number of accidents in the first year
- the mean of the each of the groups’ second year accidents have been calculated

*Probabilities:* This is a three dimensional in nature because of the following reasons

- there are many possible number of accidents during the first year (N
_{1}) - then for each of possibility, there are many possible types of accidents in the second year (N
_{2}) - Which is why I plot several parameters in

**Code**

AppendTo[$Path, "Add_your_path_here"]; Needs["utilities`"]; Needs["statistics`"]; ClearAll[expectedValue, theoProbabilities]; expectedValue[probabilities_Association, n_] := Module[{rates = Keys@probabilities, odds = Values@probabilities}, Plus @@ (rates * odds * (PDF[PoissonDistribution[#], n] & /@ rates)) / Plus @@ (odds * (PDF[PoissonDistribution[#], n] & /@ rates))] theoProbabilities[probabilities_Association, n_, m_] := Module[{rates = Keys@probabilities, odds = Values@probabilities}, Plus @@ (odds * (PDF[PoissonDistribution[#], n] & /@ rates) * (PDF[PoissonDistribution[#], m] & /@ rates)) / Plus @@ (odds * (PDF[PoissonDistribution[#], n] & /@ rates))] Module[{types = 20, probabilities, typeProbabilities, allTypeTimeLines, someTypeTimeLines, restTypeTimeLines, firstYears = Range[0, 42], meanData, meanPlotData, expectedData, nPolicyholders = 200, probPreData, probPreDataGrouped, probPreDataGroupedKeys, singlePlotData, probPlotData1, probPlotData2}, BlockRandom[ probabilities = Association@ MapThread[#1 -> #2 &, {RandomSample[Range[1, 2 * types], types], RandomReal[{0, 1}, types]}]; probabilities = statistics`unitize[probabilities]; typeProbabilities = Values@probabilities -> Keys@probabilities;, RandomSeeding -> 1]; allTypeTimeLines = Table[RandomVariate[PoissonDistribution[#], 2] & /@ RandomChoice[typeProbabilities, 1000], nPolicyholders]; ListPlot[Sort[probabilities], Frame -> True, FrameLabel -> {"Profile Type", "Probability"}, GridLines -> {Range@40, Range[0, 0.2, 0.005]}, ImageSize -> 788] // Print; expectedData = {#, expectedValue[probabilities, #]} & /@ firstYears; meanData = Table[Module[{list = Module[{select = Select[#, #[[1]] == firstYears[[r]] &]}, If[select === {}, Null, Transpose[select][[2]]]]}, If[list === Null, 0, N@Mean[list]]], {r, 1, Length@firstYears}] & /@ allTypeTimeLines; meanPlotData = MapThread[{#1, #2} &, {firstYears, #}] & /@ meanData; ListLinePlot[{expectedData} ~ Join ~ meanPlotData, PlotStyle -> ({{Thick, Darker@Green, Dashed}} ~ Join ~ Table[{Thickness[0], Gray, Opacity[0.1]}, Length@meanPlotData]), ImageSize -> 788, Mesh -> None, InterpolationOrder -> 2, Frame -> True, FrameLabel -> {"\!\(\*StyleBox[\"#\",FontSlant->\"Italic\"]\) \ accidents in the First Year", "Expected number of accidents in the second year"}, AspectRatio -> 43 / 50, PlotRange -> {{0, 43}, {0, 50}}]; Export[StringReplace[NotebookFileName[], ".nb" -> ".csv"], {{"Type of Profile", "Probability of Profile"}} ~ Join ~ MapThread[{#1, #2} &, {Keys@probabilities, Values@probabilities}], OverwriteTarget -> True]; probPreData = RandomVariate[PoissonDistribution[#], 2] & /@ RandomChoice[typeProbabilities, 1000]; probPreDataGrouped = GroupBy[probPreData, #[[1]] &]; probPreDataGroupedKeys = Keys[probPreDataGrouped]; probPreDataGroupedKeys = Range[0, 40]; probPlotData1 = Module[{subData = probPreDataGrouped[#]}, KeySort[N[ Length /@ GroupBy[subData, #[[2]] &] / Length@subData]]] & /@ probPreDataGroupedKeys; probPlotData2 = Table[{#, theoProbabilities[probabilities, r, #]} & /@ Range[0, 60], {r, 0, 40}]; ListLinePlot[probPlotData2, PlotRange -> All, ImageSize -> 788, PlotStyle -> Thickness@0, Frame -> True, FrameLabel -> {"\!\(\*SubscriptBox[\(N\), \(2\)]\)", "Probability"}, PlotLegends -> ("\!\(\*SubscriptBox[\(N\), \(1\)]\) = " <> ToString@# & /@ Range[0, 40])] ]

End of the post 😉