Sheldon Ross 10: Example 3.32 (Automobile Insurance)

Question: An automobile insurance company classifies each of its policyholders as being of one of the types i = 1, . . . , k. It supposes that the numbers of accidents that a type i policyholder has in successive years are independent Poisson random variables with mean λi, i = 1, . . . , k. The probability that a newly insured policyholder is type i is pi, such that Σi pi = 1.

  1. Given that a policyholder had n accidents in her first year, what is the expected number that she has in her second year?
  2. What is the conditional probability that she has m accidents in her second year?

Analytical Solution Part 01: There are many subtle elements to this problem, so we would break it down slowly

  1. For writing simplicity, we will designate the following notation
    • Probability of Poisson of rate λ and the events being k is written shortly as
    • ψ ( λ, k )
  2. We know that the policyholder has n accidents in the first year. We also know that the number of accidents in any given year is governed by Poisson Distribution of a rate λi
    • and λi being the rate is probabilistically determined by pi
  3. Firstly we will start off by symbolically written as
    • P = E [N| N= n]
      • where Nis the number of events in the the ith year
    • Recall from the statement above that the policyholder could be of any of the type T, determined by the probability pi
    • Conditioning on the type T, we get
    • E = E [N| T = j, N= n] P{ T = j | N= n }
      • Even though the overall events are defined by a Poisson Distribution, the events themselves are independent which means
        • E[N1] = E[N2] =  E[Nj] = λj
        • This understanding is vital for the simplification which is illustrated in the next step
    • This makes P = λP{ T = j | N= n }
    • Calculation of P{ T = j | N= n }: This is sort of a tricky calculation so we will break this down further
      • Since this is also the conditional probability, an expansion gives us
      • P{ T = j | N1 = n } = P{T=j, N1=n}P{N1=n}
      • Let us compute the numerator and denominator separately
        • Numerator: P{T=j, N1=n} is also, P{ N= n, T = j } further becomes, P{ N= n | T = j } P { T = j }
          • ψ ( λj, n ) pj
        • Denominator: For the calculation of this, P{ N=n } we can use more conditioning as following
          • ⇒ P{ N=n } = Σj P{ N=n | T = j } P{ T = j }
          • ⇒ P{ N=n } = Σj P{ N= n | T = j } P{ T = j }
          • ⇒ P{ N=n } = Σj P{ N= n | T = j } pj
          • ⇒ P{ N=n } = Σψ ( λj, n ) pj
  4. Putting everything together the expected value is Σψ ( λj, n ) λj pj Σψ ( λj, n ) pj

Analytical Solution Part 02:

  1. We can use a similar approach to the Part 01 to determine the probability
  2. Firstly the probability can be written and then we can condition on the type
    • P { N= m | N= n } = Σ= P { N= m | T = j, N= n } P { N= n | T = j }
      • The first part of the above can be simplified by noting the independence Nand N2
        • this essentially means that the first part of the probability product is ψ ( λj, m )
      • For the second part, we can use the result from Part 01 which has ..
        • a numerator of ψ ( λj, n ) pj
        • and a denominator of  Σψ ( λj, n ) pj
  3. Putting everything together, we get P { N= m | N= n } = Σj ψ ( λj, m ) ψ ( λj, n ) pj Σψ ( λj, n ) pj

Algorithm: The methodology-algorithm has for the simulation has been outlined below

  • Decide on the number of types of profiles of the policyholders. Let us call this number k
  • Generate k random real numbers
    • ‘Unitize’ the list of the real number
    • By unitize here, I mean make the sum of the list to be equal to one.
    • After unitization, you can the list as probabilities
  • Generate k unique rates
    • these would correspond to having k distinct profiles
  • After generating two lists of equal length k the rate can be associated with the probabilities
  • This association can be used to randomly generate a long list of rates with their odds being determined by the probability associated with it
  • Then pass these rates into a Poisson Generator to get the number of incidents for the first year of the policy and the second year of the policy
  • Use these lists for probability and calculations and plots
    • compare results from the simulation with the calculated ones

Simulations:

All the profiles: The types of possible profile have been pre-generated and shown below. The type of profile means that the number accidents per year for that profile is determined by the corresponding value of λj

Type of Profile (also λj) Probability of Profile
2 0.036757591
3 0.051059596
4 0.088901919
7 0.033155605
11 0.013171358
13 0.092618411
16 0.001453449
17 0.049638466
18 0.047169129
21 0.057450608
23 0.070877438
25 0.077041618
27 0.061598955
28 0.062665608
29 0.052379805
30 0.093841463
32 0.028773177
35 0.025502251
36 0.006873262
38 0.049070292

 

 

Expected values: The expected number of accidents in the second year has been derived with a conditional on the type of profile and the number accidents in the first year. The simulation of the same is shown in the plot below.

  1. The green dashed line is the expected probability
  2. The squiggly lines are the results from 200 different simulations (or put another way, 200 policyholders) with
    • each line containing a sample size of a 1000 people
    • this set was split into various groups based on the number of accidents in the first year
    • the mean of the each of the groups’ second year accidents have been calculated

Probabilities: This is a three dimensional in nature because of the following reasons

  • there are many possible number of accidents during the first year (N1)
  • then for each of possibility, there are many possible types of accidents in the second year (N2)
  • Which is why I plot several parameters in


Code

AppendTo[$Path, "Add_your_path_here"];
Needs["utilities`"];
Needs["statistics`"];

ClearAll[expectedValue, theoProbabilities];
expectedValue[probabilities_Association, n_] :=
    Module[{rates = Keys@probabilities, odds = Values@probabilities},
      Plus @@ (rates * odds * (PDF[PoissonDistribution[#], n] & /@ rates)) /
          Plus @@ (odds * (PDF[PoissonDistribution[#], n] & /@ rates))]
theoProbabilities[probabilities_Association, n_, m_] :=
    Module[{rates = Keys@probabilities, odds = Values@probabilities},
      Plus @@ (odds * (PDF[PoissonDistribution[#], n] & /@
          rates) * (PDF[PoissonDistribution[#], m] & /@ rates)) /
          Plus @@ (odds * (PDF[PoissonDistribution[#], n] & /@ rates))]

Module[{types = 20, probabilities, typeProbabilities,
  allTypeTimeLines, someTypeTimeLines, restTypeTimeLines,
  firstYears = Range[0, 42], meanData, meanPlotData, expectedData,
  nPolicyholders = 200, probPreData, probPreDataGrouped,
  probPreDataGroupedKeys, singlePlotData, probPlotData1,
  probPlotData2},


  BlockRandom[
    probabilities =
        Association@
            MapThread[#1 -> #2 &, {RandomSample[Range[1, 2 * types], types],
              RandomReal[{0, 1}, types]}];
    probabilities = statistics`unitize[probabilities];
    typeProbabilities = Values@probabilities -> Keys@probabilities;,
    RandomSeeding -> 1];

  allTypeTimeLines =
      Table[RandomVariate[PoissonDistribution[#], 2] & /@
          RandomChoice[typeProbabilities, 1000], nPolicyholders];


  ListPlot[Sort[probabilities], Frame -> True,
    FrameLabel -> {"Profile Type", "Probability"},
    GridLines -> {Range@40, Range[0, 0.2, 0.005]}, ImageSize -> 788] //
      Print;
  expectedData = {#, expectedValue[probabilities, #]} & /@ firstYears;
  meanData =
      Table[Module[{list =
          Module[{select = Select[#, #[[1]] == firstYears[[r]] &]},
            If[select === {}, Null, Transpose[select][[2]]]]},
        If[list === Null, 0, N@Mean[list]]], {r, 1,
        Length@firstYears}] & /@ allTypeTimeLines;
  meanPlotData = MapThread[{#1, #2} &, {firstYears, #}] & /@ meanData;
  ListLinePlot[{expectedData} ~ Join ~ meanPlotData,
    PlotStyle -> ({{Thick, Darker@Green, Dashed}} ~ Join ~
        Table[{Thickness[0], Gray, Opacity[0.1]}, Length@meanPlotData]),
    ImageSize -> 788, Mesh -> None, InterpolationOrder -> 2,
    Frame -> True,
    FrameLabel -> {"\!\(\*StyleBox[\"#\",FontSlant->\"Italic\"]\) \
accidents in the First Year",
      "Expected number of accidents in the second year"},
    AspectRatio -> 43 / 50, PlotRange -> {{0, 43}, {0, 50}}];
  Export[StringReplace[NotebookFileName[],
    ".nb" -> ".csv"], {{"Type of Profile", "Probability of Profile"}} ~
      Join ~ MapThread[{#1, #2} &, {Keys@probabilities,
    Values@probabilities}], OverwriteTarget -> True];

  probPreData =
      RandomVariate[PoissonDistribution[#], 2] & /@
          RandomChoice[typeProbabilities, 1000];
  probPreDataGrouped = GroupBy[probPreData, #[[1]] &];
  probPreDataGroupedKeys = Keys[probPreDataGrouped];
  probPreDataGroupedKeys = Range[0, 40];

  probPlotData1 =
      Module[{subData = probPreDataGrouped[#]},
        KeySort[N[
          Length /@ GroupBy[subData, #[[2]] &] / Length@subData]]] & /@
          probPreDataGroupedKeys;
  probPlotData2 =
      Table[{#, theoProbabilities[probabilities, r, #]} & /@
          Range[0, 60], {r, 0, 40}];

  ListLinePlot[probPlotData2, PlotRange -> All, ImageSize -> 788,
    PlotStyle -> Thickness@0, Frame -> True,
    FrameLabel -> {"\!\(\*SubscriptBox[\(N\), \(2\)]\)", "Probability"},
    PlotLegends -> ("\!\(\*SubscriptBox[\(N\), \(1\)]\) = " <>
        ToString@# & /@ Range[0, 40])]

]

End of the post 😉