The birthday problem is the surprising fact that in a group of just 23 people, there’s a 50-50 chance that two of them share the same birthday, which is way more likely than expected by most.
What is the birthday problem?
In probability theory, the birthday problem asks for the probability that, in a set of n randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 50% in a group of only 23 people. (Source)
Solving the Birthday Problem
To answer what the probability of at least two birthdays falling together is, I simulate 5,000 trials from 1 to 100 people and overlay the empirically obtained results with the closed-form mathematical solution in a plot. The results show that 5,000 simulations for any number of people is sufficient to obtain a fairly stable result, which nicely matches the closed form solution.
Code
# paramssimulations =5000players =c(2:100)resmat =matrix(ncol =3, nrow =length(players))resmat[,1] = playersresmat[1,3] =365/365*364/365for (i in2:length(players)){ resmat[i,3] = resmat[i-1,3] * (365-i)/365}resmat[,3] =1- resmat[,3]#simulationfor (v in players){ valmat <-matrix(ncol = simulations, nrow = v) occvec <-vector(mode ="logical", length = simulations)for (i in1:simulations){ valmat[,i] =sample(x =1:365, size = v, replace =TRUE) occvec[i] =sum(duplicated(valmat[,i])) >0 } resmat[v-1,2] =sum(occvec)/simulations}#convert to data frame for ggplotresmat <- resmat %>%as.data.frame()#confidence levelsfifty_percent_confidence =min(which(resmat$V3 >0.5) +1)ninetynine_percent_confidence =min(which(resmat$V3 >0.99) +1)
Code
ggplot(resmat, aes(x = V1)) +geom_line(aes(y = V2, colour ="Simulation"), lty ="solid", linewidth =1) +geom_line(aes(y = V3, colour ="Closed Form"), lty ="dashed", linewidth =0.5) +labs(title ="A Birthday Problem",subtitle ="How many people would you put into a room until you are 99% \ncertain that at least two people share the same birthday? ",caption =paste(simulations, "simulations for every number 'n' people")) +guides(linetype ="none") +labs(x ="People in a Room",y ="Prob. of at least 2 birthdays falling together",colour =NULL) +geom_segment(aes(x = fifty_percent_confidence,y =0,xend = fifty_percent_confidence,yend = resmat[fifty_percent_confidence-1,3]),lty =3, colour ="black") +geom_segment(aes(x = ninetynine_percent_confidence,y =0,xend = ninetynine_percent_confidence,yend = resmat[ninetynine_percent_confidence-1,3]),lty =3, colour ="black") +annotate(geom ="text", size =3, x =27, y =0.25,label ="50%", fontface ="italic") +annotate(geom ="text", size =3, x =61, y =0.49,label ="99%", fontface ="italic") +scale_colour_manual(values =c("black", "dodgerblue")) +scale_x_continuous(limits =c(1, 100), breaks =c(seq(0, 100, by=50), 23, 57)) +scale_y_continuous(labels = scales::percent_format()) +theme_bw() +theme(text=element_text(size =11, color ="black"),panel.grid.major.x =element_blank(),panel.grid.minor.x =element_blank(),panel.grid.minor.y =element_blank(),plot.title=element_text(size=14, face ="bold", color="black"),plot.subtitle=element_text(size=12, face="italic", color="grey50"),plot.caption=element_text(size=6, face="italic", color="grey50"))