The Coupon Collector Problem
From [1]:
Suppose each box of cereal contains
one of N different coupons and once a consumer has collected a coupon of each
type, he can trade them for a prize. The aim of the problem is determining the
average number of cereal boxes the consumer should buy to collect all coupon
types, assuming that each coupon type occurs with the same probability in the
cereal boxes.
If there 5 different coupons, how many boxes do I have to buy to get the
prize?
mc_sample_arg_first(coupons(5,T),1,T,[NT-_]).
If there are 5 different coupons, what is the distribution of the number of
boxes I have to buy to get the prize?
dist(5,1000,Chart).
If there are 5 different coupons, what is the expected number of
boxes I have to buy to get the prize?
mc_expectation(coupons(5,T),100,T,Exp).
Plot the dependency of the expected number of boxes from the number of coupons.
boxes_vs_N(12,1,Chart).
As indicated in [1], the number of boxes grows as \(O(N\log N)\) where \(N\) is the number of coupons. The graphs shows the accordance of the two curves.
The coupon collector problem is similar to the sticker collector problem,
where you
have an album with a space for every different sticker,
you can buy stickers in
packs and your objective is to complete the album.
A program for the coupon collector problem can be applied to solve the
sticker collector problem:
if you have \(N\) different stickers and packs contain \(P\) stickers, solve
the coupon collector problem for \(N\) coupons and get the number of boxes
\(B\). Then the number of packs you have to buy to complete the collection is
\(\lceil B/P \rceil\).
If there are 50 different stickers and pack contain 4 stickers, how many packs
do I have to buy to get the prize?
mc_sample_arg_first(stickers(50,4,T),1,T,[NP-_]).
Code
Preamble:
:- use_module(library(mcintyre)).
:- if(current_predicate(use_rendering/1)).
:- use_rendering(c3).
:- endif.
:- mc.
:- begin_lpad.
Given the number of coupons N, T is the number of boxes to get all coupons.
coupons(N,T):-
length(CP,N),
CPTerm=..[cp|CP],
new_coupon(N,CPTerm,0,N,T).
If 0 coupons remain to be collected, the collection ends.
new_coupon(0,_CP,T,_N,T).
If N0 coupons remain to be collected, collect one and set to 1
the cell of the CP array (stored as a term with arity N).
new_coupon(N0,CP,T0,N,T):-
N0>0,
collect(CP,N,T0,T1),
N1 is N0-1,
new_coupon(N1,CP,T1,N,T).
Collect one new coupon and update the number of boxes bought.
collect(CP,N,T0,T):-
pick_a_box(T0,N,I),
T1 is T0+1,
arg(I,CP,CPI),
(var(CPI)->
CPI=1,
T=T1
;
collect(CP,N,T1,T)
).
pick_a_box(_,N,I):uniform(I,L):-numlist(1, N, L).
:-end_lpad.
Given the number of stickers N and the size of a pack P, T is the number of
packs to get all stickers.
stickers(N,P,T):-
coupons(N,T0),
T is ceiling(T0/P).
Plot the distribution of the number of boxes to collect N coupons by taking
Samples samples.
dist(N,Samples,Chart):-
mc_sample_arg_first(coupons(N,T),Samples,T,L),
density(L,Chart,[nbins(10)]).
Plot the dependency of the number of boxes from the number of coupons.
boxes_vs_N(MaxN,Step,Chart):-
NSteps is ceil((MaxN-1)/Step),
findall(E,(
between(0,NSteps,N0),N is 1+N0*Step,mc_expectation(coupons(N,T),10,T,E)
),V),
findall(N,(between(0,NSteps,N0),N is 1+N0*Step),X),
findall(NlogN,(between(0,NSteps,N0),N is 1+N0*Step,NlogN is 1+1.2*N*log(N)),NLN),
Chart=c3{data:_{x:x, columns:[[x|X],['Expected number of boxes'|V],['1+1.2NlogN'|NLN]]}}.
## References
[1] Kaminski, B. L., Katoen, J.-P., Matheja, C., Olmedo, F., Weakest precondition reasoning for expected run-times of probabilistic programs. Programming languages and systems, LNCS, vol. 9632, pp. 364-389,
Springer, Heidelberg (2016)