The Coupon Collector Problem

From [1]:

Suppose each box of cereal contains one of N different coupons and once a consumer has collected a coupon of each type, he can trade them for a prize. The aim of the problem is determining the average number of cereal boxes the consumer should buy to collect all coupon types, assuming that each coupon type occurs with the same probability in the cereal boxes.

If there 5 different coupons, how many boxes do I have to buy to get the prize?

mc_sample_arg_first(coupons(5,T),1,T,[NT-_]).

If there are 5 different coupons, what is the distribution of the number of boxes I have to buy to get the prize?

dist(5,1000,Chart).

If there are 5 different coupons, what is the expected number of boxes I have to buy to get the prize?

mc_expectation(coupons(5,T),100,T,Exp).

Plot the dependency of the expected number of boxes from the number of coupons.

boxes_vs_N(12,1,Chart).

As indicated in [1], the number of boxes grows as \(O(N\log N)\) where \(N\) is the number of coupons. The graphs shows the accordance of the two curves.

The coupon collector problem is similar to the sticker collector problem, where you have an album with a space for every different sticker, you can buy stickers in packs and your objective is to complete the album.

A program for the coupon collector problem can be applied to solve the sticker collector problem: if you have \(N\) different stickers and packs contain \(P\) stickers, solve the coupon collector problem for \(N\) coupons and get the number of boxes \(B\). Then the number of packs you have to buy to complete the collection is \(\lceil B/P \rceil\).

If there are 50 different stickers and pack contain 4 stickers, how many packs do I have to buy to get the prize?

mc_sample_arg_first(stickers(50,4,T),1,T,[NP-_]).

Code

Preamble:

:- use_module(library(mcintyre)). :- if(current_predicate(use_rendering/1)). :- use_rendering(c3). :- endif. :- mc. :- begin_lpad.

Given the number of coupons N, T is the number of boxes to get all coupons.

coupons(N,T):- length(CP,N), CPTerm=..[cp|CP], new_coupon(N,CPTerm,0,N,T).

If 0 coupons remain to be collected, the collection ends.

new_coupon(0,_CP,T,_N,T).

If N0 coupons remain to be collected, collect one and set to 1 the cell of the CP array (stored as a term with arity N).

new_coupon(N0,CP,T0,N,T):- N0>0, collect(CP,N,T0,T1), N1 is N0-1, new_coupon(N1,CP,T1,N,T).

Collect one new coupon and update the number of boxes bought.

collect(CP,N,T0,T):- pick_a_box(T0,N,I), T1 is T0+1, arg(I,CP,CPI), (var(CPI)-> CPI=1, T=T1 ; collect(CP,N,T1,T) ). pick_a_box(_,N,I):uniform(I,L):-numlist(1, N, L). :-end_lpad.

Given the number of stickers N and the size of a pack P, T is the number of packs to get all stickers.

stickers(N,P,T):- coupons(N,T0), T is ceiling(T0/P).

Plot the distribution of the number of boxes to collect N coupons by taking Samples samples.

dist(N,Samples,Chart):- mc_sample_arg_first(coupons(N,T),Samples,T,L), density(L,Chart,[nbins(10)]).

Plot the dependency of the number of boxes from the number of coupons.

boxes_vs_N(MaxN,Step,Chart):- NSteps is ceil((MaxN-1)/Step), findall(E,( between(0,NSteps,N0),N is 1+N0*Step,mc_expectation(coupons(N,T),10,T,E) ),V), findall(N,(between(0,NSteps,N0),N is 1+N0*Step),X), findall(NlogN,(between(0,NSteps,N0),N is 1+N0*Step,NlogN is 1+1.2*N*log(N)),NLN), Chart=c3{data:_{x:x, columns:[[x|X],['Expected number of boxes'|V],['1+1.2NlogN'|NLN]]}}.

## References [1] Kaminski, B. L., Katoen, J.-P., Matheja, C., Olmedo, F., Weakest precondition reasoning for expected run-times of probabilistic programs. Programming languages and systems, LNCS, vol. 9632, pp. 364-389, Springer, Heidelberg (2016)