# Implementation of SIATEC pattern discovery By Samer Abdallah David Meredith's algorithm is described in [1]. This is a Prolog implementation. First, the groundwork - vec_point_point/4 is a tabled predicate that, for each point database (unary predicate) DB, contains every pair of points in the database and the vector between them. Only distinct pairs are stored, with the second being strictly later in the standard order of terms. Note that add/3 is defined using constraints, so that it can also be used to subtract as well, but only for integer values. We're also going to use SWISHs C3 term rendering to plot point sets so we'll set that up too. [1] David Meredith, Kjell Lemstr ̈om, and Geraint A Wiggins. Algorithms for discovering repeated patterns in multidimensional representations of polyphonic music. Journal of New Music Research, 31(4):321–345, 2002. http://www.tandfonline.com/doi/abs/10.1076/jnmr.31.4.321.14162
:- use_module(library(clpfd)). :- table vec_point_point/4. vec_point_point(DB,V,P1,P2) :- call(DB,P1), call(DB,P2), P2 @> P1, vec_add(V,P1,P2). vec_add(DR,P1,P2) :- maplist(add,DR,P1,P2). add(DX,X1,X2) :- X2 #= X1 + DX. member_of(Xs,X) :- member(X,Xs). % for literal database predicate.
vec_point_point(member_of([[1,1],[2,1],[2,2]]), V, P1, P2).
Next, mtp/3 computes the maximal translatable patterns (MTPs) for a point set. It's simply the set of start points for each distinct vector in vec_point_point/4.
%% mtp(+DB:pred(-point), -V:vector, -Ps:list(point)) is nondet. mtp(DB,V,Points) :- setof(P1, P2^vec_point_point(DB,V,P1,P2), Points).
mtp(member_of([[1,1],[2,1],[2,2],[3,2]]), Vec, Points).
Finally, siatec/2 computes the translational equivalence class for each distinct shape of MTP. The shape of an MTP, computed using pattern_shape/2, is represented as the list of vectors from the first point to each of the others, and relies on the fact that mtp/3 returns point lists in the standard order. Since vec_point_point/4 is asymmetric and only contains vectors in the right half-plane, we have to use it in both directions to get all the translationally equivalent patterns for each MTP.
%% siatec(-TECs:list(list(point))) is nondet. siatec(DB,Sets) :- distinct(H, (mtp(DB,_,P), pattern_shape(P,H))), setof(Q, (Q=P; trans(DB,P,Q); trans(DB,Q,P)), Sets). pattern_shape([P1|Ps],Vs) :- maplist(vec_diff(P1),Ps,Vs). vec_diff(P1,P2,DR) :- maplist(add,DR,P1,P2). trans(DB,S0,S1) :- maplist(vec_point_point(DB,_), S0, S1).
siatec(member_of([[1,1],[2,1],[2,2],[3,2],[1,3],[2,3]]), Patterns).
Now we can test siatec/2 for larger point-sets by using a random point set predicate that relies on a tabled Bernoulli generator to implement random world semantics. Note that the sample query analyses a new point set each time it is run only because the table for bernoulli/2 is implicitly abolished when submitting the query to the Prolog server. We'll also add some plotting code.
random_point(W,H,P,[X,Y]) :- between(1,W,X), between(1,H,Y), bernoulli(t(W,H,X,Y),P). :- table bernoulli/2. bernoulli(_,P) :- random(U), U<P. :- use_rendering(c3). patterns_plot(Type, Patterns, c3{data:_{ xs:XS, type:Type, columns:Cols}, size:_{width:400,height:300}}) :- foldl(pattern_label, Patterns, Names, 0, _), foldl(c3_add_points, Names, Patterns, _{}-Cols, XS-[]). pattern_label(Pattern, Label, I, J) :- succ(I,J), length(Pattern,NPoints), format(atom(Label), '[#~w:~d]', [J,NPoints]). c3_add_points(Name, Points, XS-[[NameX|Xs],[Name|Ys] | Cols], XS1-Cols) :- atom_concat(Name, '_x', NameX), XS1 = XS.put(Name,NameX), maplist(vec2d,Xs,Ys,Points). % It's annoying that these C3 plots are not very composable. % We can worry about that another day... scatter_plot(Name, XLabel-XData, YLabel-YData, Plot) :- XS = _{}.put(Name,x), Plot = c3{data:_{type:scatter, xs:XS, columns:[[x|XData],[Name|YData]]}, axis:_{x:_{label:XLabel, tick:_{fit:false}}, y:_{label:YLabel}}}. concat_lists(Lists, List) :- foldl(append,Lists,[],List). vec2d(X,Y,[X,Y]).
Let's apply siatec/2 to some random point sets, doing a separate plot of each TEC, with points grouped into individual patterns. (This query should already have produced the first 10 solutions on loading the page.)
siatec(random_point(10,5,0.2), TEC), patterns_plot(line,TEC, Plot).
Now we're going to look at the time complexity of this implementation as a function of the number of points in the point set. Meredith states that the overall complexity of his implementation is O(kN^3), where N is the number of points and k is the dimensionality of the points (2 in our case). We'll write a predicate to test the algorithm on a random point cloud in a space of a given size and density, returning the number of points in the cloud and the time (or cputime or inferences) need to compute all the TECs.
time_op(Stat, Op, Prob, H, W, NPoints, Time) :- abolish_all_tables, findall(P, random_point(W,H,Prob,P), Points), length(Points, NPoints), time_state(Stat, T0), run_op(Op,random_point(W,H,Prob)), time_state(Stat, T1), Time is T1 - T0. run_op(all_siatec, DB) :- findall(TEC, siatec(DB, TEC), _). run_op(all_mtp, DB) :- findall(Vec, mtp(DB, Vec, _), _). time_state(time,A) :- !, get_time(A). time_state(S,A) :- statistics(S,A). log(X,Y) :- Y is log(X).
Finally, we run the tests on random point clounds of fixed density and height, but a range of widths, plotting the results logarithmically on both axes, along with a guide line showing the slope of the expected cubic performance. This takes a while to run.
projection([Plot]), numlist(1,8,_I), maplist([X,Y]>>(Y is 5*X), _I, _Widths), maplist(time_op(cputime, all_siatec, 0.25, 12), _Widths, NPs, Times), maplist(log, Times, _LogTimes), maplist(log, NPs, _LogNPs), maplist([X,Y]>>(Y is 3*log(X)-12), NPs, _LogCubicTimes), Plot = c3{data:_{types:_{time:scatter, cubic:line}, xs:_{time:points, cubic:points}, columns:[[points|_LogNPs],[time|_LogTimes], [cubic|_LogCubicTimes]]}, axis:_{x:_{label:"log N", tick:_{fit:false}}, y:_{label:"log cputime"}}}.
Looks pretty cubic to me! The cubic runtime is supposed to be the result of the last step of SIATEC, which is to find all the translations of a given shape. Let's check that the complexity of mtp/3 is the expected O(k N^2 log N). We'll plot the run time to find all the MTPs divided by the N^2 log N, where N is the number of points.
projection([Plot]), numlist(2,50,_I), maplist([X,Y]>>(Y is 2*X), _I, _Widths), maplist(time_op(cputime, all_mtp, 0.25, 12), _Widths, NPs, Times), maplist([N,T,Y]>>(Y is T/(log(N)*N^2)), NPs, Times, _Ys), scatter_plot('random_point({4..100},12,0.25)', "N"-NPs, "cputime / (N^2 log N)"-_Ys, Plot).
Well shiver-me-timbers, it looks like we're doing better than N^2 log(N). If you delete the factor of log(N) in the mapping from Times to _Ys, you'll see the complexity is more like just N^2. How can this be? I don't know. Perhaps I will look into it another day. For a final test, let's look at how the number of MTPs goes up in relation to the number of points.
width_npoints_nmtps(W, NPoints, NMTPs) :- aggregate_all(count, random_point(W,12,0.25,_), NPoints), aggregate_all(count, mtp(random_point(W,12,0.25), _, _), NMTPs).
projection([Plot]), numlist(1,30,_I), maplist([X,Y]>>(Y is 3*X), _I, _Widths), maplist(width_npoints_nmtps, _Widths, NPs, NMTPs), scatter_plot('random_point({3..90},12,0.25)', "number of points"-NPs, "number of MTPs"-NMTPs, Plot).
Ok, so it looks vaguely linear, so we can conclude that the part of SIATEC which involves finding all the translationally equivalent patterns of a given MTP is approximately quadratic in the number of points. ### A note on performance Although the peformance of this implementation is of the same O-order as Meredith's implementation, the constant factors are not! SWI Prolog is not known for speed, and in this case, siatec/2 is about 60 times slower than Meredith's reported results, even on a much newer and faster computer - the original was written in C and ran on a 500MHz Sparc machine. Other Prologs are known to be faster than SWI: e.g. YAP was tested and found to be about 3 times faster than SWI on this program. When I tried B-Prolog it pulled a blinder and appeared to be about 60 to 180 times faster than SWI (I say 'appeared' because it also crashed a lot). It analysed 1000 points in 31s! This makes it about 3 times faster than Meredith's tests, though of course my computer (2012 Macbook Pro) is probably about 10 times faster than the one he used in 2002! Still, the Prolog version weighs in at about 12 lines of code, as compared with about 110 lines of pseudocode in the original paper and Lord knows how much C code, so I think something has been gained.