# Google's Protocol Buffers {#protobufs-main}

## Overview {#protobufs-overview}

Protocol Buffers ("protobufs") are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data -- think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, in the form of a template that describes the data structure. You then use this template to encode your data into wire streams that can be sent to, or read from, your peers. The underlying wire stream is platform independent and lossless, and can be used to interwork with a variety of languages and systems regardless of word size or endianness. Techniques exist to safely extend your data structure without breaking deployed programs that were compiled against the "old" format.

See https://developers.google.com/protocol-buffers

The idea behind Google's Protocol Buffers is that you define your structured messages using a domain-specific language, in the form of a ".proto" source file. You pass this file through a Google-provided tool that generates source code for a target language, creating an interpreter that can encode and decode your structured data. You then compile and build this interpreter into your application program. Depending on the platform, the underlying runtime support is provided by a Google-supplied library that is also bound into your program.

## Processing protobufs with Prolog {#protobufs-processing-with-prolog}

There are two ways you can use protobufs in Prolog:

  - with a compiled ".proto" file and protobuf_parse_from_codes/3 and protobuf_serialize_to_codes/3; or
  - with the lower-level interface protobuf_message/2, which allows you to define your own domain-specific language for parsing and serializing protobufs.

## protoc {#protobufs-protoc}

A protobuf ".proto" file can be processed by the protobuf compiler (=protoc=), using a Prolog-specific plugin.
You can do this by either adding =|/usr/lib/swi-prolog/library/protobufs|= to your =PATH= or by specifying the option =|--plugin=protoc-gen-swipl=/usr/lib/swi-prolog/library/protobufs/protoc-gen-swipl|=. You specify where the generated files go with the =|--swipl_out|= option, which must name an existing directory.

When using =protoc=, it's important to specify the =|--proto_path|= (or =|-I|=) option and the files correctly. The proto path gives a list of source "roots", and the files are specified relative to those roots. If you want to include the current directory, you must specify it explicitly (e.g., =|protoc -I. --swipl_out=. foo.proto|=). For example, when bootstrapping the "swipl" plugin, these are used:

~~~{.sh}
protoc -I/usr/include --swipl_out=gen_pb google/protobuf/descriptor.proto google/protobuf/compiler/plugin.proto
~~~

which creates these files:

~~~
gen_pb/google/protobuf/descriptor_pb.pl
gen_pb/google/protobuf/compiler/plugin_pb.pl
~~~

The =|plugin_pb|= module is loaded by:

~~~{.pl}
:- use_module(gen_pb/google/protobuf/compiler/plugin_pb).
~~~

and it contains this directive (the path is relative to the importing file):

~~~{.pl}
:- use_module('../descriptor_pb').
~~~

Each =X.proto= file generates an =X_pb.pl= file in the directory specified by =|--swipl_out|=. The file contains a module named =X=, some debugging information, and meta-data facts that go into the =protobufs= module (all the facts start with "=|proto_meta_|="). protobuf_parse_from_codes/3 uses these facts to parse the wire form of the message into a Prolog term, and protobuf_serialize_to_codes/3 uses them to serialize the data to wire form. The generated code does not rely on any Google-supplied code.

You must compile all the ".proto" files separately, but you only need to load the top-level generated file; it contains the necessary load directives for the files it uses.

You can find out the dependencies for a .proto file by running =|PATH="$PATH:/usr/lib/swipl/library/protobufs" protoc -I... --dependency_out=FILE --swipl_out=. SRC.proto|=

### protobuf_serialize_to_codes/3 {#protobufs-serialize-to-codes}

The Prolog term corresponding to a protobuf =message= is a [dict](), with the keys corresponding to the field names in the =message=. Repeated fields are represented as lists; enums are looked up and converted to atoms; bools are represented by =false= and =true=; strings are represented by Prolog strings or atoms; bytes are represented by lists of codes.

TODO: Add an option to omit default values (this is the =proto3= behavior).

When serializing, the dict tag is treated as a comment and ignored, so you can use any dict tags when creating data for output. For example, both of these generate the same output:

~~~{.pl}
protobuf_serialize_to_codes(_{people:[_{id:1234,name:"John Doe"}]}, 'tutorial.AddressBook', WireCodes).

protobuf_serialize_to_codes('tutorial.AddressBook'{people:['tutorial.Person'{name:"John Doe",id:1234}]}, 'tutorial.AddressBook', WireCodes).
~~~

NOTE: if the term can't be serialized, protobuf_serialize_to_codes/3 fails. One common cause is an incorrect field name. Typically, this shows up in a call to protobufs:field_segment/3, when protobufs:proto_meta_field_name/4 fails.

### protobuf_parse_from_codes/3 {#protobufs-parse-from-codes}

This is the inverse of protobuf_serialize_to_codes/3: it takes a wire stream (list of codes) and creates a [dict](). The dict tags are the fully qualified names of the messages. Repeated fields that aren't in the wire stream get set to the value =|[]|=; other fields that aren't in the wire stream get their default value (typically the empty string or zero, depending on type). Embedded messages and groups are omitted if not in the wire stream; you can test for their presence using get_dict/3.
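The presence test mentioned above can be sketched as follows. To keep the example runnable without generated =|_pb.pl|= files, the parsed terms are written by hand as dicts shaped like protobuf_parse_from_codes/3 output for a hypothetical message =|'demo.Person'|= with an optional embedded message field =address=. The message names, field names, and the helper predicates are all assumptions made for illustration.

```prolog
% embedded_status(+Dict, +Field, -Status) is a hypothetical helper.
% An embedded message that is absent from the wire stream is omitted
% from the parsed dict, so get_dict/3 fails for its key.
embedded_status(Dict, Field, Status) :-
    (   get_dict(Field, Dict, _Value)
    ->  Status = present
    ;   Status = absent
    ).

% Hand-written stand-ins for parsed messages (hypothetical names).
% Scalar fields would always be present, because absent ones get
% their default value; only the embedded message may be missing.
example_person(without_address,
               'demo.Person'{name:"John Doe", id:1234}).
example_person(with_address,
               'demo.Person'{name:"Jane Roe", id:42,
                             address:'demo.Address'{city:"Springfield"}}).
```

For example, =|?- example_person(with_address, P), embedded_status(P, address, S).|= binds =S= to =present=, while the =without_address= variant yields =absent=. With a real parsed term, you would call embedded_status/3 on the dict returned by protobuf_parse_from_codes/3 instead.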
Enums are looked up and converted to atoms; bools are represented by =false= and =true=; strings are represented by Prolog strings (not atoms); bytes are represented by lists of codes.

There is no mechanism for determining whether a scalar or repeated field was in the wire stream or not (that is, there is no equivalent of the Python implementation's =|HasField|=).

The "oneof" feature causes slightly different behavior. Only the field that's in the wire stream gets set; the other fields are omitted. If none of the fields in the "oneof" is set, then none of them appears. You can check which field is set by using get_dict/3.

Currently, there is no special support for the protobuf "map" feature; a map field is treated as an ordinary message field. The convenience predicates protobuf_field_is_map/3 and protobuf_map_pairs/3 can be used to convert between a "map" field and a key-value list, which gives you the freedom to use any kind of association list for the map. See also [Issue #12](https://github.com/SWI-Prolog/contrib-protobufs/issues/12).

For example:

~~~{.c}
message MapMessage {
  map<string, sint64> number_ints = 5;
}
~~~

is treated as if it were:

~~~{.c}
message MapMessage {
  message KeyValue {
    optional string Key = 1;
    optional sint64 Value = 2;
  }
  repeated KeyValue number_ints = 5;
}
~~~

You can handle this on input by:

~~~{.pl}
protobuf_parse_from_codes(WireCodes, 'MapMessage', Term),
protobuf_map_pairs(Term.number_ints, _, Pairs).
~~~

and on output by:

~~~{.pl}
protobuf_map_pairs(TermNumberInts, _, Pairs),
protobuf_serialize_to_codes(_{number_ints:TermNumberInts}, 'MapMessage', WireCodes).
~~~

### addressbook example {#protobufs-addressbook-example}

The Google documentation has a tutorial example of a simple address book: https://developers.google.com/protocol-buffers/docs/tutorials

The Prolog equivalent is in =|/usr/lib/swi-prolog/doc/packages/examples/protobufs/interop/addressbook.pl|= and you can run it with =|make run_addressbook|=, which runs =|protoc|= to generate the =|_pb.pl|= files and then runs the example.
The resulting file is =|addressbook.wire|=.

## The low-level SWI-Prolog Implementation {#protobufs-swipl}

For most users, protobuf_serialize_to_codes/3 and protobuf_parse_from_codes/3 suffice. However, if you need greater control, or wish to define your own domain-specific language that maps to protobufs, you can use protobuf_message/2.

The wire stream interpreter is implemented as a Definite Clause Grammar (DCG). It has a small underlying C support library that is loaded when the Prolog module loads. This implementation does not depend on any code provided by Google and thus is not bound by its license terms.

On the Prolog side, you define your message template as a list of predefined Prolog terms that correspond to production rules in the DCG. The process is not unlike specifying the format of a regular expression. To encode a message =X= to a wire stream =Y=, you pass a ground template =X= and a variable =Y= to protobuf_message/2. To decode a wire stream =Y= to a template =X=, you pass a non-ground template =X= along with a ground wire stream =Y= to protobuf_message/2. The interpreter unifies the unbound variables in the template with the values decoded from the wire stream.
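A minimal round-trip sketch of the two directions follows. It assumes a standard SWI-Prolog installation where the protobufs package loads as =|library(protobufs)|=; the predicate name roundtrip_demo/3 is hypothetical. The first call passes a ground template and an unbound wire stream (encode); the second passes the same tags and types with unbound values (decode).

```prolog
:- use_module(library(protobufs)).

% roundtrip_demo(-Wire, -Number, -Text) is a hypothetical helper.
% Field 1 is an unsigned varint and field 2 is a length-delimited
% string, so Wire should be [8,100,18,4,97,98,99,100]:
%   8  = (1 << 3) \/ 0  (tag 1, varint), then the value 100;
%   18 = (2 << 3) \/ 2  (tag 2, length-delimited), length 4, "abcd".
roundtrip_demo(Wire, Number, Text) :-
    % Encode: ground template, unbound wire stream.
    protobuf_message(protobuf([unsigned(1, 100), string(2, "abcd")]), Wire),
    % Decode: same tags and types, but the values are variables.
    protobuf_message(protobuf([unsigned(1, Number), string(2, Text)]), Wire).
```

Running =|?- roundtrip_demo(W, N, T).|= should bind =N= to 100 and =T= to the text "abcd". The decoding template must list the same tags and types as the encoder used; the values are what the interpreter fills in.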
An example template is:

```prolog
protobuf([
    unsigned(1, 100),
    string(2, "abcd"),
    repeated(3, atom([foo, bar])),
    boolean(4, true),
    embedded(5, protobuf([integer(1, -666), string(2, "negative 666")])),
    repeated(6, embedded([
        protobuf([integer(1, 1234), string(2, "onetwothreefour")]),
        protobuf([integer(1, 2222), string(2, "four twos")])])),
    repeated(7, integer([1,2,3,4])),
    packed(8, integer([5,6,7,8]))
])
```

This corresponds to a message created with this .proto definition (using proto2 syntax):

```
syntax = "proto2";

package my.protobuf;

message SomeMessage {
  optional int32 first = 1;  // example template also works with int64, uint32, uint64
  optional string second = 2;
  repeated string third = 3;
  optional bool fourth = 4;
  message NestedMessage {
    optional sint32 value = 1;
    optional string text = 2;
  }
  optional NestedMessage fifth = 5;
  repeated NestedMessage sixth = 6;
  repeated sint32 seventh = 7;
  repeated sint32 eighth = 8 [packed=true];
}
```

The wire format message can be displayed:

```
$ protoc --decode=my.protobuf.SomeMessage some_message.proto
```

```prolog
    { my_message_sequence(Type, Value, Proto) },
    protobufs:message_sequence(embedded, Tag, Proto), !.

%
% On encode, the value type determines the tag. And on decode
% the tag determines the value type.
%

guard(Type, Value) :-
    (   nonvar(Value)
    ->  is_of_type(Type, Value)
    ;   true
    ).

my_message_sequence(kv_pair, Key=Value, Proto) :-
    Proto = protobuf([atom(30, Key), X]),
    (   ( guard(integer, Value), X = integer(31, Value) )
    ;   ( guard(float, Value),   X = double(32, Value) )
    ;   ( guard(atom, Value),    X = atom(33, Value) )
    ).

my_message_sequence(xml_element, element(Name, Attributes, Contents), Proto) :-
    Proto = protobuf([ atom(21, Name),
                       repeated(22, kv_pair(Attributes)),
                       repeated(23, aux_xml_element(Contents))]).

my_message_sequence(aux_xml_element, Contents, Proto) :-
    Contents = element(_Name, _Attributes, _ElementContents),
    Proto = protobuf([xml_element(40, Contents)]).
my_message_sequence(aux_xml_element, Contents, Proto) :-
    Proto = protobuf([atom(43, Contents)]).

xml_proto([element(space1,
                   [foo='1', bar='2'],
                   [fum,
                    bar,
                    element(space2,
                            [fum=3.1415, bum= -14],
                            ['more stuff for you']),
                    element(space2b,
                            [],
                            [this, is, embedded, also]),
                    to,
                    you])]).

test_xml(X, Y) :-
    Proto = protobuf([repeated(20, xml_element(X))]),
    protobuf_message(Proto, Y).

% And test it:

?- xml_proto(X), test_xml(X,Y), test_xml(Z,Y), Z == X.
X = Z, Z = [element(space1, [foo='1', bar='2'], [fum, bar, element(space2, [fum=3.1415, bum= -14], ['more stuff for you']), element(space2b, [], [this, is|...]), to, you])],
Y = [162, 1, 193, 1, 170, 1, 6, 115, 112|...],
```

A protobuf description that is compatible with the above wire stream follows:

```
message kv_pair {
  required string key = 30;
  optional sint64 int_value = 31;
  optional double float_value = 32;
  optional string atom_value = 33;
}

message aux_xml_element {
  optional string atom = 43;
  optional xml_element element = 40;
}

message xml_element {
  required string name = 21;
  repeated kv_pair attributes = 22;
  repeated aux_xml_element contents = 23;
}

message XMLFile {
  repeated xml_element elements = 20;
}
```

Verify the wire stream using the protobuf compiler's decoder:

```
$ protoc --decode=XMLFile pb_vector.proto
```

```prolog
(0,0).

:- op(950, xfy, ~>).

~>(P, Q) :-
    setup_call_cleanup(P, (true; fail), assertion(Q)).

write_as_proto(Vector) :-
    vector(Vector, WireStream),
    open('tmp99.tmp', write, S, [encoding(octet),type(binary)])
        ~> close(S),
    format(S, '~s', [WireStream]), !.

testv1(V) :-
    read_file_to_codes('tmp99.tmp', Codes, [encoding(octet),type(binary)]),
    vector(V, Codes).
```

Run the Prolog side:

```prolog
?- X is pi,
   write_as_proto(double([-2.2212, -7.6675, X, 0, 1.77e-9, 2.54e222])).
X = 3.14159.

?- testv1(Vector).
Vector = double([-2.2212, -7.6675, 3.14159, 0.0, 1.77e-09, 2.54e+222])

?-
```

Verify the wire stream using the protobuf compiler's decoder:

```
$ protoc --decode=Vector pb_vector.proto