Published:
Version: 0.1.0

RDF/CBOR: A CBOR-based Serialization of RDF

Abstract

This specification defines a binary serialization of RDF based on CBOR. Like CBOR the serialization is optimized for small code size and fairly small message size. The serialization is suitable for systems and devices that are possibly constrained in terms of network or computation.

By re-using the existing and well-defined CBOR data model, RDF terms can be efficiently encoded. In particular, mappings of common RDF literal datatypes into a binary CBOR representation are defined. Furthermore we use compression techniques such as Incremental Encoding and Bitmap Triples to make the serialization compact.

The serialization is defined using the Concise Data Definition Language (CDDL). This allows a precise and concise definition, enabling wide implementation and usage.

We also describe how the serialization can be used to make RDF content-addressable. A group of RDF statements can then be addressed by a unique identifier determined exactly by the contents of the statements. This allows RDF data to be made available more robustly and enables the use of RDF in decentralized systems.

1. Introduction

The Resource Description Framework (RDF) [RDF] is a data model for structured content. The data is modeled as a graph with nodes and edges labeled with unique identifiers that are at the same time references to further data (Internationalized Resource Identifiers [RFC3987]).

Together with foundational principles such as the open-world assumption (no single agent has complete knowledge) and the unique name assumption (the same thing has the same name regardless of context) this graph structure makes RDF well-suited for decentralized systems.

Nevertheless, RDF and the Semantic Web (the vision behind RDF) have failed to provide a robust and decentralized foundation. The reasons for this have been argued to include low availability of content and challenges in scaling systems that query large data sets [Polleres20].

We would like to take the counterpoint and argue that the reason why RDF has failed to provide a robust and decentralized foundation is not the challenge in scaling to large systems, but to small systems. RDF/CBOR is an attempt to make RDF usable by decentralized systems that exchange small pieces of content that are aggregated locally. An example of such a system is the network of servers speaking the ActivityPub protocol [ActivityPub].

We acknowledge the diversity of actors and devices that generate and consume content and emphasize the necessity for a well-defined encoding as well as an encoding that can be re-implemented.

The Concise Binary Object Representation (CBOR) [RFC8949] is a binary data serialization that provides basic data types (string, integer, arrays, etc.) as well as extendable tags for annotating more complex data types. By using CBOR we can re-use the already defined data types and tags. Implementations of CBOR exist for a wide range of languages and platforms and can be re-used to implement RDF/CBOR. Furthermore, we use the Concise Data Definition Language (CDDL) [RFC8610] which allows a concise and unambiguous description of CBOR data structures used in RDF/CBOR.

Finally, RDF/CBOR allows content-addressing. A group of RDF statements can be identified by a unique identifier that is determined by the content of the statements itself. This enables caching and duplicating to make content available more robustly.

1.1. Objectives

The objectives and requirements of RDF/CBOR are (roughly in order):

1.2. Related Work

The specified serialization takes much inspiration from the HDT serialization [HDT]. HDT is a binary serialization of RDF optimized for large data sets and allowing in-place queries without loading the entire content. RDF/CBOR uses the same encoding of RDF terms into a dictionary (see Section 3.1) and triples (see Section 3.2). RDF/CBOR does not have a headers section for meta-data. Unlike HDT, RDF/CBOR uses variable length encoding of data items (by using CBOR). This can make the binary representation more compact, but prevents random-access queries and in-place queries.

RDF/CBOR can be used for stream processing of RDF triples. When encoding large data sets, triples are packed into smaller groups that can be decoded independently. This is inspired by the ERI serialization [ERI]. Unlike ERI we don't provide a generic and abstract model, but a concrete encoding. The term molecule is adopted from ERI.

RDF/CBOR is an improvement on previous attempts at using CBOR for RDF [Sahlmann2018]. We attempt to combine the compression techniques from HDT with the built-in datatypes provided by CBOR.

CBOR-LD is a CBOR-based serialization for Linked Data. CBOR-LD is based on the JSON-LD serialization [JSON-LD] and requires the JSON-LD processing algorithms. This limits usability of the serialization on constrained devices. Furthermore, CBOR-LD uses JSON-LD contexts (user-defined and possibly remote schemas) that are required when decoding to RDF. This makes CBOR-LD unsuitable for systems with limited connectivity.

Previous work on signing RDF data includes [Tummarello05] and [TrustyURIs]. They both use existing serializations (N-Triples) that are not designed to have canonical representations. RDF/CBOR provides a canonical representation, making the content-addressing and signing scheme more robust. A W3C working group has recently been established to develop a canonical representation of RDF (RDF Dataset Canonicalization and Hash Working Group Charter).

This work is based on a previous paper titled Content-addressable RDF.

1.3. Overview

In Section 2 we present an encoding of RDF terms (IRIs, literals and blank nodes) to CBOR. This uses existing CBOR datatypes and tags, allowing a fairly compact and straightforward encoding.

The core idea of RDF/CBOR is to split large sets of RDF triples into smaller groups and to encode such groups individually. The smaller groups are called molecules and are encoded using some compression tricks. Section 3 describes the encoding of molecules.

In Section 4 we describe how the encoding of molecules can be used to make RDF content-addressable. This requires defining a suitable grouping of triples as well as a canonical serialization based on the serialization of molecules.

Section 5 describes how multiple molecules can be combined into a stream. This is a relatively straightforward construction but one that permits future optimizations.

We conclude with some final remarks and an outlook on what could be possible in Section 7.

Examples of encoded content are provided in Appendix A.

1.4. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

The encoding is defined using the Concise Data Definition Language (CDDL) [RFC8610]. A basic understanding of CBOR and CDDL is required to read this document.

2. CBOR encoding of RDF Terms

An RDF triple consists of three components: subject, predicate and object. The components are IRIs [RFC3987], literals or blank nodes. Collectively IRIs, literals and blank nodes are called RDF terms. In this section we present an encoding of individual terms to CBOR.

The encoding of RDF terms is defined with the CDDL rules iri, literal and blank-node:

rdf-term = iri / literal / blank-node
Figure 1: Encoding of RDF Terms

The rules iri, literal and blank-node are described in the following sections.

2.1. IRIs

In general IRIs [RFC3987] can be encoded as CBOR text strings. Some IRIs (in particular URNs [RFC2141]) represent binary identifiers. For two kinds of such binary URNs we define specialized encodings that map the URNs directly to a binary CBOR representation (for UUID and ERIS URNs). This allows such binary URNs to be encoded much more efficiently.

The CDDL rule for encoding IRIs is:

iri = generic-iri / binary-urn

generic-iri = #6.266(tstr)
Figure 2: Encoding of IRIs

For generic IRIs we use the tag 266 [IRI_CBOR].
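The generic-iri rule can be illustrated with a small, hand-written encoder. This is only a sketch: the helper names head, tstr and generic_iri are made up for this example and are not part of any library or of the reference implementation. It covers just the CBOR head encoding needed for tags and text strings.

```python
def head(major: int, argument: int) -> bytes:
    """Encode a CBOR head: major type plus argument (RFC 8949, section 3)."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def tstr(text: str) -> bytes:
    """CBOR text string (major type 3), UTF-8 encoded."""
    data = text.encode("utf-8")
    return head(3, len(data)) + data


def generic_iri(iri: str) -> bytes:
    """generic-iri = #6.266(tstr): tag 266 (major type 6) wrapping a text string."""
    return head(6, 266) + tstr(iri)


print(generic_iri("https://example.com/").hex())
```

The printed bytes can be compared with the first row of the term examples in Appendix A.1.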

2.1.1. Binary URNs

We define CBOR encodings for UUID URNs [RFC4122] and ERIS URNs [ERIS] using the tags 37 [UUID_CBOR] and 276 respectively.

As the CBOR encodings can only encode URNs without fragment parts we introduce the fragment constructor tag 305. This can be used to construct binary URNs with fragment parts.

binary-urn = uuid-urn / eris-urn / fragment-binary-urn
uuid-urn = #6.37(bstr)
eris-urn = #6.276(bstr)

fragment-binary-urn =
    #6.305([uuid-urn, fragment-identifier])
  / #6.305([eris-urn, fragment-identifier])

fragment-identifier = tstr
Figure 3: Encoding of Binary URNs
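As a rough illustration of the rules in Figure 3, the following sketch encodes a UUID URN with tag 37 and wraps it in the fragment constructor tag 305 when a fragment part is present. The function names are hypothetical; ERIS URNs are left out because their binary form is defined in [ERIS] and not in this document.

```python
import uuid


def head(major: int, argument: int) -> bytes:
    """Encode a CBOR head: major type plus argument (RFC 8949, section 3)."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def bstr(data: bytes) -> bytes:
    return head(2, len(data)) + data


def tstr(text: str) -> bytes:
    data = text.encode("utf-8")
    return head(3, len(data)) + data


def uuid_urn(urn: str) -> bytes:
    """Encode 'urn:uuid:...' as #6.37(bstr); a fragment part adds #6.305([..., tstr])."""
    base, _, fragment = urn.partition("#")
    # uuid.UUID accepts the 'urn:uuid:' prefix and yields the 16 raw bytes.
    encoded = head(6, 37) + bstr(uuid.UUID(base).bytes)
    if fragment:
        return head(6, 305) + head(4, 2) + encoded + tstr(fragment)
    return encoded


print(uuid_urn("urn:uuid:1da600cf-c852-469a-936f-e608d3d90d9b").hex())
print(uuid_urn("urn:uuid:1da600cf-c852-469a-936f-e608d3d90d9b#a").hex())
```

A 45 character URN shrinks to 19 bytes this way, which is where the efficiency gain mentioned in Section 2.1 comes from.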

2.2. Literals

literal = lang-string
/ xsd-string
/ xsd-boolean
/ xsd-integer
/ xsd-float
/ xsd-double
/ xsd-datetime
/ xsd-hexbinary
/ xsd-base64binary
/ generic-literal
Figure 4: Encoding of RDF literals

2.2.1. Language-tagged strings

A language-tagged string literal with datatype http://www.w3.org/1999/02/22-rdf-syntax-ns#langString is encoded using CBOR tag 38 [draft-ietf-core-problem-details-08]:

lang-string = #6.38([lang-tag, tstr])
lang-tag = tstr
Figure 5: Encoding of language-tagged strings

Note that the CBOR tag 38 allows a third element to indicate text direction. We do not use such a third element as this can not be mapped to RDF language-tagged strings.
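A minimal sketch of the lang-string rule, assuming hand-written helpers (head, tstr and lang_string are names invented for this example). The array always has exactly two elements, since the optional direction element of tag 38 is not used.

```python
def head(major: int, argument: int) -> bytes:
    """Encode a CBOR head: major type plus argument (RFC 8949, section 3)."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def tstr(text: str) -> bytes:
    data = text.encode("utf-8")
    return head(3, len(data)) + data


def lang_string(lang_tag: str, text: str) -> bytes:
    """lang-string = #6.38([lang-tag, tstr]) with a two-element array."""
    return head(6, 38) + head(4, 2) + tstr(lang_tag) + tstr(text)


print(lang_string("en", "Hello World!").hex())
```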

2.2.2. XSD Datatypes

2.2.2.1. xsd:string

A literal with datatype xsd:string is encoded as a CBOR text string:

xsd-string = tstr
Figure 6: Encoding of xsd:string

2.2.2.2. xsd:boolean

A literal with datatype xsd:boolean is encoded as a CBOR boolean:

xsd-boolean = bool
Figure 7: Encoding of xsd:boolean

2.2.2.3. xsd:integer

A literal with datatype xsd:integer is encoded as a CBOR integer or as a Bignum if the integer is larger than what can be expressed in 64 bits (see section 3.4.3 of [RFC8949]):

xsd-integer = int / bignum
bignum = #6.2(bstr) / #6.3(bstr)
Figure 8: Encoding of xsd:integer

Serialization of bignums MUST leave out any leading zeroes.
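The choice between a plain CBOR integer and a bignum can be sketched as follows. The function name xsd_integer is an assumption of this example. Values that fit into the 64-bit argument of major types 0 and 1 are encoded directly; everything else becomes tag 2 or tag 3 with a byte string that carries no leading zero bytes.

```python
def head(major: int, argument: int) -> bytes:
    """Encode a CBOR head: major type plus argument (RFC 8949, section 3)."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def xsd_integer(n: int) -> bytes:
    """xsd-integer = int / bignum."""
    if 0 <= n < 2**64:
        return head(0, n)               # unsigned integer
    if -(2**64) <= n < 0:
        return head(1, -1 - n)          # negative integer, argument is -1 - n
    # Bignum: tag 2 for positive, tag 3 for negative values (RFC 8949, section 3.4.3).
    tag, magnitude = (2, n) if n > 0 else (3, -1 - n)
    # Using the minimal byte length guarantees there are no leading zero bytes.
    data = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    return head(6, tag) + head(2, len(data)) + data


print(xsd_integer(42).hex())
print(xsd_integer(2**64).hex())
```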

2.2.2.4. xsd:float

A literal with datatype xsd:float is encoded as a CBOR single-precision float:

xsd-float = float32
Figure 9: Encoding of xsd:float

2.2.2.5. xsd:double

A literal with datatype xsd:double is encoded as a CBOR double-precision float:

xsd-double = float64
Figure 10: Encoding of xsd:double

2.2.2.6. xsd:dateTime

A literal with datatype xsd:dateTime is encoded using CBOR tag 0 followed by a text string in the standard format described by the date-time production in [RFC3339]:

xsd-datetime = #6.0(tstr)
Figure 11: Encoding of xsd:dateTime

See also section 3.4.1 of [RFC8949].

2.2.2.7. xsd:hexBinary

A literal with datatype xsd:hexBinary is encoded using CBOR tag 23 followed by a binary string:

xsd-hexbinary = #6.23(bstr)
Figure 12: Encoding of xsd:hexBinary

See also section 3.4.5.2 of [RFC8949].

2.2.2.8. xsd:base64Binary

A literal with datatype xsd:base64Binary is encoded as a CBOR binary string:

xsd-base64binary = bstr
Figure 13: Encoding of xsd:base64Binary

Note that we do not use the tag 22 as defined in section 3.4.5.2 of [RFC8949] to explicitly mark conversion to Base64. Instead, we assume by default that binary strings correspond to Base64 encoded content. This makes the encoding more efficient.
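The two binary datatypes differ only in the presence of tag 23. A small sketch, with hypothetical function names, that turns the lexical forms into the corresponding CBOR byte strings:

```python
import base64


def head(major: int, argument: int) -> bytes:
    """Encode a CBOR head: major type plus argument (RFC 8949, section 3)."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def bstr(data: bytes) -> bytes:
    return head(2, len(data)) + data


def xsd_hexbinary(lexical: str) -> bytes:
    """xsd-hexbinary = #6.23(bstr): the hex digits are decoded to raw bytes."""
    return head(6, 23) + bstr(bytes.fromhex(lexical))


def xsd_base64binary(lexical: str) -> bytes:
    """xsd-base64binary = bstr: untagged, Base64 is the assumed default."""
    return bstr(base64.b64decode(lexical))


print(xsd_hexbinary("CAFE").hex())
print(xsd_base64binary("aGVsbG8=").hex())
```

In both cases the raw bytes are stored, so the encoded size does not depend on the verbosity of the lexical form.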

2.2.3. Generic Literal

For RDF literals with other datatypes we define the CBOR tag 303. The content of the tag is a CBOR array with exactly two items: the datatype IRI and the lexical form of the literal.

generic-literal = #6.303([literal-datatype, literal-lexical-form])
literal-datatype = iri
literal-lexical-form = tstr
Figure 14: Encoding of a generic literal
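A sketch of the generic-literal rule, assuming the datatype is given as a generic IRI (binary URNs as datatypes are not handled here). All function names are made up for this example.

```python
def head(major: int, argument: int) -> bytes:
    """Encode a CBOR head: major type plus argument (RFC 8949, section 3)."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def tstr(text: str) -> bytes:
    data = text.encode("utf-8")
    return head(3, len(data)) + data


def generic_literal(datatype_iri: str, lexical_form: str) -> bytes:
    """generic-literal = #6.303([literal-datatype, literal-lexical-form])."""
    datatype = head(6, 266) + tstr(datatype_iri)   # generic-iri, tag 266
    return head(6, 303) + head(4, 2) + datatype + tstr(lexical_form)


encoded = generic_literal(
    "http://www.opengis.net/ont/geosparql#wktLiteral",
    "POINT(7.9736903 47.5412464)",
)
print(encoded.hex())
```

The output can be checked against the wktLiteral row in Appendix A.1.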

2.3. Blank Nodes

The usage of blank nodes is discouraged. For legacy reasons an encoding is provided by defining the CBOR tag 304. The content of the tag is the blank node identifier as text string:

blank-node = #6.304(blank-node-identifier)
blank-node-identifier = tstr
Figure 15: Encoding of a blank node

3. RDF/CBOR Molecule

An RDF molecule is a group of RDF triples that are encoded together. Triples in a molecule use the same dictionary. Encoding a molecule requires all triples of the molecule to be available. When decoding, the entire molecule must be kept in memory to decode the RDF triples of the molecule.

Molecules allow large RDF datasets to be split into smaller groupings, enabling usage on constrained devices and allowing stream-processing of RDF data. RDF/CBOR molecules correspond exactly to the molecules defined in the ERI serialization [ERI].

Small molecules require less memory and processing capacity to encode and decode, whereas larger molecules allow more compact encodings. The most basic molecule is a single triple. More natural groupings are triples that share a common subject (subject-molecule as defined in [ERI]) or fragment-molecules (see Section 4.1). Users, libraries and applications MAY use any molecule grouping. It is RECOMMENDED to use fragment-molecules.

An encoded molecule consists of two sections:

  1. Dictionary: Encoding of all the terms that appear in the molecule. See Section 3.1.
  2. Triples: Encoding of the triples appearing in the molecule. Terms are represented with dictionary identifiers. See Section 3.2.

Concretely, a molecule is encoded as a CBOR array with 5 items:

molecule = [
    dictionary: dictionary,
    predicate-bitmap : uint / #6.2(bstr),
    predicates : [ * dictionary-reference ],
    object-bitmap : uint / #6.2(bstr),
    objects : [ * dictionary-reference ]
]

tagged-molecule = #6.301(molecule)
Figure 16: Encoding of RDF/CBOR Molecule

The items predicate-bitmap, predicates, object-bitmap and objects encode the triples of the molecule as bitmap triples. The meaning of the individual values is explained in Section 3.2.

A molecule MAY be tagged with the CBOR tag 301.

3.1. Dictionary of RDF Terms

The collection of RDF terms appearing in a molecule is called the vocabulary. A dictionary is a structure that encodes the vocabulary efficiently and assigns every term an integer identifier. The integer identifiers can then be used when encoding triples.

This allows more compact representation of the molecule by allowing the triple encoding to just use the integer identifiers of terms. Terms that appear multiple times in the molecule are only encoded once. Furthermore, we can compress the dictionary. This is a simple, effective and widely-used optimization for encoding RDF data [RDF-Dict].

When encoding a dictionary, the vocabulary is provided as a sorted sequence of terms with the following order:

  1. Terms that appear in subject position.
  2. Terms that appear in predicate or object position (but not in subject position).

Within the two groups terms are ordered as follows:

  1. IRIs
  2. Literals
  3. Blank nodes

Within the term types we use lexicographical order.

A dictionary is encoded as a CBOR array of the terms. Terms are encoded as described in Section 2. If the preceding term in the CBOR array encodes an IRI and the current term is also an IRI that shares a prefix with it that is longer than 9 characters, then instead of encoding the current IRI in full, we encode the length of the shared prefix along with the suffix of the current term.

For example if the vocabulary contains the terms:

  • https://www.w3.org/ns/activitystreams#Create
  • https://www.w3.org/ns/activitystreams#actor
  • https://www.w3.org/ns/activitystreams#object

The last two terms are encoded more compactly by indicating that a prefix is shared with the preceding term (the prefix has length 38):

[266("https://www.w3.org/ns/activitystreams#Create"),
 [38, "actor"],
 [38, "object"]]

This compression method is called Incremental Encoding [Witten99] and is also used in the HDT serialization [HDT]. It is very effective when encoding IRIs appearing in RDF as shared prefixes are very common.
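The prefix sharing described above can be sketched in a few lines. This is an illustration under assumptions: the function names are invented, prefix lengths are counted in characters of the IRI string, and only plain IRI strings are handled (no literals, binary URNs or CBOR tagging).

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compress_iris(sorted_iris):
    """Replace an IRI by [shared-prefix-length, suffix] when it shares more
    than 9 characters with the IRI directly before it."""
    entries, previous = [], None
    for iri in sorted_iris:
        shared = common_prefix_length(previous, iri) if previous is not None else 0
        entries.append([shared, iri[shared:]] if shared > 9 else iri)
        previous = iri   # the prefix always refers to the full preceding IRI
    return entries


def expand_iris(entries):
    """Inverse of compress_iris."""
    iris, previous = [], ""
    for entry in entries:
        iri = previous[: entry[0]] + entry[1] if isinstance(entry, list) else entry
        iris.append(iri)
        previous = iri
    return iris


vocabulary = [
    "https://www.w3.org/ns/activitystreams#Create",
    "https://www.w3.org/ns/activitystreams#actor",
    "https://www.w3.org/ns/activitystreams#object",
]
print(compress_iris(vocabulary))
```

Because each entry refers to the term directly before it, a decoder has to expand the dictionary front to back; random access into a compressed run is not possible.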

In CDDL the encoding of a dictionary is defined as:

dictionary = [ * dictionary-entry ]

dictionary-entry = rdf-term / compressed-iris

compressed-iris = (
    base : iri / compressed-iri,
    + compressed-iri : [ common-prefix-length: int, suffix: tstr ]
)
Figure 17: Encoding of RDF Term Dictionary

3.1.1. Referencing Dictionary Terms

References to terms appearing in the dictionary are simply the integer index of the term as appearing in the dictionary:

dictionary-reference = uint
Figure 18: Encoding of Dictionary Reference

Note that this requires dictionary references to only be used in contexts where they cannot be confused with literals with datatype xsd:integer (see Section 2.2.2.3). This is the case for our encoding of triples and is more efficient than using the explicit references as proposed by Packed CBOR [draft-ietf-cbor-packed-07].

3.2. Bitmap Triples

In this section we describe the encoding of triples in a molecule.

Triples are assumed to be sorted according to lexicographical order using the same rules as the dictionary (see previous section).

As a first step, triples can be represented as a list of integer triples where the integers are references to dictionary terms:

[[0, 1, 5],
 [0, 2, 1],
 [1, 0, 2],
 [1, 1, 3],
 [1, 1, 4],
 [1, 3, 0],
 [2, 2, 1]]

We represent the triples as a list of subjects and lists of grouped predicates and objects:

[0, 1, 2] // subjects
[[1, 2], [0, 1, 3], [2]] // predicates
[[5], [1], [2], [3, 4], [0], [1]] // objects

Every predicate group corresponds to a subject and every object group corresponds to a predicate.

Because we used the same ordering for triples as for terms in the dictionary, the subject list is redundant and can be omitted in the encoding.

The resulting encoding is called compact triples:

[[1, 2], [0, 1, 3], [2]] // predicates
[[5], [1], [2], [3, 4], [0], [1]] // objects

We can improve further by encoding the groupings of predicates and objects with a bitmap. We collapse the lists of predicates and objects to simple lists, but remember the last element of every group by setting a bit in a bitmap at the corresponding position. Bit positions are counted from the least significant bit, so bit i corresponds to the list element at index i (this is the convention followed by the example in Appendix A.2):

predicate-bitmap: 0b110010
predicates: [1, 2, 0, 1, 3, 2]

object-bitmap: 0b1110111
objects: [5, 1, 2, 3, 4, 0, 1]

The predicate-bitmap and object-bitmap are encoded as CBOR integers. If the binary representation becomes larger than what can be represented with CBOR integers, CBOR Bignums are used.

This representation of triples is called bitmap triples. This is exactly the encoding used in the HDT serialization [HDT].
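The construction can be sketched as a pair of functions. This is an illustration, not the reference implementation: the function names are hypothetical, and the sketch assumes that bit i, counted from the least significant bit, belongs to the list element at index i, which is the convention the molecule in Appendix A.2 follows. It also assumes that the i-th distinct subject has dictionary index i, which follows from the dictionary ordering in Section 3.1.

```python
from itertools import groupby


def encode_bitmap_triples(triples):
    """triples: list of (subject, predicate, object) dictionary references,
    sorted by subject, then predicate, then object."""
    predicates, objects = [], []
    predicate_bitmap = object_bitmap = 0
    for _, by_subject in groupby(triples, key=lambda t: t[0]):
        for predicate, by_predicate in groupby(by_subject, key=lambda t: t[1]):
            predicates.append(predicate)
            objects.extend(o for _, _, o in by_predicate)
            # Mark the last object belonging to this predicate.
            object_bitmap |= 1 << (len(objects) - 1)
        # Mark the last predicate belonging to this subject.
        predicate_bitmap |= 1 << (len(predicates) - 1)
    return predicate_bitmap, predicates, object_bitmap, objects


def decode_bitmap_triples(predicate_bitmap, predicates, object_bitmap, objects):
    """Rebuild the integer triples; subjects are implicit (0, 1, 2, ...)."""
    triples, subject, p_index = [], 0, 0
    for o_index, obj in enumerate(objects):
        triples.append((subject, predicates[p_index], obj))
        if (object_bitmap >> o_index) & 1:          # object group ends
            if (predicate_bitmap >> p_index) & 1:   # predicate group ends
                subject += 1
            p_index += 1
    return triples


triples = [(0, 1, 5), (0, 2, 1), (1, 0, 2), (1, 1, 3), (1, 1, 4), (1, 3, 0), (2, 2, 1)]
encoded = encode_bitmap_triples(triples)
assert decode_bitmap_triples(*encoded) == triples
```

Since the last element of each list always closes a group, the most significant set bit of a bitmap sits at position length - 1, so no separate length field is needed for the bitmaps.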

4. Content-addressable Molecule

Most existing RDF content is location-addressed. The IRIs are pointers to hosts that hold the content. If the host goes down the content is no longer available. This happens frequently enough to seriously undermine the robustness of systems relying on RDF [Polleres20].

Availability of content can be increased by caching the content on multiple peers. However, this results in the content receiving a new location. The original identifier does not match the location of the cache. Caching location-addressed content is complicated.

An alternative to identifying content by its location is to identify content by the content itself. This is called content-addressing. The hash of some content is computed and used as a unique identifier for the content.

In this section we illustrate how RDF data can be content-addressed. There are two concepts we need:

  1. A suitable grouping (molecule) of RDF triples: For this we introduce the fragment-molecule (see Section 4.1).
  2. A canonical encoding where the identifier of the molecule is not part of the encoding: Some small variations to the encoding presented in Section 3.

4.1. Fragment-Molecule

IRIs may include fragment identifiers. Fragment identifiers identify a secondary resource that is usually a part of, view of, defined in, or described in the primary resource.

In many transfer protocols, such as HTTP, fetching a resource with fragment identifier (e.g. http://example.com/resource#part-a) will return the primary (or base) resource (http://example.com/resource) that contains the requested resource with fragment identifier (and any other sub-resources with fragment identifiers).

Fragment identifiers form a natural grouping of RDF triples and we define a fragment-molecule with this intuition.

fragment-molecule

Given an IRI base subject s that does not have a fragment part, a fragment-molecule is the set of triples where either:

  • s appears in subject position
  • f appears in subject position and f is a fragment resource of s

For a blank node subject b, a fragment-molecule is the set of triples with b in subject position.
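The grouping can be sketched as follows, assuming triples are given as tuples of strings and that the base subject is obtained by cutting the subject at the first "#". The function name and the string representation of terms are assumptions of this example.

```python
from collections import defaultdict


def fragment_molecules(triples):
    """Group (subject, predicate, object) triples by the subject with its
    fragment part removed. Blank node subjects (written "_:label" here)
    carry no fragment and therefore form their own groups."""
    molecules = defaultdict(list)
    for subject, predicate, obj in triples:
        base_subject = subject.partition("#")[0]
        molecules[base_subject].append((subject, predicate, obj))
    return dict(molecules)


triples = [
    ("https://example.com/activity", "as:actor", "xmpp:pukkamustard@jblis.xyz"),
    ("https://example.com/activity#object", "rdf:type", "as:Note"),
    ("urn:uuid:c34d4219-5fbb-4e54-9217-1cbdaf831a64#track", "rdf:type", "mo:Track"),
    ("_:b0", "rdf:type", "as:Note"),
]
for base, group in fragment_molecules(triples).items():
    print(base, len(group))
```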

When content-addressing, the base subject is replaced with an identifier that exactly identifies the content of the molecule. Such identifiers are URNs. Blank node base subjects are always replaced with URNs and there is no need to use them at all. Blank nodes are not permitted in content-addressed molecules.

4.2. Canonical Encoding

A content-addressable molecule is encoded like a regular molecule (see Section 3). The only difference is the types of terms encoded in the dictionary. We must make sure that terms are in a canonical form and that the base subject is replaced with a place-holder.

The CBOR tag 302 is defined for content-addressable molecules and MUST be used to clearly identify such molecules.

content-addressable-molecule = #6.302([
    dictionary: ca-dictionary,
    predicate-bitmap : uint / #6.2(bstr),
    predicates : [ * dictionary-reference ],
    object-bitmap : uint / #6.2(bstr),
    objects : [ * dictionary-reference ]
])
Figure 19: Encoding of RDF/CBOR Content-addressable Molecule

4.2.1. RDF Terms

RDF terms are encoded as described in Section 2 with the additional requirement that for generic literals, the canonical lexical form is used.

As the base subject of the fragment-molecule cannot be part of the encoding itself, we use the CBOR undefined item as a place-holder value:

base-subject = undefined
Figure 20: Encoding of Content-addressable Molecule Base Subject

Similarly we must make sure that in references to fragments of the molecule, the base subject is not present. We use the fragment constructor tag 305 with a single text string:

ca-fragment-reference = #6.305(tstr)
Figure 21: Encoding of Fragment References in a Content-addressable Molecule

Finally the encoding of RDF terms appearing in a content-addressable fragment molecule is:

ca-term = base-subject / ca-fragment-reference / iri / literal
Figure 22: Encoding of RDF Terms in a Content-addressable Molecule
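How IRIs are mapped to ca-term can be sketched as below. The sketch is limited to IRIs given as strings, encodes every other IRI as a generic IRI (binary URNs are not handled), and uses invented function names.

```python
def head(major: int, argument: int) -> bytes:
    """Encode a CBOR head: major type plus argument (RFC 8949, section 3)."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def tstr(text: str) -> bytes:
    data = text.encode("utf-8")
    return head(3, len(data)) + data


UNDEFINED = b"\xf7"  # CBOR simple value 23 (undefined)


def ca_iri(iri: str, base_subject: str) -> bytes:
    """Encode an IRI inside a content-addressable molecule."""
    if iri == base_subject:
        return UNDEFINED                               # base-subject
    if iri.startswith(base_subject + "#"):
        fragment = iri[len(base_subject) + 1:]
        return head(6, 305) + tstr(fragment)           # ca-fragment-reference
    return head(6, 266) + tstr(iri)                    # generic-iri


base = "https://example.com/activity"
print(ca_iri(base, base).hex())
print(ca_iri(base + "#object", base).hex())
```

Neither output contains the base subject, so the encoded molecule stays the same no matter which identifier is later derived from it.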

4.2.2. Dictionary

The dictionary is encoded as defined in Section 3.1, with terms encoded according to the ca-term rule.

Ordering between term types is as follows:

  1. Base subject
  2. Fragment references
  3. IRIs
  4. Literals

Within the term types we use lexicographical order.

ca-dictionary = [ * ca-dictionary-entry ]

ca-dictionary-entry = ca-term / compressed-iris

compressed-iris = (
    base : iri / compressed-iri,
    + compressed-iri : [ common-prefix-length: int, suffix: tstr ]
)
Figure 23: Encoding of RDF Term Dictionary in Content-addressable Molecule

5. RDF/CBOR Stream

Multiple RDF/CBOR molecules and content-addressable molecules may be combined in a CBOR sequence [RFC8742]. In some cases it might be useful to explicitly tag a sequence (or stream) of RDF/CBOR molecules. For this we define the CBOR tag 300.

rdf-stream = [ * rdf-stream-element ]

tagged-rdf-stream = #6.300(rdf-stream)

rdf-stream-element = molecule / content-addressable-molecule
Figure 24: Encoding of RDF/CBOR Stream

Note that the array holding stream elements may be an indefinite-length array.
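Both ways of combining molecules can be sketched briefly. The example assumes molecules are already available as encoded byte strings; the placeholder below is an empty five-element array used only for illustration, and the function names are hypothetical.

```python
def head(major: int, argument: int) -> bytes:
    """Encode a CBOR head: major type plus argument (RFC 8949, section 3)."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def cbor_sequence(encoded_molecules) -> bytes:
    """A CBOR sequence (RFC 8742) is the plain concatenation of the items."""
    return b"".join(encoded_molecules)


def tagged_rdf_stream(encoded_molecules) -> bytes:
    """tagged-rdf-stream = #6.300(rdf-stream), using an indefinite-length
    array (0x9f ... 0xff) so that the number of molecules need not be known
    in advance."""
    return head(6, 300) + b"\x9f" + b"".join(encoded_molecules) + b"\xff"


# Placeholder molecule: [[], 0, [], 0, []]
EMPTY_MOLECULE = bytes.fromhex("858000800080")

print(tagged_rdf_stream([EMPTY_MOLECULE, EMPTY_MOLECULE]).hex())
```

With the indefinite-length form a producer can emit molecules as they become available and close the array with the break byte at the end.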

6. IANA Considerations

6.1. CBOR Tags Registry

This specification requires the assignment of CBOR tags for various RDF/CBOR types. The tags are added to the CBOR Tags Registry as defined in RFC 8949 [RFC8949].

Table 1: CBOR Tag Registration for RDF/CBOR types

Tag  Data Item             Semantics
---  --------------------  ---------
300  array                 RDF/CBOR Stream (see Section 5)
301  array                 RDF/CBOR Molecule (see Section 3)
302  array                 RDF/CBOR Content-Addressable Molecule (see Section 4)
303  array                 RDF/CBOR generic literal (see Section 2.2.3)
304  text string           RDF/CBOR blank node (see Section 2.3)
305  array or text string  RDF/CBOR fragment constructor (see Section 2.1.1 and Section 4.2.1)

7. Conclusion

We have described a binary RDF serialization based on CBOR. A reference implementation is provided in OCaml (see the ocaml-rdf library).

After implementing parsers for a bunch of other RDF serializations, using CBOR might not be such a bad idea. Low-level details of parsing are handled by CBOR and we can specify and document the encoding very concisely.

Initial tests seem promising. Further performance tests and comparisons with serializations such as HDT should be done.

Some possible improvements include:

Comments, feedback and questions are very welcome. Please get in touch with the author by mail or join the #openEngiadina IRC channel on the Libera network.

8. Acknowledgments and Final Words

Development of RDF/CBOR was done as part of the openEngiadina project and was supported by the NLnet Foundation through the NGI0 Discovery Fund.

The openEngiadina developer rustra is imprisoned as a victim of political repression in Belarus. Read his last words in court and an interview with him. Consider donating to the Anarchist Black Cross Belarus. Support victims of repression and resist any form of repression and oppression. Resistance is not futile.

9. References

9.1. Normative References

[draft-ietf-cbor-packed-07]
Bormann, C., "Packed CBOR", <https://datatracker.ietf.org/doc/draft-ietf-cbor-packed/07/>.
[draft-ietf-core-problem-details-08]
Fossati, T. and C. Bormann, "Concise Problem Details For CoAP APIs", <https://datatracker.ietf.org/doc/draft-ietf-core-problem-details/08/>.
[ERIS]
Renberg, E., "Encoding for Robust Immutable Storage (ERIS)", <http://purl.org/eris>.
[IRI_CBOR]
Occil, P., "Internationalized Resource Identifiers in CBOR", <https://peteroupc.github.io/CBOR/iri.html>.
[RDF]
Cyganiak, R. and D. Wood, "RDF 1.1 Concepts and Abstract Syntax", World Wide Web Consortium LastCall WD-rdf11-concepts-20130723, <http://www.w3.org/TR/2013/WD-rdf11-concepts-20130723>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC2141]
Moats, R., "URN Syntax", RFC 2141, DOI 10.17487/RFC2141, <https://www.rfc-editor.org/info/rfc2141>.
[RFC3339]
Klyne, G. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, DOI 10.17487/RFC3339, <https://www.rfc-editor.org/info/rfc3339>.
[RFC3987]
Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, DOI 10.17487/RFC3987, <https://www.rfc-editor.org/info/rfc3987>.
[RFC4122]
Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, DOI 10.17487/RFC4122, <https://www.rfc-editor.org/info/rfc4122>.
[RFC8610]
Birkholz, H., Vigano, C., and C. Bormann, "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, <https://www.rfc-editor.org/info/rfc8610>.
[RFC8949]
Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, <https://www.rfc-editor.org/info/rfc8949>.
[UUID_CBOR]
Clemente, L., "UUIDs for CBOR", <https://github.com/lucas-clemente/cbor-specs/blob/master/uuid.md>.

9.2. Informative References

[ActivityPub]
"ActivityPub", W3C REC activitypub, W3C activitypub, <https://www.w3.org/TR/activitypub/>.
[Bast2021]
Bast, H., Brosi, P., Kalmbach, J., and A. Lehmann, "An Efficient RDF Converter and SPARQL Endpoint for the Complete OpenStreetMap Data", <https://ad-publications.cs.uni-freiburg.de/SIGSPATIAL_osm2rdf_BBKL_2021.pdf>.
[ERI]
Fernández, J.D., Llaves, A., and O. Corcho, "Efficient RDF Interchange (ERI) Format for RDF Data Streams", <https://sci-hub.se/https://doi.org/10.1007/978-3-319-11915-1_16>.
[HDT]
Fernández, J.D., Martínez-Prieto, M.A., Gutiérrez, C., Polleres, A., and M. Arias, "Binary RDF representation for publication and exchange (HDT)", <https://sci-hub.se/https://doi.org/10.1016/j.websem.2013.01.002>.
[JSON-LD]
Kellogg, G., Champin, P., and D. Longley, "JSON-LD 1.1", World Wide Web Consortium Recommendation REC-json-ld11-20200716, <https://www.w3.org/TR/2020/REC-json-ld11-20200716>.
[Polleres20]
Polleres, A., Kamdar, M., Fernández, J., Tudorache, T., and M. Musen, "A more decentralized vision for Linked Data", Semantic Web Vol. 11, pp. 101-113, DOI 10.3233/SW-190380, <https://sci-hub.se/10.3233/SW-190380>.
[RDF-Dict]
Martínez-Prieto, M. A., Fernández, J. D., Cánovas, R., and Association for Computing Machinery (ACM), "Querying RDF dictionaries in compressed space", ACM SIGAPP Applied Computing Review, vol. 12, no. 2, pp. 64-77, DOI 10.1145/2340416.2340422, <http://dx.doi.org/10.1145/2340416.2340422>.
[RFC8742]
Bormann, C., "Concise Binary Object Representation (CBOR) Sequences", RFC 8742, DOI 10.17487/RFC8742, <https://www.rfc-editor.org/info/rfc8742>.
[Sahlmann2018]
Sahlmann, K., Lindemann, A., and B. Schnor, "Binary Representation of Device Descriptions: CBOR versus RDF HDT", <https://publikationsserver.tu-braunschweig.de/servlets/MCRFileNodeServlet/dbbs_derivate_00044805/Proceedings_FGSN_2018.pdf>.
[TrustyURIs]
Kuhn, T. and M. Dumontier, "Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data", <https://arxiv.org/abs/1401.5775>.
[Tummarello05]
Tummarello, G., Morbidoni, C., Puliti, P., Piazza, F., and ACM Press, "Signing individual fragments of an RDF graph", Special interest tracks and posters of the 14th international conference on World Wide Web - WWW '05, DOI 10.1145/1062745.1062848, <https://sci-hub.se/10.1145/1062745.1062848>.
[Witten99]
Witten, I. H., Moffat, A., and T. C. Bell, "Managing Gigabytes : Compressing and Indexing Documents and Images", <http://library.lol/main/EA934CDCB3AB61402F8491C4F84C2370>.

Appendix A. Examples of RDF/CBOR Encoded Data

A.1. Terms

Table 2: Examples of Encoded RDF Terms

Term:             <https://example.com/>
CBOR Diagnostic:  266("https://example.com/")
Encoded:          0xd9010a7468747470733a2f2f6578616d706c652e636f6d2f

Term:             <https://example.com#fragment>
CBOR Diagnostic:  266("https://example.com#fragment")
Encoded:          0xd9010a781c68747470733a2f2f6578616d706c652e636f6d23667261676d656e74

Term:             <urn:uuid:1da600cf-c852-469a-936f-e608d3d90d9b>
CBOR Diagnostic:  37(h'1da600cfc852469a936fe608d3d90d9b')
Encoded:          0xd825501da600cfc852469a936fe608d3d90d9b

Term:             <urn:uuid:1da600cf-c852-469a-936f-e608d3d90d9b#a>
CBOR Diagnostic:  305([37(h'1da600cfc852469a936fe608d3d90d9b'), "a"])
Encoded:          0xd9013182d825501da600cfc852469a936fe608d3d90d9b6161

Term:             "Hello World!"@en
CBOR Diagnostic:  38(["en", "Hello World!"])
Encoded:          0xd8268262656e6c48656c6c6f20576f726c6421

Term:             "asdf"
CBOR Diagnostic:  "asdf"
Encoded:          0x6461736466

Term:             true
CBOR Diagnostic:  true
Encoded:          0xf5

Term:             42
CBOR Diagnostic:  42
Encoded:          0x182a

Term:             1.5
CBOR Diagnostic:  1.5
Encoded:          0xfa3fc00000

Term:             "POINT(7.9736903 47.5412464)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>
CBOR Diagnostic:  303([266("http://www.opengis.net/ont/geosparql#wktLiteral"), "POINT(7.9736903 47.5412464)"])
Encoded:          0xd9012f82d9010a782f687474703a2f2f7777772e6f70656e6769732e6e65742f6f6e742f67656f73706172716c23776b744c69746572616c781b504f494e5428372e393733363930332034372e3534313234363429

Term:             _:bnode0
CBOR Diagnostic:  304("bnode0")
Encoded:          0xd9013066626e6f646530

A.2. Molecule

Some sample RDF data in Turtle:

@prefix as: <https://www.w3.org/ns/activitystreams#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix mo: <http://purl.org/ontology/mo/> .

<https://example.com/activity>
  a as:Create ;
  as:actor <xmpp:pukkamustard@jblis.xyz> ;
  as:published "2022-08-18T09:04:45-00:00"^^xsd:dateTime ;
  as:object <https://example.com/activity#object> .

<https://example.com/activity#object>
  a as:Create ;
  a as:Note ;
  geo:lat "45.1864";
  geo:long "5.7361";
  as:content "RDF/CBOR allows the efficient encoding of small pieces of content"@en .

<urn:uuid:c34d4219-5fbb-4e54-9217-1cbdaf831a64>
  a as:Listen ;
  as:published "2022-08-13T09:04:45-00:00"^^xsd:dateTime ;
  as:actor <xmpp:pukkamustard@jblis.xyz> ;
  as:object <urn:uuid:c34d4219-5fbb-4e54-9217-1cbdaf831a64#track> .

<urn:uuid:c34d4219-5fbb-4e54-9217-1cbdaf831a64#track>
  a mo:Track ;
  dcterms:creator "Funki Porcini" ;
  dcterms:title "Back Home" ;
  mo:musicbrainz <urn:uuid:a9dae29a-3f23-4c4b-804d-e125d4582adf> ;
  mo:release <urn:uuid:0f028066-5891-322e-ad8d-6aa588063a2e> ;
  foaf:maker <urn:uuid:2adb429d-e39c-467b-b175-3f40440ff630> .

The sample RDF data can be encoded in a single molecule (all triples are grouped together). The encoding in CBOR diagnostic notation:

[
// dictionary
[37(h'c34d42195fbb4e5492171cbdaf831a64'),
 [45, "#track"],
 266("https://example.com/activity"),
 [28, "#object"],
 37(h'0f0280665891322ead8d6aa588063a2e'),
 37(h'2adb429de39c467bb1753f40440ff630'),
 37(h'a9dae29a3f234c4b804de125d4582adf'),
 266("xmpp:pukkamustard@jblis.xyz"),
 266("http://purl.org/dc/terms/creator"),
 [25, "title"],
 [16, "ontology/mo/Track"],
 [28, "musicbrainz"],
 [28, "release"],
 266("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
 [18, "2003/01/geo/wgs84_pos#lat"],
 [41, "ong"],
 266("https://www.w3.org/ns/activitystreams#Create"),
 [38, "Listen"],
 [38, "Note"],
 [38, "actor"],
 [38, "content"],
 [38, "object"],
 [38, "published"],
 266("http://xmlns.com/foaf/0.1/maker"),
 38(["en", "RDF/CBOR allows the efficient encoding of small pieces of content"]),
 0("2022-08-13T09:04:45-00:00"), 0("2022-08-18T09:04:45-00:00"),
 "45.1864",
 "5.7361",
 "Back Home",
 "Funki Porcini"],

// bitmap triples
0b100010001000001000,
[13, 19, 21, 22, 8, 9, 11, 12, 13, 23, 13, 19, 21, 22, 13, 14, 15, 20],
0b1111011111111111111,
[17, 7, 1, 25, 30, 29, 6, 4, 10, 5, 16, 7, 3, 26, 16, 18, 27, 28, 24]]
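
The two-element arrays in the dictionary above, such as [25, "title"], are incrementally encoded: judging from the example, the first element is the number of leading characters shared with the previous entry and the second element is the remaining suffix. A minimal Python sketch of the expansion follows. It assumes the entries have already been decoded into Python strings (full terms) and (shared, suffix) tuples; the function name expand_dictionary is hypothetical.

```python
def expand_dictionary(entries):
    """Expand incrementally encoded entries into full strings.

    Each entry is either a full string or a (shared, suffix) tuple,
    where `shared` counts the characters reused from the previous
    expanded entry.
    """
    expanded = []
    previous = ""
    for entry in entries:
        if isinstance(entry, tuple):
            shared, suffix = entry
            full = previous[:shared] + suffix
        else:
            full = entry
        expanded.append(full)
        previous = full
    return expanded


# A slice of the dictionary above, written as Python values.
sample = [
    "http://purl.org/dc/terms/creator",
    (25, "title"),
    (16, "ontology/mo/Track"),
    (28, "musicbrainz"),
    (28, "release"),
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    (18, "2003/01/geo/wgs84_pos#lat"),
    (41, "ong"),
]

expanded = expand_dictionary(sample)
for term in expanded:
    print(term)
```

Because every entry depends only on its predecessor, the dictionary can be expanded in a single pass while decoding.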

The binary representation in octets (715 bytes):

8598 1fd8 2550 c34d 4219 5fbb 4e54 9217 1cbd af83 1a64 8218 2d66
2374 7261 636b d901 0a78 1c68 7474 7073 3a2f 2f65 7861 6d70 6c65 2e63 6f6d
2f61 6374 6976 6974 7982 181c 6723 6f62 6a65 6374 d825 500f 0280 6658 9132
2ead 8d6a a588 063a 2ed8 2550 2adb 429d e39c 467b b175 3f40 440f f630 d825
50a9 dae2 9a3f 234c 4b80 4de1 25d4 582a dfd9 010a 781b 786d 7070 3a70 756b
6b61 6d75 7374 6172 6440 6a62 6c69 732e 7879 7ad9 010a 7820 6874 7470 3a2f
2f70 7572 6c2e 6f72 672f 6463 2f74 6572 6d73 2f63 7265 6174 6f72 8218 1965
7469 746c 6582 1071 6f6e 746f 6c6f 6779 2f6d 6f2f 5472 6163 6b82 181c 6b6d
7573 6963 6272 6169 6e7a 8218 1c67 7265 6c65 6173 65d9 010a 782f 6874 7470
3a2f 2f77 7777 2e77 332e 6f72 672f 3139 3939 2f30 322f 3232 2d72 6466 2d73
796e 7461 782d 6e73 2374 7970 6582 1278 1932 3030 332f 3031 2f67 656f 2f77
6773 3834 5f70 6f73 236c 6174 8218 2963 6f6e 67d9 010a 782c 6874 7470 733a
2f2f 7777 772e 7733 2e6f 7267 2f6e 732f 6163 7469 7669 7479 7374 7265 616d
7323 4372 6561 7465 8218 2666 4c69 7374 656e 8218 2664 4e6f 7465 8218 2665
6163 746f 7282 1826 6763 6f6e 7465 6e74 8218 2666 6f62 6a65 6374 8218 2669
7075 626c 6973 6865 64d9 010a 781f 6874 7470 3a2f 2f78 6d6c 6e73 2e63 6f6d
2f66 6f61 662f 302e 312f 6d61 6b65 72d8 2682 6265 6e78 4152 4446 2f43 424f
5220 616c 6c6f 7773 2074 6865 2065 6666 6963 6965 6e74 2065 6e63 6f64 696e
6720 6f66 2073 6d61 6c6c 2070 6965 6365 7320 6f66 2063 6f6e 7465 6e74 c078
1932 3032 322d 3038 2d31 3354 3039 3a30 343a 3435 2d30 303a 3030 c078 1932
3032 322d 3038 2d31 3854 3039 3a30 343a 3435 2d30 303a 3030 6734 352e 3138
3634 6635 2e37 3336 3169 4261 636b 2048 6f6d 656d 4675 6e6b 6920 506f 7263
696e 691a 0002 2208 920d 1315 1608 090b 0c0d 170d 1315 160d 0e0f 141a 0007
bfff 9311 0701 1819 181e 181d 0604 0a05 1007 0318 1a10 1218 1b18 1c18
18

A.3. Content-addressable Molecule

A small RDF molecule in Turtle:

@prefix as: <https://www.w3.org/ns/activitystreams#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<>
  a as:Create ;
  as:actor <xmpp:pukkamustard@jblis.xyz> ;
  as:published "2022-08-18T09:04:45-00:00"^^xsd:dateTime ;
  as:object <#object> .

<#object>
  a as:Note ;
  as:content "RDF is underused in decentralized systems. RDF/CBOR is an attempt to change that.".

Encoded as a content-addressable molecule in CBOR diagnostic notation:

302([
    // dictionary
    [undefined,
    305("object"),
    266("xmpp:pukkamustard@jblis.xyz"),
    266("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
    266("https://www.w3.org/ns/activitystreams#Create"),
    [38, "Note"],
    [38, "actor"],
    [38, "content"],
    [38, "object"],
    [38, "published"],
    0("2022-08-18T09:04:45-00:00"),
    "RDF is underused in decentralized systems. RDF/CBOR is an attempt to change that."],

    // bitmap triples
    0b101000,
    [3, 6, 8, 9, 3, 7],
    0b111111,
    [4, 2, 1, 10, 5, 11]])
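
The bitmap triples above can be unpacked into (subject, predicate, object) index triples. The Python sketch below is inferred from this example rather than taken from the normative definition. It assumes that the bitmaps are read starting from the least significant bit, that a 1 bit marks the last predicate of a subject (or the last object of a predicate), and that subjects are the leading dictionary entries 0, 1, 2, ... in order. The function name decode_bitmap_triples is hypothetical.

```python
def decode_bitmap_triples(pred_bits, preds, obj_bits, objs):
    """Return a list of (subject, predicate, object) dictionary indices."""
    triples = []
    subject = 0
    obj_pos = 0
    for pred_pos, predicate in enumerate(preds):
        # Consume objects until the bitmap marks the last one for this predicate.
        while True:
            triples.append((subject, predicate, objs[obj_pos]))
            is_last_object = (obj_bits >> obj_pos) & 1
            obj_pos += 1
            if is_last_object:
                break
        # A set bit means this was the last predicate of the current subject.
        if (pred_bits >> pred_pos) & 1:
            subject += 1
    return triples


triples = decode_bitmap_triples(
    0b101000, [3, 6, 8, 9, 3, 7],
    0b111111, [4, 2, 1, 10, 5, 11],
)
for triple in triples:
    print(triple)
```

Looking the indices up in the dictionary gives back the statements of the Turtle input: for example (0, 3, 4) is the base subject, rdf:type, as:Create, and (1, 7, 11) is the #object fragment, as:content, and the content string.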

The binary representation in octets (329 bytes):

d901 2e85 8cf7 d901 3166 6f62 6a65 6374 d901 0a78 1b78 6d70 703a
7075 6b6b 616d 7573 7461 7264 406a 626c 6973 2e78 797a d901 0a78 2f68 7474
703a 2f2f 7777 772e 7733 2e6f 7267 2f31 3939 392f 3032 2f32 322d 7264 662d
7379 6e74 6178 2d6e 7323 7479 7065 d901 0a78 2c68 7474 7073 3a2f 2f77 7777
2e77 332e 6f72 672f 6e73 2f61 6374 6976 6974 7973 7472 6561 6d73 2343 7265
6174 6582 1826 644e 6f74 6582 1826 6561 6374 6f72 8218 2667 636f 6e74 656e
7482 1826 666f 626a 6563 7482 1826 6970 7562 6c69 7368 6564 c078 1932 3032
322d 3038 2d31 3854 3039 3a30 343a 3435 2d30 303a 3030 7851 5244 4620 6973
2075 6e64 6572 7573 6564 2069 6e20 6465 6365 6e74 7261 6c69 7a65 6420 7379
7374 656d 732e 2052 4446 2f43 424f 5220 6973 2061 6e20 6174 7465 6d70 7420
746f 2063 6861 6e67 6520 7468 6174 2e18 2886 0306 0809 0307 183f 8604 0201
0a05 0b

The URN of the molecule when using the Blake2b-256 hash function is urn:blake2b:7B6VYVGTSQC7KWXANVA4PYUP6VDGSNIOLYX4QLY7AF5CKHAIMJ4QE7U3DTGPCSSFEW4PIJ4OFZ4AEZVYEOZV3KW476RDGUFZR4JGOOY.
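
Forming such a URN from a digest can be sketched in Python as follows. This is only an illustration under stated assumptions: the digest is assumed to be encoded as unpadded, upper-case Base32 (RFC 4648), which matches the alphabet of the URN above, and the argument encoded stands for whatever bytes the content-addressing rules of this specification require to be hashed, which are not reproduced here. The function name blake2b_urn and the digest_size parameter are hypothetical.

```python
import base64
import hashlib


def blake2b_urn(encoded: bytes, digest_size: int = 32) -> str:
    """Hash `encoded` with BLAKE2b and format the digest as a URN."""
    digest = hashlib.blake2b(encoded, digest_size=digest_size).digest()
    # Base32 without the trailing '=' padding characters.
    b32 = base64.b32encode(digest).decode("ascii").rstrip("=")
    return "urn:blake2b:" + b32


# Placeholder input; a real caller would pass the encoded molecule.
urn = blake2b_urn(b"placeholder bytes")
print(urn)
```

With this encoding a 32-byte digest yields 52 Base32 characters and a 64-byte digest yields 103.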

Decoded back to Turtle, with the computed URN used as the base subject:

@prefix as: <https://www.w3.org/ns/activitystreams#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<urn:blake2b:7B6VYVGTSQC7KWXANVA4PYUP6VDGSNIOLYX4QLY7AF5CKHAIMJ4QE7U3DTGPCSSFEW4PIJ4OFZ4AEZVYEOZV3KW476RDGUFZR4JGOOY>
  a as:Create ;
  as:actor <xmpp:pukkamustard@jblis.xyz> ;
  as:published "2022-08-18T09:04:45-00:00"^^xsd:dateTime ;
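  as:object <urn:blake2b:7B6VYVGTSQC7KWXANVA4PYUP6VDGSNIOLYX4QLY7AF5CKHAIMJ4QE7U3DTGPCSSFEW4PIJ4OFZ4AEZVYEOZV3KW476RDGUFZR4JGOOY#object> .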

<urn:blake2b:7B6VYVGTSQC7KWXANVA4PYUP6VDGSNIOLYX4QLY7AF5CKHAIMJ4QE7U3DTGPCSSFEW4PIJ4OFZ4AEZVYEOZV3KW476RDGUFZR4JGOOY#object>
  a as:Note ;
  as:content "RDF is underused in decentralized systems. RDF/CBOR is an attempt to change that.".

Author's Address

pukkamustard