Direct access to terminus-store?

This is a note received via chat bot - we will follow up here as it is better for async comms:

I have a technical question regarding direct access to terminus-store via terminus_store_prolog. I would assume terminus_store_prolog is primarily meant for internal use by TerminusDB, and has been published for those who might want direct access to the store (from Prolog) in addition to the higher-level APIs of TerminusDB. The question is: what are the limitations of using TerminusDB via its triple store from Prolog, compared with using the higher-level APIs of TerminusDB? What can be done with terminus_store_prolog without reimplementing the main functionality from scratch?

Surely the git-like features would not be available without borrowing code from the main terminus repository? But I assume that graph queries would be just as easy from Prolog, given that Prolog is meant for such things. What else is there to consider?

terminus_store_prolog gives direct low-level access to the store. It allows the following things:

  • Open graphs by their ID or through a graph name
  • Query graphs for triples
  • Query individual layers for additions and removals
  • Create new layers
  • Update named graphs (graphs that have a label file) to point at a new layer
  • Create ‘optimization layers’, which bundle several layers of a graph into a single layer for quicker lookups
  • Squash graphs into a single layer for even quicker lookup, at the cost of losing history information
  • Pack layers into a file
  • Unpack a layer file into a store
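To make the layer vocabulary above concrete, here is a minimal Python sketch of a graph as a stack of layers, each holding additions and removals relative to its parent. This is a toy model only: the `Layer` class and the `squash` function are hypothetical names for illustration and are not the terminus_store_prolog API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Layer:
    """Hypothetical layer: a delta (additions/removals) on top of a parent."""
    additions: FrozenSet[Triple]
    removals: FrozenSet[Triple]
    parent: Optional["Layer"] = None

    def triples(self) -> Set[Triple]:
        # Resolve the full triple set by walking down to the base layer.
        base = self.parent.triples() if self.parent else set()
        return (base - self.removals) | self.additions


def squash(layer: Layer) -> Layer:
    """Collapse a whole stack into one parentless layer (history is lost)."""
    return Layer(additions=frozenset(layer.triples()), removals=frozenset())


base = Layer(frozenset({("a", "knows", "b")}), frozenset())
child = Layer(
    additions=frozenset({("a", "knows", "c")}),
    removals=frozenset({("a", "knows", "b")}),
    parent=base,
)
squashed = squash(child)
```

Here `child.triples()` and `squashed.triples()` give the same result, but only `child` can still tell you what was added and removed relative to `base`. That is the trade-off a squash makes.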

An incomplete list of what you don’t have here:

  • No data types. Every value is just a string. TerminusDB uses a particular string format to save typed data, but the conversion logic is all in TerminusDB, and on the store level there’s nothing preventing you from putting stuff in here that TerminusDB will simply not recognize.
  • No fixed node format. terminusdb-store and, by extension, terminus_store_prolog do not validate that a string you claim is a node is actually a proper RFC 3987 IRI.
  • No prefixes or auto-expansions. Whereas TerminusDB has some notion of active prefixes in a query and lets you write shorthands like rdf:nil, no such thing exists in store.
  • No metadata. Graphs are just stacks of layers with data in them. There is no notion of ‘commit time’, ‘author’, or ‘commit message’.
  • No nested graphs. In TerminusDB, branches are kept track of in commit graphs, which are in turn kept track of in repository metadata graphs. As far as terminusdb-store and terminus_store_prolog are concerned, only the repository metadata graph is a real named graph which can be queried directly through its name. If you wish to query a branch directly from terminus_store_prolog, you’ll have to reimplement how TerminusDB finds the proper top layer for that branch from these metadata graphs and open it by id.
  • No schema checking. TerminusDB keeps track of a schema along with your instance data, and makes sure that any data that ends up in your database follows that schema. This ensures that every node is properly typed, that all cardinalities are proper, and that no predicate connects nodes of unexpected types. By using store directly, you completely circumvent these mechanisms, and you’re able to insert whatever you like (or, more likely, you’ll be able to insert bogus data without realizing).
  • No documents. TerminusDB is able to use schema information to ingest json documents and convert them to graph data, and to convert graph data into json documents. This obviously doesn’t exist on the store level.
  • No transactions beyond a very crude ‘only update a named graph if a new layer has the current layer associated with that named graph as an ancestor, otherwise error’.
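The crude transaction rule in the last item can be sketched roughly as follows. This is a simplified in-memory model, not the store's implementation: the `parents` and `labels` dictionaries, `set_head`, and `HeadMovedError` are all hypothetical names, and treating a layer as its own ancestor is an assumption made here for convenience.

```python
from typing import Dict, Optional


class HeadMovedError(Exception):
    """Raised when the new layer does not descend from the current head."""


# Hypothetical stand-ins for the store: layer id -> parent id, label -> head id.
parents: Dict[str, Optional[str]] = {"l1": None, "l2": "l1", "l3": "l1"}
labels: Dict[str, Optional[str]] = {}


def is_ancestor(ancestor: str, layer_id: str) -> bool:
    # Walk the parent chain; a layer counts as its own ancestor (assumption).
    current: Optional[str] = layer_id
    while current is not None:
        if current == ancestor:
            return True
        current = parents[current]
    return False


def set_head(label: str, new_layer: str) -> None:
    current = labels.get(label)
    if current is not None and not is_ancestor(current, new_layer):
        raise HeadMovedError(f"{new_layer} does not descend from {current}")
    labels[label] = new_layer


set_head("mygraph", "l1")  # first head, nothing to check
set_head("mygraph", "l2")  # fine: l2 was built on l1
try:
    set_head("mygraph", "l3")  # l3 was built on l1, but the head is now l2
    conflict = False
except HeadMovedError:
    conflict = True
```

The second writer gets an error instead of silently overwriting the first writer's layer. Anything beyond that, such as retrying or merging, is left to the caller.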

So by using store directly you’ll lose out on a lot. If you mean to interop with a TerminusDB installation (rather than using store standalone for your own project), you’ll need to reimplement some logic just to figure out which layers to open. If you mean to use the library for data modifications, you’re completely on your own in ensuring that your modifications fit the schema and that you have updated all metadata graphs correctly.

Assuming you have found the layer ID that you want to query, and you’re fine with working with expanded IRIs (no prefixes like rdf:nil), the triple query predicates in terminus_store_prolog are almost equivalent to those in WOQL. So you could feasibly write your queries in store directly. Still though, you’re losing out on data type awareness. Every value in store is a string, and it’ll be up to you to convert that to the proper data type.
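As a rough illustration of the two chores mentioned above (expanding prefixes yourself and converting string values yourself), here is a small Python sketch. The `rdf` and `xsd` namespace IRIs are the standard ones, but everything else is an assumption: `expand` and `decode_value` are hypothetical helpers, and the `"lexical"^^datatype` encoding is made up for the example and is not necessarily the string format TerminusDB writes.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(term: str) -> str:
    """Expand a shorthand like rdf:nil; leave anything else untouched."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def decode_value(raw: str):
    """Decode a value under the ASSUMED encoding "lexical"^^datatype-IRI."""
    lexical, sep, datatype = raw.rpartition("^^")
    if not sep:
        return raw  # no datatype marker: keep the plain string
    lexical = lexical.strip('"')
    if datatype == expand("xsd:integer"):
        return int(lexical)
    if datatype == expand("xsd:boolean"):
        return lexical == "true"
    return lexical  # unknown datatype: fall back to the lexical form


nil_iri = expand("rdf:nil")
answer = decode_value('"42"^^http://www.w3.org/2001/XMLSchema#integer')
```

Nothing at the store level stops you from skipping either step, which is exactly how data that TerminusDB does not recognize ends up in a graph.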

So, long story short, I don’t recommend using store directly if you intend to interop with TerminusDB. If you’d like to use Prolog directly rather than run TerminusDB as a server, a much more feasible pathway is to use the core module inside TerminusDB directly. This implements all the logic around resolving and opening resource descriptors, running transactions, and doing queries with full prefix awareness and datatype parsing. There’s unfortunately no good guide out there for how to do this, but you might be able to get quite far by just looking up how resolve_absolute_string_descriptor, open_descriptor, with_transaction, ask (our internal WOQL entrypoint, which is able to interact with Prolog variables) and the more low-level query and update predicates xrdf, xrdf_added, xrdf_deleted, insert and delete are used in various parts of the code.

If however your intention is to write a completely different database, with its own query and schema language, terminusdb-store and terminus_store_prolog may still be a good foundation for you.

I hope that helps.

Thank you, that is exactly the kind of response I was looking for!
