The triples are stored in a self-index based on succinct data-structures. The precise index depends on the mode of access. Forward search uses succinct indexes. Going backwards entails the use of wavelet trees.
To expand on this, and correct slightly…
The triples are indeed stored in a self-index, meaning, they’re stored ordered, as two adjacency lists. This makes it very quick to do a triple lookup if you have the subject, and even quicker if you also know the predicate.
If you do not query using the subject, we use an index structure to retrieve your data. In addition to the self-index, there’s two more indexes:
- an index from object to subject-predicate pairs. This index is also an adjacency list, only in the other direction compared to the self-index. Looking up by object is slightly slower than by subject, because there’s less data locality.
- an index from predicate to subject. This index is a wavelet tree, which compared to the adjacency list has worse lookup time but better space efficiency. Since most datasets will have much less distinct predicates than distinct subjects/objects, this is a worthwhile tradeoff.
So what structure is used depends on what information is known in the lookup. We try to go for the self-index, failing that we try to use the object to subject-predicate-pairs index, and failing that, we use the predicate index.
The only time a full scan is done is when you query for every triple.