RDF Sync
Description
TL;DR: Synchronizes Drupal entities, as triples, to an RDF backend.
Terminology
Subject
The semantic triple subject as a URI. For background, see https://en.wikipedia.org/wiki/Semantic_triple
Predicate
The semantic triple predicate as a URI. For background, see https://en.wikipedia.org/wiki/Semantic_triple
Object
The semantic triple object. It can be either a resource URI, pointing to another resource, or a literal. For background, see https://en.wikipedia.org/wiki/Semantic_triple
Entity URI
Identifies the entity in the RDF/triplestore and acts as the subject of all triples representing that entity. Think of this value as a universally unique ID, very similar to the well-known Drupal entity UUID field, but following the URI pattern. Two entities cannot share the same URI, even if they belong to different entity types and/or bundles.
Mapping
A relation between an entity field property/column and an RDF predicate. This relation can be defined as third-party settings in the entity bundle config entity or in code, by implementing hook_entity_bundle_info_alter(). When an entity is synchronized to RDF, as triples, each mapped field property value is represented as a semantic triple having:
- The entity URI as triple subject,
- The entity field property mapping as triple predicate,
- The entity field property value as triple object.
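The rule above can be sketched in plain PHP. This is only an illustration of how mapped field properties become triples, not the module's implementation: the `build_triples()` helper, its array shapes, and the example URIs are assumptions made for this sketch.

```php
<?php

declare(strict_types=1);

/**
 * Builds [subject, predicate, object] triples from mapped field properties.
 *
 * Hypothetical helper for illustration only; it is not part of the module.
 *
 * @param string $entityUri
 *   The entity URI, used as the subject of every triple.
 * @param array $mappings
 *   Predicates keyed by field name, then by property name.
 * @param array $values
 *   Field property values keyed the same way as $mappings.
 *
 * @return array
 *   A list of [subject, predicate, object] arrays.
 */
function build_triples(string $entityUri, array $mappings, array $values): array {
  $triples = [];
  foreach ($mappings as $field => $properties) {
    foreach ($properties as $property => $predicate) {
      // Only mapped properties that actually hold a value produce a triple.
      if (!isset($values[$field][$property])) {
        continue;
      }
      $triples[] = [$entityUri, $predicate, $values[$field][$property]];
    }
  }
  return $triples;
}

// 'status' has a value but no mapping, so it yields no triple.
$triples = build_triples(
  'http://example.com/node/1',
  ['title' => ['value' => 'http://example.com/article/title']],
  ['title' => ['value' => 'Hello'], 'status' => ['value' => 1]],
);
```

Here `$triples` holds a single entry for the title, because unmapped fields are skipped.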
RDF type
An RDF resource URI that identifies the entity bundle in RDF. Normally, this URI is the object of a triple having the entity URI as subject and http://www.w3.org/1999/02/22-rdf-syntax-ns#type as predicate. A notable exception is taxonomy term entities, which are a special case in the "RDF World". They also provide a mapping for the entity bundle field, and the RDF type is the object of a triple with the bundle mapping as predicate. Because it identifies a resource, the RDF type is always a URI.
How it works
The module lets you map entity field properties so that their values are synchronized to an RDF/triplestore backend.
Automatic synchronization
When an entity is inserted, updated or deleted, its RDF representation in the RDF/triplestore is synchronized. Only fields that are mapped will be synchronized.
Manual synchronization
In some circumstances the automatic synchronization can be disabled:
PHP
\Drupal::service('rdf_sync.synchronizer')->disableSynchronization();
CLI
vendor/bin/drush rdf_sync:disable
Run manual synchronization:
PHP
use Drupal\rdf_sync\Model\SyncMethod;
\Drupal::service('rdf_sync.synchronizer')->synchronize(SyncMethod::UPDATE, [$entity1, $entity2, ...]);
CLI
# Synchronize all nodes with mapped node-type.
vendor/bin/drush rdf_sync:synchronize node
# Synchronize all page and article nodes.
vendor/bin/drush rdf_sync:synchronize node --bundle=page,article
Switch back to automated synchronization:
PHP
\Drupal::service('rdf_sync.synchronizer')->enableSynchronization();
CLI
vendor/bin/drush rdf_sync:enable
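The disable, synchronize, enable sequence above can be wrapped so that automatic synchronization is restored even when the bulk work fails. The following is a minimal sketch, assuming the `rdf_sync.synchronizer` service is passed in as `$synchronizer`; the `sync_in_bulk()` helper itself is hypothetical and not provided by the module.

```php
<?php

declare(strict_types=1);

/**
 * Runs a bulk operation with automatic synchronization switched off.
 *
 * Hypothetical helper, not part of the module. $synchronizer is expected to
 * be the 'rdf_sync.synchronizer' service and $method a SyncMethod case such
 * as SyncMethod::UPDATE.
 *
 * @param object $synchronizer
 *   The synchronizer service.
 * @param mixed $method
 *   The synchronization method to pass to synchronize().
 * @param callable $operation
 *   Performs the bulk work and returns the array of entities to synchronize.
 */
function sync_in_bulk(object $synchronizer, $method, callable $operation): void {
  $synchronizer->disableSynchronization();
  try {
    $entities = $operation();
    if ($entities) {
      // One manual pass over everything the operation returned.
      $synchronizer->synchronize($method, $entities);
    }
  }
  finally {
    // Restore automatic synchronization even if the operation throws.
    $synchronizer->enableSynchronization();
  }
}
```

In a Drupal context a call might look like `sync_in_bulk(\Drupal::service('rdf_sync.synchronizer'), SyncMethod::UPDATE, fn() => $importer->run())`, where `$importer` stands for whatever hypothetical code saves the entities.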
Configuration
Visit /admin/config/system/rdf-sync and configure:
- The RDF graph URI
- The endpoint (protocol, host, port, query & update & graph store paths)
Defining mappings
Mappings can be either configured, or defined in code.
Configure mappings
Defining mappings in configuration is only possible for entity types that define their bundles as config entities, such as nodes and taxonomy terms. There are two kinds of mappings: bundle level and configurable field level.
Bundle mappings
Visit the administrative bundle edit form (e.g., for article nodes go to /admin/structure/types/manage/article) and fill in the values under the "RDF sync" section.
Configurable field mappings
Visit the field configuration form (e.g., for article body go to /admin/structure/types/manage/article/fields/node.article.body) and fill in the values under the "RDF sync mapping" section.
Define in code
As an alternative to configuring mappings, and for entity types that do not declare their bundles as config entities, it's possible to define mappings directly in code by implementing hook_entity_bundle_info_alter().
Here's an example of how to add mappings to the article node bundle:
function my_module_entity_bundle_info_alter(array &$bundles): void {
  if (isset($bundles['node']['article'])) {
    $bundles['node']['article']['rdf_sync'] = [
      // The RDF type.
      'type' => 'http://example.com/article',
      // The name of the field for the entity URI.
      'uri_field_name' => 'rdf_uri',
      // The plugin used to build the URI for new entities.
      // @see \Drupal\rdf_sync\Plugin\rdf_sync\RdfUriGenerator\DefaultRdfUriGenerator()
      'uri_plugin' => 'my_custom_plugin',
      // Fields properties mappings. Includes (bundle) base & configurable fields.
      'fields' => [
        'title' => [
          'value' => [
            // Mapped predicate.
            'predicate' => 'http://example.com/article/title',
            // NULL for translatable strings or a simple type, such as
            // 'xsd:boolean', 'xsd:string', etc., or 'resource' for entity
            // reference fields.
            'type' => NULL,
          ],
        ],
        'body' => [
          // Note that we're using 'processed', which is a computed property,
          // instead of 'value', in order to benefit from text formatting.
          'processed' => [
            'predicate' => 'http://example.com/article/content',
            'type' => 'xsd:string',
          ],
        ],
      ],
    ];
  }
}
Architecture
The module relies on EasyRdf, a PHP library for manipulating RDF graphs and triples.
At the core of the synchronizer is a specialized normalizer (\Drupal\rdf_sync\Normalizer\RdfSyncNormalizer) that normalizes a mapped entity into an EasyRdf graph PHP representation (\EasyRdf\Graph).
The encoder (\Drupal\rdf_sync\Encoder\RdfSyncEncoder) serializes a graph into any of the formats supported by EasyRdf, including jsonld, n3, ntriples, rdfxml and turtle.
With the normalizer and encoder in place, getting the RDF representation of an entity is straightforward:
PHP
// Represent the entity as JSON-LD
\Drupal::service('serializer')->serialize($node, 'jsonld');
// As Turtle
\Drupal::service('serializer')->serialize($node, 'turtle');
CLI
# Supposing http://example.com/node/123 is the canonical URL of the entity
curl 'http://example.com/node/123?_format=jsonld'
curl 'http://example.com/node/123?_format=turtle'
Events
The module provides a set of events that can be used to alter the data and entities before and after synchronization. The events are:
- \Drupal\rdf_sync\Event\RdfSyncNormalizeEvent: Allows altering the triples before they are serialized, or adding new ones.
- \Drupal\rdf_sync\Event\RdfSyncEvent: Allows altering the array of entities before syncing them.
Sub-modules
The rdf_sync_published submodule provides a way to filter out unpublished entities from synchronization. It does so by listening to the \Drupal\rdf_sync\Event\RdfSyncEvent event and filtering out the entities that are not published.
Contributing
Feature requests, bug reports, and merge requests are welcome. Please follow the Drupal coding standards and best practices. Merge requests should contain test coverage.
All development takes place on Drupal.org.
We're using DDEV together with the "DDEV integration for developing Drupal contrib projects" add-on for module development:
- To get familiar with DDEV, visit https://ddev.readthedocs.io/en/stable/
- Read the add-on instructions at https://github.com/ddev/ddev-drupal-contrib