Robot Automations
sequenceDiagram
Actor RO as Robot Owner
participant R as Robot
participant DR as Data Repository
participant ES as Elasticsearch
RO->>+DR: POST /enhancement-requests/automations/ : percolator query
DR->>-ES: Register percolator query
alt On Import Batch
loop For each Reference in batch
DR->>DR: Ingest Reference
DR->>DR: Deduplicate Reference
DR->>ES: Percolate new Reference
loop For each matching robot
DR->>R: Enhancement Request with matching References
end
loop Continuous polling
R->>DR: POST /robot-enhancement-batches/ : Poll for work
DR->>R: RobotEnhancementBatch (if pending enhancements available)
end
end
else On Batch Enhancement
DR->>DR: Ingest Enhancements
DR->>ES: Percolate new Enhancements
loop For each matching robot
DR->>R: Enhancement Request with matching References
end
loop Continuous polling
R->>DR: POST /robot-enhancement-batches/ : Poll for work
DR->>R: RobotEnhancementBatch (if pending enhancements available)
end
end
flowchart TD
subgraph Repository
G_R([Reference]) --> G_R1[Ingest Reference]
G_R1 --> G_AUTO{Robot Automation Percolation}
G_AUTO --> G_REQ[Create EnhancementRequests]
G_R1 --> G_P[(Persistence)]
G_REQ --> G_P[(Persistence)]
G_REPO_PROC[Ingest Enhancement] --> G_P
G_REPO_PROC --> G_AUTO
end
subgraph "Robot(s)"
G_POLL[Robot Polling] --> G_BATCH[Fetch RobotEnhancementBatch]
G_P --> G_BATCH
G_BATCH --> G_ROBOT_PROC[Process Batch]
G_ROBOT_PROC --> G_UPLOAD[Upload Results]
G_UPLOAD --> G_REPO[Notify Repository]
end
G_REPO --> G_REPO_PROC
Context
Robot automations allow Enhancement Requests to be triggered automatically when incoming references or enhancements meet given criteria. The robot owner registers these criteria as a percolator query in the data repository using the /enhancement-requests/automations/ endpoint.
When references or enhancements match the automation criteria, the data repository creates EnhancementRequest objects for the matching robots. Robots can discover and process work through polling. When a robot polls for work, the repository creates a RobotEnhancementBatch on-demand containing available pending enhancements for that robot, up to a configurable batch size limit.
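The on-demand batching described above can be pictured with a minimal in-memory sketch. It is not the repository's implementation: the `PendingEnhancement` class, the `create_batch_on_poll` function and the list-based queue are hypothetical names introduced purely for illustration, and only the `RobotEnhancementBatch` name comes from this page.

```python
from dataclasses import dataclass, field


@dataclass
class PendingEnhancement:
    # Hypothetical stand-in for one pending enhancement awaiting a robot.
    robot_id: str
    reference_id: str


@dataclass
class RobotEnhancementBatch:
    # Simplified: the real object carries more than a list of pending items.
    robot_id: str
    pending: list = field(default_factory=list)


def create_batch_on_poll(queue, robot_id, batch_size_limit):
    """Build a batch for one robot when it polls, or return None if it has no work."""
    claimed = [p for p in queue if p.robot_id == robot_id][:batch_size_limit]
    if not claimed:
        return None
    # Remove claimed items so a later poll does not hand them out again.
    for item in claimed:
        queue.remove(item)
    return RobotEnhancementBatch(robot_id=robot_id, pending=claimed)
```

With three pending items for one robot and a limit of two, the first poll returns two items and the second poll returns the remaining one. A robot with nothing pending receives no batch.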
Percolation
The automation criteria are implemented as an Elasticsearch percolator query. Percolation is the inverse of a traditional Elasticsearch search: the query is stored in the index, and the document is used to search. When writing a percolator query, the key question is: “What shape should new references and/or enhancements have to automatically request enhancements from this robot?”
Query context is implicit when the percolator query is registered - i.e. the top-level element of RobotAutomationIn.query should not be query.
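A common slip is to paste a full search body, including its outer `query` key, into the automation. The small helper below is a sketch of how a client might guard against that before registering. The function name `normalise_automation_query` is hypothetical and not part of the repository or its SDK.

```python
def normalise_automation_query(query: dict) -> dict:
    """Return the query body without a redundant top-level "query" wrapper."""
    # A dict whose only key is "query" is a search body, not a query clause,
    # so unwrapping it yields the clause the percolator expects.
    if set(query) == {"query"} and isinstance(query["query"], dict):
        return query["query"]
    return query


wrapped = {"query": {"term": {"changeset.identifiers.identifier_type": "DOI"}}}
unwrapped = normalise_automation_query(wrapped)
```

Here `unwrapped` is the bare `term` clause, which is the form to place in RobotAutomationIn.query. A query that is already bare passes through unchanged.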
There are two scenarios that can trigger percolation:
On deduplication, if the active decision has changed
On added enhancement
Structure
Each percolated document contains two fields: reference and changeset. Both of these fields map to Reference objects. reference is the complete, deduplicated reference, and changeset is the change that was just applied. The repository is append-only, and so is the changeset: it represents only the information newly available to the reference.
Automations trigger on reference - note the implications of this below.
Some examples:
After deduplicating a reference, if the reference is canonical, reference and changeset will be identical: the imported reference. Automations trigger on that reference.
After deduplicating a reference, if the reference is a duplicate, reference will be the deduplicated view of its canonical reference, and changeset will be the duplicate reference. Automations trigger on the canonical reference.
After adding an enhancement, reference will be the reference with the new enhancement applied, and changeset will be an empty reference just including the new enhancement. Automations trigger on the reference that was enhanced, canonical or not. reference is still deduplicated - if it is canonical, its duplicate’s contents will be included.
For the exact structure of these inner documents, see ReferenceDomainMixin.
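The three scenarios above can be sketched with plain dictionaries. This is a simplified model, not the real document shape: it assumes a reference is just a dict with `identifiers` and `enhancements` lists, and the function names are hypothetical. ReferenceDomainMixin remains the authority on the actual structure.

```python
def deduplicated_view(canonical, duplicates):
    """Canonical content plus the content of its duplicates (simplified merge)."""
    return {
        "identifiers": canonical["identifiers"]
        + [i for d in duplicates for i in d["identifiers"]],
        "enhancements": canonical["enhancements"]
        + [e for d in duplicates for e in d["enhancements"]],
    }


def document_after_deduplication(imported, canonical=None):
    """Percolated document built once a duplicate decision has been made."""
    if canonical is None:
        # The imported reference is itself canonical: both fields are identical.
        return {"reference": imported, "changeset": imported}
    # The imported reference is a duplicate: percolate the canonical's
    # deduplicated view, with the duplicate as the newly available information.
    return {
        "reference": deduplicated_view(canonical, [imported]),
        "changeset": imported,
    }


def document_after_enhancement(reference, enhancement):
    """Percolated document built once an enhancement has been added."""
    enhanced = {
        **reference,
        "enhancements": reference["enhancements"] + [enhancement],
    }
    # The changeset is an otherwise empty reference holding only the new enhancement.
    changeset = {"identifiers": [], "enhancements": [enhancement]}
    return {"reference": enhanced, "changeset": changeset}
```

In every case changeset holds only what is new, while reference holds the full picture. This is why most queries can restrict themselves to changeset.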
Query
Automation queries must specify a filter against changeset, otherwise they risk matching against all documents.
Most use-cases will only need to query against changeset, to trigger upon some new dependent information. reference is provided for more complex use-cases, such as triggering on a combination of existing and new information.
The active DuplicateDetermination is included in both reference and changeset. Note, however, that this will not capture the previous duplicate decision if it has just changed. The active determination can be used to filter automations, for instance to match only references that have been determined to be definitely canonical.
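Because a query without a changeset filter risks matching every document, it can be worth linting queries before registering them. The sketch below is one possible client-side check, not something the repository provides, and `filters_on_changeset` is a hypothetical name. It only confirms that some field or nested path under changeset is mentioned, so a query that refers to changeset solely inside must_not would still pass.

```python
def filters_on_changeset(node) -> bool:
    """Return True if any field name or nested path in the query is under changeset."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Field names appear as keys, e.g. {"term": {"changeset.x": ...}}.
            if key.startswith("changeset."):
                return True
            # Nested paths and exists-style clauses carry the field as a value.
            if key in ("path", "field") and isinstance(value, str):
                if value.startswith("changeset."):
                    return True
            if filters_on_changeset(value):
                return True
        return False
    if isinstance(node, list):
        return any(filters_on_changeset(item) for item in node)
    return False


risky = {"term": {"reference.enhancements.content.enhancement_type": "abstract"}}
safe = {
    "nested": {
        "path": "changeset.identifiers",
        "query": {"term": {"changeset.identifiers.identifier_type": "DOI"}},
    }
}
```

Here `filters_on_changeset(safe)` is true and `filters_on_changeset(risky)` is false, flagging the second query for review before it is registered.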
Safeguards
There is a simple cycle-checker in place to prevent an enhancement request from triggering an automatic enhancement request for the same robot.
Cycles involving multiple robots are still possible, so take care when designing robot automation criteria.
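One way to reason about multi-robot cycles is to write down which robots' outputs can trigger which other robots' automations, then search that graph for a loop. The sketch below is a generic depth-first cycle search and is not the repository's built-in cycle-checker. The `find_cycle` function and the robot names are hypothetical, and the trigger map must be maintained by hand.

```python
def find_cycle(triggers):
    """Return one cycle as a list of robot names, or None if the graph is acyclic.

    `triggers` maps a robot to the set of robots whose automations its
    output can match.
    """
    unvisited, in_progress, done = 0, 1, 2
    state = {robot: unvisited for robot in triggers}
    path = []

    def visit(robot):
        state[robot] = in_progress
        path.append(robot)
        for nxt in sorted(triggers.get(robot, ())):
            nxt_state = state.get(nxt, unvisited)
            if nxt_state == in_progress:
                # Found a back edge: the cycle is the path from nxt to here.
                return path[path.index(nxt):] + [nxt]
            if nxt_state == unvisited:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[robot] = done
        return None

    for robot in sorted(triggers):
        if state[robot] == unvisited:
            found = visit(robot)
            if found:
                return found
    return None


looping = {
    "abstract-robot": {"inclusion-robot"},
    "inclusion-robot": {"abstract-robot"},
}
```

For the `looping` map, the search reports the loop from abstract-robot to inclusion-robot and back. Removing either edge makes it return None.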
Examples
The following examples are used in DESTINY to orchestrate robot automations.
Request Missing Abstract
This percolator query matches on imported references that don’t have an abstract and have received a DOI.
{
  "bool": {
    "must": [
      {
        "nested": {
          "path": "changeset.identifiers",
          "query": {
            "term": {"changeset.identifiers.identifier_type": "DOI"}
          }
        }
      }
    ],
    "must_not": [
      {
        "nested": {
          "path": "reference.enhancements",
          "query": {
            "term": {
              "reference.enhancements.content.enhancement_type": "abstract"
            }
          }
        }
      }
    ]
  }
}
Request Domain Inclusion Annotation
This percolator query matches on references that have received an abstract. This might either be on import, or on addition of a new abstract enhancement. This is an example of how the orchestration starts to piece together - if the above automation is executed, and an abstract is created, this automation will then be triggered.
{
  "bool": {
    "must": [
      {
        "nested": {
          "path": "changeset.enhancements",
          "query": {
            "term": {
              "changeset.enhancements.content.enhancement_type": "abstract"
            }
          }
        }
      }
    ]
  }
}
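To see how the two example queries chain together, the sketch below evaluates them against hand-built documents using a tiny in-memory matcher. This is not Elasticsearch. The matcher supports only bool (must and must_not), nested and term, treats nested as a plain path lookup (equivalent here because each nested clause holds a single term), and assumes the simplified document shape shown. All function and variable names are hypothetical.

```python
def term(field, value):
    return {"nested": {"path": field.rsplit(".", 1)[0],
                       "query": {"term": {field: value}}}}


MISSING_ABSTRACT = {"bool": {
    "must": [term("changeset.identifiers.identifier_type", "DOI")],
    "must_not": [term("reference.enhancements.content.enhancement_type", "abstract")],
}}
DOMAIN_INCLUSION = {"bool": {
    "must": [term("changeset.enhancements.content.enhancement_type", "abstract")],
}}


def values_at(doc, dotted_path):
    """Collect every value found at a dotted path, descending through lists."""
    nodes = [doc]
    for part in dotted_path.split("."):
        found = []
        for node in nodes:
            for item in node if isinstance(node, list) else [node]:
                if isinstance(item, dict) and part in item:
                    found.append(item[part])
        nodes = found
    flat = []
    for node in nodes:
        flat.extend(node if isinstance(node, list) else [node])
    return flat


def matches(query, doc):
    kind, body = next(iter(query.items()))
    if kind == "term":
        field, value = next(iter(body.items()))
        return value in values_at(doc, field)
    if kind == "nested":
        return matches(body["query"], doc)  # simplification, see lead-in
    if kind == "bool":
        return (all(matches(q, doc) for q in body.get("must", []))
                and not any(matches(q, doc) for q in body.get("must_not", [])))
    raise ValueError(f"unsupported clause: {kind}")


doi = {"identifier_type": "DOI"}
abstract = {"content": {"enhancement_type": "abstract"}}
imported = {"identifiers": [doi], "enhancements": []}
on_import = {"reference": imported, "changeset": imported}
on_abstract = {
    "reference": {"identifiers": [doi], "enhancements": [abstract]},
    "changeset": {"identifiers": [], "enhancements": [abstract]},
}
```

Under these assumptions, `on_import` matches only the missing-abstract query and `on_abstract` matches only the domain-inclusion query. The first automation therefore does not fire again once the abstract arrives, because the changeset no longer carries the DOI.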
Advanced Rationale
Automating enhancements added to duplicates
When an enhancement is added to a duplicate, we trigger automations on the duplicate reference, not on the canonical reference.
Most likely, if we are fulfilling an enhancement on a duplicate, some other process has updated the duplicate determination between request & fulfillment.
Note
This also retains the option to enhance duplicates independently if desired. Relevant automations must then not filter for duplicate_determination=canonical. This is a niche bonus rather than a primary use-case.
Justification
The rationale for this behavior is as follows:
[A] Enhancements should be generated on canonical references where possible, as this provides the most context to the enhancing robot.
[B] Because this reference is a duplicate, we can be confident that automations were triggered on the canonical reference when the duplicate decision was made.
Thus:
Because of [A], we would rather not trigger automation on enhancements derived purely from a duplicate reference.
Because of [B], we can be confident that there is no missed automation pathway by not bubbling up the automation trigger to the canonical reference.
Example scenario
E is a domain inclusion enhancement, requiring a DOI and an abstract.
C is an existing reference with a good abstract but no DOI.
D is a newly ingested reference with a DOI and a partial abstract.
D is imported and incorrectly marked as canonical.
Automation is fired on D and sent to the robot to add E (let’s call this E(D)).
A new duplicate decision is made marking D as a duplicate of C.
Automation is fired on C with D as the changeset (let’s call this E(C, D)). This is statement [B] above.
Robot processes and returns E(D) on D:
We automate E(D) on D. (Likely this is a no-op as most automations will filter for canonicals.)
Robot processes and returns E(C, D) on C:
We automate E(C, D) on C (our preferred path per [A]).
sequenceDiagram
participant I as External
participant D as Reference D (has DOI, bad abstract)
participant C as Reference C (no DOI, good abstract)
participant R as Robot
I->>D: Import D, mark as canonical (incorrectly)
D->>R: Fire automation for E(D) based on import of D
I->>D: Deduplication reruns and marks D as duplicate of C
C->>R: Fire automation for E(C,D) based on duplicate decision ([B])
R->>D: Process & return E(D)
D-->>D: Automate E(D) on D
Note over D: Likely terminal
R->>C: Process & return E(C, D)
C-->>C: Automate E(C,D) on C
Note over C: Preferred automation route ([A])