Robot Automations#

        sequenceDiagram
    Actor RO as Robot Owner
    participant R as Robot
    participant DR as Data Repository
    participant ES as Elasticsearch

    RO->>+DR: POST /enhancement-requests/automations/ : percolator query
    DR->>-ES: Register percolator query

    alt On Import Batch
        loop For each Reference in batch
            DR->>DR: Ingest Reference
            DR->>DR: Deduplicate Reference
            DR->>ES: Percolate new Reference
            loop For each matching robot
                DR->>R: Enhancement Request with matching References
            end
            loop Continuous polling
                R->>DR: POST /robot-enhancement-batches/ : Poll for work
                DR->>R: RobotEnhancementBatch (if pending enhancements available)
            end
        end
    else On Batch Enhancement
        DR->>DR: Ingest Enhancements
        DR->>ES: Percolate new Enhancements
        loop For each matching robot
            DR->>R: Enhancement Request with matching References
        end
        loop Continuous polling
            R->>DR: POST /robot-enhancement-batches/ : Poll for work
            DR->>R: RobotEnhancementBatch (if pending enhancements available)
        end
    end
    
        flowchart TD
subgraph Repository
        G_R([Reference]) --> G_R1[Ingest Reference]
        G_R1 --> G_AUTO{Robot Automation Percolation}
        G_AUTO --> G_REQ[Create EnhancementRequests]
        G_R1 --> G_P[(Persistence)]
        G_REQ --> G_P[(Persistence)]
        G_REPO_PROC[Ingest Enhancement] --> G_P
        G_REPO_PROC --> G_AUTO
    end
    subgraph "Robot(s)"
        G_POLL[Robot Polling] --> G_BATCH[Fetch RobotEnhancementBatch]
        G_P --> G_BATCH
        G_BATCH --> G_ROBOT_PROC[Process Batch]
        G_ROBOT_PROC --> G_UPLOAD[Upload Results]
        G_UPLOAD --> G_REPO[Notify Repository]
    end
    G_REPO --> G_REPO_PROC
    

Context#

Robot automations allow Enhancement Requests to be automatically triggered based on criteria on incoming references or enhancements. This is achieved through a percolator query registered by the robot owner in the data repository using the /enhancement-requests/automations/ endpoint.

When references or enhancements match the automation criteria, the data repository creates EnhancementRequest objects for the matching robots. Robots can discover and process work through polling. When a robot polls for work, the repository creates a RobotEnhancementBatch on-demand containing available pending enhancements for that robot, up to a configurable batch size limit.

Percolation#

The automation criteria is implemented as an Elasticsearch percolator query. Percolation is the inverse of a traditional Elasticsearch search: the query is stored in the index, and the document is used to search. When writing a percolator query, the key question is: “What shape should new references and/or enhancements have to automatically request enhancements from this robot?”.

Query context is implicit when the percolator query is registered - i.e. the top-level element of RobotAutomationIn.query should not be query.

There are two scenarios that can trigger percolation:

  • On deduplication, if the active decision has changed

  • On added enhancement

Structure#

Each percolated document contains two fields: reference and changeset. Both of these fields map to Reference objects. reference is the complete reference, deduplicated, and changeset is the change that was just applied. The repository is append-only, and so is the changeset - it only represents newly available information to the reference.

Automations trigger on reference - note the implications of this below.

Some examples:

  • After deduplicating a reference, if the reference is canonical, reference and changeset will be identical: the imported reference. Automations trigger on that reference.

  • After deduplicating a reference, if the reference is a duplicate, reference will be the deduplicated view of its canonical reference, and changeset will be the duplicate reference. Automations trigger on the canonical reference.

  • After adding an enhancement, reference will be the reference with the new enhancement applied, and changeset will be an empty reference just including the new enhancement. Automations trigger on the reference that was enhanced, canonical or not. reference is still deduplicated - if it is canonical, its duplicate’s contents will be included.

For the exact structure of these inner documents, see ReferenceDomainMixin.

Query#

Automation queries must specify a filter against changeset, otherwise they risk matching against all documents.

Most use-cases will only need to lookup against changeset, to trigger upon some new dependent information. reference is provided for more complex use-cases, such as triggering on a combination of existing and new information.

The active DuplicateDetermination is included in both reference and changeset, however note this will not capture the previous duplicate decision if it has just changed. This can be used to filter automations based on if a reference has been determined to be definitely canonical, for instance.

Safeguards#

There is a simple cycle-checker in place to prevent an enhancement request from triggering an automatic enhancement request for the same robot.

Cycles involving multiple robots are however possible, so caution should be taken when considering robot automation criteria.

Examples#

The following examples are used in DESTINY to orchestrate robot automations.

Request Missing Abstract#

This percolator query matches on imported references that don’t have an abstract and have received a DOI.

{
    "bool": {
        "must": [
            {
                "nested": {
                    "path": "changeset.identifiers",
                    "query": {
                        "term": {"changeset.identifiers.identifier_type": "DOI"}
                    }
                }
            }
        ],
        "must_not": [
            {
                "nested": {
                    "path": "reference.enhancements",
                    "query": {
                        "term": {
                            "reference.enhancements.content.enhancement_type": "abstract"
                        }
                    }
                }
            }
        ]
    }
}

Request Domain Inclusion Annotation#

This percolator query matches on references that have received an abstract. This might either be on import, or on addition of a new abstract enhancement. This is an example of how the orchestration starts to piece together - if the above automation is executed, and an abstract is created, this automation will then be triggered.

{
    "bool": {
        "must": [
            {
                "nested": {
                    "path": "changeset.enhancements",
                    "query": {
                        "term": {
                            "changeset.enhancements.content.enhancement_type": "abstract"
                        }
                    },
                }
            },
        ],
    }
}

Advanced Rationale#

Automating enhancements added to duplicates#

When an enhancement is added to a duplicate, we trigger enhancements on the duplicate reference, not on the canonical reference.

Most likely, if we are fulfilling an enhancement on a duplicate, some other process has updated the duplicate determination between request & fulfillment.

Note

This also retains the option to enhance duplicates independently if desired (relevant automations will need to not filter for duplicate_determination=canonical; this is just a niche bonus).

Justification#

The rationale for this behavior is as follows:

  • [A] Enhancements should be generated on canonical references where possible, as this provides the most context to the enhancing robot.

  • [B] Because this reference is a duplicate, we can be confident that automations were triggered on the canonical reference when the duplicate decision was made.

Thus:

  • Because of [A], we would rather not trigger automation on enhancements derived purely from a duplicate reference.

  • Because of [B], we can be confident that there is no missed automation pathway by not bubbling up the automation trigger to the canonical reference.

Example scenario#

  1. E is a domain inclusion example, requiring a DOI and an abstract.

  2. C is an existing reference with a good abstract but no DOI.

  3. D is a newly ingested reference with a DOI and a partial abstract.

  4. D is imported and incorrectly marked as canonical.

  5. Automation is fired on D and sent to the robot to add E (let’s call this E(D)).

  6. A new duplicate decision is made marking D as a duplicate of C.

  7. Automation is fired on C with D as the changeset (let’s call this E(C, D)). This is statement [B] above.

  8. Robot processes and returns E(D) on D:

  9. We automate E(D) on D. (Likely this is a no-op as most automations will filter for canonicals.)

  10. Robot processes and returns E(C, D) on C:

  11. We automate E(C, D) on C (our preferred path per [A]).

        sequenceDiagram
    participant I as External
    participant D as Reference D (has DOI, bad abstract)
    participant C as Reference C (no DOI, good abstract)
    participant R as Robot


    I->>D: Import D, mark as canonical (incorrectly)
    D->>R: Fire automation for E(D) based on import of D
    I->>D: Deduplication reruns and marks D as duplicate of C
    C->>R: Fire automation for E(C,D) based on duplicate decision ([B])
    R->>D: Process & return E(D)
    D-->>D: Automate E(D) on D
    Note over D: Likely terminal
    R->>C: Process & return E(C, D)
    C-->>C: Automate E(C,D) on D
    Note over C: Preferred automation route ([A])