Arcanna Ontology

EXPERIMENTAL

This feature is currently experimental. The DSL syntax and behavior may change in future releases. Use it in non-production environments first and provide feedback to the Arcanna team.

Arcanna Ontology lets you define how security alerts are transformed into a knowledge graph. You write a JSON-based DSL that describes what entities (nodes) to extract from your alerts, how they relate to each other, and what asset inventory data to seed — Arcanna handles the rest.

The result is a Neo4j graph that you can query, visualize, and reason over to find attack patterns, policy violations, and lateral movement across your infrastructure.

What is an Ontology

Before writing a DSL, it helps to understand three core concepts from Neo4j.

Nodes

A node represents an entity — an IP address, a host, a user, a process, a signature. Each node has a label (its type) and properties (key-value pairs that describe it).

(:IP {address: "176.65.149.236", geo_country: "Russia"})
(:Host {name: "web-server-01", environment: "production"})
(:User {name: "root"})

Think of labels as table names and properties as columns. A node labeled IP with property address: "176.65.149.236" is one specific IP in your graph.

Relationships

A relationship connects two nodes with a direction and a type. Relationships can also carry properties.

(:IP)-[:COMMUNICATED_WITH {transport: "tcp", bytes: 4096}]->(:IP)
(:Process)-[:EXECUTED_ON]->(:Host)
(:Process)-[:RAN_BY]->(:User)

The relationship type (e.g. COMMUNICATED_WITH) describes how two entities are connected. Direction matters — A -[:COMMUNICATED_WITH]-> B means A initiated communication toward B.

Properties

Both nodes and relationships store properties as key-value pairs. In Arcanna's DSL, you map these properties to fields in your alert data using source paths — dot-separated paths into the alert JSON.

For example, source.geo.country_name extracts the country from this alert structure:

{
  "source": {
    "ip": "176.65.149.236",
    "geo": {
      "country_name": "Russia"
    }
  }
}
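
As an illustration of how a source path could be resolved, here is a minimal Python sketch. The helper name `resolve_path` is hypothetical and is not part of Arcanna. The sketch assumes the alert is a nested dictionary and that a missing step yields a default value; how Arcanna treats flattened keys that themselves contain dots is not covered here.

```python
from typing import Any


def resolve_path(alert: dict, source_path: str, default: Any = None) -> Any:
    """Walk a dot-separated path through nested dicts.

    Returns `default` as soon as one step of the path is missing.
    """
    current: Any = alert
    for key in source_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


alert = {
    "source": {
        "ip": "176.65.149.236",
        "geo": {"country_name": "Russia"},
    }
}

print(resolve_path(alert, "source.geo.country_name"))  # Russia
print(resolve_path(alert, "destination.ip"))           # None
```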

MERGE and Deduplication

Arcanna uses Neo4j's MERGE operation, which means: find this node if it exists, create it if it doesn't. If three alerts reference the same IP 176.65.149.236, you get one IP node with updated seen_count and last_seen — not three duplicates.

This is the key reason a graph database works well for security data. Over time, your graph accumulates a connected picture of your infrastructure without duplication.
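
The find-or-create behavior can be pictured with a small in-memory stand-in for the graph. This is only a sketch of the semantics described above, not how Arcanna or Neo4j implement MERGE. The function `merge_node` and the dictionary-based graph are assumptions made for illustration.

```python
def merge_node(graph: dict, label: str, key: str, value: str, seen_at: str) -> dict:
    """Find the node identified by (label, value), or create it if absent.

    Either way, bump seen_count and record the latest sighting.
    """
    node_id = (label, value)
    node = graph.get(node_id)
    if node is None:
        node = {key: value, "seen_count": 0}
        graph[node_id] = node
    node["seen_count"] += 1
    node["last_seen"] = seen_at
    return node


graph: dict = {}
for ts in ("2024-01-01T10:00:00Z", "2024-01-01T10:05:00Z", "2024-01-01T10:09:00Z"):
    merge_node(graph, "IP", "address", "176.65.149.236", ts)

print(len(graph))                                      # 1
print(graph[("IP", "176.65.149.236")]["seen_count"])   # 3
```

Three alerts that mention the same address leave a single entry in `graph`, with the counter and timestamp updated in place.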

DSL Structure

A DSL configuration is a JSON object with five top-level fields:

{
  "timestamp_field": "@timestamp",
  "include_job_id": true,
  "assets": [],
  "nodes": [],
  "relationships": []
}

Field	Required	Description
`timestamp_field`	No	Which alert field contains the timestamp. Defaults to `@timestamp`.
`include_job_id`	No	If `true` (default), all nodes are scoped by job ID — each job has its own isolated graph. Set to `false` for a shared graph across jobs.
`assets`	No	Static inventory data (hosts, allowed ports) seeded before alert processing.
`nodes`	Yes	Defines what entities to extract from each alert.
`relationships`	Yes	Defines how extracted entities connect to each other.
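
To make the defaults in the table concrete, the following Python sketch fills in the optional fields before a configuration is used. The function `with_defaults` is hypothetical. It only mirrors the table above and is not Arcanna's own loader.

```python
def with_defaults(config: dict) -> dict:
    """Return a copy of the config with the documented defaults applied."""
    resolved = {
        "timestamp_field": "@timestamp",  # documented default
        "include_job_id": True,           # documented default
        "assets": [],                     # optional section
    }
    resolved.update(config)
    missing = [name for name in ("nodes", "relationships") if name not in resolved]
    if missing:
        raise ValueError(f"missing required section(s): {', '.join(missing)}")
    return resolved


minimal = {"nodes": [], "relationships": []}
resolved = with_defaults(minimal)
print(resolved["timestamp_field"])  # @timestamp
print(resolved["include_job_id"])   # True
```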

Defining Nodes

Each entry in the nodes array tells Arcanna: "Extract this type of entity from each alert."

{
  "id": "ip_source",
  "label": "IP",
  "properties": [
    {"name": "address", "source_path": "source.ip", "is_label": true},
    {"name": "geo_country", "source_path": "source.geo.country_name"},
    {"name": "as_org", "source_path": "source.as.organization.name"}
  ]
}

Field	Required	Description
`id`	Yes	Unique identifier for this node definition. Used by relationships to reference it.
`label`	Yes	The Neo4j label (node type). Multiple node definitions can share the same label.
`properties`	Yes	List of properties to extract. Exactly one must have `"is_label": true` — this is the property used to identify and deduplicate the node.

The `id` and `label` Distinction

This is the most important concept in the DSL. The id is an internal reference for wiring relationships. The label is what Neo4j sees.

Two node definitions can share the same label but have different ids and source_paths. This is how you handle the same entity type appearing in different fields:

{
  "id": "ip_source",
  "label": "IP",
  "properties": [
    {"name": "address", "source_path": "source.ip", "is_label": true}
  ]
},
{
  "id": "ip_dest",
  "label": "IP",
  "properties": [
    {"name": "address", "source_path": "destination.ip", "is_label": true}
  ]
}

Both create :IP nodes in Neo4j. If source.ip and destination.ip are the same value (e.g. 10.128.0.77), MERGE produces one node — no duplicates. But during relationship building, ip_source and ip_dest are tracked separately so Arcanna knows which end of COMMUNICATED_WITH is which.
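
The split between `id` and `label` can be sketched in Python. In this illustration, nodes are deduplicated by label and key value, while a separate mapping remembers which definition `id` produced which node. The names `get_path` and `extract_nodes` are hypothetical, and the in-memory dictionaries only stand in for Neo4j.

```python
def get_path(alert, source_path):
    """Resolve a dot-separated path; None when any step is missing."""
    current = alert
    for key in source_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_nodes(node_defs, alert):
    """Return (graph, by_id).

    graph is keyed by (label, key value), which is what deduplicates nodes.
    by_id maps each definition id to the node it resolved to.
    """
    graph, by_id = {}, {}
    for node_def in node_defs:
        key_prop = next(p for p in node_def["properties"] if p.get("is_label"))
        value = get_path(alert, key_prop["source_path"])
        if value is None:
            continue  # assumption: no key value means no node for this alert
        merge_key = (node_def["label"], value)
        graph.setdefault(merge_key, {key_prop["name"]: value})
        by_id[node_def["id"]] = merge_key
    return graph, by_id


node_defs = [
    {"id": "ip_source", "label": "IP",
     "properties": [{"name": "address", "source_path": "source.ip", "is_label": True}]},
    {"id": "ip_dest", "label": "IP",
     "properties": [{"name": "address", "source_path": "destination.ip", "is_label": True}]},
]
alert = {"source": {"ip": "10.128.0.77"}, "destination": {"ip": "10.128.0.77"}}

graph, by_id = extract_nodes(node_defs, alert)
print(len(graph))                               # 1
print(by_id["ip_source"] == by_id["ip_dest"])   # True
```

Both definitions resolve to the same node, yet `by_id` still has two entries, so a relationship can tell its source end from its target end.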

Properties

Each property maps a Neo4j property name to a source path in the alert JSON.

{"name": "geo_country", "source_path": "source.geo.country_name"}

Field	Required	Description
`name`	Yes	The property name stored in Neo4j. Must not contain dots.
`source_path`	Yes	Dot-separated path into the alert JSON to extract the value.
`is_label`	No	If `true`, this property is used as the node's identity for MERGE. Exactly one per node definition.
`is_timestamp`	No	If `true`, the value is stored as a Neo4j `datetime`.
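
A property list can be read as a small extraction recipe. The Python sketch below applies one to an alert, under two assumptions that the documentation does not state: properties whose source path is missing are skipped, and timestamps arrive as ISO 8601 strings. The function `extract_properties` and the `observed_at` property are hypothetical, and a Python `datetime` stands in for the Neo4j `datetime` type.

```python
from datetime import datetime, timezone


def get_path(alert, source_path):
    """Resolve a dot-separated path; None when any step is missing."""
    current = alert
    for key in source_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_properties(prop_defs, alert):
    """Build a {name: value} dict from property definitions."""
    props = {}
    for prop in prop_defs:
        value = get_path(alert, prop["source_path"])
        if value is None:
            continue  # assumption: absent fields are simply left out
        if prop.get("is_timestamp"):
            # Accept a trailing "Z" on Python versions older than 3.11.
            value = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
        props[prop["name"]] = value
    return props


prop_defs = [
    {"name": "address", "source_path": "source.ip", "is_label": True},
    {"name": "geo_country", "source_path": "source.geo.country_name"},
    {"name": "observed_at", "source_path": "@timestamp", "is_timestamp": True},
]
alert = {"@timestamp": "2024-05-01T12:30:00Z", "source": {"ip": "176.65.149.236"}}

props = extract_properties(prop_defs, alert)
print(sorted(props))  # ['address', 'observed_at']
```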

Defining Relationships

Each entry in relationships connects two node definitions:

{
  "type": "COMMUNICATED_WITH",
  "source_id": "ip_source",
  "target_id": "ip_dest",
  "properties": [
    {"name": "transport", "source_path": "network.transport"}
  ],
  "condition": {"type": "field_exists", "field": "destination.ip"}
}

Field	Required	Description
`type`	Yes	The Neo4j relationship type.
`source_id`	Yes	The `id` of the source node definition.
`target_id`	Yes	The `id` of the target node definition.
`properties`	No	Properties to extract onto the relationship.
`condition`	No	Only create this relationship if the condition is met.

Conditions

Conditions prevent Arcanna from creating relationships when the relevant data is missing from an alert. Without them, a Suricata network alert (which carries no process fields) would still try to create process relationships, and an endpoint process alert would try to create network relationships.

{"type": "field_exists", "field": "destination.ip"}

Creates the relationship only if destination.ip is present in the alert.

{"type": "field_equals", "field": "event.category", "value": "process"}

Creates the relationship only if the field matches the exact value.

{
  "type": "and",
  "conditions": [
    {"type": "field_exists", "field": "source.ip"},
    {"type": "field_exists", "field": "destination.ip"}
  ]
}

All conditions must be true.

{
  "type": "or",
  "conditions": [
    {"type": "field_exists", "field": "rule.name"},
    {"type": "field_exists", "field": "kibana.alert.rule.name"}
  ]
}

At least one condition must be true.

Available condition types: field_exists, field_equals, field_not_equals, field_contains, fields_equal, fields_not_equal, and, or.
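
The four condition types shown above can be evaluated with a short recursive function. The Python sketch below is illustrative only. The name `evaluate` is hypothetical, and "present" is assumed to mean that the path resolves to a non-null value. The remaining condition types are not implemented because their exact semantics are not described here.

```python
def get_path(alert, source_path):
    """Resolve a dot-separated path; None when any step is missing."""
    current = alert
    for key in source_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def evaluate(condition: dict, alert: dict) -> bool:
    """Evaluate field_exists, field_equals, and, or against one alert."""
    kind = condition["type"]
    if kind == "field_exists":
        return get_path(alert, condition["field"]) is not None
    if kind == "field_equals":
        return get_path(alert, condition["field"]) == condition["value"]
    if kind == "and":
        return all(evaluate(c, alert) for c in condition["conditions"])
    if kind == "or":
        return any(evaluate(c, alert) for c in condition["conditions"])
    raise NotImplementedError(f"condition type not covered by this sketch: {kind}")


alert = {
    "source": {"ip": "176.65.149.236"},
    "destination": {"ip": "10.128.0.77", "port": 22},
    "event": {"category": "network"},
}

both_ips = {"type": "and", "conditions": [
    {"type": "field_exists", "field": "source.ip"},
    {"type": "field_exists", "field": "destination.ip"},
]}
is_process = {"type": "field_equals", "field": "event.category", "value": "process"}

print(evaluate(both_ips, alert))    # True
print(evaluate(is_process, alert))  # False
```

With this network alert, a relationship guarded by `both_ips` would be created, while one guarded by `is_process` would be skipped.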

Defining Assets

Assets are static inventory data — hosts, their IPs, allowed ports — that get seeded into the graph before alert processing. This lets you answer questions like "Was this traffic to a known host?" or "Is this port allowed?" by traversing the graph structure.

{
  "assets": [
    {
      "id": "asset_host",
      "label": "Host",
      "key_property": "name",
      "data": [
        {
          "name": "installer-arcanna-ai",
          "environment": "production",
          "cloud_provider": "gcp",
          "private_ips": ["10.128.0.77"],
          "public_ips": ["34.69.67.117"],
          "allowed_ports": [22, 80, 443]
        }
      ],
      "relationships": [
        {
          "type": "HAS_PRIVATE_IP",
          "target_node_id": "ip_source",
          "source_field": "private_ips"
        },
        {
          "type": "ALLOWS_PORT",
          "target_node_id": "port",
          "source_field": "allowed_ports"
        }
      ]
    }
  ]
}

Field	Required	Description
`id`	Yes	Unique identifier for this asset definition.
`label`	Yes	The Neo4j label. Should match a node label if you want MERGE to unify them (e.g. `Host` from assets merges with `Host` from alerts).
`key_property`	Yes	Which property uniquely identifies this asset (used as MERGE key).
`data`	Yes	Array of asset records. Scalar values become node properties. List values are used by relationships.
`relationships`	No	Links from this asset to nodes defined in the `nodes` section.

Asset Relationships

Asset relationships use target_node_id to reference a node definition by its id. Arcanna resolves the target's label and key property automatically.

{
  "type": "HAS_PRIVATE_IP",
  "target_node_id": "ip_source",
  "source_field": "private_ips"
}

This creates one HAS_PRIVATE_IP relationship per value in the private_ips list, pointing to an :IP node (because ip_source is defined with label IP and key property address).
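
The expansion from list values to relationships can be sketched as follows. This is a Python illustration under stated assumptions: `expand_asset_relationships` is a hypothetical helper, and each edge is represented as a plain tuple rather than anything Neo4j-specific. The target's label and key property are looked up from the node definition that `target_node_id` points to.

```python
def expand_asset_relationships(asset_def: dict, node_defs: list) -> list:
    """Return one (source, type, target) tuple per value in each source_field list."""
    nodes_by_id = {n["id"]: n for n in node_defs}
    edges = []
    for record in asset_def["data"]:
        source = (asset_def["label"], record[asset_def["key_property"]])
        for rel in asset_def.get("relationships", []):
            target_def = nodes_by_id[rel["target_node_id"]]
            key_name = next(p["name"] for p in target_def["properties"] if p.get("is_label"))
            for value in record.get(rel["source_field"], []):
                edges.append((source, rel["type"], (target_def["label"], key_name, value)))
    return edges


node_defs = [
    {"id": "ip_source", "label": "IP",
     "properties": [{"name": "address", "source_path": "source.ip", "is_label": True}]},
    {"id": "port", "label": "Port",
     "properties": [{"name": "number", "source_path": "destination.port", "is_label": True}]},
]
asset_def = {
    "id": "asset_host", "label": "Host", "key_property": "name",
    "data": [{"name": "installer-arcanna-ai", "private_ips": ["10.128.0.77"],
              "allowed_ports": [22, 80, 443]}],
    "relationships": [
        {"type": "HAS_PRIVATE_IP", "target_node_id": "ip_source", "source_field": "private_ips"},
        {"type": "ALLOWS_PORT", "target_node_id": "port", "source_field": "allowed_ports"},
    ],
}

edges = expand_asset_relationships(asset_def, node_defs)
print(len(edges))  # 4
print(edges[0])    # (('Host', 'installer-arcanna-ai'), 'HAS_PRIVATE_IP', ('IP', 'address', '10.128.0.77'))
```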

Complete Example

This DSL handles both Suricata network alerts and Elastic endpoint process alerts in a single configuration:

{
  "timestamp_field": "@timestamp",
  "include_job_id": true,

  "assets": [
    {
      "id": "asset_host",
      "label": "Host",
      "key_property": "name",
      "data": [
        {
          "name": "installer-arcanna-ai",
          "environment": "production",
          "cloud_provider": "gcp",
          "cloud_region": "us-central1",
          "private_ips": ["10.128.0.77"],
          "public_ips": ["34.69.67.117"],
          "allowed_ports": [22, 80, 443]
        }
      ],
      "relationships": [
        {"type": "HAS_PRIVATE_IP", "target_node_id": "ip_source", "source_field": "private_ips"},
        {"type": "HAS_PUBLIC_IP", "target_node_id": "ip_source", "source_field": "public_ips"},
        {"type": "ALLOWS_PORT", "target_node_id": "port", "source_field": "allowed_ports"}
      ]
    }
  ],

  "nodes": [
    {
      "id": "ip_source",
      "label": "IP",
      "properties": [
        {"name": "address", "source_path": "source.ip", "is_label": true},
        {"name": "geo_country", "source_path": "source.geo.country_name"},
        {"name": "geo_city", "source_path": "source.geo.city_name"},
        {"name": "as_org", "source_path": "source.as.organization.name"}
      ]
    },
    {
      "id": "ip_dest",
      "label": "IP",
      "properties": [
        {"name": "address", "source_path": "destination.ip", "is_label": true}
      ]
    },
    {
      "id": "host",
      "label": "Host",
      "properties": [
        {"name": "name", "source_path": "host.hostname", "is_label": true},
        {"name": "os_full", "source_path": "host.os.full"},
        {"name": "architecture", "source_path": "host.architecture"}
      ]
    },
    {
      "id": "signature",
      "label": "Signature",
      "properties": [
        {"name": "name", "source_path": "rule.name", "is_label": true},
        {"name": "sid", "source_path": "rule.id"},
        {"name": "category", "source_path": "rule.category"}
      ]
    },
    {
      "id": "attack_category",
      "label": "AttackCategory",
      "properties": [
        {"name": "name", "source_path": "rule.category", "is_label": true}
      ]
    },
    {
      "id": "country_source",
      "label": "Country",
      "properties": [
        {"name": "name", "source_path": "source.geo.country_name", "is_label": true},
        {"name": "iso_code", "source_path": "source.geo.country_iso_code"}
      ]
    },
    {
      "id": "country_dest",
      "label": "Country",
      "properties": [
        {"name": "name", "source_path": "destination.geo.country_name", "is_label": true},
        {"name": "iso_code", "source_path": "destination.geo.country_iso_code"}
      ]
    },
    {
      "id": "port",
      "label": "Port",
      "properties": [
        {"name": "number", "source_path": "destination.port", "is_label": true}
      ]
    },
    {
      "id": "process",
      "label": "Process",
      "properties": [
        {"name": "executable", "source_path": "process.executable", "is_label": true},
        {"name": "name", "source_path": "process.name"},
        {"name": "command_line", "source_path": "process.command_line"},
        {"name": "hash_sha256", "source_path": "process.hash.sha256"}
      ]
    },
    {
      "id": "parent_process",
      "label": "Process",
      "properties": [
        {"name": "executable", "source_path": "process.parent.executable", "is_label": true},
        {"name": "name", "source_path": "process.parent.name"}
      ]
    },
    {
      "id": "user",
      "label": "User",
      "properties": [
        {"name": "name", "source_path": "user.name", "is_label": true}
      ]
    },
    {
      "id": "alert_rule",
      "label": "AlertRule",
      "properties": [
        {"name": "name", "source_path": "kibana.alert.rule.name", "is_label": true},
        {"name": "severity", "source_path": "kibana.alert.severity"},
        {"name": "risk_score", "source_path": "kibana.alert.risk_score"}
      ]
    }
  ],

  "relationships": [
    {
      "type": "COMMUNICATED_WITH",
      "source_id": "ip_source",
      "target_id": "ip_dest",
      "properties": [
        {"name": "transport", "source_path": "network.transport"},
        {"name": "bytes", "source_path": "network.bytes"}
      ],
      "condition": {"type": "field_exists", "field": "destination.ip"}
    },
    {
      "type": "TRIGGERED",
      "source_id": "ip_source",
      "target_id": "signature",
      "properties": [],
      "condition": {"type": "field_exists", "field": "rule.name"}
    },
    {
      "type": "CATEGORIZED_AS",
      "source_id": "signature",
      "target_id": "attack_category",
      "properties": [],
      "condition": {"type": "field_exists", "field": "rule.category"}
    },
    {
      "type": "SRC_ORIGINATES_IN",
      "source_id": "ip_source",
      "target_id": "country_source",
      "properties": [],
      "condition": {"type": "field_exists", "field": "source.geo.country_name"}
    },
    {
      "type": "DST_ORIGINATES_IN",
      "source_id": "ip_dest",
      "target_id": "country_dest",
      "properties": [],
      "condition": {"type": "field_exists", "field": "destination.geo.country_name"}
    },
    {
      "type": "TARGETED_PORT",
      "source_id": "ip_source",
      "target_id": "port",
      "properties": [],
      "condition": {"type": "field_exists", "field": "destination.port"}
    },
    {
      "type": "EXECUTED_ON",
      "source_id": "process",
      "target_id": "host",
      "properties": [],
      "condition": {"type": "field_exists", "field": "process.executable"}
    },
    {
      "type": "RAN_BY",
      "source_id": "process",
      "target_id": "user",
      "properties": [],
      "condition": {"type": "field_exists", "field": "process.executable"}
    },
    {
      "type": "SPAWNED",
      "source_id": "parent_process",
      "target_id": "process",
      "properties": [],
      "condition": {"type": "field_exists", "field": "process.parent.executable"}
    },
    {
      "type": "RAISED_ALERT",
      "source_id": "process",
      "target_id": "alert_rule",
      "properties": [
        {"name": "risk_score", "source_path": "kibana.alert.risk_score"}
      ],
      "condition": {"type": "field_exists", "field": "kibana.alert.rule.name"}
    },
    {
      "type": "ALERT_ON_HOST",
      "source_id": "alert_rule",
      "target_id": "host",
      "properties": [],
      "condition": {"type": "field_exists", "field": "host.hostname"}
    },
    {
      "type": "USER_ON_HOST",
      "source_id": "user",
      "target_id": "host",
      "properties": [],
      "condition": {"type": "field_exists", "field": "user.name"}
    }
  ]
}

What This Produces

When a Suricata alert like "source.ip": "176.65.149.236", "destination.ip": "10.128.0.77", "destination.port": 22 is ingested, the graph gets:

An :IP node for 176.65.149.236 with geo enrichment
An :IP node for 10.128.0.77 (merged with the asset-seeded one)
A :Port node for 22 (merged with the asset-seeded one)
A :Signature node for the rule that fired
COMMUNICATED_WITH, TRIGGERED, TARGETED_PORT, SRC_ORIGINATES_IN relationships

When an Elastic endpoint process alert from the same host arrives, the graph adds:

A :Process node for /usr/bin/systemctl
A :Process node for /bin/sh (parent)
SPAWNED, EXECUTED_ON, RAN_BY, RAISED_ALERT relationships
The :Host node is the same node as the one from assets — MERGE unifies them

Over time, the graph builds a connected picture where you can traverse from an attacking IP through its communication to a host, see what processes ran on that host, who ran them, and what alerts they triggered.

Querying the Graph

The graph structure encodes the answers — no pre-computation needed. Here are some example Cypher queries you can run in Neo4j Browser.

Port Policy Violations

Find traffic to known hosts on ports they don't allow:

MATCH (src:IP)-[:COMMUNICATED_WITH]->(dst:IP)<-[:HAS_PRIVATE_IP]-(h:Host)
MATCH (src)-[:TARGETED_PORT]->(p:Port)
WHERE NOT (h)-[:ALLOWS_PORT]->(p)
RETURN src.address, dst.address, p.number, h.name

Cross-Country Communication

Find IPs talking across country borders:

MATCH (src:IP)-[:COMMUNICATED_WITH]->(dst:IP),
      (src)-[:SRC_ORIGINATES_IN]->(c1:Country),
      (dst)-[:DST_ORIGINATES_IN]->(c2:Country)
WHERE c1 <> c2
RETURN src.address, c1.name, dst.address, c2.name

Process Trees with Alerts

See parent-child process chains and the alerts they triggered:

MATCH (pp:Process)-[:SPAWNED]->(p:Process)-[:EXECUTED_ON]->(h:Host)
OPTIONAL MATCH (p)-[:RAISED_ALERT]->(a:AlertRule)
OPTIONAL MATCH (p)-[:RAN_BY]->(u:User)
RETURN pp.executable, p.executable, u.name, h.name, a.name, a.risk_score

High Risk Users

Find users whose processes triggered the most severe alerts:

MATCH (u:User)<-[:RAN_BY]-(p:Process)-[:RAISED_ALERT]->(a:AlertRule)
RETURN u.name, count(DISTINCT a) AS alerts, max(toInteger(a.risk_score)) AS max_risk
ORDER BY max_risk DESC

Unknown Hosts Receiving Traffic

Find destination IPs that don't belong to any known host:

MATCH (src:IP)-[:COMMUNICATED_WITH]->(dst:IP)
WHERE NOT (dst)<-[:HAS_PRIVATE_IP|HAS_PUBLIC_IP]-(:Host)
RETURN dst.address, count(src) AS sources
ORDER BY sources DESC

Playground

EXPERIMENTAL

The Playground feature allows you to test your DSL configuration against real alerts before deploying it to production.

The Playground inserts temporary data into Neo4j with an isolated identifier, lets you query and visualize the result, and cleans up when you're done. No permanent data is modified.

The flow is:

1. Select one or more alerts from your job
2. Click Simulate: Arcanna inserts nodes and relationships into Neo4j tagged with a temporary ID
3. View the resulting graph visualization
4. Click Cleanup to remove all temporary data, or adjust your DSL and simulate again

Validation Rules

The DSL is validated before processing. Common validation errors:

Missing id: Every node and asset must have a unique id.
Missing is_label: Every node must have exactly one property with "is_label": true.
Dots in property names: Use source_path for nested fields. "name": "source.ip" is invalid; use "name": "address", "source_path": "source.ip".
Unknown source_id / target_id: Relationship references must match a node id.
Unknown target_node_id: Asset relationships must reference a valid node id.
Duplicate id: Each node id must be unique across the entire configuration.
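
The rules above can be checked locally before a configuration is submitted. The Python sketch below is a hypothetical pre-flight checker, not Arcanna's validator. Its error messages are invented for illustration, and it assumes that node and asset ids share a single namespace.

```python
def validate(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    errors, seen_ids, node_ids = [], set(), set()

    # Missing id / duplicate id (assumption: nodes and assets share one namespace).
    for section in ("nodes", "assets"):
        for entry in config.get(section, []):
            entry_id = entry.get("id")
            if not entry_id:
                errors.append(f"{section}: entry is missing an id")
            elif entry_id in seen_ids:
                errors.append(f"{section}: duplicate id '{entry_id}'")
            else:
                seen_ids.add(entry_id)
                if section == "nodes":
                    node_ids.add(entry_id)

    # Exactly one is_label property, and no dots in property names.
    for node in config.get("nodes", []):
        name = node.get("id", "<no id>")
        props = node.get("properties", [])
        if sum(1 for p in props if p.get("is_label") is True) != 1:
            errors.append(f"node '{name}': needs exactly one is_label property")
        for p in props:
            if "." in p.get("name", ""):
                errors.append(f"node '{name}': property name '{p['name']}' contains a dot")

    # Relationship endpoints must reference known node ids.
    for rel in config.get("relationships", []):
        for end in ("source_id", "target_id"):
            if rel.get(end) not in node_ids:
                errors.append(f"relationship '{rel.get('type')}': unknown {end} '{rel.get(end)}'")

    # Asset relationships must reference known node ids.
    for asset in config.get("assets", []):
        for rel in asset.get("relationships", []):
            if rel.get("target_node_id") not in node_ids:
                errors.append(
                    f"asset '{asset.get('id')}': unknown target_node_id '{rel.get('target_node_id')}'"
                )
    return errors


broken = {
    "nodes": [
        {"id": "ip_source", "label": "IP",
         "properties": [{"name": "source.ip", "source_path": "source.ip"}]},
    ],
    "relationships": [
        {"type": "COMMUNICATED_WITH", "source_id": "ip_source", "target_id": "ip_dest"},
    ],
}
for problem in validate(broken):
    print(problem)
```

For the `broken` configuration, the checker reports three problems: the missing `is_label` property, the dotted property name, and the unknown `target_id`.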
