Arcanna Ontology
This feature is currently experimental. The DSL syntax and behavior may change in future releases. Use it in non-production environments first and provide feedback to the Arcanna team.
Arcanna Ontology lets you define how security alerts are transformed into a knowledge graph. You write a JSON-based DSL that describes what entities (nodes) to extract from your alerts, how they relate to each other, and what asset inventory data to seed — Arcanna handles the rest.
The result is a Neo4j graph that you can query, visualize, and reason over to find attack patterns, policy violations, and lateral movement across your infrastructure.
What is an Ontology
Before writing a DSL, it helps to understand three core Neo4j concepts (nodes, relationships, and properties) and how the MERGE operation deduplicates them.
Nodes
A node represents an entity — an IP address, a host, a user, a process, a signature. Each node has a label (its type) and properties (key-value pairs that describe it).
(:IP {address: "176.65.149.236", geo_country: "Russia"})
(:Host {name: "web-server-01", environment: "production"})
(:User {name: "root"})
Think of labels as table names and properties as columns. A node labeled IP with property address: "176.65.149.236" is one specific IP in your graph.
Relationships
A relationship connects two nodes with a direction and a type. Relationships can also carry properties.
(:IP)-[:COMMUNICATED_WITH {transport: "tcp", bytes: 4096}]->(:IP)
(:Process)-[:EXECUTED_ON]->(:Host)
(:Process)-[:RAN_BY]->(:User)
The relationship type (e.g. COMMUNICATED_WITH) describes how two entities are connected. Direction matters — A -[:COMMUNICATED_WITH]-> B means A initiated communication toward B.
Properties
Both nodes and relationships store properties as key-value pairs. In Arcanna's DSL, you map these properties to fields in your alert data using source paths — dot-separated paths into the alert JSON.
For example, source.geo.country_name extracts the country from this alert structure:
{
"source": {
"ip": "176.65.149.236",
"geo": {
"country_name": "Russia"
}
}
}
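Source-path lookup can be sketched in a few lines of Python. This is a minimal illustration, not Arcanna's implementation: the helper name `resolve_path` is hypothetical, and returning `None` for a missing field is an assumption.

```python
from typing import Any


def resolve_path(alert: dict, source_path: str, default: Any = None) -> Any:
    """Follow a dot-separated path through nested dicts.

    Returns `default` as soon as a segment is missing or a non-dict is reached.
    (Hypothetical helper; the real resolver may behave differently.)
    """
    current: Any = alert
    for segment in source_path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return default
        current = current[segment]
    return current


alert = {"source": {"ip": "176.65.149.236", "geo": {"country_name": "Russia"}}}

country = resolve_path(alert, "source.geo.country_name")  # "Russia"
missing = resolve_path(alert, "destination.ip")           # None: no such field
```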
MERGE and Deduplication
Arcanna uses Neo4j's MERGE operation, which means: find this node if it exists, create it if it doesn't. If three alerts reference the same IP 176.65.149.236, you get one IP node with updated seen_count and last_seen — not three duplicates.
This is the key reason a graph database works well for security data. Over time, your graph accumulates a connected picture of your infrastructure without duplication.
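The find-or-create behavior can be imitated with a plain dictionary. The sketch below is only an in-memory analogy for MERGE: the `graph` dict and `merge_node` function are hypothetical, the timestamps are made up, and keeping the latest timestamp as `last_seen` is an assumption about how that property is updated.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the graph: (label, key value) -> properties.
graph = {}


def merge_node(label, key_value, seen_at):
    """Find the node identified by (label, key_value), or create it, then update counters."""
    node = graph.setdefault((label, key_value), {"seen_count": 0, "last_seen": seen_at})
    node["seen_count"] += 1
    node["last_seen"] = max(node["last_seen"], seen_at)  # assumption: keep the latest time
    return node


# Three alerts that reference the same IP at different (made-up) times.
for hour in (9, 10, 11):
    merge_node("IP", "176.65.149.236", datetime(2024, 1, 1, hour, tzinfo=timezone.utc))

node = graph[("IP", "176.65.149.236")]
```

After the loop, `graph` holds a single entry whose `seen_count` reflects all three alerts.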
DSL Structure
A DSL configuration is a JSON object with five top-level fields:
{
"timestamp_field": "@timestamp",
"include_job_id": true,
"assets": [],
"nodes": [],
"relationships": []
}
| Field | Required | Description |
|---|---|---|
| timestamp_field | No | Which alert field contains the timestamp. Defaults to @timestamp. |
| include_job_id | No | If true (default), all nodes are scoped by job ID, so each job has its own isolated graph. Set to false for a shared graph across jobs. |
| assets | No | Static inventory data (hosts, allowed ports) seeded before alert processing. |
| nodes | Yes | Defines what entities to extract from each alert. |
| relationships | Yes | Defines how extracted entities connect to each other. |
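To make the required and optional fields concrete, here is a small sketch that parses a configuration and fills in the documented defaults. The function name `load_config` is hypothetical, and treating an omitted `assets` section as an empty list is an assumption.

```python
import json

REQUIRED_SECTIONS = ("nodes", "relationships")


def load_config(raw: str) -> dict:
    """Parse a DSL document and fill in the documented defaults (illustrative only)."""
    config = json.loads(raw)
    config.setdefault("timestamp_field", "@timestamp")
    config.setdefault("include_job_id", True)
    config.setdefault("assets", [])  # assumption: omitted assets means nothing to seed
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise ValueError("missing required sections: " + ", ".join(missing))
    return config


config = load_config('{"nodes": [], "relationships": []}')
```

A document without `nodes` or `relationships` raises `ValueError` in this sketch.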
Defining Nodes
Each entry in the nodes array tells Arcanna: "Extract this type of entity from each alert."
{
"id": "ip_source",
"label": "IP",
"properties": [
{"name": "address", "source_path": "source.ip", "is_label": true},
{"name": "geo_country", "source_path": "source.geo.country_name"},
{"name": "as_org", "source_path": "source.as.organization.name"}
]
}
| Field | Required | Description |
|---|---|---|
| id | Yes | Unique identifier for this node definition. Used by relationships to reference it. |
| label | Yes | The Neo4j label (node type). Multiple node definitions can share the same label. |
| properties | Yes | List of properties to extract. Exactly one must have "is_label": true; this is the property used to identify and deduplicate the node. |
The id and label Distinction
This is the most important concept in the DSL. The id is an internal reference for wiring relationships. The label is what Neo4j sees.
Two node definitions can share the same label but have different ids and source_paths. This is how you handle the same entity type appearing in different fields:
{
"id": "ip_source",
"label": "IP",
"properties": [
{"name": "address", "source_path": "source.ip", "is_label": true}
]
},
{
"id": "ip_dest",
"label": "IP",
"properties": [
{"name": "address", "source_path": "destination.ip", "is_label": true}
]
}
Both create :IP nodes in Neo4j. If source.ip and destination.ip are the same value (e.g. 10.128.0.77), MERGE produces one node — no duplicates. But during relationship building, ip_source and ip_dest are tracked separately so Arcanna knows which end of COMMUNICATED_WITH is which.
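The split between id and label can be illustrated with a short sketch: extraction results are tracked per definition id, while node identity is the pair of label and key value. The function names are hypothetical, and skipping a definition whose key field is absent is an assumption.

```python
def resolve_path(alert, source_path, default=None):
    """Follow a dot-separated path through nested dicts (hypothetical helper)."""
    current = alert
    for segment in source_path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return default
        current = current[segment]
    return current


node_defs = [
    {"id": "ip_source", "label": "IP",
     "properties": [{"name": "address", "source_path": "source.ip", "is_label": True}]},
    {"id": "ip_dest", "label": "IP",
     "properties": [{"name": "address", "source_path": "destination.ip", "is_label": True}]},
]


def extract_nodes(alert, definitions):
    """Return {definition id: (label, key value)} for definitions whose key field is present."""
    extracted = {}
    for definition in definitions:
        key_prop = next(p for p in definition["properties"] if p.get("is_label"))
        value = resolve_path(alert, key_prop["source_path"])
        if value is not None:  # assumption: skip the node when its key is missing
            extracted[definition["id"]] = (definition["label"], value)
    return extracted


alert = {"source": {"ip": "10.128.0.77"}, "destination": {"ip": "10.128.0.77"}}
by_id = extract_nodes(alert, node_defs)   # two entries, one per definition id
distinct_nodes = set(by_id.values())      # one identity: ("IP", "10.128.0.77")
```

Both ids stay available for wiring the two ends of a relationship, even though they resolve to the same node.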
Properties
Each property maps a Neo4j property name to a source path in the alert JSON.
{"name": "geo_country", "source_path": "source.geo.country_name"}
| Field | Required | Description |
|---|---|---|
| name | Yes | The property name stored in Neo4j. Must not contain dots. |
| source_path | Yes | Dot-separated path into the alert JSON to extract the value. |
| is_label | No | If true, this property is used as the node's identity for MERGE. Exactly one per node definition. |
| is_timestamp | No | If true, the value is stored as a Neo4j datetime. |
Defining Relationships
Each entry in relationships connects two node definitions:
{
"type": "COMMUNICATED_WITH",
"source_id": "ip_source",
"target_id": "ip_dest",
"properties": [
{"name": "transport", "source_path": "network.transport"}
],
"condition": {"type": "field_exists", "field": "destination.ip"}
}
| Field | Required | Description |
|---|---|---|
| type | Yes | The Neo4j relationship type. |
| source_id | Yes | The id of the source node definition. |
| target_id | Yes | The id of the target node definition. |
| properties | No | Properties to extract onto the relationship. |
| condition | No | Only create this relationship if the condition is met. |
Conditions
Conditions prevent Arcanna from creating relationships when the relevant data is missing from an alert. Without conditions, a Suricata network alert, which carries no process fields, would try to create process relationships, and an endpoint process alert would try to create network relationships.
The examples below cover four condition types: field_exists, field_equals, and, or.
{"type": "field_exists", "field": "destination.ip"}
Creates the relationship only if destination.ip is present in the alert.
{"type": "field_equals", "field": "event.category", "value": "process"}
Creates the relationship only if the field matches the exact value.
{
"type": "and",
"conditions": [
{"type": "field_exists", "field": "source.ip"},
{"type": "field_exists", "field": "destination.ip"}
]
}
All conditions must be true.
{
"type": "or",
"conditions": [
{"type": "field_exists", "field": "rule.name"},
{"type": "field_exists", "field": "kibana.alert.rule.name"}
]
}
At least one condition must be true.
Available condition types: field_exists, field_equals, field_not_equals, field_contains, fields_equal, fields_not_equal, and, or.
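The four condition types shown above can be evaluated recursively. The sketch below is an illustration under assumptions, not Arcanna's evaluator: the function names are hypothetical, a key that is present with a null value counts as existing, and the remaining types (field_not_equals, field_contains, fields_equal, fields_not_equal) are left out because their exact JSON shape is not shown here.

```python
_MISSING = object()  # sentinel so that a present-but-null field still counts as existing


def resolve_path(alert, source_path, default=None):
    """Follow a dot-separated path through nested dicts (hypothetical helper)."""
    current = alert
    for segment in source_path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return default
        current = current[segment]
    return current


def evaluate(condition, alert):
    """Return True if the alert satisfies the condition."""
    kind = condition["type"]
    if kind == "field_exists":
        return resolve_path(alert, condition["field"], _MISSING) is not _MISSING
    if kind == "field_equals":
        return resolve_path(alert, condition["field"], _MISSING) == condition["value"]
    if kind == "and":
        return all(evaluate(c, alert) for c in condition["conditions"])
    if kind == "or":
        return any(evaluate(c, alert) for c in condition["conditions"])
    raise ValueError(f"condition type not covered by this sketch: {kind}")


# A made-up network alert with no process fields.
network_alert = {
    "source": {"ip": "176.65.149.236"},
    "destination": {"ip": "10.128.0.77"},
    "rule": {"name": "example rule"},
}

has_both_ips = evaluate(
    {"type": "and", "conditions": [
        {"type": "field_exists", "field": "source.ip"},
        {"type": "field_exists", "field": "destination.ip"},
    ]},
    network_alert,
)
is_process_event = evaluate(
    {"type": "field_equals", "field": "event.category", "value": "process"},
    network_alert,
)
```

Here `has_both_ips` is true and `is_process_event` is false, so only the network relationships would be built for this alert.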
Defining Assets
Assets are static inventory data — hosts, their IPs, allowed ports — that get seeded into the graph before alert processing. This lets you answer questions like "Was this traffic to a known host?" or "Is this port allowed?" by traversing the graph structure.
{
"assets": [
{
"id": "asset_host",
"label": "Host",
"key_property": "name",
"data": [
{
"name": "installer-arcanna-ai",
"environment": "production",
"cloud_provider": "gcp",
"private_ips": ["10.128.0.77"],
"public_ips": ["34.69.67.117"],
"allowed_ports": [22, 80, 443]
}
],
"relationships": [
{
"type": "HAS_PRIVATE_IP",
"target_node_id": "ip_source",
"source_field": "private_ips"
},
{
"type": "ALLOWS_PORT",
"target_node_id": "port",
"source_field": "allowed_ports"
}
]
}
]
}
| Field | Required | Description |
|---|---|---|
| id | Yes | Unique identifier for this asset definition. |
| label | Yes | The Neo4j label. Should match a node label if you want MERGE to unify them (e.g. Host from assets merges with Host from alerts). |
| key_property | Yes | Which property uniquely identifies this asset (used as MERGE key). |
| data | Yes | Array of asset records. Scalar values become node properties. List values are used by relationships. |
| relationships | No | Links from this asset to nodes defined in the nodes section. |
Asset Relationships
Asset relationships use target_node_id to reference a node definition by its id. Arcanna resolves the target's label and key property automatically.
{
"type": "HAS_PRIVATE_IP",
"target_node_id": "ip_source",
"source_field": "private_ips"
}
This creates one HAS_PRIVATE_IP relationship per value in the private_ips list, pointing to an :IP node (because ip_source is defined with label IP and key property address).
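The expansion from list values to relationships can be sketched as follows. This is an illustration only: `expand_asset` and its tuple output format are hypothetical, chosen to show how a target's label and key property might be looked up from the node definitions.

```python
def expand_asset(asset_def, node_defs):
    """Return (asset nodes, relationships) as plain tuples (hypothetical output format)."""
    # Resolve each node definition id to its label and key property name.
    targets = {}
    for definition in node_defs:
        key_prop = next(p for p in definition["properties"] if p.get("is_label"))
        targets[definition["id"]] = (definition["label"], key_prop["name"])

    nodes, relationships = [], []
    for record in asset_def["data"]:
        key = record[asset_def["key_property"]]
        # Scalar values become node properties; list values feed relationships.
        scalars = {k: v for k, v in record.items() if not isinstance(v, list)}
        nodes.append((asset_def["label"], scalars))
        for rel in asset_def.get("relationships", []):
            label, key_name = targets[rel["target_node_id"]]
            for value in record.get(rel["source_field"], []):
                relationships.append((key, rel["type"], label, key_name, value))
    return nodes, relationships


node_defs = [
    {"id": "ip_source", "label": "IP",
     "properties": [{"name": "address", "source_path": "source.ip", "is_label": True}]},
    {"id": "port", "label": "Port",
     "properties": [{"name": "number", "source_path": "destination.port", "is_label": True}]},
]
asset_def = {
    "id": "asset_host", "label": "Host", "key_property": "name",
    "data": [{"name": "installer-arcanna-ai", "environment": "production",
              "private_ips": ["10.128.0.77"], "allowed_ports": [22, 80, 443]}],
    "relationships": [
        {"type": "HAS_PRIVATE_IP", "target_node_id": "ip_source", "source_field": "private_ips"},
        {"type": "ALLOWS_PORT", "target_node_id": "port", "source_field": "allowed_ports"},
    ],
}

nodes, relationships = expand_asset(asset_def, node_defs)
```

With this input the sketch yields one Host node, one HAS_PRIVATE_IP relationship, and three ALLOWS_PORT relationships.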
Complete Example
This DSL handles both Suricata network alerts and Elastic endpoint process alerts in a single configuration:
{
"timestamp_field": "@timestamp",
"include_job_id": true,
"assets": [
{
"id": "asset_host",
"label": "Host",
"key_property": "name",
"data": [
{
"name": "installer-arcanna-ai",
"environment": "production",
"cloud_provider": "gcp",
"cloud_region": "us-central1",
"private_ips": ["10.128.0.77"],
"public_ips": ["34.69.67.117"],
"allowed_ports": [22, 80, 443]
}
],
"relationships": [
{"type": "HAS_PRIVATE_IP", "target_node_id": "ip_source", "source_field": "private_ips"},
{"type": "HAS_PUBLIC_IP", "target_node_id": "ip_source", "source_field": "public_ips"},
{"type": "ALLOWS_PORT", "target_node_id": "port", "source_field": "allowed_ports"}
]
}
],
"nodes": [
{
"id": "ip_source",
"label": "IP",
"properties": [
{"name": "address", "source_path": "source.ip", "is_label": true},
{"name": "geo_country", "source_path": "source.geo.country_name"},
{"name": "geo_city", "source_path": "source.geo.city_name"},
{"name": "as_org", "source_path": "source.as.organization.name"}
]
},
{
"id": "ip_dest",
"label": "IP",
"properties": [
{"name": "address", "source_path": "destination.ip", "is_label": true}
]
},
{
"id": "host",
"label": "Host",
"properties": [
{"name": "name", "source_path": "host.hostname", "is_label": true},
{"name": "os_full", "source_path": "host.os.full"},
{"name": "architecture", "source_path": "host.architecture"}
]
},
{
"id": "signature",
"label": "Signature",
"properties": [
{"name": "name", "source_path": "rule.name", "is_label": true},
{"name": "sid", "source_path": "rule.id"},
{"name": "category", "source_path": "rule.category"}
]
},
{
"id": "attack_category",
"label": "AttackCategory",
"properties": [
{"name": "name", "source_path": "rule.category", "is_label": true}
]
},
{
"id": "country_source",
"label": "Country",
"properties": [
{"name": "name", "source_path": "source.geo.country_name", "is_label": true},
{"name": "iso_code", "source_path": "source.geo.country_iso_code"}
]
},
{
"id": "country_dest",
"label": "Country",
"properties": [
{"name": "name", "source_path": "destination.geo.country_name", "is_label": true},
{"name": "iso_code", "source_path": "destination.geo.country_iso_code"}
]
},
{
"id": "port",
"label": "Port",
"properties": [
{"name": "number", "source_path": "destination.port", "is_label": true}
]
},
{
"id": "process",
"label": "Process",
"properties": [
{"name": "executable", "source_path": "process.executable", "is_label": true},
{"name": "name", "source_path": "process.name"},
{"name": "command_line", "source_path": "process.command_line"},
{"name": "hash_sha256", "source_path": "process.hash.sha256"}
]
},
{
"id": "parent_process",
"label": "Process",
"properties": [
{"name": "executable", "source_path": "process.parent.executable", "is_label": true},
{"name": "name", "source_path": "process.parent.name"}
]
},
{
"id": "user",
"label": "User",
"properties": [
{"name": "name", "source_path": "user.name", "is_label": true}
]
},
{
"id": "alert_rule",
"label": "AlertRule",
"properties": [
{"name": "name", "source_path": "kibana.alert.rule.name", "is_label": true},
{"name": "severity", "source_path": "kibana.alert.severity"},
{"name": "risk_score", "source_path": "kibana.alert.risk_score"}
]
}
],
"relationships": [
{
"type": "COMMUNICATED_WITH",
"source_id": "ip_source",
"target_id": "ip_dest",
"properties": [
{"name": "transport", "source_path": "network.transport"},
{"name": "bytes", "source_path": "network.bytes"}
],
"condition": {"type": "field_exists", "field": "destination.ip"}
},
{
"type": "TRIGGERED",
"source_id": "ip_source",
"target_id": "signature",
"properties": [],
"condition": {"type": "field_exists", "field": "rule.name"}
},
{
"type": "CATEGORIZED_AS",
"source_id": "signature",
"target_id": "attack_category",
"properties": [],
"condition": {"type": "field_exists", "field": "rule.category"}
},
{
"type": "SRC_ORIGINATES_IN",
"source_id": "ip_source",
"target_id": "country_source",
"properties": [],
"condition": {"type": "field_exists", "field": "source.geo.country_name"}
},
{
"type": "DST_ORIGINATES_IN",
"source_id": "ip_dest",
"target_id": "country_dest",
"properties": [],
"condition": {"type": "field_exists", "field": "destination.geo.country_name"}
},
{
"type": "TARGETED_PORT",
"source_id": "ip_source",
"target_id": "port",
"properties": [],
"condition": {"type": "field_exists", "field": "destination.port"}
},
{
"type": "EXECUTED_ON",
"source_id": "process",
"target_id": "host",
"properties": [],
"condition": {"type": "field_exists", "field": "process.executable"}
},
{
"type": "RAN_BY",
"source_id": "process",
"target_id": "user",
"properties": [],
"condition": {"type": "field_exists", "field": "process.executable"}
},
{
"type": "SPAWNED",
"source_id": "parent_process",
"target_id": "process",
"properties": [],
"condition": {"type": "field_exists", "field": "process.parent.executable"}
},
{
"type": "RAISED_ALERT",
"source_id": "process",
"target_id": "alert_rule",
"properties": [
{"name": "risk_score", "source_path": "kibana.alert.risk_score"}
],
"condition": {"type": "field_exists", "field": "kibana.alert.rule.name"}
},
{
"type": "ALERT_ON_HOST",
"source_id": "alert_rule",
"target_id": "host",
"properties": [],
"condition": {"type": "field_exists", "field": "host.hostname"}
},
{
"type": "USER_ON_HOST",
"source_id": "user",
"target_id": "host",
"properties": [],
"condition": {"type": "field_exists", "field": "user.name"}
}
]
}
What This Produces
When a Suricata alert like "source.ip": "176.65.149.236", "destination.ip": "10.128.0.77", "destination.port": 22 is ingested, the graph gets:
- An :IP node for 176.65.149.236 with geo enrichment
- An :IP node for 10.128.0.77 (merged with the asset-seeded one)
- A :Port node for 22 (merged with the asset-seeded one)
- A :Signature node for the rule that fired
- COMMUNICATED_WITH, TRIGGERED, TARGETED_PORT, SRC_ORIGINATES_IN relationships
When an Elastic endpoint process alert from the same host arrives, the graph adds:
- A :Process node for /usr/bin/systemctl
- A :Process node for /bin/sh (parent)
- SPAWNED, EXECUTED_ON, RAN_BY, RAISED_ALERT relationships
- The :Host node is the same node as the one from assets; MERGE unifies them
Over time, the graph builds a connected picture where you can traverse from an attacking IP through its communication to a host, see what processes ran on that host, who ran them, and what alerts they triggered.
Querying the Graph
The graph structure encodes the answers — no pre-computation needed. Here are some example Cypher queries you can run in Neo4j Browser.
Port Policy Violations
Find traffic to known hosts on ports they don't allow:
MATCH (src:IP)-[:COMMUNICATED_WITH]->(dst:IP)<-[:HAS_PRIVATE_IP]-(h:Host)
MATCH (src)-[:TARGETED_PORT]->(p:Port)
WHERE NOT (h)-[:ALLOWS_PORT]->(p)
RETURN src.address, dst.address, p.number, h.name
Cross-Country Communication
Find IPs talking across country borders:
MATCH (src:IP)-[:COMMUNICATED_WITH]->(dst:IP),
(src)-[:SRC_ORIGINATES_IN]->(c1:Country),
(dst)-[:DST_ORIGINATES_IN]->(c2:Country)
WHERE c1 <> c2
RETURN src.address, c1.name, dst.address, c2.name
Process Trees with Alerts
See parent-child process chains and the alerts they triggered:
MATCH (pp:Process)-[:SPAWNED]->(p:Process)-[:EXECUTED_ON]->(h:Host)
OPTIONAL MATCH (p)-[:RAISED_ALERT]->(a:AlertRule)
OPTIONAL MATCH (p)-[:RAN_BY]->(u:User)
RETURN pp.executable, p.executable, u.name, h.name, a.name, a.risk_score
High Risk Users
Find users whose processes triggered the most severe alerts:
MATCH (u:User)<-[:RAN_BY]-(p:Process)-[:RAISED_ALERT]->(a:AlertRule)
RETURN u.name, count(DISTINCT a) AS alerts, max(toInteger(a.risk_score)) AS max_risk
ORDER BY max_risk DESC
Unknown Hosts Receiving Traffic
Find destination IPs that don't belong to any known host:
MATCH (src:IP)-[:COMMUNICATED_WITH]->(dst:IP)
WHERE NOT (dst)<-[:HAS_PRIVATE_IP|HAS_PUBLIC_IP]-(:Host)
RETURN dst.address, count(src) AS sources
ORDER BY sources DESC
Playground
The Playground feature allows you to test your DSL configuration against real alerts before deploying it to production.
The Playground inserts temporary data into Neo4j with an isolated identifier, lets you query and visualize the result, and cleans up when you're done. No permanent data is modified.
The flow is:
- Select one or more alerts from your job
- Click Simulate — Arcanna inserts nodes and relationships into Neo4j tagged with a temporary ID
- View the resulting graph visualization
- Click Cleanup to remove all temporary data, or adjust your DSL and simulate again
Validation Rules
The DSL is validated before processing. Common validation errors:
- Missing id: Every node and asset must have a unique id.
- Missing is_label: Every node must have exactly one property with "is_label": true.
- Dots in property names: Use source_path for nested fields. "name": "source.ip" is invalid; use "name": "address", "source_path": "source.ip".
- Unknown source_id / target_id: Relationship references must match a node id.
- Unknown target_node_id: Asset relationships must reference a valid node id.
- Duplicate id: Each node id must be unique across the entire configuration.