
Arcanna Ontology

EXPERIMENTAL

This feature is currently experimental. The DSL syntax and behavior may change in future releases. Use it in non-production environments first and provide feedback to the Arcanna team.

Arcanna Ontology lets you define how security alerts are transformed into a knowledge graph. You write a JSON-based DSL that describes what entities (nodes) to extract from your alerts, how they relate to each other, and what asset inventory data to seed — Arcanna handles the rest.

The result is a Neo4j graph that you can query, visualize, and reason over to find attack patterns, policy violations, and lateral movement across your infrastructure.

What is an Ontology

Before writing a DSL configuration, it helps to understand three core Neo4j concepts: nodes, relationships, and properties.

Nodes

A node represents an entity — an IP address, a host, a user, a process, a signature. Each node has a label (its type) and properties (key-value pairs that describe it).

```cypher
(:IP {address: "176.65.149.236", geo_country: "Russia"})
(:Host {name: "web-server-01", environment: "production"})
(:User {name: "root"})
```

Think of labels as table names and properties as columns. A node labeled IP with property address: "176.65.149.236" is one specific IP in your graph.

Relationships

A relationship connects two nodes with a direction and a type. Relationships can also carry properties.

```cypher
(:IP)-[:COMMUNICATED_WITH {transport: "tcp", bytes: 4096}]->(:IP)
(:Process)-[:EXECUTED_ON]->(:Host)
(:Process)-[:RAN_BY]->(:User)
```

The relationship type (e.g. COMMUNICATED_WITH) describes how two entities are connected. Direction matters — A -[:COMMUNICATED_WITH]-> B means A initiated communication toward B.

Properties

Both nodes and relationships store properties as key-value pairs. In Arcanna's DSL, you map these properties to fields in your alert data using source paths — dot-separated paths into the alert JSON.

For example, source.geo.country_name extracts the country from this alert structure:

```json
{
  "source": {
    "ip": "176.65.149.236",
    "geo": {
      "country_name": "Russia"
    }
  }
}
```
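Resolving a source path amounts to walking the alert one key at a time. The Python sketch below illustrates the idea; resolve_source_path is a hypothetical name, and returning None for a missing segment is an assumption. Arcanna's own resolver may treat arrays or literal dotted keys (common in Elastic alerts) differently.

```python
from typing import Any, Optional


def resolve_source_path(alert: dict, path: str) -> Optional[Any]:
    """Hypothetical helper: follow a dot-separated source path into nested dicts."""
    current: Any = alert
    for key in path.split("."):
        # Give up as soon as a segment is missing or the value is not an object.
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


alert = {
    "source": {
        "ip": "176.65.149.236",
        "geo": {"country_name": "Russia"},
    }
}

print(resolve_source_path(alert, "source.geo.country_name"))  # Russia
print(resolve_source_path(alert, "destination.ip"))           # None
```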

MERGE and Deduplication

Arcanna uses Neo4j's MERGE operation, which means: find this node if it exists, create it if it doesn't. If three alerts reference the same IP 176.65.149.236, you get one IP node with updated seen_count and last_seen — not three duplicates.

This is the key reason a graph database works well for security data. Over time, your graph accumulates a connected picture of your infrastructure without duplication.
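The MERGE behavior can be modeled in a few lines of Python. This is an in-memory illustration, not Arcanna's code or the Neo4j driver: nodes are keyed by (label, identity value), and a repeat sighting updates seen_count and last_seen on the existing entry. The first_seen field, the property-overwrite rule, and the sample timestamps are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class GraphNode:
    label: str
    key_value: str
    properties: dict = field(default_factory=dict)
    seen_count: int = 0
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None


class InMemoryGraph:
    """Toy stand-in for MERGE: find the node by (label, key) or create it."""

    def __init__(self) -> None:
        self.nodes: Dict[Tuple[str, str], GraphNode] = {}

    def merge_node(self, label: str, key_value: str, properties: dict,
                   seen_at: datetime) -> GraphNode:
        node = self.nodes.get((label, key_value))
        if node is None:  # create it if it doesn't exist yet
            node = GraphNode(label=label, key_value=key_value, first_seen=seen_at)
            self.nodes[(label, key_value)] = node
        node.properties.update(properties)  # assumed: newer values overwrite older ones
        node.seen_count += 1
        if node.last_seen is None or seen_at > node.last_seen:
            node.last_seen = seen_at
        return node


graph = InMemoryGraph()
# Three alerts referencing the same source IP, one hour apart (sample data).
for hour in (1, 2, 3):
    graph.merge_node("IP", "176.65.149.236", {"geo_country": "Russia"},
                     datetime(2024, 1, 1, hour, tzinfo=timezone.utc))

node = graph.nodes[("IP", "176.65.149.236")]
print(len(graph.nodes), node.seen_count, node.last_seen.hour)  # 1 3 3
```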

DSL Structure

A DSL configuration is a JSON object with five top-level fields: two settings (timestamp_field, include_job_id) and three sections (assets, nodes, relationships).

```json
{
  "timestamp_field": "@timestamp",
  "include_job_id": true,
  "assets": [],
  "nodes": [],
  "relationships": []
}
```
| Field | Required | Description |
|---|---|---|
| timestamp_field | No | Which alert field contains the timestamp. Defaults to @timestamp. |
| include_job_id | No | If true (default), all nodes are scoped by job ID, so each job has its own isolated graph. Set to false for a shared graph across jobs. |
| assets | No | Static inventory data (hosts, allowed ports) seeded before alert processing. |
| nodes | Yes | Defines what entities to extract from each alert. |
| relationships | Yes | Defines how extracted entities connect to each other. |
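A minimal sketch of how the top-level fields and their documented defaults could be read, in Python. load_dsl and merge_key are hypothetical helpers; in particular, representing job scoping as a job ID prepended to the MERGE identity is an assumption about how isolation might work, not a description of Arcanna's storage.

```python
import json
from typing import Any, Dict, Tuple


def load_dsl(text: str) -> Dict[str, Any]:
    """Parse a DSL document and fill in the documented defaults (sketch)."""
    dsl = json.loads(text)
    for section in ("nodes", "relationships"):
        if section not in dsl:
            raise KeyError(f"missing required section: {section}")
    dsl.setdefault("timestamp_field", "@timestamp")
    dsl.setdefault("include_job_id", True)
    dsl.setdefault("assets", [])
    return dsl


def merge_key(dsl: Dict[str, Any], job_id: str, label: str, key_value: Any) -> Tuple:
    """Hypothetical identity tuple: job-scoped when include_job_id is true."""
    if dsl["include_job_id"]:
        return (job_id, label, key_value)
    return (label, key_value)


scoped = load_dsl('{"nodes": [], "relationships": []}')
shared = load_dsl('{"include_job_id": false, "nodes": [], "relationships": []}')

print(scoped["timestamp_field"])                         # @timestamp
print(merge_key(scoped, "job-42", "IP", "10.128.0.77"))  # ('job-42', 'IP', '10.128.0.77')
print(merge_key(shared, "job-42", "IP", "10.128.0.77"))  # ('IP', '10.128.0.77')
```

With include_job_id set to false, the same address seen by two different jobs maps to one key, which is what a shared graph means here.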

Defining Nodes

Each entry in the nodes array tells Arcanna: "Extract this type of entity from each alert."

```json
{
  "id": "ip_source",
  "label": "IP",
  "properties": [
    {"name": "address", "source_path": "source.ip", "is_label": true},
    {"name": "geo_country", "source_path": "source.geo.country_name"},
    {"name": "as_org", "source_path": "source.as.organization.name"}
  ]
}
```
| Field | Required | Description |
|---|---|---|
| id | Yes | Unique identifier for this node definition. Used by relationships to reference it. |
| label | Yes | The Neo4j label (node type). Multiple node definitions can share the same label. |
| properties | Yes | List of properties to extract. Exactly one must have "is_label": true; this is the property used to identify and deduplicate the node. |
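To show what a node definition does with a single alert, here is a Python sketch: the is_label property supplies the identity, and the other properties are copied when present. Skipping the node when its identity value is missing, and silently dropping absent optional fields, are assumptions; the DSL reference above does not specify that behavior.

```python
from typing import Any, Optional


def get_path(doc: Any, path: str) -> Optional[Any]:
    """Follow a dot-separated path into nested dicts; None if any segment is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def extract_node(definition: dict, alert: dict) -> Optional[dict]:
    """Apply one node definition to one alert (illustrative sketch)."""
    key_prop = next(p for p in definition["properties"] if p.get("is_label"))
    key_value = get_path(alert, key_prop["source_path"])
    if key_value is None:
        return None  # assumption: no identity value means no node for this alert
    properties = {}
    for prop in definition["properties"]:
        value = get_path(alert, prop["source_path"])
        if value is not None:  # assumption: absent optional fields are skipped
            properties[prop["name"]] = value
    return {"label": definition["label"], "key": key_prop["name"],
            "key_value": key_value, "properties": properties}


ip_source = {
    "id": "ip_source",
    "label": "IP",
    "properties": [
        {"name": "address", "source_path": "source.ip", "is_label": True},
        {"name": "geo_country", "source_path": "source.geo.country_name"},
        {"name": "as_org", "source_path": "source.as.organization.name"},
    ],
}
alert = {"source": {"ip": "176.65.149.236", "geo": {"country_name": "Russia"}}}

print(extract_node(ip_source, alert))
print(extract_node(ip_source, {"destination": {"ip": "10.128.0.77"}}))  # None
```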

The id and label Distinction

This is the most important concept in the DSL. The id is an internal reference for wiring relationships. The label is what Neo4j sees.

Two node definitions can share the same label but have different ids and source_paths. This is how you handle the same entity type appearing in different fields:

```json
{
  "nodes": [
    {
      "id": "ip_source",
      "label": "IP",
      "properties": [
        {"name": "address", "source_path": "source.ip", "is_label": true}
      ]
    },
    {
      "id": "ip_dest",
      "label": "IP",
      "properties": [
        {"name": "address", "source_path": "destination.ip", "is_label": true}
      ]
    }
  ]
}
```

Both definitions create :IP nodes in Neo4j. If the same address (e.g. 10.128.0.77) appears as source.ip in one alert and as destination.ip in another, MERGE resolves both to a single node, with no duplicates. During relationship building, however, ip_source and ip_dest are tracked separately, so Arcanna knows which end of COMMUNICATED_WITH is which.
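The split between id and label can be illustrated with a short Python sketch (bind_ids is a hypothetical helper). For each alert it maps every definition id to a graph identity, (label, key value). The same address bound to ip_dest in one alert and ip_source in another resolves to one identity, so MERGE yields one node, while the per-alert bindings still record which role each address played. The second alert's destination, 203.0.113.10, is a made-up documentation address.

```python
from typing import Any, Dict, Optional, Tuple


def get_path(doc: Any, path: str) -> Optional[Any]:
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def bind_ids(definitions: list, alert: dict) -> Dict[str, Tuple[str, Any]]:
    """Map each definition id to the graph identity (label, key value) it resolves to."""
    bindings = {}
    for d in definitions:
        key_prop = next(p for p in d["properties"] if p.get("is_label"))
        value = get_path(alert, key_prop["source_path"])
        if value is not None:
            bindings[d["id"]] = (d["label"], value)
    return bindings


definitions = [
    {"id": "ip_source", "label": "IP",
     "properties": [{"name": "address", "source_path": "source.ip", "is_label": True}]},
    {"id": "ip_dest", "label": "IP",
     "properties": [{"name": "address", "source_path": "destination.ip", "is_label": True}]},
]
alerts = [
    {"source": {"ip": "176.65.149.236"}, "destination": {"ip": "10.128.0.77"}},
    {"source": {"ip": "10.128.0.77"}, "destination": {"ip": "203.0.113.10"}},
]

per_alert = [bind_ids(definitions, a) for a in alerts]
unique_nodes = {identity for b in per_alert for identity in b.values()}

# 10.128.0.77 is ip_dest in the first alert and ip_source in the second...
print(per_alert[0]["ip_dest"] == per_alert[1]["ip_source"])  # True
# ...but deduplicating on (label, address) still yields only three :IP nodes.
print(len(unique_nodes))  # 3
```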

Properties

Each property maps a Neo4j property name to a source path in the alert JSON.

```json
{"name": "geo_country", "source_path": "source.geo.country_name"}
```
| Field | Required | Description |
|---|---|---|
| name | Yes | The property name stored in Neo4j. Must not contain dots. |
| source_path | Yes | Dot-separated path into the alert JSON to extract the value. |
| is_label | No | If true, this property is used as the node's identity for MERGE. Exactly one per node definition. |
| is_timestamp | No | If true, the value is stored as a Neo4j datetime. |
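A Python sketch of the two property rules: names containing dots are rejected, and values flagged with is_timestamp are parsed into timezone-aware datetimes. Assuming ISO 8601 strings such as Elastic's @timestamp is a guess about the input format, and the event_time property is hypothetical; Neo4j itself would store a native datetime rather than a Python object.

```python
from datetime import datetime
from typing import Any


def convert_property(prop: dict, raw: Any) -> Any:
    """Sketch of the name and is_timestamp rules for one extracted value."""
    if "." in prop["name"]:
        raise ValueError(f"property name {prop['name']!r} must not contain dots")
    if prop.get("is_timestamp") and isinstance(raw, str):
        # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
        # so normalize it to an explicit UTC offset first.
        return datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return raw


event_time = {"name": "event_time", "source_path": "@timestamp", "is_timestamp": True}
print(convert_property(event_time, "2024-05-01T12:30:00.000Z"))  # 2024-05-01 12:30:00+00:00

try:
    convert_property({"name": "source.ip", "source_path": "source.ip"}, "10.128.0.77")
except ValueError as err:
    print(err)
```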

Defining Relationships

Each entry in relationships connects two node definitions:

```json
{
  "type": "COMMUNICATED_WITH",
  "source_id": "ip_source",
  "target_id": "ip_dest",
  "properties": [
    {"name": "transport", "source_path": "network.transport"}
  ],
  "condition": {"type": "field_exists", "field": "destination.ip"}
}
```
| Field | Required | Description |
|---|---|---|
| type | Yes | The Neo4j relationship type. |
| source_id | Yes | The id of the source node definition. |
| target_id | Yes | The id of the target node definition. |
| properties | No | Properties to extract onto the relationship. |
| condition | No | Only create this relationship if the condition is met. |
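Continuing the same illustrative model, this Python sketch turns one relationship definition into an edge for one alert. It assumes the endpoints have already been resolved into a bindings map keyed by node definition id, handles only the field_exists condition, and skips the edge when either endpoint is missing; that last rule is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


def get_path(doc: Any, path: str) -> Optional[Any]:
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def build_relationship(rel: dict, bindings: Dict[str, Tuple], alert: dict) -> Optional[tuple]:
    """Turn one relationship definition into an edge for one alert (sketch)."""
    cond = rel.get("condition")
    if cond is not None:
        if cond["type"] != "field_exists":
            raise NotImplementedError("only field_exists is sketched here")
        if get_path(alert, cond["field"]) is None:
            return None
    source = bindings.get(rel["source_id"])
    target = bindings.get(rel["target_id"])
    if source is None or target is None:
        return None  # assumption: skip edges whose endpoints were not extracted
    properties = {}
    for prop in rel.get("properties", []):
        value = get_path(alert, prop["source_path"])
        if value is not None:
            properties[prop["name"]] = value
    return (rel["type"], source, target, properties)


communicated_with = {
    "type": "COMMUNICATED_WITH",
    "source_id": "ip_source",
    "target_id": "ip_dest",
    "properties": [{"name": "transport", "source_path": "network.transport"}],
    "condition": {"type": "field_exists", "field": "destination.ip"},
}
alert = {"source": {"ip": "176.65.149.236"}, "destination": {"ip": "10.128.0.77"},
         "network": {"transport": "tcp"}}
bindings = {"ip_source": ("IP", "176.65.149.236"), "ip_dest": ("IP", "10.128.0.77")}

print(build_relationship(communicated_with, bindings, alert))
```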

Conditions

Conditions prevent Arcanna from creating relationships when the relevant data is missing from an alert. Without conditions, a Suricata network alert would trigger attempts to create process relationships even though it contains no process fields, and an endpoint process alert would do the same for network relationships.

```json
{"type": "field_exists", "field": "destination.ip"}
```

Creates the relationship only if destination.ip is present in the alert.

Available condition types: field_exists, field_equals, field_not_equals, field_contains, fields_equal, fields_not_equal, and, or.
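Only field_exists is documented with its parameters above. The evaluator below is a Python sketch for a few of the other types, and its parameter names are assumptions: field_equals, field_not_equals, and field_contains are assumed to take "field" and "value", and and/or are assumed to take a "conditions" list. fields_equal and fields_not_equal are left out because their parameters are not shown. Check the real schema before relying on any of these shapes.

```python
from typing import Any, Optional


def get_path(doc: Any, path: str) -> Optional[Any]:
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def evaluate(cond: dict, alert: dict) -> bool:
    """Evaluate a condition against an alert (sketch; parameter names partly assumed)."""
    kind = cond["type"]
    if kind == "field_exists":
        return get_path(alert, cond["field"]) is not None
    if kind in ("field_equals", "field_not_equals", "field_contains"):
        value = get_path(alert, cond["field"])  # ASSUMED keys: "field" and "value"
        if kind == "field_equals":
            return value == cond["value"]
        if kind == "field_not_equals":
            return value != cond["value"]
        if isinstance(value, str):
            return str(cond["value"]) in value
        if isinstance(value, list):
            return cond["value"] in value
        return False
    if kind in ("and", "or"):
        results = [evaluate(c, alert) for c in cond["conditions"]]  # ASSUMED key
        return all(results) if kind == "and" else any(results)
    raise NotImplementedError(f"condition type not sketched: {kind}")


alert = {"process": {"executable": "/usr/bin/systemctl"}, "user": {"name": "root"}}
rule = {"type": "and", "conditions": [
    {"type": "field_exists", "field": "process.executable"},
    {"type": "field_equals", "field": "user.name", "value": "root"},
]}
print(evaluate(rule, alert))                                                 # True
print(evaluate({"type": "field_exists", "field": "destination.ip"}, alert))  # False
```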

Defining Assets

Assets are static inventory data — hosts, their IPs, allowed ports — that get seeded into the graph before alert processing. This lets you answer questions like "Was this traffic to a known host?" or "Is this port allowed?" by traversing the graph structure.

```json
{
  "assets": [
    {
      "id": "asset_host",
      "label": "Host",
      "key_property": "name",
      "data": [
        {
          "name": "installer-arcanna-ai",
          "environment": "production",
          "cloud_provider": "gcp",
          "private_ips": ["10.128.0.77"],
          "public_ips": ["34.69.67.117"],
          "allowed_ports": [22, 80, 443]
        }
      ],
      "relationships": [
        {
          "type": "HAS_PRIVATE_IP",
          "target_node_id": "ip_source",
          "source_field": "private_ips"
        },
        {
          "type": "ALLOWS_PORT",
          "target_node_id": "port",
          "source_field": "allowed_ports"
        }
      ]
    }
  ]
}
```
| Field | Required | Description |
|---|---|---|
| id | Yes | Unique identifier for this asset definition. |
| label | Yes | The Neo4j label. Should match a node label if you want MERGE to unify them (e.g. Host from assets merges with Host from alerts). |
| key_property | Yes | Which property uniquely identifies this asset (used as MERGE key). |
| data | Yes | Array of asset records. Scalar values become node properties. List values are used by relationships. |
| relationships | No | Links from this asset to nodes defined in the nodes section. |
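In Python terms, the rule "scalar values become node properties, list values are used by relationships" might look like the sketch below. seed_asset_nodes is a hypothetical helper that returns, per record, the label, the key_property value used for MERGE, the scalar properties, and the list fields left for relationships.

```python
from typing import Any, Dict, List, Tuple


def seed_asset_nodes(asset: dict) -> List[Tuple[str, Any, Dict[str, Any], Dict[str, list]]]:
    """Split each asset record into node properties (scalars) and relationship inputs (lists)."""
    seeded = []
    for record in asset["data"]:
        scalars = {k: v for k, v in record.items() if not isinstance(v, list)}
        lists = {k: v for k, v in record.items() if isinstance(v, list)}
        key_value = record[asset["key_property"]]  # MERGE key, e.g. the host name
        seeded.append((asset["label"], key_value, scalars, lists))
    return seeded


asset_host = {
    "id": "asset_host",
    "label": "Host",
    "key_property": "name",
    "data": [{
        "name": "installer-arcanna-ai",
        "environment": "production",
        "cloud_provider": "gcp",
        "private_ips": ["10.128.0.77"],
        "public_ips": ["34.69.67.117"],
        "allowed_ports": [22, 80, 443],
    }],
}

label, key_value, scalars, lists = seed_asset_nodes(asset_host)[0]
print(label, key_value)  # Host installer-arcanna-ai
print(sorted(scalars))   # ['cloud_provider', 'environment', 'name']
print(sorted(lists))     # ['allowed_ports', 'private_ips', 'public_ips']
```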

Asset Relationships

Asset relationships use target_node_id to reference a node definition by its id. Arcanna resolves the target's label and key property automatically.

```json
{
  "type": "HAS_PRIVATE_IP",
  "target_node_id": "ip_source",
  "source_field": "private_ips"
}
```

This creates one HAS_PRIVATE_IP relationship per value in the private_ips list, pointing to an :IP node (because ip_source is defined with label IP and key property address).
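The fan-out described above can be sketched in Python: the target's label and key property are looked up through target_node_id, and one edge is emitted per list value. expand_asset_relationships and the tuple layout of an edge are illustrative assumptions.

```python
from typing import List, Tuple


def expand_asset_relationships(asset: dict, node_defs: list) -> List[Tuple]:
    """One edge per list value, with the target resolved through its node definition id."""
    by_id = {d["id"]: d for d in node_defs}
    edges = []
    for record in asset["data"]:
        source = (asset["label"], record[asset["key_property"]])
        for rel in asset.get("relationships", []):
            target_def = by_id[rel["target_node_id"]]  # KeyError means unknown target_node_id
            key_prop = next(p["name"] for p in target_def["properties"] if p.get("is_label"))
            for value in record.get(rel["source_field"], []):
                edges.append((rel["type"], source, (target_def["label"], key_prop, value)))
    return edges


node_defs = [
    {"id": "ip_source", "label": "IP",
     "properties": [{"name": "address", "source_path": "source.ip", "is_label": True}]},
    {"id": "port", "label": "Port",
     "properties": [{"name": "number", "source_path": "destination.port", "is_label": True}]},
]
asset = {
    "id": "asset_host", "label": "Host", "key_property": "name",
    "data": [{"name": "installer-arcanna-ai",
              "private_ips": ["10.128.0.77"], "allowed_ports": [22, 80, 443]}],
    "relationships": [
        {"type": "HAS_PRIVATE_IP", "target_node_id": "ip_source", "source_field": "private_ips"},
        {"type": "ALLOWS_PORT", "target_node_id": "port", "source_field": "allowed_ports"},
    ],
}

for edge in expand_asset_relationships(asset, node_defs):
    print(edge)
```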

Complete Example

This DSL handles both Suricata network alerts and Elastic endpoint process alerts in a single configuration:

```json
{
  "timestamp_field": "@timestamp",
  "include_job_id": true,

  "assets": [
    {
      "id": "asset_host",
      "label": "Host",
      "key_property": "name",
      "data": [
        {
          "name": "installer-arcanna-ai",
          "environment": "production",
          "cloud_provider": "gcp",
          "cloud_region": "us-central1",
          "private_ips": ["10.128.0.77"],
          "public_ips": ["34.69.67.117"],
          "allowed_ports": [22, 80, 443]
        }
      ],
      "relationships": [
        {"type": "HAS_PRIVATE_IP", "target_node_id": "ip_source", "source_field": "private_ips"},
        {"type": "HAS_PUBLIC_IP", "target_node_id": "ip_source", "source_field": "public_ips"},
        {"type": "ALLOWS_PORT", "target_node_id": "port", "source_field": "allowed_ports"}
      ]
    }
  ],

  "nodes": [
    {
      "id": "ip_source",
      "label": "IP",
      "properties": [
        {"name": "address", "source_path": "source.ip", "is_label": true},
        {"name": "geo_country", "source_path": "source.geo.country_name"},
        {"name": "geo_city", "source_path": "source.geo.city_name"},
        {"name": "as_org", "source_path": "source.as.organization.name"}
      ]
    },
    {
      "id": "ip_dest",
      "label": "IP",
      "properties": [
        {"name": "address", "source_path": "destination.ip", "is_label": true}
      ]
    },
    {
      "id": "host",
      "label": "Host",
      "properties": [
        {"name": "name", "source_path": "host.hostname", "is_label": true},
        {"name": "os_full", "source_path": "host.os.full"},
        {"name": "architecture", "source_path": "host.architecture"}
      ]
    },
    {
      "id": "signature",
      "label": "Signature",
      "properties": [
        {"name": "name", "source_path": "rule.name", "is_label": true},
        {"name": "sid", "source_path": "rule.id"},
        {"name": "category", "source_path": "rule.category"}
      ]
    },
    {
      "id": "attack_category",
      "label": "AttackCategory",
      "properties": [
        {"name": "name", "source_path": "rule.category", "is_label": true}
      ]
    },
    {
      "id": "country_source",
      "label": "Country",
      "properties": [
        {"name": "name", "source_path": "source.geo.country_name", "is_label": true},
        {"name": "iso_code", "source_path": "source.geo.country_iso_code"}
      ]
    },
    {
      "id": "country_dest",
      "label": "Country",
      "properties": [
        {"name": "name", "source_path": "destination.geo.country_name", "is_label": true},
        {"name": "iso_code", "source_path": "destination.geo.country_iso_code"}
      ]
    },
    {
      "id": "port",
      "label": "Port",
      "properties": [
        {"name": "number", "source_path": "destination.port", "is_label": true}
      ]
    },
    {
      "id": "process",
      "label": "Process",
      "properties": [
        {"name": "executable", "source_path": "process.executable", "is_label": true},
        {"name": "name", "source_path": "process.name"},
        {"name": "command_line", "source_path": "process.command_line"},
        {"name": "hash_sha256", "source_path": "process.hash.sha256"}
      ]
    },
    {
      "id": "parent_process",
      "label": "Process",
      "properties": [
        {"name": "executable", "source_path": "process.parent.executable", "is_label": true},
        {"name": "name", "source_path": "process.parent.name"}
      ]
    },
    {
      "id": "user",
      "label": "User",
      "properties": [
        {"name": "name", "source_path": "user.name", "is_label": true}
      ]
    },
    {
      "id": "alert_rule",
      "label": "AlertRule",
      "properties": [
        {"name": "name", "source_path": "kibana.alert.rule.name", "is_label": true},
        {"name": "severity", "source_path": "kibana.alert.severity"},
        {"name": "risk_score", "source_path": "kibana.alert.risk_score"}
      ]
    }
  ],

  "relationships": [
    {
      "type": "COMMUNICATED_WITH",
      "source_id": "ip_source",
      "target_id": "ip_dest",
      "properties": [
        {"name": "transport", "source_path": "network.transport"},
        {"name": "bytes", "source_path": "network.bytes"}
      ],
      "condition": {"type": "field_exists", "field": "destination.ip"}
    },
    {
      "type": "TRIGGERED",
      "source_id": "ip_source",
      "target_id": "signature",
      "properties": [],
      "condition": {"type": "field_exists", "field": "rule.name"}
    },
    {
      "type": "CATEGORIZED_AS",
      "source_id": "signature",
      "target_id": "attack_category",
      "properties": [],
      "condition": {"type": "field_exists", "field": "rule.category"}
    },
    {
      "type": "SRC_ORIGINATES_IN",
      "source_id": "ip_source",
      "target_id": "country_source",
      "properties": [],
      "condition": {"type": "field_exists", "field": "source.geo.country_name"}
    },
    {
      "type": "DST_ORIGINATES_IN",
      "source_id": "ip_dest",
      "target_id": "country_dest",
      "properties": [],
      "condition": {"type": "field_exists", "field": "destination.geo.country_name"}
    },
    {
      "type": "TARGETED_PORT",
      "source_id": "ip_source",
      "target_id": "port",
      "properties": [],
      "condition": {"type": "field_exists", "field": "destination.port"}
    },
    {
      "type": "EXECUTED_ON",
      "source_id": "process",
      "target_id": "host",
      "properties": [],
      "condition": {"type": "field_exists", "field": "process.executable"}
    },
    {
      "type": "RAN_BY",
      "source_id": "process",
      "target_id": "user",
      "properties": [],
      "condition": {"type": "field_exists", "field": "process.executable"}
    },
    {
      "type": "SPAWNED",
      "source_id": "parent_process",
      "target_id": "process",
      "properties": [],
      "condition": {"type": "field_exists", "field": "process.parent.executable"}
    },
    {
      "type": "RAISED_ALERT",
      "source_id": "process",
      "target_id": "alert_rule",
      "properties": [
        {"name": "risk_score", "source_path": "kibana.alert.risk_score"}
      ],
      "condition": {"type": "field_exists", "field": "kibana.alert.rule.name"}
    },
    {
      "type": "ALERT_ON_HOST",
      "source_id": "alert_rule",
      "target_id": "host",
      "properties": [],
      "condition": {"type": "field_exists", "field": "host.hostname"}
    },
    {
      "type": "USER_ON_HOST",
      "source_id": "user",
      "target_id": "host",
      "properties": [],
      "condition": {"type": "field_exists", "field": "user.name"}
    }
  ]
}
```

What This Produces

When a Suricata alert with source.ip 176.65.149.236, destination.ip 10.128.0.77, and destination.port 22 is ingested, the graph gets:

  • An :IP node for 176.65.149.236 with geo enrichment
  • An :IP node for 10.128.0.77 (merged with the asset-seeded one)
  • A :Port node for 22 (merged with the asset-seeded one)
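  • A :Signature node for the rule that fired
  • A :Country node for the source IP's country
  • COMMUNICATED_WITH, TRIGGERED, TARGETED_PORT, and SRC_ORIGINATES_IN relationships, plus CATEGORIZED_AS to an :AttackCategory node when rule.category is present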

When an Elastic endpoint process alert from the same host arrives, the graph adds:

  • A :Process node for /usr/bin/systemctl
  • A :Process node for /bin/sh (parent)
  • SPAWNED, EXECUTED_ON, RAN_BY, RAISED_ALERT relationships
  • The :Host node is the same node as the one from assets — MERGE unifies them

Over time, the graph builds a connected picture where you can traverse from an attacking IP through its communication to a host, see what processes ran on that host, who ran them, and what alerts they triggered.
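One way to check which relationships a given alert will produce is to evaluate each relationship's condition against it. The Python sketch below copies the condition fields from the complete example and applies them to two made-up alerts trimmed to the relevant fields. It only checks conditions, not whether both endpoint nodes could be extracted, and it assumes nested objects; real Elastic alerts often carry kibana.alert.* as flat dotted keys.

```python
from typing import Any, List, Optional

# field_exists condition of each relationship in the complete example above
RELATIONSHIP_CONDITIONS = {
    "COMMUNICATED_WITH": "destination.ip",
    "TRIGGERED": "rule.name",
    "CATEGORIZED_AS": "rule.category",
    "SRC_ORIGINATES_IN": "source.geo.country_name",
    "DST_ORIGINATES_IN": "destination.geo.country_name",
    "TARGETED_PORT": "destination.port",
    "EXECUTED_ON": "process.executable",
    "RAN_BY": "process.executable",
    "SPAWNED": "process.parent.executable",
    "RAISED_ALERT": "kibana.alert.rule.name",
    "ALERT_ON_HOST": "host.hostname",
    "USER_ON_HOST": "user.name",
}


def get_path(doc: Any, path: str) -> Optional[Any]:
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def fired_relationships(alert: dict) -> List[str]:
    """Relationship types whose condition holds for this alert."""
    return [rtype for rtype, fld in RELATIONSHIP_CONDITIONS.items()
            if get_path(alert, fld) is not None]


# Made-up sample alerts, trimmed to the fields the conditions look at.
suricata_alert = {
    "source": {"ip": "176.65.149.236", "geo": {"country_name": "Russia"}},
    "destination": {"ip": "10.128.0.77", "port": 22},
    "rule": {"name": "example-signature", "category": "example-category"},
}
endpoint_alert = {
    "host": {"hostname": "installer-arcanna-ai"},
    "user": {"name": "root"},
    "process": {"executable": "/usr/bin/systemctl",
                "parent": {"executable": "/bin/sh"}},
    "kibana": {"alert": {"rule": {"name": "example-rule"}}},
}

print(fired_relationships(suricata_alert))
print(fired_relationships(endpoint_alert))
```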

Querying the Graph

The graph structure encodes the answers — no pre-computation needed. Here are some example Cypher queries you can run in Neo4j Browser.

Port Policy Violations

Find traffic to known hosts on ports they don't allow:

```cypher
MATCH (src:IP)-[:COMMUNICATED_WITH]->(dst:IP)<-[:HAS_PRIVATE_IP]-(h:Host)
MATCH (src)-[:TARGETED_PORT]->(p:Port)
WHERE NOT (h)-[:ALLOWS_PORT]->(p)
RETURN src.address, dst.address, p.number, h.name
```
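The same logic over plain Python sets may help when reasoning about what this query returns. The edge sets are made-up sample data shaped like the example DSL's output; this is not a Neo4j client.

```python
# Sample edges (made-up data) in the shape the example DSL produces.
communicated_with = {("176.65.149.236", "10.128.0.77")}            # (src IP, dst IP)
targeted_port = {("176.65.149.236", 22), ("176.65.149.236", 3389)}  # (src IP, port)
has_private_ip = {("installer-arcanna-ai", "10.128.0.77")}          # (host, IP)
allows_port = {("installer-arcanna-ai", p) for p in (22, 80, 443)}  # (host, port)


def port_violations() -> set:
    rows = set()
    for src, dst in communicated_with:
        for host, ip in has_private_ip:
            if ip != dst:
                continue
            for s, port in targeted_port:
                # Mirrors WHERE NOT (h)-[:ALLOWS_PORT]->(p)
                if s == src and (host, port) not in allows_port:
                    rows.add((src, dst, port, host))
    return rows


print(port_violations())  # {('176.65.149.236', '10.128.0.77', 3389, 'installer-arcanna-ai')}
```

In the example DSL, TARGETED_PORT links the source IP to a port rather than to a specific destination, so a source that reached several known hosts is checked against every port it targeted on any of them.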

Cross-Country Communication

Find IPs talking across country borders:

```cypher
MATCH (src:IP)-[:COMMUNICATED_WITH]->(dst:IP),
      (src)-[:SRC_ORIGINATES_IN]->(c1:Country),
      (dst)-[:DST_ORIGINATES_IN]->(c2:Country)
WHERE c1 <> c2
RETURN src.address, c1.name, dst.address, c2.name
```

Process Trees with Alerts

See parent-child process chains and the alerts they triggered:

```cypher
MATCH (pp:Process)-[:SPAWNED]->(p:Process)-[:EXECUTED_ON]->(h:Host)
OPTIONAL MATCH (p)-[:RAISED_ALERT]->(a:AlertRule)
OPTIONAL MATCH (p)-[:RAN_BY]->(u:User)
RETURN pp.executable, p.executable, u.name, h.name, a.name, a.risk_score
```

High Risk Users

Find users whose processes triggered the most severe alerts:

```cypher
MATCH (u:User)<-[:RAN_BY]-(p:Process)-[:RAISED_ALERT]->(a:AlertRule)
RETURN u.name, count(DISTINCT a) AS alerts, max(toInteger(a.risk_score)) AS max_risk
ORDER BY max_risk DESC
```

Unknown Hosts Receiving Traffic

Find destination IPs that don't belong to any known host:

```cypher
MATCH (src:IP)-[:COMMUNICATED_WITH]->(dst:IP)
WHERE NOT (dst)<-[:HAS_PRIVATE_IP|HAS_PUBLIC_IP]-(:Host)
RETURN dst.address, count(DISTINCT src) AS sources
ORDER BY sources DESC
```

Playground

EXPERIMENTAL

The Playground feature allows you to test your DSL configuration against real alerts before deploying it to production.

The Playground inserts temporary data into Neo4j with an isolated identifier, lets you query and visualize the result, and cleans up when you're done. No permanent data is modified.

The flow is:

  1. Select one or more alerts from your job
  2. Click Simulate — Arcanna inserts nodes and relationships into Neo4j tagged with a temporary ID
  3. View the resulting graph visualization
  4. Click Cleanup to remove all temporary data, or adjust your DSL and simulate again
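A toy Python model of the Simulate and Cleanup steps: simulated data is tagged with a run identifier and removed by that identifier, leaving other data untouched. The playground_id property name and the use of a random UUID are assumptions for illustration; the section above only says the data is tagged with a temporary ID.

```python
import uuid
from typing import Dict, List


class PlaygroundStore:
    """Toy model of Simulate/Cleanup: temporary data carries a run identifier."""

    def __init__(self) -> None:
        self.nodes: List[Dict] = []

    def simulate(self, nodes: List[Dict]) -> str:
        run_id = uuid.uuid4().hex  # hypothetical isolated identifier
        for node in nodes:
            self.nodes.append({**node, "playground_id": run_id})
        return run_id

    def cleanup(self, run_id: str) -> int:
        """Remove everything tagged with run_id; return how many nodes were removed."""
        before = len(self.nodes)
        self.nodes = [n for n in self.nodes if n.get("playground_id") != run_id]
        return before - len(self.nodes)


store = PlaygroundStore()
store.nodes.append({"label": "Host", "name": "installer-arcanna-ai"})  # permanent data
run = store.simulate([{"label": "IP", "address": "176.65.149.236"},
                      {"label": "Port", "number": 22}])
print(len(store.nodes))    # 3
print(store.cleanup(run))  # 2
print(store.nodes)         # [{'label': 'Host', 'name': 'installer-arcanna-ai'}]
```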

Validation Rules

The DSL is validated before processing. Common validation errors:

  • Missing id: Every node and asset must have a unique id.
  • Missing is_label: Every node must have exactly one property with "is_label": true.
  • Dots in property names: Use source_path for nested fields. "name": "source.ip" is invalid; use "name": "address", "source_path": "source.ip".
  • Unknown source_id / target_id: Relationship references must match a node id.
  • Unknown target_node_id: Asset relationships must reference a valid node id.
  • Duplicate id: Each node id must be unique across the entire configuration.
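The rules above translate into a compact checker. The Python sketch below is an approximation written from this list, not Arcanna's validator; its error messages are invented, and treating asset ids as part of the same uniqueness namespace is an assumption based on "unique across the entire configuration".

```python
from collections import Counter
from typing import List


def validate_dsl(dsl: dict) -> List[str]:
    """Approximate the documented validation rules; returns error messages."""
    errors: List[str] = []
    node_ids: List[str] = []
    for i, node in enumerate(dsl.get("nodes", [])):
        if not node.get("id"):
            errors.append(f"nodes[{i}]: missing id")
        else:
            node_ids.append(node["id"])
        props = node.get("properties", [])
        label_count = sum(1 for p in props if p.get("is_label"))
        if label_count != 1:
            errors.append(f"nodes[{i}]: expected exactly one is_label property, found {label_count}")
        for p in props:
            if "." in p.get("name", ""):
                errors.append(f"nodes[{i}]: property name {p['name']!r} contains dots")

    asset_ids: List[str] = []
    for i, asset in enumerate(dsl.get("assets", [])):
        if not asset.get("id"):
            errors.append(f"assets[{i}]: missing id")
        else:
            asset_ids.append(asset["id"])
        for rel in asset.get("relationships", []):
            if rel.get("target_node_id") not in node_ids:
                errors.append(f"assets[{i}]: unknown target_node_id {rel.get('target_node_id')!r}")

    for dup, count in Counter(node_ids + asset_ids).items():
        if count > 1:
            errors.append(f"duplicate id: {dup!r}")

    for i, rel in enumerate(dsl.get("relationships", [])):
        for end in ("source_id", "target_id"):
            if rel.get(end) not in node_ids:
                errors.append(f"relationships[{i}]: unknown {end} {rel.get(end)!r}")
    return errors


good = {"nodes": [{"id": "ip_source", "label": "IP", "properties": [
    {"name": "address", "source_path": "source.ip", "is_label": True}]}],
    "relationships": []}
bad = {"nodes": [
    {"id": "ip", "label": "IP", "properties": [{"name": "source.ip", "source_path": "source.ip"}]},
    {"id": "ip", "label": "IP", "properties": [
        {"name": "address", "source_path": "destination.ip", "is_label": True}]},
], "relationships": [{"type": "COMMUNICATED_WITH", "source_id": "ip", "target_id": "host"}]}

print(validate_dsl(good))  # []
for error in validate_dsl(bad):
    print(error)
```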