Metrics API
The Metrics API computes statistics about Arcanna AI model behavior and other platform metrics.
POST /api/v2/metrics/internal_metrics
Headers:
X-Arcanna-Api-Key
- Arcanna Management API Key
Parameters:
job_ids
: A list of specific job IDs to filter the results by. If this parameter is omitted or set to None, metrics for all jobs are included.
aggregation_time_interval
: The duration used to group and compute aggregated metrics, specified using Elasticsearch date math intervals (e.g., "1h" for 1 hour, "24h" for 24 hours, or "7d" for 7 days). The default value is "7d".
obfuscate_decision_points_names
: A boolean flag that controls the display of decision point names. When set to True (the default), decision point names are hidden for privacy reasons. Setting it to False reveals these names.
start_datetime
: The beginning of the time range for which to retrieve metrics, provided in a valid datetime format (e.g., ISO 8601).
end_datetime
: The end of the time range for which to retrieve metrics, provided in a valid datetime format (e.g., ISO 8601).
Body:
filters
- List of filters to apply to the metrics events query. When multiple filters are provided, they are combined with an AND operation.
{
"filters": [
{
"field": "string",
"operator": "string",
"value": "string"
}
]
}
field
: The specific data field to apply the filter to.
operator
: The comparison method to use for filtering. Choose one of the following:
- "is" (equal to)
- "is not" (not equal to)
- "is one of" (matches any in a given list)
- "is not one of" (does not match any in a given list)
- "starts with"
- "not starts with"
- "contains"
- "not contains"
- "exists" (field has a value)
- "not exists" (field does not have a value)
- "lt" (less than)
- "lte" (less than or equal to)
- "gt" (greater than)
- "gte" (greater than or equal to)
value
: The criteria or data point used by the operator to filter the "field".
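Putting the pieces together, a request to this endpoint carries the API key as a header, the parameters in the query string, and the filters in the JSON body. The sketch below builds such a request with the standard library; the base URL, the job IDs, the filter field, and the encoding of list-valued parameters as repeated query keys are all assumptions, not part of the API specification above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL and API key -- substitute your deployment's values.
BASE_URL = "https://arcanna.example.com"
API_KEY = "<ARCANNA_MANAGEMENT_API_KEY>"

# Query parameters from the Parameters section above. Encoding job_ids as
# repeated keys is an assumption about how the API accepts lists.
params = urllib.parse.urlencode(
    [
        ("job_ids", 1201),
        ("job_ids", 1202),
        ("aggregation_time_interval", "24h"),
        ("obfuscate_decision_points_names", "true"),
        ("start_datetime", "2025-07-01T00:00:00Z"),
        ("end_datetime", "2025-07-10T00:00:00Z"),
    ]
)

# Body: a single filter; additional filters would be ANDed together.
body = json.dumps(
    {"filters": [{"field": "arcanna.result", "operator": "is", "value": "Escalate"}]}
).encode()

request = urllib.request.Request(
    f"{BASE_URL}/api/v2/metrics/internal_metrics?{params}",
    data=body,
    headers={"X-Arcanna-Api-Key": API_KEY, "Content-Type": "application/json"},
    method="POST",
)

# Sending (requires a reachable deployment):
# with urllib.request.urlopen(request) as resp:
#     metrics = json.load(resp)
```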
Response:
[
{
"job_id": <JOB_ID_1>,
"in_use_model": "<MODEL_PATH_1>",
"evaluation": {...},
"scores": {...},
"training": {...},
"decision_points": [...]
},
{
"job_id": <JOB_ID_2>,
"in_use_model": "<MODEL_PATH_2>",
"evaluation": {...},
"scores": {...},
"training": {...},
"decision_points": [...]
}
]
The API returns a list of metrics, either for the specified jobs or, by default, for all jobs if no selection is made.
Metrics:
For each job, the API provides the following details:
job_id
: The unique identifier of the job.
in_use_model
: The identifier of the model currently active and in use for this job.
evaluation
: Statistics detailing the performance of the model.
scores
: Aggregated statistics derived from the model's performance.
training
: Statistics related to the training process of the model.
decision_points
: Statistics concerning the specific decision points utilized during the model's training.
Detailed metrics explanation
Evaluation & Scores Metrics
{
"overall": {
"alerts": {
"in_kb": {...},
"new": {...}
}
},
"per_model_metrics": [
{
"alerts": {
"in_kb": {...},
"new": {...}
},
"buckets": {
"in_kb": {...},
"new": {...}
},
"model_path": "<MODEL_PATH_N>"
},
...
]
}
- These metrics are categorized first into overall and per-model metrics. Within these, they are further segmented by alerts and buckets, and finally by status: new or in knowledge base (in_kb).
overall Metrics
- These metrics are computed across all alerts, encompassing both new alerts and alerts already within the knowledge base, irrespective of the model that processed them.
- Each alert is either new or in the knowledge base, and was processed by a model at a specific point in time.
- Bucket-level metrics are omitted here because a single bucket can be processed by multiple models, which could lead to inaccurate overall scores.
per_model_metrics Metrics
- These provide the same set of metrics, but computed individually for each model.
- Note on Buckets: Unlike the overall metrics, bucket-level metrics are included here. This is feasible because the metrics are tied to a specific model, and the state of a bucket (new or in knowledge base) is considered disjoint within the context of that single model.
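Given this structure, extracting the per-model breakdown from one job's entry is a simple traversal. The sketch below uses the field names from the response schema above; `summarize_evaluation` itself is a hypothetical helper name, not part of the API.

```python
def summarize_evaluation(job_metrics: dict) -> dict:
    """Map each model path to its alert and bucket metrics, split by
    status (new vs. in knowledge base), for one job's metrics entry."""
    summary = {}
    for model in job_metrics["evaluation"]["per_model_metrics"]:
        summary[model["model_path"]] = {
            "alerts_new": model["alerts"]["new"],
            "alerts_in_kb": model["alerts"]["in_kb"],
            "buckets_new": model["buckets"]["new"],
            "buckets_in_kb": model["buckets"]["in_kb"],
        }
    return summary
```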
Evaluation Metrics
{
"confusion_matrix": [
[
4490,
0,
0
],
[
0,
749,
0
],
[
0,
0,
37
]
],
"overall_accuracy": 1,
"overall_f1_score": 1,
"overall_recall": 1,
"overall_precision": 1,
"metrics_per_decision": {
"Drop": {
"precision": 1,
"recall": 1,
"f1_score": 1,
"true_positives": 4490,
"true_negatives": 786,
"false_positives": 0,
"false_negatives": 0
},
"Investigate": {
"precision": 1,
"recall": 1,
"f1_score": 1,
"true_positives": 749,
"true_negatives": 4527,
"false_positives": 0,
"false_negatives": 0
},
"Escalate": {
"precision": 1,
"recall": 1,
"f1_score": 1,
"true_positives": 37,
"true_negatives": 5239,
"false_positives": 0,
"false_negatives": 0
}
}
}
For each specific combination of categories (overall/per model, alerts/buckets, and new/in_kb), the API returns the following metrics:
confusion_matrix
overall_accuracy
overall_f1_score
overall_recall
overall_precision
metrics_per_decision
- containing the following metrics for each label (e.g., 'Drop', 'Investigate', 'Escalate'):
precision
recall
f1_score
true_positives
true_negatives
false_positives
false_negatives
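All of the per-decision counts and scores can be recomputed from the confusion matrix alone, which is a useful consistency check. The sketch below assumes rows are actual labels and columns are predicted labels; the example matrix above is diagonal, so either orientation gives the same result there. `per_class_metrics` is a hypothetical helper name.

```python
def per_class_metrics(cm: list[list[int]]) -> list[dict]:
    """Derive precision, recall, F1, and TP/TN/FP/FN counts per class
    from a confusion matrix (rows = actual, columns = predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, actually another label
        fn = sum(cm[k][j] for j in range(n)) - tp  # actually k, predicted another label
        tn = total - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({
            "precision": precision, "recall": recall, "f1_score": f1,
            "true_positives": tp, "true_negatives": tn,
            "false_positives": fp, "false_negatives": fn,
        })
    return results
```

Applied to the example matrix above, this reproduces the documented values, e.g. 786 true negatives for "Drop" (749 + 37).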
Scores Metrics
[
{
"timestamp": "2025-07-10T00:00:00.000Z",
"mean_outlier_score": 0.8972147808355444,
"median_outlier_score": 1.0999241471290588,
"max_outlier_score": 2.498734951019287,
"min_outlier_score": 0,
"mean_confidence_score": 48.484920986114986,
"median_confidence_score": 36.70000076293945,
"max_confidence_score": 100,
"min_confidence_score": 0,
"mean_confidence_model_score": 40.22700912233383,
"median_confidence_model_score": 0,
"max_confidence_model_score": 99.6323471069336,
"min_confidence_model_score": 0,
"outlier_flag_count": 0
},
...
]
Note: The API returns a list of aggregated metrics, with each entry corresponding to a time bucket defined by the aggregation_time_interval parameter.
For each specific combination of categories (overall/per model, alerts/buckets, and new/in_kb), the API returns the following time-aggregated metrics:
timestamp
: The aggregation bucket's timestamp
mean_outlier_score
: The average outlier score
median_outlier_score
: The median outlier score
max_outlier_score
: The highest outlier score
min_outlier_score
: The lowest outlier score
mean_confidence_score
: The average confidence score
median_confidence_score
: The median confidence score
max_confidence_score
: The highest confidence score
min_confidence_score
: The lowest confidence score
mean_confidence_model_score
: The average confidence score reported by the model
median_confidence_model_score
: The median confidence score reported by the model
max_confidence_model_score
: The highest confidence score reported by the model
min_confidence_model_score
: The lowest confidence score reported by the model
outlier_flag_count
: The total number of alerts/buckets flagged as outliers
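Since each entry covers one aggregation interval, collapsing the list into headline numbers is a one-pass reduction. The sketch below reads the time-aggregated fields listed above; `summarize_score_buckets` is a hypothetical helper name.

```python
def summarize_score_buckets(buckets: list[dict]) -> dict:
    """Collapse the per-interval score entries into a few headline numbers:
    peak mean outlier score, peak mean confidence score, and the total
    count of outlier flags across the whole time range."""
    return {
        "peak_mean_outlier_score": max(b["mean_outlier_score"] for b in buckets),
        "peak_mean_confidence_score": max(b["mean_confidence_score"] for b in buckets),
        "total_outlier_flags": sum(b["outlier_flag_count"] for b in buckets),
    }
```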
Training Metrics
"training": {
"per_model_metrics": [
{
"kb_count_per_decision": {
"Drop": {
"alerts_count": 2066,
"buckets_count": 10
},
"Investigate": {
"alerts_count": 429,
"buckets_count": 5
},
"Escalate": {
"alerts_count": 21,
"buckets_count": 4
}
},
"buckets_in_kb": 19,
"events_in_kb": 2516,
"model_path": "<MODEL_PATH>",
"vocabulary_size": 88,
"max_sequence_length": 15,
"model_params_count": 40627
},
...
]
}
The training metrics present detailed statistics on the training process for each model, structured as follows:
kb_count_per_decision
: An object detailing the count of alerts and buckets currently in the knowledge base, broken down by decision outcome (e.g., "Drop", "Investigate", "Escalate").
alerts_count
: The number of alerts in the knowledge base for this specific decision category.
buckets_count
: The number of buckets in the knowledge base associated with this specific decision category.
buckets_in_kb
: The total number of buckets in the model's knowledge base.
events_in_kb
: The total number of alerts in the model's knowledge base.
model_path
: The model identifier.
vocabulary_size
: The model's vocabulary size.
max_sequence_length
: The model's maximum sequence length.
model_params_count
: The total number of trainable parameters within the model.
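The per-decision counts should add up to the model-level totals, which makes a handy sanity check when consuming this section. In the example above, 2066 + 429 + 21 = 2516 matches events_in_kb, and 10 + 5 + 4 = 19 matches buckets_in_kb. The sketch below sums them; `kb_totals` is a hypothetical helper name.

```python
def kb_totals(model_training: dict) -> tuple[int, int]:
    """Sum the per-decision knowledge-base counts for one model's
    training entry, returning (total alerts, total buckets)."""
    per_decision = model_training["kb_count_per_decision"].values()
    alerts = sum(d["alerts_count"] for d in per_decision)
    buckets = sum(d["buckets_count"] for d in per_decision)
    return alerts, buckets
```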
Decision Points Metrics
"decision_points": [
{
"model_path": "<MODEL_PATH>",
"decision_points_count": 2,
"decision_points_metrics": [
{
"name": "0a136c1fb1",
"mean_word_count": 3.5555555555555554,
"median_word_count": 4,
"max_word_count": 5,
"min_word_count": 2,
"character_set": [
"n",
"o",
"p",
"r",
"s",
"t",
"u",
"v",
"w",
"y"
]
},
...
]
},
{
...
}
...
]
The decision points metrics present detailed statistics on the decision points used during the training process for each model, structured as follows:
model_path
: The model identifier
decision_points_count
: The total number of decision points
decision_points_metrics
: A list of metrics, with each object detailing statistics for a specific decision point
name
: The unique identifier or name of the decision point (obfuscated by default)
mean_word_count
: The average number of words associated with this decision point
median_word_count
: The median number of words associated with this decision point
max_word_count
: The maximum number of words observed for this decision point
min_word_count
: The minimum number of words observed for this decision point
character_set
: A list of unique characters found within the data related to this decision point
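One way to use the word-count fields is to rank a model's decision points by variability. The sketch below, built on the field names listed above, picks the decision point with the widest spread between its max and min word counts; `widest_word_count_range` is a hypothetical helper name.

```python
def widest_word_count_range(dp_entry: dict) -> str:
    """Return the (possibly obfuscated) name of the decision point with
    the largest max-min word-count spread in one model's entry."""
    return max(
        dp_entry["decision_points_metrics"],
        key=lambda dp: dp["max_word_count"] - dp["min_word_count"],
    )["name"]
```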