Metrics API

The Metrics API computes statistics about Arcanna AI model behavior and other platform metrics.

POST /api/v2/metrics/internal_metrics

Headers:

Parameters:

  • job_ids: A list of specific job IDs to filter the results by. If this parameter is omitted or set to None, metrics for all jobs will be included.
  • aggregation_time_interval: The duration, specified using Elasticsearch date math intervals (e.g., "1h" for 1 hour, "24h" for 24 hours, or "7d" for 7 days), used to group and compute aggregated metrics. The default value is "7d".
  • obfuscate_decision_points_names: A boolean flag that controls the display of decision point names. When set to True (the default value), decision point names are hidden for privacy reasons. Setting it to False will reveal these names.
  • start_datetime: The beginning of the time range for which to retrieve metrics. This should be provided in a valid datetime format (e.g., ISO 8601).
  • end_datetime: The end of the time range for which to retrieve metrics. This should be provided in a valid datetime format (e.g., ISO 8601).

Body:

  • filters: A list of filters applied to the metrics events query. When multiple filters are provided, they are combined with a logical AND.
{
  "filters": [
    {
      "field": "string",
      "operator": "string",
      "value": "string"
    }
  ]
}
  • field: The specific data field to apply the filter to.
  • operator: The comparison method to use for filtering. Choose one of the following:
    • "is" (equal to)
    • "is not" (not equal to)
    • "is one of" (matches any in a given list)
    • "is not one of" (does not match any in a given list)
    • "starts with"
    • "not starts with"
    • "contains"
    • "not contains"
    • "exists" (field has a value)
    • "not exists" (field does not have a value)
    • "lt" (less than)
    • "lte" (less than or equal to)
    • "gt" (greater than)
    • "gte" (greater than or equal to)
  • value: The criteria or data point used by the operator to filter the "field".
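The request above can be assembled as ordinary query parameters plus a JSON body. The sketch below shows one way to do this in Python; the base URL, the API-key header name, and the filter field names (`arcanna.result`, `source.ip`) are assumptions for illustration, not values confirmed by this document.

```python
# Hypothetical deployment values -- substitute your own.
BASE_URL = "https://arcanna.example.com"
API_KEY = "<YOUR_API_KEY>"

# Query parameters: restrict to two jobs, 24h aggregation buckets,
# readable (non-obfuscated) decision point names.
params = {
    "job_ids": [101, 102],
    "aggregation_time_interval": "24h",
    "obfuscate_decision_points_names": False,
    "start_datetime": "2025-07-01T00:00:00Z",
    "end_datetime": "2025-07-31T23:59:59Z",
}

# Body: filters are combined with a logical AND.
body = {
    "filters": [
        {"field": "arcanna.result", "operator": "is", "value": "Escalate"},
        {"field": "source.ip", "operator": "exists", "value": ""},
    ]
}

# The actual call would look like this (requires the `requests` package;
# the header name is a placeholder -- check your deployment's auth scheme):
# resp = requests.post(f"{BASE_URL}/api/v2/metrics/internal_metrics",
#                      headers={"x-arcanna-api-key": API_KEY},
#                      params=params, json=body)
# metrics = resp.json()
```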

Response:

[
  {
    "job_id": <JOB_ID_1>,
    "in_use_model": "<MODEL_PATH_1>",
    "evaluation": {...},
    "scores": {...},
    "training": {...},
    "decision_points": [...]
  },
  {
    "job_id": <JOB_ID_2>,
    "in_use_model": "<MODEL_PATH_2>",
    "evaluation": {...},
    "scores": {...},
    "training": {...},
    "decision_points": [...]
  }
]

The API returns a list of metrics, either for the specified jobs or, by default, for all jobs if no selection is made.

Metrics:

For each job, the API provides the following details:

  • job_id: The unique identifier of the job.
  • in_use_model: The identifier of the model currently active and in use for this job.
  • evaluation: Statistics detailing the performance of the model.
  • scores: Aggregated statistics derived from the model's performance.
  • training: Statistics related to the training process of the model.
  • decision_points: Statistics concerning the specific decision points utilized during the model's training.
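Once parsed, the response is a plain list of per-job objects. A minimal sketch of walking it (job IDs and model paths below are placeholders, not real values):

```python
# Sample parsed response with placeholder values.
jobs = [
    {"job_id": 101, "in_use_model": "models/job_101/v3",
     "evaluation": {}, "scores": {}, "training": {}, "decision_points": []},
    {"job_id": 102, "in_use_model": "models/job_102/v1",
     "evaluation": {}, "scores": {}, "training": {}, "decision_points": []},
]

# Index the active model path by job ID for quick lookup.
models_by_job = {job["job_id"]: job["in_use_model"] for job in jobs}
```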

Detailed metrics explanation

Evaluation & Scores Metrics

{
  "overall": {
    "alerts": {
      "in_kb": {...},
      "new": {...}
    }
  },
  "per_model_metrics": [
    {
      "alerts": {
        "in_kb": {...},
        "new": {...}
      },
      "buckets": {
        "in_kb": {...},
        "new": {...}
      },
      "model_path": "<MODEL_PATH_N>"
    },
    ...
  ]
}
  • These metrics are categorized first into overall and per-model metrics. Within each, they are further segmented by alerts and buckets, and finally by alert status: new or in knowledge base (in_kb).
  • overall Metrics
    • These metrics are computed across all alerts, encompassing both new alerts and alerts already within the knowledge base, irrespective of the model that processed them.
    • At any point in time, each alert is either new or in the knowledge base, and is processed by a single model.
    • Bucket-level metrics are omitted here because a single bucket can be processed by multiple models, which could lead to inaccurate overall scores.
  • per_model_metrics Metrics
    • These provide the same set of metrics, but computed individually for each model.
    • Note on buckets: unlike the overall metrics, bucket-level metrics are included here. This is feasible because the metrics are tied to a specific model, and within the context of that single model a bucket is unambiguously either new or in the knowledge base.

Evaluation Metrics

{
  "confusion_matrix": [
    [4490, 0, 0],
    [0, 749, 0],
    [0, 0, 37]
  ],
  "overall_accuracy": 1,
  "overall_f1_score": 1,
  "overall_recall": 1,
  "overall_precision": 1,
  "metrics_per_decision": {
    "Drop": {
      "precision": 1,
      "recall": 1,
      "f1_score": 1,
      "true_positives": 4490,
      "true_negatives": 786,
      "false_positives": 0,
      "false_negatives": 0
    },
    "Investigate": {
      "precision": 1,
      "recall": 1,
      "f1_score": 1,
      "true_positives": 749,
      "true_negatives": 4527,
      "false_positives": 0,
      "false_negatives": 0
    },
    "Escalate": {
      "precision": 1,
      "recall": 1,
      "f1_score": 1,
      "true_positives": 37,
      "true_negatives": 5239,
      "false_positives": 0,
      "false_negatives": 0
    }
  }
}

For each specific combination of categories (overall/per model, alerts/buckets, and new/in_kb), the API returns the following metrics:

  • confusion_matrix
  • overall_accuracy
  • overall_f1_score
  • overall_recall
  • overall_precision
  • metrics_per_decision - containing the following metrics for each label (e.g., 'Drop', 'Investigate', 'Escalate'):
    • precision
    • recall
    • f1_score
    • true_positives
    • true_negatives
    • false_positives
    • false_negatives
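The per-decision counts follow directly from the confusion matrix: rows are true labels and columns are predicted labels (the label order assumed here matches the example: Drop, Investigate, Escalate). A sketch recomputing them from the example matrix above:

```python
labels = ["Drop", "Investigate", "Escalate"]
cm = [[4490, 0, 0],
      [0, 749, 0],
      [0, 0, 37]]

per_decision = {}
for i, label in enumerate(labels):
    tp = cm[i][i]
    fn = sum(cm[i]) - tp                    # same row, other columns
    fp = sum(row[i] for row in cm) - tp     # same column, other rows
    tn = sum(map(sum, cm)) - tp - fn - fp   # everything else
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    per_decision[label] = {
        "precision": precision, "recall": recall, "f1_score": f1,
        "true_positives": tp, "true_negatives": tn,
        "false_positives": fp, "false_negatives": fn,
    }

# Matches the example response: per_decision["Drop"]["true_negatives"] == 786
```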

Scores Metrics

[
  {
    "timestamp": "2025-07-10T00:00:00.000Z",
    "mean_outlier_score": 0.8972147808355444,
    "median_outlier_score": 1.0999241471290588,
    "max_outlier_score": 2.498734951019287,
    "min_outlier_score": 0,
    "mean_confidence_score": 48.484920986114986,
    "median_confidence_score": 36.70000076293945,
    "max_confidence_score": 100,
    "min_confidence_score": 0,
    "mean_confidence_model_score": 40.22700912233383,
    "median_confidence_model_score": 0,
    "max_confidence_model_score": 99.6323471069336,
    "min_confidence_model_score": 0,
    "outlier_flag_count": 0
  },
  ...
]

Note: The API returns a list of aggregated metrics, with each entry corresponding to a time bucket defined by the aggregation_time_interval.

For each specific combination of categories (overall/per model, alerts/buckets, and new/in_kb), the API returns the following time-aggregated metrics:

  • timestamp: The timestamp of the aggregation bucket
  • mean_outlier_score: The average outlier score
  • median_outlier_score: The median outlier score
  • max_outlier_score: The highest outlier score
  • min_outlier_score: The lowest outlier score
  • mean_confidence_score: The average confidence score
  • median_confidence_score: The median confidence score
  • max_confidence_score: The highest confidence score
  • min_confidence_score: The lowest confidence score
  • mean_confidence_model_score: The average confidence score reported by the model
  • median_confidence_model_score: The median confidence score reported by the model
  • max_confidence_model_score: The highest confidence score reported by the model
  • min_confidence_model_score: The lowest confidence score reported by the model
  • outlier_flag_count: The total number of alerts/buckets flagged as outliers
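Because each entry is one time bucket, the list can be scanned like any time series. A minimal sketch (with made-up bucket values) that flags low-confidence intervals and totals the outlier counts:

```python
# Sample time-bucketed scores with placeholder values.
scores = [
    {"timestamp": "2025-07-10T00:00:00.000Z",
     "mean_confidence_score": 48.5, "outlier_flag_count": 0},
    {"timestamp": "2025-07-11T00:00:00.000Z",
     "mean_confidence_score": 72.1, "outlier_flag_count": 3},
]

# Buckets where the mean confidence fell below an arbitrary threshold.
low_confidence = [b["timestamp"] for b in scores
                  if b["mean_confidence_score"] < 50]

# Total alerts/buckets flagged as outliers across the whole range.
outliers_total = sum(b["outlier_flag_count"] for b in scores)
```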

Training Metrics

"training": {
"per_model_metrics": [
{
"kb_count_per_decision": {
"Drop": {
"alerts_count": 2066,
"buckets_count": 10
},
"Investigate": {
"alerts_count": 429,
"buckets_count": 5
},
"Escalate": {
"alerts_count": 21,
"buckets_count": 4
}
},
"buckets_in_kb": 19,
"events_in_kb": 2516,
"model_path": "<MODEL_PATH>",
"vocabulary_size": 88,
"max_sequence_length": 15,
"model_params_count": 40627
},
...
]
}

The training metrics present detailed statistics on the training process for each model, structured as follows:

  • kb_count_per_decision: An object detailing the count of alerts and buckets that are currently in the knowledge base, broken down by the decision outcome (e.g., "Drop", "Investigate", "Escalate").
    • alerts_count: The number of alerts in the knowledge base for this specific decision category.
    • buckets_count: The number of buckets in the knowledge base associated with this specific decision category.
  • buckets_in_kb: The total number of buckets in the model's knowledge base.
  • events_in_kb: The total number of alerts in the model's knowledge base.
  • model_path: The model identifier.
  • vocabulary_size: Model vocabulary size.
  • max_sequence_length: Model max sequence length.
  • model_params_count: The total number of trainable parameters within the model.
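The per-decision counts are a breakdown of the totals, so they should sum to `events_in_kb` and `buckets_in_kb`. A sketch sanity-checking this with the sample values from the example above:

```python
# Sample per-model training metrics, mirroring the example response.
model = {
    "kb_count_per_decision": {
        "Drop": {"alerts_count": 2066, "buckets_count": 10},
        "Investigate": {"alerts_count": 429, "buckets_count": 5},
        "Escalate": {"alerts_count": 21, "buckets_count": 4},
    },
    "buckets_in_kb": 19,
    "events_in_kb": 2516,
}

alerts_total = sum(d["alerts_count"]
                   for d in model["kb_count_per_decision"].values())
buckets_total = sum(d["buckets_count"]
                    for d in model["kb_count_per_decision"].values())

assert alerts_total == model["events_in_kb"]    # 2066 + 429 + 21 == 2516
assert buckets_total == model["buckets_in_kb"]  # 10 + 5 + 4 == 19
```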

Decision Points Metrics

"decision_points": [
{
"model_path": "<MODEL_PATH>",
"decision_points_count": 2,
"decision_points_metrics": [
{
"name": "0a136c1fb1",
"mean_word_count": 3.5555555555555554,
"median_word_count": 4,
"max_word_count": 5,
"min_word_count": 2,
"character_set": [
"n",
"o",
"p",
"r",
"s",
"t",
"u",
"v",
"w",
"y"
]
},
......
]
},
{
...
}
...
]

The decision points metrics present detailed statistics on the decision points used during the training process for each model, structured as follows:

  • model_path: Model identifier
  • decision_points_count: The total number of decision points
  • decision_points_metrics: A list of metrics, with each object detailing statistics for a specific decision point
    • name: The unique identifier or name of the decision point (obfuscated by default)
    • mean_word_count: The average number of words associated with this decision point
    • median_word_count: The median number of words associated with this decision point
    • max_word_count: The maximum number of words observed for this decision point
    • min_word_count: The minimum number of words observed for this decision point
    • character_set: A list of unique characters found within the data related to this decision point
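These per-decision-point statistics can be scanned to spot low-information fields, e.g. decision points whose values are consistently very short. A sketch (the sample entry mirrors the example above; the model path and the word-count threshold are placeholders):

```python
# Sample decision points payload with placeholder values.
decision_points = [{
    "model_path": "models/job_101/v3",  # hypothetical path
    "decision_points_count": 2,
    "decision_points_metrics": [
        {"name": "0a136c1fb1",
         "mean_word_count": 3.5555555555555554,
         "median_word_count": 4,
         "max_word_count": 5,
         "min_word_count": 2,
         "character_set": ["n", "o", "p", "r", "s",
                           "t", "u", "v", "w", "y"]},
    ],
}]

# Collect decision points with a short average value (threshold is arbitrary).
sparse = [dp["name"]
          for model in decision_points
          for dp in model["decision_points_metrics"]
          if dp["mean_word_count"] < 2]
```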