Acceptance tests

A tutorial for creating and managing acceptance tests on Nextmv Cloud.

Acceptance tests are formal offline tests that verify that a system satisfies business requirements. In an optimization context, an acceptance test determines whether business goals or key performance indicators (KPIs) are met by a new model (typically on a fixed set of inputs reserved for testing) and allows you to decide whether or not to deploy a model update to production.

When an acceptance test is run, data is collected from runs made with the baseline and candidate instances, and the candidate instance's metrics are compared to the baseline instance's metrics. The result of each comparison is determined by the operator that defines how the metric should be evaluated. For example, if should increase is set for a metric, the value of that metric in the output returned by the candidate instance should be greater than the value of the same metric returned by the baseline instance.

In short, an acceptance test is based on a batch experiment. For the metrics specified in the acceptance test, it compares the results of two instances: candidate vs. baseline. For each metric (comparison), the acceptance test gives a pass/fail result based on the operator. An acceptance test can be thought of as a view on a batch experiment, with a focus on the metrics.

The ID and name of the acceptance test and the underlying batch experiment must be the same.

When using subscription apps, make sure the candidate and baseline instances do not use a major version, e.g., v1 or v2. Instead, assign a complete (specific) version to your instances, e.g., v1.1.0.

Acceptance tests are designed to be visualized in the Console web interface. Go to the app, then to the Experiments > Acceptance tab.

Acceptance test

There are several interfaces for creating acceptance tests:

Defining metrics

When creating an acceptance test, you must define the metrics you want to analyze. At least one metric is required to run an acceptance test. These metrics are user-defined, though if you are using a subscription app or a custom app based on a template, there are some pre-defined metrics available to you.

The metrics are governed by the statistics convention. Any item under the statistics block is a valid entry for an acceptance test metric. To specify a metric, use object dot notation for the path, starting from (and relative to) the .statistics field of the output.

To compare metrics, you must define the operator for the comparison.

Operator  Symbol  Description
eq        ==      Equal to
gt        >       Greater than
ge        >=      Greater than or equal to
lt        <       Less than
le        <=      Less than or equal to
ne        !=      Not equal to
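
As an illustration of how an operator turns a candidate vs. baseline comparison into a pass/fail result, here is a minimal Python sketch (illustrative only, not the platform's implementation; the operator names follow the table above):

# Minimal sketch of how an operator maps a candidate/baseline comparison
# to a pass/fail result. Illustrative only, not Nextmv's implementation.
import operator

OPERATORS = {
    "eq": operator.eq,  # ==
    "gt": operator.gt,  # >
    "ge": operator.ge,  # >=
    "lt": operator.lt,  # <
    "le": operator.le,  # <=
    "ne": operator.ne,  # !=
}

def evaluate(candidate_value: float, baseline_value: float, op: str) -> bool:
    """Return True (pass) if the candidate compares to the baseline as requested."""
    return OPERATORS[op](candidate_value, baseline_value)

# Example: the candidate's objective value must be less than or equal to the baseline's.
print(evaluate(27, 30, "le"))  # True -> the metric passes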

Consider the meal allocation output as an example.

{
  "options": {
    "solve": {
      "control": {
        "bool": [],
        "float": [],
        "int": [],
        "string": []
      },
      "duration": 10000000000,
      "mip": {
        "gap": {
          "absolute": 0.000001,
          "relative": 0.0001
        }
      },
      "verbosity": "off"
    }
  },
  "solutions": [
    {
      "meals": [
        {
          "name": "A",
          "quantity": 2
        },
        {
          "name": "B",
          "quantity": 3
        }
      ]
    }
  ],
  "statistics": {
    "result": {
      "custom": {
        "constraints": 2,
        "provider": "HiGHS",
        "status": "optimal",
        "variables": 2
      },
      "duration": 0.123,
      "value": 27
    },
    "run": {
      "duration": 0.123
    },
    "schema": "v1"
  },
  "version": {
    "go-mip": "VERSION",
    "sdk": "VERSION"
  }
}

These are valid metrics for the acceptance test:

  • result.value with le: the value of the result in the candidate must be less than or equal to the baseline.
  • result.custom.constraints with eq: the number of constraints in the candidate must be equal to the baseline.
  • result.custom.variables with eq: the number of variables in the candidate must be equal to the baseline.
  • run.duration with ge: the run duration of the candidate must be greater than or equal to the baseline.
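
To make the path notation concrete, here is a small Python sketch (illustrative only) that resolves a dot-notation metric path against the statistics block of the meal allocation output above:

# Illustrative sketch: resolve a dot-notation metric path against the
# statistics block of a run's output. Not Nextmv's implementation.
statistics = {
    "result": {
        "custom": {"constraints": 2, "provider": "HiGHS", "status": "optimal", "variables": 2},
        "duration": 0.123,
        "value": 27,
    },
    "run": {"duration": 0.123},
    "schema": "v1",
}

def metric_value(stats: dict, path: str):
    """Walk the statistics block following a dot-notation path, e.g. 'result.value'."""
    value = stats
    for key in path.split("."):
        value = value[key]
    return value

print(metric_value(statistics, "result.value"))               # 27
print(metric_value(statistics, "result.custom.constraints"))  # 2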

Console

Go to the Console web interface, and open your app. Go to the Experiments > Acceptance tab. Click on New Acceptance Test. Fill in the fields.


A new batch experiment will be created with the same ID and name as the acceptance test.

To specify many metrics, it is recommended that you use the Free-form tab in the Metrics section. In this view, each metric is specified on its own line in the following format:

path: operator

For example:

result.value: le
result.custom.constraints: eq
result.custom.variables: eq
run.duration: ge

Nextmv CLI

Define the desired acceptance test ID and name. As mentioned above, an acceptance test is based on a batch experiment.

  • If you already started a batch experiment, you don't need to provide the -s, --input-set-id flag. In that case, the ID and name of the acceptance test and the underlying batch experiment must be the same.
  • If you didn't start a batch experiment, you need to provide the -s, --input-set-id flag and a new batch experiment will be created for you, with the same ID and name as the acceptance test.

Start by defining the metrics you want the acceptance test to use.

nextmv experiment acceptance init

The command will produce a metrics.json file. Edit this file to include the metrics you want to use.
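
As a hypothetical example of what the edited file might contain (the exact structure produced by the init command may differ; the metric objects here mirror the shape used in the Cloud API request further below):

# Hypothetical metrics.json contents -- adjust fields and operators to your own
# statistics, and verify against the file generated by the init command.
cat > metrics.json << 'EOF'
[
    {
        "field": "result.value",
        "metric_type": "direct-comparison",
        "params": {
            "operator": "le"
        },
        "statistic": "mean"
    },
    {
        "field": "result.custom.constraints",
        "metric_type": "direct-comparison",
        "params": {
            "operator": "eq"
        },
        "statistic": "mean"
    }
]
EOF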

Once the metrics.json file is ready, run the following command to create the acceptance test.

nextmv experiment acceptance start \
    --app-id $APP_ID \
    --experiment-id $EXPERIMENT_ID \
    --name "YOUR_EXPERIMENT_NAME" \
    --baseline-instance-id $INSTANCE_ID \
    --candidate-instance-id $INSTANCE_ID \
    --input-set-id $INPUT_SET_ID \
    --description "An optional description" \
    --metrics metrics.json \
    --confirm

Python SDK

Define the desired acceptance test ID and name. As mentioned above, an acceptance test is based on a batch experiment.

  • If you already started a batch experiment, you don't need to provide the input_set_id parameter. In that case, the ID and name of the acceptance test and the underlying batch experiment must be the same.
  • If you didn't start a batch experiment, you need to provide the input_set_id and a new batch experiment will be created for you, with the same ID and name as the acceptance test.

import json
import os

from nextmv.cloud import (
    Application,
    Client,
    Comparison,
    Metric,
    MetricParams,
    MetricType,
)

client = Client(api_key=os.getenv("NEXTMV_API_KEY"))
app = Application(client=client, id=os.getenv("APP_ID"))
acceptance_test = app.new_acceptance_test(
    candidate_instance_id="latest-2",
    control_instance_id="latest",
    id=os.getenv("ACCEPTANCE_TEST_ID"),
    name=os.getenv("ACCEPTANCE_TEST_ID"),
    metrics=[
        Metric(
            field="result.value",
            metric_type=MetricType.direct_comparison,
            params=MetricParams(operator=Comparison.less_than),
            statistic="mean",
        ),
        Metric(
            field="result.custom.activated_vehicles",
            metric_type=MetricType.direct_comparison,
            params=MetricParams(operator=Comparison.greater_than),
            statistic="mean",
        ),
    ],
    # input_set_id=os.getenv("INPUT_SET_ID"), # Defining this would create a new batch experiment.
    description="An optional description",
)
print(json.dumps(acceptance_test.to_dict(), indent=2))  # Pretty print.

Cloud API

Define the desired acceptance test ID and name. As mentioned above, an acceptance test is based on a batch experiment. The acceptance test ID and name must be the same as the batch experiment ID and name, respectively.

POST https://api.cloud.nextmv.io/v1/applications/{application_id}/experiments/acceptance

Create and start acceptance test.

Create an acceptance test. The test relies on a batch experiment, so a batch experiment must already exist.

curl -sS -L -X POST \
    "https://api.cloud.nextmv.io/v1/applications/$APP_ID/experiments/acceptance" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $NEXTMV_API_KEY" \
    -d "{
      \"id\": \"YOUR-ACCEPTANCE-TEST\",
      \"experiment_id\": \"$EXPERIMENT_ID\",
      \"name\": \"$EXPERIMENT_ID\",
      \"control\": {
        \"instance_id\": \"$INSTANCE_ID\"
      },
      \"candidate\": {
        \"instance_id\": \"$INSTANCE_ID\"
      },
      \"metrics\": [
            {
                \"field\": \"result.value\",
                \"metric_type\": \"direct-comparison\",
                \"params\": {
                    \"operator\": \"lt\"
                },
                \"statistic\": \"mean\"
            },
            {
                \"field\": \"result.custom.activated_vehicles\",
                \"metric_type\": \"direct-comparison\",
                \"params\": {
                    \"operator\": \"gt\"
                },
                \"statistic\": \"mean\"
            }
        ],
      \"description\": \"An optional description\"
    }" | jq


You can get the information of an acceptance test or list the information for all acceptance tests in an app using the following endpoints:

GET https://api.cloud.nextmv.io/v1/applications/{application_id}/experiments/acceptance/{acceptance_id}

Get acceptance test information.

Get the information of an acceptance test.

GET https://api.cloud.nextmv.io/v1/applications/{application_id}/experiments/acceptance

List acceptance tests.

List all the existing acceptance tests.
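
For example, with curl (following the same authorization pattern as the create call above):

# Get a single acceptance test.
curl -sS -L -X GET \
    "https://api.cloud.nextmv.io/v1/applications/$APP_ID/experiments/acceptance/$ACCEPTANCE_TEST_ID" \
    -H "Authorization: Bearer $NEXTMV_API_KEY" | jq

# List all acceptance tests for the app.
curl -sS -L -X GET \
    "https://api.cloud.nextmv.io/v1/applications/$APP_ID/experiments/acceptance" \
    -H "Authorization: Bearer $NEXTMV_API_KEY" | jq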

Results

The acceptance test results view also includes a Statistical Results table that can be used as an aid when interpreting the significance of the results. It includes the difference between the mean value of the metric for the candidate and the baseline instance, the percentage change of that difference, and the associated p-value.

The p-value is calculated with the Wilcoxon signed rank test with continuity correction. This value gives an indication of whether the change in value is statistically significant, but does not account for the intended direction of the test. If there is no difference in the data, the p-value is not provided.

Note that if a run from either the candidate or the baseline instance failed, the paired observation for that input is excluded from the analysis.
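
For intuition, here is a small Python sketch (illustrative only, not the platform's implementation) that computes such a p-value with SciPy's Wilcoxon signed-rank test, dropping pairs where either run failed; the continuity correction applies when the normal approximation is used:

# Illustrative sketch of the statistical comparison: paired candidate vs.
# baseline metric values per input, Wilcoxon signed-rank test.
# Not Nextmv's implementation; the data below is made up.
from scipy.stats import wilcoxon

# One metric value per input in the input set; None marks a failed run.
baseline = [27.0, 30.5, 25.0, None, 28.2, 31.0, 24.5, 29.9]
candidate = [26.0, 29.2, 24.5, 31.0, 27.1, 30.8, 23.7, 29.0]

# Exclude paired observations where either run failed.
pairs = [(b, c) for b, c in zip(baseline, candidate) if b is not None and c is not None]
b_vals = [b for b, _ in pairs]
c_vals = [c for _, c in pairs]

# correction=True applies the continuity correction when the normal
# approximation is used to compute the p-value.
result = wilcoxon(b_vals, c_vals, correction=True)

mean_diff = sum(c - b for b, c in pairs) / len(pairs)
print(f"mean difference (candidate - baseline): {mean_diff:.3f}")
print(f"percentage change vs. baseline mean: {100 * mean_diff / (sum(b_vals) / len(b_vals)):.2f}%")
print(f"p-value: {result.pvalue:.4f}")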

Delete an acceptance test

Deleting an acceptance test will also delete all of the associated information. Note that deleting it will not delete the underlying batch experiment and its runs.

This action is permanent and cannot be undone.

To delete an acceptance test, you can use the following interfaces:

  • Console: use the web interface. Click the Delete button in the acceptance test details.

  • Nextmv CLI: use your terminal.

    nextmv experiment acceptance delete \
      --app-id "<YOUR-APP-ID>" \
      --experiment-id "<YOUR-ACCEPTANCE-TEST-ID>"
    

    The command will prompt you to confirm the deletion. Use the --confirm flag to skip the confirmation.

  • Python SDK: use Python.

    import os
    
    from nextmv.cloud import Application, Client
    
    client = Client(api_key=os.getenv("NEXTMV_API_KEY"))
    app = Application(client=client, id="<YOUR-APP-ID>")
    app.delete_acceptance_test(acceptance_test_id="<YOUR-ACCEPTANCE-TEST-ID>")
    

    You will not be prompted to confirm the deletion.

  • Cloud API: use this HTTP endpoint.

    DELETE https://api.cloud.nextmv.io/v1/applications/{application_id}/experiments/acceptance/{acceptance_id}

    Delete acceptance test.

    Delete an acceptance test.

    curl -L -X DELETE \
        "https://api.cloud.nextmv.io/v1/applications/$APP_ID/experiments/acceptance/$ACCEPTANCE_TEST_ID" \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $NEXTMV_API_KEY"
    

    You will not be prompted to confirm the deletion.
