Guidance on how to identify and fix some common errors.
There are a few reasons why your application may be returning default predictions. Your first debugging clue is the “default_explanation” field in the response body.
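As a starting point, the explanation can be pulled out of the response programmatically. The following is a minimal sketch that assumes the response body is a JSON object and that “default_explanation” is simply absent when no default prediction was returned; the helper name `explain_default` is hypothetical and not part of Clipper.

```python
import json


def explain_default(response_body):
    """Return the default explanation from a JSON response body, or None.

    Assumption: the field is absent when the prediction is not a default.
    """
    body = json.loads(response_body)
    # "default_explanation" is the field named in the text above.
    return body.get("default_explanation")
```

Printing or logging the returned string on every default response makes it easier to tell the cases described below apart.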
If the explanation says that no model containers were found, Clipper could not find any model containers to route your request to. Here are some steps to debug this problem.
1. Is your model linked to your application?
Run the following command to see whether your application has a model linked to it:
If you forgot to link your model to your application, you can do so with:
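The check and the fix can be combined in one helper. This is a sketch under stated assumptions: `clipper_conn` is taken to be a connected `ClipperConnection` from `clipper_admin`, `get_linked_models` is assumed to return a list of model names, and `link_model_to_app` is assumed to take `app_name` and `model_name` keyword arguments. The helper name `ensure_model_linked` is hypothetical.

```python
def ensure_model_linked(clipper_conn, app_name, model_name):
    """Link model_name to app_name unless it is already linked.

    Returns True if a new link was created, False if it already existed.
    Assumes clipper_conn behaves like clipper_admin's ClipperConnection.
    """
    linked = clipper_conn.get_linked_models(app_name=app_name)
    if model_name in linked:
        return False
    clipper_conn.link_model_to_app(app_name=app_name, model_name=model_name)
    return True
```

Calling it with your own application and model names is safe to repeat, because it only links when the model is missing from the list.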
2. Is your model container running?
As with any code, your model container may contain a bug that causes it to crash or prevents it from initializing correctly. For example, if you are using the Python model deployer and your deployed Python model references a library that the container cannot resolve, the container will crash with an uncaught ImportError when it tries to load your model. You can check the number of live Docker containers for your model with this command:
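A minimal sketch of that check, assuming `clipper_conn` is a connected `ClipperConnection` from `clipper_admin` and that its `get_num_replicas` method takes `name` and `version` keyword arguments and returns an integer. The wrapper name `count_live_replicas` is hypothetical.

```python
def count_live_replicas(clipper_conn, model_name, model_version):
    """Return the number of live containers for one model version.

    Assumes clipper_conn behaves like clipper_admin's ClipperConnection.
    """
    return clipper_conn.get_num_replicas(name=model_name, version=model_version)
```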
If this returns 0, your model container crashed for some reason. To identify why the container crashed, inspect the container logs for an error message about why the container exited.
You can fetch the logs for all Docker containers associated with Clipper
(including containers that have crashed) with
clipper_conn.get_clipper_logs(). You will need
to hunt around in the logs a little bit to identify the right log file, but each log file name includes
the Docker container ID of the container it was collected from. If you are running Clipper using the
DockerContainerManager, you can use
docker ps -a --filter label=ai.clipper.container.label to list
all of the Docker containers associated with Clipper. This may help you identify the correct log file more easily.
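Once you know the container ID, picking the matching log file can be automated. The sketch below assumes you already have a list of log file paths (for example, the value returned by `clipper_conn.get_clipper_logs()`, assuming it returns such a list) and relies only on the fact stated above that each log file name includes the container ID. The helper name `logs_for_container` is hypothetical, and matching on the first 12 characters is an assumption based on Docker's usual short-ID length.

```python
import os


def logs_for_container(log_paths, container_id):
    """Return the log file paths whose file name mentions the container ID."""
    # Docker commonly displays a 12-character short ID, so match on that
    # prefix; this works whether the file name holds the short or full ID.
    short_id = container_id[:12]
    return [p for p in log_paths if short_id in os.path.basename(p)]
```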
3. Has your model container finished initializing?
Model containers can take a long time to initialize. For large models, it can take tens of seconds
to load and deserialize the model state, and some machine-learning frameworks have non-trivial initialization
overheads. If you’ve determined that your model container is still running (from step 2), inspect the logs again
to see if it has finished initializing. When the container has fully initialized, it will log “Serving predictions for
If the “default_explanation” field indicates that the latency deadline was missed, then at some point a model container for your model successfully connected to Clipper, but Clipper did not receive a prediction from the container in time for the current request. Here are some steps to debug this problem.
1. Is your application SLO too low?
When you register an application, you set the latency SLO – the amount of time that Clipper will wait for a prediction from the model container before returning the default response. If you set this value too low, Clipper will return a response before your model is done rendering a prediction. Each model container logs how long each prediction took, and the Clipper metrics track the latency distribution of each model container. Run the following command to inspect the Clipper metrics:
This command will return a JSON object. The “histograms” field includes latency histograms for every application and every model registered in Clipper. Find the relevant histograms for your model and application. If the mean prediction latency for your model is higher than the latency SLO you set, your model is too slow. You can fix this by creating a new application with a higher SLO and linking your model to that application.
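The comparison against the SLO can be scripted once the metrics object is in hand (for example, from `clipper_conn.inspect_instance()`, assuming that is how you fetched it). The layout used here is an assumption, not a documented format: “histograms” is taken to be a list of single-entry objects mapping a histogram name to its statistics, each with a “mean” value in the same unit as the SLO (microseconds). The function name and the `name_filter` parameter are hypothetical.

```python
def histograms_over_slo(metrics, slo_micros, name_filter=""):
    """Return {histogram name: mean} for histograms whose mean exceeds the SLO.

    Assumed layout: metrics["histograms"] is a list of {name: {"mean": ...}}.
    Only histograms whose name contains name_filter are considered.
    """
    slow = {}
    for entry in metrics.get("histograms", []):
        for name, stats in entry.items():
            if name_filter not in name or "mean" not in stats:
                continue
            mean = float(stats["mean"])  # tolerate numbers stored as strings
            if mean > slo_micros:
                slow[name] = mean
    return slow
```

Pass your model name as `name_filter` so that unrelated histograms are not compared against the latency SLO.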
2. Did your model container crash?
It’s possible that your model container initialized without problems and connected to Clipper, but then crashed during actual prediction processing. To determine whether your model container has crashed, repeat step 2 from the previous section to get the number of replicas for a model and inspect the container logs.
If you determine that your model container has crashed, the container log should have a stack trace that will help you identify the problem. One common reason that model containers crash, especially when deploying using one of the provided model deployers, is that the prediction function has the wrong interface. Remember, the function must accept a list of inputs of the specified input type, and it must return a list of strings. A common mistake is to deploy a prediction function that operates on a single input at a time, rather than processing a list of inputs as a batch.
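To illustrate the interface described above, here is a minimal sketch of a batch prediction function. `predict_one` is a hypothetical stand-in for a real model that handles a single input; the point is the shape of `predict_batch`, which takes a list of inputs and returns a list of strings of the same length.

```python
def predict_one(x):
    """Hypothetical single-input model: sums the features of one input."""
    return sum(x)


def predict_batch(inputs):
    """Batch interface: a list of inputs in, a list of strings out."""
    # Apply the model to every input and convert each result to a string.
    return [str(predict_one(x)) for x in inputs]
```

Deploying `predict_one` directly would be the mistake described above, because it would receive the whole batch as if it were a single input.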
The commands to inspect the number of model containers and fetch container logs are just convenience
wrappers around Docker or Kubernetes commands. If you are comfortable with the
command line tools, you can just inspect the containers directly.