Metis Machine's Skafos

Machine Learning Delivered. A machine learning automation platform for those who want to focus on their work.

Welcome to the Metis Machine documentation hub. You'll find comprehensive guides and documentation to help you start working with Metis Machine's Skafos platform as quickly as possible, as well as support if you get stuck. Fire it up!

Get Started    

The Project is the central construct of Skafos.
Regardless of the complexity of your business use case, there are probably multiple steps in your end-to-end machine learning pipeline:

  • External data needs to be ingested, processed, and stored
  • ML features need to be engineered and a model needs to be trained
  • New incoming data needs to be scored by one or more live models
  • Model output needs to be exposed via REST API
  • Predictions need to be monitored for model drift

Moreover, each of these steps serves a different function within your ML pipeline, making each one a prime candidate to be treated as an independent microservice rather than part of a single monolithic Python file. Skafos provides tooling that makes it easy to discretize each step as a single job, and then orchestrate those jobs as a pipeline. Each step, or chunk of code, is called a Job, and Jobs can work either independently or together to form a Project.
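For instance, a Job can be as small as a single Python script. The sketch below is purely illustrative (the file name, function, and sample data are assumptions, not Skafos requirements); Skafos simply runs whatever entrypoint you configure for the Job:

```python
# main.py -- a minimal, self-contained "ingest" Job sketch.
# Everything here is illustrative; Skafos just runs the script you
# configure as the Job's entrypoint.

def ingest(records):
    """Drop empty records and normalize field names to lowercase."""
    return [{k.lower(): v for k, v in r.items()} for r in records if r]

if __name__ == "__main__":
    raw = [{"ID": 1, "Value": 10}, {}, {"ID": 2, "Value": 20}]
    cleaned = ingest(raw)
    print(f"kept {len(cleaned)} of {len(raw)} records")  # prints: kept 2 of 3 records
```

Because each Job is just code with an entrypoint, splitting a monolithic script into several small Jobs like this is what lets Skafos schedule, chain, and scale them independently.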

What's in a Project

A Project is a user-managed repository containing the following items:

  • Code for each Job
  • Run-time Configuration
  • Project Dependencies & Requirements

You can create a brand new Project to deploy and monitor your ML pipeline directly from the command line interface (CLI).

$ skafos init my_new_project

The Skafos CLI section contains detailed installation and usage information.
Once you’ve created a project, you’ll need to create jobs, define any dependencies, configure the project, and deploy the project into your operational systems.


We also provide a series of starter Templates to help get you moving quickly!


Because a project is a collection of Jobs, Skafos enables you to configure the way each job runs. When designing an end-to-end ML pipeline, it may be useful to:

  • Schedule jobs to run at specific times (hourly, daily, weekly, every 2 mins, etc).
  • Chain jobs to run one after the other.
  • Parallelize jobs by running multiple instances at once.
  • Scale up a job’s computational limits with more CPUs and Memory resources.
  • Define a job’s unique entrypoint.
  • Utilize an AddOn such as a Skafos Queue or Spark Cluster to increase speed and performance.

Skafos provides a simple means to manage your project's configuration options. Each project comes with its own metis.config.yml file that outlines the user-defined runtime behavior of deployed Jobs.

The Config File

The config file is the central orchestration component of each project. It lives at the top level of the project code repository and outlines the user-defined runtime behavior of deployed Jobs. When your project is first initialized, a metis.config.yml file is generated for you.

Basic Example

project_token: <project_token>
name: my_new_project
jobs:
  - job_id: <job_id>
    language: python
    name: Main
    entrypoint: ""

Complex Example
If your project contains multiple Jobs, each may require specific run-time settings. Below is an example that contains several jobs that work together:

project_token: <project_token>
name: my_new_project
jobs:
  - job_id: <job_id_1>
    language: python
    name: ingest
    entrypoint: ""
    schedule: "0 11 * * *"
  - job_id: <job_id_2>
    language: python
    name: train
    entrypoint: ""
    dependencies: ["<job_id_1>"]
    cpu: 6
    memory: 6Gi
  - job_id: <job_id_3>
    language: python
    name: score
    entrypoint: ""
    dependencies: ["<job_id_2>"]
    cpu: 1
    memory: 4Gi
  - job_id: <job_id_4>
    language: python
    name: report
    entrypoint: ""
    dependencies: ["<job_id_3>"]

In the complex example, we show four jobs that have been scheduled and chained together (via dependencies) and that require different resource allocations. Most ML pipelines will require configuration files of this type. Each job can have requirements (e.g. scheduling, resources) that differ from the other jobs within the project, forming a powerful constellation of microservices that operationalizes your ML pipeline.

CPU / GPU workloads

Not every model can take advantage of a GPU; for those jobs, we recommend not adding the GPU resource key to your config file (or selecting it in the dashboard). However, if your model can take advantage of CUDA-based GPUs, we highly recommend enabling that key. For GPU workloads only, the CPU and memory limits can be omitted, and you will get 3.5 vCPUs and 54 GiB of memory. If you don't use a GPU and don't supply CPU/memory limits, you will get only 1 vCPU and 2 GiB of memory.
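As a sketch, a GPU-enabled job entry might look like the following. The `gpu` key name and value here are assumptions for illustration; check the dashboard or CLI docs for the exact key your Skafos version expects:

```
  - job_id: <job_id>
    language: python
    name: train
    entrypoint: ""
    gpu: 1        # assumed key name for requesting a CUDA GPU
    # cpu/memory omitted: GPU workloads default to 3.5 vCPUs and 54 GiB
```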

Dependencies & Requirements

In addition to run-time configurations, a project contains a list of requirements that are needed to deploy your pipeline. Skafos abstracts away ugly dependency & environment management so that you can focus on your models. List out each dependency in the requirements.txt file included in your project repository:
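For example, a requirements.txt for a pipeline like the one above might contain the following (package names and version pins are illustrative, not required by Skafos):

```
pandas==0.24.2
scikit-learn==0.20.3
requests==2.21.0
```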


If your project requires a more sophisticated environment, Skafos also supports an environment.yml file included in your project repo. This relies on Conda to manage both pythonic packages and specific system-level dependencies. See the Conda documentation for details on creating one.
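A minimal environment.yml follows the standard Conda format; the package list below is illustrative:

```
name: my_new_project
dependencies:
  - python=3.6
  - pandas
  - scikit-learn
  - pip
  - pip:
      - <pypi-only-package>
```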


After you’ve created a new project, written several Jobs, and outlined configurations & dependencies, you are ready to deploy. Skafos handles all of the backend orchestration for you so that deployment is 100% serverless. Read more about this in the Deployments section! Once your project is deployed, head over to the Dashboard section to learn how you can stay on top of issues/failures as they arise.

