The Project is the most central construct of Skafos.
Regardless of the complexity of your business use case, there are probably multiple steps in your end-to-end machine learning pipeline:
- External data needs to be ingested, processed, and stored
- ML features need to be engineered and a model needs to be trained
- New incoming data needs to be scored by one or more live models
- Model output needs to be exposed via REST API
- Predictions need to be monitored for model drift
Moreover, each of these steps serves a different function within your ML pipeline, making each step a prime candidate to be treated as an independent microservice rather than part of a single monolithic Python file. Skafos provides tooling that makes it easy to discretize each step as a single job, and then orchestrate those jobs as a pipeline. Each step, or chunk of code, is called a Job, and Jobs can work either independently or together to form a Project.
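To make this concrete, a single Job is typically just a self-contained script with its own entrypoint. As a minimal sketch (the function and data below are illustrative only, not part of the Skafos SDK), a feature-engineering step might look like:

```python
# Hypothetical standalone job: a feature-engineering step.
# In a real project, this file would be listed as a Job entrypoint
# in metis.config.yml (e.g. entrypoint: "feature-engineer.py").

def engineer_features(records):
    """Turn raw transaction records into model-ready feature rows."""
    return [
        {"amount": r["amount"], "is_large": r["amount"] > 100}
        for r in records
    ]

if __name__ == "__main__":
    # Stand-in for data that an upstream ingest job would have stored.
    raw = [{"amount": 50}, {"amount": 250}]
    print(engineer_features(raw))
```

Because each step lives in its own script like this, each can be scheduled, scaled, and monitored independently.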
A Project is a user-managed repository containing the following items:
- Code for each Job
- Run-time Configuration
- Project Dependencies & Requirements
You can create a brand new Project to deploy and monitor your ML pipeline directly from the command line interface (CLI).
```shell
$ skafos init my_new_project
```
The Skafos CLI section contains detailed installation and usage information.
Once you’ve created a project, you’ll need to create jobs, define any dependencies, configure the project, and deploy the project into your operational systems.
We also provide a series of starter Templates to help get you moving quickly!
Because a project is a collection of Jobs, Skafos enables you to configure the way each job runs. When designing an end-to-end ML pipeline, sometimes it may be useful to:
- Schedule jobs to run at specific times (hourly, daily, weekly, every 2 mins, etc).
- Chain jobs to run one after the other.
- Parallelize jobs by running multiple instances at once.
- Scale up a job’s computational limits with more CPUs and Memory resources.
- Define a job’s unique entrypoint.
- Utilize an AddOn such as a Skafos Queue or Spark Cluster to increase speed and performance.
Skafos provides a simple means to manage your project's configuration options. Each project comes with its own metis.config.yml file that outlines the user-defined runtime behavior of deployed Jobs.
The config file is the central orchestration component of each project, living at the top level of the project code repository. A metis.config.yml file is generated automatically when your project is first initialized.
```yaml
project_token: <project_token>
name: my_new_project
jobs:
  - job_id: <job_id>
    language: python
    name: Main
    entrypoint: "main.py"
```
If your project contains multiple Jobs, each may require specific run-time settings. Below is an example that contains several jobs that work together:
```yaml
project_token: <project_token>
name: my_new_project
jobs:
  - job_id: <job_id_1>
    language: python
    name: ingest
    entrypoint: "data-ingest.py"
    schedule: "0 11 * * *"
  - job_id: <job_id_2>
    language: python
    name: train
    entrypoint: "model-train.py"
    dependencies: "<job_id_1>"
    resources:
      limits:
        cpu: 6              # CPU workloads
        memory: 6Gi         # CPU workloads
        nvidia.com/gpu: 1   # GPU workloads
  - job_id: <job_id_3>
    language: python
    name: score
    entrypoint: "score.py"
    dependencies: "<job_id_2>"
    resources:
      limits:
        cpu: 1              # CPU workloads
        memory: 4Gi         # CPU workloads
        nvidia.com/gpu: 1   # GPU workloads
  - job_id: <job_id_4>
    language: python
    name: report
    entrypoint: "report.py"
    dependencies: "<job_id_3>"
```
In this more complex example, four jobs are scheduled and chained together (via dependencies), each requiring different resource allocations. The schedule field uses standard cron syntax; "0 11 * * *" runs the ingest job daily at 11:00. Most ML pipelines will require configuration files of this type. Each job can have requirements (e.g. scheduling, resources) that differ from the other jobs within the project, forming a powerful constellation of microservices that operationalize your ML pipeline.
CPU / GPU workloads
Not every model can take advantage of a GPU. For those jobs, we recommend omitting the GPU resource key from your config file (if you have one) and not selecting it in the dashboard. However, if your model can take advantage of CUDA-based GPUs, we highly recommend that you enable that key. For GPU workloads only, the CPU and memory limits can be omitted, in which case you will get 3.5 vCPUs and 54 GiB of memory. If you don't use a GPU and do not supply CPU/memory limits, you will get only 1 vCPU and 2 GiB of memory.
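Given the defaults described above, a GPU job can declare just the GPU key and rely on the documented GPU allocation. This is a hypothetical fragment, not copied from a real project:

```yaml
# Hypothetical resources block for a GPU-only job.
# CPU and memory limits are intentionally omitted, so the
# GPU defaults described above (3.5 vCPUs, 54 GiB) apply.
resources:
  limits:
    nvidia.com/gpu: 1   # GPU workloads
```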
In addition to run-time configurations, a project contains a list of requirements that are needed to deploy your pipeline. Skafos abstracts away ugly dependency & environment management so that you can focus on your models. List out each dependency in the
requirements.txt file included in your project repository:
```
skafossdk==1.1.2
pandas==0.23.4
scikit-learn
numpy
```
If your project requires a more sophisticated environment, Skafos also supports an
environment.yml file included in your project repo. This relies on Conda to manage both pythonic packages and specific system-level dependencies. Go here to see how to create one.
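As a sketch of what such a file might contain (the environment name, channels, and package versions below are illustrative, not prescribed by Skafos):

```yaml
# Illustrative Conda environment.yml; adjust names and versions
# to match your own project's needs.
name: my_new_project
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - pandas=0.23.4
  - scikit-learn
  - pip
  - pip:
      - skafossdk==1.1.2
```

Conda resolves both the Python packages and any system-level dependencies listed here, while packages under the pip: key are installed with pip inside the created environment.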
After you’ve created a new project, written several Jobs, and outlined configurations & dependencies, you are ready to deploy. Skafos handles all of the backend orchestration for you so that deployment is 100% serverless. Read more about this in the Deployments section! Once your project is deployed, head over to the Dashboard section to learn how you can stay on top of issues/failures as they arise.