Installing Kubeflow (MLOps Platform) on Robin Cloud Native Platform

INTRODUCTION

Kubeflow is an open-source Machine Learning platform native to Kubernetes. The Kubeflow project has multiple distinct software components that each address specific stages of the machine learning lifecycle, including model development, model training, model serving, and automated machine learning.

Kubeflow is a platform for data scientists who want to build and experiment with ML pipelines. It is also for ML engineers and operations teams who want to deploy ML systems to various environments for development, testing, and production-level serving. Installing Kubeflow requires an enterprise-grade Kubernetes platform capable of serving highly scalable, heterogeneous ML ecosystem applications.

Some of the key platform considerations for deploying Kubeflow are:

Cloud Native Persistent Storage Layer

  1. Kubeflow requires an enterprise-grade, cloud-native persistent storage layer that is scalable, reliable, resilient, performant, and secure.
  2. Installing Kubeflow creates various Deployments, StatefulSets, and PVCs that require an enterprise-grade storage class (CSI) supporting snapshots, clones, backups, and replication of data and applications.
  3. The Kubeflow platform requires shared (ReadWriteMany) PVCs that can be mounted on the multiple containers that make up an ML pipeline.

Advanced Networking & Compute Support

Kubeflow deploys various custom resources and services, installs and configures the Istio service mesh, and configures load balancer services. The Kubernetes platform should support load balancers such as MetalLB, and CNI plugins such as Calico and OVS for network communication between pods. The platform must be able to discover GPU resources and allocate them to Jupyter Notebooks and ML applications. The platform should also have built-in observability to monitor system performance and utilization.

Robin Cloud Native Platform provides native integration between the Kubernetes, storage, network, and application management layers, which enables full automation to manage both clusters and applications with all the advantages of a true hybrid cloud experience. Robin.io has a built-in capability to create managed application snapshots that enable cloning, backup, and migration of applications between on-prem and cloud or between data centers within an enterprise. Robin.io fully automates the end-to-end cluster provisioning process for the most challenging platform deployments, including Cloudera, Apache Spark, Kafka, TensorFlow, PyTorch, Kubeflow, MLflow, scikit-learn, Caffe, Torch, and even custom application configurations.

Kubeflow components on Robin CNP (Kubernetes Platform)

Steps for Installing Kubeflow on Robin Cloud Native Platform

  1. Install Robin Cloud Native Platform (Symworld Cloud Native Platform): https://docs.robin.io/platform/5.4.1/install.html#
  2. Set up the MetalLB load balancer to use a specific IP pool range for load balancer services. MetalLB can be installed during the Robin.io installation or added afterwards as a post-install step.
  3. Prepare the PVC YAML to reflect the right storage class and configure storage management options such as replication, encryption, and media type.
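
As a rough illustration of step 2, the sketch below builds a MetalLB address pool definition in Python and prints it as JSON, which `kubectl apply -f -` accepts. It assumes MetalLB 0.13 or later, where pools are declared through the `IPAddressPool` and `L2Advertisement` custom resources in the `metallb-system` namespace (older releases used a ConfigMap instead). The pool name `kubeflow-pool` and the address range are hypothetical placeholders; substitute a free range from your own network.

```python
import ipaddress
import json


def metallb_pool_manifests(name, first_ip, last_ip, namespace="metallb-system"):
    """Return IPAddressPool and L2Advertisement manifests for one IP range."""
    first = ipaddress.ip_address(first_ip)
    last = ipaddress.ip_address(last_ip)
    # Reject mixed IPv4/IPv6 ranges and ranges written back to front.
    if first.version != last.version or int(first) > int(last):
        raise ValueError(f"invalid range: {first_ip}-{last_ip}")
    pool = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "IPAddressPool",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"addresses": [f"{first}-{last}"]},
    }
    advertisement = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "L2Advertisement",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"ipAddressPools": [name]},
    }
    return [pool, advertisement]


def as_kubernetes_list(items):
    """Wrap manifests in a v1 List so they can be applied in one call."""
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Hypothetical pool name and range, for illustration only.
    manifests = metallb_pool_manifests("kubeflow-pool", "10.9.232.190", "10.9.232.199")
    print(json.dumps(as_kubernetes_list(manifests), indent=2))
```

If the script is saved as, say, `metallb_pool.py` (a hypothetical file name), its output can be piped into `kubectl apply -f -`. Validating the range up front catches a reversed or mixed-family range before it reaches the cluster.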

4. Installing Kubeflow

The official Kubeflow install documentation:

https://github.com/kubeflow/manifests/tree/master

Kubeflow release version: v1.6.0

https://github.com/kubeflow/manifests/tree/v1.6-branch#kubeflow-components-versions

(the latest release is https://github.com/kubeflow/manifests/tree/v1.6.1)

4.1. Download the Kubeflow release tar file, extract it, and cd into the manifests directory:

4.1.a. Download Kustomize and add it to the host PATH:
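
The download steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the exact commands used originally: it assumes the release is fetched as a GitHub source archive for the tag `v1.6.0`, and that Kustomize 3.2.0 is the version to install (the version named in the upstream manifests README for the 1.6 release). The helper names are hypothetical, and the binary URL follows the asset naming used by the Kustomize 3.x releases.

```python
import os
import stat
import tarfile
import urllib.request


def manifests_tarball_url(tag):
    """GitHub source archive URL for a kubeflow/manifests release tag."""
    return f"https://github.com/kubeflow/manifests/archive/refs/tags/{tag}.tar.gz"


def kustomize_url(version, platform="linux_amd64"):
    """Download URL for a standalone Kustomize binary (3.x asset naming)."""
    return (
        "https://github.com/kubernetes-sigs/kustomize/releases/download/"
        f"v{version}/kustomize_{version}_{platform}"
    )


def fetch_manifests(tag, dest="."):
    """Download and unpack the manifests archive; return the extracted directory."""
    archive, _ = urllib.request.urlretrieve(manifests_tarball_url(tag))
    with tarfile.open(archive, "r:gz") as tar:
        # The archive has a single top-level directory; read its name from the tar.
        top_level = tar.getnames()[0].split("/")[0]
        tar.extractall(dest)
    return os.path.join(dest, top_level)


def fetch_kustomize(version, bin_dir):
    """Download the Kustomize binary into bin_dir and mark it executable."""
    os.makedirs(bin_dir, exist_ok=True)
    target = os.path.join(bin_dir, "kustomize")
    urllib.request.urlretrieve(kustomize_url(version), target)
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target


# Example usage (requires network access):
#   manifests_dir = fetch_manifests("v1.6.0")
#   fetch_kustomize("3.2.0", os.path.expanduser("~/bin"))
#   # then make sure ~/bin is on the host PATH
```

Reading the top-level directory name from the archive avoids hard-coding how GitHub names the extracted folder.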

4.2. Install Kubeflow with the single-command installation:

cd into the manifests directory and run the install command.
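
For reference, the upstream manifests README documents the single-command install as a shell loop that retries `kustomize build example | kubectl apply -f -` until it succeeds, because some resources depend on CRDs that take a moment to register. The sketch below mirrors that retry logic in Python; the function names, the attempt limit, and the injectable `sleep` parameter are illustrative choices rather than part of any Kubeflow tooling.

```python
import subprocess
import time


def kustomize_build_and_apply(manifests_dir="."):
    """Run `kustomize build example | kubectl apply -f -`; return True on success."""
    build = subprocess.run(
        ["kustomize", "build", "example"],
        cwd=manifests_dir,
        capture_output=True,
    )
    if build.returncode != 0:
        return False
    # Feed the rendered manifests to kubectl on stdin.
    apply = subprocess.run(["kubectl", "apply", "-f", "-"], input=build.stdout)
    return apply.returncode == 0


def apply_until_success(attempt, max_attempts=30, delay_seconds=10, sleep=time.sleep):
    """Call attempt() until it returns True; return the number of tries used."""
    for tries in range(1, max_attempts + 1):
        if attempt():
            return tries
        if tries < max_attempts:
            print("Retrying to apply resources")
            sleep(delay_seconds)
    raise RuntimeError(f"resources still failing after {max_attempts} attempts")


# Example usage, run from the extracted manifests directory:
#   apply_until_success(kustomize_build_and_apply)
```

Capping the number of attempts is a deliberate difference from an unbounded loop: a persistent error, such as a missing storage class, surfaces as a failure instead of retrying forever.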

Check if all the pods are running:
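
One way to script this check is to parse the JSON that `kubectl get pods -n <namespace> -o json` returns and list any pod that is not yet healthy. The sketch below treats a pod as healthy when it has succeeded, or is running with every container ready. The namespace list is the one the upstream manifests README asks you to check and may differ for a customized install; the function names are hypothetical.

```python
import json
import subprocess

# Namespaces created by the example install, per the upstream README (assumption).
KUBEFLOW_NAMESPACES = [
    "cert-manager", "istio-system", "auth", "knative-eventing",
    "knative-serving", "kubeflow", "kubeflow-user-example-com",
]


def pods_not_ready(pod_list):
    """Return 'namespace/name' for every pod that is not healthy yet."""
    pending = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase")
        containers = status.get("containerStatuses", [])
        healthy = phase == "Succeeded" or (
            phase == "Running" and all(c.get("ready") for c in containers)
        )
        if not healthy:
            meta = pod["metadata"]
            pending.append(f'{meta["namespace"]}/{meta["name"]}')
    return pending


def fetch_pods(namespace):
    """Return the parsed output of `kubectl get pods -n NAMESPACE -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Example usage against a live cluster:
#   for ns in KUBEFLOW_NAMESPACES:
#       print(ns, pods_not_ready(fetch_pods(ns)))
```

An empty list for every namespace means all pods are up; some pods can take several minutes to become ready after the install.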

4.3. Deploy the MetalLB load balancer to provide an external IP for Kubeflow (refer to step 2 if not already done).

Ensure that an external load balancer IP is allocated to the Istio ingress gateway service:
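
A hedged sketch of that check: the function below reads the JSON for a Service and returns whatever external addresses the load balancer has assigned. It assumes the gateway is the `istio-ingressgateway` Service in the `istio-system` namespace, as in the upstream manifests. If that Service is not of type `LoadBalancer` in your install, it would need to be switched first, for example with `kubectl patch svc istio-ingressgateway -n istio-system -p '{"spec":{"type":"LoadBalancer"}}'`.

```python
import json
import subprocess


def external_ips(service):
    """Return the external addresses assigned to a LoadBalancer Service, if any."""
    if service.get("spec", {}).get("type") != "LoadBalancer":
        return []
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    # A load balancer may report an IP or a hostname, depending on the provider.
    return [entry.get("ip") or entry.get("hostname") for entry in ingress]


def fetch_service(name, namespace):
    """Return the parsed output of `kubectl get svc NAME -n NAMESPACE -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "svc", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Example usage against a live cluster:
#   print(external_ips(fetch_service("istio-ingressgateway", "istio-system")))
```

An empty result means no address has been allocated yet, which usually points to a missing or exhausted MetalLB address pool.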

4.4. Check if all Kubeflow features are working:

Open the Kubeflow UI in a browser using the load balancer service IP:

http://10.9.232.190

The default username and password for the Kubeflow application are user@example.com / 12341234

Create a Jupyter Notebook using a Robin.io PersistentVolumeClaim:

Select the new workspace volume using the Robin.io storage class.

Select the data volume with the access mode "ReadWriteMany" using the Robin.io immediate storage class.

You can also choose to create a custom volume with a Robin.io PVC spec and launch the notebook.

The notebook will now have access to both PVCs, shared and local, provisioned by the Robin.io CSI storage class.

Robin PVC spec example: advanced PVC parameters can be configured by adding the appropriate annotations to the spec file. For advanced storage options, see:

https://docs.robin.io/platform/latest/manage_storage.html#readwritemany-rwx-volumes
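
As a minimal sketch of such a spec, the Python below builds a ReadWriteMany PersistentVolumeClaim and prints it as JSON, which `kubectl apply -f -` accepts. The claim name, size, and the storage class name `robin-immediate` are assumptions for illustration; use the storage class names reported by `kubectl get storageclass` on your cluster. The namespace `kubeflow-user-example-com` is the default user profile created by the example install. No Robin annotation keys are filled in here; take those from the Robin documentation linked above.

```python
import json


def rwx_pvc(name, namespace, size, storage_class, annotations=None):
    """Build a ReadWriteMany PersistentVolumeClaim manifest as a dict."""
    manifest = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }
    if annotations:
        # Vendor-specific storage options go here as annotation key/value pairs.
        manifest["metadata"]["annotations"] = dict(annotations)
    return manifest


if __name__ == "__main__":
    # Hypothetical claim name, size, and storage class name.
    pvc = rwx_pvc(
        name="shared-data",
        namespace="kubeflow-user-example-com",
        size="10Gi",
        storage_class="robin-immediate",
    )
    print(json.dumps(pvc, indent=2))
```

Once the claim is created in the user namespace, it can be selected as an existing data volume when launching a notebook.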

You can create and run Kubeflow pipelines from a Jupyter Notebook using Kale.

  1. Create and run Katib hyperparameter tuning experiments from a Jupyter Notebook using Kale
  2. Create model servers using KServe

For more information about Robin Cloud Native Platform, please visit: https://robin.io

