Setup#

This is what you need to have in place first, in order to deploy CoCalc.

Introduction#

To ensure successful management of your service, it is important to have the following in place, along with some understanding of the tools involved:

  • A Kubernetes cluster and experience in managing it. This guide provides information on how to manage the service, respond to issues, plan resource requirements, and scale various services to meet your usage needs.

  • Some experience working with HELM charts.

  • For the two above, you should know at least a bit of YAML.

  • A (sub)domain that points to the service, as well as a TLS certificate (Let’s Encrypt is a possibility).

  • A standard PostgreSQL database.

  • For storage, a network filesystem like NFS – which supports ReadWriteMany – is necessary to hold data for all projects.

  • A standard E-Mail service for sending notifications and password resets (SMTP server).

  • Familiarity with CoCalc itself, as described in the CoCalc Documentation, may be necessary, since your users may have questions.

Compatibility#

There are notes and config file examples in this repository, next to the /cocalc directory. The version numbers below were last updated on 2022-12-01. Please check each directory for the latest information.

  • Kubernetes cluster: the primary cluster to test this setup runs on GKE version 1.24 (see GKE Setup). Other recent versions should work fine as well. Regarding specific variants, this setup is known to work on

    • Google GCP – Google Kubernetes Engine in their GCP cloud

    • Amazon AWS – Amazon Elastic Kubernetes Service in their AWS cloud

    • Minikube – a local Kubernetes cluster, for testing and experimentation. Please check /cocalc/minikube.yaml for a config example – YMMV!

    and besides that, it should also work on bare metal clusters, created by kubeadm and using MetalLB as a LoadBalancer. There is a broad spectrum of possible cluster types and configurations. Setting up a cluster is beyond the scope of this document, though. Please read the remainder of the requirements to see if your setup is compatible. From the point of view of CoCalc Cloud, there are no special requirements for the cluster on top of what a Kubernetes setup can provide.

  • HELM charts: version 3.10

  • Kubectl: version 1.21 or higher. Testing and dev is done using version 1.23. Set the version string in your values.yaml in global.kubectl.

  • NGINX Ingress Controller: the HELM chart installing version 1.5.1 is known to work.

  • Let’s Encrypt as a Certificate Manager (optional, if you don’t set key and cert manually): the HELM chart installing version 1.10.1 is known to work.

  • NFS (optional, you can bring your own ReadWriteMany provider): nfs-ganesha-server-and-external-provisioner version 3.0.0 is known to work – please check out details at GKE/NFS.

Prerequisites#

  • Kubernetes cluster with at least 2 nodes (see below for details)

  • Helm: 3.7 or later (recommended way)

Note

As an alternative, there is a kustomization.yaml example file for Kustomize. You can render the Helm charts via kustomize build --enable-helm . > cocalc.yaml. This is not actively maintained and might be broken.

  • Cluster setup: everything runs in a single namespace. Throughout this guide this will be cocalc. Regarding nodes, you can either start with a minimal setup for testing, or set up the recommended “partitioned” setup.

    • Minimal: 2 nodes with 2 CPU cores, 8GB of RAM, and 100GB disk space each. This is enough to run all services with a redundancy of 2 and a few projects.

    • Partitioned: two nodes hold CoCalc services – like above – while all other nodes get a Taint, such that only CoCalc projects are allowed to run there. This avoids interference beyond what containers provide. Also, this makes it easy to only scale the part of the cluster that runs the projects.

  • Networking: there are two service groups that define Ingress rules. Included is a standard setup of an NGINX Ingress Controller. It usually runs behind a LoadBalancer (provided by the cluster infrastructure).

    • CoCalc also requires its own domain or sub-domain, i.e. it’s currently not possible to run CoCalc with a “base path”. This is the DNS setting.

    • TLS is configured via standard Ingress TLS – it’s straightforward to set up cert-manager with Let’s Encrypt to manage this for you.

  • Local disk: the nodes running the CoCalc projects need to be able to load and run Docker images larger than 10 GB, because the software users want to run takes more space than usual. 50GB per node should be enough, and having at least 100GB free on the nodes running projects is recommended.
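
For the partitioned setup described above, the following is a minimal sketch of how the project nodes could be tainted and labeled. The node names, the taint key/value, and the label are all hypothetical and not taken from the HELM chart; the taint must match whatever toleration the project pods actually carry, so check the chart's values before applying anything. The function only prints the commands so they can be reviewed first.

```shell
# Hypothetical values: replace with your own node names, taint, and label.
PROJECT_NODES="node-3 node-4"
TAINT="cocalc-projects=true:NoSchedule"   # assumed key/value, not from the chart
LABEL="cocalc-role=project"               # assumed label

# Print the kubectl commands for review instead of running them (dry run).
taint_commands() {
  for node in $PROJECT_NODES; do
    echo "kubectl taint nodes $node $TAINT"
    echo "kubectl label nodes $node $LABEL"
  done
}

taint_commands
# To apply them after review: taint_commands | sh
```

Printing the commands first is a deliberate choice: a wrong taint can evict or block pods, so it is safer to inspect the list before piping it to a shell.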

Access to HELM Charts#

These are the steps to set up a local Git repository, which has its own public/private keys. The public key will be added by the repo admins, in order to give you access.

  1. Create a new public/private key (that way, there won’t be a conflict with already registered keys at GitHub): ssh-keygen -t ed25519 -f my-key

  2. Send your new my-key.pub public key to the contact person at Sagemath, Inc. This key will be added as a deployment key.

  3. Then, in a new empty directory (e.g. cocalc-cloud):

git init .
git config --local core.sshCommand "ssh -i ~/path/to/my-key"
git remote add github git@github.com:sagemathinc/cocalc-cloud-helm.git
git pull github main
git branch --set-upstream-to=github/main

(replace ~/path/to/my-key with the path to the private key generated in step 1.)

Once this is done, pull periodically to get updates and update the deployed HELM chart. In particular, the tags for the deployed docker images will change – see Versions for more details.

Namespace#

Create a namespace for CoCalc in your Kubernetes cluster. Throughout the guide this will be cocalc:

kubectl create namespace cocalc

→ expected output: namespace/cocalc created

Private docker registry#

The compiled Docker Images are stored in a private registry. You need to set up Docker Credentials in order to be able to pull these images.

One of the first steps is to set up a secret to access this registry.

  1. You’ll get a credentials file from Sagemath, Inc. – it will be called regcred.json below.

  2. Tell Kubernetes about this registry secret, in the namespace where you’ll deploy CoCalc, under the name regcred:

kubectl create secret docker-registry regcred \
   --docker-server=europe-docker.pkg.dev \
   --docker-username=_json_key \
   --docker-password="$(cat ./regcred.json)" \
   --docker-email=[email protected]

→ expected output: secret/regcred created

Note

If you name this secret differently than regcred, you need to adjust the imagePullSecrets in the values.yaml file.
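
Before creating the secret, it can help to sanity-check the credentials file. The sketch below assumes regcred.json is a Google service-account key, which the _json_key username suggests but the text above does not state; the check_regcred helper is hypothetical and only looks for a non-empty file containing a private_key field.

```shell
# Hypothetical helper: rough sanity check of the registry credentials file.
# Assumption: the file is a Google service-account key in JSON format.
check_regcred() {
  file="$1"
  # -s: file exists and is not empty
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  grep -q '"private_key"' "$file" || { echo "no private_key field in $file" >&2; return 1; }
  echo "ok: $file"
}

# Usage:
#   check_regcred ./regcred.json
# After creating the secret, confirm it exists in the intended namespace:
#   kubectl get secret regcred --namespace cocalc
```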


Networking#

  • All access is defined via standard Ingress specification files. They’re tested to work with the NGINX ingress controller, which is pretty standard. To make them work, one way is to set up this controller as shown in the /ingress-nginx directory.

  • The bulk of all traffic is via https, hence you need a certificate as well. Included are examples for Let’s Encrypt in the /letsencrypt directory, but you can also use your own certificate.

  • As a minimum, you need a domain/subdomain and a LoadBalancer for your cluster, which routes the traffic to the NGINX controller. It’s beyond the scope of this guide, but such a LoadBalancer is either provided by your public cloud provider, or you can use a MetalLB setup.

Beyond the HTTPS traffic, there is also (optionally) an SSH Gateway. It makes it possible to access projects via SSH. That traffic is pure TCP on port 22, which needs a special configuration for the NGINX controller.
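
As an illustration of that special configuration, the ingress-nginx HELM chart accepts a top-level tcp map that forwards raw TCP ports to a service. The sketch below writes such a values fragment. The service name ssh-gateway and its port are assumptions, as are the release name and repo alias in the commented command; compare with the files in the /ingress-nginx directory before using it.

```shell
# Write a values fragment for the ingress-nginx HELM chart that forwards TCP port 22.
# Assumption: the SSH gateway is exposed as service "ssh-gateway" on port 22 in the
# "cocalc" namespace. Verify the real name with: kubectl get svc --namespace cocalc
cat > ingress-nginx-tcp-values.yaml <<'EOF'
tcp:
  "22": "cocalc/ssh-gateway:22"
EOF

# Apply together with your other ingress-nginx values (release name and repo alias
# are assumptions):
#   helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
#     -f ingress-nginx-tcp-values.yaml
```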

If you’re not using the NGINX Ingress Controller, you might have to adjust some details. Please check both ingress.yaml HELM Chart templates in hub and static for up-to-date details. The relevant settings are in the annotations, prefixed with nginx.ingress.kubernetes.io – see NGINX Annotations for more details. Basically:

  • Session Affinity: reconnecting the websockets is more stable and faster, if they’re sticky with specific hubs.

  • Body Size: this is relevant for uploading files. The uploader uses chunking, so, it’s just important to allow more than the size of a chunk.

  • /metrics endpoint: you don’t want to expose that endpoint to the public. That’s why this snippet is added:

    nginx.ingress.kubernetes.io/server-snippet: |
      location = "/metrics" {
        deny all;
        return 404;
      }
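
The session affinity and body size settings listed above correspond to ingress-nginx annotations. The following sketch writes an example fragment with cookie-based affinity and an upload limit. The annotation names are those of the NGINX Ingress Controller, but the values (in particular the 64m limit) are assumptions; the ingress.yaml templates in hub and static remain the authoritative source.

```shell
# Write an example annotation fragment for an Ingress managed by ingress-nginx.
# The values are illustrative assumptions, not copied from the CoCalc templates.
cat > ingress-annotations-example.yaml <<'EOF'
metadata:
  annotations:
    # Keep a client on the same hub, so websocket reconnects stay sticky.
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/affinity-mode: "persistent"
    # Must be larger than one upload chunk (assumed value).
    nginx.ingress.kubernetes.io/proxy-body-size: "64m"
EOF
```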
    

Database#

You need PostgreSQL version 11 or higher – suggested is version 14. Setting it up is out of scope for this guide, but there are many ways to install it.

Version 14 should have the fewest surprises. Development and testing are done using version 14, and thinking long-term, this is the best version to start with. Ideally, you also have something in place to make this a robust database service.

If you already have a PostgreSQL instance, you could also just create a new “database” and “user”. CoCalc’s services only need to know the IP (or host name), the username, and of course the password as a Secret. This secret is called postgresql-password (if you name it differently, you have to configure it in global.database.secretName in your my-values.yaml).

  • Directory /database contains some notes about running the Postgres HELM chart from bitnami. These notes could be outdated, so please refer to the upstream documentation!

  • One setting that may become important is max_connections: PostgreSQL’s default is 100, and you may need to increase it.

Note

CoCalc needs pretty extensive permissions to the database (aka “superuser”). One reason is that upon startup, it checks if the database exists, creates tables and schema, later checks up on all tables, indices, etc.
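
For an existing PostgreSQL instance, creating the “database” and “user” could look like the sketch below. The database and role names are hypothetical, and the key name inside the Kubernetes secret in the commented command is an assumption that should be checked against the chart's templates. The password is passed as a psql variable, so it never ends up in the SQL file.

```shell
# Hypothetical names for the database and its owner role.
DB_NAME="cocalc"
DB_USER="cocalc"

# Generate a psql script; :'pw' is a psql variable, supplied at run time via -v.
cat > create-cocalc-db.sql <<EOF
-- Role with the broad ("superuser") permissions described above.
CREATE ROLE ${DB_USER} WITH LOGIN SUPERUSER PASSWORD :'pw';
CREATE DATABASE ${DB_NAME} OWNER ${DB_USER};
EOF

# Run it as an existing admin user, then store the same password as a Secret:
#   psql -h <db-host> -U postgres -v pw="$DB_PASSWORD" -f create-cocalc-db.sql
#   kubectl create secret generic postgresql-password --namespace cocalc \
#     --from-literal=postgresql-password="$DB_PASSWORD"   # key name is an assumption
```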

Storage#

Storage is probably the trickiest part. CoCalc requires a filesystem supporting ReadWriteMany (short: "RWX") with VolumeMode: "Filesystem" (not block storage!). This is used to share data across the pods in the cluster. This could be an NFS server, but depending on what you already have or which public cloud provider you use, there are various options.

CoCalc’s Kubernetes setup just needs to know the names of two PersistentVolumeClaims with specific purposes: “project data” and “software”. Their names must be different from each other!
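
As a sketch of what such claims could look like, the following writes a manifest with two RWX claims in Filesystem mode. The claim names projects-data and projects-software and the storage class nfs follow this guide; the requested sizes are assumptions, and your storage provider may need a different class name.

```shell
# Write a manifest with the two claims; sizes are illustrative assumptions.
cat > cocalc-pvcs.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: projects-data
  namespace: cocalc
spec:
  accessModes: ["ReadWriteMany"]
  volumeMode: Filesystem
  storageClassName: nfs
  resources:
    requests:
      storage: 100Gi   # assumed size
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: projects-software
  namespace: cocalc
spec:
  accessModes: ["ReadWriteMany"]
  volumeMode: Filesystem
  storageClassName: nfs
  resources:
    requests:
      storage: 20Gi    # assumed size
EOF

# Review the file, then apply it and check that both claims become Bound:
#   kubectl apply -f cocalc-pvcs.yaml
#   kubectl get pvc --namespace cocalc
```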

Projects Data#

The projects-data PVC stores all files of CoCalc projects. There are two main directories:

  • projects/[uuid]: each project’s data is mounted as a subdirectory with the project’s UUID.

  • shares/[uuid]: data shared from a project is copied over to that directory.

  • If necessary, global.shared_files_path configures the path schema for shared projects (but this isn’t tested). The idea is to maybe use a different storage backend for these particular files.

Projects Software#

The projects-software PVC is shared across all project pods. The idea is to share data and software globally. It mounts global files with read-only access in /ext. (Use the $EXT environment variable from within a project to resolve that directory).

Via a special License, some projects can get read/write access to this global software directory. Such licenses have the ext_rw quota enabled.

Then hand the License ID to the user, who has to add it to their project, or just add the license directly.

Next time the project starts, the underlying Pod’s readOnly flag of that particular volume will be false, and everyone who has access to this project will be able to modify the files in /ext.

See Global Software about how to make full use of this.

Install NFS Server#

In order to set up a flexible NFS server and storage provisioner, you can use the HELM chart nfs-ganesha-server-and-external-provisioner. There are notes about installing it in GKE/NFS Server.

Under the hood#

Under the hood, Kubernetes sees the following for a project – and equivalent configs appear in hub-next, or manage-copy/-share, …:

volumes:
- name: home
  persistentVolumeClaim:
    claimName: projects-data
[...]

and mounts it via:

volumeMounts:
- mountPath: /home/user
  name: home
  subPath: projects/09c119f5-....
[...]

Where projects-data is bound to a volume and has access mode RWX and storage class nfs.