Setup#

This is what you need to have in place first, in order to deploy CoCalc.

Introduction#

To manage your service successfully, it is important to have some understanding of the following tools and services:

  • A Kubernetes cluster and experience in managing it. This guide provides information on how to manage the service, respond to issues, plan resource requirements, and scale various services to meet your usage needs.

  • Some experience working with HELM charts.

  • For the two above, you should know at least a bit of YAML.

  • A (sub)domain that points to the service, as well as a TLS certificate (Let’s Encrypt is a possibility).

  • A standard PostgreSQL database.

  • For storage, a network filesystem like NFS – which supports ReadWriteMany – is necessary to hold data for all projects.

  • A standard E-Mail service for sending notifications and password resets (SMTP server).

  • Familiarity with CoCalc itself, as described in the CoCalc Documentation, may be necessary, since your users may have questions.

Compatibility#

The table below gives you a rough overview of the compatibility of this setup. These version numbers were last updated on 2023-11-13 for version 2.11.4.

In general, this is just a recent collection of known versions – other versions, especially older ones, should work as well. Beyond this list, adjacent to the /cocalc directory in this repository, there are notes and config file examples. Please check each directory for the latest information.

| Cluster | Details |
| --- | --- |
| Kubernetes | The primary cluster to test this setup runs on GKE version 1.26 (see GKE Setup). Other recent versions below that one should work fine as well. |
| GKE | This setup runs on Google Kubernetes Engine in Google’s GCP cloud: Google GCP/GKE |
| EKS | It also runs on Amazon’s Elastic Kubernetes Service in their AWS cloud: Amazon AWS/EKS |
| AKS | Microsoft’s Azure Kubernetes Service in their Azure cloud works as well: Microsoft Azure/AKS |
| Minikube | Local Kubernetes cluster, for testing and experimentation, currently not maintained. See example /cocalc/minikube.yaml. YMMV! |
| Bare metal | Bare metal clusters, created by kubeadm, using MetalLB as a LoadBalancer. There is a broad spectrum of possible cluster types and configurations. Setting up a cluster is beyond the scope of this document, though. Please read the remainder of the requirements to see if your setup is compatible. From the point of view of CoCalc Cloud, there are no complex requirements for the cluster. Mainly, it must support storing data, have network access, and support a LoadBalancer and Ingress rules. |

Dependencies#

| Tool/Service | Details |
| --- | --- |
| HELM charts | Version 3.13.1 |
| Kubectl | Version 1.21 or higher. Testing and development are done using version 1.27. Make sure to set the version string in global.kubectl, so that the kubectl running the configuration jobs roughly matches your version of Kubernetes. |
| NGINX Ingress Controller | The HELM chart 4.8.3 installing version 1.9.4 is known to work. |
| Let’s Encrypt | As a Certificate Manager (optional, if you don’t set key and cert manually): The HELM chart installing version v1.13.2 is known to work. |
| NFS | (optional, you can bring your own ReadWriteMany provider): nfs-ganesha-server-and-external-provisioner version 4.0.8 is known to work – please check out details at GKE/NFS. |
| PostgreSQL | Version 14 is known to work. Version 11 is the minimum requirement. |

Prerequisites#

Aspect

Details

Kubernetes

Everything runs in a single namespace. Throughout this guide this will be cocalc.

HELM Charts

As an alternative to HELM, there is a kustomization.yaml example file for Kustomize. Render the Helm charts via kustomize build --enable-helm . > cocalc.yaml. This path is not actively maintained and might be broken.

Cluster/VMs

Regarding nodes, you can either start with a “minimal” setup for testing, or start with the slightly more complex but recommended “partitioned” setup.

  • Minimal

2 nodes with 2 CPU cores, 8GB of RAM, and 100GB disk space each. This is enough to run all services with a redundancy of 2 and a few projects, just for getting started.

  • Partitioned

Two nodes hold CoCalc services – same specs as above – while all other nodes get a Taint, such that only CoCalc projects are allowed to run there. Start with 2 CPU cores, 16GB of RAM, and 100GB disk space for these tainted project nodes. This avoids interference beyond what containers provide. Also, this makes it easy to scale only the part of the cluster that runs the projects, without interfering with the workload on the “service” nodes.

Networking

There are two groups of Services, which define Ingress rules. Included is a standard setup of an NGINX Ingress Controller. It usually runs behind a LoadBalancer, provided by the cluster infrastructure.

Domain or Sub-Domain

CoCalc also requires its own domain or sub-domain, i.e. it’s currently not possible to run CoCalc with a “base path”.

TLS configuration

TLS is configured via standard Ingress TLS – it’s straightforward to set up Certificate Manager and Let’s Encrypt to manage this for you.

Local disk

The nodes running the CoCalc projects need to be able to load and run Docker images larger than 10 GB. The software users want to be able to run takes more space than usual. 50GB per node should be the lower limit, while at least 100GB is recommended for the nodes running projects.

Access to HELM Charts#

These are the steps to set up a local Git repository, which has its own public/private keys. The public key will be added by the repo admins, in order to give you access.

  1. Create a new public/private key (that way, there won’t be a conflict with already registered keys at GitHub): ssh-keygen -t ed25519 -f my-key

  2. Send your new my-key.pub public key to the contact person at Sagemath, Inc. This key will be added as a deployment key.

  3. Then, in a new empty directory (e.g. cocalc-cloud):

git init .
git config --local core.sshCommand "ssh -i ~/path/to/my-key"
git remote add github git@github.com:sagemathinc/cocalc-cloud-helm.git
git pull github main
git branch --set-upstream-to=github/main

(replace ~/path/to/my-key by the path to the private key generated in step 1.)

Once this is done, pull periodically to get updates and update the deployed HELM chart. In particular, the tags for the deployed docker images will change – see Versions for more details.
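Before pulling, it can help to see which commits would come in. The following is a minimal sketch, assuming the remote is named github and the branch is main, as set up in the steps above. The function name show_incoming_updates is only illustrative and not part of the repository.

```shell
# List upstream commits that are not yet in the local checkout.
# Assumes the remote "github" and the branch "main" from the steps above.
show_incoming_updates() {
  git fetch github main &&
    git log --oneline HEAD..github/main
}

# Usage, from inside the local repository directory:
#   show_incoming_updates
#   git pull github main
```

An empty listing means the local checkout is already up to date with the fetched branch.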

Namespace#

Create a namespace for CoCalc in your Kubernetes cluster. Throughout the guide this will be cocalc:

kubectl create namespace cocalc

→ expected output: namespace/cocalc created
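Several commands in this guide do not pass a namespace explicitly. One way to make them act on the new namespace is to set it as the default of the current kubectl context. This is a sketch under that assumption; the function name use_cocalc_namespace is made up for illustration.

```shell
# Make the given namespace (default: cocalc) the default of the current
# kubectl context, then print the namespace that is now in effect.
use_cocalc_namespace() {
  local ns="${1:-cocalc}"
  kubectl config set-context --current --namespace="$ns" &&
    kubectl config view --minify --output 'jsonpath={..namespace}'
}

# Usage:
#   use_cocalc_namespace
```

Alternatively, append --namespace cocalc to each kubectl command and leave the context untouched.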

Private docker registry#

The compiled Docker Images are stored in a private registry. You need to set up Docker Credentials in order to be able to pull these images.

One of the first steps is to set up a secret to access this registry.

  1. You’ll get a credentials file from Sagemath, Inc. – it will be called regcred.json below.

  2. Tell Kubernetes about this registry secret, in the namespace where you’ll deploy CoCalc, under the name regcred:

kubectl create secret docker-registry regcred \
   --docker-server=europe-docker.pkg.dev \
   --docker-username=_json_key \
   --docker-password="$(cat ./regcred.json)" \
   --docker-email=[email protected]

→ expected output: secret/regcred created

Note

If you name this secret differently than regcred, you need to adjust the imagePullSecrets in the values.yaml file.
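To check that the secret ended up in the right namespace and holds what you expect, you can decode it again. This is a sketch, assuming the secret name regcred and the namespace cocalc used in this guide; the helper name show_pull_secret is hypothetical.

```shell
# Print the decoded registry credentials stored in the pull secret.
# Arguments (both optional): secret name, namespace.
show_pull_secret() {
  local name="${1:-regcred}" ns="${2:-cocalc}"
  kubectl get secret "$name" --namespace "$ns" \
    --output 'jsonpath={.data.\.dockerconfigjson}' | base64 --decode
}

# Usage:
#   show_pull_secret
```

The output contains the registry credentials in clear text, so avoid running this in a shared or logged terminal.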


Networking#

  • All access is defined via standard Ingress specification files. They’re tested to work with the NGINX ingress controller, which is pretty standard. To make them work, one way is to set up this controller as shown in the /ingress-nginx directory.

  • The bulk of all traffic is via https, hence you need a certificate as well. Included are examples for Let’s Encrypt in the /letsencrypt directory, but you can also use your own certificate. However you set this up, this is a general Kubernetes question and goes beyond the scope of this guide. (see Ingress TLS)

  • As a minimum, you need a domain/subdomain and a LoadBalancer for your cluster, which routes the traffic to the NGINX controller. It’s beyond the scope of this guide, but such a LoadBalancer is either provided by your public cloud provider, or you can use a MetalLB setup.

Beyond the HTTPS traffic, there is also (optionally) an SSH Gateway. It makes it possible to access projects via SSH. That traffic is pure TCP on port 22, which needs a special configuration for the NGINX controller.

If you’re not using the NGINX Ingress Controller, you might have to adjust some details. Please check both ingress.yaml HELM Chart templates in hub and static for up-to-date details. The relevant settings are in the annotations, prefixed with nginx.ingress.kubernetes.io – see NGINX Annotations for more details. Basically:

  • Session Affinity: reconnecting the websockets is more stable and faster, if they’re sticky with specific hubs.

  • Body Size: this is relevant for uploading files. The uploader uses chunking, so, it’s just important to allow more than the size of a chunk.

  • /metrics endpoint: you don’t want to expose that endpoint to the public. That’s why this snippet is added:

    nginx.ingress.kubernetes.io/server-snippet: |
      location = "/metrics" {
        deny all;
        return 404;
      }
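Once the Ingress is deployed, a quick way to confirm that the /metrics endpoint is not reachable from outside is to request it and look at the status code. This is a sketch only: the function name metrics_status and the host name in the usage line are placeholders.

```shell
# Print the HTTP status code that the public /metrics endpoint returns.
# Argument: the (sub)domain that points to your CoCalc service.
metrics_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' "https://$1/metrics"
}

# Usage (hypothetical host name):
#   metrics_status cocalc.example.com
```

With the snippet above in place, the printed code should not be 200.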
    

Database#

You need PostgreSQL version 11 or higher – suggested is version 14. Setting it up is out of scope for this guide, but there are many ways to install it.

Version 14 should cause the fewest surprises. Development and testing are done using version 14 and, thinking long-term, this is the best version to start with. Ideally, you also have something in place to make this a robust database service.

If you already have a PostgreSQL instance, you could also just create a new “database” and “user”. CoCalc’s services only need to know the IP (or host name), the username and, of course, the password as a Secret. This secret is called postgresql-password (if you name it differently, you have to configure it in global.database.secretName in your my-values.yaml).

To create the secret, you can use the following command:

kubectl create secret generic postgresql-password --from-literal=postgresql-password=[password]

  • Directory /database contains some notes about running the Postgres HELM chart from bitnami. They could be outdated, hence please refer to the upstream documentation!

  • One setting that might become important is max_connections: consider increasing it (PostgreSQL’s default is typically max_connections = 100).
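To see which connection limit your PostgreSQL server currently uses, and to confirm that the credentials work at all, you can query it with psql. This is a minimal sketch; the function name show_max_connections is made up, and host, user and database are placeholders you need to fill in.

```shell
# Query the configured connection limit of the PostgreSQL server.
# Arguments: host, user, database. The password is read from PGPASSWORD.
show_max_connections() {
  psql --host "$1" --username "$2" --dbname "$3" \
    --tuples-only --no-align --command 'SHOW max_connections;'
}

# Usage (all values are placeholders):
#   PGPASSWORD=[password] show_max_connections [host] [user] [database]
```

If the command fails to connect, fix that first, since CoCalc's services will use the same host, username and password.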

Note

CoCalc needs pretty extensive permissions to the database (aka “superuser”). One reason is that upon startup, it checks if the database exists, creates tables and schema, later checks up on all tables, indices, etc.

Storage#

That’s probably the trickiest part. CoCalc requires a filesystem supporting ReadWriteMany (short: "RWX") with VolumeMode: "Filesystem" (not block storage!). This is used to share data across the pods in the cluster. This could be an NFS server, but depending on what you already have or which public cloud provider you use, there are various options.

Note

CoCalc Cloud primarily needs to know the names of two PersistentVolumeClaims with specific purposes: Projects Data and Projects Software. Their PVC names must be different from each other as well!

See Deployment/Storage for how to configure this.
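If you create the claims yourself, a PersistentVolumeClaim with the required access mode could look like the one rendered below. This is a sketch under assumptions: the helper render_rwx_pvc is hypothetical, the storage class nfs and the size 10Gi are examples only, and the chart may also be able to create the claims for you (check Deployment/Storage first).

```shell
# Render a PersistentVolumeClaim manifest requesting ReadWriteMany access
# on a filesystem volume. Arguments: claim name, storage class, size.
render_rwx_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  storageClassName: $2
  resources:
    requests:
      storage: $3
EOF
}

# Usage (storage class and size are assumptions, adjust to your cluster):
#   render_rwx_pvc projects-data nfs 10Gi | kubectl apply --namespace cocalc -f -
#   render_rwx_pvc projects-software nfs 10Gi | kubectl apply --namespace cocalc -f -
```

Rendering the manifest separately from applying it makes it easy to review the YAML before anything is created in the cluster.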

Projects Data#

The projects-data PVC stores all files of CoCalc projects. There are two main directories:

  • projects/[uuid]: each project’s data is mounted as a subdirectory with the project’s UUID.

  • shares/[uuid]: data shared from a project is copied over to that directory.

  • If necessary, global.shared_files_path configures the path schema for shared projects (but this isn’t tested). The idea is to maybe use a different storage backend for these particular files.

Projects Software#

The projects-software PVC is shared across all project pods. The idea is to share data and software globally. It mounts global files with read-only access in /ext. (Use the $EXT environment variable from within a project to resolve that directory).

Write Access

Via a special License, some projects can get read/write access to this global software directory. Such licenses have the ext_rw quota enabled.

Then hand the License ID to the user, who has to add it to their project, or just add the license directly.

The next time the project starts, the underlying Pod’s readOnly flag of that particular volume will be false, and everyone having access to this project will be able to modify the files in /ext.

See Global Software about how to make full use of this.
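From a terminal inside a project, you can check whether the license took effect by testing if the directory is writable. This is a small sketch, assuming the $EXT variable described above; the function name ext_is_writable is illustrative.

```shell
# Report whether the global software directory is writable for this project.
# Falls back to /ext if $EXT is not set.
ext_is_writable() {
  [ -w "${EXT:-/ext}" ]
}

if ext_is_writable; then
  echo "ext: read/write"
else
  echo "ext: read-only"
fi
```

Keep in mind that the flag only changes when the project starts, so restart the project after adding the license.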

Install NFS Server#

If you need to set up a flexible NFS server and storage provisioner, you can use the HELM chart nfs-ganesha-server-and-external-provisioner. There are notes about installing it in GKE/NFS Server.

Under the hood#

Under the hood, Kubernetes sees the following for a project – and equivalent configs appear in hub-next, or manage-copy/-share, …:

volumes:
- name: home
  persistentVolumeClaim:
    claimName: projects-data
[...]

and mounts it via:

volumeMounts:
- mountPath: /home/user
  name: home
  subPath: projects/09c119f5-....
[...]

Where projects-data is bound to a volume and has access mode RWX and storage class nfs.
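To inspect this on your own cluster, you can ask Kubernetes for the claim's phase, access modes and storage class. This is a sketch, assuming the claim name projects-data and the namespace cocalc; the helper name show_pvc_binding is hypothetical.

```shell
# Print phase, access modes and storage class of a PersistentVolumeClaim.
# Arguments (both optional): claim name, namespace.
show_pvc_binding() {
  local name="${1:-projects-data}" ns="${2:-cocalc}"
  kubectl get pvc "$name" --namespace "$ns" \
    --output 'jsonpath={.status.phase} {.spec.accessModes} {.spec.storageClassName}'
}

# Usage:
#   show_pvc_binding
#   show_pvc_binding projects-software
```

A claim that is not in the Bound phase, or that lacks ReadWriteMany, is a likely reason for project pods failing to start.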