Setup#
This is what you need to have in place before deploying CoCalc.
Introduction#
To manage your CoCalc service successfully, it is important to have some understanding of the following tools:
A Kubernetes cluster and experience managing it. This guide provides information on how to manage the service, respond to issues, plan resource requirements, and scale various services to meet your usage needs.
Some experience working with HELM charts.
For the two points above, you should know at least a bit of YAML.
A (sub)domain that points to the service, as well as a TLS certificate (Let’s Encrypt is a possibility).
A standard PostgreSQL database.
For storage, a network filesystem like NFS – one that supports ReadWriteMany – is necessary to hold the data of all projects.
A standard e-mail service (SMTP server) for sending notifications and password resets.
Familiarity with CoCalc itself and its CoCalc Documentation, since your users may have questions.
Compatibility#
The table below gives you a rough overview of the compatibility of this setup.
These version numbers were last updated on 2023-11-13 for version 2.11.4.
In general, this is just a recent collection of known-working versions – other versions, especially older ones, should work as well.
Beyond this list, adjacent to the /cocalc directory in this repository, there are notes and config file examples.
Please check each directory for the latest information.
| Cluster | Details |
|---|---|
| Google GCP/GKE | The primary cluster to test this setup runs on Google Kubernetes Engine in Google’s GCP cloud. |
| Amazon AWS/EKS | It also runs on Amazon’s Elastic Kubernetes Service in their AWS cloud. |
| Microsoft Azure/AKS | Microsoft’s Azure Kubernetes Service in their Azure cloud works as well. |
| Local | A local Kubernetes cluster, for testing and experimentation; currently not maintained. See the example. |
| Bare metal | Clusters created by kubeadm, using MetalLB as a LoadBalancer. There is a broad spectrum of possible cluster types and configurations; setting up a cluster is beyond the scope of this document, though. Please read the remainder of the requirements to see if your setup is compatible. From the point of view of CoCalc Cloud, there are no complex requirements for the cluster: mainly, it must support storing data, have network access, and support a LoadBalancer and Ingress rules. |
Dependencies#
| Tool/Service | Details |
|---|---|
|  | Version |
|  | Version |
|  | The HELM chart |
|  | As a Certificate Manager (optional, if you don’t set key and cert manually): the HELM chart installing version |
|  | (optional, you can bring your own ReadWriteMany provider) |
|  | Version |
Prerequisites#
| Aspect | Details |
|---|---|
| Namespace | Everything runs in a single namespace. Throughout this guide this will be cocalc. |
|  | As an alternative to HELM, there is a |
| Cluster/VMs | Regarding nodes, you can either start with a “minimal” setup for testing, or with the slightly more complex but recommended “partitioned” setup. |
| “minimal” | 2 nodes with 2 CPU cores, 8GB of RAM, and 100GB of disk space each. This is enough to run all services with a redundancy of 2 and a few projects, just for getting started. |
| “partitioned” | Two nodes hold the CoCalc services – same specs as above – while all other nodes get a Taint, such that only CoCalc projects are allowed to run there. Start with two tainted project nodes with 2 CPU cores, 16GB of RAM, and 100GB of disk space each. This avoids interference beyond what containers provide, and makes it easy to scale only the part of the cluster that runs the projects, without affecting the workload on the “service” nodes. |
| Networking | There are two groups of Services, which define Ingress rules. Included is a standard setup of an NGINX Ingress Controller. It usually runs behind a LoadBalancer, provided by the cluster infrastructure. |
| Domain or Sub-Domain | CoCalc also requires its own domain or sub-domain, i.e. it’s currently not possible to run CoCalc under a “base path”. |
| TLS configuration | TLS is configured via standard Ingress TLS – it’s straightforward to set up Certificate Manager and Let’s Encrypt to manage this for you. |
| Local disk | The nodes running the CoCalc projects need to be able to load and run Docker images of more than 10 GB in size, since the software users want to run takes more space than usual. 50GB per node should be the lower limit, while at least 100GB is recommended for the nodes running projects. |
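For the “partitioned” setup, the taint on the project nodes could look like the following sketch. The taint key cocalc-projects and its value are illustrative assumptions, not the chart’s actual settings; check the deployment values for the exact key the project pods tolerate.

```yaml
# Sketch: a Node carrying a taint so that only pods with a matching
# toleration (i.e. CoCalc projects) are scheduled on it.
# The taint key and value below are hypothetical examples.
apiVersion: v1
kind: Node
metadata:
  name: project-node-1
spec:
  taints:
    - key: cocalc-projects   # hypothetical key
      value: "true"
      effect: NoSchedule
```

The same taint can be applied to an existing node via kubectl, e.g. `kubectl taint nodes project-node-1 cocalc-projects=true:NoSchedule`.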
Access to HELM Charts#
These are the steps to set up a local Git repository with its own public/private key pair. The public key will be added by the repo admins in order to give you access.
Create a new public/private key pair (that way, there won’t be a conflict with keys already registered at GitHub):
ssh-keygen -t ed25519 -f my-key
Send your new my-key.pub public key to the contact person at Sagemath, Inc. This key will be added as a deployment key.
Then, in a new empty directory (e.g. cocalc-cloud):
git init .
git config --local core.sshCommand "ssh -i ~/path/to/my-key"
git remote add github [email protected]:sagemathinc/cocalc-cloud-helm.git
git pull github main
git branch --set-upstream-to=github/main
(Replace ~/path/to/my-key with the path to the private key generated above.)
Once this is done, pull periodically to get updates and update the deployed HELM chart. In particular, the tags for the deployed docker images will change – see Versions for more details.
Namespace#
Create a namespace for CoCalc in your Kubernetes cluster.
Throughout the guide this will be cocalc
:
kubectl create namespace cocalc
→ expected output: namespace/cocalc created
Private docker registry#
The compiled Docker images are stored in a private registry. You need to set up Docker credentials in order to be able to pull these images.
One of the first steps is to set up a secret to access this registry.
You’ll get a credentials file from Sagemath, Inc. – it will be called regcred.json below.
Tell Kubernetes about this registry secret, in the namespace where you’ll deploy CoCalc, under the name regcred:
kubectl create secret docker-registry regcred \
--docker-server=europe-docker.pkg.dev \
--docker-username=_json_key \
--docker-password="$(cat ./regcred.json)" \
--docker-email=[email protected]
→ expected output: secret/regcred created
Note
If you name this secret differently than regcred, you need to adjust the imagePullSecrets in the values.yaml file.
Ref:
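As a sketch, the relevant fragment of your values file might look like the following. The exact key layout is an assumption here; check the chart’s own values.yaml for the authoritative structure.

```yaml
# Hypothetical my-values.yaml fragment when using a
# differently-named pull secret:
imagePullSecrets:
  - name: my-registry-secret   # instead of the default "regcred"
```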
Networking#
All access is defined via standard Ingress specification files. They’re tested to work with the NGINX ingress controller, which is pretty standard. One way to make them work is to set up this controller as shown in the /ingress-nginx directory.
The bulk of all traffic is via HTTPS, hence you need a certificate as well. Included are examples for Let’s Encrypt in the /letsencrypt directory, but you can also use your own certificate. However you set this up, this is a general Kubernetes question and goes beyond the scope of this guide (see Ingress TLS).
As a minimum, you need a domain/subdomain and a LoadBalancer for your cluster, which routes the traffic to the NGINX controller. Setting this up is beyond the scope of this guide, but such a LoadBalancer is either provided by your public cloud provider, or you can use a MetalLB setup.
Beyond the HTTPS traffic, there is also (optionally) an SSH Gateway. It makes it possible to access projects via SSH. That traffic is pure TCP on port 22, which needs a special configuration for the NGINX controller.
If you’re not using the NGINX Ingress Controller, you might have to adjust some details.
Please check both ingress.yaml HELM chart templates, in hub and static, for up-to-date details.
The relevant settings are in the annotations, prefixed with nginx.ingress.kubernetes.io
– see NGINX Annotations for more details. Basically:
Session Affinity: reconnecting the websockets is more stable and faster if they’re sticky to specific hubs.
Body Size: this is relevant for uploading files. The uploader uses chunking, so it’s only important to allow more than the size of a chunk.
/metrics endpoint: you don’t want to expose that endpoint to the public. That’s why this snippet is added:
nginx.ingress.kubernetes.io/server-snippet: |
  location = "/metrics" { deny all; return 404; }
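Put together, the kinds of annotations involved look roughly like the sketch below. The concrete values are illustrative, not the chart’s exact settings – consult the ingress.yaml templates for what is actually deployed.

```yaml
metadata:
  annotations:
    # sticky sessions, so websockets reconnect to the same hub
    nginx.ingress.kubernetes.io/affinity: "cookie"
    # allow uploads larger than a single chunk (size is illustrative)
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    # hide the metrics endpoint from the public
    nginx.ingress.kubernetes.io/server-snippet: |
      location = "/metrics" { deny all; return 404; }
```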
Database#
You need PostgreSQL version 11 or higher – version 14 is suggested. Setting it up is out of scope for this guide, but there are many ways to install it.
Version 14 should have the fewest surprises: development and testing are done using version 14, and thinking long-term, it is the best version to start with. Ideally, you also have measures in place to make this a robust database service.
If you already have a PostgreSQL instance, you can also just create a new “database” and “user”.
CoCalc’s services only need to know the IP (or host name), the username and, of course, the password, provided as a Secret.
This secret is called postgresql-password (if you name it differently, you have to configure it via global.database.secretName in your my-values.yaml).
To create the secret, you can use the following command:
kubectl create secret generic postgresql-password --from-literal=postgresql-password=[password]
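Alternatively, the same secret can be sketched as a manifest. Note that the value must be base64-encoded; the password below is only a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgresql-password
  namespace: cocalc
type: Opaque
data:
  # base64 of the placeholder string "password"
  postgresql-password: cGFzc3dvcmQ=
```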
The /database directory contains some notes about running the Postgres HELM chart from Bitnami. These could be outdated, hence please refer to the upstream documentation!
One possibly important setting is to increase max_connections beyond the default of 100.
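In postgresql.conf this could look as follows. The concrete number is an illustrative assumption; size it to your expected number of hubs and projects.

```ini
# postgresql.conf fragment: PostgreSQL's default of 100 connections
# can be too low for a busy CoCalc deployment.
max_connections = 200
```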
Note
CoCalc needs pretty extensive permissions on the database (aka “superuser”). One reason is that upon startup, it checks if the database exists, creates tables and the schema, and later checks up on all tables, indices, etc.
Storage#
That’s probably the trickiest part.
CoCalc requires a filesystem supporting ReadWriteMany (short: "RWX") with VolumeMode: "Filesystem" (not block storage!).
This is used to share data across the pods in the cluster.
This could be an NFS server, but depending on what you already have or which public cloud provider you use, there are various options.
Note
CoCalc Cloud primarily needs to know the names of two PersistentVolumeClaims with specific purposes: Projects Data and Projects Software. Their PVC names must also differ from each other!
See Deployment/Storage for how to configure this.
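As a sketch, an RWX claim of the kind CoCalc expects could look like this. The storage class name and size are assumptions – use whatever ReadWriteMany-capable class your cluster provides.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: projects-data
  namespace: cocalc
spec:
  accessModes:
    - ReadWriteMany        # "RWX"
  volumeMode: Filesystem   # not block storage
  storageClassName: nfs    # assumption: an NFS-backed RWX class
  resources:
    requests:
      storage: 100Gi       # illustrative size
```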
Projects Data#
The projects-data PVC stores all files of CoCalc projects. There are two main directories:
projects/[uuid]: each project’s data is mounted as a subdirectory named after the project’s UUID.
shares/[uuid]: data shared from a project is copied over to that directory.
If necessary, global.shared_files_path configures the path schema for shared projects (but this isn’t tested). The idea is that a different storage backend could be used for these particular files.
Projects Software#
The projects-software PVC is shared across all project pods.
The idea is to share data and software globally.
It mounts global files with read-only access in /ext.
(Use the $EXT environment variable from within a project to resolve that directory.)
Write Access
Via a special License, some projects can get read/write access to this global software directory.
Such licenses have the ext_rw quota enabled.
Hand the License ID to the user, who has to add it to their project, or just add the license directly.
The next time the project starts, the underlying Pod’s readOnly flag for that particular volume will be false, and everyone with access to this project will be able to modify the files in /ext.
See Global Software about how to make full use of this.
Install NFS Server#
If you need to set up a flexible NFS server and storage provisioner, you can use the HELM chart nfs-ganesha-server-and-external-provisioner.
There are notes about installing it in GKE/NFS Server.
Under the hood#
Under the hood, Kubernetes sees the following for a project – and equivalent configs appear in hub-next, or manage-copy/-share, …:
volumes:
- name: home
persistentVolumeClaim:
claimName: projects-data
[...]
and mounts it via:
volumeMounts:
- mountPath: /home/user
name: home
subPath: projects/09c119f5-....
[...]
Here, projects-data is bound to a volume with access mode RWX and storage class nfs.