This document describes the architecture of CoCalc in Kubernetes. It’s helpful to know a bit about the services it is composed of, but it’s not necessary to understand everything in every detail.

Here is a high-level overview:

  • Database: holds all the information about users, projects, and everything else.

  • Storage: holds all files edited and published by users via their projects.

  • Hub services: the entry points for users and projects from the internet. They are responsible for authentication, authorization, and routing.

  • Project pods: the actual containers running the user projects. These project pods are managed by a set of microservices called Manage, which are responsible for starting, stopping, and monitoring projects.

In a nutshell, this is what they’re doing:

  • A couple of services will be running in pods in the Kubernetes cluster → Hubs, Static and Manage,

  • Some of them will talk to the database → Database,

  • Some have endpoints, which are exposed via Ingress configuration files → Static and Hubs,

  • A few services will dynamically create and delete pods (aka “CoCalc Projects”) inside that cluster → Manage,

  • and many of these pods will mount directories of a shared filesystem → Storage.

After you have a basic understanding of the architecture, you can continue preparing your cluster.
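Assuming the labels used later in this document (group=hub, group=manage, run=project) and that you are in the correct namespace, a quick way to get an overview of these services in a running cluster is:

```shell
# List the hub services (websocket, proxy, next, api, ...)
kubectl get pods -l group=hub

# List the manage microservices (action, state, idle, copy, ...)
kubectl get pods -l group=manage

# List the currently running project pods
kubectl get pods -l run=project
```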


  • hub-websocket: clients from the web connect to this service via websockets. It also manages the database, among other tasks.

    • During normal usage, each pod is expected to handle up to 50 simultaneous connections; around 30 is ideal.

    • It’s fine to run 5 or more websocket hubs.

  • hub-proxy: establishes connections to the projects; requires a client with valid authentication.

    • During normal usage, each pod can handle at least 50 simultaneous connections.

    • It’s fine to run 5 or more proxy services.

  • hub-next:

    • Serves dynamic pages like the landing page at /, /info, /policies, etc.

    • This service also renders shared files of users at /share or even a custom name at /[user name]/[project nickname]/[share nickname].

    • Finally, it also serves /api/v2.

    • There should be at least two hub-next pods running.

  • hub-maintenance-*: single pods that remove and compact data in the database.

  • hub-stats: a single pod that collects statistics about CoCalc itself.

  • hub-api: serves the /api/v1 endpoint.

Restart a single service, e.g. via k delete pod -l run=hub-next.

Restart all hubs via k delete pod -l group=hub.
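Replica counts are normally set via the HELM chart's values, but for a quick experiment you can scale a hub deployment directly. The deployment name hub-next here is an assumption based on the pod labels above; check your actual deployment names first:

```shell
# See which hub deployments exist
kubectl get deployments -l group=hub

# Scale hub-next to two replicas and watch the pods come up
kubectl scale deployment hub-next --replicas=2
kubectl get pods -l run=hub-next
```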


  • manage-action: triggered when a project is told to start, stop, restart, etc. Usually, the user is requesting to start a project, which is recorded in the database as a request to start the project. This microservice uses Postgres’s LISTEN/NOTIFY capability to listen for and react to such requests and actually starts the project. Behind the scenes, manage-action not only reads the database, but also the project-pod and project-image ConfigMaps to determine what to do. After compiling all the available information, it sends a Pod configuration to the Kubernetes API server, which then starts the project.
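The LISTEN/NOTIFY mechanism can be sketched with two psql sessions. The channel name project_requests below is made up for illustration and is not CoCalc's actual channel:

```shell
# Terminal 1: open an interactive session and subscribe to a channel
psql "$DATABASE_URL"
#   cocalc=> LISTEN project_requests;
# psql prints any incoming notification whenever it finishes executing a command

# Terminal 2: fire a notification with a payload
psql "$DATABASE_URL" -c "NOTIFY project_requests, 'start project'"
```

Note that the listening session must stay open: a one-shot `psql -c "LISTEN ..."` exits immediately and never sees the notification.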

  • manage-state: this service listens to the k8s API for changes regarding projects and updates the database accordingly. It’s the companion of manage-action. In particular, once a project pod is started, it will update the database. This tells the user that the project is running, and shortly after, the client will connect. The hub services will also connect to that project to establish a communication channel.

  • manage-idle: periodically checks if projects are idle and stops them. It also starts any stopped “always running” projects that should be running, and stops projects that are stuck in the pending state (cleanup).

  • manage-copy: this watches the copy_paths table of the database for requests to copy files, then starts projects if necessary and issues copy operations between projects, and finally writes out the status to the database. This basically issues rate-limited rsync operations.

  • manage-share: similar to manage-copy, but for shared files.

Restart all manage services via k delete pod -l group=manage or one by one via k delete pod -l run=manage-action, …


User projects run as pods. You can think of them as a container running a Linux environment with an unprivileged user inside. Each project has its own $HOME directory on a shared filesystem. Overall, these project pods will use up most of the resources, because the services mentioned above scale with a much smaller factor in the number of users.

Their resource requests and limits are configured via quota settings (only admins can do that), or via “licenses”. This means there could be projects requesting a significant chunk of available CPU or memory resources.

For this CoCalc setup, the “request” is calculated from the limits via an over-commit ratio. This is set via the global site configuration settings, i.e. global.settings.default_quotas or that same field in Admin → Site Settings. The parameter cpu_oc: 10 means the CPU over-commit ratio is 1:10 – which is fine for interactive use, because most of the time projects wait for user input. Similarly, mem_oc: 5 means the memory over-commit ratio is 1:5.
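As a concrete sketch of that calculation: with cpu_oc: 10 and mem_oc: 5, a project limited to 1000m CPU and 2000 MiB of memory would be scheduled with much smaller requests:

```shell
cpu_limit_m=1000    # project CPU limit in millicores
mem_limit_mib=2000  # project memory limit in MiB
cpu_oc=10           # CPU over-commit ratio 1:10
mem_oc=5            # memory over-commit ratio 1:5

# request = limit / over-commit ratio
cpu_request_m=$(( cpu_limit_m / cpu_oc ))
mem_request_mib=$(( mem_limit_mib / mem_oc ))

echo "requests: cpu=${cpu_request_m}m memory=${mem_request_mib}Mi"
```

So the scheduler only reserves 100m CPU and 400 MiB for this project, while the pod may still burst up to its limits.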

Main challenges

  • You have to plan/adjust the size and number of nodes to match the overall requests for projects.

  • Users are sensitive to interrupted projects, because they can’t continue working and their intermediate state in e.g. notebooks is lost. Hence you can’t just delete project pods at will.

  • Users are also sensitive to slow startup times. That’s why the Prepull service exists, pulling the large project images before marking the node ready for running these projects.

  • You can also partition the cluster heterogeneously, such that some projects run only on specific nodes, while all other projects end up in a common pool of project nodes.


All three aspects mentioned above use storage in the form of a shared filesystem. Usually, this is accomplished via an NFS server, but there are other options as well. The Kubernetes abstraction for this is a PersistentVolume (PV) with ReadWriteMany access mode.

Projects mount the /projects/[UUID] subdirectory as their home directory. Other services mount this directory as well: manage-copy mounts it to copy files between projects, while hub-next mounts it to serve users’ published files at the /share path.

An important detail is that the UID/GID is 2001. This is for security reasons and to be distinct from root.

For example, the AWS EKS setup does not work out of the box and must be configured to use 2001 as the UID/GID.
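One way to verify the shared filesystem is usable by project pods is to check the ownership of the export. This assumes the share is mounted at /projects on the host you are checking from; adjust the path for your setup:

```shell
# Should print "2001 2001" – the fixed UID/GID used by project pods
stat -c '%u %g' /projects

# If the ownership is wrong, fix it on the server side, e.g.:
# chown -R 2001:2001 /projects
```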

Project Nodes

It’s highly recommended to run all project pods on dedicated VMs (via node taints), because users – even by accident – could use a lot of RAM and/or CPU. Even though containers provide isolation, there can still be issues, and this cleanly separates the projects from the system services.

To enable this, look at manage.project in the values.yaml file. The labels and taints for service and project nodes are listed under Node Setup below.


Related to the above, there is also a “prepull” service. It addresses the issue of users seeing a project stuck in a “Pending” state for too long. This happens because the project pod images are very large and take some time to pull onto a new node.

The basic idea is to initially configure new project nodes, via taints, so that they cannot run projects. Prepull loads the large project image before any project pod can be scheduled on a new node. Once successful, it does a quick check and changes the taint of the node it runs on, such that project pods can be scheduled there. Due to the taint configuration, the prepull pod then removes itself. Projects will now start quickly, because the large project image is already loaded.
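You can watch this mechanism in action by inspecting a project node's taints before and after prepull finishes. The node name is a placeholder, and the run=prepull label is an assumption; check the actual labels in the chart:

```shell
# Show current taints on a project node
kubectl describe node <project-node> | grep -A2 Taints

# Watch the prepull pods do their work
kubectl get pods -l run=prepull -w
```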

When there is an update to the project image (a new tag in manage.project.tag), the labels and taints of project nodes are reset by a post-update deployment hook (which in turn runs manage/templates/prepull-update-script.yaml …).

The prepull service will then pull the new project image and, once done, allow projects to be scheduled again.

Projects that were already running before the update are not affected. You can get a sense of which image they run by checking their project_tag label (or even delete old projects via k delete pod -l run=project,project_tag=<old-tag> to get rid of these pods, which then allows the kubelet to remove the old Docker images and avoid disk-pressure issues).


The prepull service needs cluster-wide permissions, because it must be able to modify the labels and taints of the nodes. Feel free to read through cocalc/charts/manage/templates/prepull-update-script.yaml and cocalc/charts/manage/ if you want to know what it does – it’s pretty simple, but since it has cluster-wide permissions, you might want to audit it.

Node Setup

To make use of this prepull service, you need two node pools. If you go ahead with the default names, configure the pools like this:

  • “service” pool:

    • set the Kubernetes label to cocalc-role=services (that’s key=value)

  • “project” pool:

    • set the Kubernetes label to cocalc-role=projects (that’s key=value)

    • and the initial Kubernetes taint (this is key=value effect) to:

      • cocalc-projects-init=false:NoExecute

      • cocalc-projects=init:NoSchedule
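If your cloud provider does not let you set labels and taints at pool creation time, the equivalent kubectl commands for a single node would be (node names are placeholders):

```shell
# Service node: label only
kubectl label node <service-node> cocalc-role=services

# Project node: label plus the two initial taints
kubectl label node <project-node> cocalc-role=projects
kubectl taint node <project-node> cocalc-projects-init=false:NoExecute
kubectl taint node <project-node> cocalc-projects=init:NoSchedule
```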


This service just holds the static files that make up the front-end application – it’s served at /app, and clients connect to the backend via a websocket served by hub-websocket. Overall, this is probably the service that will be updated most often.

SSH Gateway

If enabled (via global.ssh_gateway.enabled), this service runs an SSH server as a gateway for accessing projects. Users can add their public SSH key to a project or their account. With that, they are allowed to ssh into a running project.

Use cases are:

  • simplifying running tasks, like periodic checks, etc.

  • uploading or downloading files via scp or rsync

  • accessing scripts/software hosted on CoCalc from within another headless system, e.g. a cluster
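Once the gateway is reachable, typical usage looks like this. The hostname and the project-derived username below are placeholders for whatever your deployment exposes:

```shell
# Open a shell inside a running project
ssh <project-user>@ssh.<your-cocalc-domain>

# Upload and download files
scp report.pdf <project-user>@ssh.<your-cocalc-domain>:~/
rsync -av data/ <project-user>@ssh.<your-cocalc-domain>:~/data/
```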

Network setup: the connection to the outside world works by exposing the service’s endpoint. This is a “global” setup of your cluster, hence it is outside the scope of CoCalc’s HELM chart.

For the NGINX ingress controller, the TCP service of ssh-gateway must be added to the tcp-services configmap. If you’re using its HELM chart, see /ingress-nginx/values.yaml for a working example, under tcp: {...}.
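A minimal sketch of such a tcp-services entry, assuming the gateway is installed as service ssh-gateway in the cocalc namespace and the ingress controller lives in ingress-nginx (check your actual names and namespaces):

```shell
# Map external port 22 to the ssh-gateway service's port 22
kubectl -n ingress-nginx patch configmap tcp-services \
  --type merge -p '{"data":{"22":"cocalc/ssh-gateway:22"}}'
```

The ingress controller's Service (and any LoadBalancer in front of it) must also expose port 22 for this to take effect.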


If you have already set up a LoadBalancer and then update, it might not pick up the new configuration to include port 22. The easiest way to fix this is to delete the LoadBalancer and let it be recreated.


If enabled (via global.datastore.enabled), in the Project Settings a configuration panel “Cloud storage & remote filesystems” appears. This allows users to mount remote filesystems into the particular project. This supports SSHFS, AWS S3 and Google Cloud Storage.

Under the hood, “Datastore” is a sidecar of the project pod, which mounts these filesystems according to their configuration at /data/[name] (where name is the name of the datastore). This mountpoint is propagated from the host into the project container. If the path ~/data is not already taken, the project automatically creates a symlink to that global directory. Therefore, collaborators on the project can use and see this filesystem, but they do not know the secret, cannot see the raw configuration files, and also cannot interact with the actual process doing the FUSE mount. The “secret” is hidden in the user interface and is never sent to the web client.

  • The “read-only” mode enables the ro mount option for the FUSE mount.

  • To make the filesystem perform well, it does a bit of caching, but only with a small timeout. This means that if you allow a few seconds for reads and writes to sync, a bit of collaboration via the same mounted filesystem is possible. It’s not really recommended, but possible. Also note that CoCalc projects poll discovered directories at the filesystem level, which means that remote changes to these files will eventually show up and update in an open editor. These files are also cached on CoCalc’s side.

  • Requests to support other remote filesystems are welcome; if there is a robust tool and a way to configure it easily, we will certainly consider adding it.

  • Pro-tip: if a project is set to “Always Running”, you can use the SSHFS configuration in combination with the SSH Gateway to mount a directory from another project. This is a bit of a hack, but it works.