Architecture#
This document describes the architecture of CoCalc in Kubernetes. It’s helpful to know a bit about the services it is composed of, but it’s not necessary to understand everything in every detail.
Here is a high-level overview:
Database: holds all the information about users, projects, and everything else.
Storage: holds all files edited and published by users via their projects.
Hub services: the entry points for users and projects from the internet. They are responsible for authentication, authorization, and routing.
Project pods: the actual containers running the user projects. These project pods are managed by a set of microservices called Manage, which are responsible for starting, stopping, and monitoring projects.
After you have a basic understanding of the architecture, you can continue preparing your cluster.
Hubs#
hub-websocket: clients from the web connect via websockets. This service also controls the database, etc. During normal usage, it is expected that up to 50 simultaneous connections are possible, 30 are great. It’s fine to run 5 or more websocket hubs.
hub-proxy: establishes a connection to the projects, requires a client with a valid authentication. During normal usage, at least 50 simultaneous connections are possible. It’s fine to run 5 or more proxy services.
hub-next: serves dynamic pages like the landing page at /, /info, /policies, etc. This service also renders shared files of users at /share or even a custom name at /[user name]/[project nickname]/[share nickname]. Finally, it also serves /api/v2. There should be at least two next services running.
hub-maintenance-*: single pods, remove/compact data in the database.
hub-stats: single pod, collects statistics about CoCalc itself.
hub-api: serves the /api/v1 endpoint.
Restart an aspect, e.g. via k delete pod -l run=hub-next.
Restart all hubs via k delete pod -l group=hub.
Manage#
manage-action: triggered when a project is told to start, stop, restart, etc. Usually, the user is requesting to start a project, which is recorded in the database as a request to start the project. This microservice uses Postgres’s LISTEN/NOTIFY capability to listen for and react to such requests and actually starts the project. Behind the scenes, manage-action not only reads the database, but also the project-pod and project-image ConfigMaps to determine what to do. After compiling all the available information, it sends a Pod configuration to the Kubernetes API server, which then starts the project.
manage-state: this service listens to the k8s API for changes regarding projects and updates the database accordingly. It’s the companion of manage-action. In particular, once a project pod is started, it will update the database. This tells the user that the project is running and that it will connect shortly. Also, the hub services will connect to that project to establish a communication channel.
manage-idle: periodically checks if projects are idle and stops them. This also checks if there are any stopped “always running” projects that should be running and starts them, and checks if projects are stuck in pending and stops them (cleanup).
manage-copy: this watches the copy_paths table of the database for requests to copy files, then starts projects if necessary and issues copy operations between projects, and finally writes out the status to the database. This basically issues rate-limited rsync operations.
manage-share: similar to manage-copy, but for shared files.
Restart all manage services via k delete pod -l group=manage or one by one via k delete pod -l run=manage-action, …
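To make the manage-action description more concrete, here is a minimal Python sketch of how a start request and the image configuration could be combined into a Pod manifest before it is sent to the Kubernetes API server. This is not CoCalc's implementation: the function name, the pod naming scheme, the container name, and the image string are assumptions for illustration. Only the run=project and project_tag labels are taken from this document.

```python
from typing import Any


def build_project_pod(project_id: str, image: str, tag: str) -> dict[str, Any]:
    """Assemble a minimal Pod manifest for one project (illustrative only)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"project-{project_id}",  # hypothetical naming scheme
            "labels": {"run": "project", "project_tag": tag},
        },
        "spec": {
            "containers": [
                # hypothetical container name and image reference
                {"name": "project", "image": f"{image}:{tag}"},
            ],
        },
    }


pod = build_project_pod("1234", "example.org/project", "20240101")
print(pod["metadata"]["labels"])
```

A real manifest would also carry resource requests and limits, volumes, and tolerations. They are left out here to keep the sketch short.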
Project#
User projects run as pods. Overall, they use up most of the resources, because all other services scale with a much smaller factor in the number of users.
Their resource requests and limits are configured via quota settings (only admins can do that), or via “licenses”. This means there could be projects requesting a significant chunk of available cpu or memory resources.
For this CoCalc setup, the “request” is calculated from the limits via an overcommit ratio. This is set via the global site configuration settings, i.e. global.settings.default_quotas or that same field in Admin → Site Settings. The parameter cpu_oc: 10 means the CPU overcommit ratio is 1:10, which is fine for interactive use, because most of the time projects wait for user input. Similarly, mem_oc: 5 means the memory overcommit ratio is 1:5.
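As a worked example of the overcommit arithmetic, the following Python sketch assumes that a ratio of 1:N means the request is the limit divided by N. The function name and the example limits are made up for illustration.

```python
def request_from_limit(limit: float, overcommit: float) -> float:
    """Return the resource request for a limit and an overcommit ratio of 1:overcommit."""
    if overcommit <= 0:
        raise ValueError("overcommit ratio must be positive")
    return limit / overcommit


# A project limited to 2 CPU cores and 8000 MB of memory (example values).
cpu_request = request_from_limit(2.0, overcommit=10)  # cpu_oc: 10
mem_request = request_from_limit(8000, overcommit=5)  # mem_oc: 5
print(cpu_request, mem_request)
```

With these example values, the project would request 0.2 cores and 1600 MB, while still being allowed to use up to its full limits.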
Main challenges
adjust the size and number of nodes to match the overall requests for projects
users are sensitive to interrupted projects, because they can’t continue working and their intermediate state in e.g. notebooks is lost. Hence you can’t just delete projects willy-nilly.
Storage#
All three aspects mentioned above are using storage in the form of a shared filesystem. Usually, this is accomplished via an NFS server, but there are other options as well. The Kubernetes abstraction for this is a PersistentVolume (PV) with ReadWriteMany access mode.
Projects mount the /projects/[UUID] subdirectory as their home directory.
Other services, such as manage-copy, mount this directory to copy files between projects, while hub-next mounts it for sharing (publishing) files and serves the rendered files at the /share path.
An important detail is that the UID/GID is 2001.
This is for security reasons and to be distinct from root.
For example, the AWS EKS setup does not work out of the box and must be configured to use 2001 as UID/GID.
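A quick way to verify the ownership of a directory on the shared filesystem is to compare its numeric owner and group with 2001. The following Python sketch uses only the standard library. The function name is an assumption and not part of CoCalc.

```python
import os

EXPECTED_UID = 2001
EXPECTED_GID = 2001


def has_owner(path: str, uid: int = EXPECTED_UID, gid: int = EXPECTED_GID) -> bool:
    """Return True if `path` is owned by the given numeric UID and GID."""
    st = os.stat(path)
    return st.st_uid == uid and st.st_gid == gid
```

For example, has_owner("/projects/<UUID>") with a real project UUID should return True on a correctly configured storage backend.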
Project Nodes#
It’s highly recommended to run all project pods on their dedicated VMs (via node taints), because users – even by accident – could be using a lot of RAM and/or CPU. So, even if containers do their jobs, there might be issues and this cleanly separates the projects from the system services.
To enable this, look into the values.yaml file, in manage.project.
Below are labels and taints for service and project nodes.
Prepull#
Related to the above, there is also a “prepull” service. It solves the issue of users facing a project in a “Pending” state for too long. This happens because the images of the project pods are very large and take some time to load on a new node.
The basic idea is to initially configure new project nodes via taints so that they cannot run projects. Prepull loads the large project image first, before any project pod can be scheduled on a new node. When this was successful, it does a quick check and changes the taint of the node it runs on, such that project pods can be scheduled on that node. Because of the taint configuration, this change in turn removes the prepull pod itself from the node. Projects will now start quickly, because the large project image is already loaded.
When there is an update to the project image (new tag in manage.project.tag), the labels and taints of project nodes are reset, because of a post update Deployment Hook (which in turn runs manage/templates/prepull-update-script.yaml …).
The prepull service will then pull the new project image and once done, allows projects to schedule.
Projects that were already running before the update are not affected. You can get a sense of which image they run by checking their project_tag label. You can also delete old projects via k delete pod -l run=project,project_tag=<old-tag> in order to get rid of these pods, which then allows kubelet to remove those old Docker images and avoid running into disk pressure issues.
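To see how many project pods still run an old image, the project_tag labels can be tallied. The sketch below works on plain label dictionaries. How they are read from the cluster is left open, and the tag values are made up.

```python
from collections import Counter


def count_by_project_tag(pod_labels: list[dict[str, str]]) -> Counter:
    """Count project pods per project_tag label value."""
    return Counter(
        labels.get("project_tag", "<none>")
        for labels in pod_labels
        if labels.get("run") == "project"
    )


# Example label sets, as they might be read from the cluster (made-up tags).
pods = [
    {"run": "project", "project_tag": "old-tag"},
    {"run": "project", "project_tag": "new-tag"},
    {"run": "project", "project_tag": "new-tag"},
    {"run": "hub-next"},
]
counts = count_by_project_tag(pods)
print(counts)
```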
Note
The prepull service needs cluster-wide permissions, because it must be able to modify the labels and taints of the nodes. Feel free to read through cocalc/charts/manage/templates/prepull-update-script.yaml and cocalc/charts/manage/prepull.py in case you want to know what it does. It’s pretty simple, but since it has cluster-wide permissions, you might want to audit it.
Node Setup#
To make use of this prepull service, you need two node pools. If you go ahead with the default names, configure the pools like this:
“service” pool: set the Kubernetes label to cocalc-role=services (that’s key=value).
“project” pool: set the Kubernetes label to cocalc-role=projects (that’s key=value) and the initial Kubernetes taints (this is key=value → effect) to:
cocalc-projects-init=false → NoExecute
cocalc-projects=init → NoSchedule
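The effect of these initial taints can be illustrated with a simplified model of Kubernetes' taint and toleration matching. The Python sketch below only covers the “Equal” operator and treats every taint as blocking. Real Kubernetes also supports the “Exists” operator and the non-blocking PreferNoSchedule effect. The toleration list of the prepull pod is an assumption, not copied from the chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str
    value: str
    effect: str


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified 'Equal' operator: key, value, and effect must all match."""
    return (tol.key, tol.value, tol.effect) == (taint.key, taint.value, taint.effect)


def can_schedule(taints: list[Taint], tolerations: list[Toleration]) -> bool:
    """A pod fits on a node only if every taint is tolerated."""
    return all(any(tolerates(t, taint) for t in tolerations) for taint in taints)


# Initial taints of a fresh project node, as listed above.
fresh_node = [
    Taint("cocalc-projects-init", "false", "NoExecute"),
    Taint("cocalc-projects", "init", "NoSchedule"),
]

# Assumption: the prepull pod tolerates both initial taints.
prepull_tolerations = [
    Toleration("cocalc-projects-init", "false", "NoExecute"),
    Toleration("cocalc-projects", "init", "NoSchedule"),
]

print(can_schedule(fresh_node, prepull_tolerations))  # prepull pod
print(can_schedule(fresh_node, []))  # pod without any tolerations
```

In this model, a pod that tolerates both initial taints fits on a fresh node, while a pod without matching tolerations does not. This is why projects stay off a node until prepull has changed its taints.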
Static#
This service just holds the static files that make up the frontend application. It’s served at /app and clients connect to the backend via a websocket served by hub-websocket. Overall, this is probably the service which will be updated most often.
SSH Gateway#
If enabled (via global.ssh_gateway.enabled), this service runs an SSH server as a gateway to access projects. Users can add their public SSH key to a project or their account. With that, they are allowed to SSH into a running project.
Use cases are:
simplifying running tasks, like periodic checks, etc.
up- or downloading files via scp or rsync
accessing scripts/software hosted on CoCalc from within another headless system, e.g. a cluster
Network setup: the connection to the outside world works by exposing the service’s endpoint. This is a “global” setup of your cluster, hence it is outside the scope of CoCalc’s Helm chart.
For the NGINX ingress controller, the TCP service of ssh-gateway must be added to the tcp-services configmap.
If you’re using its Helm chart, see /ingress-nginx/values.yaml for a working example, under tcp: {...}.
Note
If you had already set up a LoadBalancer and then update, it might not pick up the new configuration that includes port 22. The easiest way to fix this is to delete the LoadBalancer and let it be recreated.
Datastore#
If enabled (via global.datastore.enabled), in the Project Settings a configuration panel “Cloud storage & remote filesystems” appears. This allows users to mount remote filesystems into the particular project. This supports SSHFS, AWS S3 and Google Cloud Storage.
Under the hood, “Datastore” is a sidecar for the project, which mounts these filesystems according to their configuration in /data/[name] (where name is the name of the datastore). This mountpoint is propagated to the project container from the host. If the path ~/data is not taken, the project will automatically create a symlink to that global directory. Therefore, collaborators of the project can use and see this filesystem, but they do not know the secret, don’t see the raw configuration files, and also cannot interact with the actual process doing the FUSE mount. The “secret” is hidden in the user interface and is not sent to the web client.
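The “create a symlink unless the path is taken” behavior can be sketched in a few lines of Python. This is an illustration under the assumption that the link is named data inside the home directory and points at /data. The function name is hypothetical and this is not the project's actual code.

```python
import os
import tempfile


def ensure_data_symlink(home: str, target: str = "/data") -> bool:
    """Create `<home>/data` -> `target` unless that path is already taken.

    Returns True if a symlink was created, False if the path was taken.
    """
    link = os.path.join(home, "data")
    if os.path.lexists(link):  # also catches dangling symlinks
        return False
    os.symlink(target, link)
    return True


# Demo in a throwaway directory standing in for a project's home.
home = tempfile.mkdtemp()
created_first = ensure_data_symlink(home)
created_second = ensure_data_symlink(home)
print(created_first, created_second)
```

Using os.path.lexists instead of os.path.exists means an existing but dangling symlink also counts as taken, so nothing a user has placed at that path gets overwritten.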
The “read-only” mode enables the ro mount option for the FUSE mount.
To make the filesystem perform well, it does a bit of caching, but only with a small timeout. This means if you give it a few seconds to read/write sync, it’s possible to do a bit of collaboration via the same mounted filesystem. It’s not really recommended, but possible. Also note that there is filesystem level polling of discovered directories in CoCalc’s projects, which means that remote changes to these files will eventually show up as well and update in an opened editor. Those projects are also cached on CoCalc’s side.
Requests to support other remote filesystems are welcome, and if there is a robust tool and a way to easily configure them, we will certainly consider adding them.
Pro-tip: if a project is set to “Always Running”, you can use the SSHFS configuration in combination with the SSH Gateway to mount a directory from another project. This is a bit of a hack, but it works.