Amazon AWS#

This guide helps you set up CoCalc Cloud on AWS, using AWS’s EKS Kubernetes service to run CoCalc Cloud.


As of 2022-07-03, there is no out-of-the-box support for EKS. The following notes are based on the experience of setting everything up, and they certainly assume you have experience with AWS and Kubernetes. Some details could be out of date, but the general idea should still be valid.

There is also a guide for setting up CoCalc Cloud on Google GCP.

This also assumes you have checked the general documentation for the CoCalc Cloud HELM deployment: e.g. that you have set up your own values.yaml file somewhere to override configuration values, that you know how to set up a secret storing the PostgreSQL database password, etc.

For more details look into Setup.

EKS configuration#

Set up your EKS cluster and make sure you can communicate with it via your local kubectl client, etc. E.g. run

aws eks --region [your region] update-kubeconfig --name [name of cluster]

to get started.
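If the kubeconfig update worked, a quick sanity check confirms that kubectl can reach the cluster (these commands require a working cluster, of course):

    # Verify kubectl talks to the new EKS cluster
    kubectl cluster-info          # prints the API server endpoint
    kubectl get nodes -o wide     # lists the worker nodes, once node groups exist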

Node Groups#

EKS should be configured to run two groups of nodes:

  • Services: the service nodes run hubs, manage, static, etc. To get started, two small nodes should be fine.

  • Projects: these nodes will host the projects. They should be configured with a certain taint and labels right when they’re created.

Here is a minimal example to get started:

  1. “service”: this was good enough for a minimal setup:

    • 2x t3.medium (or t3a.medium), spot price, and 64GB of root disk (the project image is large!)

    • NOTE: “t3” might be a bad choice, because there is a low limit on the number of IPs per node. Also, some features are not supported for t3 nodes (but they are not used at all right now; something to explore later on)

    • disk: 50GiB

    • set the Kubernetes label to cocalc-role=services (that’s key=value)

    • scaling: 2/2/2, such that you have two such nodes running.

  2. “project”: if you expand this to have separate nodes for the projects, create nodes with rather more RAM than CPU, because memory is not elastic, but CPU is. Usually, in interactive usage, a project spends most of its time waiting for user input.

    • machine: t3.medium, disk: 100GiB. (the project image is large, and we might have to store two or more at the same time!)

    • then, to make full use of the prepull service, activate it by setting it to “true” in your values.yaml configuration file, and keep the label/taint values at their defaults:

    • set the Kubernetes label to cocalc-role=projects (that’s key=value)

    • and the initial Kubernetes taints (format: key=value:effect) to:

      • cocalc-projects-init=false:NoExecute

      • cocalc-projects=init:NoSchedule

    • The taints above signal to the prepull service that the node has not yet been initialized (the daemon set will start pods on such nodes). Once the prepull pod is done, it changes the taints to allow regular projects to run on the node, and it also removes itself from that node. If you need to audit what prepull does (which might be wise, since it needs cluster-wide permissions to change node taints), please check the included script.

    • scaling: 1/2/1 or whatever you need
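The two node groups above can also be created declaratively with eksctl. Below is a minimal sketch; the cluster name, region, instance types, and capacities are assumptions to adapt to your setup:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: cocalc-cloud        # assumed cluster name
  region: us-east-1         # assumed region
managedNodeGroups:
  - name: services
    instanceType: t3.medium
    desiredCapacity: 2
    volumeSize: 64          # GiB, the project image is large
    spot: true
    labels:
      cocalc-role: services
  - name: projects
    instanceType: t3.medium
    desiredCapacity: 1
    volumeSize: 100
    labels:
      cocalc-role: projects
    taints:
      - key: cocalc-projects-init
        value: "false"
        effect: NoExecute
      - key: cocalc-projects
        value: "init"
        effect: NoSchedule
```

Apply it via eksctl create nodegroup --config-file=[file], or bake it into the initial eksctl create cluster call.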


The projects and some services need access to a storage volume that allows ReadWriteMany. Commonly, this could be done via an NFS server, but with AWS there is EFS – much better! To get EFS running in your EKS cluster, follow the instructions. In particular, I had to install eksctl, install an “OIDC” provider, then create a service account, etc.

The next step was to install the EFS CSI driver via HELM, actually create an EFS filesystem, give it access to all subnets (in my case there were 3), create a mount target, etc.
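In my case, the sequence looked roughly like this; cluster name, region, and IDs are placeholders, and the EFS CSI driver documentation remains the authoritative reference:

    # OIDC provider for the cluster (needed for the driver's service account)
    eksctl utils associate-iam-oidc-provider --cluster [name of cluster] --approve

    # install the EFS CSI driver via HELM
    helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
    helm upgrade --install aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
      --namespace kube-system

    # create the filesystem and one mount target per subnet
    aws efs create-file-system --region [your region] --tags Key=Name,Value=cocalc
    aws efs create-mount-target --file-system-id fs-[INSERT ID] \
      --subnet-id subnet-[INSERT ID] --security-groups sg-[INSERT ID]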

Now the important part: this EFS filesystem’s “access point” is only for root, by default. To make this work with CoCalc’s services, it must be for the user/group with ID 2001:2001. To accomplish this, create a new StorageClass (you can choose the basePath as you wish; it keeps this instance of CoCalc separate from other instances or other data you have on EFS):

  1. Create a file sc-2001.yaml with the following content:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: efs-2001
    provisioner: efs.csi.aws.com
    parameters:
      provisioningMode: efs-ap
      fileSystemId: fs-[INSERT ID]
      directoryPerms: "700"
      uid: "2001"
      gid: "2001"
      basePath: "/cocalc1"
  2. Apply: kubectl apply -f sc-2001.yaml.

  3. Check: kubectl get sc should list efs-2001.

  4. Edit your values.yaml file: in the section for storage, enter this to reference the new StorageClass:

  class: "efs-2001"
  software: 10Gi
  data: 10Gi

which in turn will create the required PersistentVolumes and Claims. The requested sizes don’t matter: EFS is elastic, effectively unlimited.
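To verify that dynamic provisioning through efs-2001 actually works, a throwaway PVC can help (the name efs-test is just for this check):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-2001
  resources:
    requests:
      storage: 1Gi
```

After kubectl apply -f, the claim should reach the Bound state within seconds; delete it again once the check has passed.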

Additional hints:

  1. You can change the Reclaim Policy to Retain, so that files aren’t accidentally deleted if these PVs are removed.

  2. Set up life-cycle management of EFS to move unused files to long-term (cheaper) storage and back when they’re accessed again, e.g.:

    • Transition into IA: 60 days since last access

    • Transition out of IA: On first access
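Both hints can be applied from the command line. A sketch, where the PV name and filesystem ID are placeholders:

    # 1. keep the underlying data when a PV object is deleted
    kubectl patch pv [pv-name] \
      -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

    # 2. EFS life-cycle: into IA after 60 days, back out on first access
    aws efs put-lifecycle-configuration --file-system-id fs-[INSERT ID] \
      --lifecycle-policies \
      '[{"TransitionToIA":"AFTER_60_DAYS"},{"TransitionToPrimaryStorageClass":"AFTER_1_ACCESS"}]'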

Database / RDS PostgreSQL#

You can either run your own PostgreSQL server, or use the managed one from AWS: RDS PostgreSQL. Version 13 should be fine; you can also go ahead and use version 14.

Basically, the EKS cluster must be able to access the database (networking setup, security groups), and the database password will be stored in a Kubernetes secret (see global.database.secretName in cocalc/values.yaml).

Refer to the general instructions for the database on how to do this; i.e. kubectl create secret generic postgresql-password --from-literal=postgresql-password=$PASSWORD should do the trick.


AWS Security Groups#

At this point, your service consists of a database, the EKS cluster (with its nodes and its own VPC network), and the EFS filesystem. However, by default AWS isolates everything from everything else. You have to make sure there is a suitable setup of Security Groups that allows the EKS nodes to access the database and the EFS filesystem. This guide doesn’t contain a full description of how to do this, since it certainly depends on your overall usage of AWS. The common symptom is that pods in EKS can’t access the database or the EFS filesystem, so you see timeout errors when trying to connect, etc. EFS problems manifest in pods not being able to initialize because they can’t attach their volumes, while database problems manifest in the logs of the “hub-websocket” pods (hub-websocket is responsible for setting up all tables/schemas in the database, hence this is the one to check first).
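When debugging such connectivity problems, these commands are a reasonable starting point (the deployment and pod names are assumptions based on a default installation):

    # database side: hub-websocket creates the schema, so its logs fail first
    kubectl logs deploy/hub-websocket | tail -n 50

    # EFS side: pods stuck in Init/ContainerCreating usually mean mount problems
    kubectl get pods
    kubectl describe pod [stuck pod]   # check the Events section for mount errors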



Ingress / Load Balancer#

In the CoCalc HELM deployment, there are two ingress.yaml configurations, which are designed for K8S’s nginx ingress controller. The directory ingress-nginx/ has more details.

But just deploying it is not enough: the nginx ingress controller needs to be able to install a LoadBalancer. That’s done via an AWS Load Balancer Controller.
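A sketch of the two installs via HELM; the chart repositories are the official ones, and the cluster name is a placeholder:

    # nginx ingress controller
    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
      --namespace ingress-nginx --create-namespace

    # AWS Load Balancer Controller (provisions the actual load balancer)
    helm repo add eks https://aws.github.io/eks-charts
    helm upgrade --install aws-load-balancer-controller eks/aws-load-balancer-controller \
      --namespace kube-system --set clusterName=[name of cluster]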

Once everything is running, you can check up on the Load Balancer via the AWS console: EC2 (new experience) → Load Balancing → Load balancer.

There, in the Basic Configuration, you see the DNS name – that’s the same one you get via kubectl get -A svc.

Once you have that (lengthy) automatically generated DNS name, copy it and set up your own sub-domain at your DNS provider: basically, add a CNAME entry pointing to this DNS name.
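After the CNAME entry has propagated, you can confirm it resolves to the load balancer (the domain name here is a placeholder):

    dig +short cocalc.example.com CNAME
    # should print the lengthy *.elb.amazonaws.com name from the AWS console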

What’s unclear to me: this did create a “classic” (deprecated) load balancer. Why not a more modern L4 network load balancer? It must be caused by whatever the load balancer controller does by default.
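One thing worth trying here (untested in this setup): the AWS integration chooses the load balancer type based on an annotation on the controller’s Service, so requesting an NLB for the nginx controller might avoid the classic one:

    helm upgrade ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx \
      --set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-type"=nlb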