Deployment#
After you have completed the Setup, you can start configuring and deploying your CoCalc Cloud instance.
This consists of a main chart in the /cocalc directory, and a couple of related sub-charts in /cocalc/charts/*.
To deploy your instance, you have to override some global or chart-specific values. For that, maintain your own values file – called my-values.yaml in the following.
We recommend using Git to keep track of your changes.
Please start by studying these general instructions. On top of that, there are also more specific notes for various public cloud providers.
Your my-values.yaml#
In /cocalc there is a central values.yaml file. It defines the configuration for the sub-charts and some global values. All parameters are explained in detail inside that file as comments. Feel free to check out the sub-directories in ./charts in case you want to know more about all the details.
After familiarizing yourself with that, create your own my-values.yaml file. It will override the default values with the ones relevant for your setup via the -f my-values.yaml switch of helm (ref: Helm Install).
To override values in sub-charts, write the values indented under their "sub-chart-name": section. To define global values, list them in the global: section.
For example, to configure the storage backend, the chart files are in /cocalc/charts/storage/, which means these settings come under storage:. There are also global storage settings used by other charts, which come under global.storage:. Learn more about HELM sub-charts and global values.
Note
Regarding YAML, global.storage in the text above means that these values come under the global: section in the my-values.yaml file. In that large indented block is a storage: section, which is indented even further. The actual values are defined inside that double-indented block.
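As a minimal sketch of that nesting (the claimName shown here simply reuses the default name mentioned later in this guide):

global:
  storage:
    data:
      claimName: projects-data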
Note
Feel free to copy cocalc-eu.yaml and use it as a starting point.
Configuration#
Basics#
Set global.dns to your (sub)domain name.
You should start by going through the Site Settings in the global.settings section: give your site a nice name, etc.
Here, you’ll also have to tell the services how to connect to the database in global.database.
Don’t forget to set global.imagePullSecrets if the secret is not regcred (see Private docker registry).
Set global.setup_registration_token to restrict account creation – you probably want that.
Via global.setup_admin: {...} you define your initial admin account credentials.
Tweak the default resource requests and limits of projects by adjusting the global.settings.default_quotas parameters. See Architecture/Project for some context. Beyond that, Resource Management explains how to manage resource allocation for projects.
Peek into cocalc-eu.yaml to see how things are set up for that cluster.
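To put the basics together, a minimal my-values.yaml might look roughly like the following sketch. The top-level keys are the ones discussed above; the concrete values are placeholders, and the contents of global.settings and global.database are only hinted at – check the comments in values.yaml for the exact names:

global:
  dns: cocalc.example.com
  # imagePullSecrets: ...             # only needed if your pull secret is not called "regcred"
  setup_registration_token: "choose-a-long-random-token"
  setup_admin:
    email: your.admin@email.address
    password: R3pLaC3mE
    name: "Your Name"
  settings:
    # site name, default_quotas, ... – see the comments in values.yaml
  database:
    # connection parameters for your PostgreSQL instance – see values.yaml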
Warning
Do not enable the SSH Gateway for the initial installation. It won’t work. Instead, make sure your cluster works well, and then enable it in a subsequent update.
Storage#
Earlier, we discussed the options for setting up storage. Now, let’s see how to configure it!
By default, the HELM chart creates PersistentVolumeClaims for the data and software using the nfs StorageClass. You can configure the storage class and size of the PVCs in the values.yaml file. Look out for a section like this:
storage:
  class: "nfs"
  size:
    software: 10Gi
    data: 10Gi
Alternatively, you can create the PVCs yourself and use them. For that, configure two things:
Don’t create them via the HELM charts, i.e. storage.create: false.
Let CoCalc know about their names. E.g. if they’re called pvc-data and pvc-software, the relevant part of the config file would look like this:
storage:
  create: false

global:
  [...]
  storage:
    data:
      claimName: pvc-data
    software:
      claimName: pvc-software
The default names are projects-data and projects-software.
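If you create the PVCs yourself, a claim for the project data could be sketched as follows. This is only an illustration – the namespace, size and storage class are assumptions that have to match your own setup. Note that the project data volume is shared by many pods (ReadWriteMany), whereas the software volume is typically mounted read-only (compare the access modes in the GKE example below):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-data
  namespace: cocalc            # assumption: the namespace CoCalc is deployed into
spec:
  accessModes:
    - ReadWriteMany            # project data is accessed by many project pods
  resources:
    requests:
      storage: 10Gi            # illustrative size
  storageClassName: nfs        # assumption: an NFS-backed storage class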
Example 1#
If you bring your own NFS server and want to set up everything manually, you could follow the notes on setting up PV/PVC on Azure. This walks you through the steps of creating a PV and PVC for software and project data in two subdirectories. Then it deploys a small setup job, which creates these subdirectories and sets their proper ownership and permissions.
Example 2#
For reference, on a cluster on Google’s GKE, the following PVs, PVCs and StorageClasses exist:
$ kubectl get -o wide pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE VOLUMEMODE
pvc-58ada9eb-220d-46d9-ba5d-31ceb0c0fc45 10Gi ROX Retain Bound cocalc/projects-software nfs 111d Filesystem
pvc-ace52cd2-fb85-4fb9-96f7-19cd9575f5c2 20Gi RWO Retain Bound cocalc/data-nfs-nfs-server-provisioner-0 pd-standard 112d Filesystem
pvc-ceae334d-8f0a-447b-b5c6-fcae6843a498 10Gi RWX Retain Bound cocalc/projects-data nfs 111d Filesystem
$ kubectl get -o wide pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
data-nfs-nfs-server-provisioner-0 Bound pvc-ace52cd2-fb85-4fb9-96f7-19cd9575f5c2 20Gi RWO pd-standard 112d Filesystem
projects-data Bound pvc-ceae334d-8f0a-447b-b5c6-fcae6843a498 10Gi RWX nfs 111d Filesystem
projects-software Bound pvc-58ada9eb-220d-46d9-ba5d-31ceb0c0fc45 10Gi ROX nfs 111d Filesystem
$ kubectl get -o wide storageclass nfs
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs cluster.local/nfs-nfs-server-provisioner Retain Immediate true 111d
$ kubectl get -o wide storageclass pd-standard
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
pd-standard pd.csi.storage.gke.io Retain Immediate true 111d
Note
The nfs storage class is created by the nfs-ganesha-server-and-external-provisioner helm chart. It uses a pd-standard disk to store the data.
projects-data and projects-software are provided by that NFS service.
There is no PV/PVC for a database, because this cluster uses GCP’s managed PostgreSQL service.
Advanced#
Here is some context for more advanced configuration options.
General#
global.registry is the upstream docker registry. It’s used for pulling images. You can change it to your own registry if you mirror all your images. Note: if you just customize the project’s software environment, then you have to change the manage.project.registry setting instead!
global.imagePullSecrets: see Private docker registry.
global.setup_admin: see Admin Setup, e.g.:

global:
  setup_admin:
    email: your.admin@email.address
    password: R3pLaC3mE
    name: "Your Name"

You can also leave out the password and set HELM chart params on the command line via helm [...] --set global.setup_admin.password=[password].
global.setup_registration_token: if your server is publicly available, you probably don’t want anyone to be able to create an account. This sets an initial token, which a user must know in order to create an account. It does not affect SSO logins, because with them you’re already in control of who is allowed to get access. See Admin Setup for more.
global.kubectl is the version tag string of an image. It’s used for running Jobs that need to run kubectl commands. The version should roughly match the version of the Kubernetes cluster’s API server you’re running.
global.ingress: this is used to populate the Ingress rules. Look at the letsencrypt/README.md file for more details. Obviously, this has to match whatever you have set up in Networking earlier.
global.networkingConfiguration allows you to disable all Ingress or NetworkPolicy rules. This is useful if you have a cluster with a different networking setup.
global.datastore: see Datastore.
global.priorityClasses and global.podDisruptionBudgets: if enabled, this defines PriorityClasses to make some Pods more important than others, while the Pod Disruption Budgets define how many pods of a replicated service may be interrupted during maintenance, cluster changes, etc.
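For instance, pinning the registry and the kubectl image tag in my-values.yaml could look like this sketch (both values are purely illustrative):

global:
  registry: registry.example.com/cocalc    # your own mirror of the upstream images
  kubectl: "1.27"                           # pick a tag roughly matching your cluster’s API server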
Site Settings#
Settings in global.settings can either be configured as Admin in Admin → Site Settings or via environment variables, which are picked up by the hub-websocket services. Setting them in your values file makes them read-only in the interface, and hence you can track all changes explicitly in your Git repository.
In the future, there will be a more general Admin Guide.
Settings are explained in the values.yaml file, though.
Storage#
The storage: section is explained in Storage.
Project Pods#
manage.timeout_pending_projects_min: All Manage services are responsible for starting and stopping project pods. If for some reason such a project pod is in the state Pending for too long, it will be killed. This is the timeout in minutes.
manage.project.tag is the default image tag on which a project will run. It’s possible to customize the software environment in various ways, including changing the default project image. This is where the upstream project image tag is set.
The Prepull service can be enabled or disabled via manage.prepull.enabled. Associated with it are the taints and labels for project nodes, which are defined in manage.project.dedicatedProjectNodesTaint and *.dedicatedProjectNodesLabel.
manage.mem_request_adjustment: basically a safeguard to avoid too large memory requests by project pods. It is applied to the computed memory request of a project’s pod quotas – see Scaling/Projects for more context. The lim is a hard upper bound in MiB, pct is dynamic in percent of the memory limit.
manage.watchdog_stream_s: “manage” listens to a stream of changes coming from your Kubernetes API. If it is behind a LoadBalancer, this stream might be interrupted without “manage” even noticing. So, if no new data comes in for that amount of time, the stream is restarted.
manage.project.serviceAccountName customizes the ServiceAccount used by the project (if this isn’t set, it will not be set in the template and hence the default is used).
Settings related to the hub’s resources and multiple_replicas are explained in scaling frontend service.
Projects are prohibited from accessing internal services. That’s accomplished using a set of NetworkPolicy rules. If you want to allow projects to access internal services, you can set networkPolicy.allowProjectEgress to a list of rules. Check out the template file cocalc/templates/network-policies.yaml for more details about this.
manage.project.init: this is a setup script, which is sourced right before the local server of a project is started. It allows you to tune parameters, change environment variables, or call scripts from your compute environment. Use your powers wisely! The /cocalc/values.yaml file shows two configurable aspects. Tune them to your needs based on your experience:
Blobstore: this is a local store of data which has been generated in Jupyter Notebooks – in particular, images. They are served from a small web server for efficiency, and e.g. TimeTravel uses them to show old plots that are no longer part of the notebook. There are two implementations: an SQLite3-based one (old, used to be the only implementation) and a file-based store (new, now the default, made especially for NFS-backed file systems, which are known to cause problems with sqlite databases). You can select the implementation by setting COCALC_JUPYTER_BLOBSTORE_IMPL=sqlite or disk. Disk is recommended and the default. The other tuning parameters for the disk-based store are a size limit and a number of files:
JUPYTER_BLOBSTORE_DISK_PRUNE_SIZE_MB=100: prune disk usage of the jupyter blobstore
JUPYTER_BLOBSTORE_DISK_PRUNE_ENTRIES=1000: max number of files in that blobstore cache
Jupyter Kernel Pool: This is a mechanism to spin up one or more kernels without a notebook, to make them ready for use. E.g. when you restart a kernel, it will pick one from the pool and you see the notebook running without a delay. The tradeoff is memory usage. By default, one kernel will be in the pool. You can tune this as well:
COCALC_JUPYTER_POOL_SIZE=1: size of the pool, set to 0 to disable it
COCALC_JUPYTER_POOL_TIMEOUT_S=900: after that time, old kernels in the pool are cleaned up
COCALC_JUPYTER_POOL_LAUNCH_DELAY_MS=5000: delay before spawning an additional kernel
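To illustrate, a sketch of some of these manage settings in my-values.yaml could look as follows. The numbers are arbitrary examples, and the assumption that manage.project.init is a multi-line string is based on the description above – verify it against /cocalc/values.yaml:

manage:
  timeout_pending_projects_min: 10          # illustrative: kill project pods stuck in Pending
  mem_request_adjustment:
    lim: 1000                               # illustrative hard upper bound in MiB
    pct: 50                                 # illustrative percentage of the memory limit
  prepull:
    enabled: true
  project:
    tag: "some-project-image-tag"           # illustrative upstream project image tag
    init: |
      # sourced right before a project's local server starts
      export COCALC_JUPYTER_BLOBSTORE_IMPL=disk
      export COCALC_JUPYTER_POOL_SIZE=1
      export COCALC_JUPYTER_POOL_TIMEOUT_S=900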
Software Environments#
In a nutshell, the global.software dict defines default settings and a list of environments, where each one of them consists of a title, description, docker image tag and registry. Users can then choose from these environments when creating a new project, or when changing the software environment of an existing project.
See Software Environment for more context about how to install additional software.
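As a sketch of what such a definition could look like – the nesting and the key names of each entry (title, description, tag, registry) are assumptions derived from the description above, so compare with values.yaml:

global:
  software:
    environments:
      default-env:                          # illustrative name of an environment
        title: "Default software environment"
        description: "The standard CoCalc software stack"
        tag: "some-image-tag"               # illustrative docker image tag
        registry: registry.example.com/cocalc-project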
Single Sign On#
Besides signing up via email/password credentials (optionally restricted by the setup_registration_token mentioned above), users can also sign in via SSO. This is configured in global.sso. Each entry in that dict is the name of an SSO provider. See values.yaml for a detailed breakdown. In a nutshell, you have to set the type and other general parameters in the conf section, while user-facing parameters like a name/icon are set in the info section.
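A rough sketch of one such entry – the provider name, the type value and the placeholder fields inside conf and info are illustrative, so check values.yaml for what is actually supported:

global:
  sso:
    my-university:                # the entry name identifies the SSO provider
      conf:
        type: oauth2              # illustrative type, plus whatever that type requires
        # client id, secret, endpoints, ... – see values.yaml
      info:
        # user-facing parameters such as a display name and an icon – see values.yaml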
Miscellaneous#
hub.deleteProjectsIntervalH: there is a dedicated service to “unlink” a project that has been marked as deleted. The data itself is retained, though. You have to periodically check in the database which projects are deleted and remove the associated files.
hub.debug, manage.debug, manage.project.debug, …: tweak the $DEBUG variable used with Debug JS, in order to control how much information is logged to the Pods.
ssh-gateway.recent_projects: to make a project start automatically when connecting via the SSH Gateway, there must be a prepared entry for the project – a limitation of how this is implemented. This setting controls how many projects are considered “recent” and hence are prepared for that. Increase this to e.g. 1 year if you do not have many projects (say, fewer than 1000 per year). Otherwise, you have to start the project first, wait up to loop_delay_secs, and then try to connect via SSH.
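A sketch of these values in my-values.yaml – the interval, the $DEBUG pattern and the time span are only examples:

hub:
  deleteProjectsIntervalH: 24         # illustrative: check for deleted projects once a day
  debug: "cocalc:*"                   # illustrative $DEBUG pattern (Debug JS syntax)
manage:
  debug: "cocalc:*"
ssh-gateway:
  recent_projects: "1 year"           # prepare SSH entries for projects active in the last year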
Installation#
After setting up your config file my-values.yaml, run
helm install cocalc ~/path/to/cocalc-cloud-helm/cocalc --timeout 20m -f my-values.yaml
in your own directory. What’s happening is a standard Helm installation.
This means the HELM chart and sub-charts in the /cocalc directory are rendered and populated with all configuration parameters – your my-values.yaml parameters are merged into the default parameters.
The resulting Kubernetes YAML config files are installed under the HELM deployment name cocalc in the current namespace.
Since pulling the images and running some setup jobs could take a while, the timeout is increased.
If you want to check the status on HELM’s side, you can run:
helm status cocalc
Note
If something goes horribly wrong, you can always uninstall the deployment via helm uninstall cocalc.
The only caveat: if PV/PVCs have been created, they might not be deleted automatically.
Admin Setup#
After the installation is complete, you should be able to access the CoCalc frontend via the DNS name you specified in your my-values.yaml.
Use the credentials of the initial admin user you configured in your my-values.yaml to sign in. Change your admin password in the account settings; it won’t be overwritten.
I would also recommend removing your credentials by changing that field back to global.setup_admin: {}. Or, you could use this to create a second admin account during your next update.
If you specified a global.setup_registration_token (which is highly recommended!), it will be set up initially. Open “Admin” → “Registration Token” to set up your own, disable this one, etc.
Testing#
Of course, first just check if the Kubernetes Pods are showing up and running. See Troubleshooting for some ideas on how to investigate problems.
Once the services are running, there are very basic tests to check if they return some information:
helm test cocalc
(where cocalc is the name of your CoCalc Cloud deployment).
This starts a few Kubernetes Jobs and checks if they succeed.
Beyond that, a good end-to-end test is to:
Open the website, sign in as admin, and go to your projects.
Open a project of yours, or create one.
Open or create a terminal.term file and run basic Linux commands like uptime or htop.
In that terminal, check if the CoCalc-related environment variables make sense: $ env | grep -i COCALC
Open or create a Jupyter Notebook, select a popular kernel like “Python 3” and evaluate a cell with code like:
import sys
sys.version
or:
import pandas as pd
print(pd.__version__)
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df.describe()
Similar for R, Octave and Sage (if installed).
For Sage, make sure evaluating code works. If it doesn’t, try running sage in a Terminal; if you get an “ILLEGAL INSTRUCTION” error, that means your hardware is too old for the Sage binary. Contact us.
Create a latex.tex document and check if it compiles.
Once some files are opened in your project, hit the refresh button of the browser. The files should still be there after reloading, ready to be edited.
Updating#
Make sure to update regularly!