Kubegres

Using the Kubegres operator, we are going to create a cluster of 3 PostgreSql instances that replicate their data in real time.

1) Install Kubegres operator

The installation needs to be done once. Run the following command in a Kubernetes cluster:

kubectl apply -f https://raw.githubusercontent.com/reactive-tech/kubegres/v1.19/kubegres.yaml

During the installation, "Kubegres" creates the namespace "kubegres-system" where it installs its components. Check their status as follows:

kubectl get all -n kubegres-system

We should see the controller "kubegres-controller-manager" in a running state:

NAME                                                     STATUS
pod/kubegres-controller-manager-999786dd6-74tmb          Running

NAME                                                    TYPE
service/kubegres-controller-manager-metrics-service     ClusterIP

NAME                                                    READY
deployment.apps/kubegres-controller-manager             1/1

Once it is running, we can check the controller's logs, as follows:

kubectl logs pod/kubegres-controller-manager-999786dd6-74tmb -c manager -n kubegres-system -f
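The pod name above carries a generated suffix (here "999786dd6-74tmb") which differs in every installation. As a minimal sketch, the same checks can target the Deployment shown in the output instead. The function names and the 120s timeout are illustrative choices, not part of Kubegres.

```shell
# Wait until the Kubegres controller Deployment has finished rolling out.
# The 120s timeout is an arbitrary illustrative value.
wait_for_kubegres_controller() {
  kubectl rollout status deployment/kubegres-controller-manager \
    -n kubegres-system --timeout=120s
}

# Follow the controller's logs without copying the generated pod name.
follow_kubegres_controller_logs() {
  kubectl logs deployment/kubegres-controller-manager \
    -c manager -n kubegres-system -f
}
```

Calling "wait_for_kubegres_controller" before "follow_kubegres_controller_logs" avoids asking for logs from a container that has not started yet.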

2) Check storage class

Kubegres requires a storage class in order to create a PV (Persistent Volume) and a PVC (Persistent Volume Claim) for each instance of PostgreSql. Run the following command to check that a storage class exists in the Kubernetes cluster:

kubectl get sc

For example, using Kind as a local Kubernetes cluster, the above command outputs:

NAME                 PROVISIONER             RECLAIMPOLICY
standard (default)   rancher.io/local-path   Delete
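In a script, the same check can be automated. The sketch below relies on the "(default)" marker that kubectl prints next to the default storage class, as in the output above. The function names are hypothetical.

```shell
# Print the name of the default storage class, if there is one.
# Relies on kubectl marking the default class with "(default)" in its table output.
default_storage_class() {
  kubectl get sc --no-headers | awk '$2 == "(default)" { print $1 }'
}

# Succeed only when a default storage class exists.
has_default_storage_class() {
  [ -n "$(default_storage_class)" ]
}
```

With the Kind example above, "default_storage_class" would print "standard". If "has_default_storage_class" fails, a storage class has to be named explicitly in the Kubegres YAML.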

3) Create a Secret resource

Before creating a cluster of PostgreSql, we need to create a Secret resource to store the passwords of the PostgreSql super user and of the replication user.

Create a file:

vi my-postgres-secret.yaml

Add the following contents:

apiVersion: v1
kind: Secret
metadata:
  name: mypostgres-secret
  namespace: default
type: Opaque
stringData:
  superUserPassword: postgresSuperUserPsw
  replicationUserPassword: postgresReplicaPsw

Apply the changes:

kubectl apply -f my-postgres-secret.yaml

Note that in your Secret YAML, you can choose any names for the keys "superUserPassword" and "replicationUserPassword", as long as the same key names are used wherever the Secret is referenced. The password values in the YAML are for example purposes only.
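As an alternative to writing the passwords into a file, the same Secret can be created directly from the command line. This is a minimal sketch: the function name is hypothetical, and the Secret name, namespace and keys are the ones from the YAML above.

```shell
# Create the Secret from literals instead of a YAML file.
# $1 = super user password, $2 = replication user password.
create_postgres_secret() {
  kubectl create secret generic mypostgres-secret \
    --namespace default \
    --from-literal=superUserPassword="$1" \
    --from-literal=replicationUserPassword="$2"
}
```

For example: create_postgres_secret 'postgresSuperUserPsw' 'postgresReplicaPsw'. Keep in mind that passwords typed this way may end up in the shell history.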

4) Create a cluster of PostgreSql instances

To create a cluster of PostgreSql, we need to create a Kubegres resource. The Kubegres YAML below contains the minimum required configuration to set up:

a cluster of PostgreSql pods running an official PostgreSql Docker container (version 12.4 or higher)
the property "replicas: 3" means that Kubegres will create 1 Primary PostgreSql pod and 2 Replica PostgreSql pods
the data will be replicated in real time from the Primary PostgreSql pod to the 2 Replica PostgreSql pods

Create a file:

vi my-postgres.yaml

Add the following contents:

apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: mypostgres
  namespace: default

spec:

   replicas: 3
   image: postgres:17.2

   database:
      size: 200Mi

   env:
      - name: POSTGRES_PASSWORD
        valueFrom:
           secretKeyRef:
              name: mypostgres-secret
              key: superUserPassword

      - name: POSTGRES_REPLICATION_PASSWORD
        valueFrom:
           secretKeyRef:
              name: mypostgres-secret
              key: replicationUserPassword

In the YAML above, under the field "spec.database", we only specified the field "size". The fields "storageClassName" and "volumeMount" can also be specified. There are more details about those fields in this page.
If you do not specify the field "storageClassName", Kubegres finds the default storage class of the Kubernetes cluster and assigns it. You can also set that field manually. Once the storageClass is assigned, Kubernetes automatically provisions a PV and a PVC for each Postgres Pod.

The YAML above contains the minimum required configuration to deploy a cluster of PostgreSql. More configuration options are available, such as adding annotations, setting the name of a storageClass, specifying a private repo from which to retrieve the container's image, and enabling backup. Please see the full list of options here.

Apply the changes:

kubectl apply -f my-postgres.yaml

A cluster of 3 PostgreSql instances is now being deployed in Kubernetes. Each PostgreSql instance is a pod. Watch their status until they are all "Running":

kubectl get pods -o wide -w

Kubegres logs events about the state of each PostgreSql cluster. Those events provide valuable information and are very useful for debugging any issue. We can access them as follows:

kubectl get events
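When many events accumulate, sorting and filtering them makes debugging easier. The sketch below uses standard kubectl options. The function names are hypothetical, and the filter assumes that the events of interest reference the resource by its name.

```shell
# List events in a namespace, oldest first, so the latest entries end up at the bottom.
# $1 = namespace (defaults to "default").
recent_events() {
  kubectl get events -n "${1:-default}" --sort-by=.metadata.creationTimestamp
}

# Show only the events whose involved object has the given name.
# $1 = resource name (e.g. a pod name), $2 = namespace (defaults to "default").
events_for() {
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.metadata.creationTimestamp
}
```

For example, "events_for mypostgres-1-0" narrows the list down to the events of the first pod.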

5) The created resources

The Kubegres operator then creates a cluster of PostgreSql pods. Let's check the created resources:

kubectl get pod,statefulset,svc,configmap,pv,pvc -o wide

Example of output:

NAME                 READY   STATUS    NODE
pod/mypostgres-1-0   1/1     Running   worker1
pod/mypostgres-2-0   1/1     Running   worker2
pod/mypostgres-3-0   1/1     Running   worker4

NAME                            READY
statefulset.apps/mypostgres-1   1/1
statefulset.apps/mypostgres-2   1/1
statefulset.apps/mypostgres-3   1/1

NAME                         TYPE
service/mypostgres           ClusterIP
service/mypostgres-replica   ClusterIP

NAME
configmap/base-kubegres-config

NAME                          CAPACITY
persistentvolume/pvc-838...   200Mi
persistentvolume/pvc-da6...   200Mi
persistentvolume/pvc-e25...   200Mi

NAME                                               CAPACITY
persistentvolumeclaim/postgres-db-mypostgres-1-0   200Mi
persistentvolumeclaim/postgres-db-mypostgres-2-0   200Mi
persistentvolumeclaim/postgres-db-mypostgres-3-0   200Mi

Kubegres has created 3 pods: "mypostgres-1-0", "mypostgres-2-0" and "mypostgres-3-0". Each of those pods runs a PostgreSql DB and is associated with the StatefulSet resource of the corresponding name (for example, the pod "mypostgres-1-0" belongs to the StatefulSet "mypostgres-1"). Each pod is deployed on a distinct node to ensure strong resiliency in the case of a node failure.
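The spread of pods across nodes can be verified with a short script. This is a sketch under one assumption: it selects pods by the name prefix "<cluster name>-" seen in the example output, so unrelated pods sharing that prefix would be counted as well. The function names are hypothetical.

```shell
# Print "<pod> <node>" for every pod in the namespace ($1, defaults to "default").
pod_nodes() {
  kubectl get pods -n "${1:-default}" --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
}

# Count the distinct nodes hosting pods whose name starts with "<cluster>-".
# $1 = Kubegres resource name, $2 = namespace (defaults to "default").
distinct_node_count() {
  pod_nodes "${2:-default}" |
    awk -v prefix="$1-" 'index($1, prefix) == 1 { nodes[$2] = 1 }
                         END { n = 0; for (k in nodes) n++; print n }'
}
```

With the example output above, "distinct_node_count mypostgres" would report 3, one node per PostgreSql pod.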

Moreover, it created 2 Kubernetes ClusterIP services: "mypostgres" and "mypostgres-replica". Those services allow client apps to access the Primary and the Replica instances respectively. There are more details about them in the next section below.

Kubegres created a base ConfigMap named "base-kubegres-config". It does that in each namespace where Kubegres resources are running. For more details about the base ConfigMap, please see the page Override the default configurations.

And finally, Kubegres provisioned one PV and one PVC for each Postgres Pod using the StorageClass.

Kubegres uses predefined templates to create Kubernetes resources. Those templates are available in GitHub.

From that point, we have a resilient cluster of 3 PostgreSql databases. It is time to connect a client app to that PostgreSql cluster.

6) Connect client apps to PostgreSql

Based on the Kubegres YAML that we have created, a client app located inside the same Kubernetes cluster would use the following configurations to connect to a PostgreSql database:

host: mypostgres
port: 5432
username: postgres
password: [value of mypostgres-secret/superUserPassword]
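The password value is stored base64-encoded in the Secret. As a minimal sketch, it can be read back as follows. The function name is hypothetical; the Secret name and key are the ones created earlier.

```shell
# Print the decoded value of one key of a Secret.
# $1 = secret name, $2 = key, $3 = namespace (defaults to "default").
secret_value() {
  kubectl get secret "$1" -n "${3:-default}" -o "jsonpath={.data.$2}" | base64 --decode
}
```

For example, "secret_value mypostgres-secret superUserPassword" prints the super user password set in the Secret YAML.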

Host

In this example, Kubegres created 2 Kubernetes Headless services (of default type ClusterIP) using the name defined in YAML (e.g. "mypostgres"):

a Kubernetes service "mypostgres" giving access to the Primary PostgreSql instance
a Kubernetes service "mypostgres-replica" giving access to the Replica PostgreSql instances

Consequently, a client app running inside the Kubernetes cluster would use the hostname "mypostgres" to connect to the Primary PostgreSql for read and write requests. Optionally, it can also use the hostname "mypostgres-replica" to connect to any of the available Replica PostgreSql instances for read requests.
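To see the difference between the two hostnames, we can run a query through each of them from a temporary pod. This is a sketch for a quick manual check only: the pod name "psql-client" and the function name are hypothetical, and passing the password as an environment variable makes it visible in the pod's definition.

```shell
# Run one SQL statement through a throwaway client pod inside the cluster.
# $1 = host (service name), $2 = super user password, $3 = SQL statement.
# The pod name "psql-client" is a hypothetical choice.
run_sql() {
  kubectl run psql-client --rm -i --restart=Never \
    --image=postgres:17.2 --env="PGPASSWORD=$2" -- \
    psql -h "$1" -U postgres -t -A -c "$3"
}
```

The PostgreSql function pg_is_in_recovery() returns false on a Primary and true on a Replica. Running run_sql mypostgres "$PASSWORD" 'SELECT pg_is_in_recovery();' and then the same statement against "mypostgres-replica" should therefore report "f" and "t" respectively, next to kubectl's own messages about the temporary pod.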

This approach allows the client apps to access the PostgreSql instances without any knowledge of IP addresses.

Note that because Kubegres creates Headless services and configures them with selectors, no cluster IP addresses are allocated and the DNS resolves directly to the IPs of the Postgres Pods.
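A headless service can be recognised by its "clusterIP" field, which Kubernetes sets to "None". The sketch below checks that field. The function names are hypothetical.

```shell
# Print the clusterIP field of a Service; a headless Service reports "None".
# $1 = service name, $2 = namespace (defaults to "default").
service_cluster_ip() {
  kubectl get svc "$1" -n "${2:-default}" -o 'jsonpath={.spec.clusterIP}'
}

# Succeed only when the given Service is headless.
is_headless() {
  [ "$(service_cluster_ip "$1" "$2")" = "None" ]
}
```

For example, "is_headless mypostgres && echo headless" can be run for both "mypostgres" and "mypostgres-replica".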

Please see the following diagram showing the 2 services that are created by Kubegres based on the YAML above:

If the Primary PostgreSql crashes, the automatic failover process will promote a Replica PostgreSql as the new Primary. This process will be transparent for client apps as long as they are using the service name (e.g. "mypostgres") as the hostname. More details about this topic are available in the doc Replication and failover.

Port

By default, PostgreSql is accessible via the port 5432. It is possible to modify this by adding the property "port" in the YAML above. All possible YAML properties are defined in the doc all properties explained.

Username and Password

By default, the only available user is "postgres" which is a super user. Consequently, for your custom databases it is recommended to create an additional user with limited permissions, for example with specific access permissions to a set of database(s). It is possible to do so by creating:

an environment variable which contains the password of your custom PostgreSql user, as explained here.
a ConfigMap where we can override the bash script "primary_init_script.sh" as shown here. In that bash script, it is possible to execute any SQL queries to create custom database(s), user(s) and anything else required to initialise PostgreSql.
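To give an idea of what such an init script could contain, here is a minimal sketch of a shell function creating a database and a user. It is not the script shipped with Kubegres. The function name and its arguments are hypothetical, and it assumes that "psql" is available and that POSTGRES_USER and POSTGRES_DB, the variables used by the official PostgreSql image, are set (falling back to "postgres" otherwise).

```shell
# Create an application database and a user with privileges on that database.
# $1 = database name, $2 = user name, $3 = password for that user.
# Values are interpolated into the SQL as-is, so only pass trusted input.
create_app_db_and_user() {
  psql -v ON_ERROR_STOP=1 --username "${POSTGRES_USER:-postgres}" \
       --dbname "${POSTGRES_DB:-postgres}" <<EOSQL
CREATE DATABASE $1;
CREATE USER $2 WITH PASSWORD '$3';
GRANT ALL PRIVILEGES ON DATABASE $1 TO $2;
EOSQL
}
```

In an init script, the third argument would typically come from the environment variable holding the custom user's password, so that the password never appears in the ConfigMap.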

7) Delete a cluster of Postgres

A cluster of Postgres can be deleted with the command:

kubectl delete kubegres [unique name]

For example:

kubectl delete kubegres mypostgres

The above will delete all resources created for the cluster of Postgres identified by the name "mypostgres", such as Pods, StatefulSets and Services. The only resources that you have to remove manually are the PVs and PVCs. There is one PV and one PVC created for each Pod. The reason why Kubegres does not remove those resources is that they contain the database data. In a future version of Kubegres it will be possible to set an option in the YAML so that they are automatically removed too.
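The remaining PVCs can be located by name. The sketch below assumes the naming pattern "postgres-db-<name>-<instance>-<ordinal>" seen in the example output earlier. The function name is hypothetical.

```shell
# List the PVCs that belong to one Kubegres cluster, based on the naming pattern
# "postgres-db-<name>-<instance>-<ordinal>" seen in the example output.
# $1 = Kubegres resource name, $2 = namespace (defaults to "default").
cluster_pvcs() {
  kubectl get pvc -n "${2:-default}" -o name |
    grep -E "^persistentvolumeclaim/postgres-db-$1-[0-9]+-[0-9]+\$"
}
```

After reviewing the list, the PVCs can be removed with: cluster_pvcs mypostgres | xargs kubectl delete -n default. Whether the matching PVs are removed as well depends on the reclaim policy of the storage class ("Delete" in the Kind example above). This permanently destroys the database data.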

Note that the command above will not delete other clusters of Postgres created with Kubegres. You have to run the command above with the name of the Kubegres resource for each cluster of Postgres to delete.

The above command does not delete the Kubegres controller/operator from your cluster. If you would like to delete it, you can run:

kubectl delete -f https://raw.githubusercontent.com/reactive-tech/kubegres/v1.19/kubegres.yaml

You can read about all properties that you can use in a YAML of "kind: Kubegres" in the page: All properties explained.

You can read about how Kubegres manages replication and failover.

17 December 2021