# Cluster Configuration

## The GKE Cluster Architecture
Google Kubernetes Engine makes several key architectural decisions for you that may differ from other Kubernetes cluster installations. These choices are made in the interest of ease of use, operational simplicity, and security:
- There is no direct access to the control plane systems - GKE manages the GCE instances running `etcd` and the `api server` and supporting components, and does not expose them to you or an attacker via SSH, for example. In exchange for giving up direct control over all API server configuration flags, your GKE cluster offers very strong protection of the core components and sensitive data in `etcd` automatically. This means your control over the control plane is restricted to the configuration options on the GKE `cluster` object.
- Upgrades are handled via GKE Operations - Upgrades and downgrades are handled via GKE operations through API calls. GKE versions (which map to Kubernetes versions) and worker `node` OS security and version upgrades are all core features of the built-in upgrade process.
- The default worker `node` operating system is COS - Container-Optimized OS is a hardened, minimal operating system engineered specifically to run containers and to function as a Kubernetes worker `node`. While SSH is enabled by default and integrates with GCP SSH functionality, its goal is to make SSHing into each worker `node` unnecessary by handling patching and upgrades as a part of the `node pool` lifecycle. There is an Ubuntu worker `node` option for supporting unique use cases, but it requires additional administration in terms of upgrades and security patches.
- The CNI is not configurable - The native GKE CNI is used to connect worker `nodes` and `pods` to the GCP network and is not currently replaceable.
- Network Policy enforcement is done via Calico - Enabling Network Policy enforcement installs a Calico deployment and daemonset as a managed addon for you.
- Logging and Monitoring is done via Stackdriver - By default, all logs and performance metrics are sent to Stackdriver in the current GCP `project`.
## Geographical Placement and Cluster Types
GKE clusters are by default "zonal" clusters. That is, a single GCE instance running the control plane components is deployed in the same GCP zone as the node-pool nodes. "Regional" GKE clusters are deployed across three zones in the same region. Three GCE instances running the control plane components are deployed (one per zone) behind a single IP and load balancer. The node-pools spread evenly across the zones with a minimum of one node per zone.
The key benefits of a "regional" cluster:

- The `cluster` control plane can handle a single GCP `zone` failure more gracefully.
- When the control plane is being upgraded, only one instance is down at a time, leaving two instances running. Upgrades on "zonal" clusters mean a 4-10 minute downtime of the API/control plane during which you can't use `kubectl` to interact with your `cluster`.
Since there is no additional charge for running a "regional" GKE cluster with a highly-available control plane, why use a "zonal" cluster? The two primary reasons are to save on "cross-zone" network traffic costs and to support specific GPU node-pool needs.
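As an illustration, a minimal sketch of creating each cluster type with `gcloud` (cluster names and locations are placeholders):

```bash
# Regional cluster: control plane replicated across three zones in us-central1.
gcloud container clusters create my-regional-cluster \
  --region us-central1

# Zonal cluster: a single control plane instance in us-central1-a.
gcloud container clusters create my-zonal-cluster \
  --zone us-central1-a
```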
Persistent Disks
The default StorageClass defines persistent disks that are "zonal". You may want to add a new StorageClass that provisions "regional" persistent disks, so that pod workloads rescheduled onto a node in a different zone can still access their volumes.
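For example, a minimal sketch of such a `StorageClass`, applied with `kubectl` (the name and disk type are placeholders; `replication-type: regional-pd` is the setting that makes the disks regional):

```bash
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-pd-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  # Replicate each persistent disk across two zones in the region.
  replication-type: regional-pd
EOF
```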
### Best Practices
- Use Regional Clusters - Unless you have specific needs that force you to use a "zonal" `cluster`, using "regional" `clusters` offers the best redundancy and availability for a minor increase in network traffic costs for the majority of use cases.
- Offer a Regional Persistent Disk `StorageClass` - Allows `pods` to attach and access persistent disk volumes regardless of where they are scheduled inside the cluster. This ensures that a `zone` failure does not prevent a rescheduled `pod` from mounting its disk on a `node` in another `zone`.
## Google Cloud IAM and Kubernetes RBAC
- Google Cloud Identity and Access Management (IAM) - The system in GCP that grants permissions via GCP `IAM Roles` to users and service accounts to access GCP APIs.
- Kubernetes Role-Based Access Control (RBAC) - The native system inside Kubernetes that grants permissions via Kubernetes `roles` and `clusterroles` to users and service accounts to access the Kubernetes API Server.
The two APIs that can be used to interact with the GKE service and a GKE cluster are:
- GCP GKE API (container.googleapis.com) - Used to create/update/delete the `cluster` and `node pools` that comprise the GKE `cluster` and to obtain connection and credential information for accessing a given `cluster`.
- Kubernetes API of the `cluster` - The unique Kubernetes API Server endpoint running on the GKE control plane systems for a specific `cluster`. It controls resources and access to resources running inside the cluster.
This means that there are two main categories of permissions that you have to consider:
- Cluster Administration - Permissions associated with administering the `cluster` itself.
- Cluster Usage - Permissions associated with granting what is allowed to run inside the cluster.
There are a couple key points to understand about how Cloud IAM and Kubernetes RBAC can be used to grant permissions in GKE:
- GCP Cloud IAM is administered at the `project` level, and the granted permissions apply to all GKE `clusters` in the `project`. They remain in place even if a `cluster` is deleted from the `project`.
- Kubernetes RBAC permissions are administered per-`cluster` and are stored inside each `cluster`. When that `cluster` is destroyed, those permissions are also destroyed.
- GCP Cloud IAM permissions have no concept of Kubernetes `namespaces`, so the permissions granted apply to all `namespaces`.
- Kubernetes RBAC can be used to grant permissions to access resources in all `namespaces` or only in specific `namespaces`.
- Kubernetes RBAC cannot be used to grant permissions to the GCP GKE API (container.googleapis.com). This is performed solely by Cloud IAM.
- Both systems are additive in that they only grant permissions. They cannot remove or negate permissions from themselves or each other.
IAM and RBAC combine inside the cluster
When accessing a resource (e.g. listing pods in the default namespace) via the Kubernetes API of a GKE cluster, access can be granted via IAM or RBAC. The two systems are effectively combined: if either grants access, the request is permitted.
### Best Practices
- There are several predefined IAM `roles` for GKE `clusters` that can ease administration as you are getting started with GKE. However, the permissions they grant are often overly broad and violate the principle of least privilege. You will want to create custom IAM `roles` with just the permissions needed, but remember that IAM permissions apply to all `namespaces` and cannot be limited to a single `namespace`.

- To maintain least privilege, it's recommended to leverage a minimal IAM `role` to gain access to the GKE API and then use Kubernetes RBAC `roles` and `clusterroles` to define what resources users can access inside the `cluster`. In order for `users` and `service accounts` to perform a `gcloud container clusters get-credentials` call to generate a valid `kubeconfig` for a GKE `cluster`, they need the following permissions:

    - `container.apiServices.get`
    - `container.apiServices.list`
    - `container.clusters.get`
    - `container.clusters.list`
    - `container.clusters.getCredentials`

    If these permissions are grouped into a custom IAM `role`, that IAM `role` can be conveniently bound to a Gsuite/Cloud Identity `group` which includes all users that need access to the `cluster`. From this point, the `users` and `service accounts` can be granted access to resources as needed with `cluster`-wide or per-`namespace` granularity (see the sketch after this list). This has the benefit of minimizing the IAM changes needed, ensuring access granted is per-`cluster`, giving the highest granularity, and making troubleshooting permissions an RBAC-only process.

- With the above approach of deferring nearly all permissions to in-`cluster` RBAC instead of IAM, there is one exception: assign the `Kubernetes Engine Admin` predefined IAM `role` at the `project` or `folder` level to a small number of trusted administrators to ensure that they don't accidentally remove their own access via an RBAC configuration mistake.
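A minimal sketch of this pattern, assuming a placeholder project `my-project` and group `gke-developers@mydomain.com`:

```bash
# 1. Create a custom IAM role with just enough access to run get-credentials.
gcloud iam roles create gke_cluster_user --project my-project \
  --title "GKE Cluster User" \
  --permissions "container.apiServices.get,container.apiServices.list,container.clusters.get,container.clusters.list,container.clusters.getCredentials"

# 2. Bind the custom role to the group at the project level.
gcloud projects add-iam-policy-binding my-project \
  --member "group:gke-developers@mydomain.com" \
  --role "projects/my-project/roles/gke_cluster_user"

# 3. Grant the actual in-cluster permissions with RBAC, per namespace.
kubectl create rolebinding developers-edit \
  --clusterrole=edit \
  --group=gke-developers@mydomain.com \
  --namespace=dev
```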
## Cluster Access

### Control Plane Access
The GCE instances running the control plane components like the kube-apiserver and etcd are not directly accessible via SSH or the GCE API. They don't appear in gcloud command output or in the GCP console. The GKE control plane is only reachable via the exposed Kubernetes API on tcp/443. By default, the API is assigned a public IP address with firewall rules that allow access from 0.0.0.0/0. While this is conducive to ease of use and access from anywhere, it might not meet your security requirements.
Warning
Sufficient access to the Kubernetes API equates to root on all the worker nodes. Protecting the Kubernetes API from access misconfiguration, denial-of-service, and exploitation should be high on a security team's priority list.
There are controls available to improve on the default configuration, and they are covered in the next section.
### Worker Node Access
Because GKE nodes are simply GCE instances inside instance groups, they are accessible via SSH in accordance with the normal GCE routing and VPC firewall rules. The standard GCE methods for using gcloud compute ssh to gain access to the underlying operating system work as expected.
There are two things to consider when granting SSH access to nodes:
- Does SSH access need to be available publicly? In many cases, additional firewall source restrictions are useful in limiting the subnets from which SSH access can be initiated: for instance, a set of external office IP addresses or a specific VPC `subnet` designated for management purposes (a sketch follows this list).
- Several primitive IAM roles like `Owner`, `Editor`, `Dataproc Service Agent`, `Compute OS Admin Login`, `Compute Admin`, and `Compute Instance Admin` include the `compute.instances.osAdminLogin` permission. Users and service accounts with those permissions and network-level access can SSH into the worker `nodes` and gain `root` permissions at the operating system level.
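A sketch of such a firewall restriction, with placeholder network, CIDR, and node tag values:

```bash
# Allow SSH to GKE nodes only from a designated management CIDR.
gcloud compute firewall-rules create allow-ssh-from-mgmt \
  --network my-vpc \
  --direction INGRESS \
  --allow tcp:22 \
  --source-ranges 203.0.113.0/24 \
  --target-tags gke-my-cluster-node

# The standard GCE SSH method still works for permitted sources:
gcloud compute ssh gke-my-cluster-default-pool-abc123 --zone us-central1-a
```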
Warning
Gaining root access to a GKE worker node allows that user to view all secrets attached to pods running on that node. Those secrets may include tokens that belong to service accounts which have elevated permissions to the Kubernetes API. With that token, that user can access the Kubernetes API as that service account. If that service account has permissions to modify Role-Based Access Control or to view all secrets, that commonly equates to full cluster access (aka "cluster admin"). As "cluster admin" has full control of the Kubernetes API, that equates to root on all worker nodes.
## Cluster Settings
For a full list of cluster configuration level items, the google_container_cluster terraform provider documentation is a fantastic resource that lists each setting and its purpose. The list below aims to cover the availability and security-related items with additional supporting guidance:
- Location - The `region` ("us-central1") or `zone` ("us-central1-a") where the `cluster` will be located. Clusters cannot span multiple `regions`. If a `region` is specified, the control plane will have an `instance` in each of the three `zones`. If a `zone` is specified, the control plane will have a single instance in that `zone` only. Recommend: specifying a `region` to obtain a fault-tolerant control plane.
- Node Locations - The list of one to three `zones` where the `nodes` will be located; they must be in the same `region`.
Mix-and-Match
If node locations is specified, it overrides the default behavior. It's possible to have a regional cluster but then use node locations to restrict the zones to only one or two instead of the default of three. It's also possible to create a zonal cluster with a single control plane instance but then use node locations to select one to three zones where the nodes will be located. Typically, this feature is used to help guide clusters into configurations to align with large CPU or GPU quota needs.
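For example, a sketch of a regional control plane with nodes restricted to a single zone (names and locations are placeholders):

```bash
# Regional control plane (three replicas across us-central1), but nodes
# placed only in us-central1-c, e.g. to align with GPU quota in that zone.
gcloud container clusters create gpu-cluster \
  --region us-central1 \
  --node-locations us-central1-c
```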
- Addons Config
    - HTTP Load Balancing - Enabled by default, this installs the `ingress-gce` HTTP/S Load Balancing and Ingress Controller. It can be disabled to allow for installation of a different ingress controller solution or left installed but unused. Recommend: Enabled.
    - Kubernetes Dashboard - Now disabled by default, this installs a Kubernetes dashboard deployment inside the `cluster`. Historically, the dashboard has been misconfigured and has had a small number of security issues that have led to compromises. With the GCP console providing all the dashboard features needed, this can and should be disabled. Clusters that were originally created in the Kubernetes `1.6` era and upgraded to the present may still have this enabled. Recommend: Disabled.
    - Network Policy - Disabled by default, this installs the Calico CNI components, which enables the configuration of `networkpolicy` resources in the Kubernetes API that can restrict `pod`-to-`pod` communication. It is "safe" to enable by default, as the effect of the `networkpolicy` configuration in Kubernetes is "allow all traffic" until a `networkpolicy` is defined. Recommend: Enabled.
    - Istio - Disabled by default, this installs the Google-managed deployment of Istio inside the `cluster`. Istio can provide interesting security-related features like "mutual TLS" encryption for traffic between `pods` in the `mesh`, deep traffic inspection in the form of network traces, and granular role-based permissions for egress network traffic and Layer 7 access controls. In terms of complexity, Istio adds over 50 custom resource definitions that a security team will need to analyze and understand to have awareness of the security posture of the `cluster`. Recommend: Disabled until the features are needed.
- Database Encryption - Disabled by default, this enables an integration with Cloud KMS to have the API server access a Cloud KMS key to encrypt and decrypt the contents of `secrets` as they are stored in and accessed from `etcd`. In environments where the control plane nodes are potentially accessible, this makes it more difficult to extract the `secrets` in plain-text from a compromised dedicated `etcd` system or `etcd` backup. In GKE, the control plane systems are not accessible directly by users or attackers, so this adds marginal benefit aside from satisfying regulatory requirements of "databases must encrypt sensitive records at the row-level". Recommend: Disabled unless compliance requirements dictate.
- Default Max Pods Per Node - Controls how many `pods` can run on a single node. The default is the hard-coded maximum of 110, but this can be reduced to 64, 32, 16, or 8 if the workload profile is known. This causes the `subnet` allocated to each GKE `node` to go from a `/24` down to a `/25`, `/26`, and so on, and it can greatly reduce IP address consumption of precious RFC1918 space. Recommend: the default of 110 unless specific IP space or capacity needs require smaller.
- Binary Authorization - Disabled by default. This addon enables an admission controller that validates `pod` specs against a policy that defines which container image repositories are allowed and/or validates that container images have a valid PGP signature before allowing them to run inside the cluster. Can be enabled safely as the default policy is to "allow all" until configured. Recommend: Enabled.
- Kubernetes Alpha - Disabled by default. Do not enable on production clusters, as alpha clusters are deleted automatically after 30 days by GKE. Recommend: Disabled.
- Legacy ABAC - Disabled by default, this is a legacy permissions mechanism that is largely unused in the Kubernetes community. The role-based access control (RBAC) mechanism is enabled by default and is preferred. Recommend: Disabled.
- Logging Service - Enabled by default, this installs and manages a daemonset that collects and sends all `pod` logs to Stackdriver for processing, troubleshooting, and analysis/auditing purposes. Recommend: `logging.googleapis.com/kubernetes`
- Maintenance Policy - Defines a preferred 4-hour time window (in UTC) for when GKE should perform automatic or security-related upgrades. Recommend: setting a 4-hour UTC window with the least workload impact and the least overlap with operational on-call rotations.
- Master Authorized Networks - By default, the implicit allowed CIDR range for source IPs that can connect to the Kubernetes API is `0.0.0.0/0`. You may specify up to 50 different CIDR ranges. This setting can be changed as needed without affecting the lifecycle of the running `cluster`, and it is the most common method for reducing the scope of who can reach the Kubernetes `cluster` API to a smaller set of IPs (see the private cluster sketch below). In the event of a vulnerability in the API server, restricting access to a small, known set of CIDRs can reduce the risk of an organization having to delay maintenance operations to a less busy or risky time. Recommend: configuring a short list of allowed CIDRs.
- Monitoring Service - Enabled by default, this installs and manages a daemonset that collects and sends all metrics to Stackdriver for troubleshooting and analysis. Recommend: `monitoring.googleapis.com/kubernetes`
- Pod Security Policy - Disabled by default, this enables the admission controller used to validate the specifications of `pods` to prevent insecure or "privileged" `pods` from being created that allow trivial escaping to the underlying host as `root` and bypassing other security mechanisms like `networkpolicy`. Recommend: Enabled with extensive testing, OR implement the same features with an Open Policy Agent/Gatekeeper validating admission controller.
- Authenticator Groups - Disabled by default, this enables RBAC to use Google Groups in `RoleBinding` and `ClusterRoleBinding` resources instead of having to specify each `user` or `serviceaccount` individually. For example, a Gsuite administrator creates a group called `gke-security-groups@mydomain.com` and places groups named `gke-admins@` and `gke-developers@` inside it. Passing `gke-security-groups@mydomain.com` to this setting allows RBAC to reference/look up `gke-admins@mydomain.com` and/or `gke-developers@mydomain.com` when evaluating access to resources in the API. Recommend: Enabled if using Gsuite/Google Groups to support easier-to-manage RBAC policies.
- Private Cluster
    - Master IPv4 CIDR Range - The RFC1918 subnet that the control plane should pick an IP address from when assigning a private IP. Recommend: a `/28` that does not overlap with any other IP space in use.
    - Enable Private Endpoint - "false" by default, setting this to "true" instructs the GKE control plane to select its IP from the master CIDR range and not expose a public IP address. Recommend: true
    - Enable Private Nodes - "false" by default, setting this to "true" instructs the GKE `nodes` to not be assigned a public IP address. Recommend: true
Private Clusters
There is a common misconception that a "private cluster" is one that has worker nodes with private IP addresses. This is only half-correct. A true "private cluster" is one that has both the control plane and nodes using private IP addresses, and both of the above settings are necessary to achieve that improved security posture. When these two settings are enabled and combined with a small list of master authorized networks, the attack surface of the Kubernetes API can be significantly reduced.
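Putting these settings together, a sketch of creating a "true" private cluster with a short master authorized networks list (names and CIDRs are placeholders):

```bash
gcloud container clusters create private-cluster \
  --region us-central1 \
  --enable-ip-alias \
  --enable-private-nodes \
  --enable-private-endpoint \
  --master-ipv4-cidr 172.16.0.32/28 \
  --enable-master-authorized-networks \
  --master-authorized-networks 10.10.0.0/24
```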
- Remove Default Node Pool - By default, GKE deploys a `node pool` to be able to run `pods`. However, its lifecycle is tied to that of the `cluster` object, meaning certain changes may cause the entire `cluster` to be recreated. It's recommended that you disable/remove the default `node pool` and explicitly declare `node pools` with the desired settings so that they keep their lifecycles independent from the control plane/`cluster`.
- Workload Identity - Disabled by default, this enables the Workload Identity addon by specifying which Identity "domain" to use. Without Workload Identity, `pods` wanting to reach GCP APIs are granted the credentials attached to the underlying GCE `instance`. The underlying GCE instance has permissions to send logs, send metrics, read from Cloud Storage/Container Registries, and more. Historically, the `nodes` were assigned the default compute service account, which was assigned `Project Editor` permissions. This meant that every `pod` could potentially have `Project Editor` access! With Workload Identity, a daemonset on every worker `node` intercepts all calls for instance credentials to the metadata API. If the Kubernetes `service account` attached to that `pod` has been granted a specific binding to a Google `service account`, then the dynamic credentials for that specific Google `service account` are returned to the `pod`. In essence, this is a way to map Kubernetes `service accounts` to Google `service accounts` (see the sketch after this list), and it can remove the need for exporting Google `service account` keys in JSON format and storing them manually inside Kubernetes `secrets`. Recommend: Enabled.
- IntraNode Visibility - Disabled by default, this exposes traffic on the `pod` network to the VPC so that it can be captured with VPC flow logs. Recommend: Disabled unless flow logs from `pod` traffic are needed.
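A minimal Workload Identity sketch, assuming placeholder names for the project (`my-project`), Google `service account` (`app-gsa`), and Kubernetes `service account` (`app-ksa` in `namespace` `prod`):

```bash
# Allow the Kubernetes service account to impersonate the Google one.
gcloud iam service-accounts add-iam-policy-binding \
  app-gsa@my-project.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-project.svc.id.goog[prod/app-ksa]"

# Annotate the KSA so GKE returns the GSA's credentials to its pods.
kubectl annotate serviceaccount app-ksa --namespace prod \
  iam.gke.io/gcp-service-account=app-gsa@my-project.iam.gserviceaccount.com
```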
## Node Pool Settings
For a full list of node configuration level items, the google_container_node_pool terraform provider documentation lists each setting and its purpose. The list below aims to cover the availability and security-related items with additional supporting guidance:
- Auto Repair - Whether the `node` should validate its own health and be destroyed/recreated if it detects a problem. Generally speaking, this can be safely enabled to avoid getting paged for minor issues that can be self-healed. Recommend: Enabled.
- Auto Upgrade - If enabled, GKE will attempt to keep the `node` versions in sync with the control plane version during the maintenance window. Depending on how the applications running inside the cluster respond to losing a worker `node` and how close to maximum capacity the `cluster` is running, this may or may not affect workload availability. While all applications should be designed to handle this gracefully, it may require additional testing before it is considered appropriate to enable on your production `clusters`. Recommend: Enabled with proper testing.
- Node Config (a sketch follows this list)
    - Image Type - Default is "COS", or "Container-Optimized OS". This is a hardened, minimalist distribution of Linux that is designed to run containers securely. Currently, the stable/default version of "COS" leverages a full Docker installation; "COS_containerd" is available and leverages `containerd` instead. "Ubuntu" is available for specific needs where full control over the operating system and kernel modules is required, but it defers many operational and security-related tasks to the customer. Recommend: "COS" until "COS_containerd" becomes the default.
    - Metadata - The mechanism for passing attributes to the GCE Instance Metadata of the underlying GCE `instances`. Perhaps the most important setting, which is now default in GKE `1.12+`, is `disable-legacy-endpoints=true`. This enables only the `v1` metadata API on the instance and disables the "legacy" API version `v1beta1`. The legacy API does not require the custom HTTP header `Metadata-Flavor: Google` to be passed in order to access the metadata API. Applications running on the GCE `instance` or in `pods` on the GKE worker `nodes` that are vulnerable to Server-Side Request Forgery (SSRF) attacks are therefore much less likely to expose instance credentials via the metadata API, since an attacker must also control the ability to send the custom HTTP header for the attack to succeed. Recommend: `disable-legacy-endpoints=true` is set.
    - Oauth Scopes - `oauth scopes` serve to "scope" or "limit" the GCP APIs that the credentials of the attached `service account` can access. It is the intersection or "overlap" of the permissions granted by the `oauth scopes` and the `service account` that defines the actual access. If a `service account` attached to the `node pool` is granted `Project Owner` but is only assigned the `https://www.googleapis.com/auth/logging.write` `oauth scope`, then those credentials can only write logs to Stackdriver. Conversely, if the `oauth scope` was `https://www.googleapis.com/auth/cloud-platform` (an alias for "*" or "any GCP API") but the `service account` was only granted `roles/logging.writer`, then those credentials can still only be used to write logs to Stackdriver. Avoid granting the `cloud-platform` or `compute` `oauth scopes`, especially when paired with the `Project Editor`, `Project Owner`, or `Compute Admin` IAM roles, or any `pod` can leverage that access! Recommend: the `oauth scopes` assigned to new clusters by default in GKE `1.12+`.
    - Sandbox Config (gVisor) - Disabled by default, this allows the `nodes` to run `pods` with the `gVisor` sandboxing technology. This provides much greater isolation of the container and its ability to interact with the host kernel, but certain `pod` features are not supported. Disk performance of those `pods` will be negatively affected, so additional acceptance testing is encouraged. Recommend: Enabled on dedicated `node pools` where workloads are running that need additional isolation.
    - Service Account - By default, the "Default Compute Service Account" in the `project` is assigned, and this has `Project Editor` bound. To give each `cluster` and/or `node pool` separate and least-privilege permissions, a dedicated `service account` should be created for each `cluster` or `node pool` with the minimum permissions assigned to it. Used in conjunction with Oauth Scopes. Recommend: create and specify a dedicated `service account`, not the default.
    - Workload Metadata - Not set by default (the equivalent of "UNSPECIFIED"), this flag enables either the Metadata Concealment Proxy ("SECURE") or Workload Identity ("GKE_METADATA_SERVER"). The Workload Identity feature performs all of the same concealment functionality as the Metadata Concealment Proxy but with the added ability of mapping KSAs to GSAs for dynamic GCP credential access. Recommend: "GKE_METADATA_SERVER".
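Tying this together, a sketch of an explicitly-declared `node pool` with a dedicated `service account`, minimal `oauth scopes`, and hardened metadata settings. All names are placeholders, the `gke-nodes` service account must exist beforehand, and the `--workload-metadata-from-node` flag may require the `gcloud beta` component depending on your SDK version:

```bash
gcloud beta container node-pools create app-pool \
  --cluster my-cluster \
  --region us-central1 \
  --image-type COS \
  --service-account gke-nodes@my-project.iam.gserviceaccount.com \
  --scopes gke-default \
  --metadata disable-legacy-endpoints=true \
  --workload-metadata-from-node GKE_METADATA_SERVER \
  --enable-autorepair \
  --enable-autoupgrade
```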
## Resources
- Hardening GKE Clusters
- Private GKE Clusters
- GCE SSH Access
- GKE Master Authorized Networks
- Terraform Cluster Provider
- Terraform Node Pool Provider
- GKE Pod Security Policy
- Open Policy Agent
- Gatekeeper
- COS
- COS_containerd
- GKE Metadata Concealment
- GKE Workload Identity
- GKE OAuth Scopes
- gVisor Sandboxing
- VPC/Pod Flow Logs