Validating Admission Policies, the future of Kubernetes policies
Update on Jan 23rd, 2023: this blog article is now on Medium.
Kubernetes 1.26 just introduced an alpha feature: Validating Admission Policies.
Validating admission policies use the Common Expression Language (CEL) to offer a declarative, in-process alternative to validating admission webhooks.
CEL was first introduced to Kubernetes for the Validation rules for CustomResourceDefinitions. This enhancement expands the use of CEL in Kubernetes to support a far wider range of admission use cases.
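For context, here is what a CEL validation rule looks like on a CRD schema; a minimal, hypothetical snippet where the `replicas` and `minReplicas` fields are just for illustration:

```yaml
# Excerpt of a hypothetical CRD's openAPIV3Schema using a CEL validation rule,
# the feature that first brought CEL to Kubernetes.
x-kubernetes-validations:
- rule: "self.minReplicas <= self.replicas"
  message: "replicas must be greater than or equal to minReplicas"
properties:
  replicas:
    type: integer
  minReplicas:
    type: integer
```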
Admission webhooks can be burdensome to develop and operate. Webhook developers must implement and maintain a webhook binary to handle admission requests. Admission webhooks are also complex to operate: each webhook must be deployed and monitored, and needs a well-defined upgrade and rollback plan. To make matters worse, if a webhook times out or becomes unavailable, the Kubernetes control plane can become unavailable.
This enhancement avoids much of this complexity of admission webhooks by embedding CEL expressions into Kubernetes resources instead of calling out to a remote webhook binary.
If you want to learn more about this feature and where it’s coming from, I encourage you to watch Joe Betz’s session at KubeCon NA 2022; I found it very insightful:
In this blog article, let’s see in action how to leverage this new Validating Admission Policies feature. Based on my experience with Gatekeeper policies, I will also try to add some comments about the features that seem to be missing.
Here is what will be accomplished throughout this blog article:
- Create a GKE cluster with the Validating Admission Policies alpha feature
- Create a simple policy with a max of 3 replicas for any `Deployment`
- Pass parameters to the policy
- Exclude namespaces from the policy
- Limitations, gaps and thoughts
- Conclusion
Note: while testing this feature by leveraging its associated blog and docs, it was also an opportunity for me to open my first PRs in the `kubernetes/website` repo to fix some friction I ran into: https://github.com/kubernetes/website/pull/38893 and https://github.com/kubernetes/website/pull/38908.
Create a GKE cluster with the Validating Admission Policies alpha feature
GKE just made version 1.26 available; we can check the available versions by running `gcloud container get-server-config --zone us-central1-c`.
Let’s provision a cluster in alpha mode (not for production) with version 1.26:
```shell
gcloud container clusters create cel-admission-control-cluster \
    --enable-kubernetes-alpha \
    --no-enable-autorepair \
    --no-enable-autoupgrade \
    --release-channel rapid \
    --cluster-version 1.26.0-gke.1500 \
    --zone us-central1-c
```
Once the cluster is provisioned, we can check that the Validating Admission Policies alpha feature is available with its two associated resources, `ValidatingAdmissionPolicy` and `ValidatingAdmissionPolicyBinding`, by running `kubectl api-resources | grep ValidatingAdmissionPolicy`:
```
validatingadmissionpolicies          admissionregistration.k8s.io/v1alpha1   false   ValidatingAdmissionPolicy
validatingadmissionpolicybindings    admissionregistration.k8s.io/v1alpha1   false   ValidatingAdmissionPolicyBinding
```
Before jumping into creating and testing the policies, let’s deploy a sample app in our cluster that we can leverage later in this blog:
```shell
kubectl create ns sample-app
kubectl create deployment sample-app \
    --image=nginx \
    --replicas 5 \
    -n sample-app
```
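As a quick check, the app should come up with its 5 replicas since no policy is enforced yet:

```shell
kubectl get deployment sample-app -n sample-app
```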
Create a simple policy with a max of 3 replicas for any Deployment
Let’s do it, let’s deploy our first policy!
This policy is composed of one `ValidatingAdmissionPolicy` defining the validation with a CEL expression, and one `ValidatingAdmissionPolicyBinding` binding the policy to the appropriate resources in the cluster:
```shell
cat << EOF | kubectl apply -f -
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: ValidatingAdmissionPolicy
metadata:
  name: max-replicas-deployments
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  - expression: "object.spec.replicas <= 3"
EOF
```
```shell
cat << EOF | kubectl apply -f -
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: max-replicas-deployments
spec:
  policyName: max-replicas-deployments
EOF
```
Now, let’s try to deploy an app with 5 replicas:
```shell
kubectl create deployment nginx --image=nginx --replicas 5
```
We can see that our policy is enforced, great!
```
error: failed to create deployment: deployments.apps "nginx" is forbidden: ValidatingAdmissionPolicy 'max-replicas-deployments' with binding 'max-replicas-deployments' denied request: failed expression: object.spec.replicas <= 3
```
So that’s it for new admission requests, but what about the existing app we deployed earlier? Interestingly, nothing tells me that my existing resources are not compliant. I’m a bit disappointed here: with Gatekeeper I used to run `kubectl get constraints` and see the violations it raised. I think that’s a miss; let’s see if it gets supported in the future. Nonetheless, `kubectl rollout restart deployment sample-app -n sample-app` or `kubectl scale deployment sample-app --replicas 6 -n sample-app`, for example, will fail as expected.
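In the meantime, here is a rough client-side workaround to spot existing `Deployments` violating the rule; a sketch assuming `jq` is installed:

```shell
# List Deployments across all namespaces with more than 3 replicas
kubectl get deployments -A -o json \
  | jq -r '.items[] | select(.spec.replicas > 3) | "\(.metadata.namespace)/\(.metadata.name): \(.spec.replicas) replicas"'
```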
Pass parameters to the policy
With the policy we just created, we hard-coded the number of allowed replicas, but what if you want to make this configurable? Here comes a really interesting feature: you can pass parameters!
The `paramKind` field lets you reference the kind of resource holding the parameters: you can create your own CRD, or easily leverage built-in kinds like `ConfigMap` or `Secret`. Let’s update our policy with a `ConfigMap` to achieve this:
```shell
cat << EOF | kubectl apply -f -
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: ValidatingAdmissionPolicy
metadata:
  name: max-replicas-deployments
spec:
  failurePolicy: Fail
  paramKind:
    apiVersion: v1
    kind: ConfigMap
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  - expression: "params != null"
    message: "params missing but required to bind to this policy"
  - expression: "has(params.data.maxReplicas)"
    message: "params.data.maxReplicas missing but required to bind to this policy"
  - expression: "object.spec.replicas <= int(params.data.maxReplicas)"
EOF
```
```shell
kubectl create ns policies-configs
```
```shell
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: max-replicas-deployments
  namespace: policies-configs
data:
  maxReplicas: "3"
EOF
```
```shell
cat << EOF | kubectl apply -f -
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: max-replicas-deployments
spec:
  paramRef:
    name: max-replicas-deployments
    namespace: policies-configs
  policyName: max-replicas-deployments
EOF
```
Note: because a `ConfigMap` has to reside in a namespace, we are also creating a dedicated `policies-configs` namespace.
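Optionally, before testing, we can double-check that the policy and its binding are in place:

```shell
kubectl get validatingadmissionpolicy max-replicas-deployments
kubectl get validatingadmissionpolicybinding max-replicas-deployments -o yaml
```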
Now, let’s try to deploy an app with 5 replicas:
```shell
kubectl create deployment nginx --image=nginx --replicas 5
```
We can see that our policy is still enforced with a new message, great!
```
error: failed to create deployment: deployments.apps "nginx" is forbidden: ValidatingAdmissionPolicy 'max-replicas-deployments' with binding 'max-replicas-deployments' denied request: failed expression: object.spec.replicas <= int(params.data.maxReplicas)
```
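As a sanity check, a compliant request should still go through; for example with 2 replicas (the `nginx-ok` name is just for illustration):

```shell
kubectl create deployment nginx-ok --image=nginx --replicas 2
```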
Exclude namespaces from the policy
One of the features I use with Gatekeeper policies is the ability to exclude namespaces with `excludedNamespaces` in a `Constraint`. It’s very helpful to avoid breaking clusters by enforcing policies on system namespaces.
Here, we will use a `namespaceSelector` on our `ValidatingAdmissionPolicyBinding` to exclude the system namespaces as well as our own `allow-listed` namespace:
```shell
cat << EOF | kubectl apply -f -
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: max-replicas-deployments
spec:
  paramRef:
    name: max-replicas-deployments
    namespace: policies-configs
  policyName: max-replicas-deployments
  matchResources:
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values:
        - kube-node-lease
        - kube-public
        - kube-system
        - allow-listed
EOF
```
Note: for this `namespaceSelector` expression to work, we assume a Kubernetes cluster version 1.22+, which automatically adds the `kubernetes.io/metadata.name` label to any `Namespace`. Very convenient for our use case of excluding namespaces from a policy.
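We can quickly verify that this label is present:

```shell
# The kubernetes.io/metadata.name label is set automatically on 1.22+ clusters
kubectl get namespace kube-system --show-labels
```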
Now, let’s try to deploy an app with 5 replicas in the default namespace:
```shell
kubectl create deployment nginx \
    --image=nginx \
    --replicas 5
```
We can see that our policy is still enforced, great!
```
error: failed to create deployment: deployments.apps "nginx" is forbidden: ValidatingAdmissionPolicy 'max-replicas-deployments' with binding 'max-replicas-deployments' denied request: failed expression: object.spec.replicas <= int(params.data.maxReplicas)
```
On the other hand, we should be able to deploy it in the `allow-listed` namespace:
```shell
kubectl create ns allow-listed
kubectl create deployment nginx \
    --image=nginx \
    --replicas 5 \
    -n allow-listed
```
Sweet!
Limitations, gaps and thoughts
Again, based on my Gatekeeper experience and the quick tests I have done so far with this Validating Admission Policies feature, here are some limitations, gaps and thoughts:
Failure policy Ignore
I don’t fully understand yet what the `Ignore` failure policy does as opposed to `Fail`: I get the same behavior with both… From my reading of the docs, `failurePolicy` seems to define how misconfigurations and CEL runtime errors are handled, not validation denials, which could explain why a correctly configured policy behaves the same either way.
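One way to observe the difference, a hedged sketch based on my reading of the docs and not something I have verified: make the CEL expression error at runtime, which is where `failurePolicy` should kick in:

```shell
# Make int(params.data.maxReplicas) fail at evaluation time by storing a
# non-numeric value; with failurePolicy: Fail the request should be denied,
# with Ignore it should be admitted (assumption based on the docs, untested).
kubectl patch configmap max-replicas-deployments -n policies-configs \
  --type merge -p '{"data":{"maxReplicas":"not-a-number"}}'
kubectl create deployment nginx-failure-test --image=nginx --replicas 2
```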
Just for admission
It’s just for admission, not for evaluating existing resources already in a cluster.
Client-side validation
Not able to evaluate resources against policies outside of a cluster, like we can do with the `gator` CLI.
Inline parameters
Inline parameters in `ValidatingAdmissionPolicyBinding` would be much easier; today we need to create our own CRD or use existing resources like `ConfigMap`, `Secret`, etc., which are scoped to a namespace.
Variables’ values in message
Being able to evaluate values in `validation.message`, like `"object.spec.replicas should be less than {int(params.data.maxReplicas)}"`, would be helpful.
Cluster-wide exempted namespaces
Repeating the `namespaceSelector` expression on every `ValidatingAdmissionPolicyBinding` could generate more work and errors; being able to exempt namespaces cluster-wide would be really great.
Mutating
Mutation doesn’t exist; this feature only validates.
External data
Advanced scenarios like leveraging `cosign` with Gatekeeper’s External Data feature aren’t possible.
Referential constraints
An advanced scenario where I want a policy making sure that any `Namespace` has a `NetworkPolicy` or an `AuthorizationPolicy`, based on referential constraints like with Gatekeeper, could be really helpful. It doesn’t seem to be supported today.
I will test it soon, but I’m thinking about passing the associated CRD (`NetworkPolicy` and `AuthorizationPolicy`) as `paramKind` while evaluating `Pods` creation/update (I thought about `Namespace` but there is a chicken-and-egg problem here). I will report back by updating this blog article as soon as I’ve run my tests. Stay tuned!
Workload resources
Policies on `Pods` are important but can be tricky with the workload resources generating them: think about `Deployments`, `ReplicaSets`, `Jobs`, `DaemonSets`, etc. I haven’t tested it yet; I will report back by updating this blog article as soon as I’ve run my tests. Stay tuned!
Conclusion
We were able to create our own policy, pass a parameter to it, and exclude some namespaces. Finally, we discussed some limitations and gaps compared to Gatekeeper policies.
This feature, in alpha since Kubernetes 1.26, is really promising, very easy to leverage, and already powerful. I really like that it comes out of the box with Kubernetes and has a very light footprint in my cluster, as opposed to the extra CRDs/containers we get with Gatekeeper or Kyverno.
I think the image below, taken from Joe Betz’s session at KubeCon NA 2022, is a good summary of the positioning of this feature versus the advanced scenarios covered by webhooks:
I’m really looking forward to seeing the next iterations on this feature as it reaches beta and then stable in the future. I’m also curious to learn more about when to use it, and when to keep using Gatekeeper for more advanced scenarios like those illustrated in the Limitations, gaps and thoughts section above. Finally, I’m also curious to see how Gatekeeper, Kyverno, Styra, etc. will position their projects and products relative to this upstream Kubernetes feature.
Happy sailing, cheers!