FEATURE SERVER HIGH-AVAILABILITY AND AUTO-SCALING ON KUBERNETES

The Feast Operator now supports horizontal scaling with static replicas, HPA autoscaling, KEDA, and high-availability features including PodDisruptionBudgets and topology spread constraints.

By Nikhil Kathole, Antonin Stefanutti

As ML systems move from experimentation to production, the feature server often becomes a critical bottleneck. A single-replica deployment might handle development traffic, but production workloads — real-time inference, batch scoring, multiple consuming services — demand the ability to scale horizontally.

We’re excited to announce that the Feast Operator now supports horizontal scaling for the FeatureStore deployment, giving teams the tools to run Feast at production scale on Kubernetes.

The Problem: Single-Replica Limitations

By default, the Feast Operator deploys a single-replica Deployment. This works well for getting started, but presents challenges as traffic grows:

  • Single point of failure — one pod crash means downtime for all feature consumers
  • Throughput ceiling — a single pod can only handle so many concurrent requests
  • No elasticity — traffic spikes (model retraining, batch inference) can overwhelm the server
  • Rolling updates cause downtime — the default Recreate strategy tears down the old pod before starting a new one

Teams have been manually patching Deployments or creating external HPAs, but this bypasses the operator’s reconciliation loop and can lead to configuration drift.

The Solution: Native Scaling Support

The Feast Operator now supports three scaling modes. The FeatureStore CRD implements the Kubernetes scale sub-resource, which means you can also scale with kubectl scale featurestore/my-feast --replicas=3.

1. Static Replicas

The simplest approach — set a fixed number of replicas via spec.replicas:

apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: production-feast
spec:
  feastProject: my_project
  replicas: 3
  services:
    onlineStore:
      persistence:
        store:
          type: postgres
          secretRef:
            name: feast-data-stores
    registry:
      local:
        persistence:
          store:
            type: sql
            secretRef:
              name: feast-data-stores

This gives you high availability and load distribution with a predictable resource footprint. The operator automatically switches the Deployment strategy to RollingUpdate, ensuring zero-downtime deployments.
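Conceptually, the rendered Deployment then carries a rolling-update strategy rather than Recreate — roughly equivalent to setting this by hand (the surge/unavailable values shown are the standard Kubernetes defaults, not operator-specific settings):

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 25%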

2. HPA Autoscaling

For workloads with variable traffic patterns, the operator can create and manage a HorizontalPodAutoscaler directly. HPA autoscaling is configured under services.scaling.autoscaling and is mutually exclusive with spec.replicas > 1:

apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: autoscaled-feast
spec:
  feastProject: my_project
  services:
    scaling:
      autoscaling:
        minReplicas: 2
        maxReplicas: 10
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70
    podDisruptionBudgets:
      maxUnavailable: 1
    onlineStore:
      persistence:
        store:
          type: postgres
          secretRef:
            name: feast-data-stores
      server:
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: "1"
            memory: 1Gi
    registry:
      local:
        persistence:
          store:
            type: sql
            secretRef:
              name: feast-data-stores

The operator creates the HPA as an owned resource — it’s automatically cleaned up if you remove the autoscaling configuration or delete the FeatureStore CR. If no custom metrics are specified, the operator defaults to 80% CPU utilization. The operator also auto-injects soft pod anti-affinity (node-level) and topology spread constraints (zone-level) to improve resilience — see the High Availability section for details.
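For reference, the operator-managed HPA is roughly equivalent to creating one like this by hand against the rendered Deployment (the names shown are illustrative — the operator picks the actual target and sets owner references):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: autoscaled-feast  # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: feast-autoscaled-feast  # illustrative; the operator sets the real target
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70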

3. External Autoscalers (KEDA, Custom HPAs)

For teams using KEDA or other external autoscalers, the autoscaler should target the FeatureStore’s scale sub-resource directly, since the CRD implements the Kubernetes scale API. This is the recommended approach because the operator manages the Deployment’s replica count from spec.replicas — targeting the Deployment directly would conflict with the operator’s reconciliation.

When using KEDA, do not set spec.replicas > 1 or services.scaling.autoscaling — KEDA manages the replica count through the scale sub-resource. Configure the FeatureStore with DB-backed persistence, then create a KEDA ScaledObject targeting the FeatureStore resource:

apiVersion: feast.dev/v1
kind: FeatureStore
metadata:
  name: keda-feast
spec:
  feastProject: my_project
  services:
    onlineStore:
      persistence:
        store:
          type: postgres
          secretRef:
            name: feast-data-stores
    registry:
      local:
        persistence:
          store:
            type: sql
            secretRef:
              name: feast-data-stores
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: feast-scaledobject
spec:
  scaleTargetRef:
    apiVersion: feast.dev/v1
    kind: FeatureStore
    name: keda-feast
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090
      metricName: http_requests_total
      query: sum(rate(http_requests_total{service="feast"}[2m]))
      threshold: "100"

When KEDA scales up spec.replicas via the scale sub-resource, the CRD’s CEL validation rules automatically ensure DB-backed persistence is configured. The operator also automatically switches the deployment strategy to RollingUpdate when replicas > 1. This gives you the full power of KEDA’s 50+ event-driven triggers with built-in safety checks.

High Availability

Scaling to multiple replicas is only half the story — you also need to ensure pods are spread across failure domains and protected during disruptions. The operator includes two HA features that activate when scaling is enabled:

Pod Anti-Affinity

When scaling is enabled, the operator automatically injects a soft pod anti-affinity rule that prefers spreading pods across different nodes:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            feast.dev/name: my-feast

This means the scheduler will try to place each replica on a separate node, but won’t prevent scheduling if nodes are constrained. You can override this by supplying your own affinity configuration in the CR — for example, requiredDuringSchedulingIgnoredDuringExecution for strict anti-affinity.
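For example, a strict variant — assuming you supply the full affinity block in your CR — could look like the following. Note that required anti-affinity can leave pods Pending if there aren’t enough nodes:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          feast.dev/name: my-feast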

Topology Spread Constraints

When replicas > 1 or autoscaling is configured, the operator automatically injects a soft zone-spread constraint:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      feast.dev/name: my-feast

This distributes pods across availability zones on a best-effort basis. If your cluster has 3 zones and 3 replicas, each zone gets one pod. If a zone is unavailable, pods are still scheduled rather than left pending.

You can override this with explicit constraints (e.g. strict DoNotSchedule) or disable it entirely by setting topologySpreadConstraints: [].
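A strict override, mirroring the injected default but refusing to schedule when zones are imbalanced, might look like:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      feast.dev/name: my-feast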

PodDisruptionBudgets

For protection during voluntary disruptions (node drains, cluster upgrades), you can configure a PDB:

spec:
  replicas: 3
  services:
    podDisruptionBudgets:
      maxUnavailable: 1
    onlineStore:
      # ...

The PDB requires explicit configuration — it’s not auto-injected because a misconfigured PDB can block node drains. The operator enforces that exactly one of minAvailable or maxUnavailable is set via CEL validation. The PDB is only created when scaling is enabled and is automatically cleaned up when scaling is disabled.
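The resulting object is a standard policy/v1 PodDisruptionBudget selecting the feature server pods — roughly as follows (names and labels are illustrative; the operator names and owns the real object):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: production-feast  # illustrative name
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      feast.dev/name: production-feast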

Safety First: Persistence Validation

Not all persistence backends are safe for multi-replica deployments. File-based stores like SQLite, DuckDB, and local registry.db use single-writer file locks that don’t work across pods.

The operator enforces this at admission time via CEL validation rules on the CRD — if you try to create or update a FeatureStore with scaling and file-based persistence, the API server rejects the request immediately:

Scaling requires DB-backed persistence for the online store.
Configure services.onlineStore.persistence.store when using replicas > 1 or autoscaling.

This validation applies to all enabled services (online store, offline store, and registry) and is enforced for both direct CR updates and kubectl scale commands via the scale sub-resource. Object-store-backed registry paths (s3:// and gs://) are treated as safe since they support concurrent readers.
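As a simplified illustration (not the operator’s actual rule), a validation of this kind is declared on the CRD schema via x-kubernetes-validations and evaluated by the API server on every create, update, and scale operation:

x-kubernetes-validations:
- rule: "!has(self.replicas) || self.replicas <= 1 || has(self.services.onlineStore.persistence.store)"
  message: "Scaling requires DB-backed persistence for the online store."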

Persistence Type     Compatible with Scaling?
PostgreSQL / MySQL   Yes
Redis                Yes
Cassandra            Yes
SQL-based Registry   Yes
S3/GCS Registry      Yes
SQLite               No
DuckDB               No

How It Works Under the Hood

The implementation adds three key behaviors to the operator’s reconciliation loop:

1. Replica management — The operator sets the Deployment’s replica count from spec.replicas (which defaults to 1). When HPA is configured, the operator leaves the replicas field unset so the HPA controller can manage it. External autoscalers like KEDA can update the replica count through the FeatureStore’s scale sub-resource, which updates spec.replicas and triggers the operator to reconcile.

2. Deployment strategy — The operator automatically switches from Recreate (the default for single-replica) to RollingUpdate when scaling is enabled. This prevents the “kill-all-pods-then-start-new-ones” behavior that would cause downtime during scaling events. Users can always override this with an explicit deploymentStrategy in the CR.

3. HPA lifecycle — The operator creates, updates, and deletes the HPA as an owned resource tied to the FeatureStore CR. Removing the autoscaling configuration automatically cleans up the HPA.

4. HA features — The operator auto-injects soft topology spread constraints across zones when scaling is enabled, and manages PodDisruptionBudgets as owned resources when explicitly configured.

The scaling status is reported back on the FeatureStore status:

status:
  scalingStatus:
    currentReplicas: 3
    desiredReplicas: 3

What About TLS, CronJobs, and Services?

Scaling is designed to work seamlessly with existing operator features:

  • TLS — Each pod mounts the same TLS secret. OpenShift service-serving certificates work automatically since they’re bound to the Service, not individual pods.
  • Kubernetes Services — The Service’s label selector already matches all pods in the Deployment, so load balancing across replicas works out of the box.
  • CronJobs — The feast apply and feast materialize-incremental CronJobs use kubectl exec into a single pod. Since DB-backed persistence is required for scaling, all pods share the same state — it doesn’t matter which pod the CronJob runs against.

Getting Started

1. Ensure DB-backed persistence for all enabled services (online store, offline store, registry).

2. Configure scaling in your FeatureStore CR — use either static replicas or HPA (mutually exclusive). Optionally add a PDB for disruption protection:

spec:
  replicas: 3            # static replicas (top-level)
  services:
    podDisruptionBudgets:                 # optional: protect against disruptions
      maxUnavailable: 1
  # -- OR --
  # services:
  #   scaling:
  #     autoscaling:      # HPA
  #       minReplicas: 2
  #       maxReplicas: 10
  #   podDisruptionBudgets:
  #     maxUnavailable: 1

3. Apply the updated CR:

kubectl apply -f my-featurestore.yaml

4. Verify the scaling:

# Check pods
kubectl get pods -l app.kubernetes.io/managed-by=feast

# Check HPA (if using autoscaling)
kubectl get hpa

# Check FeatureStore status
kubectl get feast -o yaml

Learn More

We’re excited to see teams scale their feature serving infrastructure with confidence. Try it out and let us know how it works for your use case!