Serving features in milliseconds with Feast feature store

February 1, 2022
Tsotne Tabidze, Oleksii Moskalenko, Danny Chiao

Feature stores are operational ML systems that serve data to models in production. The speed at which a feature store can serve features can have an impact on the performance of a model and user experience.

In this blog post, we show how fast Feast is at serving features in production and describe considerations for deploying Feast.

Updates

Apr 19: Updated DynamoDB benchmarks for Feast 0.20 given batch retrieval improvements

Background

One of the most common questions Feast users ask in our community Slack is: how scalable / performant is Feast? (spoiler alert: Feast is very fast, serving features at <1.5ms @p99 when using Redis in the below benchmarks)

In a survey conducted last year (results), we saw that most users were tackling challenging problems like recommender systems (e.g. recommending items to buy) and fraud detection, and had strict latency requirements:

Over 80% of survey respondents needed features to be read at less than 100ms (@p99). Taking into account that most users in this survey were supporting recommender systems, which often require ranking 100s-1000s of entities simultaneously, this becomes even more strict. Feature serving latency scales with batch size because of the need to query features for random entities and other sources of tail latency.

In this blog, we present results from a benchmark suite (RFC), describe the benchmark setup, and provide recommendations for how to deploy Feast to meet different operational goals.

Considerations when deploying Feast

There are a couple of decisions users need to make when deploying Feast to support online inference. There are two key decisions when it comes to performance:

How to deploy a feature server
Choice of online store

Each approach comes with different tradeoffs in terms of performance, scalability, flexibility, and ease of use. This post aims to help users decide between these approaches and enable users to easily set up their own benchmarks to see if Feast meets their own latency requirements.

How to deploy a feature server

While all users setup a Feast feature repo in the same way (using the Python SDK to define and materialize features), users retrieve features from Feast in a few different ways (see also Running Feast in Production):

Deploy a Java gRPC feature server (Beta)
Deploy a Python HTTP feature server
Deploy a serverless Python HTTP feature server on AWS Lambda
Use the Python client SDK to directly fetch features
(Advanced) Build a custom client (e.g in Go or Java) to directly read the registry and read from an online store

The first four above come for free with Feast, while the fifth requires custom work. All options communicate with the same Feast registry component (managed by feast apply) to understand where features are stored.

Deploying a feature server service (compared to using a Feast client that directly communicates with online stores) can enable many improvements such as better caching (e.g. across clients), improved data access management, rate limiting, centralized monitoring, supporting client libraries across multiple languages, etc. However, this comes at the cost of increased architectural complexity. Serverless architectures are on the other end of the spectrum, enabling simple deployments at the cost of latency overhead.

Choice of online stores

Feast is highly pluggable and extensible and supports serving features from a range of online stores (e.g. Amazon DynamoDB, Google Cloud Datastore, Redis, PostgreSQL). Many users build their own plugins to support their specific needs / online stores.

Building a Feature Store dives into some of the trade-offs between online stores. Easier to manage solutions like DynamoDB or Datastore often lose against Redis in terms of read performance and cost. Each store also has its own API idiosyncrasies that can impact performance. The Feast community is continuously optimizing store-specific performance.

Benchmark Setup

See https://github.com/feast-dev/feast-benchmarks for the exact benchmark code.

Machines

For Redis and AWS DynamoDB tests:
- Running in Docker on AWS EC2 instances (c5.4xlarge, 16 vCPU)
For GCP Datastore tests
- Running in Docker on GCP GCE instances (c2-standard-16, 16 vCPU)
These machines are co-located with the online stores (i.e. in the same region)

Data and query patterns

Feast’s feature retrieval primarily manages retrieving the latest values of a given feature for specified entities. In this benchmark, the online stores contain:

25 feature views (with 10 features per feature view) for a total of 250 features
1M entity rows

As described in RFC-031, we simulate different query patterns by additionally varying by number of entity rows in a request (i.e. batch size), requests per second, and the concurrency of the feature server. The goal here is to have numbers that apply to a diverse set of teams, regardless of their scale and typical query patterns. Users are welcome to extend the benchmark suite to better test their own setup.

Online store setup

Redis:
- Use of a single Redis server, locally run with Docker Compose on an EC2 instance.
- This should closely approximate usage of a separate Redis server in AWS. Typical network latency within the same availability zone in AWS is < 1-2 ms.
- In these benchmarks, we did not hit limits that required use of a Redis cluster. With higher batch sizes, the benchmark suite would likely only work with Redis clusters. Redis clusters should improve Feast’s performance.
AWS DynamoDB:
- No additional setup required, except to make sure that the benchmarks are running in the same region as where DynamoDB tables reside.
- Feast automatically creates on-demand DynamoDB tables on `feast apply`. Feast servers don’t use DAX for caching.
Google Cloud Datastore:
- Cloud Firestore in Datastore mode
- No additional setup required, except to make sure that the benchmarks are running in the same region as where Datastore service is deployed at.

Benchmark Results

The raw data exists at https://github.com/feast-dev/feast-benchmarks. We choose a subset of comparisons here to answer some of the most common questions we hear from the community.

Summary

The Java feature server is very fast (e.g. p99 latency is ~1.3 ms for a single row fetch of 250 features)
- Note: The Java feature server is in Beta and does not support new functionality such as the more scalable SQL registry.
For the same number of features and batch size, the Java feature server is about 5-7x faster than the Python feature server
- Despite this, there are still compelling reasons to use Python, depending on your situation (e.g. simplicity of deployment)
Feature server latency…
- scales linearly (moderate slope) with batch size
- scales linearly (low slope) with number of features
- does not substantially change as requests per seconds increase
Comparing online stores
- For single entity & batch feature retrieval, Redis >> AWS DynamoDB > GCP Datastore

Deciding between Java and Python feature servers

A key decision users must make is whether they want to use Java or Python for feature retrieval.

We recommend starting with the Python centric stack. This is battle tested and works at scale, but suffers from generally slower retrieval time. A Python centric stack can be easier to manage (especially since feature retrieval & data scientist workflows are in Python). Working with a Java feature server requires a slightly more complicated architecture.

To get a sense of how much faster the Java feature server is, we compare here using Redis as the online store. For the two most extreme tested scenarios, the Java feature server is fast, while the Python feature server still satisfies most user’s latency requirements:

Batch size = 1, num features = 250
- Java p99 latency is ~1.3 ms
- Python p99 latency is ~45 ms
Batch size = 100, num features = 50
- Java p99 latency is ~17ms
- Python p99 latency is ~125ms

Throughput wise, the Java feature server, when aiming for >99.9% successful requests for a 100ms timeout maxes out at ~ 5k QPS for single row fetches of 250 features (50 parallel clients).

Java vs Python: latency when varying by batch size

For this comparison, we check retrieval of 50 features across 5 feature views.

At p99, we see that Java performance is significantly better (~5-7x faster) at 22ms, though Python still meets many of our user’s latency requirements (125ms).

p99 retrieval times (ms), varying by batch size (num features = 50)

Batch size
	1	10	20	30	40	50	60	70	80	90	100
Python	7.23	15.14	23.96	32.80	41.44	50.43	59.88	94.57	103.28	111.93	124.87
Java	0.8	2.23	3.66	4.8	6.09	7.29	8.6	10.18	11.91	14.32	17

Java vs Python: latency when varying by number of requested features

The Java feature server scales a bit better than the Python feature server in terms of supporting a large number of features:

p99 retrieval times (ms), varying by number of requested features (batch size = 1)

Num features
	50	100	150	200	250
Python	8.42	10.28	13.36	16.69	45.41
Java	0.8	0.91	1.08	1.19	1.33

Deciding between online stores

Here we separate comparisons by users in AWS vs users in GCP:

For AWS, users today can choose between AWS DynamoDB and Redis (e.g. via Redis Enterprise Cloud, Elasticache or custom deployed Redis clusters).
For GCP, users today can choose between Cloud Datastore and Redis (e.g. via Redis Enterprise Cloud, Memorystore or custom deployed Redis clusters).
- Note: there are plans to support Bigtable as well, which should have significantly better latency characteristics, but is slightly harder to manage

GCP Cloud Datastore vs Redis

We see that Redis is typically around 10-20x faster than Datastore when working with batch size = 1.

However, Datastore becomes compelling at higher batch sizes. As batch size increases, the two become more comparable (~3.4x slower at batch size 100). Projecting based on the latency trends, at batch size = 1000, Datastore would be just 66% slower, with the gap narrowing as batch size increases further. Given that Datastore is very hands off, this is a compelling choice at higher batch sizes.

p99 retrieval times (ms), varying by num requested features (batch size = 1)

Num features
	50	100	150	200	250
Datastore	465.51	304.91	408.28	460.28	668.87
Redis	8.42	10.28	13.36	16.69	45.41

p99 retrieval times, varying by batch size (num features = 50)

Batch size
	1	10	20	30	40	50	60	70	80	90	100
Datastore	181.38	205.79	233.58	281.00	286.80	338.50	384.95	416.03	414.55	483.29	507.95
Redis	15.19	17.24	27.45	37.87	73.97	83.48	95.03	106.18	126.04	127.43	148.31

AWS DynamoDB vs Redis

Here we see that DynamoDB generally is slower than Redis at smaller batch sizes, but quickly becomes faster at batch size >= 30 (~50% faster at batch size = 100).

p99 retrieval times (ms), varying by num requested features (batch size = 1)

Num features
	50	100	150	200	250
DynamoDB	54.68	96.77	149.36	213.70	207.63
Redis	8.424	10.282	13.355	16.685	45.409

p99 retrieval times (ms), varying by batch size (num features = 50)

Batch size
	1	10	20	30	40	50	60	70	80	90	100
DynamoDB	129.40	99.11	122.20	139.93	163.02	220.73	227.54	250.99	287.48	313.74	353.78
Redis	15.19	17.24	27.45	37.87	73.97	83.48	95.03	106.18	126.04	127.43	148.31

Conclusion

From the benchmarks, we’ve seen that Feast serving performance varies widely:

By choice of the online store
By choice of Java vs Python
By number of features in the request
By batch size of the serving request

The Beta Feast Java feature server with Redis provides very low latency retrieval (p99 < 1.5ms for single row retrieval of 250 features), but at increased architectural complexity, less first class support for functionality (e.g. no SQL registry support), and more overhead in managing Redis clusters. Using a Python server with other managed online stores like DynamoDB or Datastore is easier to manage.

Note: there are managed services for Redis like Redis Enterprise Cloud which remove the additional complexity associated with managing Redis clusters and provide additional benefits.

What’s next

The community is always improving Feast performance, and we’ll post updates to performance improvements in the future. Future improvements in the works include:

Improved on demand transformation performance
Improved pooling of clients (e.g. we’ve seen that caching Google clients significantly improves response times and reduces memory consumption)

Join the conversation in our community Slack channel or our GitHub repo. Contributions and feedback are welcome!

Credits

Thank you to the many contributors that have helped bring Feast serving latency down (including @judahrand, @pyalex, @felixwang9817, @tsotnet, @nossrannug, @ptoman-pa, @vas28r13, and @DvirDukhan)!