
Why Feature Stores Make Sense for GenAI and RAG
Feature stores have been developed over the past decade to address the challenges AI practitioners face in managing, serving, and scaling machine learning models in production.
Some of the key challenges include:
- Accessing the right raw data
- Building features from raw data
- Combining features into training data
- Calculating and serving features in production
- Monitoring features in production
Feast was specifically designed to address these challenges.
These same challenges extend naturally to Generative AI (GenAI) applications, with the exception of model training. In GenAI applications, the foundation model is typically pre-trained and the focus is on fine-tuning or using the model simply as an endpoint from some provider (e.g., OpenAI, Anthropic, etc.).
For GenAI use cases, feature stores enable the efficient management of context and metadata, both during training/fine-tuning and at inference time.
By using a feature store for your application, you can treat the LLM context, including the prompt, as features. This means you can manage not only input context, document processing, data formatting, tokenization, chunking, and embeddings, but also track and version the context used during model inference, ensuring consistency, transparency, and reproducibility across models and iterations.
With Feast, ML engineers can streamline the embedding generation process, ensure consistency across both offline and online environments, and track the lineage of data and transformations. By leveraging a feature store, GenAI applications benefit from enhanced scalability, maintainability, and reproducibility, making them ideal for complex AI applications and enterprise needs.
Feast Now Supports RAG
With the rise of generative AI applications, the need to serve vectors has grown quickly. Feast now has alpha support for vector similarity search to power retrieval augmented generation (RAG) systems in production.

This allows ML Engineers and Data Scientists to use the power of their feature store to easily deploy GenAI applications using RAG to production. More importantly, Feast offers the flexibility to customize and scale your production RAG applications through our scalable transformation systems (streaming, request-time, and batch).
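As one example of a request-time transformation, Feast’s on-demand feature views let you compute features on the fly at serving time, using the same code online and offline. Here is a minimal sketch; the source and field names are illustrative, not from the demo:

import pandas as pd
from feast import Field, RequestSource
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Int64, String

# A request-time data source: the raw query text arrives with the request.
input_request = RequestSource(
    name="query_input",
    schema=[Field(name="raw_query", dtype=String)],
)

@on_demand_feature_view(
    sources=[input_request],
    schema=[Field(name="query_length", dtype=Int64)],
)
def query_features(inputs: pd.DataFrame) -> pd.DataFrame:
    # Computed at request time; Feast applies the same transformation
    # during offline retrieval.
    out = pd.DataFrame()
    out["query_length"] = inputs["raw_query"].str.len()
    return out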
Retrieval Augmented Generation (RAG)
RAG is a technique that combines generative models (e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g., question answering).
The typical RAG process involves:
1. Sourcing text data relevant to your application
2. Transforming each text document into smaller chunks of text
3. Transforming those chunks of text into embeddings
4. Inserting those chunks of text, along with identifiers for the chunk and document, into a database
5. Retrieving those chunks of text along with the identifiers at run-time to inject that text into the LLM’s context
6. Calling some API to run inference with your LLM to generate contextually relevant output
7. Returning the output to the end user
Implicit in steps (1)-(4) is the potential need to scale to large amounts of data (i.e., using some form of distributed computing), to orchestrate that scaling through a batch or streaming pipeline, and to customize key transformation decisions (e.g., tokenization, model choice, chunking, and data formatting).
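To make steps (2)-(4) concrete, here is a minimal sketch of chunking and embedding documents before they are written to a database. The chunking strategy and the sentence-transformers model are illustrative assumptions, not requirements:

from sentence_transformers import SentenceTransformer

# Illustrative embedding model; any model works as long as its output
# dimension matches your vector store configuration.
model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    # Naive fixed-size character chunking with overlap; production systems
    # often split on sentence or token boundaries instead.
    return [text[i : i + size] for i in range(0, len(text), size - overlap)]

documents = {"doc-1": "Some long source document..."}
records = []
for doc_id, text in documents.items():
    chunks = chunk_text(text)
    embeddings = model.encode(chunks)
    for chunk_id, (piece, vector) in enumerate(zip(chunks, embeddings)):
        # Keep the chunk text, its embedding, and identifiers for the chunk
        # and its parent document, ready to insert into a database.
        records.append({
            "document_id": doc_id,
            "chunk_id": chunk_id,
            "text": piece,
            "vector": vector.tolist(),
        })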
Powering Retrieval in Production
To power the Retrieval step of RAG in production, we need to handle data ingestion, data transformation, indexing, and serving web requests from an API.
Building highly available software that can handle these requirements and scale as your data grows is a non-trivial task. This is a strength of Feast: it combines the power of Kubernetes, large-scale data frameworks like Spark and Flink, and the ability to ingest and transform data in real time through the Feast Feature Server.
Beyond Vector Similarity Search
RAG patterns often use vector similarity search for the retrieval step, but it is not the only retrieval pattern that can be useful. In fact, standard entity-based retrieval can be very powerful for applications where relevant user context is necessary.
For example, many RAG applications are customer chatbots, and they benefit significantly from user data (e.g., account balance, location, etc.) to generate contextually relevant output. Feast can help you manage this user data using its existing entity-based retrieval patterns.
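Here is a minimal sketch of entity-based retrieval using Feast’s standard online serving API; the user_profile feature view, its fields, and the user_id entity are hypothetical names for illustration:

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Fetch user context by entity key rather than by vector similarity.
user_context = store.get_online_features(
    features=[
        "user_profile:account_balance",
        "user_profile:location",
    ],
    entity_rows=[{"user_id": 1234}],
).to_dict()

The retrieved values can then be injected into the LLM’s context alongside any documents returned by vector search.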
The Benefits of Feast
Fine-tuning is the holy grail for optimizing your RAG system. By logging the documents, data, and context retrieved during inference, you can fine-tune both the generator and the retriever of your LLM application for your particular needs.
This means that Feast can help you not only serve your documents, user data, and other metadata for production RAG applications, but also scale your embedding generation over large amounts of data (e.g., using Spark to embed gigabytes of documents), re-use the same code online and offline, track changes to your transformations, data sources, and RAG sources for replayability and data lineage, and prepare datasets so you can later fine-tune your embedding, retrieval, or generator models.
Historically, Feast catered to data scientists and ML engineers who implemented their own data and feature transformations, but many RAG providers now handle this out of the box. We will invest in creating extendable implementations to make it easier to ship your applications.
Feast Powered by Milvus
Milvus is a high performance open source vector database that provides a powerful and efficient way to store and retrieve embeddings. By using Feast with Milvus, you can easily deploy RAG applications to production and scale your retrieval systems on Kubernetes using the Feast Operator or the Feature Server Helm Chart.
This tutorial will walk you through building a basic RAG application with Milvus and Feast; i.e., ingesting embedded documents in Milvus and retrieving the most similar documents for a given query embedding.
This example consists of 5 steps:
- Configuring Milvus
- Defining your Data Sources and Views
- Updating your Registry
- Ingesting the Data
- Retrieving the Data
The full demo is available on our GitHub repository.
Step 1: Configure Milvus
Configure Milvus in a simple feature_store.yaml file.
project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "IVF_FLAT"
offline_store:
  type: file
entity_key_serialization_version: 3
# By default, no_auth is used for authentication and authorization; other possible values are kubernetes and oidc. Refer to the documentation for more details.
auth:
  type: no_auth
Step 2: Define your Data Sources and Views
You define your data declaratively using Feast’s FeatureView and Entity objects, which are meant to give your software engineers and data scientists a common language for defining the data they want to ship to production.
Here is an example of how you might define a FeatureView for document retrieval. Notice how we define the vector field and enable vector search by setting vector_index=True and the distance metric to COSINE.
That’s it; the rest of the implementation is already handled for you by Feast and Milvus.
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.data_format import ParquetFormat
from feast.types import Array, Float32, String

document = Entity(
    name="document_id",
    description="Document ID",
    value_type=ValueType.INT64,
)
source = FileSource(
    file_format=ParquetFormat(),
    path="./data/my_data.parquet",
    timestamp_field="event_timestamp",
)
# Define the view for retrieval
city_embeddings_feature_view = FeatureView(
    name="city_embeddings",
    entities=[document],
    schema=[
        Field(
            name="vector",
            dtype=Array(Float32),
            vector_index=True,  # Vector search enabled
            vector_search_metric="COSINE",  # Distance metric configured
        ),
        Field(name="state", dtype=String),
        Field(name="sentence_chunks", dtype=String),
        Field(name="wiki_summary", dtype=String),
    ],
    source=source,
    ttl=timedelta(hours=2),
)
Step 3: Update your Registry
After we have defined our code, we run the feast apply command in the same folder as the feature_store.yaml file to update the registry with our metadata.
feast apply
Step 4: Ingest your Data
Now that we have defined our metadata, we can ingest our data into Milvus using the following code:
store.write_to_online_store(feature_view_name='city_embeddings', df=df)
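Here, store is a feast.FeatureStore pointed at the repo, and df is a pandas DataFrame whose columns match the FeatureView schema plus the entity key and timestamp field. A minimal sketch of preparing both, assuming a sentence-transformers model whose 384-dimensional output matches embedding_dim:

from datetime import datetime, timezone
import pandas as pd
from sentence_transformers import SentenceTransformer
from feast import FeatureStore

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim output
df = pd.DataFrame({
    "document_id": [1],
    "state": ["California"],
    "sentence_chunks": ["Sacramento is the capital of California."],
    "wiki_summary": ["California is a state in the western United States."],
})
df["vector"] = model.encode(df["sentence_chunks"].tolist()).tolist()
df["event_timestamp"] = datetime.now(timezone.utc)
store = FeatureStore(repo_path=".")  # the folder containing feature_store.yaml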
Step 5: Retrieve your Data
Now that the data is actually stored in Milvus, we can easily query it using the SDK (and corresponding REST API) to retrieve the most similar documents for a given query embedding.
# query_embedding is the user query embedded with the same model used
# at ingestion time
context_data = store.retrieve_online_documents_v2(
    features=[
        "city_embeddings:vector",
        "city_embeddings:document_id",
        "city_embeddings:state",
        "city_embeddings:sentence_chunks",
        "city_embeddings:wiki_summary",
    ],
    query=query_embedding,
    top_k=3,
    distance_metric='COSINE',
).to_df()
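From here, you can inject the retrieved chunks into the LLM’s context and generate an answer. A minimal sketch using the OpenAI SDK as one example provider; the model name and prompts are illustrative:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "What is the capital of California?"
# The augmentation step: concatenate the retrieved chunks into the prompt.
context = "\n\n".join(context_data["sentence_chunks"])
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)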
The Benefits of Using Feast for RAG
We’ve discussed some of the high-level benefits of using Feast for a RAG application. More specifically, here are some of the concrete benefits you can expect:
- Real-time, stream, and batch data ingestion into the Feature Server for online retrieval
- Data dictionary/metadata catalog autogenerated from code
- UI exposing the metadata catalog
- FastAPI Server to serve your data
- Role Based Access Control (RBAC) to govern access to your data
- Deployment on Kubernetes using our Helm Chart or our Operator
- Multiple vector database providers
- Multiple data warehouse providers
- Support for different data sources
- Support for stream and batch processors (e.g., Spark and Flink)
And more!
The Future of Feast and GenAI
Feast will continue to invest in GenAI use cases.
In particular, we will invest in (1) NLP as a first-class citizen, (2) support for images, (3) support for transforming unstructured data (e.g., PDFs), (4) an enhanced GenAI-focused feature server to allow our end users to more easily ship RAG to production, (5) an out-of-the-box chat UI meant for internal development and fast iteration, and (6) making Milvus a fully supported and core online store for RAG.
Join the Conversation
Are you interested in learning more about how Feast can help you build and deploy RAG applications to production? Reach out to us on Slack or GitHub; we’d love to hear from you!