
Why Feature Stores Make Sense for GenAI and RAG
Feature stores have been developed over the past decade to address the challenges AI practitioners face in managing, serving, and scaling machine learning models in production.
Some of the key challenges include:
- Accessing the right raw data
- Building features from raw data
- Combining features into training data
- Calculating and serving features in production
- Monitoring features in production
Feast was specifically designed to address these challenges.
These same challenges extend naturally to Generative AI (GenAI) applications, with the exception of model training. In GenAI applications, the foundation model is typically pre-trained and the focus is on fine-tuning or using the model simply as an endpoint from some provider (e.g., OpenAI, Anthropic, etc.).
For GenAI use cases, feature stores enable the efficient management of context and metadata, both during training/fine-tuning and at inference time.
By using a feature store for your application, you can treat the LLM context, including the prompt, as features. This means you can manage not only input context, document processing, data formatting, tokenization, chunking, and embeddings, but also track and version the context used during model inference, ensuring consistency, transparency, and reproducibility across models and iterations.
With Feast, ML engineers can streamline the embedding generation process, ensure consistency across both offline and online environments, and track the lineage of data and transformations. By leveraging a feature store, GenAI applications benefit from enhanced scalability, maintainability, and reproducibility, making them ideal for complex AI applications and enterprise needs.
Feast Now Supports RAG
With the rise of generative AI applications, the need to serve vectors has grown quickly. Feast now has alpha support for vector similarity search to power retrieval augmented generation (RAG) systems in production.

This allows ML Engineers and Data Scientists to use the power of their feature store to easily deploy GenAI applications using RAG to production. More importantly, Feast offers the flexibility to customize and scale your production RAG applications through our scalable transformation systems (streaming, request-time, and batch).
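As one example of a request-time transformation, Feast’s on-demand feature views let you compute features on the fly at serving time, using the same code online and offline. Here is a minimal sketch; the source and field names are illustrative, not from the demo:

import pandas as pd
from feast import Field, RequestSource
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Int64, String

# A request-time data source: the raw query text arrives with the request.
input_request = RequestSource(
    name="query_input",
    schema=[Field(name="raw_query", dtype=String)],
)

@on_demand_feature_view(
    sources=[input_request],
    schema=[Field(name="query_length", dtype=Int64)],
)
def query_features(inputs: pd.DataFrame) -> pd.DataFrame:
    # Computed at request time; Feast applies the same transformation
    # during offline retrieval.
    out = pd.DataFrame()
    out["query_length"] = inputs["raw_query"].str.len()
    return out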
Retrieval Augmented Generation (RAG)
RAG is a technique that combines generative models (e.g., LLMs) with retrieval systems to generate contextually relevant output for a particular goal (e.g., question answering).
The typical RAG process involves:
1. Sourcing text data relevant to your application
2. Transforming each text document into smaller chunks of text
3. Transforming those chunks of text into embeddings
4. Inserting those chunks of text, along with identifiers for the chunk and document, into a database
5. Retrieving those chunks of text along with the identifiers at run-time to inject that text into the LLM’s context
6. Calling some API to run inference with your LLM to generate contextually relevant output
7. Returning the output to the end user
Implicit in steps (1)-(4) is the potential need to scale to large amounts of data (i.e., using some form of distributed computing), to orchestrate that scaling through a batch or streaming pipeline, and to customize key transformation decisions (e.g., tokenization, model choice, chunking, and data formatting).
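To make steps (2)-(4) concrete, here is a minimal sketch of chunking and embedding documents before they are written to a database. The chunking strategy and the sentence-transformers model are illustrative assumptions, not requirements:

from sentence_transformers import SentenceTransformer

# Illustrative embedding model; any model works as long as its output
# dimension matches your vector store configuration.
model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    # Naive fixed-size character chunking with overlap; production systems
    # often split on sentence or token boundaries instead.
    return [text[i : i + size] for i in range(0, len(text), size - overlap)]

documents = {"doc-1": "Some long source document..."}
records = []
for doc_id, text in documents.items():
    chunks = chunk_text(text)
    embeddings = model.encode(chunks)
    for chunk_id, (piece, vector) in enumerate(zip(chunks, embeddings)):
        # Keep the chunk text, its embedding, and identifiers for the chunk
        # and its parent document, ready to insert into a database.
        records.append({
            "document_id": doc_id,
            "chunk_id": chunk_id,
            "text": piece,
            "vector": vector.tolist(),
        })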
Powering Retrieval in Production
To power the Retrieval step of RAG in production, we need to handle data ingestion, data transformation, indexing, and serving web requests from an API.
Building highly available software that can handle these requirements and scale as your data grows is a non-trivial task. This is a strength of Feast: it combines the power of Kubernetes, large-scale data frameworks like Spark and Flink, and the ability to ingest and transform data in real time through the Feast Feature Server.
Beyond Vector Similarity Search
RAG patterns often use vector similarity search for the retrieval step, but it is not the only retrieval pattern that can be useful. In fact, standard entity-based retrieval can be very powerful for applications where relevant user context is necessary.
For example, many RAG applications are customer chatbots, and they benefit significantly from user data (e.g., account balance, location, etc.) to generate contextually relevant output. Feast can help you manage this user data using its existing entity-based retrieval patterns.
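Here is a minimal sketch of entity-based retrieval using Feast’s standard online serving API; the user_profile feature view, its fields, and the user_id entity are hypothetical names for illustration:

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Fetch user context by entity key rather than by vector similarity.
user_context = store.get_online_features(
    features=[
        "user_profile:account_balance",
        "user_profile:location",
    ],
    entity_rows=[{"user_id": 1234}],
).to_dict()

The retrieved values can then be injected into the LLM’s context alongside any documents returned by vector search.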
The Benefits of Feast
Fine-tuning is the holy grail for optimizing your RAG system. By logging the documents, data, and context retrieved during inference, you can fine-tune both the generator and the retriever of your LLM application for your particular needs.
This means that Feast can help you not only serve your documents, user data, and other metadata for production RAG applications, but also scale your embedding generation over large amounts of data (e.g., using Spark to embed gigabytes of documents), re-use the same code online and offline, track changes to your transformations, data sources, and RAG sources for replayability and data lineage, and prepare datasets so you can later fine-tune your embedding, retrieval, or generator models.
Historically, Feast catered to data scientists and ML engineers who implemented their own data and feature transformations, but many RAG providers now handle this out of the box. We will invest in creating extendable implementations to make it easier to ship your applications.
Feast Powered by Milvus
Milvus is a high performance open source vector database that provides a powerful and efficient way to store and retrieve embeddings. By using Feast with Milvus, you can easily deploy RAG applications to production and scale your retrieval systems on Kubernetes using the Feast Operator or the Feature Server Helm Chart.
This tutorial will walk you through building a basic RAG application with Milvus and Feast; i.e., ingesting embedded documents in Milvus and retrieving the most similar documents for a given query embedding.
This example consists of 5 steps:
- Configuring Milvus
- Defining your Data Sources and Views
- Updating your Registry
- Ingesting the Data
- Retrieving the Data
The full demo is available on our GitHub repository.
Step 1: Configure Milvus
Configure Milvus in a simple feature_store.yaml file.
project: rag
provider: local
registry: data/registry.db
online_store:
  type: milvus
  path: data/online_store.db
  vector_enabled: true
  embedding_dim: 384
  index_type: "IVF_FLAT"
offline_store:
  type: file
entity_key_serialization_version: 3
# By default, no_auth is used for authentication and authorization; other possible values are kubernetes and oidc. Refer to the documentation for more details.
auth:
  type: no_auth
Step 2: Define your Data Sources and Views
You define your data declaratively using Feast’s FeatureView and Entity objects, which are meant to give your software engineers and data scientists a common language for defining the data they want to ship to production.
Here is an example of how you might define a FeatureView for document retrieval. Notice how we define the vector field and enable vector search by setting vector_index=True and the distance metric to COSINE.
That’s it; the rest of the implementation is already handled for you by Feast and Milvus.
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.data_format import ParquetFormat
from feast.types import Array, Float32, String

document = Entity(
    name="document_id",
    description="Document ID",
    value_type=ValueType.INT64,
)
source = FileSource(
    file_format=ParquetFormat(),
    path="./data/my_data.parquet",
    timestamp_field="event_timestamp",
)
# Define the view for retrieval
city_embeddings_feature_view = FeatureView(
    name="city_embeddings",
    entities=[document],
    schema=[
        Field(
            name="vector",
            dtype=Array(Float32),
            vector_index=True,  # Vector search enabled
            vector_search_metric="COSINE",  # Distance metric configured
        ),
        Field(name="state", dtype=String),
        Field(name="sentence_chunks", dtype=String),
        Field(name="wiki_summary", dtype=String),
    ],
    source=source,
    ttl=timedelta(hours=2),
)
Step 3: Update your Registry
After we have defined our code, we run the feast apply command in the same folder as the feature_store.yaml file to update the registry with our metadata.
feast apply
Step 4: Ingest your Data
Now that we have defined our metadata, we can ingest our data into Milvus using the following code:
store.write_to_online_store(feature_view_name='city_embeddings', df=df)
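Here, store is a feast.FeatureStore pointed at the repo, and df is a pandas DataFrame whose columns match the FeatureView schema plus the entity key and timestamp field. A minimal sketch of preparing both, assuming a sentence-transformers model whose 384-dimensional output matches embedding_dim:

from datetime import datetime, timezone
import pandas as pd
from sentence_transformers import SentenceTransformer
from feast import FeatureStore

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim output
df = pd.DataFrame({
    "document_id": [1],
    "state": ["California"],
    "sentence_chunks": ["Sacramento is the capital of California."],
    "wiki_summary": ["California is a state in the western United States."],
})
df["vector"] = model.encode(df["sentence_chunks"].tolist()).tolist()
df["event_timestamp"] = datetime.now(timezone.utc)
store = FeatureStore(repo_path=".")  # the folder containing feature_store.yaml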
Step 5: Retrieve your Data
Now that the data is actually stored in Milvus, we can easily query it using the SDK (and corresponding REST API) to retrieve the most similar documents for a given query embedding.
# query_embedding is the user query embedded with the same model used
# at ingestion time
context_data = store.retrieve_online_documents_v2(
    features=[
        "city_embeddings:vector",
        "city_embeddings:document_id",
        "city_embeddings:state",
        "city_embeddings:sentence_chunks",
        "city_embeddings:wiki_summary",
    ],
    query=query_embedding,
    top_k=3,
    distance_metric='COSINE',
).to_df()
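From here, you can inject the retrieved chunks into the LLM’s context and generate an answer. A minimal sketch using the OpenAI SDK as one example provider; the model name and prompts are illustrative:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "What is the capital of California?"
# The augmentation step: concatenate the retrieved chunks into the prompt.
context = "\n\n".join(context_data["sentence_chunks"])
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)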
The Benefits of Using Feast for RAG
We’ve discussed some of the high-level benefits of using Feast for a RAG application. More specifically, here are some of the concrete benefits you can expect:
- Real-time, stream, and batch data ingestion into the Feature Server for online retrieval
- Data dictionary/metadata catalog autogenerated from code
- UI exposing the metadata catalog
- FastAPI Server to serve your data
- Role Based Access Control (RBAC) to govern access to your data
- Deployment on Kubernetes using our Helm Chart or our Operator
- Multiple vector database providers
- Multiple data warehouse providers
- Support for different data sources
- Support for stream and batch processors (e.g., Spark and Flink)
And more!
The Future of Feast and GenAI
Feast will continue to invest in GenAI use cases.
In particular, we will invest in (1) NLP as a first-class citizen, (2) support for images, (3) support for transforming unstructured data (e.g., PDFs), (4) an enhanced GenAI-focused feature server to allow our end users to more easily ship RAG to production, (5) an out-of-the-box chat UI meant for internal development and fast iteration, and (6) making Milvus a fully supported and core online store for RAG.
Join the Conversation
Are you interested in learning more about how Feast can help you build and deploy RAG applications to production? Reach out to us on Slack or GitHub; we’d love to hear from you!