Feast Launches Support for Vector Databases 🚀

  • July 25, 2024
  • Daniel Dowler, Francisco Javier Arceo

Feast and Vector Databases

With the rise of generative AI applications, the need to serve vectors has grown quickly. We are pleased to announce that Feast now supports embedding vector features, as an experimental Alpha feature, for popular GenAI use cases such as RAG (retrieval augmented generation).

An important consideration is that GenAI applications using embedding vectors stand to benefit from a formal feature framework, just as traditional ML applications do. We are excited to add support for embedding vector features because of the opportunity to improve GenAI backend operations. Integrating embedding vectors as features into Feast lets GenAI developers take advantage of MLOps best practices, lowering development time, improving quality of work, and setting the stage for Retrieval Augmented Fine Tuning.

Setting Up a Document Embedding Feature View

The feast-workshop repo example shows how Feast users can define feature views with vector database sources. They can easily convert text queries to embedding vectors, which are then matched against a vector database to retrieve the closest vector records. All of this works seamlessly within the Feast toolset, so vector features become a natural addition to the Feast feature store solution.

Defining a feature backed by a vector database is very similar to defining other types of features in Feast. Specifically, we use the FeatureView class with an Array-typed field.

from datetime import timedelta

from feast import FeatureView
from feast.field import Field
from feast.types import Array, Float32

# `item` (an Entity) and `source` (a data source) are defined elsewhere
# in the repo, as in the feast-workshop example.
city_embeddings_feature_view = FeatureView(
    name="city_embeddings",
    entities=[item],
    schema=[
        # Each record stores one embedding as an array of 32-bit floats
        Field(name="Embeddings", dtype=Array(Float32)),
    ],
    source=source,
    ttl=timedelta(hours=2),
)

(Setting up the feature store is exactly the same as with traditional features, so we omit those steps here. For a more in-depth look, or to try out vector feature functionality for yourself, visit the repo page.)

Feast typically does retrieval based on some primary key (defined in the entities parameter). For vector similarity search, we have dropped the entities parameter and use the online store's native similarity search functionality, which supports different metrics (e.g., cosine or Euclidean distance) to find the k-nearest neighbors. The code below shows that we can easily convert text into an embedding, which can then be fed to Feast’s retrieve_online_documents method. The method returns the matched document embedding vectors.

# Load embedding model and embed end-user query text
from batch_score_documents import run_model, TOKENIZER, MODEL
from transformers import AutoTokenizer, AutoModel

question = "the most populous city in the U.S. state of Texas?"

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER)
model = AutoModel.from_pretrained(MODEL)
query_embedding = run_model(question, tokenizer, model)
query = query_embedding.detach().cpu().numpy().tolist()[0]

# Use Feast to match the end-users query to database vectors
from feast import FeatureStore
store = FeatureStore(repo_path=".")
features = store.retrieve_online_documents(
    feature="city_embeddings:Embeddings",
    query=query,
    top_k=5
).to_dict()

def print_online_features(features):
    for key, value in sorted(features.items()):
        print(key, " : ", value)

print_online_features(features)
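To make the ranking behind this concrete, here is a minimal pure-Python sketch of cosine-distance top-k retrieval. It is illustrative only, not how Feast or the underlying online stores implement similarity search, and the document names and vectors below are made up:

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query, documents, k):
    """Return the ids of the k documents closest to `query`, nearest first."""
    ranked = sorted(documents.items(), key=lambda kv: cosine_distance(query, kv[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy corpus: three document embeddings keyed by id
docs = {
    "houston": [0.9, 0.1, 0.0],
    "san_antonio": [0.8, 0.2, 0.1],
    "boston": [0.0, 0.9, 0.4],
}

print(top_k([1.0, 0.0, 0.0], docs, k=2))
```

A real online store replaces the linear scan with an index (and can swap in Euclidean distance or another metric), but the contract is the same: given a query vector, return the k nearest stored vectors.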

Supported Vector Databases

The Feast development team has conducted preliminary testing with the following vector stores:

  • SQLite
  • Postgres with the PGVector extension
  • Elasticsearch
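As one example, pointing Feast at Postgres with PGVector comes down to online store configuration in feature_store.yaml. The sketch below reflects our understanding of the postgres online store options at the time of writing; the exact keys (e.g., pgvector_enabled, vector_len) may differ across Feast versions, and the connection settings are placeholders:

```yaml
project: city_rag
provider: local
online_store:
  type: postgres
  pgvector_enabled: true   # enable PGVector-backed similarity search
  vector_len: 384          # must match the embedding model's output size
  host: localhost
  port: 5432
  database: feast
  user: feast
  password: feast
```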

There are many more vector store solutions available, and we are excited about discovering how Feast may work with them to support vector feature use cases. We welcome community contributions in this area. If you have any thoughts, feel free to join the conversation on GitHub.

Final Thoughts

Feast brings formal feature operations support to AI/ML teams, enabling them to produce models faster and at higher quality. The need for feature store support naturally extends to vector embeddings served as features from vector databases (i.e., online stores).

Vector storage and retrieval is an active space with lots of development and solutions. We are excited by where the space is moving, and look forward to Feast’s role in operationalizing embedding vectors as first class features in the MLOps ecosystem.

If you are new to feature stores and MLOps, this is a great time to give Feast a try. Check out the Feast documentation and the Feast GitHub page for more on getting started.

Big thanks to Hao Xu and Francisco Javier Arceo for contributing and shipping this new feature. We can’t wait to hear about your usage!