Feast 0.12 adds AWS Redshift and DynamoDB stores

  • August 11, 2021
  • Jules S. Damji, Tsotne Tabidze, and Achal Shah

We are delighted to announce the release of Feast 0.12! With this release, Feast users can take advantage of AWS technologies such as Redshift and DynamoDB as feature store backends to power their machine learning models. We want to share three key additions that extend Feast’s ecosystem and make it convenient to group features for serving via a Feature Service:

  1. Adding AWS Redshift, a cloud data warehouse, as an offline store, which supports feature serving for training and batch inference at high throughput
  2. Adding DynamoDB, a NoSQL key-value database, as an online store, which supports feature serving at low latency for quick prediction
  3. Providing a logical grouping of features your model needs as a Feature Service

Let’s briefly take a peek at each and how easily you can use them through simple declarative APIs and configuration changes.

AWS Redshift as a feature store data source and an offline store 

A Redshift data source allows you to fetch historical feature values from Redshift for building training datasets and for materializing features into an online store (materialization is covered below). A data source is defined as part of the Feast Declarative API in the feature repo directory’s Python files. For example, aws_datasource.py defines a table from which we want to fetch features.

from feast import RedshiftSource

my_redshift_source = RedshiftSource(table="redshift_driver_table")

To enable Feast to recognize your Redshift data source, modify your feature_store.yaml:

project: driver_features
registry: data/registry.db
provider: aws
offline_store:
    type: redshift
    region: us-west-2
    cluster_id: feast-cluster
    database: feast-database
    user: redshift-user
    s3_staging_location: s3://feast-bucket/redshift
    iam_role: arn:aws:iam::123456789012:role/redshift_s3_access_role

Building a training dataset

After you run feast apply, Feast can build a training dataset from existing feature data in Redshift. To access your historical features to train a model, use the same APIs as before, with the appropriate AWS RepoConfig configuration.

import pandas as pd
from feast import FeatureStore
from feast.repo_config import RepoConfig

# Connect to the feature registry.
# Add the provider and store settings that match your feature_store.yaml.
fs = FeatureStore(
    config=RepoConfig(
        registry="s3://feast-bucket/redshift",
        project="driver_features",
    )
)

# Load our driver events table. This dataframe will be enriched
# with features from Redshift
driver_events = pd.read_csv("driver_events.csv")

# Build a training dataset from features in Redshift
training_df = fs.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_df=driver_events,
).to_df()
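
To make "enriched with features" concrete, here is a minimal pure-Python sketch of a point-in-time join, which is the idea behind building a training dataset from historical features. It is not Feast's implementation: the function name, the sample rows, and the event_timestamp field are assumptions made for illustration, and it ignores details such as feature TTLs.

```python
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows, feature_names):
    """Attach to each entity row the latest feature values
    observed at or before that row's timestamp."""
    enriched = []
    for entity in entity_rows:
        # Feature rows for the same driver that are not from the future
        candidates = [
            f for f in feature_rows
            if f["driver_id"] == entity["driver_id"]
            and f["event_timestamp"] <= entity["event_timestamp"]
        ]
        latest = max(candidates, key=lambda f: f["event_timestamp"], default=None)
        row = dict(entity)
        for name in feature_names:
            row[name] = latest[name] if latest else None
        enriched.append(row)
    return enriched


# Hypothetical feature history for one driver
feature_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 8, 1, 10),
     "conv_rate": 0.50, "acc_rate": 0.90},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 8, 1, 12),
     "conv_rate": 0.70, "acc_rate": 0.95},
]

# Hypothetical entity rows (the role played by driver_events above)
driver_events = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 8, 1, 11)},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 8, 1, 13)},
    {"driver_id": 1002, "event_timestamp": datetime(2021, 8, 1, 13)},
]

training_rows = point_in_time_join(
    driver_events, feature_rows, ["conv_rate", "acc_rate"]
)
```

Each event only sees feature values that existed at its own timestamp, which is what keeps future information from leaking into the training data. A driver with no feature history gets None for every feature.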


AWS DynamoDB as an online store

To allow teams to scale up and support high volumes of online transaction requests for machine learning (ML) predictions, Feast now supports DynamoDB as a scalable store that serves fresh features to your model in production in the AWS cloud. To enable DynamoDB as your online store, just change feature_store.yaml:

project: fraud_detection
registry: data/registry.db
provider: aws
online_store:
  type: dynamodb
  region: us-west-2
...

To materialize your features into your DynamoDB online store, simply issue the command:

$ feast materialize
Materializing 3 feature views to 2021-06-15 18:43:03+00:00 into the DynamoDB online store.

user_account_features from 2020-06-16 18:43:04 to 2021-08-15 18:43:13:
100%|███████████████████████| 9944/9944 [00:05<00:00, 21470.13it/s]
user_has_fraudulent_transactions from 2020-06-16 18:43:13 to 2021-08-15 18:43:03:
100%|███████████████████████| 9944/9944 [00:04<00:00, 20065.15it/s]
user_transaction_count_7d from 2021-06-08 18:43:21 to 2021-08-15 18:43:03:
100%|███████████████████████| 9674/9674 [00:04<00:00, 19943.82it/s]

To read more about how to configure permissions for DynamoDB, check the documentation.
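
Conceptually, the materialization step shown above copies the most recent feature values per entity from the offline store into a key-value store. The sketch below illustrates that idea with a plain Python dict standing in for DynamoDB. It is a simplified illustration rather than Feast's code, and the function name, field names, and sample rows are hypothetical.

```python
from datetime import datetime


def materialize(feature_rows, online_store, start, end):
    """Keep only the newest row per entity key within [start, end]."""
    for row in feature_rows:
        ts = row["event_timestamp"]
        if not (start <= ts <= end):
            continue  # outside the materialization window
        key = row["user_id"]
        current = online_store.get(key)
        if current is None or ts > current["event_timestamp"]:
            online_store[key] = row
    return online_store


# Hypothetical offline feature rows
offline_rows = [
    {"user_id": "u1", "event_timestamp": datetime(2021, 8, 1), "transaction_count_7d": 3},
    {"user_id": "u1", "event_timestamp": datetime(2021, 8, 10), "transaction_count_7d": 8},
    {"user_id": "u2", "event_timestamp": datetime(2021, 8, 5), "transaction_count_7d": 1},
    {"user_id": "u2", "event_timestamp": datetime(2021, 9, 1), "transaction_count_7d": 9},
]

online_store = materialize(
    offline_rows,
    online_store={},  # dict standing in for the DynamoDB table
    start=datetime(2021, 7, 1),
    end=datetime(2021, 8, 15),
)
```

After this runs, a lookup by user_id is a single key read, which is the property that makes an online store suitable for low-latency serving.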

Fetching a feature vector at low latency

With the DynamoDB online store loaded with fresh features, and after executing feast apply, we can fetch a feature vector at low latency for quick model prediction.

from feast import FeatureStore
from feast.repo_config import RepoConfig

# Connect to the feature store
fs = FeatureStore(
    config=RepoConfig(
        registry="s3://feast-bucket/redshift",
        project="driver_features",
    )
)

# Query DynamoDB for online feature values
online_features = fs.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

# Make a prediction (model is a previously trained model object)
model.predict(online_features)
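
The dictionary returned by to_dict() is column-oriented: each name maps to a list with one value per entity row. Many model libraries expect row-oriented input instead, so a small conversion step is often needed before calling predict. The helper below is a sketch of that conversion. The key names in the sample dictionary are assumptions, since the exact naming can differ between Feast versions, so print the dictionary in your own environment to check.

```python
def to_feature_matrix(online_features, feature_order):
    """Turn a column-oriented dict (name -> list of values)
    into a list of rows, with columns in a fixed order."""
    columns = [online_features[name] for name in feature_order]
    return [list(row) for row in zip(*columns)]


# Hypothetical shape of the dictionary for a single entity row
online_features = {
    "driver_id": [1001],
    "conv_rate": [0.7],
    "acc_rate": [0.95],
}

feature_matrix = to_feature_matrix(online_features, ["conv_rate", "acc_rate"])
```

Fixing the column order explicitly matters because the model must see features in the same order it was trained on.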

Grouping features with Feature Service

Not to be confused with an actual deployed service, a Feature Service is a convenient way to logically group the features used by a particular model, making it easier for Feast to know which set of features is needed to serve that model. During training, you can record those features in model management tools like SageMaker or MLflow.

Here is an example of a Feature View defined in the file driver_trips_fview.py:

from feast import Feature, FeatureView, RedshiftSource, ValueType

driver_stats_fv = FeatureView(
    name="driver_activity",
    entities=["driver"],
    features=[
        Feature(name="trips_today", dtype=ValueType.INT64),
        Feature(name="rating", dtype=ValueType.FLOAT),
    ],
    input=RedshiftSource(table="redshift_driver_table"),
)

When defining a FeatureService for a specific model, you can simply reference the driver_stats_fv object instead of listing or duplicating all Feature object references.

from feast import FeatureService

from driver_trips_fview import driver_stats_fv

driver_stats_fs = FeatureService(
    name="driver_activity",
    features=[driver_stats_fv],
)

Use a Feature Service when you want to logically group features from multiple Feature Views. This way, when you request the service from Feast through feature_store.get_historical_features(...) or feature_store.get_online_features(...), all of its features are returned from the feature store.
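
One way to picture the grouping is as a named bundle that expands into the individual "view:feature" references a model needs. The toy model below uses plain dataclasses to show that expansion. The classes ToyFeatureView and ToyFeatureService and the service name are hypothetical and are not part of Feast. They only mirror the concept.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyFeatureView:
    name: str
    features: List[str]


@dataclass
class ToyFeatureService:
    name: str
    feature_views: List[ToyFeatureView]

    def feature_refs(self) -> List[str]:
        """Expand the service into 'view:feature' reference strings."""
        return [
            f"{view.name}:{feature}"
            for view in self.feature_views
            for feature in view.features
        ]


driver_activity = ToyFeatureView("driver_activity", ["trips_today", "rating"])
driver_hourly_stats = ToyFeatureView("driver_hourly_stats", ["conv_rate", "acc_rate"])

# Hypothetical service grouping features from both views for one model
service = ToyFeatureService(
    "driver_ranking_model", [driver_activity, driver_hourly_stats]
)
refs = service.feature_refs()
```

Defining the group once means training and serving code can refer to the same name instead of each keeping its own copy of the feature list.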

What’s next

We are working on a Feast tutorial use case on AWS. In the meantime, you can check out the other tutorials in the documentation. For more about the features described above, see the Feast documentation.

Download Feast 0.12 today from PyPI and have a go at it. Let us know what you think on our Slack channel.

Credits

We want to extend our gratitude and acknowledgement to all Feast community contributors, @achals, @adchia, @charliec443, @codyjlin, @DvirDukhan, @felixwang9817, @GregKuhlmann, @MattDelac, @mavysavydav, @Mwad22, @nels, @potatochip, @szalai1, @tedhtchang and @tsotnet, who helped us achieve this milestone.

To see all the features, enhancements, and bug fixes from the Feast community contributors, check the changelog for this release.

If you want to be part of the community, join us on Slack and register for the Apply Conference Meetup. If you missed it, watch the recorded talks in our archives.