Streamlining AI Feature Engineering with Feast and dbt
If you’re a dbt user, you know the power of well-crafted data models. You’ve invested time building clean, tested, and documented transformations that your team relies on. Your dbt models represent the single source of truth for analytics, reporting, and, increasingly, AI features.
But here’s the challenge: when your AI team wants to use these models for production predictions, they often need to rebuild the same transformations in their feature store. Your beautiful dbt models, with all their logic and documentation, end up getting reimplemented elsewhere. This feels like wasted effort, and it is.
What if you could take your existing dbt models and put them directly into production for AI without rewriting anything? That’s exactly what Feast’s dbt integration enables.
Your dbt Models Are Already AI-Ready
You’ve already done the hard work with dbt:
- Transformed raw data into clean, aggregated tables
- Documented your models with column descriptions and metadata
- Tested your logic to ensure data quality
- Organized your transformations into a maintainable codebase
These models are perfect for AI features. The aggregations you’ve built for your daily reports? Those are features. The customer attributes you’ve enriched? Features. The time-based calculations you’ve perfected? You guessed it—features.
The problem isn’t your models—they’re great. The problem is getting them into a system that can serve them for real-time AI predictions with low latency and point-in-time correctness.
How Feast Brings Your dbt Models to Production AI
Feast’s dbt integration is designed with one principle in mind: your dbt models should be the single source of truth. Instead of asking you to rewrite your transformations, Feast reads your dbt project and automatically generates everything needed to serve those models for AI predictions.
Here’s how it works:
- Tag the dbt models you want to use as features (just add tags: ['feast'] to your config)
- Run feast dbt import to automatically generate Feast definitions from your dbt metadata
- Deploy to production using Feast's feature serving infrastructure
Feast reads your manifest.json (the compiled output from dbt compile) and extracts:
- Column names, types, and descriptions from your schema files
- Table locations from your dbt models
- All the metadata you’ve already documented
Then it generates Python code defining Feast entities, data sources, and feature views—all matching your dbt models exactly. Your documentation becomes feature documentation. Your data types become feature types. Your models become production-ready features.
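To make that concrete, here's a minimal sketch (not Feast's actual implementation) of the kind of lookup the import performs against the manifest's documented schema:

{% code %}
import json

# Load the artifact that dbt compile produced
with open("target/manifest.json") as f:
    manifest = json.load(f)

# Walk the nodes, keeping models that carry the 'feast' tag
for node in manifest["nodes"].values():
    if node["resource_type"] == "model" and "feast" in node.get("tags", []):
        table = f"{node['database']}.{node['schema']}.{node['name']}"
        print(f"{node['name']} -> {table}")
        # Column docs and types come along for free
        for col in node.get("columns", {}).values():
            print(f"  {col['name']} ({col.get('data_type')}): {col.get('description', '')}")
{% endcode %}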
The best part? You don’t change your dbt workflow at all. Keep building models the way you always have. The integration simply creates a bridge from your dbt project to production AI serving.
See It In Action: From dbt Model to Production Features
Let’s walk through a real example. Imagine you’re a data engineer at a ride-sharing company, and you’ve already built dbt models to track driver performance. Your analytics team loves these models, and now your AI team wants to use them to predict which drivers are likely to accept rides.
Perfect use case. Let’s take your existing dbt models to production AI in just a few steps.
Step 1: Install Feast with dbt Support
First, ensure you have Feast installed with dbt support:
pip install 'feast[dbt]'
Step 2: Tag Your Existing dbt Model
You already have a dbt model that computes driver metrics. All you need to do is add one tag to mark it for Feast:
{% code title="models/features/driver_features.sql" %}
-- The tags=['feast'] entry below is the only change this model needs
{{ config(
    materialized='table',
    tags=['feast']
) }}

WITH driver_stats AS (
    SELECT
        driver_id,
        DATE(completed_at) AS date,
        AVG(rating) AS avg_rating,
        COUNT(*) AS total_trips,
        SUM(fare_amount) AS total_earnings,
        AVG(trip_duration_minutes) AS avg_trip_duration
    FROM {{ ref('trips') }}
    WHERE status = 'completed'
    GROUP BY driver_id, DATE(completed_at)
)

SELECT
    driver_id,
    TIMESTAMP(date) AS event_timestamp,
    avg_rating,
    total_trips,
    total_earnings,
    avg_trip_duration,
    CASE WHEN total_trips >= 5 THEN true ELSE false END AS is_active
FROM driver_stats
{% endcode %}
That’s it. One tag. Your model doesn’t change—it keeps working exactly as before for your analytics workloads.
Step 3: Use Your Existing Documentation
You’re probably already documenting your dbt models (and if you’re not, you should be!). Feast uses this exact same documentation—no duplication needed:
{% code title="models/features/schema.yml" %}
version: 2

models:
  - name: driver_features
    description: "Daily aggregated features for drivers including ratings and activity metrics"
    columns:
      - name: driver_id
        description: "Unique identifier for the driver"
        data_type: STRING
      - name: event_timestamp
        description: "Date of the feature computation"
        data_type: TIMESTAMP
      - name: avg_rating
        description: "Average rating received from riders"
        data_type: FLOAT64
      - name: total_trips
        description: "Total number of completed trips"
        data_type: INT64
      - name: total_earnings
        description: "Total earnings in dollars"
        data_type: FLOAT64
      - name: avg_trip_duration
        description: "Average trip duration in minutes"
        data_type: FLOAT64
      - name: is_active
        description: "Whether driver completed 5+ trips (active status)"
        data_type: BOOLEAN
{% endcode %}
Your column descriptions and data types become the feature documentation in Feast automatically. Write it once, use it everywhere.
Step 4: Compile Your dbt Project (As Usual)
This is your normal dbt workflow—nothing special here:
cd your_dbt_project
dbt compile
This creates target/manifest.json with all your model metadata—the same artifact you’re already generating.
Step 5: See What Feast Found
Use the Feast CLI to discover your tagged models:
feast dbt list -m target/manifest.json --tag feast
You’ll see output like:
Found 1 model(s):
driver_features [tags: feast]
Table: my_project.my_dataset.driver_features
Description: Daily aggregated features for drivers including ratings and activity metrics
Step 6: Import Your dbt Model to Feast
Now for the magic—automatically generate production-ready feature definitions from your dbt model:
feast dbt import -m target/manifest.json \
--entity-column driver_id \
--data-source-type bigquery \
--tag feast \
--output feature_repo/driver_features.py
In seconds, Feast generates a complete Python file with everything needed for production AI serving—all from your existing dbt model:
{% code title="feature_repo/driver_features.py" %}
"""
Feast feature definitions generated from dbt models.

Source: target/manifest.json
Generated by: feast dbt import
"""
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Bool, Float64, Int64, String
from feast.infra.offline_stores.bigquery_source import BigQuerySource

# Entities
driver_id = Entity(
    name="driver_id",
    join_keys=["driver_id"],
    description="Entity key for dbt models",
    tags={'source': 'dbt'},
)

# Data Sources
driver_features_source = BigQuerySource(
    name="driver_features_source",
    table="my_project.my_dataset.driver_features",
    timestamp_field="event_timestamp",
    description="Daily aggregated features for drivers including ratings and activity metrics",
    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'},
)

# Feature Views
driver_features_fv = FeatureView(
    name="driver_features",
    entities=[driver_id],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_rating", dtype=Float64, description="Average rating received from riders"),
        Field(name="total_trips", dtype=Int64, description="Total number of completed trips"),
        Field(name="total_earnings", dtype=Float64, description="Total earnings in dollars"),
        Field(name="avg_trip_duration", dtype=Float64, description="Average trip duration in minutes"),
        Field(name="is_active", dtype=Bool, description="Whether driver completed 5+ trips (active status)"),
    ],
    online=True,
    source=driver_features_source,
    description="Daily aggregated features for drivers including ratings and activity metrics",
    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'},
)
{% endcode %}
Step 7: Apply to Your Feature Store
Now you can use standard Feast commands to materialize these features:
cd feature_repo
feast apply
feast materialize-incremental $(date -u +%Y-%m-%dT%H:%M:%S)
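From here, serving is the standard Feast SDK. A minimal online-retrieval sketch (the driver ID value is illustrative):

{% code %}
from feast import FeatureStore

# Point at the repo you just applied
store = FeatureStore(repo_path=".")

# Fetch the latest feature values for one driver at prediction time
features = store.get_online_features(
    features=[
        "driver_features:avg_rating",
        "driver_features:total_trips",
        "driver_features:is_active",
    ],
    entity_rows=[{"driver_id": "driver_1001"}],  # illustrative ID
).to_dict()

print(features)
{% endcode %}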
What Just Happened?
You just went from dbt model to production AI features without rewriting a single line of transformation logic. Your dbt model—with all its carefully crafted SQL, documentation, and testing—is now:
- Serving features in milliseconds for real-time predictions
- Maintaining point-in-time correctness to prevent data leakage during training (see the training sketch after this list)
- Syncing with your data warehouse automatically as your dbt models update
- Self-documenting using the descriptions you already wrote
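That point-in-time guarantee is what you lean on when assembling training data. A minimal sketch, assuming illustrative driver IDs and label timestamps:

{% code %}
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# The events your model trains on; IDs and timestamps are made up
entity_df = pd.DataFrame(
    {
        "driver_id": ["driver_1001", "driver_1002"],
        "event_timestamp": pd.to_datetime(["2024-06-01", "2024-06-02"]),
    }
)

# Feast joins each row against feature values as of that row's timestamp,
# so no future data leaks into the training set
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_features:avg_rating", "driver_features:total_trips"],
).to_df()
{% endcode %}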
And here’s the best part: when you update your dbt model (maybe you add a new column or refine your logic), just re-run feast dbt import and feast apply. Your production features stay in sync with your dbt source of truth.
Advanced Use Cases for dbt Users
Multiple Entity Support
For features involving multiple entities (like user-merchant transactions), specify multiple entity columns:
feast dbt import -m target/manifest.json \
-e user_id \
-e merchant_id \
--tag feast \
-o feature_repo/transaction_features.py
This creates a FeatureView with composite keys (sketched after this list), useful for:
- Transaction features keyed by both user and merchant
- Interaction features for recommendation systems
- Many-to-many relationship features
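Concretely, the generated definitions come out roughly like this (a hand-written sketch; the table and field names are illustrative):

{% code %}
from datetime import timedelta
from feast import Entity, FeatureView, Field
from feast.types import Float64, Int64
from feast.infra.offline_stores.bigquery_source import BigQuerySource

user_id = Entity(name="user_id", join_keys=["user_id"])
merchant_id = Entity(name="merchant_id", join_keys=["merchant_id"])

transaction_features_source = BigQuerySource(
    name="transaction_features_source",
    table="my_project.my_dataset.transaction_features",  # illustrative table
    timestamp_field="event_timestamp",
)

# Both entities become join keys, so online lookups are keyed by the
# (user_id, merchant_id) pair
transaction_features_fv = FeatureView(
    name="transaction_features",
    entities=[user_id, merchant_id],
    ttl=timedelta(days=1),
    schema=[
        Field(name="txn_count_30d", dtype=Int64),
        Field(name="avg_txn_amount", dtype=Float64),
    ],
    online=True,
    source=transaction_features_source,
)
{% endcode %}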
Snowflake and Other Data Sources
Feast’s dbt integration supports multiple data warehouse backends:
Snowflake:
feast dbt import -m manifest.json \
-e user_id \
-d snowflake \
-o features.py
File-based sources (Parquet, etc.):
feast dbt import -m manifest.json \
-e user_id \
-d file \
-o features.py
Customizing Generated Code
You can fine-tune the import with additional options:
feast dbt import -m target/manifest.json \
-e driver_id \
-d bigquery \
--timestamp-field created_at \
--ttl-days 7 \
--exclude-columns internal_id,temp_field \
-o features.py
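To see how those flags map onto the output, here is roughly how the generated definitions from Step 6 would change (an illustrative fragment, not verbatim tool output):

{% code %}
from datetime import timedelta
from feast import Entity, FeatureView, Field
from feast.types import Float64
from feast.infra.offline_stores.bigquery_source import BigQuerySource

driver_id = Entity(name="driver_id", join_keys=["driver_id"])

# --timestamp-field created_at sets the event-time column on the source
driver_features_source = BigQuerySource(
    name="driver_features_source",
    table="my_project.my_dataset.driver_features",
    timestamp_field="created_at",
)

# --ttl-days 7 becomes the feature view's TTL; the excluded columns
# (internal_id, temp_field) simply never appear in the schema
driver_features_fv = FeatureView(
    name="driver_features",
    entities=[driver_id],
    ttl=timedelta(days=7),
    schema=[Field(name="avg_rating", dtype=Float64)],
    online=True,
    source=driver_features_source,
)
{% endcode %}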
Best Practices
1. Establish a Tagging Convention
Use dbt’s configuration hierarchy to automatically tag entire directories:
# dbt_project.yml
models:
  my_project:
    features:
      +tags: ['feast']  # All models in features/ get tagged
2. Maintain Rich Documentation
Column descriptions from your dbt schema files become feature descriptions in Feast, creating a self-documenting feature catalog. Invest time in documenting your dbt models—it pays dividends in feature discoverability.
3. Integrate with CI/CD
Automate feature definition updates in your deployment pipeline:
# .github/workflows/features.yml
name: Update Features

on:
  push:
    paths:
      - 'dbt_project/**'

jobs:
  update-features:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install 'feast[dbt]'
          pip install dbt-bigquery
      - name: Compile dbt
        run: |
          cd dbt_project
          dbt compile
      - name: Generate Feast definitions
        run: |
          feast dbt import -m dbt_project/target/manifest.json \
            -e user_id \
            -d bigquery \
            --tag feast \
            -o feature_repo/features.py
      - name: Apply to feature store
        run: |
          cd feature_repo
          feast apply
4. Use Dry Run for Validation
Before generating code, preview what will be created:
feast dbt import -m manifest.json -e driver_id --dry-run
This helps catch issues like missing columns or incorrect types before committing.
5. Version Control Generated Code
Commit the generated Python files to your repository. This provides:
- Change tracking for feature definitions
- Code review visibility for dbt-to-Feast mappings
- Rollback capability if needed
Why dbt Users Love This
Data teams using Feast with dbt are seeing real impact:
- “We stopped rewriting features twice”: Data engineers build once in dbt; AI teams consume the same models directly
- 50-70% faster AI deployment: From dbt model to production features in minutes, not weeks
- Single source of truth: When dbt models update, AI features stay in sync
- Analytics expertise becomes AI expertise: Your dbt knowledge directly translates to AI feature engineering
- Better collaboration: No more need to rewrite SQL in Python
Current Limitations and Future Roadmap
The dbt integration is currently in alpha with some limitations:
- Data source support: Currently supports BigQuery, Snowflake, and file-based sources
- Manual entity specification: You must explicitly specify entity columns
- No incremental updates: Each import generates a complete file
We’re actively working on enhancements including:
- Automatic entity inference from foreign key relationships
- Support for additional data sources (Redshift, Postgres)
- Incremental updates to preserve custom modifications
- Enhanced type mapping for complex nested structures
Getting Help
If you encounter issues or have questions:
- Documentation: Check our dbt integration guide
- Community: Join our Slack community
- Issues: Report bugs or request features on GitHub
Conclusion: Your dbt Models Deserve Production AI
You’ve invested time and care into your dbt models. They’re clean, documented, tested, and trusted by your organization. They shouldn’t have to be rewritten to power AI—they should work as-is.
Feast’s dbt integration makes that possible. Your dbt models become production AI features with:
- ✅ No rewriting or duplication
- ✅ No changes to your dbt workflow
- ✅ All your documentation preserved
- ✅ Real-time serving for predictions
- ✅ Point-in-time correctness for training
If you’re a dbt user who’s been asked to “make those models work for AI,” this is your answer.
Ready to see your dbt models in production? Install Feast and try it out:
pip install 'feast[dbt]'
cd your_dbt_project
dbt compile
feast dbt import -m target/manifest.json -e your_entity_column -d bigquery
Your models are already great. Now make them do more.
Join us on Slack to share your dbt + Feast success stories, or check out the full documentation to dive deeper.
Want to contribute to Feast’s dbt integration? Check out our contributing guide and join us on GitHub.