What is Amazon SageMaker Feature Store?

SageMaker Feature Store is a managed service that provides a centralized repository for machine learning features, enabling consistent feature reuse across models and environments for both training and inference.

What is the difference between the online and offline store?

The online store provides low-latency access to the latest feature values for real-time inference, while the offline store stores historical feature data for model training, batch inference, and analytical purposes.
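The practical difference between the two stores can be shown with a toy, in-memory model. This is not the AWS service or its API. `ToyFeatureStore` and its methods are hypothetical, and the sketch assumes two behaviours: the online store keeps only the record with the newest event time per identifier, and the offline store appends every write.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeatureStore:
    """In-memory stand-in for the two stores (illustration only, not the AWS service)."""
    online: dict = field(default_factory=dict)   # record identifier -> latest record
    offline: list = field(default_factory=list)  # append-only history of every write

    def put_record(self, record: dict, id_key: str = "customer_id", time_key: str = "event_time") -> None:
        # Offline store: every write is appended, so history is preserved for training.
        self.offline.append(dict(record))
        # Online store: keep only the record with the newest event time per identifier.
        current = self.online.get(record[id_key])
        if current is None or record[time_key] >= current[time_key]:
            self.online[record[id_key]] = dict(record)

    def get_record(self, record_id):
        # Fast lookup of the latest values, as used for real-time inference.
        return self.online.get(record_id)


store = ToyFeatureStore()
store.put_record({"customer_id": "101", "event_time": 1.0, "transaction_count": 5})
store.put_record({"customer_id": "101", "event_time": 2.0, "transaction_count": 7})
# A late-arriving, older event lands in the history but does not replace the latest value.
store.put_record({"customer_id": "101", "event_time": 1.5, "transaction_count": 6})
print(store.get_record("101")["transaction_count"])
print(len(store.offline))
```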

How does SageMaker Feature Store ensure consistency between training and inference?

It provides a unified feature definition and storage for both training and inference. By serving features from the same source, it helps mitigate training-serving skew, where discrepancies between data used for training and inference can degrade model performance.
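The principle can be sketched in plain Python: each feature is defined once, in a single transformation, and both the batch (training) path and the real-time (serving) path call it. All function and feature names here are hypothetical. In Feature Store, the computed values, rather than the transformation code, are what the two paths share.

```python
import math


def transaction_features(raw: dict) -> dict:
    """Single feature definition shared by training and serving (hypothetical example)."""
    count = int(raw.get("transaction_count", 0))
    total = float(raw.get("total_spend", 0.0))
    return {
        "transaction_count": count,
        # log1p dampens heavy-tailed spend; defined once so both paths agree
        "log_total_spend": math.log1p(total),
        "avg_spend": total / count if count else 0.0,
    }


def build_training_rows(raw_rows):
    # Batch path: features for model training (the offline store in the real service)
    return [transaction_features(r) for r in raw_rows]


def build_serving_row(raw):
    # Real-time path: features for one inference request (the online store in the real service)
    return transaction_features(raw)


raw = {"transaction_count": 4, "total_spend": 100.0}
print(build_serving_row(raw))
```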

What programming languages does SageMaker Feature Store support?

Feature Store is accessed through standard AWS APIs, so it can be used from any AWS SDK or the AWS CLI. Python is the most common choice, using the AWS SDK for Python (Boto3) and the SageMaker Python SDK.

Can I use SageMaker Feature Store with other AWS services?

Yes, it integrates with various AWS services, including Amazon S3, AWS Lambda, Amazon Kinesis, Amazon EMR, AWS Glue, and other Amazon SageMaker components, to build end-to-end ML workflows.

What kind of data can be stored in SageMaker Feature Store?

It stores features of three basic types, Integral, Fractional, and String, as declared in each feature group's schema. The offline store writes this data to Amazon S3 in Parquet format, optionally organized as Apache Iceberg tables.
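As a rough sketch, the snippet below maps Python values to the three basic feature types and builds the `Record` payload shape that `PutRecord` accepts, where every value is passed as a string in `ValueAsString`. The helper names and the type-inference rules (including treating `bool` as `String`) are assumptions for illustration.

```python
from typing import Any, Dict, List


def infer_feature_type(value: Any) -> str:
    """Map a Python value to a basic Feature Store type (assumed mapping)."""
    if isinstance(value, bool):
        # bool is a subclass of int; this sketch stores it as a string to avoid ambiguity
        return "String"
    if isinstance(value, int):
        return "Integral"
    if isinstance(value, float):
        return "Fractional"
    return "String"


def to_record(row: Dict[str, Any]) -> List[Dict[str, str]]:
    """Build the PutRecord `Record` list: one FeatureName/ValueAsString pair per feature."""
    return [{"FeatureName": name, "ValueAsString": str(value)} for name, value in row.items()]


row = {"customer_id": "101", "transaction_count": 5, "last_login_timestamp": 1700000000.5}
types = {name: infer_feature_type(value) for name, value in row.items()}
print(types)
print(to_record(row))
```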

Is there a free tier available for SageMaker Feature Store?

Yes, SageMaker Feature Store is included in the AWS Free Tier, which offers limited usage for a limited period to eligible accounts.

SageMaker Feature Store — Unified ML Feature Management

Overview

Amazon SageMaker Feature Store is a fully managed service that provides a dedicated repository for machine learning features, addressing challenges related to feature engineering, storage, and serving. It is designed to help teams standardize and reuse features across multiple models and applications, aiming to reduce redundant work and improve model consistency. The service provides two stores: an online store for low-latency, real-time inference and an offline store for model training and batch inference. Because both stores are populated from the same feature group, the features used during training match those served at inference time, which mitigates training-serving skew.

Developers and data scientists use SageMaker Feature Store to create feature groups, which are logical collections of features defined by a schema. Feature groups can be populated with data from various sources, and the service writes ingested records to the online and offline stores. Once stored, features are discoverable and can be retrieved programmatically with the SageMaker Python SDK or the AWS SDK for Python (Boto3). Feature Store also works with other SageMaker services, such as SageMaker Training and SageMaker Inference, and with data storage services such as Amazon S3.
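A feature group's schema ultimately becomes a `CreateFeatureGroup` request. The helper below is a hypothetical sketch that assembles the request fields for Boto3's `create_feature_group`; it does not contact AWS, and the bucket and role ARN in the usage are placeholders.

```python
from typing import Dict, Optional


def build_create_feature_group_request(
    name: str,
    features: Dict[str, str],
    record_identifier: str,
    event_time: str,
    offline_s3_uri: Optional[str] = None,
    role_arn: Optional[str] = None,
    enable_online_store: bool = True,
) -> dict:
    """Assemble keyword arguments for Boto3's create_feature_group (hypothetical helper)."""
    # The record identifier and event time must themselves be declared as features.
    for required in (record_identifier, event_time):
        if required not in features:
            raise ValueError(f"{required!r} must be declared in the schema")
    request = {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": record_identifier,
        "EventTimeFeatureName": event_time,
        "FeatureDefinitions": [
            {"FeatureName": feature, "FeatureType": ftype} for feature, ftype in features.items()
        ],
        "OnlineStoreConfig": {"EnableOnlineStore": enable_online_store},
    }
    if offline_s3_uri:
        if not role_arn:
            # Feature Store needs a role it can assume to write the offline store to S3.
            raise ValueError("role_arn is required when an offline store is configured")
        request["OfflineStoreConfig"] = {"S3StorageConfig": {"S3Uri": offline_s3_uri}}
        request["RoleArn"] = role_arn
    return request


request = build_create_feature_group_request(
    name="customers",
    features={"customer_id": "String", "transaction_count": "Integral", "event_time": "Fractional"},
    record_identifier="customer_id",
    event_time="event_time",
    offline_s3_uri="s3://example-bucket/feature-store",  # placeholder bucket
    role_arn="arn:aws:iam::123456789012:role/ExampleFeatureStoreRole",  # placeholder role
)
print(request["FeatureDefinitions"])
```

The resulting dictionary could then be passed as `boto3.client("sagemaker").create_feature_group(**request)`.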

SageMaker Feature Store is suitable for organizations building and deploying ML models at scale, particularly those that need consistent feature definitions across development and production environments. Its use cases range from real-time recommendation engines that need fresh features for immediate predictions to batch workloads such as fraud detection or customer segmentation. Amazon SageMaker is in scope for compliance programs including SOC 1, SOC 2, SOC 3, and ISO 27001, is HIPAA eligible, and can be used in workloads subject to GDPR, which makes it an option for regulated industries. However, new users unfamiliar with AWS Identity and Access Management (IAM) and networking configuration may find the initial setup complex.

Key features

Online Store: Provides low-latency access to the latest feature values for real-time inference, supporting use cases like personalized recommendations or fraud detection.
Offline Store: Stores historical feature data for model training, batch inference, and analytical purposes, backed by Amazon S3.
Feature Groups: Allows users to define a schema for a collection of related features, enabling structured storage and retrieval.
Data Ingestion: Supports ingesting feature data from batch and streaming sources into both stores; the online store keeps only the latest record for each record identifier, while the offline store retains every ingested record as history.
Feature Discovery: Enables data scientists and ML engineers to search and discover existing features, promoting reuse and reducing redundant feature engineering efforts.
Point-in-Time Queries: Facilitates retrieving feature values as they existed at a specific historical timestamp, crucial for preventing data leakage during model training.
Training-Serving Consistency: Designed to ensure that the same feature definitions and values are used for both model training and real-time inference, mitigating potential performance degradation.
Access Control and Governance: Integrates with AWS IAM for granular control over who can access, create, or modify feature groups and their data.
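Point-in-time correctness comes down to one rule: for each training example, use the latest feature values whose event time is not later than the example's own timestamp. The sketch below implements that join over in-memory rows with `bisect`; in practice the equivalent query runs against the offline store. The function name and row layout are hypothetical, and treating an equal timestamp as visible is an assumption.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_rows, id_key="customer_id", time_key="event_time"):
    """Attach to each label the latest feature row with event time <= the label's time."""
    # Group feature history by record identifier and sort each group by event time.
    history = defaultdict(list)
    for row in feature_rows:
        history[row[id_key]].append(row)
    for rows in history.values():
        rows.sort(key=lambda r: r[time_key])
    times = {key: [r[time_key] for r in rows] for key, rows in history.items()}

    joined = []
    for label in labels:
        key = label[id_key]
        # bisect_right counts rows whose event time is <= the label time, so no future rows leak in.
        idx = bisect_right(times.get(key, []), label[time_key])
        features = history[key][idx - 1] if idx else None  # None: no value existed yet
        joined.append({**label, "features": features})
    return joined


feature_rows = [
    {"customer_id": "101", "event_time": 1.0, "transaction_count": 5},
    {"customer_id": "101", "event_time": 3.0, "transaction_count": 9},
]
labels = [
    {"customer_id": "101", "event_time": 2.0, "label": 0},
    {"customer_id": "101", "event_time": 0.5, "label": 1},
    {"customer_id": "101", "event_time": 3.0, "label": 1},
]
result = point_in_time_join(labels, feature_rows)
for row in result:
    print(row["event_time"], row["features"])
```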

Pricing

SageMaker Feature Store operates on a pay-as-you-go model, with costs primarily based on data storage, read/write operations, and data transfer. The AWS Free Tier offers limited usage for new accounts.

SageMaker Feature Store Pricing Summary (as of 2026-05-28)
Component	Description
Online Store Storage	Billed per GB-month for data stored in the online store.
Online Store Write Operations	Billed per million write requests to the online store.
Online Store Read Operations	Billed per million read requests from the online store.
Offline Store Storage	Billed based on standard Amazon S3 storage rates for data stored in the offline store.
Offline Store Ingestion	Billed per GB of data ingested into the offline store.
Data Transfer	Standard AWS data transfer rates apply for data moving in and out of the service.
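The billing components in the table combine into a monthly estimate by simple arithmetic. In the sketch below the rates are hypothetical placeholders, not AWS prices; substitute the current figures from the pricing page. Data transfer is left out for simplicity, and the function name is illustrative.

```python
# HYPOTHETICAL placeholder rates for illustration only; these are not AWS prices.
EXAMPLE_RATES = {
    "online_storage_per_gb_month": 0.5,
    "online_write_per_million": 1.0,
    "online_read_per_million": 0.25,
    "offline_s3_per_gb_month": 0.02,
    "offline_ingest_per_gb": 0.1,
}


def estimate_monthly_cost(online_gb, writes, reads, offline_gb, offline_ingest_gb, rates):
    """Sum the per-component charges from the pricing table (data transfer excluded)."""
    return (
        online_gb * rates["online_storage_per_gb_month"]
        + writes / 1_000_000 * rates["online_write_per_million"]
        + reads / 1_000_000 * rates["online_read_per_million"]
        + offline_gb * rates["offline_s3_per_gb_month"]
        + offline_ingest_gb * rates["offline_ingest_per_gb"]
    )


cost = estimate_monthly_cost(2, 3_000_000, 5_000_000, 10, 4, EXAMPLE_RATES)
print(round(cost, 2))
```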

For detailed and current pricing information, refer to the AWS SageMaker Feature Store pricing page.

Common integrations

Amazon SageMaker Studio: Integrated development environment for ML in which users can discover and use features directly.
Amazon S3: Serves as the underlying storage for the offline feature store, enabling large-scale data storage and retrieval.
AWS Lambda: Can be used to trigger feature engineering pipelines or to serve features for real-time inference.
Amazon Kinesis / Apache Kafka: Common sources of streaming data for ingesting fresh features into the online store.
Amazon EMR / AWS Glue: Often used for batch processing and for transforming raw data into features before ingestion into the feature store.
AWS Identity and Access Management (IAM): Provides security and access control for feature groups and operations within the feature store.
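A common streaming pattern pairs Kinesis with AWS Lambda: the function decodes each streamed record and writes it with `PutRecord`. The handler below is a hypothetical sketch. It assumes JSON payloads whose keys match feature names, and the feature group name is a placeholder. The runtime client is injectable so the logic can be exercised without AWS credentials.

```python
import base64
import json


def handler(event, context, client=None, feature_group_name="my-sample-feature-group"):
    """Lambda-style handler: decode Kinesis records and write each one with PutRecord."""
    if client is None:
        import boto3  # only needed when running against AWS
        client = boto3.client("sagemaker-featurestore-runtime")
    written = 0
    for rec in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under record["kinesis"]["data"].
        payload = json.loads(base64.b64decode(rec["kinesis"]["data"]))
        client.put_record(
            FeatureGroupName=feature_group_name,
            Record=[{"FeatureName": k, "ValueAsString": str(v)} for k, v in payload.items()],
        )
        written += 1
    return {"written": written}
```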

Alternatives

Databricks Feature Store: Integrated with the Databricks Lakehouse Platform, offering a managed feature store solution for Databricks users.
Google Cloud Vertex AI Feature Store: A fully managed feature store within Google Cloud's Vertex AI platform, providing capabilities for feature management and serving.
Tecton: A specialized feature platform designed to operationalize features for real-time ML, offering a robust set of tools for feature engineering and serving.
Open-source solutions (e.g., Feast): Community-driven feature stores that require self-hosting and management but offer flexibility and control over the infrastructure.

Getting started

To get started with Amazon SageMaker Feature Store, you typically define a feature group, ingest data, and then retrieve features for training or inference. The following Python example creates a simple feature group with the SageMaker Python SDK, then writes and reads a record with the Boto3 Feature Store runtime client.

import time

import boto3
import sagemaker
from botocore.exceptions import ClientError
from sagemaker.feature_store.feature_definition import FeatureDefinition, FeatureTypeEnum
from sagemaker.feature_store.feature_group import FeatureGroup

# Initialize the SageMaker session and a Boto3 client for the Feature Store runtime API
sagemaker_session = sagemaker.Session()
boto3_session = boto3.Session(region_name=sagemaker_session.boto_region_name)
featurestore_runtime = boto3_session.client(service_name='sagemaker-featurestore-runtime')

# IAM role that Feature Store assumes to write the offline store to S3.
# get_execution_role() works inside SageMaker notebooks; elsewhere, supply a role ARN directly.
role_arn = sagemaker.get_execution_role()

# Define a unique feature group name
feature_group_name = f'my-sample-feature-group-{int(time.time())}'

# Define the feature group schema; the SDK expects FeatureDefinition objects, not plain dicts
feature_definitions = [
    FeatureDefinition(feature_name='customer_id', feature_type=FeatureTypeEnum.STRING),
    FeatureDefinition(feature_name='transaction_count', feature_type=FeatureTypeEnum.INTEGRAL),
    FeatureDefinition(feature_name='last_login_timestamp', feature_type=FeatureTypeEnum.FRACTIONAL),
]

# Create a FeatureGroup object
feature_group = FeatureGroup(
    name=feature_group_name,
    feature_definitions=feature_definitions,
    sagemaker_session=sagemaker_session
)

# Create the feature group with an online store and an S3-backed offline store
print(f"Creating feature group: {feature_group_name}...")
feature_group.create(
    s3_uri=f's3://{sagemaker_session.default_bucket()}/feature-store/{feature_group_name}',
    record_identifier_name='customer_id',
    event_time_feature_name='last_login_timestamp',  # Unix epoch seconds (Fractional)
    role_arn=role_arn,
    enable_online_store=True
)

# Poll until creation finishes; the status becomes 'Created' or 'CreateFailed'
status = feature_group.describe().get('FeatureGroupStatus')
while status == 'Creating':
    print("Waiting for Feature Group to be created...")
    time.sleep(5)
    status = feature_group.describe().get('FeatureGroupStatus')
print(f"Feature Group {feature_group_name} status: {status}")
if status != 'Created':
    raise RuntimeError(f"Feature group creation failed: {feature_group.describe().get('FailureReason')}")

# Prepare a record; PutRecord takes every value as a string
record_data = [
    {'FeatureName': 'customer_id', 'ValueAsString': '101'},
    {'FeatureName': 'transaction_count', 'ValueAsString': '5'},
    {'FeatureName': 'last_login_timestamp', 'ValueAsString': str(time.time())},
]

# Ingest the record into the feature group
try:
    featurestore_runtime.put_record(
        FeatureGroupName=feature_group_name,
        Record=record_data
    )
    print(f"Successfully put record into feature group {feature_group_name}.")
except ClientError as e:
    print(f"Error putting record: {e}")

# Read the latest record for customer '101' back from the online store
time.sleep(2)  # Brief pause before reading the record back
try:
    retrieved_record = featurestore_runtime.get_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString='101'
    )
    print("Retrieved Record:")
    # 'Record' is absent from the response if no record exists for this identifier
    for feature in retrieved_record.get('Record', []):
        print(f"  {feature['FeatureName']}: {feature['ValueAsString']}")
except ClientError as e:
    print(f"Error retrieving record: {e}")

# Clean up (optional): delete the feature group
# print(f"Deleting feature group: {feature_group_name}...")
# feature_group.delete()
# print(f"Feature group {feature_group_name} deleted.")

This script first initializes the necessary AWS clients and defines a feature group with a schema. It then creates the feature group, waits for it to become active, and ingests a sample record. Finally, it demonstrates how to retrieve the ingested record. This foundational process is extended in real-world scenarios to include more complex feature engineering pipelines and integration with ML models for training and inference.
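The example stores the event time as a Fractional Unix timestamp. If the event time feature is declared as `String` instead, Feature Store expects an ISO-8601 value; the sketch below produces both forms, assuming the documented `yyyy-MM-dd'T'HH:mm:ss.SSSZ` pattern for the string variant. The helper names are hypothetical.

```python
import time
from datetime import datetime, timezone


def event_time_fractional(ts: float) -> str:
    # Fractional event time: Unix epoch seconds, passed as a string in ValueAsString
    return str(ts)


def event_time_iso8601(ts: float) -> str:
    # String event time: ISO-8601 in UTC with milliseconds, e.g. 2024-01-01T00:00:00.000Z
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


now = time.time()
print(event_time_fractional(now), event_time_iso8601(now))
```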
