Class: Aws::SageMaker::Types::ClusterTieredStorageConfig
- Inherits:
- Struct
- Object
- Struct
- Aws::SageMaker::Types::ClusterTieredStorageConfig
- Includes:
- Aws::Structure
- Defined in:
- lib/aws-sdk-sagemaker/types.rb
Overview
Defines the configuration for managed tier checkpointing in a HyperPod cluster. Managed tier checkpointing uses multiple storage tiers, including cluster CPU memory, to provide faster checkpoint operations and improved fault tolerance for large-scale model training. The system automatically saves checkpoints at high frequency to memory and periodically persists them to durable storage, like Amazon S3.
Constant Summary
- SENSITIVE = []
Instance Attribute Summary
- #instance_memory_allocation_percentage ⇒ Integer
The percentage (int) of cluster memory to allocate for checkpointing.
- #mode ⇒ String
Specifies whether managed tier checkpointing is enabled or disabled for the HyperPod cluster.
Instance Attribute Details
#instance_memory_allocation_percentage ⇒ Integer
The percentage (int) of cluster memory to allocate for checkpointing.
# File 'lib/aws-sdk-sagemaker/types.rb', line 6628
class ClusterTieredStorageConfig < Struct.new(
  :mode,
  :instance_memory_allocation_percentage)
  SENSITIVE = []
  include Aws::Structure
end
#mode ⇒ String
Specifies whether managed tier checkpointing is enabled or disabled for the HyperPod cluster. When set to `Enable`, the system installs a memory management daemon that provides disaggregated memory as a service for checkpoint storage. When set to `Disable`, the feature is turned off and the memory management daemon is removed from the cluster.
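Example
The two attributes above can be sketched as a plain Ruby hash, which is the form the SDK generally accepts for structure-typed parameters. This is a minimal sketch, not part of aws-sdk-sagemaker: the helper name `tiered_storage_config` and the constant `VALID_MODES` are hypothetical, and the 0..100 bound on the percentage is an assumption made only because the value is described as a percentage. The service may enforce a narrower range.

```ruby
# Hypothetical helper (not part of aws-sdk-sagemaker) that builds a hash
# shaped like ClusterTieredStorageConfig.

# The two mode values named in the #mode description.
VALID_MODES = %w[Enable Disable].freeze

def tiered_storage_config(mode:, instance_memory_allocation_percentage: nil)
  unless VALID_MODES.include?(mode)
    raise ArgumentError, "mode must be one of: #{VALID_MODES.join(', ')}"
  end

  config = { mode: mode }

  unless instance_memory_allocation_percentage.nil?
    pct = instance_memory_allocation_percentage
    # Assumption: any Integer from 0 to 100 is accepted here; the real
    # service limits are not stated in this reference page.
    unless pct.is_a?(Integer) && (0..100).cover?(pct)
      raise ArgumentError, "percentage must be an Integer between 0 and 100"
    end
    config[:instance_memory_allocation_percentage] = pct
  end

  config
end

# Enable managed tier checkpointing with an illustrative 40% allocation.
enabled = tiered_storage_config(mode: "Enable",
                                instance_memory_allocation_percentage: 40)

# Turn the feature off; no memory percentage is supplied.
disabled = tiered_storage_config(mode: "Disable")
```

Validating locally keeps a misspelled mode (for example `"Enabled"`) from reaching the service, at the cost of duplicating rules that the API already enforces.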