Class: Aws::SageMaker::Types::ClusterTieredStorageConfig

Inherits:
Struct
  • Object
Includes:
Aws::Structure
Defined in:
lib/aws-sdk-sagemaker/types.rb

Overview

Defines the configuration for managed tier checkpointing in a HyperPod cluster. Managed tier checkpointing uses multiple storage tiers, including cluster CPU memory, to provide faster checkpoint operations and improved fault tolerance for large-scale model training. The system automatically saves checkpoints to memory at high frequency and periodically persists them to durable storage, such as Amazon S3.

Constant Summary

SENSITIVE =
[]

Instance Attribute Summary

Instance Attribute Details

#instance_memory_allocation_percentage ⇒ Integer

The percentage (int) of cluster memory to allocate for checkpointing.

Returns:

  • (Integer)


# File 'lib/aws-sdk-sagemaker/types.rb', line 6628

class ClusterTieredStorageConfig < Struct.new(
  :mode,
  :instance_memory_allocation_percentage)
  SENSITIVE = []
  include Aws::Structure
end
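
As a rough illustration of how the two members of this struct are set and read, the sketch below uses a stand-in `Struct` with the same member names so that it runs without the `aws-sdk-sagemaker` gem. The name `TieredStorageConfigSketch` and the sample value `20` are assumptions made for illustration only; with the gem loaded, the real class is `Aws::SageMaker::Types::ClusterTieredStorageConfig`.

```ruby
# Hypothetical stand-in with the same members as the struct above,
# so this sketch runs without the aws-sdk-sagemaker gem installed.
TieredStorageConfigSketch = Struct.new(
  :mode,
  :instance_memory_allocation_percentage,
  keyword_init: true
)

config = TieredStorageConfigSketch.new(
  mode: "Enable",
  instance_memory_allocation_percentage: 20 # sample value, not a recommendation
)

# Members are plain accessors; to_h returns a hash keyed by member name.
puts config.instance_memory_allocation_percentage
puts config.to_h.inspect
```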

#mode ⇒ String

Specifies whether managed tier checkpointing is enabled or disabled for the HyperPod cluster. When set to `Enable`, the system installs a memory management daemon that provides disaggregated memory as a service for checkpoint storage. When set to `Disable`, the feature is turned off and the memory management daemon is removed from the cluster.

Returns:

  • (String)
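
A minimal sketch of building the hash form of this configuration with the two `mode` values described above. The helper name `tiered_storage_config`, the `memory_percentage` keyword, and the 1 to 100 range check are assumptions for illustration and are not part of the SDK; the service may enforce different limits.

```ruby
# Hypothetical helper (not part of the SDK): returns a hash carrying the
# two members of ClusterTieredStorageConfig.
VALID_MODES = %w[Enable Disable].freeze

def tiered_storage_config(mode:, memory_percentage: nil)
  unless VALID_MODES.include?(mode)
    raise ArgumentError, "mode must be one of #{VALID_MODES.join(', ')}"
  end

  config = { mode: mode }
  if memory_percentage
    # Assumed sanity check only; the service's actual limits may be narrower.
    unless memory_percentage.is_a?(Integer) && (1..100).cover?(memory_percentage)
      raise ArgumentError, "memory_percentage must be an Integer from 1 to 100"
    end
    config[:instance_memory_allocation_percentage] = memory_percentage
  end
  config
end

enabled  = tiered_storage_config(mode: "Enable", memory_percentage: 20)
disabled = tiered_storage_config(mode: "Disable")
puts enabled.inspect
puts disabled.inspect
```

Validating `mode` locally is a design choice in this sketch: it surfaces a misspelled value before any request is sent, rather than relying on a service-side error.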

