Class: FiniteMDP::HashModel

Inherits:
Object
Includes:
Model
Defined in:
lib/finite_mdp/hash_model.rb

Overview

A finite Markov decision process (MDP) model whose transition probabilities and rewards are specified using nested hash tables.

The structure of the nested hash is as follows:

hash[:s]         #=> a Hash that maps actions to Hashes of successor states
hash[:s][:a]     #=> a Hash from successor states to pairs (see next)
hash[:s][:a][:t] #=> an Array [probability, reward] for transition (s,a,t)

The states and actions can be arbitrary objects; see notes for Model.

The TableModel is an alternative way of storing these data.
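
As an illustration of the nesting described above, here is a minimal sketch that uses plain Ruby hashes only. The states (:low, :high), the actions (:wait, :charge) and all of the numbers are made up for this example; they do not come from the library.

```ruby
# Hypothetical two-state model. Each innermost value is [probability, reward].
mdp_hash = {
  low: {
    wait:   { low: [1.0, 0.0] },
    charge: { high: [0.9, -1.0], low: [0.1, -1.0] }
  },
  high: {
    wait: { high: [0.5, 1.0], low: [0.5, 1.0] }
  }
}

# Walk the three levels of nesting.
actions_in_low = mdp_hash[:low].keys                  # actions valid in :low
successors     = mdp_hash[:low][:charge].keys         # successors of (:low, :charge)
probability, reward = mdp_hash[:low][:charge][:high]  # one transition

# Outgoing probabilities for one (state, action) pair should sum to one.
outgoing = mdp_hash[:low][:charge].values.sum { |pr, _reward| pr }
```

With the gem loaded, a hash of this shape is what the constructor documented below takes as its single argument.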

Methods included from Model

#check_transition_probabilities_sum, #terminal_states, #transition_probability_sums

Constructor Details

#initialize(hash) ⇒ HashModel

Returns a new instance of HashModel.

Parameters:

  • hash (Hash<state, Hash<action, Hash<state, [Float, Float]>>>)

    see notes for FiniteMDP::HashModel for an explanation of this structure


# File 'lib/finite_mdp/hash_model.rb', line 23

def initialize(hash)
  @hash = hash
end

Instance Attribute Details

#hash ⇒ Hash<state, Hash<action, Hash<state, [Float, Float]>>>

Returns the nested hash; see the notes for FiniteMDP::HashModel for an explanation of its structure.

Returns:

  • (Hash<state, Hash<action, Hash<state, [Float, Float]>>>)

    see notes for FiniteMDP::HashModel for an explanation of this structure


# File 'lib/finite_mdp/hash_model.rb', line 31

def hash
  @hash
end

Class Method Details

.from_model(model, sparse = true) ⇒ HashModel

Convert a generic model into a hash model.

Parameters:

  • model (Model)
  • sparse (Boolean) (defaults to: true)

    do not store entries for transitions with zero probability

Returns:

  • (HashModel)


# File 'lib/finite_mdp/hash_model.rb', line 109

def self.from_model(model, sparse = true)
  hash = {}
  model.states.each do |state|
    hash[state] ||= {}
    model.actions(state).each do |action|
      hash[state][action] ||= {}
      model.next_states(state, action).each do |next_state|
        pr = model.transition_probability(state, action, next_state)
        next unless pr > 0 || !sparse
        hash[state][action][next_state] =
          [pr, model.reward(state, action, next_state)]
      end
    end
  end
  FiniteMDP::HashModel.new(hash)
end
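
To show what the sparse flag changes, the sketch below copies the loop above into a standalone helper so that it runs without the gem. TinyModel and nested_hash_from are hypothetical names introduced for this example; TinyModel is a duck-typed stand-in that implements only the five methods the loop calls.

```ruby
# Hypothetical stand-in: every state lists both :a and :b as successors,
# but only the transition to :b has non-zero probability.
class TinyModel
  def states
    %i[a b]
  end

  def actions(_state)
    [:go]
  end

  def next_states(_state, _action)
    %i[a b]
  end

  def transition_probability(_state, _action, next_state)
    next_state == :b ? 1.0 : 0.0
  end

  def reward(_state, _action, next_state)
    next_state == :b ? 1.0 : 0.0
  end
end

# Same traversal as from_model, but returning the bare nested hash.
def nested_hash_from(model, sparse = true)
  model.states.each_with_object({}) do |state, hash|
    hash[state] = {}
    model.actions(state).each do |action|
      hash[state][action] = {}
      model.next_states(state, action).each do |next_state|
        pr = model.transition_probability(state, action, next_state)
        next unless pr > 0 || !sparse # skip zero-probability entries when sparse
        hash[state][action][next_state] = [pr, model.reward(state, action, next_state)]
      end
    end
  end
end

sparse_hash = nested_hash_from(TinyModel.new)        # zero-probability entries dropped
dense_hash  = nested_hash_from(TinyModel.new, false) # zero-probability entries kept
```

In this sketch the sparse result has one successor per (state, action) pair, while the dense result also stores the [0.0, 0.0] entry for :a. Both describe the same process, since a transition missing from the hash is reported with probability zero.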

Instance Method Details

#actions(state) ⇒ Array<action>

Actions that are valid for the given state; see Model#actions.

Parameters:

  • state (state)

Returns:

  • (Array<action>)

    not empty; no duplicate actions


# File 'lib/finite_mdp/hash_model.rb', line 49

def actions(state)
  hash[state].keys
end

#next_states(state, action) ⇒ Array<state>

Possible successor states after taking the given action in the given state; see Model#next_states.

Parameters:

  • state (state)
  • action (action)

Returns:

  • (Array<state>)

    not empty; no duplicate states


# File 'lib/finite_mdp/hash_model.rb', line 63

def next_states(state, action)
  hash[state][action].keys
end

#reward(state, action, next_state) ⇒ Float?

Reward for a given transition; see Model#reward.

Parameters:

  • state (state)
  • action (action)
  • next_state (state)

Returns:

  • (Float, nil)

    nil if the transition is not in the hash


# File 'lib/finite_mdp/hash_model.rb', line 94

def reward(state, action, next_state)
  _probability, reward = hash[state][action][next_state]
  reward
end

#states ⇒ Array<state>

States in this model; see Model#states.

Returns:

  • (Array<state>)

    not empty; no duplicate states


# File 'lib/finite_mdp/hash_model.rb', line 38

def states
  hash.keys
end

#transition_probability(state, action, next_state) ⇒ Float

Probability of the given transition; see Model#transition_probability.

Parameters:

  • state (state)
  • action (action)
  • next_state (state)

Returns:

  • (Float)

    in [0, 1]; zero if the transition is not in the hash


# File 'lib/finite_mdp/hash_model.rb', line 78

def transition_probability(state, action, next_state)
  probability, _reward = hash[state][action][next_state]
  probability || 0
end