Class: FiniteMDP::ArrayModel

Inherits:
  Object
Includes:
Model
Defined in:
lib/finite_mdp/array_model.rb

Overview

A finite Markov decision process model for which the states, transition probabilities and rewards are stored in a sparse nested array format:

model[state_num][action_num] = [[next_state_num, probability, reward], ...]

Note: The action_num is not consistent between states — each state's action array contains only the actions that apply in that state.

This class also maintains a StateActionMap to map between the state and action numbers and the original states and actions.
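The nested array format can be illustrated in plain Ruby, without the gem. The sketch below builds a hypothetical two-state model array (state and action numbers are illustrative, not taken from the gem) and performs a lookup by scanning the `[next_state_num, probability, reward]` triples, which is how the instance methods below work internally.

```ruby
# Hypothetical two-state model in the sparse nested array format
# described above; state and action numbers are illustrative.
model = [
  [ # state 0
    [[0, 0.25, 0.0], [1, 0.75, 1.0]], # action 0: two possible successors
    [[1, 1.0, 0.5]]                   # action 1: deterministic
  ],
  [ # state 1: absorbing; note only one action applies in this state
    [[1, 1.0, 0.0]]
  ]
]

# Probability and reward for moving from state 0 to state 1 under
# action 0, found by scanning that state-action entry's triples.
entry = model[0][0].find { |next_state, _pr, _reward| next_state == 1 }
probability, reward = entry[1], entry[2]
```

Because only the actions valid in each state are stored, `model[0]` has two entries while `model[1]` has one; this is why action numbers are not comparable across states.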

Defined Under Namespace

Classes: OrderedStateActionMap, StateActionMap

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Model

#check_transition_probabilities_sum, #terminal_states, #transition_probability_sums

Constructor Details

#initialize(array, state_action_map) ⇒ ArrayModel

Returns a new instance of ArrayModel.

Parameters:

  • array (Array<Array<Array>>)

    sparse nested array; see the notes for FiniteMDP::ArrayModel

  • state_action_map (StateActionMap)

# File 'lib/finite_mdp/array_model.rb', line 94

def initialize(array, state_action_map)
  @array = array
  @state_action_map = state_action_map
end

Instance Attribute Details

#array ⇒ Array<Array<Array>> (readonly)

Returns the sparse nested array; see the notes for FiniteMDP::ArrayModel for the format.

Returns:

  • (Array<Array<Array>>)

# File 'lib/finite_mdp/array_model.rb', line 102

def array
  @array
end

#state_action_map ⇒ StateActionMap (readonly)

Returns:

  • (StateActionMap)

# File 'lib/finite_mdp/array_model.rb', line 107

def state_action_map
  @state_action_map
end

Class Method Details

.from_model(model, sparse = true, ordered = nil) ⇒ ArrayModel

Convert a generic model into an array model.

Parameters:

  • model (Model)
  • sparse (Boolean) (defaults to: true)

    do not store entries for transitions with zero probability

  • ordered (Boolean) (defaults to: nil)

    assume states are orderable; default is to inspect the first state

Returns:

  • (ArrayModel)

# File 'lib/finite_mdp/array_model.rb', line 211

def self.from_model(model, sparse = true, ordered = nil)
  state_action_map = StateActionMap.from_model(model, ordered)

  array = state_action_map.states.map do |state|
    state_action_map.actions(state).map do |action|
      model.next_states(state, action).map do |next_state|
        pr = model.transition_probability(state, action, next_state)
        next unless pr > 0 || !sparse
        reward = model.reward(state, action, next_state)
        next_index = state_action_map.state_index(next_state)
        raise "successor state not found: #{next_state}" unless next_index
        [next_index, pr, reward]
      end.compact
    end
  end

  FiniteMDP::ArrayModel.new(array, state_action_map)
end
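The sparse filtering that from_model performs can be sketched without the gem. The plain-Ruby example below (hypothetical data and variable names, not the gem's API) builds the nested array from a hash of transition data, mapping states to indices and dropping zero-probability entries when sparse is true, mirroring the `next unless pr > 0 || !sparse` guard above.

```ruby
# Hypothetical transition data: transitions[state][action] maps each
# successor state to a [probability, reward] pair. Names are illustrative.
transitions = {
  s0: { go:   { s0: [0.0, 0.0], s1: [1.0, 2.0] },
        stay: { s0: [1.0, 0.0] } },
  s1: { stay: { s1: [1.0, 0.0] } }
}

states      = transitions.keys
state_index = states.each_with_index.to_h

sparse = true
array = states.map do |state|
  transitions[state].values.map do |successors|
    successors.filter_map do |next_state, (pr, reward)|
      # Drop zero-probability entries when sparse, as from_model does.
      next if sparse && pr <= 0
      [state_index.fetch(next_state), pr, reward]
    end
  end
end
```

With sparse enabled, the zero-probability self-transition from s0 under :go is omitted, so that entry contains only the triple for s1.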

Instance Method Details

#actions(state) ⇒ Array<action>

Actions that are valid for the given state; see Model#actions.

Parameters:

  • state (state)

Returns:

  • (Array<action>)

    not empty; no duplicate actions


# File 'lib/finite_mdp/array_model.rb', line 134

def actions(state)
  @state_action_map.actions(state)
end

#next_states(state, action) ⇒ Array<state>

Possible successor states after taking the given action in the given state; see Model#next_states.

Parameters:

  • state (state)
  • action (action)

Returns:

  • (Array<state>)

    not empty; no duplicates


# File 'lib/finite_mdp/array_model.rb', line 148

def next_states(state, action)
  state_index, action_index =
    @state_action_map.state_action_index(state, action)
  @array[state_index][action_index].map do |next_state_index, _pr, _reward|
    @state_action_map.state(next_state_index)
  end
end

#num_states ⇒ Fixnum

Number of states in the model.

Returns:

  • (Fixnum)

    positive


# File 'lib/finite_mdp/array_model.rb', line 123

def num_states
  @state_action_map.map.size
end

#reward(state, action, next_state) ⇒ Float?

Reward for a given transition; see Model#reward.

Parameters:

  • state (state)
  • action (action)
  • next_state (state)

Returns:

  • (Float, nil)

    nil if the transition is not in the model


# File 'lib/finite_mdp/array_model.rb', line 188

def reward(state, action, next_state)
  state_index, action_index =
    @state_action_map.state_action_index(state, action)
  next_state_index = @state_action_map.state_index(next_state)
  @array[state_index][action_index].each do |index, _probability, reward|
    return reward if index == next_state_index
  end
  nil
end

#states ⇒ Array<state>

States in this model; see Model#states.

Returns:

  • (Array<state>)

    not empty; no duplicate states


# File 'lib/finite_mdp/array_model.rb', line 114

def states
  @state_action_map.states
end

#transition_probability(state, action, next_state) ⇒ Float

Probability of the given transition; see Model#transition_probability.

Parameters:

  • state (state)
  • action (action)
  • next_state (state)

Returns:

  • (Float)

    in [0, 1]; zero if the transition is not in the model


# File 'lib/finite_mdp/array_model.rb', line 167

def transition_probability(state, action, next_state)
  state_index, action_index =
    @state_action_map.state_action_index(state, action)
  next_state_index = @state_action_map.state_index(next_state)
  @array[state_index][action_index].each do |index, probability, _reward|
    return probability if index == next_state_index
  end
  0
end
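The lookup loop above returns 0 when no triple matches, which is what gives absent transitions zero probability in the sparse format. A minimal plain-Ruby equivalent of that scan (illustrative data, not the gem's runtime internals):

```ruby
# Triples for one state-action pair in the sparse format; illustrative.
entries = [[0, 0.4, 1.0], [2, 0.6, -1.0]]

# Equivalent of the lookup loop in #transition_probability: scan the
# triples and fall back to 0 for transitions absent from the array.
def lookup_probability(entries, next_state_index)
  entries.each do |index, probability, _reward|
    return probability if index == next_state_index
  end
  0
end
```

For example, looking up successor index 2 yields its stored probability, while successor index 1, which was never stored, yields 0 rather than raising an error.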