Class: FiniteMDP::TableModel

Inherits:
Object
Includes:
Model
Defined in:
lib/finite_mdp/table_model.rb

Overview

A finite Markov decision process model whose states, actions, transition probabilities and rewards are specified as a table. This is a common way of specifying small models.

The states and actions can be arbitrary objects; see notes for Model.

Methods included from Model

#check_transition_probabilities_sum, #terminal_states, #transition_probability_sums

Constructor Details

#initialize(rows) ⇒ TableModel

Returns a new instance of TableModel.

Parameters:

  • rows (Array<[state, action, state, Float, Float]>)

    each row is [state, action, next state, probability, reward]

# File 'lib/finite_mdp/table_model.rb', line 17

def initialize(rows)
  @rows = rows
end
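
As a rough illustration of the row format, the sketch below builds a small table by hand and checks that the probabilities for each (state, action) pair sum to one. The states, actions and numbers are invented for the example, and the check is plain Ruby rather than a library call.

```ruby
# Hypothetical two-state model; each row is
# [state, action, next_state, probability, reward].
rows = [
  [:low,  :wait,     :low,  1.0,  0.0],
  [:low,  :recharge, :high, 1.0, -1.0],
  [:high, :search,   :high, 0.7,  2.0],
  [:high, :search,   :low,  0.3,  2.0]
]

# Group rows by (state, action) and sum the probability column.
sums = rows.group_by { |s, a, *| [s, a] }
           .transform_values { |rs| rs.sum { |r| r[3] } }

# Every (state, action) pair should have probabilities summing to 1.
valid = sums.values.all? { |total| (total - 1.0).abs < 1e-9 }
```

An array shaped like `rows` is what `FiniteMDP::TableModel.new` expects as its argument.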

Instance Attribute Details

#rows ⇒ Array<[state, action, state, Float, Float]>

Returns the rows of the table; each row is [state, action, next state, probability, reward].

Returns:

  • (Array<[state, action, state, Float, Float]>)

    each row is [state, action, next state, probability, reward]


# File 'lib/finite_mdp/table_model.rb', line 25

def rows
  @rows
end

Class Method Details

.from_model(model, sparse = true) ⇒ TableModel

Convert any model into a table model.

Parameters:

  • model (Model)
  • sparse (Boolean) (defaults to: true)

    do not store rows for transitions with zero probability

Returns:

  • (TableModel)

# File 'lib/finite_mdp/table_model.rb', line 110

def self.from_model(model, sparse = true)
  rows = []
  model.states.each do |state|
    model.actions(state).each do |action|
      model.next_states(state, action).each do |next_state|
        pr = model.transition_probability(state, action, next_state)
        next unless pr > 0 || !sparse
        reward = model.reward(state, action, next_state)
        rows << [state, action, next_state, pr, reward]
      end
    end
  end
  FiniteMDP::TableModel.new(rows)
end
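
To make the effect of the sparse flag concrete, here is a minimal sketch that runs the same loop against a toy model. `ToyModel` and `rows_from` are hypothetical names introduced for this example; `rows_from` returns the raw rows instead of wrapping them in a TableModel, so the snippet runs without the library.

```ruby
# Hypothetical stand-in for a Model: one state, one action, and a
# successor list that includes a zero-probability transition.
class ToyModel
  def states; [:a]; end
  def actions(_state); [:go]; end
  def next_states(_state, _action); [:a, :b]; end

  def transition_probability(_state, _action, next_state)
    next_state == :a ? 1.0 : 0.0
  end

  def reward(_state, _action, _next_state); 0.0; end
end

# Same loop as from_model, but returning the rows themselves.
def rows_from(model, sparse = true)
  rows = []
  model.states.each do |state|
    model.actions(state).each do |action|
      model.next_states(state, action).each do |next_state|
        pr = model.transition_probability(state, action, next_state)
        next unless pr > 0 || !sparse # skip zero rows only when sparse
        rows << [state, action, next_state, pr,
                 model.reward(state, action, next_state)]
      end
    end
  end
  rows
end

sparse_rows = rows_from(ToyModel.new)        # zero-probability row omitted
dense_rows  = rows_from(ToyModel.new, false) # zero-probability row kept
```

With sparse left at its default, the transition to `:b` is dropped; passing false keeps it as a row with probability 0.0.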

Instance Method Details

#actions(state) ⇒ Array<action>

Actions that are valid for the given state; see Model#actions.

Parameters:

  • state (state)

Returns:

  • (Array<action>)

    not empty; no duplicate actions


# File 'lib/finite_mdp/table_model.rb', line 43

def actions(state)
  @rows.map { |row| row[1] if row[0] == state }.compact.uniq
end
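
The map/compact/uniq chain above can be tried on a plain array of rows. The sketch below uses made-up states and actions and compares the chain with a select-then-map form that should give the same answer.

```ruby
# Hypothetical table; each row is
# [state, action, next_state, probability, reward].
rows = [
  [:s1, :left,  :s1, 0.5, 0.0],
  [:s1, :left,  :s2, 0.5, 1.0],
  [:s1, :right, :s2, 1.0, 0.0],
  [:s2, :stay,  :s2, 1.0, 0.0]
]

state = :s1

# As in #actions: non-matching rows map to nil, nils are dropped,
# and repeated actions are collapsed.
via_map = rows.map { |row| row[1] if row[0] == state }.compact.uniq

# Two-step form: filter the rows first, then take the action column.
via_select = rows.select { |row| row[0] == state }
                 .map { |row| row[1] }
                 .uniq
```

The two forms differ only if an action is itself nil, since compact would discard it along with the non-matching rows.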

#inspect ⇒ String

Returns the rows of the table, one inspected row per line; can be quite large.

Returns:

  • (String)

    can be quite large


# File 'lib/finite_mdp/table_model.rb', line 96

def inspect
  rows.map(&:inspect).join("\n")
end

#next_states(state, action) ⇒ Array<state>

Possible successor states after taking the given action in the given state; see Model#next_states.

Parameters:

  • state (state)
  • action (action)

Returns:

  • (Array<state>)

    not empty; no duplicate states


# File 'lib/finite_mdp/table_model.rb', line 57

def next_states(state, action)
  @rows.map { |row| row[2] if row[0] == state && row[1] == action }.compact
end
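
Note that the expression above has no uniq step, so the "no duplicate states" guarantee appears to rely on the table having at most one row per transition. A small sketch with invented rows, in which one transition is deliberately listed twice:

```ruby
# Hypothetical table in which the transition s1 --go--> s2 appears twice.
rows = [
  [:s1, :go, :s2, 0.5,  0.0],
  [:s1, :go, :s2, 0.25, 1.0],
  [:s1, :go, :s3, 0.25, 0.0]
]

# Same expression as #next_states, applied to the local array.
successors = rows.map { |row|
  row[2] if row[0] == :s1 && row[1] == :go
}.compact

# De-duplicating afterwards, in case the table may repeat a transition.
unique_successors = successors.uniq
```

Here `successors` contains `:s2` twice, while `unique_successors` lists each state once.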

#reward(state, action, next_state) ⇒ Float?

Reward for a given transition; see Model#reward.

Parameters:

  • state (state)
  • action (action)
  • next_state (state)

Returns:

  • (Float, nil)

    nil if the transition is not in the table


# File 'lib/finite_mdp/table_model.rb', line 88

def reward(state, action, next_state)
  row = find_row(state, action, next_state)
  row[4] if row
end

#states ⇒ Array<state>

States in this model; see Model#states.

Returns:

  • (Array<state>)

    not empty; no duplicate states


# File 'lib/finite_mdp/table_model.rb', line 32

def states
  @rows.map { |row| row[0] }.uniq
end
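
Because only the first column is read, a state that appears solely as a next state is not reported. The sketch below, with invented data, shows the difference; whether such a state needs rows of its own depends on the conventions described for Model, which this page does not spell out.

```ruby
# Hypothetical one-row table: :goal is reachable but has no outgoing rows.
rows = [
  [:start, :go, :goal, 1.0, 10.0]
]

# Same expression as #states: distinct values of the first column.
from_states = rows.map { |row| row[0] }.uniq

# Every state mentioned anywhere in the table, as source or successor.
all_mentioned = rows.flat_map { |row| [row[0], row[2]] }.uniq
```

Here `from_states` holds only `:start`, while `all_mentioned` also includes `:goal`.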

#transition_probability(state, action, next_state) ⇒ Float

Probability of the given transition; see Model#transition_probability.

Parameters:

  • state (state)
  • action (action)
  • next_state (state)

Returns:

  • (Float)

    in [0, 1]; zero if the transition is not in the table


# File 'lib/finite_mdp/table_model.rb', line 72

def transition_probability(state, action, next_state)
  row = find_row(state, action, next_state)
  row ? row[3] : 0
end
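
The lookups for probability and reward treat a missing row differently: one falls back to 0, the other to nil. The sketch below reproduces both on a plain array. The `find_row` lambda is a guess at what the private helper does (first row matching all three keys); its real implementation is not shown on this page.

```ruby
# Hypothetical table; each row is
# [state, action, next_state, probability, reward].
rows = [
  [:s1, :go, :s2, 0.25, 5.0],
  [:s1, :go, :s1, 0.75, 0.0]
]

# Assumed behaviour of find_row: first row matching all three keys, or nil.
find_row = lambda do |state, action, next_state|
  rows.find { |s, a, n, *| s == state && a == action && n == next_state }
end

# Mirrors #transition_probability: a missing row counts as probability 0.
probability = lambda do |state, action, next_state|
  row = find_row.call(state, action, next_state)
  row ? row[3] : 0
end

# Mirrors #reward: a missing row yields nil rather than 0.
reward = lambda do |state, action, next_state|
  row = find_row.call(state, action, next_state)
  row[4] if row
end

known_pr   = probability.call(:s1, :go, :s2)
missing_pr = probability.call(:s1, :go, :s3)
known_rw   = reward.call(:s1, :go, :s2)
missing_rw = reward.call(:s1, :go, :s3)
```

The fallback for a missing transition is the Integer 0 rather than 0.0, so callers that need a Float may want to convert with `to_f`.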