Module: Classifier::LSI::IncrementalSVD

Defined in:
lib/classifier/lsi/incremental_svd.rb

Overview

Brand’s Incremental SVD Algorithm for LSI

Implements the algorithm from Brand (2006) “Fast low-rank modifications of the thin singular value decomposition” for adding documents to LSI without full SVD recomputation.

Given existing thin SVD: A ≈ U * S * V^T (with k components) When adding a new column c:

  1. Project: m = U^T * c (project onto existing column space)

  2. Residual: p = c - U * m (component orthogonal to U)

  3. Orthonormalize: If ||p|| > ε: p̂ = p / ||p||

  4. Form K matrix:

    • If ||p|| > ε: K = [diag(s), m; 0, ||p||] (rank grows by 1)

    • If ||p|| ≈ 0: K = diag(s) + m * e_last^T (no new direction)

  5. Small SVD: Compute SVD of K (only (k+1) × (k+1) matrix!)

  6. Update:

    • U_new = [U, p̂] * U’

    • S_new = S’

Constant Summary collapse

EPSILON =
1e-10

Class Method Summary collapse

Class Method Details

.project(u, raw_vec) ⇒ Vector

Projects a document vector onto the semantic space defined by U. Returns the LSI representation: lsi_vec = U^T * raw_vec



64
65
66
# File 'lib/classifier/lsi/incremental_svd.rb', line 64

def project(u, raw_vec)
  u.transpose * raw_vec
end

.update(u, s, c, max_rank:, epsilon: EPSILON) ⇒ Array<Matrix, Array<Float>>

Updates SVD with a new document vector using Brand’s algorithm.



43
44
45
46
47
48
49
50
51
52
53
54
# File 'lib/classifier/lsi/incremental_svd.rb', line 43

def update(u, s, c, max_rank:, epsilon: EPSILON)
  m_vec = project(u, c)
  u_times_m = u * m_vec
  p_vec = c - (u_times_m.is_a?(Vector) ? u_times_m : Vector[*u_times_m.to_a.flatten])
  p_norm = magnitude(p_vec)

  if p_norm > epsilon
    update_with_new_direction(u, s, m_vec, p_vec, p_norm, max_rank, epsilon)
  else
    update_in_span(u, s, m_vec, max_rank, epsilon)
  end
end