Module: Classifier::LSI::IncrementalSVD

Defined in:: lib/classifier/lsi/incremental_svd.rb

Overview

Brand’s Incremental SVD Algorithm for LSI

Implements the algorithm from Brand (2006) “Fast low-rank modifications of the thin singular value decomposition” for adding documents to LSI without full SVD recomputation.

Given existing thin SVD: A ≈ U * S * V^T (with k components) When adding a new column c:

Project: m = U^T * c (project onto existing column space)
Residual: p = c - U * m (component orthogonal to U)
Orthonormalize: If ||p|| > ε: p̂ = p / ||p||
Form K matrix:
- If ||p|| > ε: K = [diag(s), m; 0, ||p||] (rank grows by 1)
- If ||p|| ≈ 0: K = diag(s) + m * e_last^T (no new direction)
Small SVD: Compute SVD of K (only (k+1) × (k+1) matrix!)
Update:
- U_new = [U, p̂] * U’
- S_new = S’

Constant Summary collapse

EPSILON =

1e-10

Class Method Summary collapse

.project(u, raw_vec) ⇒ Vector

Projects a document vector onto the semantic space defined by U.
.update(u, s, c, max_rank:, epsilon: EPSILON) ⇒ Array<Matrix, Array<Float>>

Updates SVD with a new document vector using Brand’s algorithm.

Class Method Details

.project(u, raw_vec) ⇒ `Vector`

Projects a document vector onto the semantic space defined by U. Returns the LSI representation: lsi_vec = U^T * raw_vec



64
65
66

# File 'lib/classifier/lsi/incremental_svd.rb', line 64

def project(u, raw_vec)
  u.transpose * raw_vec
end

.update(u, s, c, max_rank:, epsilon: EPSILON) ⇒ `Array<Matrix, Array<Float>>`

Updates SVD with a new document vector using Brand’s algorithm.

# File 'lib/classifier/lsi/incremental_svd.rb', line 43

def update(u, s, c, max_rank:, epsilon: EPSILON)
  m_vec = project(u, c)
  u_times_m = u * m_vec
  p_vec = c - (u_times_m.is_a?(Vector) ? u_times_m : Vector[*u_times_m.to_a.flatten])
  p_norm = magnitude(p_vec)

  if p_norm > epsilon
    update_with_new_direction(u, s, m_vec, p_vec, p_norm, max_rank, epsilon)
  else
    update_in_span(u, s, m_vec, max_rank, epsilon)
  end
end