Module: Classifier::LSI::IncrementalSVD

Defined in:
lib/classifier/lsi/incremental_svd.rb

Overview

Brand’s Incremental SVD Algorithm for LSI

Implements the algorithm from Brand (2006) “Fast low-rank modifications of the thin singular value decomposition” for adding documents to LSI without full SVD recomputation.

Given existing thin SVD: A ≈ U * S * V^T (with k components) When adding a new column c:

  1. Project: m = U^T * c (project onto existing column space)

  2. Residual: p = c - U * m (component orthogonal to U)

  3. Orthonormalize: If ||p|| > ε: p̂ = p / ||p||

  4. Form K matrix:

    • If ||p|| > ε: K = [diag(s), m; 0, ||p||] (rank grows by 1)

    • If ||p|| ≈ 0: K = diag(s) + m * e_last^T (no new direction)

  5. Small SVD: Compute SVD of K (only (k+1) × (k+1) matrix!)

  6. Update:

    • U_new = [U, p̂] * U’

    • S_new = S’

Constant Summary collapse

EPSILON =
1e-10

Class Method Summary collapse

Class Method Details

.project(u, raw_vec) ⇒ Vector

Projects a document vector onto the semantic space defined by U. Returns the LSI representation: lsi_vec = U^T * raw_vec

Parameters:

  • u (Matrix)

    left singular vectors (m × k)

  • raw_vec (Vector)

    document vector in term space (m × 1)

Returns:

  • (Vector)

    document in semantic space (k × 1)



64
65
66
# File 'lib/classifier/lsi/incremental_svd.rb', line 64

def project(u, raw_vec)
  u.transpose * raw_vec
end

.update(u, s, c, max_rank:, epsilon: EPSILON) ⇒ Array<Matrix, Array<Float>>

Updates SVD with a new document vector using Brand’s algorithm.

Parameters:

  • u (Matrix)

    current left singular vectors (m × k)

  • s (Array<Float>)

    current singular values (k values)

  • c (Vector)

    new document vector (m × 1)

  • max_rank (Integer)

    maximum rank to maintain

  • epsilon (Float) (defaults to: EPSILON)

    threshold for zero detection

Returns:



43
44
45
46
47
48
49
50
51
52
53
54
# File 'lib/classifier/lsi/incremental_svd.rb', line 43

def update(u, s, c, max_rank:, epsilon: EPSILON)
  m_vec = project(u, c)
  u_times_m = u * m_vec
  p_vec = c - (u_times_m.is_a?(Vector) ? u_times_m : Vector[*u_times_m.to_a.flatten])
  p_norm = magnitude(p_vec)

  if p_norm > epsilon
    update_with_new_direction(u, s, m_vec, p_vec, p_norm, max_rank, epsilon)
  else
    update_in_span(u, s, m_vec, max_rank, epsilon)
  end
end