Module: Classifier::LSI::IncrementalSVD
- Defined in:
- lib/classifier/lsi/incremental_svd.rb
Overview
Brand’s Incremental SVD Algorithm for LSI
Implements the algorithm from Brand (2006) “Fast low-rank modifications of the thin singular value decomposition” for adding documents to LSI without full SVD recomputation.
Given existing thin SVD: A ≈ U * S * V^T (with k components) When adding a new column c:
-
Project: m = U^T * c (project onto existing column space)
-
Residual: p = c - U * m (component orthogonal to U)
-
Orthonormalize: If ||p|| > ε: p̂ = p / ||p||
-
Form K matrix:
-
If ||p|| > ε: K = [diag(s), m; 0, ||p||] (rank grows by 1)
-
If ||p|| ≈ 0: K = diag(s) + m * e_last^T (no new direction)
-
-
Small SVD: Compute SVD of K (only (k+1) × (k+1) matrix!)
-
Update:
-
U_new = [U, p̂] * U’
-
S_new = S’
-
Constant Summary collapse
- EPSILON =
1e-10
Class Method Summary collapse
-
.project(u, raw_vec) ⇒ Vector
Projects a document vector onto the semantic space defined by U.
-
.update(u, s, c, max_rank:, epsilon: EPSILON) ⇒ Array<Matrix, Array<Float>>
Updates SVD with a new document vector using Brand’s algorithm.
Class Method Details
.project(u, raw_vec) ⇒ Vector
Projects a document vector onto the semantic space defined by U. Returns the LSI representation: lsi_vec = U^T * raw_vec
64 65 66 |
# File 'lib/classifier/lsi/incremental_svd.rb', line 64 def project(u, raw_vec) u.transpose * raw_vec end |
.update(u, s, c, max_rank:, epsilon: EPSILON) ⇒ Array<Matrix, Array<Float>>
Updates SVD with a new document vector using Brand’s algorithm.
43 44 45 46 47 48 49 50 51 52 53 54 |
# File 'lib/classifier/lsi/incremental_svd.rb', line 43 def update(u, s, c, max_rank:, epsilon: EPSILON) m_vec = project(u, c) u_times_m = u * m_vec p_vec = c - (u_times_m.is_a?(Vector) ? u_times_m : Vector[*u_times_m.to_a.flatten]) p_norm = magnitude(p_vec) if p_norm > epsilon update_with_new_direction(u, s, m_vec, p_vec, p_norm, max_rank, epsilon) else update_in_span(u, s, m_vec, max_rank, epsilon) end end |