Structured OBS

Structured OBS (Optimal Brain Surgeon) via BlockSpec/ScopeSpec.

Compensation modes: local, full, split, interleaved.

sparsekit.pruners.obs.block_col_indices(block, num_cols, device=device(type='cpu'))

Map each block to its original column indices.

Returns:

col_idx: (*grid_shape[1:], block_numel) long tensor.

Return type:

Tensor

sparsekit.pruners.obs.block_param_rc(block, num_cols, device=device(type='cpu'))

Map each block to (param_row, param_col).

Unlike block_col_indices() this returns the full grid (including the row dimension) and both row and column indices.

Returns:

row_idx: (*grid_shape, block_numel) long tensor.

col_idx: (*grid_shape, block_numel) long tensor.

Return type:

(row_idx, col_idx)

class sparsekit.pruners.obs.StructuredOBS(scope, hessian, damp=0.0001, inv_h=None)

Bases: object

Structured OBS pruner operating through ScopeSpec.

Parameters:
  • scope (ScopeSpec) – ScopeSpec defining the block and scope structure.

  • hessian (Tensor) – (K, K) Hessian matrix.

  • damp (float) – Damping factor for Hessian regularization (see compute_inverse()).

  • inv_h (Tensor | None) – Precomputed (K, K) damped inverse. Skips inversion if given.

static compute_inverse(hessian, damp=0.0001)

Compute damped inverse of the Hessian matrix.

Parameters:
  • hessian (Tensor) – (K, K) symmetric positive semi-definite Hessian.

  • damp (float) – Damping factor. If damp < 1.0, it is multiplied by the mean of the Hessian diagonal; otherwise it is used as an absolute value.

Returns:

(K, K) inverse of the damped Hessian, (H + damp * I)^{-1}.

Return type:

Tensor
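The damping rule described above can be sketched in a few lines. This is a minimal NumPy illustration, not the library's implementation: the function name damped_inverse is hypothetical, and the Cholesky-based inversion is an assumption about how a positive-definite damped Hessian might be inverted.

```python
import numpy as np

def damped_inverse(hessian: np.ndarray, damp: float = 1e-4) -> np.ndarray:
    """Return (H + damp * I)^{-1} for a symmetric PSD (K, K) matrix."""
    k = hessian.shape[0]
    # Relative damping: values below 1.0 are scaled by the mean diagonal entry.
    # Values of 1.0 or more are used as an absolute offset.
    damp_abs = damp * np.mean(np.diag(hessian)) if damp < 1.0 else damp
    damped = hessian + damp_abs * np.eye(k)
    # Assumption: the damped matrix is positive definite, so Cholesky succeeds.
    chol = np.linalg.cholesky(damped)              # damped = L @ L.T
    inv_chol = np.linalg.solve(chol, np.eye(k))    # L^{-1}
    return inv_chol.T @ inv_chol                   # L^{-T} @ L^{-1}
```

Scaling by the mean diagonal makes a small damp value behave consistently across layers whose Hessians differ in magnitude.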

prune(nnz, block_size=2048, compensate='local', n_splits=1)

Prune to nnz blocks per scope.

Phase 1: enumerate all C(bs, num_prune) subsets per scope and pick the best per (row, scope) using submatrices of C = H^{-1}.

Phase 2 (compensation):
  • 'local': within-scope only (fast, independent scopes)

  • 'full': sequential compensation to ALL K columns via C[P, :] (slower but ~44% better than SparseGPT)

  • 'split': like 'full' but recomputes C between column splits. Use n_splits to control granularity (2 = one C update at the midpoint).

  • 'interleaved': re-selects masks AND compensates at each split using recomputed C. Single shared C (O(K²) memory).
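The two phases can be illustrated with the standard OBS formulas for a single weight row. The sketch below is a NumPy illustration under stated assumptions, not the library's code: best_subset and compensate are hypothetical names, each block is a single weight (block_numel = 1), and num_prune is taken to be the scope size minus nnz. Removing a set P costs 0.5 · w_P^T (C_PP)^{-1} w_P, and the compensating update subtracts C[:, P] (C_PP)^{-1} w_P from the targeted columns.

```python
from itertools import combinations
import numpy as np

def best_subset(w, c, scope, num_prune):
    """Phase 1: pick the subset of `scope` columns that is cheapest to remove."""
    best_cost, best = np.inf, None
    for subset in combinations(scope, num_prune):
        p = list(subset)
        # OBS saliency of removing P jointly: 0.5 * w_P^T (C_PP)^{-1} w_P
        cost = 0.5 * w[p] @ np.linalg.solve(c[np.ix_(p, p)], w[p])
        if cost < best_cost:
            best_cost, best = cost, p
    return best, best_cost

def compensate(w, c, pruned, targets):
    """Phase 2: zero `pruned` and apply the OBS update to `targets` columns.

    targets = the scope's own columns  -> analogous to 'local'
    targets = all K columns            -> analogous to 'full'
    """
    lam = np.linalg.solve(c[np.ix_(pruned, pruned)], w[pruned])
    out = w.copy()
    out[targets] -= c[np.ix_(targets, pruned)] @ lam
    out[pruned] = 0.0  # exact zeros for the removed weights
    return out
```

With all K columns as targets, the quadratic error 0.5 · Δw^T H Δw of the update equals the saliency returned by best_subset. Restricting targets to the scope leaves every other column untouched, which is why scopes can be processed independently in that case.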

Parameters:
  • nnz (int) – Blocks to keep per scope.

  • block_size (int) – Column chunk size for subset search.

  • compensate (str) – 'local', 'full', 'split', or 'interleaved'.

  • n_splits (int) – Number of column splits (for 'split'/'interleaved').

Return type:

None

prune_true_obs(nnz, ng=64, chunk_size=16, order='left_to_right', scoring='independent', c_dtype=None, progress_fn=None)

Per-row True OBS with Schur complement updates.

Each row maintains its own C = inv(H), which is updated via a Schur complement after each pruning step. Processes ng scopes simultaneously per batch.

Parameters:
  • nnz (int) – Blocks to keep per scope.

  • ng (int) – Number of scopes to process per batch.

  • chunk_size (int) – Rows to process simultaneously.

  • order (str) – "left_to_right" or "largest_first".

  • scoring (str) – "joint" (enumerate subsets) or "independent" (per-element w^2/diag(C) + topk).

  • c_dtype – Dtype for per-row C matrices. Default None uses fp16 for tensor-core Schur updates.

  • progress_fn – Optional callable(str) for progress messages.

Return type:

None
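A single step of the per-row procedure can be sketched as follows. This is a NumPy illustration under assumptions, not the library's implementation: prune_scope is a hypothetical name, each block is a single weight, only the "independent" scoring (w^2 / diag(C), keep the top nnz) is shown, and nnz is assumed to be smaller than the scope size. After the weights in P are removed, the row's C is replaced by C - C[:, P] (C_PP)^{-1} C[P, :], which is the Schur complement update.

```python
import numpy as np

def prune_scope(w, c, scope, nnz):
    """Prune one scope of one row, then update that row's inverse Hessian C."""
    scope = np.asarray(scope)
    # "independent" scoring: per-element saliency w_i^2 / C_ii.
    scores = w[scope] ** 2 / np.diag(c)[scope]
    order = np.argsort(scores)                     # ascending saliency
    pruned = scope[order[: len(scope) - nnz]]      # drop the lowest scores
    c_pp = c[np.ix_(pruned, pruned)]
    c_p = c[:, pruned]
    # OBS compensation applied to every remaining column of the row.
    w_new = w - c_p @ np.linalg.solve(c_pp, w[pruned])
    w_new[pruned] = 0.0
    # Schur complement update: C restricted to kept columns becomes the
    # inverse of H restricted to those columns; pruned rows/cols go to zero.
    c_new = c - c_p @ np.linalg.solve(c_pp, c_p.T)
    return w_new, c_new, pruned
```

Because the updated C depends on which weights a row has already lost, every row needs its own copy, which is consistent with the chunk_size and c_dtype parameters controlling memory use.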