Structured OBS

Structured OBS (Optimal Brain Surgeon) via BlockSpec/ScopeSpec.

Compensation modes: local, full, split, interleaved.

sparsekit.pruners.obs.block_col_indices(block, num_cols, device=device(type='cpu'))

Map each block to its original column indices.

Returns:

col_idx: (*grid_shape[1:], block_numel) long tensor.

Return type:

Tensor

Parameters:

block (BlockSpec)
num_cols (int)
device (device)
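The mapping can be illustrated with a minimal sketch. This is not the library's implementation: it assumes a hypothetical rectangular block of block_h x block_w elements tiled contiguously along the columns and flattened row-major, and it uses NumPy arrays in place of torch tensors. The function name and its arguments are illustrative only; the real function takes a BlockSpec.

```python
import numpy as np


def block_col_indices_sketch(block_h, block_w, num_cols):
    """Hypothetical stand-in: column index of every element of every block.

    Returns an int64 array of shape (num_cols // block_w, block_h * block_w),
    i.e. (column-grid size, block_numel).
    """
    if num_cols % block_w != 0:
        raise ValueError("num_cols must be a multiple of block_w")
    n_grid_cols = num_cols // block_w
    # Column offset of each element inside one block, flattened row-major.
    within = np.tile(np.arange(block_w), block_h)      # (block_numel,)
    # First column covered by each block along the column grid.
    starts = np.arange(n_grid_cols) * block_w          # (n_grid_cols,)
    return (starts[:, None] + within[None, :]).astype(np.int64)
```

Under these assumptions, block_col_indices_sketch(1, 2, 6) gives [[0, 1], [2, 3], [4, 5]], and a 2 x 2 block over 4 columns gives [[0, 1, 0, 1], [2, 3, 2, 3]].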

sparsekit.pruners.obs.block_param_rc(block, num_cols, device=device(type='cpu'))

Map each block to (param_row, param_col).

Unlike block_col_indices(), this covers the full grid (including the row dimension) and returns both row and column indices.

Returns:

row_idx: (*grid_shape, block_numel) long tensor.
col_idx: (*grid_shape, block_numel) long tensor.

Return type:

tuple[Tensor, Tensor]

Parameters:

block (BlockSpec)
num_cols (int)
device (device)

class sparsekit.pruners.obs.StructuredOBS(scope, hessian, damp=0.0001, inv_h=None)

Bases: object

Structured OBS pruner operating through ScopeSpec.

Parameters:

scope (ScopeSpec) – ScopeSpec defining the block and scope structure.
hessian (Tensor) – (K, K) Hessian matrix.
damp (float) – Damping factor for Hessian regularization.
inv_h (Tensor | None) – Precomputed (K, K) damped inverse. Skips inversion if given.

static compute_inverse(hessian, damp=0.0001)

Compute damped inverse of the Hessian matrix.

Parameters:

hessian (Tensor) – (K, K) symmetric positive semi-definite Hessian.
damp (float) – Damping factor. If < 1.0, scaled by mean diagonal; otherwise used as absolute value.

Returns:

(K, K) inverse of the damped Hessian, (H + λ * I)^{-1}, where λ is the damping value after the scaling rule above is applied.

Return type:

Tensor
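The damping rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: it uses NumPy instead of torch, inverts with a plain np.linalg.inv, and the function name damped_inverse is hypothetical.

```python
import numpy as np


def damped_inverse(hessian, damp=1e-4):
    """Sketch of a damped inverse for a (K, K) symmetric PSD matrix."""
    hessian = np.asarray(hessian, dtype=np.float64)
    k = hessian.shape[0]
    if hessian.shape != (k, k):
        raise ValueError("hessian must be square")
    # Documented rule: damp < 1.0 is relative to the mean diagonal,
    # otherwise it is used as an absolute value.
    if damp < 1.0:
        lam = damp * float(np.mean(np.diag(hessian)))
    else:
        lam = float(damp)
    return np.linalg.inv(hessian + lam * np.eye(k))
```

Relative damping keeps the regularization proportional to the scale of the Hessian, so the same default works for layers whose activations differ by orders of magnitude. A result computed this way could then be passed as inv_h so that the constructor skips its own inversion.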

prune(nnz, block_size=2048, compensate='local', n_splits=1)

Prune to nnz blocks per scope.

Phase 1 (selection): enumerate all C(bs, num_prune) subsets per scope and pick the best one per (row, scope) using submatrices of C = H^{-1}.

Phase 2 (compensation):

‘local’: within-scope only (fast; scopes are independent).
‘full’: sequential compensation to all K columns via C[P, :] (slower, but ~44% better than SparseGPT).
‘split’: like ‘full’, but recomputes C between column splits. Use n_splits to control granularity (2 = one C update at the midpoint).
‘interleaved’: re-selects masks and compensates at each split using the recomputed C. Uses a single shared C (O(K²) memory).

Parameters:

nnz (int) – Blocks to keep per scope.
block_size (int) – Column chunk size for subset search.
compensate (str) – ‘local’, ‘full’, ‘split’, or ‘interleaved’.
n_splits (int) – Number of column splits (for ‘split’/‘interleaved’).

Return type:

None
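The two phases can be sketched for a single weight row using the standard OBS formulas. This is an illustration under stated assumptions, not the library's implementation: each block is taken to be a single column, bs is read as the number of columns in a scope with num_prune = bs - nnz, NumPy stands in for torch, and both function names are hypothetical. For a pruned set P, the loss is 0.5 * w_P^T (C[P, P])^{-1} w_P and the ‘full’-style update is delta = -C[:, P] (C[P, P])^{-1} w_P.

```python
from itertools import combinations

import numpy as np


def best_prune_subset(w, c, scope_cols, num_prune):
    """Phase 1 sketch: exhaustive search for the cheapest subset to prune."""
    best_loss, best_p = np.inf, None
    for subset in combinations(scope_cols, num_prune):
        p = list(subset)
        c_pp = c[np.ix_(p, p)]
        # OBS loss of zeroing the columns in p with optimal compensation.
        loss = 0.5 * float(w[p] @ np.linalg.solve(c_pp, w[p]))
        if loss < best_loss:
            best_loss, best_p = loss, p
    return best_loss, best_p


def compensate_full(w, c, p):
    """Phase 2 sketch ('full'-style): spread the correction over all K columns."""
    p = list(p)
    delta = -c[:, p] @ np.linalg.solve(c[np.ix_(p, p)], w[p])
    out = w + delta
    out[p] = 0.0  # exact zeros; delta already cancels w[p] up to rounding
    return out
```

Restricting the update to the columns of the same scope would correspond to the ‘local’ idea; how the library builds the matrices for that mode is not specified here.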

prune_true_obs(nnz, ng=64, chunk_size=16, order='left_to_right', scoring='independent', c_dtype=None, progress_fn=None)

Per-row True OBS with Schur complement updates.

Each row maintains its own C = inv(H), which is updated via a Schur complement after each pruning step. Each batch processes ng scopes for chunk_size rows at a time.

Parameters:

nnz (int) – Blocks to keep per scope.
ng (int) – Number of scopes to process per batch.
chunk_size (int) – Rows to process simultaneously.
order (str) – "left_to_right" or "largest_first".
scoring (str) – "joint" (enumerate subsets) or "independent" (per-element w^2/diag(C) + topk).
c_dtype – Dtype for per-row C matrices. Default None uses fp16 for tensor-core Schur updates.
progress_fn – Optional callable(str) for progress messages.

Return type:

None
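The Schur complement update and the "independent" score can be sketched as follows. This is a minimal NumPy illustration, not the library's code: it handles one row in float64, treats each block as a single column, and assumes that "topk" keeps the nnz highest-scoring elements. All function names are hypothetical. The update relies on the block-inverse identity: if C = inv(H) and P is the pruned index set, then the inverse of H restricted to the kept indices is C[keep, keep] - C[keep, P] (C[P, P])^{-1} C[P, keep].

```python
import numpy as np


def schur_downdate(c, p):
    """Return (inverse over the kept indices, kept indices) after removing p."""
    k = c.shape[0]
    p = np.asarray(p)
    keep = np.setdiff1d(np.arange(k), p)
    c_kp = c[np.ix_(keep, p)]
    c_pp = c[np.ix_(p, p)]
    new_c = c[np.ix_(keep, keep)] - c_kp @ np.linalg.solve(c_pp, c_kp.T)
    return new_c, keep


def independent_keep(w, c, nnz):
    """Per-element saliency w^2 / diag(C); keep the nnz largest (assumption)."""
    scores = w ** 2 / np.diag(c)
    return np.sort(np.argsort(scores)[-nnz:])
```

The downdate avoids re-inverting H after every pruning step, which is what makes a per-row C affordable.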