Concepts
========

This page explains the core abstractions and how they compose to
express arbitrary structured sparsity patterns.

The Hierarchy
-------------

Sparsekit uses a three-level hierarchy::

   nn.Parameter
    └── View        (strided view of the , write-through)
    └── BlockSpec   (block grid, single parameter)
    └── ScopeSpec   (scopes of blocks, pruning decision unit)

Each level adds structure on top of the previous one:

1. **View** -- a strided ``as_strided`` view of a ``nn.Parameter``
2. **BlockSpec** -- divides the view into a grid of blocks
3. **ScopeSpec** -- organizes blocks into scopes (competition units for pruning)
4. **StructuredOBS** -- performs OBS pruning through the ScopeSpec

For multi-parameter sparsity, there are coupling variants:

- **BlockCoupling** -- couples multiple ``BlockSpec`` objects
- **ScopeCoupling** -- couples multiple ``ScopeSpec`` objects

View: Strided Write-Through Views
----------------------------------

:class:`~sparsekit.view.View` wraps an ``nn.Parameter`` with an
arbitrary ``(shape, stride)`` view using ``torch.as_strided``.  The key
property is **write-through**: pruning and masking operations modify the
underlying parameter storage directly, without copying the weight tensor.
Intermediate computations (norms, thresholds, mask broadcasting) may
allocate temporaries, but the weights themselves are never duplicated.

.. code-block:: python

   from sparsekit import View

   param = torch.nn.Parameter(torch.randn(2560, 9728))
   # View as (M, K/16, 8, 2) with stride (K, 16, 1, 8)
   view = View(param, shape=(2560, 608, 8, 2), stride=(9728, 16, 1, 8))

This is essential for coupled sparsity patterns where elements that are
far apart in memory must share a pruning decision.

BlockSpec: Block Grids
-----------------------

:class:`~sparsekit.block.BlockSpec` treats a tensor (or View) as a
grid of blocks.  Each block is a small sub-tensor defined by
``shape``.

.. code-block:: python

   from sparsekit import BlockSpec

   param = torch.nn.Parameter(torch.randn(8, 16))
   block = BlockSpec(param, shape=(2, 4))
   # grid_shape = (4, 4), block_numel = 8

Key operations:

- ``norms(values)`` -- L2 norm per block
- ``hard_threshold(thresholds)`` -- zero blocks below threshold
- ``soft_threshold(thresholds)`` -- proximal L1 operator
- ``get_masks(block_masks)`` -- convert block mask to element mask
- ``apply_multiplier(multiplier)`` -- scale each block

All threshold/mask operations write through to the parameter.

ScopeSpec: Scopes of Blocks
------------------------------------

:class:`~sparsekit.scope.ScopeSpec` divides the block grid into
competition scopes.  Within each scope, blocks compete based on their
norms: the top-``nnz`` blocks survive; the rest are pruned.

.. code-block:: python

   from sparsekit import BlockSpec, ScopeSpec

   block = BlockSpec(param, shape=(1, 1))        # scalar blocks
   scope = ScopeSpec(block, shape=(1, 4))    # 4 blocks per scope
   scope.hard_threshold(nnz=2)                # keep 2 of 4

The ``shape`` specifies how many blocks along each dimension form one
scope.  Use ``-1`` to span the entire dimension.

Key operations:

- ``block_to_scope(t)`` -- reshape block tensor to scope layout
- ``scope_to_block(t)`` -- broadcast scope values back to block grid
- ``block_norms(values)`` -- block norms in scope layout
- ``kth_largest(values, nnz)`` -- per-scope pruning thresholds
- ``hard_threshold(nnz=...)`` -- prune in-place
- ``get_masks(nnz)`` -- return element-level masks without pruning

BlockCoupling and ScopeCoupling
------------------------------------

When sparsity must be shared across parameters (e.g., coupled 2:4 where
column pairs 8 apart must have identical masks), use the coupling classes.

:class:`~sparsekit.block.BlockCoupling` merges multiple BlockSpec objects
into one virtual block grid.  The ``orders`` parameter specifies dimension
permutations to align their grids.

:class:`~sparsekit.scope.ScopeCoupling` does the same at the
scope level: it concatenates block norms from all child ScopeSpec
objects along the last dimension, then applies a single threshold across
all of them.

.. code-block:: python

   from sparsekit import BlockSpec, ScopeSpec, ScopeCoupling

   block_a = BlockSpec(param_a, shape=(2, 2), name="A")
   block_b = BlockSpec(param_b, shape=(2, 2), name="B")

   scope_a = ScopeSpec(block_a, shape=(1, 1), name="pA")
   scope_b = ScopeSpec(block_b, shape=(1, 4), name="pB")

   coupled = ScopeCoupling(
       [scope_a, scope_b],
       orders=[(0, 1), (1, 0)],   # align scope grids
   )
   coupled.hard_threshold(nnz=2)

StructuredOBS: Optimal Brain Surgeon
-------------------------------------

:class:`~sparsekit.pruners.obs.StructuredOBS` implements the OBS pruning
algorithm using the ScopeSpec abstraction.  It uses the inverse Hessian
``C = (H + damp*I)^{-1}`` to:

1. **Select** which blocks to prune (minimize OBS cost)
2. **Compensate** remaining weights to reduce the pruning error

Compensation modes:

- ``"local"`` -- compensate within each scope only (fast, independent)
- ``"full"`` -- sequential compensation to all K columns via ``C[P, :]``
- ``"split"`` -- like ``"full"`` but recomputes C between column splits
- ``"interleaved"`` -- re-selects masks AND compensates at each split
  (highest quality, nearly matches SparseGPT)

.. code-block:: python

   from sparsekit import BlockSpec, ScopeSpec, StructuredOBS

   block = BlockSpec(W, shape=(1, 1))
   scope = ScopeSpec(block, shape=(1, 4))

   hessian = (X.T @ X) / X.shape[0]
   inv_h = StructuredOBS.compute_inverse(hessian, damp=1e-4)

   obs = StructuredOBS(scope, hessian, inv_h=inv_h)
   obs.prune(nnz=2, compensate="interleaved", n_splits=64)