vision3d.transforms#

3D data augmentation transforms.

Classes

CopyPaste3D(target_counts[, min_points, ...])

Batch-level 3D copy-paste data augmentation.

PointJitter([sigma, p])

Add Gaussian noise to point xyz coordinates with probability p.

PointSample(n)

Subsample (or oversample with replacement) to exactly n points.

PointShuffle([p])

Randomly permute point order with probability p.

RandomFlip3D([axis, p])

Flip inputs along a 3D axis with probability p.

RandomRotate3D([angle_range, axis, p])

Rotate inputs around an axis by a random angle with probability p.

RandomScale3D([scale_range, p])

Scale inputs by a random uniform factor with probability p.

RandomTransform([p])

Base class for transforms applied with probability p.

RandomTranslate3D([translation_range, p])

Translate inputs by a random 3D offset with probability p.

RangeFilter3D(point_cloud_range)

Drop points and boxes outside an axis-aligned 3D region.

Transform()

Base class for vision3d transforms.

class vision3d.transforms.CopyPaste3D(target_counts, min_points=5, max_database_size=None, p=1.0)[source]#

Bases: Transform

Batch-level 3D copy-paste data augmentation.

Maintains a lazy object database that grows as batches pass through. For each sample, it pastes additional objects from the database to reach the target count per class. Objects are pasted at their original scene position from the source frame.

Operates on collated batches (tuple_of_inputs, tuple_of_targets), not individual samples. Each instance should be used with only one dataset to avoid cross-contamination.

CopyPaste3D must be the first transform in any pipeline, before any 3D spatial transform (RandomFlip3D, RandomRotate3D, RandomScale3D, RandomTranslate3D). Pasted objects are extracted and re-inserted in the source-frame geometry of the scene they came from. If a scene transform has already mutated the frame, the pasted objects will disagree with the rest of the scene and the resulting boxes/points will be inconsistent.

Parameters:
  • target_counts (dict[int, int]) – Dict mapping integer class label to desired object count per sample. E.g. {0: 15, 1: 10}.

  • min_points (int) – Minimum number of points an extracted object must have to be stored in the database. Default: 5.

  • max_database_size (int | None) – Maximum entries per class. None means unlimited. Default: None.

  • p (float) – Probability of applying the augmentation. Default: 1.0.
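For example, a training pipeline might look like the sketch below. The batch calling convention (passing the collated inputs and targets together) and the helper name are assumptions for illustration; only the constructor arguments are taken from the documentation above.

    import math

    from vision3d.transforms import CopyPaste3D, RandomFlip3D, RandomRotate3D

    # CopyPaste3D must run first, before any 3D spatial transform, so that
    # pasted objects stay consistent with the source-frame geometry.
    copy_paste = CopyPaste3D(target_counts={0: 15, 1: 10}, min_points=5, p=1.0)
    flip = RandomFlip3D(axis="x", p=0.5)
    rotate = RandomRotate3D(angle_range=math.pi / 4, p=0.5)

    def augment_batch(inputs, targets):
        # inputs/targets form the collated (tuple_of_inputs, tuple_of_targets) pair.
        inputs, targets = copy_paste(inputs, targets)
        inputs, targets = flip(inputs, targets)
        inputs, targets = rotate(inputs, targets)
        return inputs, targets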

forward(*inputs)[source]#

Apply copy-paste augmentation to a collated batch.

Accepts any pytree structure containing PointCloud3D, BoundingBoxes3D, and optionally camera tensors and plain-tensor labels.

Returns:

The same pytree structure with modified leaves.

Parameters:

inputs (Any)

Return type:

Any

class vision3d.transforms.PointJitter(sigma=0.01, p=0.5)[source]#

Bases: RandomTransform

Add Gaussian noise to point xyz coordinates with probability p.

Parameters:
  • sigma (float) – Standard deviation of the Gaussian noise. Default: 0.01.

  • p (float) – Probability of applying. Default: 0.5.
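The noise itself is plain zero-mean Gaussian noise scaled by sigma. The snippet below is a standalone illustration on a raw tensor, not the library's kernel:

    import torch

    sigma = 0.01
    xyz = torch.rand(2048, 3)              # stand-in for point xyz coordinates
    noise = torch.randn_like(xyz) * sigma  # [N, 3] noise with std sigma
    jittered = xyz + noise                 # what applying the jitter amounts to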

make_params(flat_inputs)[source]#

Sample Gaussian noise.

Returns:

Dict with "noise" key containing [N, 3] tensor.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the noise.

Returns:

Jittered input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.PointSample(n)[source]#

Bases: Transform

Subsample (or oversample with replacement) to exactly n points.

If the point cloud has more than n points, a random subset is selected. If fewer, points are sampled with replacement to reach n.

Parameters:

n (int) – Target number of points.
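An illustrative sketch of this behaviour using plain torch indexing (not the actual kernel):

    import torch

    def sample_indices(num_points: int, n: int) -> torch.Tensor:
        if num_points >= n:
            # enough points: random subset without replacement
            return torch.randperm(num_points)[:n]
        # too few points: draw with replacement to reach exactly n
        return torch.randint(num_points, (n,))

    xyz = torch.rand(1500, 3)
    idx = sample_indices(xyz.shape[0], 2048)  # oversamples, since 1500 < 2048
    resampled = xyz[idx]                      # shape [2048, 3]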

make_params(flat_inputs)[source]#

Sample indices to reach exactly n points.

Returns:

Dict with "indices" key.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the sampling.

Returns:

Sampled input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.PointShuffle(p=0.5)[source]#

Bases: RandomTransform

Randomly permute point order with probability p.

Parameters:

p (float) – Probability of applying. Default: 0.5.

make_params(flat_inputs)[source]#

Sample a random permutation.

Returns:

Dict with "perm" key.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the permutation.

Returns:

Shuffled input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.RandomFlip3D(axis='x', p=0.5)[source]#

Bases: RandomTransform

Flip inputs along a 3D axis with probability p.

Dispatches to type-specific kernels for BoundingBoxes3D and PointCloud3D. Camera data (images, intrinsics, extrinsics) passes through unchanged.

Parameters:
  • axis (str) – Axis to flip along. One of "x", "y", "z". Default: "x".

  • p (float) – Probability of applying the flip. Default: 0.5.
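A minimal usage sketch; the points and boxes arguments stand in for PointCloud3D and BoundingBoxes3D instances constructed elsewhere:

    from vision3d.transforms import RandomFlip3D

    def flip_sample(points, boxes):
        # points: PointCloud3D, boxes: BoundingBoxes3D
        flip = RandomFlip3D(axis="y", p=0.5)
        # With probability p both leaves are flipped along y; otherwise they
        # come back unchanged. Camera tensors, if present, pass through.
        return flip(points, boxes)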

transform(inpt, params)[source]#

Apply the flip to a single input.

Returns:

Flipped input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.RandomRotate3D(angle_range=math.pi / 4, axis=(0.0, 0.0, 1.0), p=0.5)[source]#

Bases: RandomTransform

Rotate inputs around an axis by a random angle with probability p.

Dispatches to type-specific kernels for PointCloud3D, BoundingBoxes3D, and CameraExtrinsics.

Parameters:
  • angle_range (float) – Maximum rotation angle in radians; the angle is sampled uniformly from [-angle_range, angle_range]. Default: pi/4.

  • axis (tuple[float, float, float]) – Rotation axis as a 3-tuple. Default: (0, 0, 1) (Z-up).

  • p (float) – Probability of applying the rotation. Default: 0.5.
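For intuition, a [3, 3] rotation matrix for a unit axis and angle can be written with the Rodrigues formula; the helper below only illustrates what the sampled parameter looks like and is not the library's own code:

    import math

    import torch

    def axis_angle_matrix(axis, angle):
        a = torch.tensor(axis, dtype=torch.float32)
        x, y, z = (a / a.norm()).tolist()
        # Rodrigues: R = I + sin(angle) * K + (1 - cos(angle)) * K @ K,
        # where K is the skew-symmetric cross-product matrix of the unit axis.
        K = torch.tensor([[0.0, -z, y],
                          [z, 0.0, -x],
                          [-y, x, 0.0]])
        return torch.eye(3) + math.sin(angle) * K + (1.0 - math.cos(angle)) * (K @ K)

    # e.g. a draw of +pi/8 around Z, within the default angle_range of pi/4
    R = axis_angle_matrix((0.0, 0.0, 1.0), math.pi / 8)
    rotated = torch.rand(100, 3) @ R.T   # applying it to raw xyz coordinates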

make_params(flat_inputs)[source]#

Sample a random rotation matrix.

Returns:

Dict with "rotation_matrix" key containing a [3, 3] tensor.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the rotation to a single input.

Returns:

Rotated input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.RandomScale3D(scale_range=(0.95, 1.05), p=0.5)[source]#

Bases: RandomTransform

Scale inputs by a random uniform factor with probability p.

Dispatches to type-specific kernels for PointCloud3D, BoundingBoxes3D, and CameraExtrinsics.

Parameters:
  • scale_range (tuple[float, float]) – Scale factor range as (min, max). Default: (0.95, 1.05).

  • p (float) – Probability of applying the scaling. Default: 0.5.

make_params(flat_inputs)[source]#

Sample a random scale factor.

Returns:

Dict with "factor" key.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the scaling to a single input.

Returns:

Scaled input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.RandomTransform(p=0.5)[source]#

Bases: Transform

Base class for transforms applied with probability p.

Parameters:

p (float)
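Conceptually, the inherited forward() draws once per call and either applies the whole transform or returns the inputs untouched. A standalone sketch of that gate (plain Python, not the actual source):

    import torch

    def maybe_apply(transform_fn, inputs, p=0.5):
        # Apply the transform to everything with probability p, otherwise
        # hand the inputs back exactly as they were.
        if torch.rand(()) < p:
            return transform_fn(inputs)
        return inputs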

forward(*inputs)[source]#

Apply the transform with probability p.

Returns:

Transformed inputs, or the original inputs if skipped.

Parameters:

inputs (Any)

Return type:

Any

class vision3d.transforms.RandomTranslate3D(translation_range=0.5, p=0.5)[source]#

Bases: RandomTransform

Translate inputs by a random 3D offset with probability p.

Dispatches to type-specific kernels for PointCloud3D, BoundingBoxes3D, and CameraExtrinsics.

Parameters:
  • translation_range (float | tuple[float, float, float]) – Maximum translation per axis. Either a single float (symmetric range [-v, v] for all axes) or a tuple of three floats (tx, ty, tz) for per-axis ranges. Default: 0.5.

  • p (float) – Probability of applying the translation. Default: 0.5.
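An illustration of how the two accepted forms of translation_range relate to the sampled offset (plain torch, not the library's own sampling code):

    import torch

    def sample_offset(translation_range):
        # A single float v means the symmetric range [-v, v] on every axis;
        # a 3-tuple (tx, ty, tz) gives one symmetric range per axis.
        if isinstance(translation_range, (int, float)):
            limits = torch.full((3,), float(translation_range))
        else:
            limits = torch.tensor(translation_range, dtype=torch.float32)
        return (torch.rand(3) * 2.0 - 1.0) * limits  # uniform in [-limit, limit]

    offset = sample_offset(0.5)              # same range on x, y and z
    offset = sample_offset((0.5, 0.5, 0.2))  # tighter vertical range, say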

make_params(flat_inputs)[source]#

Sample a random offset.

Returns:

Dict with "offset" key containing a [3] tensor.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the translation to a single input.

Returns:

Translated input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.RangeFilter3D(point_cloud_range)[source]#

Bases: Transform

Drop points and boxes outside an axis-aligned 3D region.

Points are filtered by their xyz coordinates; boxes are filtered by their center (format-agnostic). Labels in targets are filtered in sync with boxes.

Must be applied after spatial augmentations (rotate / scale / translate can push data out of the sensor range) and before the model sees the data.

Parameters:

point_cloud_range (tuple[float, ...]) – Axis-aligned bounds (x_min, y_min, z_min, x_max, y_max, z_max).
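A typical configuration, together with the axis-aligned point test it corresponds to (the mask below is shown on a raw tensor for intuition only; exact boundary handling is up to the implementation):

    import torch

    from vision3d.transforms import RangeFilter3D

    # e.g. keep everything within 50 units laterally and a few units vertically
    pc_range = (-50.0, -50.0, -5.0, 50.0, 50.0, 3.0)
    range_filter = RangeFilter3D(pc_range)

    # Equivalent per-point test on raw xyz coordinates:
    xyz = torch.rand(4096, 3) * 120.0 - 60.0
    low, high = torch.tensor(pc_range[:3]), torch.tensor(pc_range[3:])
    mask = ((xyz >= low) & (xyz <= high)).all(dim=1)
    kept = xyz[mask]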

forward(*inputs)[source]#

Filter points and boxes outside the configured range.

Accepts both a single sample dict and an (inputs, targets) pair.

Returns:

Filtered sample in the same structure as the input.

Parameters:

inputs (Any)

Return type:

Any

class vision3d.transforms.Transform[source]#

Bases: Module

Base class for vision3d transforms.

Only TVTensor subclasses (e.g. BoundingBoxes3D, PointCloud3D) are transformed. Plain tensors (labels, scores, etc.) pass through unchanged.

Subclasses should override transform() and use _call_kernel to dispatch to the correct kernel for each input type.
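A minimal subclass sketch. The hypothetical AddOffset class and its direct tensor arithmetic are illustrative only; a real subclass would dispatch through _call_kernel so that PointCloud3D and BoundingBoxes3D each reach their own kernel:

    from typing import Any

    import torch

    from vision3d.transforms import Transform

    class AddOffset(Transform):
        """Hypothetical transform that shifts every 3D leaf by a fixed offset."""

        def __init__(self, offset=(1.0, 0.0, 0.0)):
            super().__init__()
            self.offset = torch.tensor(offset, dtype=torch.float32)

        def make_params(self, flat_inputs: list[Any]) -> dict[str, Any]:
            # The same parameter dict is shared by every leaf in one call.
            return {"offset": self.offset}

        def transform(self, inpt: Any, params: dict[str, Any]) -> Any:
            # Sketch only: production code would use _call_kernel here.
            return inpt + params["offset"]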

make_params(flat_inputs)[source]#

Sample random parameters. Override for randomised transforms.

Returns:

Parameter dict passed to transform().

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the transform to a single input. Must be overridden.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

forward(*inputs)[source]#

Apply the transform to one or more inputs (dicts, tuples, etc.).

Returns:

Transformed inputs in the same structure as the input.

Parameters:

inputs (Any)

Return type:

Any

extra_repr()[source]#

Auto-generate repr from public attributes.

Returns:

Comma-separated key=value string.

Return type:

str