vision3d.transforms#

3D data augmentation transforms.

Classes

CopyPaste3D(target_counts[, min_points, ...])

Batch-level 3D copy-paste data augmentation.

PointJitter([sigma, p])

Add Gaussian noise to point xyz coordinates with probability p.

PointSample(n)

Subsample (or oversample with replacement) to exactly n points.

PointShuffle([p])

Randomly permute point order with probability p.

RandomFlip3D([axis, p])

Flip inputs along a 3D axis with probability p.

RandomRotate3D([angle_range, axis, p])

Rotate inputs around an axis by a random angle with probability p.

RandomScale3D([scale_range, p])

Scale inputs by a random uniform factor with probability p.

RandomTransform([p])

Base class for transforms applied with probability p.

RandomTranslate3D([translation_range, p])

Translate inputs by a random 3D offset with probability p.

RangeFilter3D(point_cloud_range)

Drop points and boxes outside an axis-aligned 3D region.

Transform()

Base class for vision3d transforms.

class vision3d.transforms.CopyPaste3D(target_counts, min_points=5, max_database_size=None, p=1.0)[source]#

Bases: Transform

Batch-level 3D copy-paste data augmentation.

Maintains a lazy object database that grows as batches pass through. For each sample, it pastes additional objects from the database to reach the target count per class. Objects are pasted at their original scene position from the source frame.

Operates on collated batches (tuple_of_inputs, tuple_of_targets), not individual samples. Each instance should be used with only one dataset to avoid cross-contamination.

CopyPaste3D must be the first transform in any pipeline, before any 3D spatial transform (RandomFlip3D, RandomRotate3D, RandomScale3D, RandomTranslate3D). Pasted objects are extracted and re-inserted in the source-frame geometry of the scene they came from. If a scene transform has already mutated the frame, the pasted objects will disagree with the rest of the scene and the resulting boxes/points will be inconsistent.

Parameters:
  • target_counts (dict[int, int]) – Dict mapping integer class label to desired object count per sample. E.g. {0: 15, 1: 10}.

  • min_points (int) – Minimum number of points an extracted object must have to be stored in the database. Default: 5.

  • max_database_size (int | None) – Maximum entries per class. None means unlimited. Default: None.

  • p (float) – Probability of applying the augmentation. Default: 1.0.
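For example, a training pipeline might look like the sketch below. The batch calling convention (passing the collated inputs and targets together) and the helper name are assumptions for illustration; only the constructor arguments are taken from the documentation above.

    import math

    from vision3d.transforms import CopyPaste3D, RandomFlip3D, RandomRotate3D

    # CopyPaste3D must run first, before any 3D spatial transform, so that
    # pasted objects stay consistent with the source-frame geometry.
    copy_paste = CopyPaste3D(target_counts={0: 15, 1: 10}, min_points=5, p=1.0)
    flip = RandomFlip3D(axis="x", p=0.5)
    rotate = RandomRotate3D(angle_range=math.pi / 4, p=0.5)

    def augment_batch(inputs, targets):
        # inputs/targets form the collated (tuple_of_inputs, tuple_of_targets) pair.
        inputs, targets = copy_paste(inputs, targets)
        inputs, targets = flip(inputs, targets)
        inputs, targets = rotate(inputs, targets)
        return inputs, targets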

forward(*inputs)[source]#

Apply copy-paste augmentation to a collated batch.

Accepts any pytree structure containing PointCloud3D, BoundingBoxes3D, and optionally camera tensors and plain-tensor labels.

Returns:

The same pytree structure with modified leaves.

Parameters:

inputs (Any)

Return type:

Any

class vision3d.transforms.PointJitter(sigma=0.01, p=0.5)[source]#

Bases: RandomTransform

Add Gaussian noise to point xyz coordinates with probability p.

Parameters:
  • sigma (float) – Standard deviation of the Gaussian noise. Default: 0.01.

  • p (float) – Probability of applying. Default: 0.5.
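The noise itself is plain zero-mean Gaussian noise scaled by sigma. The snippet below is a standalone illustration on a raw tensor, not the library's kernel:

    import torch

    sigma = 0.01
    xyz = torch.rand(2048, 3)              # stand-in for point xyz coordinates
    noise = torch.randn_like(xyz) * sigma  # [N, 3] noise with std sigma
    jittered = xyz + noise                 # what applying the jitter amounts to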

make_params(flat_inputs)[source]#

Sample Gaussian noise.

Returns:

Dict with "noise" key containing [N, 3] tensor.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the noise.

Returns:

Jittered input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.PointSample(n)[source]#

Bases: Transform

Subsample (or oversample with replacement) to exactly n points.

If the point cloud has more than n points, a random subset is selected. If fewer, points are sampled with replacement to reach n.

Parameters:

n (int) – Target number of points.
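An illustrative sketch of this behaviour using plain torch indexing (not the actual kernel):

    import torch

    def sample_indices(num_points: int, n: int) -> torch.Tensor:
        if num_points >= n:
            # enough points: random subset without replacement
            return torch.randperm(num_points)[:n]
        # too few points: draw with replacement to reach exactly n
        return torch.randint(num_points, (n,))

    xyz = torch.rand(1500, 3)
    idx = sample_indices(xyz.shape[0], 2048)  # oversamples, since 1500 < 2048
    resampled = xyz[idx]                      # shape [2048, 3]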

make_params(flat_inputs)[source]#

Sample indices to reach exactly n points.

Returns:

Dict with "indices" key.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the sampling.

Returns:

Sampled input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.PointShuffle(p=0.5)[source]#

Bases: RandomTransform

Randomly permute point order with probability p.

Parameters:

p (float) – Probability of applying. Default: 0.5.

make_params(flat_inputs)[source]#

Sample a random permutation.

Returns:

Dict with "perm" key.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the permutation.

Returns:

Shuffled input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.RandomFlip3D(axis='x', p=0.5)[source]#

Bases: RandomTransform

Flip inputs along a 3D axis with probability p.

Dispatches to type-specific kernels for BoundingBoxes3D and PointCloud3D. Camera data (images, intrinsics, extrinsics) passes through unchanged.

Parameters:
  • axis (str) – Axis to flip along. One of "x", "y", "z". Default: "x".

  • p (float) – Probability of applying the flip. Default: 0.5.
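A minimal usage sketch; the points and boxes arguments stand in for PointCloud3D and BoundingBoxes3D instances constructed elsewhere:

    from vision3d.transforms import RandomFlip3D

    def flip_sample(points, boxes):
        # points: PointCloud3D, boxes: BoundingBoxes3D
        flip = RandomFlip3D(axis="y", p=0.5)
        # With probability p both leaves are flipped along y; otherwise they
        # come back unchanged. Camera tensors, if present, pass through.
        return flip(points, boxes)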

transform(inpt, params)[source]#

Apply the flip to a single input.

Returns:

Flipped input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.RandomRotate3D(angle_range=math.pi / 4, axis=(0.0, 0.0, 1.0), p=0.5)[source]#

Bases: RandomTransform

Rotate inputs around an axis by a random angle with probability p.

Dispatches to type-specific kernels for PointCloud3D, BoundingBoxes3D, and CameraExtrinsics.

Parameters:
  • angle_range (float) – Maximum rotation angle in radians; the angle is sampled uniformly from [-angle_range, angle_range]. Default: pi/4.

  • axis (tuple[float, float, float]) – Rotation axis as a 3-tuple. Default: (0, 0, 1) (Z-up).

  • p (float) – Probability of applying the rotation. Default: 0.5.
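For intuition, a [3, 3] rotation matrix for a unit axis and angle can be written with the Rodrigues formula; the helper below only illustrates what the sampled parameter looks like and is not the library's own code:

    import math

    import torch

    def axis_angle_matrix(axis, angle):
        a = torch.tensor(axis, dtype=torch.float32)
        x, y, z = (a / a.norm()).tolist()
        # Rodrigues: R = I + sin(angle) * K + (1 - cos(angle)) * K @ K,
        # where K is the skew-symmetric cross-product matrix of the unit axis.
        K = torch.tensor([[0.0, -z, y],
                          [z, 0.0, -x],
                          [-y, x, 0.0]])
        return torch.eye(3) + math.sin(angle) * K + (1.0 - math.cos(angle)) * (K @ K)

    # e.g. a draw of +pi/8 around Z, within the default angle_range of pi/4
    R = axis_angle_matrix((0.0, 0.0, 1.0), math.pi / 8)
    rotated = torch.rand(100, 3) @ R.T   # applying it to raw xyz coordinates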

make_params(flat_inputs)[source]#

Sample a random rotation matrix.

Returns:

Dict with "rotation_matrix" key containing a [3, 3] tensor.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the rotation to a single input.

Returns:

Rotated input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.RandomScale3D(scale_range=(0.95, 1.05), p=0.5)[source]#

Bases: RandomTransform

Scale inputs by a random uniform factor with probability p.

Dispatches to type-specific kernels for PointCloud3D, BoundingBoxes3D, and CameraExtrinsics.

Parameters:
  • scale_range (tuple[float, float]) – Scale factor range as (min, max). Default: (0.95, 1.05).

  • p (float) – Probability of applying the scaling. Default: 0.5.

make_params(flat_inputs)[source]#

Sample a random scale factor.

Returns:

Dict with "factor" key.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the scaling to a single input.

Returns:

Scaled input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.RandomTransform(p=0.5)[source]#

Bases: Transform

Base class for transforms applied with probability p.

Parameters:

p (float)
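Conceptually, the inherited forward() draws once per call and either applies the whole transform or returns the inputs untouched. A standalone sketch of that gate (plain Python, not the actual source):

    import torch

    def maybe_apply(transform_fn, inputs, p=0.5):
        # Apply the transform to everything with probability p, otherwise
        # hand the inputs back exactly as they were.
        if torch.rand(()) < p:
            return transform_fn(inputs)
        return inputs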

forward(*inputs)[source]#

Apply the transform with probability p.

Returns:

Transformed inputs, or the original inputs if skipped.

Parameters:

inputs (Any)

Return type:

Any

class vision3d.transforms.RandomTranslate3D(translation_range=0.5, p=0.5)[source]#

Bases: RandomTransform

Translate inputs by a random 3D offset with probability p.

Dispatches to type-specific kernels for PointCloud3D, BoundingBoxes3D, and CameraExtrinsics.

Parameters:
  • translation_range (float | tuple[float, float, float]) – Maximum translation per axis. Either a single float (symmetric range [-v, v] for all axes) or a tuple of three floats (tx, ty, tz) for per-axis ranges. Default: 0.5.

  • p (float) – Probability of applying the translation. Default: 0.5.
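An illustration of how the two accepted forms of translation_range relate to the sampled offset (plain torch, not the library's own sampling code):

    import torch

    def sample_offset(translation_range):
        # A single float v means the symmetric range [-v, v] on every axis;
        # a 3-tuple (tx, ty, tz) gives one symmetric range per axis.
        if isinstance(translation_range, (int, float)):
            limits = torch.full((3,), float(translation_range))
        else:
            limits = torch.tensor(translation_range, dtype=torch.float32)
        return (torch.rand(3) * 2.0 - 1.0) * limits  # uniform in [-limit, limit]

    offset = sample_offset(0.5)              # same range on x, y and z
    offset = sample_offset((0.5, 0.5, 0.2))  # tighter vertical range, say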

make_params(flat_inputs)[source]#

Sample a random offset.

Returns:

Dict with "offset" key containing a [3] tensor.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the translation to a single input.

Returns:

Translated input.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

class vision3d.transforms.RangeFilter3D(point_cloud_range)[source]#

Bases: Transform

Drop points and boxes outside an axis-aligned 3D region.

Points are filtered by their xyz coordinates; boxes are filtered by their center (format-agnostic). Labels in targets are filtered in sync with boxes.

Must be applied after spatial augmentations (rotate / scale / translate can push data out of the sensor range) and before the model sees the data.

Parameters:

point_cloud_range (tuple[float, ...]) – Axis-aligned bounds (x_min, y_min, z_min, x_max, y_max, z_max).
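A typical configuration, together with the axis-aligned point test it corresponds to (the mask below is shown on a raw tensor for intuition only; exact boundary handling is up to the implementation):

    import torch

    from vision3d.transforms import RangeFilter3D

    # e.g. keep everything within 50 units laterally and a few units vertically
    pc_range = (-50.0, -50.0, -5.0, 50.0, 50.0, 3.0)
    range_filter = RangeFilter3D(pc_range)

    # Equivalent per-point test on raw xyz coordinates:
    xyz = torch.rand(4096, 3) * 120.0 - 60.0
    low, high = torch.tensor(pc_range[:3]), torch.tensor(pc_range[3:])
    mask = ((xyz >= low) & (xyz <= high)).all(dim=1)
    kept = xyz[mask]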

forward(*inputs)[source]#

Filter points and boxes outside the configured range.

Accepts both a single sample dict and an (inputs, targets) pair.

Returns:

Filtered sample in the same structure as the input.

Parameters:

inputs (Any)

Return type:

Any

class vision3d.transforms.Transform[source]#

Bases: Module

Base class for vision3d transforms.

Only TVTensor subclasses (e.g. BoundingBoxes3D, PointCloud3D) are transformed. Plain tensors (labels, scores, etc.) pass through unchanged.

Subclasses should override transform() and use _call_kernel to dispatch to the correct kernel for each input type.
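A minimal subclass sketch. The hypothetical AddOffset class and its direct tensor arithmetic are illustrative only; a real subclass would dispatch through _call_kernel so that PointCloud3D and BoundingBoxes3D each reach their own kernel:

    from typing import Any

    import torch

    from vision3d.transforms import Transform

    class AddOffset(Transform):
        """Hypothetical transform that shifts every 3D leaf by a fixed offset."""

        def __init__(self, offset=(1.0, 0.0, 0.0)):
            super().__init__()
            self.offset = torch.tensor(offset, dtype=torch.float32)

        def make_params(self, flat_inputs: list[Any]) -> dict[str, Any]:
            # The same parameter dict is shared by every leaf in one call.
            return {"offset": self.offset}

        def transform(self, inpt: Any, params: dict[str, Any]) -> Any:
            # Sketch only: production code would use _call_kernel here.
            return inpt + params["offset"]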

make_params(flat_inputs)[source]#

Sample random parameters. Override for randomised transforms.

Returns:

Parameter dict passed to transform().

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)[source]#

Apply the transform to a single input. Must be overridden.

Parameters:
  • inpt (Any)

  • params (dict[str, Any])

Return type:

Any

forward(*inputs)[source]#

Apply the transform to one or more inputs (dicts, tuples, etc.).

Returns:

Transformed inputs in the same structure as the input.

Parameters:

inputs (Any)

Return type:

Any

extra_repr()[source]#

Auto-generate repr from public attributes.

Returns:

Comma-separated key=value string.

Return type:

str