vision3d.transforms

3D data augmentation transforms.

Classes

CopyPaste3D(target_counts[, min_points, ...])

Batch-level 3D copy-paste data augmentation.

PointJitter([sigma, p])

Add Gaussian noise to point xyz coordinates with probability p.

PointSample(n)

Subsample (or oversample with replacement) to exactly n points.

PointShuffle([p])

Randomly permute point order with probability p.

RandomFlip3D([axis, p])

Flip inputs along a 3D axis with probability p.

RandomRotate3D([angle_range, axis, p])

Rotate inputs around an axis by a random angle with probability p.

RandomScale3D([scale_range, p])

Scale inputs by a random uniform factor with probability p.

RandomTranslate3D([translation_range, p])

Translate inputs by a random 3D offset with probability p.

RangeFilter3D(point_cloud_range)

Drop points and boxes outside an axis-aligned 3D region.

Transform()

Base class for vision3d transforms.

class vision3d.transforms.CopyPaste3D(target_counts, min_points=5, max_database_size=None, p=1.0)

Bases: Transform

Batch-level 3D copy-paste data augmentation.

Maintains a lazy object database that grows as batches pass through. For each sample, pastes additional objects from the database to reach a target count per class. Objects are pasted at their original scene position from the source frame.

Operates on collated batches (tuple_of_inputs, tuple_of_targets), not individual samples. Use each instance with only one dataset: the database persists across batches, so sharing an instance would paste objects from one dataset into scenes of another.

CopyPaste3D must be the first transform in any pipeline, before any 3D spatial transform (RandomFlip3D, RandomRotate3D, RandomScale3D, RandomTranslate3D). Pasted objects are extracted and re-inserted in the source-frame geometry of the scene they came from. If a scene transform has already mutated the frame, the pasted objects will disagree with the rest of the scene and the resulting boxes/points will be inconsistent.

Parameters:
  • target_counts (dict[int, int]) – Dict mapping integer class label to desired object count per sample. E.g. {0: 15, 1: 10}.

  • min_points (int) – Minimum number of points an extracted object must have to be stored in the database. Default: 5.

  • max_database_size (int | None) – Maximum entries per class. None means unlimited. Default: None.

  • p (float) – Probability of applying the augmentation. Default: 1.0.
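
The database behaviour described above can be sketched in plain Python. This is a minimal illustration, not vision3d's implementation: ObjectDatabase, paste_to_target, the tuple-based points, and the oldest-first eviction policy are all hypothetical, and the sketch leaves out the p gate and the tensor types. It shows only the "store objects lazily, then top up each class towards its target count" logic.

```python
import random
from collections import defaultdict


class ObjectDatabase:
    """Hypothetical stand-in for the lazy per-class object database."""

    def __init__(self, min_points=5, max_database_size=None):
        self.min_points = min_points
        self.max_database_size = max_database_size
        self.entries = defaultdict(list)  # class label -> stored objects

    def add(self, label, points):
        # Objects with fewer than min_points points are not stored.
        if len(points) < self.min_points:
            return
        bucket = self.entries[label]
        bucket.append(points)
        # Assumption: the oldest entry is evicted once the cap is exceeded.
        if self.max_database_size is not None and len(bucket) > self.max_database_size:
            bucket.pop(0)


def paste_to_target(sample_labels, database, target_counts, rng):
    """Return the (label, points) objects to paste into one sample."""
    pasted = []
    for label, target in target_counts.items():
        missing = target - sample_labels.count(label)
        available = database.entries.get(label, [])
        # Never paste more objects than the database holds for this class.
        k = max(0, min(missing, len(available)))
        for points in rng.sample(available, k):
            pasted.append((label, points))
    return pasted


rng = random.Random(0)
db = ObjectDatabase(min_points=5, max_database_size=2)
car = [(float(i), 0.0, 0.0) for i in range(6)]
db.add(0, car)      # stored: 6 points >= min_points
db.add(0, car[:3])  # skipped: only 3 points
db.add(1, car)
# The sample already has one object of class 0 and two of class 1.
pasted = paste_to_target([0, 1, 1], db, {0: 3, 1: 2}, rng)
print([label for label, _ in pasted])  # → [0]
```

Class 0 is short by two objects but the database holds only one, so a single object is pasted. Class 1 already meets its target and receives nothing.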

forward(*inputs)

Apply copy-paste augmentation to a collated batch.

Accepts any pytree structure containing PointCloud3D, BoundingBoxes3D, and optionally camera tensors and plain-tensor labels.

Returns:

The same pytree structure with modified leaves.

Parameters:

inputs (Any)

Return type:

Any

class vision3d.transforms.PointJitter(sigma=0.01, p=0.5)

Bases: _RandomApplyTransform

Add Gaussian noise to point xyz coordinates with probability p.

Parameters:
  • sigma (float) – Standard deviation of the Gaussian noise. Default: 0.01.

  • p (float) – Probability of applying. Default: 0.5.
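
As a rough illustration of the behaviour documented above, the following sketch jitters plain Python tuples. The function name point_jitter, the rng argument, and the (x, y, z, intensity) point layout are assumptions made for the example; the real transform works on PointCloud3D inputs.

```python
import random


def point_jitter(points, sigma=0.01, p=0.5, rng=random):
    """Add N(0, sigma^2) noise to each xyz coordinate with probability p."""
    if rng.random() >= p:
        return points  # transform skipped, input returned unchanged
    # One independent noise value per coordinate, i.e. an [N, 3] noise array.
    noise = [[rng.gauss(0.0, sigma) for _ in range(3)] for _ in points]
    return [
        # Only xyz is perturbed; any trailing features are copied as-is.
        tuple(c + n for c, n in zip(pt[:3], row)) + tuple(pt[3:])
        for pt, row in zip(points, noise)
    ]


rng = random.Random(0)
cloud = [(0.0, 0.0, 0.0, 0.5), (1.0, 2.0, 3.0, 0.9)]  # x, y, z, intensity
jittered = point_jitter(cloud, sigma=0.01, p=1.0, rng=rng)
unchanged = point_jitter(cloud, sigma=0.01, p=0.0, rng=rng)
```

With p=1.0 the noise is always applied, and with p=0.0 the input is returned as-is, which makes both branches easy to check.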

make_params(flat_inputs)

Sample Gaussian noise.

Returns:

Dict with "noise" key containing [N, 3] tensor.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)

Apply the noise.

Returns:

Jittered input.

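Parameters:
  • inpt (Any) – Single input to transform.

  • params (dict[str, Any]) – Parameter dict returned by make_params().

Return type: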

Any

class vision3d.transforms.PointSample(n)

Bases: Transform

Subsample (or oversample with replacement) to exactly n points.

If the point cloud has more than n points, a random subset is selected. If fewer, points are sampled with replacement to reach n.

Parameters:

n (int) – Target number of points.
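
The two sampling regimes can be sketched as follows. The helper sample_indices is hypothetical, and the sketch assumes that in the oversampling case all n indices are drawn with replacement; the page does not say whether the original points are all kept first.

```python
import random


def sample_indices(num_points, n, rng=random):
    """Pick exactly n indices from a cloud of num_points points."""
    if num_points >= n:
        # Enough points: random subset without replacement.
        return rng.sample(range(num_points), n)
    # Too few points: draw with replacement so duplicates fill the gap.
    # Note that an empty cloud (num_points == 0) cannot be oversampled.
    return [rng.randrange(num_points) for _ in range(n)]


rng = random.Random(0)
down = sample_indices(10, 4, rng)  # 4 distinct indices out of 10
up = sample_indices(3, 8, rng)     # 8 indices, necessarily with repeats
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
resampled = [points[i] for i in up]
```

Returning indices rather than points mirrors the "indices" parameter described below, so the same selection can be applied to any per-point data.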

make_params(flat_inputs)

Sample indices to reach exactly n points.

Returns:

Dict with "indices" key.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)

Apply the sampling.

Returns:

Sampled input.

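Parameters:
  • inpt (Any) – Single input to transform.

  • params (dict[str, Any]) – Parameter dict returned by make_params().

Return type: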

Any

class vision3d.transforms.PointShuffle(p=0.5)

Bases: _RandomApplyTransform

Randomly permute point order with probability p.

Parameters:

p (float) – Probability of applying. Default: 0.5.
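
A small sketch of the idea, using plain lists. The function shuffle_points and the per-point labels are illustrative assumptions; they show why a single permutation is sampled once and then reused for every per-point array.

```python
import random


def shuffle_points(points, point_labels, p=0.5, rng=random):
    """Permute point order with probability p, keeping per-point data aligned."""
    if rng.random() >= p:
        return points, point_labels
    perm = list(range(len(points)))
    rng.shuffle(perm)  # one random ordering of the indices
    return [points[i] for i in perm], [point_labels[i] for i in perm]


rng = random.Random(0)
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
point_labels = ["a", "b", "c", "d"]
shuffled, shuffled_labels = shuffle_points(points, point_labels, p=1.0, rng=rng)
```

The set of points is unchanged and each point keeps its label; only the order differs.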

make_params(flat_inputs)

Sample a random permutation.

Returns:

Dict with "perm" key.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)

Apply the permutation.

Returns:

Shuffled input.

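Parameters:
  • inpt (Any) – Single input to transform.

  • params (dict[str, Any]) – Parameter dict returned by make_params().

Return type: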

Any

class vision3d.transforms.RandomFlip3D(axis='x', p=0.5)

Bases: _RandomApplyTransform

Flip inputs along a 3D axis with probability p.

Operates on PointCloud3D and BoundingBoxes3D. Camera inputs (images, extrinsics, intrinsics) are rejected, since this transform flips only the 3D geometry and leaves the camera side untouched. To flip a scene that contains cameras, use torchvision.transforms.v2.RandomHorizontalFlip (world Y) or torchvision.transforms.v2.RandomVerticalFlip (world Z), which apply the matching image-space flip and update the camera tensors so projection stays consistent.

The world X-axis flip has no image-space equivalent, so RandomFlip3D remains the way to apply it (on camera-free samples).

Parameters:
  • axis (str) – Axis to flip along. One of "x", "y", "z". Default: "x".

  • p (float) – Probability of applying the flip. Default: 0.5.
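
The geometry of a flip can be sketched as below. Several things here are assumptions rather than documented behaviour: that flipping along an axis negates that coordinate, that boxes are laid out as (cx, cy, cz, dx, dy, dz, yaw) with yaw measured about Z, and the helper names flip_points and flip_box. The probability gate is omitted.

```python
import math

AXIS_INDEX = {"x": 0, "y": 1, "z": 2}


def flip_points(points, axis="x"):
    """Mirror points by negating the coordinate along `axis`."""
    i = AXIS_INDEX[axis]
    return [tuple(-c if j == i else c for j, c in enumerate(pt)) for pt in points]


def flip_box(box, axis="x"):
    """Mirror a box centre and update its heading to match."""
    cx, cy, cz, dx, dy, dz, yaw = box
    center = flip_points([(cx, cy, cz)], axis)[0]
    if axis == "x":
        yaw = math.pi - yaw  # heading (cos, sin) -> (-cos, sin)
    elif axis == "y":
        yaw = -yaw           # heading (cos, sin) -> (cos, -sin)
    # A Z flip leaves a yaw measured about Z unchanged.
    return (*center, dx, dy, dz, yaw)


flipped_points = flip_points([(1.0, 2.0, 3.0)], "x")
flipped_box = flip_box((1.0, 2.0, 3.0, 4.0, 2.0, 1.5, 0.5), "x")
restored_box = flip_box(flipped_box, "x")
```

Applying the same flip twice restores the original box, which is a quick sanity check that points and headings are updated consistently.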

check_inputs(flat_inputs)

Reject camera inputs.

Raises:

TypeError – If any camera tensor is present.

Parameters:

flat_inputs (list[Any])

Return type:

None

transform(inpt, params)

Apply the flip to a single input.

Returns:

Flipped input.

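Parameters:
  • inpt (Any) – Single input to transform.

  • params (dict[str, Any]) – Parameter dict returned by make_params().

Return type: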

Any

class vision3d.transforms.RandomRotate3D(angle_range=math.pi / 4, axis=(0.0, 0.0, 1.0), p=0.5)

Bases: _RandomApplyTransform

Rotate inputs around an axis by a random angle with probability p.

Operates on PointCloud3D, BoundingBoxes3D, and CameraExtrinsics.

Parameters:
  • angle_range (float) – Maximum rotation angle in radians. The angle is sampled uniformly from [-angle_range, angle_range]. Default: pi/4.

  • axis (tuple[float, float, float]) – Rotation axis as a 3-tuple. Default: (0, 0, 1) (Z-up).

  • p (float) – Probability of applying the rotation. Default: 0.5.
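
One standard way to turn an axis and an angle into a [3, 3] rotation matrix is Rodrigues' formula, sketched below in plain Python. Whether vision3d builds its matrix this way, and whether it normalises the axis, is not stated on this page; the helper names are hypothetical and the probability gate is omitted.

```python
import math
import random


def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)  # assumption: axis is normalised here
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def rotate(point, matrix):
    """Apply a 3x3 matrix to a single xyz point."""
    return tuple(sum(m * p for m, p in zip(row, point)) for row in matrix)


def sample_rotation(angle_range=math.pi / 4, axis=(0.0, 0.0, 1.0), rng=random):
    """Sample an angle uniformly from [-angle_range, angle_range]."""
    angle = rng.uniform(-angle_range, angle_range)
    return angle, rotation_matrix(axis, angle)


rng = random.Random(0)
angle, matrix = sample_rotation(rng=rng)
quarter_turn = rotation_matrix((0.0, 0.0, 1.0), math.pi / 2)
rotated = rotate((1.0, 0.0, 0.0), quarter_turn)  # approximately (0, 1, 0)
```

For the default Z axis this reduces to the familiar yaw rotation in the XY plane.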

make_params(flat_inputs)

Sample a random rotation matrix.

Returns:

Dict with "rotation_matrix" key containing a [3, 3] tensor.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)

Apply the rotation to a single input.

Returns:

Rotated input.

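Parameters:
  • inpt (Any) – Single input to transform.

  • params (dict[str, Any]) – Parameter dict returned by make_params().

Return type: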

Any

class vision3d.transforms.RandomScale3D(scale_range=(0.95, 1.05), p=0.5)

Bases: _RandomApplyTransform

Scale inputs by a random uniform factor with probability p.

Operates on PointCloud3D, BoundingBoxes3D, and CameraExtrinsics.

Parameters:
  • scale_range (tuple[float, float]) – Scale factor range as (min, max). Default: (0.95, 1.05).

  • p (float) – Probability of applying the scaling. Default: 0.5.
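
A minimal sketch of uniform scaling, assuming the scene is scaled about the origin and that boxes use a (cx, cy, cz, dx, dy, dz, yaw) layout. Both points are assumptions, as are the helper names; the probability gate and the CameraExtrinsics update are left out.

```python
import random


def sample_factor(scale_range=(0.95, 1.05), rng=random):
    """Draw one scale factor uniformly from (min, max)."""
    low, high = scale_range
    return rng.uniform(low, high)


def scale_points(points, factor):
    """Scale every coordinate by the same factor."""
    return [tuple(c * factor for c in pt) for pt in points]


def scale_box(box, factor):
    """Scale centre and size together; the heading angle is unaffected."""
    cx, cy, cz, dx, dy, dz, yaw = box
    return (cx * factor, cy * factor, cz * factor,
            dx * factor, dy * factor, dz * factor, yaw)


rng = random.Random(0)
factor = sample_factor(rng=rng)
doubled_points = scale_points([(1.0, -2.0, 0.5)], 2.0)
doubled_box = scale_box((1.0, 2.0, 0.0, 4.0, 2.0, 1.5, 0.3), 2.0)
```

Using a single factor for points and boxes keeps each box aligned with the points it encloses.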

make_params(flat_inputs)

Sample a random scale factor.

Returns:

Dict with "factor" key.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)

Apply the scaling to a single input.

Returns:

Scaled input.

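Parameters:
  • inpt (Any) – Single input to transform.

  • params (dict[str, Any]) – Parameter dict returned by make_params().

Return type: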

Any

class vision3d.transforms.RandomTranslate3D(translation_range=0.5, p=0.5)

Bases: _RandomApplyTransform

Translate inputs by a random 3D offset with probability p.

Operates on PointCloud3D, BoundingBoxes3D, and CameraExtrinsics.

Parameters:
  • translation_range (float | tuple[float, float, float]) – Maximum translation per axis. Either a single float v (symmetric range [-v, v] on all axes) or a tuple of three floats (tx, ty, tz) giving per-axis ranges. Default: 0.5.

  • p (float) – Probability of applying the translation. Default: 0.5.
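
The two accepted forms of translation_range can be sketched as follows. The sketch assumes each tuple entry t is a symmetric bound [-t, t] for its axis, which is one reading of "per-axis ranges"; sample_offset and translate are hypothetical helper names, and the probability gate is omitted.

```python
import random


def sample_offset(translation_range=0.5, rng=random):
    """Sample a 3D offset, one uniform draw per axis."""
    # A single float is broadcast to all three axes.
    if isinstance(translation_range, (int, float)):
        translation_range = (translation_range,) * 3
    return tuple(rng.uniform(-r, r) for r in translation_range)


def translate(points, offset):
    """Add the same offset to every point."""
    return [tuple(c + o for c, o in zip(pt, offset)) for pt in points]


rng = random.Random(0)
isotropic = sample_offset(0.5, rng)            # each axis in [-0.5, 0.5]
per_axis = sample_offset((1.0, 0.0, 0.2), rng)  # no movement along Y
moved = translate([(1.0, 2.0, 3.0)], (0.5, -0.5, 0.0))
```

Setting one entry of the tuple to zero is a simple way to disable translation along that axis, for example to keep the ground height fixed.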

make_params(flat_inputs)

Sample a random offset.

Returns:

Dict with "offset" key containing a [3] tensor.

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)

Apply the translation to a single input.

Returns:

Translated input.

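Parameters:
  • inpt (Any) – Single input to transform.

  • params (dict[str, Any]) – Parameter dict returned by make_params().

Return type: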

Any

class vision3d.transforms.RangeFilter3D(point_cloud_range)

Bases: Transform

Drop points and boxes outside an axis-aligned 3D region.

Points are filtered by their xyz coordinates; boxes are filtered by their center (format-agnostic). Labels in targets are filtered in sync with boxes.

Must be applied after spatial augmentations (rotate / scale / translate can push data out of the sensor range) and before the model sees the data.

Parameters:

point_cloud_range (tuple[float, ...]) – Axis-aligned bounds (x_min, y_min, z_min, x_max, y_max, z_max).
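
The filtering rule can be illustrated with plain tuples. The sketch assumes inclusive bounds and a box layout whose first three values are the centre; neither is specified on this page, and in_range and range_filter are hypothetical names.

```python
def in_range(xyz, point_cloud_range):
    """Check whether an xyz position lies inside the axis-aligned region."""
    x_min, y_min, z_min, x_max, y_max, z_max = point_cloud_range
    x, y, z = xyz[:3]
    # Assumption: bounds are inclusive on both sides.
    return x_min <= x <= x_max and y_min <= y <= y_max and z_min <= z <= z_max


def range_filter(points, boxes, labels, point_cloud_range):
    """Drop out-of-range points, and boxes (with their labels) by centre."""
    kept_points = [pt for pt in points if in_range(pt, point_cloud_range)]
    # One mask for boxes, reused for labels so the two stay in sync.
    keep = [in_range(box, point_cloud_range) for box in boxes]
    kept_boxes = [box for box, k in zip(boxes, keep) if k]
    kept_labels = [label for label, k in zip(labels, keep) if k]
    return kept_points, kept_boxes, kept_labels


pc_range = (0.0, -40.0, -3.0, 70.0, 40.0, 1.0)
points = [(10.0, 0.0, 0.0), (80.0, 0.0, 0.0), (5.0, 5.0, 2.0)]
boxes = [(20.0, 1.0, -1.0, 4.0, 2.0, 1.5, 0.0),
         (75.0, 0.0, -1.0, 4.0, 2.0, 1.5, 0.0)]
labels = [0, 1]
kept_points, kept_boxes, kept_labels = range_filter(points, boxes, labels, pc_range)
```

Here the second point exceeds x_max, the third exceeds z_max, and the second box centre exceeds x_max, so one point, one box, and its label remain.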

forward(*inputs)

Filter points and boxes outside the configured range.

Accepts both a single sample dict and an (inputs, targets) pair.

Returns:

Filtered sample in the same structure as the input.

Parameters:

inputs (Any)

Return type:

Any

class vision3d.transforms.Transform

Bases: Module

Base class for vision3d transforms.

Mirrors torchvision.transforms.v2.Transform. Subclasses override transform() and use _call_kernel to dispatch to the correct kernel for each input type.

The class-level _transformed_types tuple lists the input types the transform operates on; inputs whose type is not listed flow through unchanged. Subclasses can additionally override check_inputs() to reject unsupported input combinations with a TypeError.
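
The division of labour between check_inputs(), make_params(), transform(), and _transformed_types can be sketched with a toy base class. MiniTransform below is a hypothetical stand-in written for this example, not vision3d's Transform: it handles only flat dicts, skips _call_kernel dispatch, and does not derive from Module. Points, Image, and ShiftX are likewise invented for illustration.

```python
class MiniTransform:
    """Toy stand-in for the base class; NOT vision3d's implementation."""

    _transformed_types = ()

    def check_inputs(self, flat_inputs):
        pass  # no-op by default

    def make_params(self, flat_inputs):
        return {}

    def transform(self, inpt, params):
        raise NotImplementedError

    def __call__(self, sample):
        # Assumption: samples are flat dicts; the real class accepts
        # nested structures.
        flat_inputs = list(sample.values())
        self.check_inputs(flat_inputs)
        params = self.make_params(flat_inputs)  # sampled once per call
        return {
            key: self.transform(value, params)
            if isinstance(value, self._transformed_types) else value
            for key, value in sample.items()
        }


class Points(list):
    """Toy point container used only in this sketch."""


class Image(list):
    """Toy camera input used only in this sketch."""


class ShiftX(MiniTransform):
    _transformed_types = (Points,)

    def check_inputs(self, flat_inputs):
        if any(isinstance(x, Image) for x in flat_inputs):
            raise TypeError("ShiftX does not support camera inputs")

    def make_params(self, flat_inputs):
        return {"dx": 1.0}

    def transform(self, inpt, params):
        return Points((x + params["dx"], y, z) for x, y, z in inpt)


out = ShiftX()({"points": Points([(0.0, 0.0, 0.0)]), "labels": [3]})
try:
    ShiftX()({"points": Points([]), "image": Image([])})
    rejected = False
except TypeError:
    rejected = True
```

The labels entry passes through untouched because its type is not listed in _transformed_types, while the sample containing an Image is rejected before any parameters are sampled.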

check_inputs(flat_inputs)

Validate inputs before transforming. Override to reject inputs.

The base implementation is a no-op. Subclasses raise TypeError for input combinations the transform cannot handle (e.g. a 3D-only transform that has no matching update for camera tensors).

Parameters:

flat_inputs (list[Any])

Return type:

None

make_params(flat_inputs)

Sample random parameters. Override for randomised transforms.

Returns:

Parameter dict passed to transform().

Parameters:

flat_inputs (list[Any])

Return type:

dict[str, Any]

transform(inpt, params)

Apply the transform to a single input. Must be overridden.

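Parameters:
  • inpt (Any) – Single input to transform.

  • params (dict[str, Any]) – Parameter dict returned by make_params().

Return type: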

Any

forward(*inputs)

Apply the transform to one or more inputs (dicts, tuples, etc.).

Do not override this; override transform() instead.

Returns:

Transformed inputs in the same structure as the input.

Parameters:

inputs (Any)

Return type:

Any

extra_repr()

Auto-generate repr from public attributes.

Returns:

Comma-separated key=value string.

Return type:

str