.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/transforms/plot_transforms.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_transforms_plot_transforms.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_transforms_plot_transforms.py:


Augmenting samples with vision3d transforms
===========================================

This example showcases :mod:`vision3d.transforms` on the nuScenes dataset.

Transforms automatically dispatch on the tensor types from
:mod:`vision3d.tensors` carried by the sample, so a geometric transform
like :class:`~vision3d.transforms.RandomRotate3D` updates points, boxes,
and extrinsics together without requiring any special handling.

Transforms in :mod:`vision3d.transforms` are input- and dataset-agnostic
by design. They support every
:class:`~vision3d.tensors.BoundingBox3DFormat` and make no assumptions
about scene composition such as camera count, sensor layout, or axis
convention. They run on lidar-only, camera-only, and fusion samples,
except where a transform explicitly rejects a combination it cannot keep
consistent (see "Geometric safety" below).

.. GENERATED FROM PYTHON SOURCE LINES 20-24

Construct the dataset
---------------------
Grab a single ``(inputs, targets)`` sample to act as the baseline that
every transform is applied to.

.. GENERATED FROM PYTHON SOURCE LINES 24-40

.. code-block:: Python


    from pathlib import Path

    import torch

    from vision3d.datasets import NuScenes3D

    NUSCENES_ROOT = Path("~/.cache/vision3d/nuscenes-mini").expanduser()
    FRAME_INDEX = 100

    torch.manual_seed(42)

    dataset = NuScenes3D(NUSCENES_ROOT, version="v1.0-mini", split="train", download=True)
    inputs, targets = dataset[FRAME_INDEX]
    print(f"num boxes: {targets['boxes'].shape[0]}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    num boxes: 60


.. GENERATED FROM PYTHON SOURCE LINES 41-45

Visualize the baseline sample
-----------------------------
Render the original sample in an embedded Rerun viewer to use as
a reference for the transformed scenes shown later in this example.

.. GENERATED FROM PYTHON SOURCE LINES 45-74

.. code-block:: Python


    import rerun as rr
    import rerun.blueprint as rrb

    from vision3d.viz import fusion_layout, log_sample

    label_to_id = dataset.class_to_idx

    rr.init("vision3d_original", spawn=True)
    rr.send_blueprint(
        rrb.Blueprint(
            fusion_layout(
                NuScenes3D.camera_names,
                NuScenes3D.camera_grid,
                entity_prefix="original",
                name="Original",
            ),
            rrb.TimePanel(state="hidden"),
        )
    )
    rr.log("original", rr.ViewCoordinates.RIGHT_HAND_Z_UP, static=True)
    log_sample(
        inputs,
        targets,
        entity_prefix="original",
        label_to_id=label_to_id,
        jpeg_quality=75,
    )


.. rerun-embed:: vision3d_original.rrd
   :height: 60vh


.. GENERATED FROM PYTHON SOURCE LINES 75-84

Apply a single transform
------------------------
Every transform is a :class:`torchvision.transforms.v2.Transform` that
accepts ``(inputs, targets)`` and returns the transformed pair. The
:class:`~vision3d.tensors.PointCloud3D`,
:class:`~vision3d.tensors.CameraImages`, and
:class:`~vision3d.tensors.BoundingBoxes3D` semantic tensor types
steer dispatch, so a single call updates geometry, imagery, and box
annotations with geometric and photometric consistency.
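
As a rough illustration of what "updating geometry and box annotations
together" means, the standalone sketch below applies one rotation to a few
points and to one box. It does not use vision3d. The seven-value box layout
``(x, y, z, l, w, h, yaw)`` and the rotation about the world Z axis are
assumptions made for this illustration, and ``rotate_z`` and ``rotate_scene``
are hypothetical helper names.

```python
import math


def rotate_z(xyz, angle):
    """Rotate one (x, y, z) point about the Z axis by `angle` radians."""
    x, y, z = xyz
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def rotate_scene(points, boxes, angle):
    """Apply the same rotation to points and to boxes.

    Boxes are assumed to be (x, y, z, l, w, h, yaw) tuples: the center is
    rotated like a point, the size is unchanged, and the heading shifts by
    `angle`.
    """
    new_points = [rotate_z(p, angle) for p in points]
    new_boxes = []
    for x, y, z, l, w, h, yaw in boxes:
        cx, cy, cz = rotate_z((x, y, z), angle)
        new_boxes.append((cx, cy, cz, l, w, h, yaw + angle))
    return new_points, new_boxes


points = [(10.0, 0.0, 1.0), (0.0, 5.0, 0.0)]
boxes = [(10.0, 0.0, 1.0, 4.0, 2.0, 1.5, 0.0)]
rot_points, rot_boxes = rotate_scene(points, boxes, math.pi / 2)
```

Because the box center goes through the same function as the points, the box
stays attached to the points it encloses, which is the property a consistent
geometric transform has to preserve.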

.. GENERATED FROM PYTHON SOURCE LINES 84-93

.. code-block:: Python


    import math

    from vision3d.transforms import RandomRotate3D

    rotate = RandomRotate3D(angle_range=math.pi / 4, p=1.0)
    r_inputs, r_targets = rotate(inputs, targets)
    print(f"rotated boxes shape: {tuple(r_targets['boxes'].shape)}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    rotated boxes shape: (60, 7)


.. GENERATED FROM PYTHON SOURCE LINES 94-119

Geometric safety
----------------
vision3d transforms mirror the torchvision v2 dispatch model: each
transform declares the input types it operates on via the class-level
``_transformed_types`` tuple, and any input whose type is not listed
passes through unchanged. Transforms whose operation would only be
correct on a subset of scene types additionally override
:meth:`~vision3d.transforms.Transform.check_inputs` to raise
:class:`TypeError` for input combinations they cannot handle. Together
these guard against silently producing geometrically inconsistent
scenes (e.g. flipping the lidar but not the camera image alongside
it).

For example, :class:`~vision3d.transforms.RandomFlip3D` operates on
:class:`~vision3d.tensors.PointCloud3D` and
:class:`~vision3d.tensors.BoundingBoxes3D`, and its ``check_inputs``
refuses samples that also carry camera tensors (images, extrinsics,
intrinsics): flipping the 3D scene without coordinated changes to the
camera side would break geometric consistency. Running it on a fusion
dataset sample therefore raises a :class:`TypeError`.

The error signals that the transform is not compatible with a fusion
pipeline. :class:`~vision3d.transforms.RandomFlip3D` is intended for
lidar-only training pipelines, where there are no camera tensors to
fall out of correspondence.
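
The two guards described above can be sketched in plain Python. This is not
vision3d's implementation: ``PointCloud``, ``CameraImage``, ``FlipX``, and the
``_rejected_types`` attribute are hypothetical stand-ins, and the
``check_inputs`` signature here is an assumption. Only the overall pattern
(a tuple of handled types plus an up-front check that raises
:class:`TypeError`) follows the description in this section.

```python
class PointCloud:
    """Hypothetical stand-in for a point-cloud tensor type."""

    def __init__(self, xyz):
        self.xyz = xyz


class CameraImage:
    """Hypothetical stand-in for a camera tensor type."""

    def __init__(self, pixels):
        self.pixels = pixels


class FlipX:
    """Toy lidar-only flip that negates x on every point."""

    _transformed_types = (PointCloud,)  # types this transform rewrites
    _rejected_types = (CameraImage,)    # assumption: types it refuses outright

    def check_inputs(self, sample):
        for value in sample.values():
            if isinstance(value, self._rejected_types):
                raise TypeError("FlipX cannot operate on samples with camera data")

    def __call__(self, sample):
        self.check_inputs(sample)
        out = {}
        for key, value in sample.items():
            if isinstance(value, self._transformed_types):
                out[key] = PointCloud([(-x, y, z) for x, y, z in value.xyz])
            else:
                out[key] = value  # unlisted types pass through unchanged
        return out


flip_x = FlipX()
lidar_only = {"points": PointCloud([(1.0, 2.0, 3.0)]), "timestamp": 0.5}
flipped = flip_x(lidar_only)

fusion = {"points": PointCloud([(1.0, 2.0, 3.0)]), "images": CameraImage([[0]])}
try:
    flip_x(fusion)
    rejected = False
except TypeError:
    rejected = True
```

In this sketch the lidar-only sample is flipped and its ``timestamp`` entry
passes through untouched, while the fusion sample is rejected before any
tensor is modified.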

.. GENERATED FROM PYTHON SOURCE LINES 119-128

.. code-block:: Python


    from vision3d.transforms import RandomFlip3D

    flip = RandomFlip3D(axis="x", p=1.0)
    try:
        flip(inputs, targets)
    except TypeError as e:
        print(e)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    RandomFlip3D cannot operate on samples that contain camera tensors, since it flips only the 3D geometry. For samples that mix camera and point-cloud tensors, use torchvision's RandomHorizontalFlip (world Y) or RandomVerticalFlip (world Z): these flip the point clouds, boxes, and images together and update the camera tensors so projection stays consistent.


.. GENERATED FROM PYTHON SOURCE LINES 129-143

Camera-coordinated flips
------------------------
Flipping a fusion sample *is* well defined when the flip is expressed in
image space rather than world space. vision3d registers
``horizontal_flip`` / ``vertical_flip`` kernels for every camera tensor,
so torchvision's
:class:`~torchvision.transforms.v2.RandomHorizontalFlip` and
:class:`~torchvision.transforms.v2.RandomVerticalFlip` update the images,
intrinsics, and extrinsics together with the points and boxes.
For an upright camera rig, a horizontal image flip maps to a world **Y**
reflection and a vertical image flip maps to a world **Z** reflection.
Both transforms appear in the showcase below. The remaining
``RandomFlip3D`` world **X** flip has no image-space equivalent and stays
lidar-only.
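
The reason an image flip needs matching camera updates can be checked with a
small pinhole-camera calculation. The sketch below is independent of vision3d
and covers a single camera in its own frame (X right, Y down, Z forward). The
intrinsic values, the image width, and the pixel convention
``u' = (width - 1) - u`` are illustrative assumptions. How the library maps
this to the world frame and to a multi-camera rig is not shown here.

```python
def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (X right, Y down, Z forward)."""
    X, Y, Z = point_cam
    return (fx * X / Z + cx, fy * Y / Z + cy)


width = 1600
fx, fy, cx, cy = 1266.0, 1266.0, 816.0, 491.0  # illustrative intrinsics
point = (2.0, -0.5, 10.0)

# Pixel of the point in the original image, then mirrored left to right.
u, v = project(point, fx, fy, cx, cy)
u_flipped = (width - 1) - u

# Equivalent description: mirror the scene across the camera's YZ plane
# and move the principal point to its mirrored column.
mirrored_point = (-point[0], point[1], point[2])
cx_flipped = (width - 1) - cx
u_check, v_check = project(mirrored_point, fx, fy, cx_flipped, cy)
```

Both routes give the same pixel, so a flipped image stays consistent with the
3D scene only if the scene is mirrored and the principal point is moved as
well. Flipping the pixels alone would leave boxes projecting onto the wrong
side of the image.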

.. GENERATED FROM PYTHON SOURCE LINES 143-149

.. code-block:: Python


    from torchvision.transforms import v2

    hflip_inputs, hflip_targets = v2.RandomHorizontalFlip(p=1.0)(inputs, targets)
    print(f"image-flipped boxes shape: {tuple(hflip_targets['boxes'].shape)}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    image-flipped boxes shape: (60, 7)


.. GENERATED FROM PYTHON SOURCE LINES 150-159

Composing transforms
--------------------
vision3d transforms are designed to be chained into data-augmentation
pipelines for training. The standard
:class:`torchvision.transforms.v2.Compose` can run 3D transforms
alongside any tensor-aware torchvision image transform. These image
transforms see only the :class:`~vision3d.tensors.CameraImages` tensor
and leave 3D geometry untouched, so mixing them is safe.
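
Conceptually, composition just threads the ``(inputs, targets)`` pair through
each transform in order. The sketch below shows that contract with plain
Python callables. It is not how ``v2.Compose`` is implemented, and
``compose``, ``translate``, and ``brighten`` are hypothetical helpers
operating on toy dictionaries rather than vision3d tensors.

```python
def compose(transforms):
    """Return a callable that applies `transforms` left to right."""

    def run(inputs, targets):
        for transform in transforms:
            inputs, targets = transform(inputs, targets)
        return inputs, targets

    return run


def translate(dx):
    """Geometric step: shifts points and box centers along x."""

    def apply(inputs, targets):
        points = [(x + dx, y, z) for x, y, z in inputs["points"]]
        boxes = [(x + dx, *rest) for x, *rest in targets["boxes"]]
        return {**inputs, "points": points}, {**targets, "boxes": boxes}

    return apply


def brighten(amount):
    """Photometric step: touches pixels only and leaves geometry alone."""

    def apply(inputs, targets):
        pixels = [min(255, p + amount) for p in inputs["pixels"]]
        return {**inputs, "pixels": pixels}, targets

    return apply


pipeline = compose([translate(1.0), brighten(40), translate(2.0)])
toy_inputs = {"points": [(0.0, 0.0, 0.0)], "pixels": [10, 250]}
toy_targets = {"boxes": [(5.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0)]}
out_inputs, out_targets = pipeline(toy_inputs, toy_targets)
```

The photometric step sits between two geometric steps without affecting
them, which mirrors why image-only transforms can be interleaved freely with
3D ones.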

.. GENERATED FROM PYTHON SOURCE LINES 159-187

.. code-block:: Python


    from torchvision.transforms import v2

    from vision3d.transforms import (
        PointJitter,
        RandomScale3D,
        RandomTranslate3D,
        RangeFilter3D,
    )

    compose = v2.Compose(
        [
            RandomRotate3D(angle_range=math.pi / 4, p=1.0),
            RandomScale3D(scale_range=(0.7, 1.3), p=1.0),
            RandomTranslate3D(translation_range=5.0, p=1.0),
            PointJitter(sigma=0.1, p=1.0),
            RangeFilter3D(point_cloud_range=(-50, -50, -5, 50, 50, 3)),
            v2.Resize(size=[450, 800]),
            v2.CenterCrop(size=[400, 700]),
            v2.ColorJitter(brightness=0.6, contrast=0.6, saturation=0.6, hue=0.3),
        ]
    )

    c_inputs, c_targets = compose(inputs, targets)
    print(f"composed points: {tuple(c_inputs['points'].shape)}")
    print(f"composed images: {tuple(c_inputs['images'].shape)}")
    print(f"composed boxes:  {tuple(c_targets['boxes'].shape)}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    composed points: (12007, 5)
    composed images: (6, 3, 400, 700)
    composed boxes:  (22, 7)


.. GENERATED FROM PYTHON SOURCE LINES 188-198

Cross-sample augmentation with CopyPaste3D
------------------------------------------
:class:`~vision3d.transforms.CopyPaste3D` is an advanced augmentation
method based on the ground-truth sampling technique first introduced
in `SECOND <https://www.mdpi.com/1424-8220/18/10/3337>`_. It improves
scene diversity by injecting instances from other scenes into the
current one. Unlike single-sample transforms,
:class:`~vision3d.transforms.CopyPaste3D` operates on collated batches
and reads from an internal object database that grows lazily with each
batch it sees.
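
The general idea of ground-truth sampling can be sketched without any
library code. The example below is a simplified illustration, not
vision3d's implementation: ``ObjectDatabase``, ``overlaps``, and ``paste``
are hypothetical names, boxes are reduced to axis-aligned bird's-eye-view
rectangles ``(x, y, length, width)``, and only boxes and labels are pasted
(a full implementation would also append each object's points to the scene
cloud).

```python
import random


class ObjectDatabase:
    """Toy ground-truth database keyed by class id."""

    def __init__(self, min_points):
        self.min_points = min_points
        self.objects = {}  # class id -> list of (box, points)

    def update(self, boxes, labels, points_per_box):
        # Grow lazily: remember every sufficiently dense object that is seen.
        for box, label, pts in zip(boxes, labels, points_per_box):
            if len(pts) >= self.min_points:
                self.objects.setdefault(label, []).append((box, pts))


def overlaps(a, b):
    """Axis-aligned bird's-eye-view overlap test on (x, y, length, width)."""
    return abs(a[0] - b[0]) * 2 < a[2] + b[2] and abs(a[1] - b[1]) * 2 < a[3] + b[3]


def paste(boxes, labels, database, target_counts, rng):
    """Add database objects until each class reaches its target count."""
    boxes, labels = list(boxes), list(labels)
    for label, target in target_counts.items():
        candidates = list(database.objects.get(label, []))
        rng.shuffle(candidates)
        for box, _pts in candidates:
            if labels.count(label) >= target:
                break
            # Skip candidates that would collide with anything in the scene.
            if not any(overlaps(box, other) for other in boxes):
                boxes.append(box)
                labels.append(label)
    return boxes, labels


db = ObjectDatabase(min_points=5)
db.update(
    boxes=[(20.0, 0.0, 4.0, 2.0), (0.5, 0.0, 4.0, 2.0), (40.0, 3.0, 4.0, 2.0)],
    labels=[0, 0, 0],
    points_per_box=[[0] * 12, [0] * 30, [0] * 2],  # the last object is too sparse
)
scene_boxes, scene_labels = [(0.0, 0.0, 4.0, 2.0)], [0]
new_boxes, new_labels = paste(scene_boxes, scene_labels, db, {0: 3}, random.Random(0))
```

Here the sparse object never enters the database and the candidate that
collides with the existing box is skipped, so the scene ends up below its
target count of three. Target counts in this sketch are therefore upper
bounds rather than guarantees.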

.. GENERATED FROM PYTHON SOURCE LINES 198-246

.. code-block:: Python


    from torch.utils.data import DataLoader, Subset

    from vision3d.datasets import collate_fn
    from vision3d.transforms import CopyPaste3D

    target_counts = {
        dataset.class_to_idx["car"]: 30,
        dataset.class_to_idx["pedestrian"]: 20,
        dataset.class_to_idx["traffic_cone"]: 15,
    }
    copy_paste = CopyPaste3D(target_counts=target_counts, min_points=5)

    dataset_range = list(range(max(0, FRAME_INDEX - 10), FRAME_INDEX))
    dataset_loader = DataLoader(
        Subset(dataset, dataset_range),
        batch_size=2,
        collate_fn=collate_fn,
    )
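    # Warm-up passes: each batch seen adds its objects to the internal database.
    for _ in range(2):
        for batch_inputs, batch_targets in dataset_loader:
            copy_paste(batch_inputs, batch_targets)

    # Augment the baseline sample, wrapped as a batch of one.
    cp_inputs, cp_targets = copy_paste((inputs,), (targets,))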
    print(f"boxes before: {targets['boxes'].shape[0]}")
    print(f"boxes after CopyPaste3D:  {cp_targets[0]['boxes'].shape[0]}")

    rr.init("vision3d_copy_paste", spawn=True)
    rr.send_blueprint(
        rrb.Blueprint(
            fusion_layout(
                NuScenes3D.camera_names,
                NuScenes3D.camera_grid,
                entity_prefix="copy_paste",
                name="CopyPaste3D",
            ),
            rrb.TimePanel(state="hidden"),
        )
    )
    rr.log("copy_paste", rr.ViewCoordinates.RIGHT_HAND_Z_UP, static=True)
    log_sample(
        cp_inputs[0],
        cp_targets[0],
        entity_prefix="copy_paste",
        label_to_id=label_to_id,
        jpeg_quality=75,
    )


.. rerun-embed:: vision3d_copy_paste.rrd
   :height: 60vh


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    boxes before: 60
    boxes after CopyPaste3D:  73


.. GENERATED FROM PYTHON SOURCE LINES 247-252

Transforms showcase
-------------------
Browse every transform in the embedded Rerun viewer, each on its own
tab. Compare each tab against the baseline viewer shown earlier on this
page.

.. GENERATED FROM PYTHON SOURCE LINES 252-318

.. code-block:: Python


    from vision3d.transforms import PointSample, PointShuffle

    transforms = [
        (
            "translate",
            "RandomTranslate3D(5.0)",
            RandomTranslate3D(translation_range=5.0, p=1.0),
        ),
        ("rotate", "RandomRotate3D(pi/4)", RandomRotate3D(angle_range=math.pi / 4, p=1.0)),
        (
            "scale",
            "RandomScale3D(0.25, 4.0)",
            RandomScale3D(scale_range=(0.25, 4.0), p=1.0),
        ),
        ("hflip", "RandomHorizontalFlip", v2.RandomHorizontalFlip(p=1.0)),
        ("vflip", "RandomVerticalFlip", v2.RandomVerticalFlip(p=1.0)),
        (
            "color_jitter",
            "ColorJitter",
            v2.ColorJitter(brightness=0.8, contrast=0.8, saturation=0.8, hue=0.4),
        ),
        ("gaussian_blur", "GaussianBlur", v2.GaussianBlur(kernel_size=31, sigma=10.0)),
        ("solarize", "Solarize", v2.RandomSolarize(threshold=0.5, p=1.0)),
        ("resize_half", "Resize(half)", v2.Resize(size=[450, 800])),
        ("center_crop", "CenterCrop()", v2.CenterCrop(size=[600, 800])),
        ("pad", "Pad(100)", v2.Pad(padding=100)),
        ("point_shuffle", "PointShuffle", PointShuffle(p=1.0)),
        ("point_sample", "PointSample(4096)", PointSample(n=4096)),
        ("point_jitter", "PointJitter(sigma=0.1)", PointJitter(sigma=0.1, p=1.0)),
        (
            "range_filter",
            "RangeFilter3D()",
            RangeFilter3D(point_cloud_range=(-30, -30, -5, 30, 30, 3)),
        ),
        ("compose", "Compose", compose),
    ]

    rr.init("vision3d_transforms", spawn=True)
    rr.send_blueprint(
        rrb.Blueprint(
            rrb.Tabs(
                *(
                    fusion_layout(
                        NuScenes3D.camera_names,
                        NuScenes3D.camera_grid,
                        entity_prefix=prefix,
                        name=name,
                    )
                    for prefix, name, _ in transforms
                )
            ),
            rrb.TimePanel(state="hidden"),
        )
    )

    for prefix, name, pipeline in transforms:
        rr.log(prefix, rr.ViewCoordinates.RIGHT_HAND_Z_UP, static=True)
        t_inputs, t_targets = pipeline(inputs, targets)
        log_sample(
            t_inputs,
            t_targets,
            entity_prefix=prefix,
            label_to_id=label_to_id,
            jpeg_quality=75,
        )


.. rerun-embed:: vision3d_transforms.rrd
   :height: 60vh


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 8.001 seconds)


.. _sphx_glr_download_auto_examples_transforms_plot_transforms.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_transforms.ipynb <plot_transforms.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_transforms.py <plot_transforms.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_transforms.zip <plot_transforms.zip>`