The CVSS 2025 programme will consist of a series of keynote lectures and hands-on lab sessions covering a wide range of topics in computer vision. The programme is designed to provide attendees with both theoretical knowledge and practical skills in the field. The core themes which drive the programme are outlined below:

Theme 1: Image and Video Synthesis

Computer vision theory (foundations underpinning synthesis and generation)
Deep learning architectures and techniques (transformers, diffusion, autoregressive, foundation models)
Representation learning (disentanglement, latent spaces, feature learning)
Optimisation methods (classical optimisation, energy-based methods, variational methods)
Generative modelling (GANs, diffusion, video diffusion, controllable generation, editing/inpainting, evaluation)
Explainable computer vision (interpretability for generative and discriminative models)
Transparency, fairness, accountability, privacy and ethics in vision (bias analysis, documentation and reporting practices, privacy-aware CV)

Theme 2: 3D, Geometry, and Physical Understanding

3D from multi-view and sensors (SfM/MVS, SLAM, LiDAR/RGB-D fusion, calibration)
3D from single images (monocular depth, single-view reconstruction, priors and generative 3D)
Geometry-grounded synthesis and rendering (NeRFs, 3D Gaussian Splatting, differentiable rendering, inverse rendering)
Low-level vision (denoising, deblurring, super-resolution, HDR, optical flow)
Segmentation (2D/3D, instance, panoptic, promptable segmentation/tracking)
Scene analysis and understanding (layout, objects, relationships, affordances, physical reasoning)

Theme 3: Vision for X

Computer vision for robotics (perception for manipulation, navigation, inspection)
Embodied vision (active perception, agent learning, simulation and sim2real)
Autonomous driving (detection, tracking, occupancy, scene understanding, robustness)
Video: low-level analysis, motion and tracking (temporal modelling, event understanding)
Humans: face, body, pose, gesture, movement (2D/3D pose, behaviour understanding)
Recognition: categorisation, detection, retrieval (open-vocabulary recognition, large-scale retrieval, long-tail settings)

Theme 4: Multimodal Learning, Vision, Language and Reasoning

Multimodal learning (vision-language, video-language, audio-visual, sensor fusion)
Vision-language models and grounding (captioning, VQA, referring expressions, prompt-based perception)
Reasoning and agentic multimodal systems (tool use, planning, compositional reasoning over images/video)
Transfer / low-shot / continual / long-tail learning (domain shift, adaptation, incremental learning, data imbalance)
Self-, semi-, meta- and unsupervised learning (self-supervised pretraining, weak supervision, meta-learning)