Bridging Vision, Language, and Action: What's Missing in Actionable Visual Perception for Robotics


CVPR 2026 Workshop, Denver CO, USA

Date: June 3rd, Wednesday


Submission Portal

Overview

Vision foundation models excel at encoding passive data, yet robots require physically grounded reasoning about pose, dynamics, and affordances. This workshop bridges the gap between computer vision and robotics, moving beyond simple deployment to pioneer task-driven, co-designed perception-action loops. We aim to translate perceptual abstractions into actionable structures, closing the loop from pixels to torque for robust, real-world systems.

To achieve this, we foster a bidirectional dialogue: enabling vision researchers to incorporate robotic constraints into design, while empowering roboticists to effectively deploy advanced models. Specifically, this workshop focuses on three interactive dimensions and the corresponding stage-wise challenges essential for actionable visual perception.

Topics

Interactive Dimensions

What visual capability is needed for fully autonomous systems?

How can the vision community contribute to general-purpose robotic systems?

What data modality is critical for generalizable, robust robot control?

Stage-Wise Challenges

Data

  • Dynamic logs with multi-modal feedback.
  • Teaching models about risk.
  • Physically accurate data for closing the sim-to-real gap.

Model

  • 3D geometry and physical dynamics.
  • Differentiable cause-effect relationships.
  • Conditioned representations over passive observation.

Optimization

  • Safety and stability in the learning objective.
  • Downstream task success.
  • Model confidence for safe real-world deployment.

Evaluation

  • Closed-loop performance.
  • Reliability against environmental variability.
  • Physical task completion and safe interaction.

Call for Papers

Submission Instructions

We welcome submissions covering:

All formats allow unlimited references and appendices.

Contributions will be non-archival but hosted on our workshop website, so dual submission is allowed where permitted by third parties. We welcome papers that are under review at, or have been accepted by, other conferences. If this applies to your paper, please state it in the last sentence of the abstract.

Submissions should follow CVPR two-column style and be anonymous; see the CVPR-26 author kit for details.

Submission and Important Dates

Invited Speakers

Schedule

To encourage open-ended discussion and maximize in-person engagement, the workshop will feature a mix of structured and interactive formats.

These interactive elements are designed to stimulate lively exchanges, bridge the gap between junior and senior researchers, and cultivate an open, inclusive research community.

Time            Talk                     Tentative Topics
8:50 – 9:00     Opening Remarks          -
9:00 – 9:45     Keynote Talk 1           Topics in Data Curation & Synthesis for Embodied AI, Q&A.
9:45 – 10:30    Keynote Talk 2           Topics in Data Curation & Synthesis for Embodied AI, Q&A.
10:30 – 10:45   Coffee Break & Posters   Informal networking.
10:45 – 11:30   Keynote Talk 3           Topics in Physics-Informed Vision Models Design, Q&A.
11:30 – 12:00   Panel Discussion         -
12:00 – 14:00   Lunch Break              -
14:00 – 14:45   Keynote Talk 4           Topics in Training Strategies and Optimization, Q&A.
14:45 – 15:30   Keynote Talk 5           Topics in Training Strategies and Optimization, Q&A.
15:30 – 15:45   Coffee Break & Posters   Informal networking.
15:45 – 16:30   Keynote Talk 6           Topics in Model Evaluation & Verification, Q&A.
16:30 – 17:00   Panel Discussion         -
17:00 – 17:10   Closing Remarks          -

Organizers

Student Organizers