At a glance

VLM-based robot planners struggle with long, complex tasks because natural-language plans can be ambiguous, especially when specifying both actions and locations.

GroundedPlanBench evaluates whether models can plan actions and determine where they should occur across diverse, real-world robot scenarios.

Video-to-Spatially Grounded Planning (V2GP) is a framework that converts robot demonstration videos into










