GROUNDING PHYSICAL OBJECT AND EVENT CONCEPTS THROUGH DYNAMIC VISUAL REASONING

Paper Under Review

This page shows DCL's qualitative visual examples and failure cases for concept learning. We will make our code, models and data available as soon as possible.

Contents

Visualization of Concept Learning on CLEVRER
Visualization of Concept Learning on BLOCK TOWERS
Qualitative Results on CLEVRER-QA
Qualitative Results on CLEVRER-Grounding
Qualitative Results on CLEVRER-Retrieval
Qualitative Results on BLOCK TOWERS
Failure Cases on CLEVRER-QA

Visualization of Concept Learning on CLEVRER

We visualize the dynamic concepts learned by DCL on CLEVRER. We parse the scenes by densely quantizing the concepts in each frame. Extracted object trajectories and predicted collision, in, and out events are marked in green, red, blue, and yellow, respectively.

Visualization of Concept Learning on BLOCK TOWERS

We visualize the dynamic concepts learned by DCL on BLOCK TOWERS. We parse the scenes by densely quantizing the color and falling concepts in each video. The bounding boxes of extracted object trajectories are drawn in the predicted color. Falling objects are labeled "falling" above their bounding boxes.

Qualitative Results on CLEVRER-QA

Question 1: What is the color of the last object to collide with the metal sphere?
Predicted Answer: Brown.   (Ground-truth Answer: Brown.)

(a) Descriptive question sample. Extracted object trajectories and predicted
collision, in, and out events are marked in green, red, blue, and yellow, respectively.

Question 2: Which of the following is responsible for the collision between the yellow cube and the cylinder?
Choice 1: The collision between the green metal object and the metal sphere. (Predicted: Wrong)   (GT.: Wrong)
Choice 2: The green metal cube's colliding with the yellow cube. (Predicted: Correct.)   (GT.: Correct.)
Choice 3: The green metal object's entering the scene. (Predicted: Correct.)   (GT.: Correct.)

(b) Explanatory question sample. Extracted object trajectories and predicted
collision, in, and out events are marked in green, red, blue, and yellow, respectively.

Question 3: What will happen next?
Choice 1: The cylinder collides with the blue sphere. (Predicted: Correct.)   (GT.: Correct.)
Choice 2: The cylinder and the red object collide. (Predicted: Wrong)   (GT.: Wrong)

(c) Predictive question sample. Objects and events predicted in future scenes are marked with black boxes.
Extracted object trajectories and predicted collision, in, and out events are marked in green, red, blue, and yellow, respectively.

Question 4: Without the red sphere, which of the following will happen?
Choice 1: The cylinder collides with the blue object. (Predicted: Wrong)   (GT.: Wrong)
Choice 2: The yellow object and the cylinder collide. (Predicted: Correct.)   (GT.: Correct.)

Original Video.


Counterfactual Video.

(d) Counterfactual question sample.

Qualitative Results on CLEVRER-Grounding

Query: The collision that happens after the blue sphere exits the scene.
Query: The cube enters the scene before the rubber sphere enters the scene.
Query: The object that collides with the brown cube.

We visualize typical examples of CLEVRER-Grounding. The query expressions are shown above the videos, and the spatio-temporal localization results are marked with green boxes. DCL can explicitly ground object and event concepts, analyze temporal structures, and understand the complex logic needed to localize the target event or object.
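
As a rough illustration of the temporal logic behind the first query, the sketch below filters a list of detected events for collisions that occur after a given object exits the scene. The `Event` record, its field names, the object names, and the frame numbers are all hypothetical and chosen only for illustration; this is a hand-written symbolic toy, not DCL's actual program executor.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical symbolic event record (not DCL's internal format)."""
    kind: str                 # "in", "out", or "collision"
    frame: int                # frame index at which the event occurs
    objects: Tuple[str, ...]  # names of the objects involved


def collisions_after_exit(events: List[Event], obj: str) -> List[Event]:
    """Return collisions that happen strictly after `obj` exits the scene."""
    exit_frames = [e.frame for e in events
                   if e.kind == "out" and obj in e.objects]
    if not exit_frames:
        # The object never exits, so the query has no valid target.
        return []
    exit_frame = min(exit_frames)
    return [e for e in events
            if e.kind == "collision" and e.frame > exit_frame]


# Made-up events for a single video (illustrative values only).
events = [
    Event("collision", 20, ("blue sphere", "brown cube")),
    Event("out", 45, ("blue sphere",)),
    Event("collision", 70, ("brown cube", "gray cylinder")),
]

result = collisions_after_exit(events, "blue sphere")
print(result)
```

Only the collision at frame 70 is returned here, since the earlier collision precedes the exit. The same filter-then-compare pattern would apply to the "before" relation in the second query with the inequality reversed.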

Qualitative Results on CLEVRER-Retrieval

Query expression: A video that contains a collision that happens before the green rubber cube enters the scene.
 
Top 1
 
Top 2
 
Top 3

Top 4
We visualize a typical example of CLEVRER-Retrieval. The four top-ranked gallery videos are shown. DCL can explicitly ground object and event concepts, analyze their relations, and perform step-by-step reasoning to retrieve the positive gallery videos.

Qualitative Results on BLOCK TOWERS


Q.: Are there any falling green objects?   
A.: No.   (GT.: No)
Q.: How many falling blocks are there?   
A.: 2.   (GT.: 2)
Q.: What is the color of the block at the top?   
A.: Blue.   (GT.: Blue)
Qualitative results on BLOCK TOWERS. Questions, predicted answers, and ground-truth answers are marked with Q., A., and GT., respectively. The bounding boxes of extracted object trajectories are drawn in the target objects' predicted colors. Falling objects are labeled "falling" above their bounding boxes.

Failure Cases on CLEVRER-QA

Question: What will not happen if the green object is removed?
Choice 1: The sphere collides with the cylinder. (Predicted: Wrong)   (GT.: Wrong.)
Choice 2: The cylinder collides with the gray cube. (Predicted: Correct.)   (GT.: Correct.)
Choice 3: The sphere collides with the gray cube. (Predicted: Correct.)   (GT.: Wrong.)


Original Video.


Counterfactual Video.

A typical failure case of DCL. The sphere collides with the gray cube near the end of the video, which requires
the dynamic predictor to have long-term prediction capacity. The dynamic predictor fails to predict the
long-term trajectory of the sphere, leading to DCL's incorrect prediction for Choice 3.
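
To make the effect of this single error concrete, the sketch below scores the failure case in two ways: the fraction of choices judged correctly, and whether the whole question is answered correctly. The function names are hypothetical, and the two scoring modes are assumed to follow the per-option and per-question convention commonly used for CLEVRER multiple-choice questions; this is not the authors' evaluation code.

```python
from typing import List


def per_option_accuracy(pred: List[bool], gt: List[bool]) -> float:
    """Fraction of choices whose predicted label matches the ground truth."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def per_question_correct(pred: List[bool], gt: List[bool]) -> bool:
    """A question counts as correct only if every choice is judged correctly."""
    return len(pred) == len(gt) and all(p == g for p, g in zip(pred, gt))


# Labels from the failure case above: True = "Correct", False = "Wrong".
pred = [False, True, True]   # DCL's predictions for Choices 1-3
gt = [False, True, False]    # ground-truth labels for Choices 1-3

option_acc = per_option_accuracy(pred, gt)
question_ok = per_question_correct(pred, gt)
print(f"{option_acc:.2f}", question_ok)  # prints "0.67 False"
```

Under this assumed scoring, two of the three choices are judged correctly, but the single error on Choice 3 is enough to make the whole question count as wrong.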