Situat3DChange: Situated 3D Change Understanding Dataset for Multimodal Large Language Model (Supplementary Materials)
Neural Information Processing Systems
The data generation process includes situation sampling, long-form text generation, query generation for the long-form text, and QA generation. It is based on human observations of changes, object attributes, and allocentric object relationships in 3DSSG [9], as well as egocentric relationships between the human and the objects.

A.1 Situation Sampling

We follow the situation categories of MSQA [4], namely sitting, interacting, and standing, but with more detailed geometric analysis:

Sitting. The 28 seat categories in 3RScan [8] are grouped into four types: 3 large seats with backrests (e.g., sofa), 16 small seats with backrests (e.g., armchair), 1 large seat without a backrest (bed), and 8 small seats without backrests (e.g., beanbag). Seatable and backrest areas are classified by surface normals, or by nearby walls within 0.5 m if no backrest exists. For small seats, the seating point is the bounding box center, oriented away from the backrest. For large seats, we select a point with a backrest behind and open space (0.5-1 m) in front.
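The small-seat rule above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `classify_face` and `small_seat_pose`, the z-up coordinate frame, the `up_threshold` cutoff of 0.8, and the use of a single backrest centroid are all assumptions made for illustration only.

```python
import math


def classify_face(normal, up_threshold=0.8):
    """Label a surface from its normal, assuming a z-up frame.

    up_threshold is a hypothetical cutoff, not a value from the paper.
    """
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        return "other"
    up = nz / length  # cosine of the angle between the normal and +z
    if up >= up_threshold:
        return "seatable"  # normal points mostly upward
    if abs(up) <= 1.0 - up_threshold:
        return "backrest"  # normal is mostly horizontal
    return "other"


def small_seat_pose(bbox_min, bbox_max, backrest_center):
    """Return (seating point, facing direction) for a small seat.

    The seating point is the bounding box center. The facing direction is
    the horizontal unit vector pointing from the backrest toward the center,
    i.e. away from the backrest. backrest_center is assumed to be the
    centroid of the backrest region (or of the nearby wall standing in for it).
    """
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(bbox_min, bbox_max))
    dx = center[0] - backrest_center[0]
    dy = center[1] - backrest_center[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        raise ValueError("backrest lies directly above or below the seat center")
    return center, (dx / norm, dy / norm)


# Usage: a unit-cube armchair whose backrest sits along the +y side.
center, facing = small_seat_pose((0, 0, 0), (1, 1, 1), (0.5, 1.0, 0.8))
print(center, facing)
```

For large seats, the same facing computation could be reused at a candidate point, with an additional check that 0.5 to 1 m of free space lies in front of it; that check depends on the scene representation and is left out here.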
Jun-19-2026, 04:56:40 GMT