TY - GEN
T1 - Analyzing Occupant Behavior in Smart Buildings Using Vision-Language Models and Temporal Segmentation Networks
AU - Wang, Chaoju
AU - Zhou, Yuyan
AU - Zhou, Tongyu
AU - Xie, Jing
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025/7
Y1 - 2025/7
N2 - Understanding occupant behavior is crucial for optimizing energy usage, comfort, and sustainability in buildings. Occupant behaviors within indoor environments are inherently complex, stochastic, and diverse, particularly in scenarios involving interactions among multiple individuals. As a result, accurately detecting, tracking, and interpreting such behaviors is challenging. In this paper, we propose a novel pipeline that integrates a Visual-Language Model (VLM) with a Temporal Segmentation Network (TSN). The proposed method employs the VLM to generate semantically rich textual descriptors of occupant behaviors through image-text alignment; these descriptors are subsequently processed by the TSN to capture and segment temporal dynamics. Experimental results in a simulated campus classroom scenario validate the proposed method, which recognized occupant behavior with high accuracy and demonstrated feasibility for practical deployment in smart building applications.
KW - Behavior recognition
KW - Built environment
KW - Occupant Behavior
KW - Temporal Segmentation Network
KW - Visual-Language Model
UR - https://www.scopus.com/pages/publications/105013052968
U2 - 10.1109/CVIDL65390.2025.11085690
DO - 10.1109/CVIDL65390.2025.11085690
M3 - Conference contribution
AN - SCOPUS:105013052968
T3 - 2025 6th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2025
SP - 1060
EP - 1063
BT - 2025 6th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2025
Y2 - 23 May 2025 through 25 May 2025
ER -