Analyzing Occupant Behavior in Smart Buildings Using Vision-Language Models and Temporal Segmentation Networks

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Understanding occupant behavior is crucial for optimizing energy usage, comfort, and sustainability in buildings. Occupant behaviors in indoor environments are inherently complex, stochastic, and diverse, particularly when multiple individuals interact. As a result, accurately detecting, tracking, and interpreting such behaviors remains challenging. In this paper, we propose a novel pipeline that integrates a Visual-Language Model (VLM) with a Temporal Segmentation Network (TSN). The proposed method employs the VLM to generate semantically rich textual descriptors of occupant behaviors through image-text alignment; the TSN then processes these descriptors to capture and segment their temporal dynamics. Experiments in a simulated campus classroom scenario validated the proposed method, which recognized occupant behavior with high accuracy and demonstrated its feasibility for practical deployment in smart building applications.
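
The two-stage idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written two-dimensional embeddings stand in for the outputs of a real VLM's image and text encoders, the behavior labels "writing" and "talking" are hypothetical, and a per-segment majority vote stands in for a learned temporal network. It only shows how per-frame image-text alignment might feed a segment-level temporal step.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def describe_frame(frame_embedding, text_embeddings):
    """Image-text alignment: return the label whose text embedding
    is most similar to the frame embedding."""
    return max(text_embeddings,
               key=lambda label: cosine(frame_embedding, text_embeddings[label]))

def segment_consensus(frame_labels, num_segments):
    """Split the per-frame labels into equal temporal segments and
    take a majority vote inside each segment (a stand-in for a
    learned temporal model)."""
    n = len(frame_labels)
    result = []
    for k in range(num_segments):
        chunk = frame_labels[k * n // num_segments:(k + 1) * n // num_segments]
        if chunk:
            result.append(Counter(chunk).most_common(1)[0][0])
    return result

# Hypothetical placeholder embeddings (assumption: a real system would
# obtain these from a VLM's text and image encoders).
TEXT_EMBEDDINGS = {"writing": [1.0, 0.0], "talking": [0.0, 1.0]}
FRAME_EMBEDDINGS = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.4],   # first half of the clip
    [0.2, 0.9], [0.1, 0.8], [0.3, 0.7],   # second half of the clip
]

frame_labels = [describe_frame(f, TEXT_EMBEDDINGS) for f in FRAME_EMBEDDINGS]
segments = segment_consensus(frame_labels, num_segments=2)
print(segments)
```

In this sketch the temporal step sees only text labels, mirroring the abstract's description of textual descriptors being passed from the VLM to the TSN; a learned model would replace the majority vote.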

Original language: English
Title of host publication: 2025 6th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1060-1063
Number of pages: 4
ISBN (Electronic): 9798331523244
DOIs
Publication status: Published - Jul 2025
Event: 6th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2025 - Ningbo, China
Duration: 23 May 2025 to 25 May 2025

Publication series

Name: 2025 6th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2025

Conference

Conference: 6th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2025
Country/Territory: China
City: Ningbo
Period: 23/05/25 to 25/05/25

Keywords

  • Behavior recognition
  • Built environment
  • Occupant Behavior
  • Temporal Segmentation Network
  • Visual-Language Model

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Safety, Risk, Reliability and Quality
  • Instrumentation
