A practical deep reinforcement learning framework for multivariate occupant-centric control in buildings

Yue Lei, Sicheng Zhan, Eikichi Ono, Yuzhen Peng, Zhiang Zhang, Takamasa Hasama, Adrian Chong

Research output: Journal Publication › Article › peer-review

30 Citations (Scopus)


Reinforcement learning (RL) has been shown to have the potential for optimal control of heating, ventilation, and air conditioning (HVAC) systems. Although research on RL-based building control has received extensive attention in recent years, there is limited real-world implementation to evaluate its performance while keeping occupants in the loop. Additionally, many HVAC systems consist of multiple subsystems, but conventional RL algorithms face significant challenges when dealing with high-dimensional action spaces. This study proposes a practical deep reinforcement learning (DRL) based multivariate occupant-centric control framework that considers personalized thermal comfort and occupant presence. Specifically, Branching Dueling Q-network (BDQ) is leveraged as the learning agent to efficiently solve the multi-dimensional control task, and a tabular-based personal comfort modeling method is applied that is naturally integrated into human-in-the-loop operations. The BDQ agent is pre-trained in a virtual environment, followed by online deployment in a real office space for 5-dimensional action control. Based on the actual deployment and real-time comfort votes, our results showed a 14% reduction in cooling energy and an 11% improvement in total thermal acceptability.
Original language: English
Article number: 119742
Number of pages: 18
Journal: Applied Energy
Publication status: Published - 15 Oct 2022


Keywords

  • Occupant-centric control
  • Deep learning
  • Reinforcement learning
  • Thermal comfort
  • Energy efficiency

ASJC Scopus subject areas

  • Building and Construction
  • Mechanical Engineering
  • General Energy
  • Management, Monitoring, Policy and Law
