Abstract
Addressee detection (AD) enables a robot to interact smoothly with humans by determining whether it is being addressed. However, AD has not been widely explored. The few studies in this area focused on human-to-human or human-to-robot conversations confined to a meeting room and relied on gaze and utterance cues. These works used statistical and rule-based approaches, which tend to depend on specific settings. Further, they did not fully leverage the available audio and visual information or the short-term and long-term segments, and they did not explore combining important conversational cues, namely facial and audio features. In addition, no audiovisual, spatiotemporally annotated dataset captured in mixed human-to-human and human-to-robot settings is available to support exploring the area with new approaches.
| Original language | English |
|---|---|
| Article number | 22959594 |
| Pages (from-to) | 25-38 |
| Journal | IEEE Systems, Man, and Cybernetics Magazine |
| Volume | 9 |
| Issue number | 2 |
| Publication status | Published - 18 Apr 2023 |
Keywords
- Deep learning
- Visualization
- Annotations
- Input variables
- Human-robot interaction
- Oral communication
- Predictive models