Gaze density estimation has attracted considerable research effort in recent years. The factors considered by existing methods include low-level feature saliency, spatial position, and objects. Emotion, although an important factor driving attention, has not been taken into account. In this paper, we are the first to estimate gaze density by incorporating emotion. To estimate the emotion intensity at each position in an image, we consider three aspects: generic emotional content, facial expression intensity, and emotional objects. Generic emotional content is estimated using multiple instance learning, which is employed to train an emotion detector from weakly labeled images. Facial expression intensity is estimated using a ranking method. Emotional objects are detected, taking blood/injury and worm/snake as examples. Finally, emotion intensity, low-level feature saliency, and spatial position are fused through a linear support vector machine to estimate gaze density. The performance is tested on a public eye-tracking dataset. Experimental results indicate that incorporating emotion improves the performance of gaze density estimation.
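The final fusion step can be illustrated with a minimal sketch. It assumes each image position is described by three values in [0, 1] (emotion intensity, low-level saliency, and a spatial-position prior) and labeled +1 if fixated and -1 otherwise. The feature values, the function names `train_linear_svm` and `gaze_score`, and the hyperparameters are hypothetical and chosen only for illustration. The plain subgradient solver stands in for whatever linear SVM implementation was actually used.

```python
def train_linear_svm(features, labels, lam=0.01, lr=0.05, epochs=2000):
    """Fit (w, b) by full-batch subgradient descent on the
    L2-regularised hinge loss, i.e. a soft-margin linear SVM."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1.0:  # this sample violates the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def gaze_score(w, b, x):
    """Fused score for one position: the signed SVM decision value."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical toy samples: (emotion intensity, low-level saliency, position prior).
X = [
    (0.9, 0.8, 0.7), (0.8, 0.6, 0.9), (0.7, 0.9, 0.8),  # fixated positions
    (0.1, 0.2, 0.3), (0.2, 0.1, 0.1), (0.3, 0.3, 0.2),  # non-fixated positions
]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
density = [gaze_score(w, b, x) for x in X]
print("weights:", w, "bias:", b)
print("fused scores:", density)
```

Under these assumptions, the learned weights indicate how much each cue contributes to the fused score, and evaluating `gaze_score` at every position of an image would yield an unnormalised gaze density map.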