Face anti-spoofing is crucial because face recognition systems are widely challenged by print and replay attacks. Since the facial temporal patterns of these attacks differ naturally from those of real faces, this paper proposes two temporal modelling approaches to face anti-spoofing. First, we analyze the temporal patterns of mid-level facial attributes in the spectral domain, aiming to find the frequency patterns unique to real faces and to each attack type. Second, we model dynamics directly from the given data by employing the dynamic image algorithm to generate low-level spatiotemporal representations of videos. In particular, we extract deep features from both the global face and local face parts, i.e. eyes, nose and mouth, and fuse them for face spoofing detection. A Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) units (CNN-LSTM) is then introduced to learn high-level spatiotemporal features from the dynamic facial images. The proposed approaches were evaluated on two benchmark databases, CASIA-FASD and REPLAY-ATTACK. The results suggest the effectiveness of the second approach, which achieves an Equal Error Rate (EER) as low as 1.85% on CASIA-FASD and an Average Classification Error Rate (ACER) of 0.00% on REPLAY-ATTACK.
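
To make the idea of analyzing a facial attribute's temporal pattern in the spectral domain concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical per-frame attribute signal (for example, an eye-openness score per video frame) and a known frame rate; the function names `attribute_spectrum` and `dominant_frequency` and the synthetic signal are illustrative assumptions. It only computes a magnitude spectrum with NumPy's real FFT and reports the strongest frequency.

```python
import numpy as np


def attribute_spectrum(signal, fps):
    """Return (frequencies in Hz, magnitudes) for a per-frame attribute signal.

    `signal` is a hypothetical 1-D sequence holding one attribute value per frame.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC offset so it does not dominate the spectrum
    magnitudes = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    return freqs, magnitudes


def dominant_frequency(signal, fps):
    """Frequency (Hz) of the largest spectral peak after DC removal."""
    freqs, magnitudes = attribute_spectrum(signal, fps)
    return float(freqs[np.argmax(magnitudes)])


# Synthetic stand-in for an attribute trace: a 2 Hz oscillation sampled at 25 fps.
fps = 25.0
t = np.arange(100) / fps
synthetic = 0.5 + 0.1 * np.sin(2 * np.pi * 2.0 * t)
peak_hz = dominant_frequency(synthetic, fps)  # the spectrum peaks at the 2 Hz bin
```

In practice, spectra like these (or features derived from them) would be compared across real and attack videos; how the paper selects attributes and builds its classifier is not specified in this abstract.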