In this paper we propose a novel framework that fuses multiple features to improve action recognition in videos. Fusing multiple features is important because a representation based on a single feature is often not enough to capture imaging variations (view-point, illumination, etc.) and attributes of individuals (size, age, gender, etc.). Hence, we use two kinds of features: i) a quantized vocabulary of local spatio-temporal (ST) volumes (cuboids and 2-D SIFT), and ii) higher-order statistical models of interest points, which aim to capture global information about the actor. We construct a video representation in terms of local space-time features and global features, and integrate these representations with a hyper-sphere multi-class SVM. Experiments on publicly available datasets show that our proposed approach is effective. An additional experiment shows that using both local and global features provides a richer representation of human action than using a single feature type.
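
As a rough illustration of the two-part representation described above, the following minimal sketch quantizes local descriptors against a small codebook to form a normalized histogram, then appends a global feature vector. The function names (`quantize`, `fuse`), the toy 2-D descriptors, and the use of plain concatenation for fusion are all assumptions made for illustration; the sketch does not implement the hyper-sphere multi-class SVM or the actual cuboid, SIFT, or interest-point statistics.

```python
from math import dist  # Euclidean distance, Python 3.8+


def quantize(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and
    return the normalized histogram of assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors: return an all-zero histogram.
        return [0.0] * len(codebook)
    return [c / total for c in counts]


def fuse(local_hist, global_feats):
    """Hypothetical fusion step: simple concatenation (an assumption,
    not necessarily how the features are integrated in the paper)."""
    return list(local_hist) + list(global_feats)


# Toy data: a 3-word codebook and four 2-D local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.9, 4.2)]
global_feats = [0.3, -1.2]  # placeholder global statistics

video_vector = fuse(quantize(descriptors, codebook), global_feats)
print(video_vector)  # [0.25, 0.5, 0.25, 0.3, -1.2]
```

The resulting vector could then be passed to any multi-class classifier; a real pipeline would learn the codebook from training descriptors rather than fix it by hand.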