Crowd counting is receiving increasing attention because of its wide range of applications and its open challenges, such as large scale variation, mutual occlusion among people in a crowd, and background noise. In this paper, we propose a scale-aware rolling fusion network (SRF-Net) for crowd counting that focuses on handling scale variation in highly congested, noisy scenes. SRF-Net is a two-stage architecture consisting of a band-pass stage and a rolling guidance stage. Compared with existing methods, SRF-Net better retains appropriate multi-level features and captures multi-scale features, thereby improving the quality of estimated density maps in crowded scenes with large scale variation. We evaluate our method on three popular crowd counting datasets (ShanghaiTech, UCF-CC-50 and UCF-QNRF), and extensive experiments show that it outperforms state-of-the-art approaches.