Snowflakes captured in photographs can severely degrade visual quality and cause difficulties for vision analysis systems. Most noise removal frameworks are designed for de-raining or de-hazing, and treat rain or haze as a translucent mask over a clean image. However, snowflakes differ from rain and haze in size, shape, transparency and floating trajectory, which reduces the performance of de-raining and de-hazing models on snowy images. In this work, we propose a multi-scale generative adversarial network framework for single-image snow removal. It combines a multi-scale structure, which identifies snowflakes at various scales, with a capsule-based structure, which fuses the features extracted by the multi-scale encoding branches so that features at different scales can be summarised and learnt by a joint framework. The overall framework is supervised by a weighted joint loss and trained with an iterative procedure to keep training stable for the multi-branch structure. Experimental results demonstrate that our model outperforms the state-of-the-art methods it is compared against.