Deep learning methods have become popular in multimodal medical image fusion for diagnostic problems. Unlike conventional approaches, in which spatial alignment is a crucial step, deep learning methods perform fusion at intermediate layers of deep neural networks, so the alignment of multiple image modalities is achieved implicitly at the semantic level. The role of explicit spatial alignment in deep-learning-based fusion has therefore been questioned. This study aimed to clarify this question through a series of experiments. Specifically, on two clinical diagnostic problems, i.e., diagnosis of Alzheimer's disease (AD) and age-related macular degeneration (AMD), the performance of concatenation-based deep fusion networks was compared between spatially aligned and misaligned inputs. Moreover, modified deep fusion networks with a spatial transformer network (STN) module providing adaptive spatial alignment were proposed and tested. Diagnostic results improved when the inputs of the deep fusion networks were spatially aligned, and adaptive spatial alignment brought additional improvement. These findings suggest that spatial alignment still matters in the fusion process using deep learning, and an additional adaptive spatial alignment module is recommended for better fusion results.
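To make the adaptive-alignment idea concrete, the following is a minimal NumPy sketch, not the authors' implementation: an STN-style module predicts an affine transform for one modality, resamples it onto the other modality's grid with bilinear interpolation, and the two are then concatenated channel-wise as the input of a fusion network. The localization network that would normally predict `theta` is omitted; a fixed transform stands in for illustration, and all function names are assumptions.

```python
import numpy as np

def affine_grid(theta, H, W):
    # theta: (2, 3) affine matrix acting on normalized [-1, 1] coordinates,
    # as in a spatial transformer network (STN).
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # (3, H*W)
    out = theta @ coords                                         # (2, H*W)
    return out[0].reshape(H, W), out[1].reshape(H, W)            # x- and y-grids

def bilinear_sample(img, gx, gy):
    # Resample img at normalized grid coordinates via bilinear interpolation.
    H, W = img.shape
    x = (gx + 1) * (W - 1) / 2
    y = (gy + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    return ((1 - wy) * (1 - wx) * img[y0, x0]
            + (1 - wy) * wx * img[y0, x0 + 1]
            + wy * (1 - wx) * img[y0 + 1, x0]
            + wy * wx * img[y0 + 1, x0 + 1])

def stn_align_and_concat(mod_a, mod_b, theta):
    # Warp modality B onto modality A's grid, then concatenate channel-wise,
    # mimicking concatenation-based deep fusion with adaptive alignment.
    H, W = mod_a.shape
    gx, gy = affine_grid(theta, H, W)
    warped_b = bilinear_sample(mod_b, gx, gy)
    return np.stack([mod_a, warped_b])  # (2, H, W) fused input tensor
```

With the identity transform `theta = [[1, 0, 0], [0, 1, 0]]` the second modality passes through unchanged; a trained localization network would instead output a `theta` that compensates for the spatial misalignment between the modalities before fusion.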