High-quality 3D reconstruction of large-scale indoor scenes is key to combining Simultaneous Localization And Mapping (SLAM) with other applications, such as building inspection and construction monitoring. However, the requirement of global consistency poses challenges for both localization and mapping. In particular, standard SLAM techniques can suffer significant localization and mapping errors in areas dominated by featureless walls and ceilings. This paper proposes a novel framework for reconstructing high-quality, globally consistent 3D models of indoor environments using only an RGB-D sensor. We first introduce sparse and dense feature constraints into the local bundle adjustment. Then, planar constraints are incorporated into the global bundle adjustment. We fuse the point clouds into a truncated signed distance function (TSDF) volume, from which a high-quality mesh can be extracted. Our framework provides a comprehensive 3D scanning solution for indoor scenes, enabling high-quality results and potential applications in building information systems. A video of 3D models reconstructed by the proposed method is available at https://youtu.be/DWMP4YfeNeY.
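The abstract does not specify how the point clouds are fused into the TSDF volume. The sketch below illustrates the widely used weighted running-average update (in the style of Curless and Levoy, as popularized by KinectFusion), reduced to a single hypothetical column of voxels along the camera's optical axis so it stays self-contained. The class name `TSDFColumn`, the voxel size, the truncation distance and the weight cap are illustrative assumptions, not parameters taken from this paper. The zero crossing of the fused TSDF marks the surface that a mesh extractor such as marching cubes would recover.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TSDFColumn:
    """Hypothetical 1-D column of voxels along the optical axis (illustration only)."""
    voxel_size: float
    num_voxels: int
    truncation: float
    tsdf: List[float] = field(default_factory=list)
    weight: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Unobserved voxels start as "free space" (+1) with zero confidence.
        self.tsdf = [1.0] * self.num_voxels
        self.weight = [0.0] * self.num_voxels

    def voxel_depth(self, i: int) -> float:
        # Depth of the voxel centre along the ray.
        return (i + 0.5) * self.voxel_size

    def integrate(self, measured_depth: float, obs_weight: float = 1.0,
                  max_weight: float = 100.0) -> None:
        """Fuse one depth measurement with a weighted running average."""
        for i in range(self.num_voxels):
            sdf = measured_depth - self.voxel_depth(i)
            if sdf < -self.truncation:
                continue  # far behind the observed surface: occluded, skip
            d = min(1.0, sdf / self.truncation)  # truncate and normalize to [-1, 1]
            w_old = self.weight[i]
            self.tsdf[i] = (w_old * self.tsdf[i] + obs_weight * d) / (w_old + obs_weight)
            self.weight[i] = min(w_old + obs_weight, max_weight)

    def surface_depth(self) -> Optional[float]:
        """Locate the first +/- zero crossing by linear interpolation."""
        for i in range(self.num_voxels - 1):
            a, b = self.tsdf[i], self.tsdf[i + 1]
            if self.weight[i] > 0 and self.weight[i + 1] > 0 and a > 0 >= b:
                t = a / (a - b)
                return self.voxel_depth(i) + t * self.voxel_size
        return None


if __name__ == "__main__":
    col = TSDFColumn(voxel_size=0.1, num_voxels=20, truncation=0.3)
    col.integrate(1.0)
    col.integrate(1.2)  # a second, noisy observation of the same surface
    print(col.surface_depth())  # close to the averaged depth of 1.1
```

Averaging many noisy depth observations in this way is what lets a TSDF volume produce smoother surfaces than any single RGB-D frame; a full implementation would project every voxel of a 3D grid into each depth image using the optimized camera poses.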