Three-Dimensional (3D) FPGA as a promising design trend, achieves significant performance improvement over conventional 2D-based FPGA. The maturity of the uni-directional routing architecture design, which achieves 25% area saving in area-delay-product (ADP) over bi-directional routing architectures, has driven major vendors such as Xilinx and Altera to switch to such architecture in their 2D-based products. However, few studies were contributed to exploring performanceoptimal uni-directional 3D routing architectures. In this paper, we propose and evaluate a novel uni-directional 3D routing architecture named UNI-3D. Additionally, in the EDA counterpart, we also propose an improved simulated annealing (SA)-based placement algorithm that caters the unidirectional architecture, to alleviate signal propagation imbalance in the vertical channels resulted from using conventional bi-directional based SA approach. Our simulation results show that our proposed architecture is able to achieve up to 28.44% of delay reduction and 26.21% planar channel width reduction compared with the baseline 2D uni-directional architecture. At the same time, the proposed SA algorithm is able to improve the average vertical channel width up to 16% compared to state-of-the-art works.