The large number of interconnected networks that underpin today's IT ecosystem are increasingly vulnerable to cyber threats because of their connectivity, user diversity, the number of connected devices, and the services and applications available worldwide. Zero trust security has been recommended as a response to these cyber threats. However, security monitoring of this kind is often performed by outside experts, and when cloud-based third parties access network traces, data security is put at risk; the current practice in security monitoring therefore needs to shift to a "Never Trust, Always Verify" approach. A Network Intrusion Detection System (NIDS) can be used to detect anomalous behavior. Classifiers based on Convolutional Neural Networks (CNN) and Bi-directional Long Short-Term Memory (BiLSTM), combined with Auto-Encoder (AE) feature extractors, have shown promising results in NIDS. An AE feature extractor makes it possible to compress the most important information and to train the model in an unsupervised manner. CNNs are capable of capturing local spatial relationships, while BiLSTMs are good at exploiting temporal interactions. In addition, attention modules are good at capturing content-based global interactions and can be applied on top of CNNs to attend to the most important contextual information. In this work, we combined the advantages of the AE, CNN, and BiLSTM structures, using a multi-head self-attention mechanism to focus and integrate CNN features before feeding them into a BiLSTM classifier. We proposed to use the bottleneck features of a pre-trained AE as input to an attention-based CNN-BiLSTM classifier. Our experiments on 6- and 10-category NID on the UNSW-NB15 dataset showed that the proposed method outperforms state-of-the-art methods, achieving accuracies of 89.79% and 88.13%, respectively. We also proposed a balanced data sampler for training the 10-category NIDS, which improved accuracy to 91.72%.
Through the proposed method, we demonstrated the importance of the attention mechanism.
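The multi-head self-attention step described above — attending over a sequence of CNN feature vectors so that the most informative contextual positions are emphasized before the BiLSTM — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the sequence length, feature dimension, number of heads, and weight initialization are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product multi-head self-attention over a feature sequence.

    x: (seq_len, d_model) -- e.g. CNN feature vectors along the sequence axis.
    w_q, w_k, w_v, w_o: (d_model, d_model) projection matrices.
    Returns the attended features and the per-head attention weights.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def project(w):
        # Project, then split into heads: (num_heads, seq_len, d_head).
        return (x @ w).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = project(w_q), project(w_k), project(w_v)
    # Attention weights: (num_heads, seq_len, seq_len), rows sum to 1.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)
    # Weighted sum of values, then merge heads back: (seq_len, d_model).
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o, attn

# Illustrative shapes: 16 positions of 32-dim CNN features, 4 heads.
rng = np.random.default_rng(0)
seq_len, d_model, heads = 16, 32, 4
x = rng.standard_normal((seq_len, d_model))
ws = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
y, attn = multi_head_self_attention(x, *ws, num_heads=heads)
print(y.shape)     # attended features, same shape as the input sequence
print(attn.shape)  # one (seq_len, seq_len) weight matrix per head
```

The attended output `y` keeps the input's shape, so it can be fed position-by-position into a downstream BiLSTM exactly as an unattended feature sequence would be.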
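The balanced data sampler mentioned above counteracts the heavy class imbalance among the 10 UNSW-NB15 categories. One common realization, sketched here as an assumption rather than the paper's exact procedure, is to oversample each minority class up to the size of the majority class each epoch:

```python
import numpy as np

def balanced_sample_indices(labels, rng=None):
    """Return per-epoch training indices with every class equally represented.

    Minority classes are drawn with replacement up to the majority-class
    count, so rare attack categories are seen as often as frequent ones.
    """
    if rng is None:
        rng = np.random.default_rng()
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    per_class = counts.max()  # target count for every class
    chunks = []
    for c in classes:
        members = np.flatnonzero(labels == c)
        chunks.append(rng.choice(members, size=per_class,
                                 replace=len(members) < per_class))
    idx = np.concatenate(chunks)
    rng.shuffle(idx)
    return idx

# Toy imbalanced label vector: class 0 dominates, class 2 is rare.
labels = [0] * 90 + [1] * 8 + [2] * 2
idx = balanced_sample_indices(labels, rng=np.random.default_rng(0))
resampled = np.asarray(labels)[idx]
print({c: int((resampled == c).sum()) for c in (0, 1, 2)})
```

After resampling, each class contributes the same number of examples per epoch (here 90 each), which is what allows the classifier to learn the rare categories instead of defaulting to the majority class.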