Plant-CNN-ViT: Plant Classification with Ensemble of Convolutional Neural Networks and Vision Transformer

Chin Poo Lee, Kian Ming Lim, Yu Xuan Song, Ali Alqahtani

Research output: Journal Publication › Article › peer-review

8 Citations (Scopus)


Plant leaf classification involves identifying and categorizing plant species based on leaf characteristics such as patterns, shapes, textures, and veins. In recent years, researchers have sought to improve the accuracy of plant classification with machine learning techniques, training models on large datasets of plant images and using them to identify plant species. However, these models rely on large amounts of training data, which can be difficult to obtain for many plant species. To overcome this challenge, this paper proposes Plant-CNN-ViT, an ensemble model that combines the strengths of four pre-trained models: Vision Transformer, ResNet-50, DenseNet-201, and Xception. Vision Transformer uses self-attention to capture dependencies and focus on important leaf features. ResNet-50 introduces residual connections, which aid efficient training and hierarchical feature extraction. DenseNet-201 employs dense connections, which facilitate information flow and capture intricate leaf patterns. Xception uses separable convolutions, which reduce computational cost while capturing fine-grained details in leaf images. The proposed Plant-CNN-ViT was evaluated on four plant leaf datasets and achieved accuracies of 100.00%, 100.00%, 100.00%, and 99.83% on the Flavia, Folio Leaf, Swedish Leaf, and MalayaKew Leaf datasets, respectively.
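
The abstract does not state how the outputs of the four models are fused, so the following is only an illustrative sketch of one common ensembling scheme, soft voting, in which each model's class probabilities are averaged and the highest-scoring class is chosen. The function names, the equal weighting, and the example scores are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits, weights=None):
    """Fuse several models by a weighted average of their class probabilities.

    per_model_logits: one list of class scores per model (hypothetical inputs).
    weights: optional per-model weights; equal weighting is an assumption here.
    Returns the index of the winning class and the fused probabilities.
    """
    n_models = len(per_model_logits)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(per_model_logits[0])
    fused = [0.0] * n_classes
    for weight, logits in zip(weights, per_model_logits):
        for k, prob in enumerate(softmax(logits)):
            fused[k] += weight * prob
    predicted = max(range(n_classes), key=fused.__getitem__)
    return predicted, fused


# Made-up scores for one leaf image over three species, listed in the order
# Vision Transformer, ResNet-50, DenseNet-201, Xception.
example_logits = [
    [2.0, 0.5, 0.1],
    [1.2, 1.5, 0.3],
    [2.2, 0.4, 0.2],
    [1.8, 0.9, 0.1],
]
predicted_class, fused_probs = soft_vote(example_logits)
```

In this made-up example the second model on its own favors class 1, but the averaged probabilities favor class 0, which illustrates how an ensemble can outvote a single model's error.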

Original language: English
Article number: 2642
Issue number: 14
Publication status: Published - Jul 2023
Externally published: Yes


Keywords

  • convolutional neural network
  • deep learning
  • plant classification
  • plant leaf classification
  • Vision Transformer

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Ecology
  • Plant Science

