A farm can produce many vegetables in the same season. Before selling the vegetables, farmers sort them. Tomatoes in one basket, cabbage in another, potatoes in yet another.

Image: Rene Cortin via Wikimedia Commons
If the harvest is small, manual sorting works. But on large farms, where vegetables pile up, sorting becomes slow and labour-intensive. Automation becomes necessary.
The first step in automation is recognition. Can we create an artificial intelligence method to accurately identify vegetables?
To address the challenge, Ramakrishnan Raman of Symbiosis International (Deemed) University, Pune, collaborated with researchers at Asia University and the Chaoyang University of Technology. Artificial intelligence already recognises faces, diseases and speech. So, vegetable classification should be simple.
But the team realised that there was a hidden problem. Modern deep learning systems are heavy. They require millions of parameters, calling for powerful GPUs and enormous energy. Training them adds to carbon emissions.
Agriculture needs automation. The planet needs lower energy consumption. Can AI solve one problem without creating other problems?
So, the team set out to build a smarter model. They began with data.
They turned to Kaggle, a global data science platform where publicly available datasets from laboratories around the world are stored. The researchers selected a vegetable image dataset.
They limited the scope deliberately. Fifteen vegetables. No more.
Bean, bitter gourd, bottle gourd, brinjal, broccoli, cabbage, capsicum, carrot, cauliflower, cucumber, papaya, potato, pumpkin, radish and tomato – vegetables commonly seen at Indian markets and farms.
Expanding beyond 15 classes would increase computational cost and risk unbalancing the dataset. Sustainability requires restraint.
Each category contained extensive training data along with 200 test images. The images varied in lighting, angle, background clutter and natural imperfections, reflecting real world conditions.
After defining the dataset, the researchers had to decide which model to use. Instead of relying on massive pre-trained networks, they designed a lightweight convolutional neural network from scratch.
First, the model needed to extract visual features from raw images. For this, the researchers stacked three convolutional layers, to detect the edges, textures and shapes of the vegetables in the images.
For a computationally inexpensive activation, they used the rectified linear unit. To stabilise training, they used batch normalisation. To reduce spatial dimensions, they used max pooling, which lowered computational cost while preserving the dominant features in the images.
Thus, they could reduce the input images gradually from 128×128 to 64×64, then to 32×32, and finally to 16×16 feature maps. The network moved step by step from raw pixels to abstract representations.
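The halving from 128×128 down to 16×16 comes from the pooling stages. A minimal sketch in Python (illustrative only, not the authors' code; it uses a small 8×8 map as a stand-in for a 128×128 image) shows how 2×2 max pooling halves the height and width at every step while keeping each block's strongest activation:

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D feature map by keeping the max of each 2x2 block."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

# A dummy 8x8 feature map (stand-in for a 128x128 image).
fmap = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
sizes = [len(fmap)]
for _ in range(3):           # three pooling stages, as in the paper
    fmap = max_pool_2x2(fmap)
    sizes.append(len(fmap))
print(sizes)  # [8, 4, 2, 1] -- each stage halves height and width
```

Three such stages are what take the network's maps from 128×128 to 64×64, 32×32 and 16×16.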
But extracting features is not enough. The model also needed to decide which features mattered most. To achieve this without increasing model size, the researchers integrated Squeeze-and-Excitation attention mechanisms into their model. This allowed the model to pay attention to selected features, improving accuracy without adding significant parameters. Instead of adding depth, the system learned to recalibrate channel responses. It examined which feature channels carried meaningful information and amplified them, while suppressing those that were not so useful.
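The Squeeze-and-Excitation idea can be sketched in plain Python. This is an illustration, not the paper's implementation: the real SE block learns its excitation step with two small trainable layers, whereas here a fixed sigmoid on the centred channel averages stands in for it.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def squeeze_excite(channels):
    """channels: a list of 2D feature maps, one per channel."""
    # Squeeze: global average pooling gives one summary scalar per channel.
    summaries = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                 for ch in channels]
    # Excitation: map each summary to an importance weight in (0, 1).
    # (A stand-in for the two learned layers of a real SE block.)
    mean = sum(summaries) / len(summaries)
    weights = [sigmoid(s - mean) for s in summaries]
    # Recalibrate: rescale every channel by its weight.
    return [[[w * v for v in row] for row in ch]
            for ch, w in zip(channels, weights)]

# Two 2x2 channels: one with strong activations, one with weak ones.
strong = [[4.0, 4.0], [4.0, 4.0]]
weak = [[1.0, 1.0], [1.0, 1.0]]
out_strong, out_weak = squeeze_excite([strong, weak])
# The informative channel is amplified relative to the weak one.
print(out_strong[0][0] > out_weak[0][0])  # True
```

The key point survives the simplification: channels that carry more signal are scaled up, the rest are damped, with almost no extra parameters.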
Finally, the refined feature maps had to be converted into predictions. To compress spatial information into a single vector per channel, the researchers applied global average pooling. This used far fewer parameters than traditional fully connected layers would.
A final linear layer then produced 15 logits, raw scores representing how strongly the model believes an image belongs to each vegetable class.
To make the final decision, the researchers used Softmax, an algorithm that converts the raw scores produced by the model into probabilities between 0 and 1, ensuring that all the probabilities add up to one. Softmax answers the question: which vegetable is this image most likely to be? The vegetable with the highest probability becomes the model’s final prediction.
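The decision stage can be written in a few lines of Python. This is an illustrative sketch, not the authors' code; the logits below are made-up numbers for a single hypothetical image.

```python
import math

CLASSES = ["bean", "bitter gourd", "bottle gourd", "brinjal", "broccoli",
           "cabbage", "capsicum", "carrot", "cauliflower", "cucumber",
           "papaya", "potato", "pumpkin", "radish", "tomato"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one image: 15 scores, one per class.
logits = [0.1] * 15
logits[14] = 3.0  # suppose "tomato" scores highest

probs = softmax(logits)
print(round(sum(probs), 6))              # 1.0 -- probabilities sum to one
print(CLASSES[probs.index(max(probs))])  # tomato
```

Whichever class receives the highest probability becomes the model's final prediction.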
The result? 98% accuracy in identifying the vegetables!
Bottle gourd, pumpkin and radish were classified almost perfectly. There were minor confusions between cabbage and cauliflower, broccoli and cauliflower, papaya and tomato due to visual similarities in texture, shape or colour. But these errors were minimal.
The system was reliable. More importantly, it was efficient.
The model used only 1.39 million parameters. Earlier models needed many more for the same task: ResNet18 uses over 11 million, ResNeXt 23 million and AlexNet 57 million.
The researchers assessed the computational cost of their model in giga floating-point operations (GFLOPs), which tell us how much calculation a system must perform to make a prediction. The proposed model required only 15.63 GFLOPs. ResNet18 required 43.65. ResNeXt required 34.06.
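A quick bit of arithmetic on the reported figures puts the savings in perspective. The numbers below are those quoted above; the ratios are computed here purely for illustration.

```python
# Reported model sizes (millions of parameters) and compute cost (GFLOPs).
params = {"proposed": 1.39, "ResNet18": 11.0, "ResNeXt": 23.0, "AlexNet": 57.0}
gflops = {"proposed": 15.63, "ResNet18": 43.65, "ResNeXt": 34.06}

for name in ("ResNet18", "ResNeXt"):
    p = params["proposed"] / params[name]
    g = gflops["proposed"] / gflops[name]
    print(f"vs {name}: {p:.0%} of the parameters, {g:.0%} of the GFLOPs")
```

Against ResNet18, for instance, the proposed model carries roughly an eighth of the parameters at about a third of the compute.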
The model outperformed ResNet18, AlexNet, SqueezeNet, ResNeXt and GoogLeNet, while remaining significantly lighter. It did not rely on transfer learning, pruning pipelines, or multimodal sensor fusion.
This lightweight model, constructed from the ground up, is ready for deployment in real-world situations.
“Now what remains to be done is the engineering to drop the identified vegetable into the right bin,” says Brij B Gupta, Symbiosis Centre for Information Technology, Pune.
“The strategy we used can be replicated for classifying multi-class objects from images, not just vegetables and fruits,” adds Ramakrishnan Raman, his senior colleague.
DOI: 10.1016/j.grets.2025.100257;
Green Technologies and Sustainability, 4 (1): 100257 (2026)
Reported by Shubhangi Chauhan
Symbiosis Institute of Mass Communication, Lavale, Pune