This thesis project aims to develop a real-time, on-board, image-based Municipal Solid Waste (MSW) classification system that can also recognize hand gestures using computer vision techniques. Computer vision is a sub-field of artificial intelligence, and recent state-of-the-art vision models mostly use deep neural networks for image tasks such as object detection, classification, and tracking. These deep models owe much of their success to high-performance computers and the large annotated datasets on which they are pre-trained. A popular deep network (e.g., ResNet) usually comprises millions of parameters and requires high-performance Graphics Processing Units (GPUs) for both training and inference, which is not suitable for on-board computing. This project investigates the real-time implementation of both MSW classification and hand gesture recognition on low-cost, low-power chips. Beyond the efficiency requirement, other major challenges lie in intra-class variation, small dataset size, and large differences in background between the training and testing environments (e.g., lighting and contrast). Consequently, a previously trained deep network might not generalize well to unseen data. To address this generalization issue, the thesis explores approaches for reducing background noise via background subtraction and for fine-tuning pre-trained deep learning models on a newly collected dataset of hand gestures and MSW objects. The project also explores the development of a cloud-based automation pipeline for generating synthetic images of MSW objects to enlarge the dataset, for model development, and for overall system integration.
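The background-subtraction idea mentioned above can be sketched in its simplest form as per-pixel differencing against a reference background frame. This is a minimal illustrative sketch in NumPy (the actual system would likely use an adaptive model such as a running average or a Gaussian-mixture subtractor; the function name and threshold value here are assumptions for illustration, not the thesis implementation):

```python
import numpy as np

def subtract_background(frame, background, threshold=25):
    """Return a binary foreground mask via simple background differencing.

    frame, background: HxW uint8 grayscale arrays of the same shape.
    threshold: minimum absolute intensity difference to count as
               foreground (illustrative value, tuned per scene).
    """
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

# Hypothetical example: a uniform background with one bright region
# standing in for an MSW object entering the scene.
background = np.full((8, 8), 100, dtype=np.uint8)
frame = background.copy()
frame[2:5, 2:5] = 200  # simulated object pixels
mask = subtract_background(frame, background)  # 1s only over the object
```

A fixed reference frame like this is cheap enough for low-power chips, but it is sensitive to the lighting and contrast shifts the abstract identifies; an adaptive background model trades a little extra computation for robustness to those changes.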