Research | Akshay L Chandra

2025

Master’s Thesis

Fine-Tuning Diffusion Policies with World Models

Akshay L Chandra, Iman Nematollahi, Chenguang Huang, and 2 more authors

Master’s Thesis, Robot Learning Lab [Coursework; Not Peer-Reviewed], Feb 2025

In the behaviour-cloning paradigm, diffusion-based policies (DPs) have recently emerged as a preferred choice for continuous control and robot learning tasks. Much of their adoption is attributed to their efficacy in modelling high-dimensional and multimodal action distributions while training stably. However, DPs remain limited mainly by the scale and quality of the expert data they fit. To that end, recent works have shown that policy gradient (PG) methods from reinforcement learning (RL) can help fine-tune DPs beyond the expert data, using online interactions and sparse task-completion rewards. Specifically, by treating the denoising process as a separate Markov Decision Process (MDP), previous works have applied PG methods to fine-tune DPs. However, RL fine-tuning that requires millions of environment interactions is often unsafe and impractical on real robots. We introduce Diffusion Policy Policy Optimisation in World-Models (DPOW), a sample-efficient, real-robot-friendly algorithmic framework for improving DPs offline with learnt dynamics models, i.e. world models. Through experimental investigation, we find that DPOW can fine-tune and improve policies using only offline interactions. We demonstrate stable training and robustness to task difficulty across a range of simulated continuous control manipulation tasks in CALVIN.
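To make "treating the denoising process as an MDP" concrete, here is a toy sketch, not the DPOW implementation: each denoising step is a Gaussian "action", so the log-likelihood of a whole chain is a sum of per-step terms and a policy gradient can be taken through it. Everything below is an assumption for illustration: a 1-D action, a denoiser `mu_theta(x_k) = x_k + theta` with a single scalar parameter, plain REINFORCE with a mean-reward baseline (DPOW may use a different PG optimiser), and a hypothetical `world_model_reward` that scores the final action directly instead of rolling it out in a learnt world model.

```python
import random

K = 5         # number of denoising steps (hypothetical)
SIGMA = 0.3   # fixed standard deviation of every denoising step
TARGET = 2.0  # action that the toy reward treats as task completion

def world_model_reward(action: float) -> float:
    """Hypothetical stand-in for a learnt world model: sparse task-completion reward."""
    return 1.0 if abs(action - TARGET) < 0.5 else 0.0

def denoise(theta: float, rng: random.Random):
    """Run the chain x_K -> x_0; return x_0 and d/dtheta of the chain's log-likelihood."""
    x = rng.gauss(0.0, 1.0)  # x_K ~ N(0, 1), independent of theta
    grad_logp = 0.0
    for _ in range(K):
        mean = x + theta                 # toy denoiser mu_theta(x_k) = x_k + theta
        x_next = rng.gauss(mean, SIGMA)  # each denoising step is one Gaussian "action"
        grad_logp += (x_next - mean) / SIGMA ** 2  # d/dtheta log N(x_next; mean, SIGMA^2)
        x = x_next
    return x, grad_logp

def finetune(iters: int = 200, batch: int = 64, lr: float = 0.05, seed: int = 0) -> float:
    """REINFORCE with a mean-reward baseline over whole denoising chains."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iters):
        rollouts = [denoise(theta, rng) for _ in range(batch)]
        rewards = [world_model_reward(a) for a, _ in rollouts]
        baseline = sum(rewards) / batch
        grad = sum((r - baseline) * g for r, (_, g) in zip(rewards, rollouts)) / batch
        theta += lr * grad  # gradient ascent on expected reward
    return theta
```

Calling `finetune()` moves `theta` from 0 towards roughly 0.4, the drift at which five denoising steps centre the final action on `TARGET`. The reward only ever sees the last action, yet the gradient flows through every denoising step because the chain's log-likelihood factorises over steps.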
ICRA

LUMOS: Language-Conditioned Imitation Learning with World Models

Iman Nematollahi, Branton DeMoss, Akshay L Chandra, and 3 more authors

IEEE International Conference on Robotics and Automation, Feb 2025

We introduce LUMOS, a language-conditioned multi-task imitation learning framework for robotics. LUMOS learns skills by practicing them over many long-horizon rollouts in the latent space of a learned world model and transfers these skills zero-shot to a real robot. By learning on-policy in the latent space of the learned world model, our algorithm mitigates the policy-induced distribution shift from which most offline imitation learning methods suffer. LUMOS learns from unstructured play data with fewer than 1% hindsight language annotations but is steerable with language commands at test time. We achieve this coherent long-horizon performance by combining latent planning with both image- and language-based hindsight goal relabeling during training, and by optimizing an intrinsic reward defined in the latent space of the world model over multiple time steps, effectively reducing covariate shift. In experiments on the difficult long-horizon CALVIN benchmark, LUMOS outperforms comparable prior learning-based methods on chained multi-task evaluations. To the best of our knowledge, we are the first to learn language-conditioned continuous visuomotor control for a real-world robot within an offline world model.
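Hindsight goal relabeling, as mentioned above, turns unlabeled play data into goal-conditioned training pairs: whatever state a window of play actually reached is treated as the goal that was "commanded". A minimal sketch of combining image- and language-based relabeling, assuming frames are placeholder strings, annotations map hypothetical `(start, end)` frame indices to an instruction, and a hypothetical 50/50 mixing ratio `p_language`; none of these names come from the LUMOS code.

```python
import random
from typing import Dict, List, Sequence, Tuple

def sample_relabeled(frames: Sequence[str],
                     annotations: Dict[Tuple[int, int], str],
                     horizon: int,
                     rng: random.Random,
                     p_language: float = 0.5) -> Tuple[List[str], str, str]:
    """Sample a training window from unstructured play data and relabel its goal in hindsight.

    Returns (window, goal, goal_type). With probability p_language (and if any annotation
    exists) an annotated segment is returned with its instruction as the goal; otherwise a
    random window is returned with its final frame as the image goal.
    """
    if annotations and rng.random() < p_language:
        (start, end), instruction = rng.choice(sorted(annotations.items()))
        return list(frames[start:end + 1]), instruction, "language"
    start = rng.randrange(len(frames) - horizon)
    window = list(frames[start:start + horizon + 1])
    return window, window[-1], "image"  # hindsight: the frame actually reached is the goal
```

Because image goals can be produced for any window, the scarce language annotations only need to cover a small fraction of the data; the oversampling ratio is a tunable assumption here.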

2024

Master’s Project

SAC-N-GMM: Robot Skill Refining and Sequencing for Long-Horizon Manipulation Tasks

Akshay L Chandra, Iman Nematollahi, and Tim Welschehold

Master’s Project, Robot Learning Lab [Coursework; Not Peer-Reviewed], Feb 2024

Despite access to expert data, most long-horizon imitation-learning (IL) agents suffer from distribution shift, compounding errors, and dependence on the expert. Several previous works show that refining IL agents with reinforcement learning (RL) through interaction with the world alleviates some of these problems, because the added exposure makes the agents more robust to noisy perception and stochastic dynamics. SAC-GMM does this efficiently by first learning a task from demonstrations with a classical robotics technique (e.g., a Gaussian Mixture Model) and then refining it with a deep RL (Soft Actor-Critic) agent using sparse task-completion rewards. The side effects of long-horizon IL agents can be dampened further by breaking complex tasks down into short-horizon skills. This turns the learning problem into a hierarchy of agents, i.e. a high-level planning agent (skill sequencer) and a low-level control agent (skill executor). To this end, we propose the Soft Actor-Critic-N-Gaussian Mixture Model (SAC-N-GMM), a novel hybrid RL approach that learns to simultaneously refine and sequence a repertoire of low-level skills to perform numerous combinations of long-horizon tasks. Our approach extends SAC-GMM by (1) learning N low-level robot skills with Riemannian manifold GMMs that capture both robot positions and orientations, and (2) learning a single RL agent to refine and sequence multiple manifold-aware GMM skills. Extensive evaluations in the CALVIN simulation environment demonstrate that our approach efficiently leverages high-dimensional sensory data, minimal expert demonstrations, minimal physical interactions, and sparse task-completion rewards to achieve superior long-horizon task performance compared to baselines. Code is available at https://github.com/acl21/sac_n_gmm
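The skill-executor side of the hierarchy described above can be illustrated with Gaussian mixture regression (GMR): a GMM fitted over (position, velocity) pairs yields a velocity field E[v | x] that the robot follows. The sketch below is a heavy simplification, not the SAC-N-GMM code: positions are 1-D (the Riemannian orientation part is omitted), the SAC agent itself is not shown, and its action is assumed, hypothetically, to be a skill index plus additive offsets on the component position means. `Component`, `gmr_velocity`, `execute_skill` and `SKILLS` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

@dataclass
class Component:
    weight: float  # mixing coefficient pi_k
    mu_x: float    # mean position
    mu_v: float    # mean velocity
    var_x: float   # position variance Sigma_xx
    cov_vx: float  # velocity-position covariance Sigma_vx

def gmr_velocity(skill: Sequence[Component], x: float,
                 offsets: Optional[Sequence[float]] = None) -> float:
    """Gaussian mixture regression E[v | x]; `offsets` shift each component's position mean."""
    offsets = offsets if offsets is not None else [0.0] * len(skill)
    mus = [c.mu_x + d for c, d in zip(skill, offsets)]
    # Log-responsibilities, normalised with the max trick to avoid underflow far from the data
    logs = [math.log(c.weight) - 0.5 * (x - mu) ** 2 / c.var_x
            - 0.5 * math.log(2 * math.pi * c.var_x)
            for c, mu in zip(skill, mus)]
    top = max(logs)
    resp = [math.exp(l - top) for l in logs]
    total = sum(resp)
    return sum((r / total) * (c.mu_v + c.cov_vx / c.var_x * (x - mu))
               for c, mu, r in zip(skill, mus, resp))

def execute_skill(skill: Sequence[Component], x0: float, offsets: Sequence[float],
                  dt: float = 0.05, steps: int = 200) -> float:
    """Low-level executor: Euler-integrate the GMR dynamical system from x0."""
    x = x0
    for _ in range(steps):
        x += dt * gmr_velocity(skill, x, offsets)
    return x

# Hypothetical library of N = 2 one-component skills, each an attractor at its mean position
SKILLS: List[List[Component]] = [
    [Component(1.0, 0.0, 0.0, 1.0, -1.0)],
    [Component(1.0, 5.0, 0.0, 1.0, -1.0)],
]
```

Under these assumptions a single high-level action such as "run skill 1 with offset 0.5" is `execute_skill(SKILLS[1], 0.0, [0.5])`, which settles near 5.5: refinement reshapes a demonstrated skill without discarding it, while sequencing amounts to choosing the index.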

2022

Plant Phenomics

How Useful Is Image-Based Active Learning for Plant Organ Segmentation?

Shivangana Rawat, Akshay L Chandra, Sai Vikas Desai, and 3 more authors

Plant Phenomics, Feb 2022

Training deep learning models typically requires large amounts of labeled data, which are expensive to acquire, especially for dense prediction tasks such as semantic segmentation. Moreover, plant phenotyping datasets pose the additional challenges of heavy occlusion and varied lighting conditions, which make annotations more time-consuming to obtain. Active learning helps reduce the annotation cost by selecting for labeling the samples that are most informative to the model, thus improving model performance with fewer annotations. Active learning for semantic segmentation has been well studied on datasets such as PASCAL VOC and Cityscapes. However, its effectiveness on plant datasets has received little attention. To bridge this gap, we empirically study and benchmark the effectiveness of four uncertainty-based active learning strategies on three natural plant organ segmentation datasets. We also study their behaviour in response to variations in training configurations in terms of augmentations used, the scale of training images, active learning batch sizes, and train-validation set splits.
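As an illustration of what an uncertainty-based strategy for segmentation looks like, the sketch below scores each unlabeled image by the mean per-pixel Shannon entropy of the model's softmax output and sends the highest-scoring images for annotation. It is one common acquisition score, not necessarily one of the four strategies benchmarked in the paper; `mean_pixel_entropy` and `select_batch` are hypothetical names, and probability maps are simplified to a flat list of per-pixel class distributions.

```python
import math
from typing import Dict, List, Sequence

def mean_pixel_entropy(prob_map: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy over pixels; each element is one pixel's class distribution."""
    total = 0.0
    for probs in prob_map:
        total -= sum(p * math.log(p) for p in probs if p > 0)
    return total / len(prob_map)

def select_batch(predictions: Dict[str, Sequence[Sequence[float]]], budget: int) -> List[str]:
    """Return the `budget` image ids whose predictions are most uncertain on average."""
    scores = {img: mean_pixel_entropy(pm) for img, pm in predictions.items()}
    return sorted(scores, key=scores.get, reverse=True)[:budget]
```

Averaging over pixels keeps scores comparable across images of the same size; a sum would favour large images, which is a design choice worth checking against the training-image scale variations studied above.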

2021

PMLR/NeurIPS W

On Initial Pools for Deep Active Learning

Akshay L Chandra^*, Sai Vikas Desai^*, Chaitanya Devaguptapu^*, and 1 more author

NeurIPS 2020 Workshop on Pre-registration in Machine Learning, Dec 2021

Active Learning (AL) techniques aim to minimize the training data required to train a model for a given task. Pool-based AL techniques start with a small initial labeled pool and then iteratively pick batches of the most informative samples for labeling. Generally, the initial pool is sampled randomly and labeled to seed the AL iterations. While recent studies have focused on evaluating the robustness of various query functions in AL, little to no attention has been given to the design of the initial labeled pool for deep active learning. Given the recent success of self-supervised and unsupervised representation learning, we study whether an intelligently sampled initial labeled pool can improve deep AL performance. We investigate the effect of intelligently sampled initial labeled pools, including the use of self-supervised and unsupervised strategies, on deep AL methods. The setup, hypotheses, methodology, and implementation details were evaluated by peer review before experiments were conducted. Experimental results could not conclusively prove that intelligently sampled initial pools are better for AL than random initial pools in the long run, although a Variational Autoencoder-based initial pool sampling strategy showed interesting trends that merit deeper investigation.
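The pool-based loop described above, with the initial pool as a pluggable choice, can be sketched as follows. The paper's self-supervised and VAE-based strategies are not reproduced; greedy k-center over precomputed embeddings is swapped in as one illustrative "intelligent" seeding rule, and model retraining is left as a comment. All function names, and the `score_fn(labeled, i)` query interface, are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Set

def random_initial_pool(n: int, size: int, rng: random.Random) -> List[int]:
    """Baseline: seed the AL iterations with a uniformly random labeled pool."""
    return rng.sample(range(n), size)

def k_center_initial_pool(embeddings: Sequence[Sequence[float]], size: int,
                          rng: random.Random) -> List[int]:
    """Greedy k-center: start at a random point, then repeatedly add the point farthest from the pool."""
    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    pool = [rng.randrange(len(embeddings))]
    nearest = [dist2(e, embeddings[pool[0]]) for e in embeddings]
    while len(pool) < size:
        idx = max(range(len(embeddings)), key=nearest.__getitem__)
        pool.append(idx)
        nearest = [min(d, dist2(e, embeddings[idx])) for d, e in zip(nearest, embeddings)]
    return pool

def pool_based_al(n: int, initial: List[int], score_fn: Callable[[Set[int], int], float],
                  batch_size: int, rounds: int) -> Set[int]:
    """Each round, label the `batch_size` unlabeled samples the query function ranks highest."""
    labeled = set(initial)
    for _ in range(rounds):
        # (retrain the model on `labeled` here; omitted in this sketch)
        unlabeled = [i for i in range(n) if i not in labeled]
        ranked = sorted(unlabeled, key=lambda i: score_fn(labeled, i), reverse=True)
        labeled.update(ranked[:batch_size])
    return labeled
```

Only the seeding function changes between the two variants, which mirrors the comparison in the abstract: the query function and budget stay fixed, so any difference in the long run is attributable to the initial pool.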

2020

Plant Methods

Active Learning with Point Supervision for Cost-Effective Panicle Detection in Cereal Crops

Akshay L Chandra^*, Sai Vikas Desai^*, Vineeth N Balasubramanian, and 2 more authors

Plant Methods (BioMed Central), Mar 2020

Panicle density of cereal crops such as wheat and sorghum is one of the main yield components that plant breeders and agronomists use to understand the yield of their crops. To phenotype panicle density effectively, researchers agree that there is a significant need for computer vision-based object detection techniques. In recent years in particular, research in deep learning-based object detection has shown promising results in various agricultural studies. However, training such systems usually requires a large amount of bounding-box labeled data. Since crops vary with both environmental and genetic conditions, acquiring large labeled image datasets for each crop is expensive and time-consuming. Thus, to catalyze the widespread use of automatic object detection for crop phenotyping, a cost-effective method for developing such automated systems is essential.
CVPPP/ECCV Demos

EasyRFP: An Easy to Use Edge Computing Toolkit for Real-Time Field Phenotyping

Akshay L Chandra^*, Sai Vikas Desai^*, Hirafuji Masayuki, and 3 more authors

Extended Abstract at CVPPP & ECCV Academic Demonstrations, Aug 2020

We propose EasyRFP, an edge computing toolkit for real-time field phenotyping. Recent advances in deep learning have catalysed rapid progress in high-throughput field phenotyping. Much research has been dedicated to developing accurate and cost-effective deep learning models that capture phenotypic traits such as plant stress, yield and plant growth stages. However, there is a shortage of software tools that promote the use of such intelligent methods among plant phenotyping practitioners and researchers. To bridge this gap, we developed EasyRFP, a software toolkit with a Flask backend and an Angular frontend. Broadly speaking, our toolkit can be interfaced with a commercial GPU-enabled microcomputer (such as an NVIDIA Jetson) and a digital camera. Specifically, it can be used to capture images and extract phenotypic traits both in real time and in a scheduled mode. Currently, we support classification, detection and instance segmentation tasks.

2019

BMVC

An Adaptive Supervision Framework for Active Learning in Object Detection

Sai Vikas Desai^*, Akshay L Chandra^*, Wei Guo, and 2 more authors

British Machine Vision Conference, Aug 2019

Active learning approaches in computer vision generally involve querying strong labels for data. However, previous works have shown that weak supervision can be effective in training models for vision tasks while greatly reducing annotation costs. Building on this, we propose an adaptive supervision framework for active learning and demonstrate its effectiveness on the task of object detection. Instead of directly querying bounding box annotations (strong labels) for the most informative samples, we first query weak labels and optimize the model. A switching condition then determines when the required level of supervision should be increased. Our framework requires little to no change in model architecture. Our extensive experiments show that the proposed framework can be used to train well-generalizing models at much lower annotation cost than state-of-the-art active learning approaches for object detection.
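The adaptive-supervision idea above reduces to a small state machine: query cheap weak labels while they keep helping, and escalate to strong (bounding-box) labels once they stop. The switching rule in this sketch is an assumption chosen for illustration, namely that the validation-metric gain since the previous round falls below `min_gain`; the paper's actual switching condition may differ, and `AdaptiveSupervision` is a hypothetical name.

```python
from typing import Optional

class AdaptiveSupervision:
    """Query weak labels until their marginal gain stalls, then switch to strong labels."""

    def __init__(self, min_gain: float):
        self.min_gain = min_gain  # hypothetical threshold on round-to-round improvement
        self.level = "weak"
        self.prev_score: Optional[float] = None

    def next_level(self, val_score: float) -> str:
        """Given the validation metric after the latest round, return the label type to query next."""
        if (self.level == "weak" and self.prev_score is not None
                and val_score - self.prev_score < self.min_gain):
            self.level = "strong"  # weak labels no longer help enough: escalate supervision
        self.prev_score = val_score
        return self.level
```

The switch is one-way, matching the abstract's description of supervision only ever being increased; because the rule only looks at a validation score, it leaves the detector architecture untouched.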