
Research Overview

This section is currently in the refinery!

Past Research

Previously, I worked as a Researcher at Preferred Networks, Japan, and, before that, as a Research Assistant at the Video Analytics Lab, IISc, Bangalore, India. I have worked on a variety of projects spanning Computer Vision, Machine Learning, Audio Processing, Human-Computer Interaction, and Robotics. I have been fortunate to collaborate with and receive mentorship from some amazing people: Daniel Ritchie, R. Venkatesh Babu, Siddhartha Chaudhuri, Shin-ichi Maeda, Adrien Gaidon, Rares Ambrus, Jason Naradowsky, and Fabrice Matulic, to name a few.

Some of the central themes in my previous research:

  1. Novel methods for Computer Vision tasks (Semantic Segmentation, 3D Object Pose Estimation, etc.),
  2. Stability of deep-learned features (Adversarial Attacks),
  3. Applications of Deep Learning in Audio (source separation) and HCI (tracking in VR).

For a complete list of published papers, visit Google Scholar.

Research:

Improving Unsupervised Visual Program Inference with Code Rewriting Families

A. Ganeshan, R. Kenny Jones and Daniel Ritchie

Oral (1.8%), IEEE / CVF International Conference on Computer Vision (ICCV), 2023

Programs offer compactness and structure that make them an attractive representation for visual data. We explore how code rewriting can be used to improve systems for inferring programs from visual data. We first propose Sparse Intermittent Rewrite Injection (SIRI), a framework for unsupervised bootstrapped learning. SIRI sparsely applies code rewrite operations over a dataset of training programs, injecting the improved programs back into the training set. We design a family of rewriters for visual programming domains: Parameter Optimization (PO), Code Pruning (CP), and Code Grafting (CG). For three shape programming languages in 2D and 3D, we show that using SIRI with our family of rewriters improves performance: better reconstructions a...
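
For readers curious how such a bootstrapped loop fits together, here is a minimal sketch in Python. The `model.infer`/`model.train_on` interface, the `recon_error` metric, and the rewriter callables are hypothetical placeholders standing in for PO, CP, and CG; this illustrates the rewrite-and-inject idea rather than the paper's implementation.

```python
import random

def siri_round(model, shapes, train_set, rewriters, recon_error, rewrite_frac=0.1):
    """One round of Sparse Intermittent Rewrite Injection (sketch).

    model        : program-inference network with .infer(shape) and .train_on(...)
                   methods (hypothetical interface)
    shapes       : list of target shapes (the visual data)
    train_set    : dict mapping shape index -> best program found so far
    rewriters    : list of callables (program, shape) -> improved program
    recon_error  : callable (program, shape) -> scalar reconstruction error
    rewrite_frac : fraction of shapes rewritten this round (sparse application)
    """
    # 1. Infer fresh programs for every shape with the current model,
    #    keeping the best program seen so far for each shape.
    for i, shape in enumerate(shapes):
        prog = model.infer(shape)
        if i not in train_set or recon_error(prog, shape) < recon_error(train_set[i], shape):
            train_set[i] = prog

    # 2. Sparsely apply the rewriter family to a random subset of programs.
    subset = random.sample(range(len(shapes)), int(rewrite_frac * len(shapes)))
    for i in subset:
        prog = train_set[i]
        for rewrite in rewriters:  # e.g. parameter optimization, pruning, grafting
            candidate = rewrite(prog, shapes[i])
            if recon_error(candidate, shapes[i]) < recon_error(prog, shapes[i]):
                prog = candidate   # keep only rewrites that help
        train_set[i] = prog        # inject the improved program back

    # 3. Retrain the inference model on the (partially rewritten) training set.
    model.train_on(list(train_set.values()), shapes)
    return train_set
```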

Skill Generalization with Verbs

R. Ma, L. Lam, B. A. Spiegel, A. Ganeshan, B. Abbatematteo, R. Patel, D. Paulius, S. Tellex, G. Konidaris.

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

It is imperative that robots can understand natural language commands issued by humans. Such commands typically contain verbs that signify what action should be performed on a given object and that are applicable to many objects. We propose a method for generalizing manipulation skills to novel objects using verbs. Our method learns a probabilistic classifier that determines whether a given object trajectory can be described by a specific verb. We show that this classifier accurately generalizes to novel object categories with an average accuracy of 76.69% across 13 object categories and 14 verbs. We then perform policy search over the object kinematics to find an object trajectory that maximizes classifier prediction for a given verb. Our method allows ...
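
A minimal sketch of the search step described above, assuming a hypothetical `verb_classifier(trajectory, verb)` that returns a probability and an `object_kin` helper that turns waypoint parameters into an object trajectory; plain random search stands in here for the paper's policy search.

```python
import numpy as np

def search_trajectory(verb_classifier, object_kin, verb, n_samples=512, horizon=20, seed=0):
    """Find an object trajectory that maximizes P(verb | trajectory) (sketch).

    verb_classifier : callable (trajectory, verb) -> probability in [0, 1]
    object_kin      : callable mapping waypoint parameters of shape (horizon, 6)
                      to an object trajectory (hypothetical helper)
    verb            : the natural-language verb, e.g. "tilt" or "lift"
    """
    rng = np.random.default_rng(seed)
    best_traj, best_score = None, -np.inf

    # Simple random search over waypoint parameters; any black-box policy-search
    # or optimization routine could be substituted here.
    for _ in range(n_samples):
        params = rng.uniform(-1.0, 1.0, size=(horizon, 6))
        traj = object_kin(params)
        score = verb_classifier(traj, verb)
        if score > best_score:
            best_traj, best_score = traj, score

    return best_traj, best_score
```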

Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency

A. Ganeshan, Alexis Vallet, Yasunori Kudo, Shin-ichi Maeda, Tommi Kerola, Rares Ambrus, Dennis Park, Adrien Gaidon

IEEE / CVF International Conference on Computer Vision (ICCV), 2021

Deep learning models for semantic segmentation rely on expensive, large-scale, manually annotated datasets. Labelling is a tedious process that can take hours per image. Automatically annotating video sequences by propagating sparsely labeled frames through time is a more scalable alternative. In this work, we propose a novel label propagation method, termed Warp-Refine Propagation, that combines semantic cues with geometric cues to efficiently auto-label videos. Our method learns to refine geometrically-warped labels and infuse them with learned semantic priors in a semi-supervised setting by leveraging cycle consistency across time. We quantitatively show that our method improves label-propagation by a noteworthy margin of 13.1 mIoU on the ApolloScape d...
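
A minimal sketch of the warp-then-refine step, assuming an externally estimated backward optical flow and a hypothetical `refine_net`; the cycle-consistency training signal and the paper's actual architecture are omitted.

```python
import torch
import torch.nn.functional as F

def flow_to_grid(flow):
    """Turn a dense flow field (1, 2, H, W) into a sampling grid (1, H, W, 2)
    normalized to [-1, 1], as expected by F.grid_sample."""
    _, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).float() + flow[0].permute(1, 2, 0)
    gx = 2.0 * coords[..., 0] / (w - 1) - 1.0
    gy = 2.0 * coords[..., 1] / (h - 1) - 1.0
    return torch.stack([gx, gy], dim=-1).unsqueeze(0)

def propagate_label(label_t, frame_t1, backward_flow, refine_net):
    """Propagate a label map from annotated frame t to unlabeled frame t+1 (sketch).

    label_t       : soft/one-hot label map of the annotated frame, (1, C, H, W)
    frame_t1      : RGB tensor of the target frame, (1, 3, H, W)
    backward_flow : flow from frame t+1 back to frame t, (1, 2, H, W),
                    e.g. from any off-the-shelf optical-flow network
    refine_net    : hypothetical network fusing warped labels with semantic cues
    """
    # Geometric cue: warp the annotated labels into the target frame.
    warped = F.grid_sample(label_t, flow_to_grid(backward_flow),
                           mode="bilinear", align_corners=True)
    # Semantic cue: let a refinement network correct warping artefacts
    # (occlusions, motion errors) using the appearance of the target frame.
    refined = refine_net(torch.cat([warped, frame_t1], dim=1))
    return refined.argmax(dim=1)   # hard pseudo-label for self-training

# Toy usage with a dummy refinement network and 19 semantic classes:
refine = torch.nn.Conv2d(19 + 3, 19, kernel_size=3, padding=1)
pseudo = propagate_label(torch.rand(1, 19, 64, 128), torch.rand(1, 3, 64, 128),
                         torch.zeros(1, 2, 64, 128), refine)
```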

Phonetroller: Visual Representations of Fingers for Precise Touch Input with Mobile Phones in VR

Fabrice Matulic, A. Ganeshan, Hiroshi Fujiwara, Daniel Vogel

ACM Conference on Human Factors in Computing Systems (CHI), 2021

Smartphone touch screens are potentially attractive for interaction in virtual reality (VR). However, the user cannot see the phone or their hands in a fully immersive VR setting, impeding their ability for precise touch input. We propose mounting a mirror above the phone screen such that the front-facing camera captures the thumbs on or near the screen. This enables the creation of semi-transparent overlays of thumb shadows and inference of fingertip hover points with deep learning, which help the user aim for targets on the phone. A study compares the effect of visual feedback on touch precision in a controlled task and qualitatively evaluates three example applications demonstrating the potential of the technique. The results show that the enabled styl...

Meta-Learning Extractors for Music Source Separation

David Samuel, A. Ganeshan, Jason Naradowsky

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

We propose a hierarchical meta-learning-inspired model for music source separation (Meta-TasNet) in which a generator model is used to predict the weights of individual extractor models. This enables efficient parameter-sharing, while still allowing for instrument-specific parameterization. Meta-TasNet is shown to be more effective than models trained independently or in a multi-task setting, and achieves performance comparable to state-of-the-art methods. In comparison to the latter, our extractors contain fewer parameters and have faster run-time performance. We discuss important architectural considerations, and explore the costs and benefits of this approach.
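
A toy sketch of the weight-generation idea: a shared generator maps an instrument embedding to the parameters of an instrument-specific extractor layer. The class name and dimensions are hypothetical, and the real Meta-TasNet extractors are full separation networks rather than a single linear layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGeneratingExtractor(nn.Module):
    """Toy illustration of a generator predicting extractor weights.

    A learned embedding per instrument is mapped by `generator` to the weight
    matrix of a linear "extractor", so instruments share the generator's
    parameters while keeping instrument-specific behaviour.
    """

    def __init__(self, n_instruments=4, feat_dim=128, emb_dim=32):
        super().__init__()
        self.embeddings = nn.Embedding(n_instruments, emb_dim)
        self.generator = nn.Sequential(
            nn.Linear(emb_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim * feat_dim),
        )
        self.feat_dim = feat_dim

    def forward(self, mixture_features, instrument_id):
        # mixture_features: (batch, time, feat_dim)
        emb = self.embeddings(instrument_id)                        # (emb_dim,)
        weight = self.generator(emb).view(self.feat_dim, self.feat_dim)
        # Apply the generated weights as the extractor for this instrument.
        return F.linear(mixture_features, weight)

model = WeightGeneratingExtractor()
mix = torch.randn(2, 100, 128)
drums = model(mix, torch.tensor(0))    # instrument 0, e.g. drums
vocals = model(mix, torch.tensor(3))   # instrument 3, e.g. vocals
```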

FDA: Feature Disruptive Attack

A. Ganeshan, B. S. Vivek, R. V. Babu

IEEE / CVF International Conference on Computer Vision (ICCV), 2019

Though Deep Neural Networks (DNNs) show excellent performance across various computer vision tasks, several works show their vulnerability to adversarial samples, i.e., image samples with imperceptible noise engineered to manipulate the network's prediction. Adversarial sample generation methods range from simple to complex optimization techniques. The majority of these methods generate adversaries through optimization objectives that are tied to the pre-softmax or softmax output of the network. In this work, we (i) show the drawbacks of such attacks, (ii) propose two new evaluation metrics, Old Label New Rank (OLNR) and New Label Old Rank (NLOR), in order to quantify the extent of damage made by an attack, and (iii) propose a new adversarial attack, FDA: Feature...
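
To make the contrast with softmax-tied attacks concrete, here is a hedged sketch of a feature-space attack: the input is perturbed to push an intermediate layer's activations away from their clean values under an L-infinity budget. The layer choice and the simple L2 objective are illustrative assumptions, not the exact FDA objective.

```python
import torch
import torchvision.models as models

def feature_space_attack(model, layer, image, eps=8/255, steps=10, step_size=2/255):
    """Iteratively perturb `image` to disrupt the features at `layer` (sketch)."""
    feats = {}
    handle = layer.register_forward_hook(lambda m, i, o: feats.__setitem__("out", o))

    with torch.no_grad():
        model(image)
        clean_feat = feats["out"].detach()          # reference activations

    adv = image.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        model(adv)
        # Objective on intermediate features rather than the (pre-)softmax output.
        loss = (feats["out"] - clean_feat).pow(2).mean()
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + step_size * grad.sign()              # ascend the feature loss
            adv = image + (adv - image).clamp(-eps, eps)     # stay in the L-inf ball
            adv = adv.clamp(0, 1)

    handle.remove()
    return adv.detach()

# Usage with a randomly initialized ResNet-50 (keeps the example self-contained):
net = models.resnet50(weights=None).eval()
x = torch.rand(1, 3, 224, 224)
x_adv = feature_space_attack(net, net.layer3, x)
```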

Enhancing Semantic Segmentation by Learning Expertise between Confusing Classes

A. Ganeshan, G. S. Rajput, R. V. Babu

First International Workshop on Autonomous Navigation in Unconstrained Environments, ECCV 2018

Semantic Segmentation is much more challenging in the presence of multiple similar classes and high intra-class variations. Datasets such as AutoNUE model real-life scenarios and feature large intra-class appearance variations and the presence of low-shot or novel classes. In such scenarios, simple deep-learning approaches can suffer from high confusion among similar classes, and hence perform poorly. To yield improved performance on such an unconstrained dataset, it is important to clearly discern the differences between confusing classes. Hence, we propose a novel Expertise-Layer to enhance the learned model’s discerning ability.

Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence

J. N. Kundu*, A. Ganeshan*, M. V. Rahul*, R. V. Babu

Geometry Meets Deep Learning Workshop, ECCV 2018

Understanding the geometry and pose of objects in 2D images is a fundamental necessity for a wide range of real-world applications. Driven by deep neural networks, recent methods have brought significant improvements to object pose estimation. However, they suffer due to scarcity of keypoint/pose-annotated real images and hence cannot exploit the object's 3D structural information effectively. In this work, we propose a data-efficient method which utilizes the geometric regularity of intra-class objects for pose estimation. First, we learn pose-invariant local descriptors of object parts from simple 2D RGB images. These descriptors, along with keypoints obtained from renders of a fixed 3D template model, are then used to generate keypoint correspondence ma...
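
A minimal sketch of the correspondence step, assuming descriptor arrays have already been extracted from the query image and from keypoints annotated on template renders; a nearest-neighbour match with a ratio test stands in for the paper's matching procedure.

```python
import numpy as np

def keypoint_correspondences(img_desc, img_locs, render_desc, kp_ids, ratio=0.8):
    """Match image descriptors to template-render keypoint descriptors (sketch).

    img_desc    : (N, D) descriptors at N candidate locations in the query image
    img_locs    : (N, 2) pixel locations of those descriptors
    render_desc : (M, D) descriptors at annotated keypoints in the template renders
    kp_ids      : (M,)  3D-keypoint index for each render descriptor
    Returns a list of (pixel_location, keypoint_id) correspondences, filtered
    with a Lowe-style ratio test.
    """
    matches = []
    for desc, loc in zip(img_desc, img_locs):
        d = np.linalg.norm(render_desc - desc, axis=1)   # distances to all render descriptors
        first, second = np.argsort(d)[:2]
        if d[first] < ratio * d[second]:                 # keep only unambiguous matches
            matches.append((tuple(loc), int(kp_ids[first])))
    return matches
```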

Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations

K. M. Reddy*, A. Ganeshan*, R. V. Babu

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018

Machine learning models are susceptible to adversarial perturbations: small changes to the input that can cause large changes in the output. It has also been demonstrated that there exist input-agnostic perturbations, called universal adversarial perturbations, which can change the inference of a target model on most of the data samples. However, existing methods to craft universal perturbations are (i) task specific, (ii) require samples from the training data distribution, and (iii) perform complex optimizations. Additionally, because of the data dependence, the fooling ability of the crafted perturbations is proportional to the available training data. In this paper, we present a novel, generalizable, and data-free approach for crafting universal adversarial perturbation...
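
A hedged sketch of the data-free idea: a single perturbation is optimized to inflate activation magnitudes at several layers of the target network, with no data samples involved. The layer selection, optimizer, and loss here are illustrative assumptions; the published objective and its saturation handling differ in detail.

```python
import torch
import torchvision.models as models

def data_free_uap(model, layers, eps=10/255, steps=200, lr=0.01, size=(1, 3, 224, 224)):
    """Craft a universal perturbation without any training data (sketch)."""
    acts = []
    handles = [l.register_forward_hook(lambda m, i, o: acts.append(o)) for l in layers]

    delta = torch.zeros(size).uniform_(-eps, eps).requires_grad_(True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        acts.clear()
        model(delta)                               # only the perturbation is fed, no data
        loss = -sum(a.norm() for a in acts)        # maximize activations at chosen layers
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                # keep the perturbation imperceptible

    for h in handles:
        h.remove()
    return delta.detach()

# Usage with a randomly initialized VGG-16 (keeps the example self-contained):
net = models.vgg16(weights=None).eval()
for p in net.parameters():
    p.requires_grad_(False)                        # only the perturbation is optimized
uap = data_free_uap(net, [net.features[4], net.features[9], net.features[16]])
```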

iSPA-Net: Iterative Semantic Pose Alignment Network

J. N. Kundu*, A. Ganeshan*, M. V. Rahul*, R. V. Babu

ACM International Conference on Multimedia 2018

Understanding and extracting 3D information of objects from monocular 2D images is a fundamental problem in computer vision. In the task of 3D object pose estimation, recent data-driven deep neural network based approaches suffer from scarcity of real images with 3D keypoint and pose annotations. Drawing inspiration from human cognition, where annotators use a 3D CAD model as a structural reference to acquire ground-truth viewpoints for real images, we propose an iterative Semantic Pose Alignment Network, called iSPA-Net. Our approach focuses on exploiting semantic 3D structural regularity to solve the task of fine-grained pose estimation by predicting the viewpoint difference between a given pair of images. Such an image-comparison-based approach also allevia...
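
A minimal sketch of the iterative alignment loop, with a hypothetical `render_fn` for the 3D template and a `pose_diff_net` that predicts the viewpoint difference between a render and the query image; only the loop structure is taken from the description above.

```python
import numpy as np

def iterative_pose_alignment(query_image, render_fn, pose_diff_net,
                             init_pose=(0.0, 0.0, 0.0), n_iters=4):
    """Refine a pose estimate by repeated image comparison (sketch).

    render_fn     : callable pose -> rendered image of the 3D template model
                    (hypothetical placeholder)
    pose_diff_net : callable (render, query_image) -> predicted viewpoint
                    difference (d_azimuth, d_elevation, d_tilt)
    """
    pose = np.asarray(init_pose, dtype=float)
    for _ in range(n_iters):
        render = render_fn(pose)                      # view the template at the current estimate
        delta = pose_diff_net(render, query_image)    # predicted viewpoint difference
        pose = pose + np.asarray(delta, dtype=float)  # move the estimate toward the query view
    return pose
```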