
Research Overview

This section is currently in the refinery!

Past Research

Previously, I worked as a Researcher at Preferred Networks, Japan, and, before that, as a Research Assistant at the Video Analytics Lab, IISc, Bangalore, India. I have worked on a variety of projects spanning Computer Vision, Machine Learning, Audio Processing, Human-Computer Interaction, and Robotics. I have been fortunate to collaborate with and receive mentorship from some amazing people: Daniel Ritchie, R. Venkatesh Babu, Siddhartha Chaudhuri, Shin-ichi Maeda, Adrien Gaidon, Rares Ambrus, Jason Naradowsky, and Fabrice Matulic, to name a few.

Some of the central themes in my previous research:

  1. Novel methods for Computer Vision tasks (Semantic Segmentation, 3D Object Pose Estimation, etc.),
  2. Stability of deep-learned features (Adversarial Attacks),
  3. Applications of Deep Learning in Audio (source separation) and HCI (tracking in VR).

For a complete list of published papers, visit Google Scholar.

Research:

Improving Unsupervised Visual Program Inference with Code Rewriting Families

A. Ganeshan, R. Kenny Jones and Daniel Ritchie

Oral (1.8%), IEEE / CVF International Conference on Computer Vision (ICCV), 2023

Programs offer compactness and structure that make them an attractive representation for visual data. We explore how code rewriting can be used to improve systems for inferring programs from visual data. We first propose Sparse Intermittent Rewrite Injection (SIRI), a framework for unsupervised bootstrapped learning. SIRI sparsely applies code rewrite operations over a dataset of training programs, injecting the improved programs back into the training set. We design a family of rewriters for visual programming domains: Parameter Optimization (PO), Code Pruning (CP), and Code Grafting (CG). For three shape programming languages in 2D and 3D, we show that using SIRI with our family of rewriters improves performance: better reconstructions a...
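
For readers curious how such a bootstrapped loop fits together, here is a minimal sketch in Python. The `model.infer`/`model.train_on` interface, the `recon_error` metric, and the rewriter callables are hypothetical placeholders standing in for PO, CP, and CG; this illustrates the rewrite-and-inject idea rather than the paper's implementation.

```python
import random

def siri_round(model, shapes, train_set, rewriters, recon_error, rewrite_frac=0.1):
    """One round of Sparse Intermittent Rewrite Injection (sketch).

    model        : program-inference network with .infer(shape) and .train_on(...)
                   methods (hypothetical interface)
    shapes       : list of target shapes (the visual data)
    train_set    : dict mapping shape index -> best program found so far
    rewriters    : list of callables (program, shape) -> improved program
    recon_error  : callable (program, shape) -> scalar reconstruction error
    rewrite_frac : fraction of shapes rewritten this round (sparse application)
    """
    # 1. Infer fresh programs for every shape with the current model,
    #    keeping the best program seen so far for each shape.
    for i, shape in enumerate(shapes):
        prog = model.infer(shape)
        if i not in train_set or recon_error(prog, shape) < recon_error(train_set[i], shape):
            train_set[i] = prog

    # 2. Sparsely apply the rewriter family to a random subset of programs.
    subset = random.sample(range(len(shapes)), int(rewrite_frac * len(shapes)))
    for i in subset:
        prog = train_set[i]
        for rewrite in rewriters:  # e.g. parameter optimization, pruning, grafting
            candidate = rewrite(prog, shapes[i])
            if recon_error(candidate, shapes[i]) < recon_error(prog, shapes[i]):
                prog = candidate   # keep only rewrites that help
        train_set[i] = prog        # inject the improved program back

    # 3. Retrain the inference model on the (partially rewritten) training set.
    model.train_on(list(train_set.values()), shapes)
    return train_set
```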

Skill Generalization with Verbs

R. Ma, L. Lam, B. A. Spiegel, A. Ganeshan, B. Abbatematteo, R. Patel, D. Paulius, S. Tellex, G. Konidaris.

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

It is imperative that robots can understand natural language commands issued by humans. Such commands typically contain verbs that signify what action should be performed on a given object and that are applicable to many objects. We propose a method for generalizing manipulation skills to novel objects using verbs. Our method learns a probabilistic classifier that determines whether a given object trajectory can be described by a specific verb. We show that this classifier accurately generalizes to novel object categories with an average accuracy of 76.69% across 13 object categories and 14 verbs. We then perform policy search over the object kinematics to find an object trajectory that maximizes classifier prediction for a given verb. Our method allows ...
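
A minimal sketch of the search step described above, assuming a hypothetical `verb_classifier(trajectory, verb)` that returns a probability and an `object_kin` helper that turns waypoint parameters into an object trajectory; plain random search stands in here for the paper's policy search.

```python
import numpy as np

def search_trajectory(verb_classifier, object_kin, verb, n_samples=512, horizon=20, seed=0):
    """Find an object trajectory that maximizes P(verb | trajectory) (sketch).

    verb_classifier : callable (trajectory, verb) -> probability in [0, 1]
    object_kin      : callable mapping waypoint parameters of shape (horizon, 6)
                      to an object trajectory (hypothetical helper)
    verb            : the natural-language verb, e.g. "tilt" or "lift"
    """
    rng = np.random.default_rng(seed)
    best_traj, best_score = None, -np.inf

    # Simple random search over waypoint parameters; any black-box policy-search
    # or optimization routine could be substituted here.
    for _ in range(n_samples):
        params = rng.uniform(-1.0, 1.0, size=(horizon, 6))
        traj = object_kin(params)
        score = verb_classifier(traj, verb)
        if score > best_score:
            best_traj, best_score = traj, score

    return best_traj, best_score
```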

Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency

A. Ganeshan, Alexis Vallet, Yasunori Kudo, Shin-ichi Maeda, Tommi Kerola, Rares Ambrus, Dennis Park, Adrien Gaidon

IEEE / CVF International Conference on Computer Vision (ICCV), 2021

Deep learning models for semantic segmentation rely on expensive, large-scale, manually annotated datasets. Labelling is a tedious process that can take hours per image. Automatically annotating video sequences by propagating sparsely labeled frames through time is a more scalable alternative. In this work, we propose a novel label propagation method, termed Warp-Refine Propagation, that combines semantic cues with geometric cues to efficiently auto-label videos. Our method learns to refine geometrically-warped labels and infuse them with learned semantic priors in a semi-supervised setting by leveraging cycle consistency across time. We quantitatively show that our method improves label-propagation by a noteworthy margin of 13.1 mIoU on the ApolloScape d...
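
A minimal sketch of the warp-then-refine step, assuming an externally estimated backward optical flow and a hypothetical `refine_net`; the cycle-consistency training signal and the paper's actual architecture are omitted.

```python
import torch
import torch.nn.functional as F

def flow_to_grid(flow):
    """Turn a dense flow field (1, 2, H, W) into a sampling grid (1, H, W, 2)
    normalized to [-1, 1], as expected by F.grid_sample."""
    _, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).float() + flow[0].permute(1, 2, 0)
    gx = 2.0 * coords[..., 0] / (w - 1) - 1.0
    gy = 2.0 * coords[..., 1] / (h - 1) - 1.0
    return torch.stack([gx, gy], dim=-1).unsqueeze(0)

def propagate_label(label_t, frame_t1, backward_flow, refine_net):
    """Propagate a label map from annotated frame t to unlabeled frame t+1 (sketch).

    label_t       : soft/one-hot label map of the annotated frame, (1, C, H, W)
    frame_t1      : RGB tensor of the target frame, (1, 3, H, W)
    backward_flow : flow from frame t+1 back to frame t, (1, 2, H, W),
                    e.g. from any off-the-shelf optical-flow network
    refine_net    : hypothetical network fusing warped labels with semantic cues
    """
    # Geometric cue: warp the annotated labels into the target frame.
    warped = F.grid_sample(label_t, flow_to_grid(backward_flow),
                           mode="bilinear", align_corners=True)
    # Semantic cue: let a refinement network correct warping artefacts
    # (occlusions, motion errors) using the appearance of the target frame.
    refined = refine_net(torch.cat([warped, frame_t1], dim=1))
    return refined.argmax(dim=1)   # hard pseudo-label for self-training

# Toy usage with a dummy refinement network and 19 semantic classes:
refine = torch.nn.Conv2d(19 + 3, 19, kernel_size=3, padding=1)
pseudo = propagate_label(torch.rand(1, 19, 64, 128), torch.rand(1, 3, 64, 128),
                         torch.zeros(1, 2, 64, 128), refine)
```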

Phonetroller: Visual Representations of Fingers for Precise Touch Input with Mobile Phones in VR

Fabrice Matulic, A. Ganeshan, Hiroshi Fujiwara, Daniel Vogel

ACM Conference on Human Factors in Computing Systems (CHI), 2021

Smartphone touch screens are potentially attractive for interaction in virtual reality (VR). However, the user cannot see the phone or their hands in a fully immersive VR setting, impeding their ability for precise touch input. We propose mounting a mirror above the phone screen such that the front-facing camera captures the thumbs on or near the screen. This enables the creation of semi-transparent overlays of thumb shadows and inference of fingertip hover points with deep learning, which help the user aim for targets on the phone. A study compares the effect of visual feedback on touch precision in a controlled task and qualitatively evaluates three example applications demonstrating the potential of the technique. The results show that the enabled styl...

Meta-Learning Extractors for Music Source Separation

David Samuel, A. Ganeshan, Jason Naradowsky

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

We propose a hierarchical meta-learning-inspired model for music source separation (Meta-TasNet) in which a generator model is used to predict the weights of individual extractor models. This enables efficient parameter-sharing, while still allowing for instrument-specific parameterization. Meta-TasNet is shown to be more effective than models trained independently or in a multi-task setting, and achieves performance comparable to state-of-the-art methods. In comparison to the latter, our extractors contain fewer parameters and have faster run-time performance. We discuss important architectural considerations, and explore the costs and benefits of this approach.
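
A toy sketch of the weight-generation idea: a shared generator maps an instrument embedding to the parameters of an instrument-specific extractor layer. The class name and dimensions are hypothetical, and the real Meta-TasNet extractors are full separation networks rather than a single linear layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGeneratingExtractor(nn.Module):
    """Toy illustration of a generator predicting extractor weights.

    A learned embedding per instrument is mapped by `generator` to the weight
    matrix of a linear "extractor", so instruments share the generator's
    parameters while keeping instrument-specific behaviour.
    """

    def __init__(self, n_instruments=4, feat_dim=128, emb_dim=32):
        super().__init__()
        self.embeddings = nn.Embedding(n_instruments, emb_dim)
        self.generator = nn.Sequential(
            nn.Linear(emb_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim * feat_dim),
        )
        self.feat_dim = feat_dim

    def forward(self, mixture_features, instrument_id):
        # mixture_features: (batch, time, feat_dim)
        emb = self.embeddings(instrument_id)                        # (emb_dim,)
        weight = self.generator(emb).view(self.feat_dim, self.feat_dim)
        # Apply the generated weights as the extractor for this instrument.
        return F.linear(mixture_features, weight)

model = WeightGeneratingExtractor()
mix = torch.randn(2, 100, 128)
drums = model(mix, torch.tensor(0))    # instrument 0, e.g. drums
vocals = model(mix, torch.tensor(3))   # instrument 3, e.g. vocals
```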

FDA: Feature Disruptive Attack

A. Ganeshan, B. S. Vivek, R. V. Babu

IEEE / CVF International Conference on Computer Vision (ICCV), 2019

Though Deep Neural Networks (DNNs) show excellent performance across various computer vision tasks, several works show their vulnerability to adversarial samples, i.e., image samples with imperceptible noise engineered to manipulate the network's prediction. Adversarial sample generation methods range from simple to complex optimization techniques. The majority of these methods generate adversaries through optimization objectives that are tied to the pre-softmax or softmax output of the network. In this work, we (i) show the drawbacks of such attacks, (ii) propose two new evaluation metrics, Old Label New Rank (OLNR) and New Label Old Rank (NLOR), in order to quantify the extent of damage made by an attack, and (iii) propose a new adversarial attack, FDA: Feature...
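
To make the contrast with softmax-tied attacks concrete, here is a hedged sketch of a feature-space attack: the input is perturbed to push an intermediate layer's activations away from their clean values under an L-infinity budget. The layer choice and the simple L2 objective are illustrative assumptions, not the exact FDA objective.

```python
import torch
import torchvision.models as models

def feature_space_attack(model, layer, image, eps=8/255, steps=10, step_size=2/255):
    """Iteratively perturb `image` to disrupt the features at `layer` (sketch)."""
    feats = {}
    handle = layer.register_forward_hook(lambda m, i, o: feats.__setitem__("out", o))

    with torch.no_grad():
        model(image)
        clean_feat = feats["out"].detach()          # reference activations

    adv = image.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        model(adv)
        # Objective on intermediate features rather than the (pre-)softmax output.
        loss = (feats["out"] - clean_feat).pow(2).mean()
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + step_size * grad.sign()              # ascend the feature loss
            adv = image + (adv - image).clamp(-eps, eps)     # stay in the L-inf ball
            adv = adv.clamp(0, 1)

    handle.remove()
    return adv.detach()

# Usage with a randomly initialized ResNet-50 (keeps the example self-contained):
net = models.resnet50(weights=None).eval()
x = torch.rand(1, 3, 224, 224)
x_adv = feature_space_attack(net, net.layer3, x)
```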

Enhancing Semantic Segmentation by Learning Expertise between Confusing Classes

A. Ganeshan, G. S. Rajput, R. V. Babu

First International Workshop on Autonomous Navigation in Unconstrained Environments, ECCV 2018

Semantic Segmentation is much more challenging in the presence of multiple similar classes and high intra-class variations. Datasets such as AutoNUE model real-life scenarios and feature large intra-class appearance variations and the presence of low-shot or novel classes. In such scenarios, simple deep-learning approaches can suffer from high confusion among similar classes, and hence perform poorly. To yield improved performance on such an unconstrained dataset, it is important to clearly discern the differences between confusing classes. Hence, we propose a novel Expertise-Layer to enhance the learned model’s discerning ability.

Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence

J. N. Kundu*, A. Ganeshan*, M. V. Rahul*, R. V. Babu

Geometry Meets Deep Learning Workshop, ECCV 2018

Understanding the geometry and pose of objects in 2D images is a fundamental necessity for a wide range of real-world applications. Driven by deep neural networks, recent methods have brought significant improvements to object pose estimation. However, they suffer due to scarcity of keypoint/pose-annotated real images and hence cannot exploit the object's 3D structural information effectively. In this work, we propose a data-efficient method which utilizes the geometric regularity of intra-class objects for pose estimation. First, we learn pose-invariant local descriptors of object parts from simple 2D RGB images. These descriptors, along with keypoints obtained from renders of a fixed 3D template model, are then used to generate keypoint correspondence ma...
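
A minimal sketch of the correspondence step, assuming descriptor arrays have already been extracted from the query image and from keypoints annotated on template renders; a nearest-neighbour match with a ratio test stands in for the paper's matching procedure.

```python
import numpy as np

def keypoint_correspondences(img_desc, img_locs, render_desc, kp_ids, ratio=0.8):
    """Match image descriptors to template-render keypoint descriptors (sketch).

    img_desc    : (N, D) descriptors at N candidate locations in the query image
    img_locs    : (N, 2) pixel locations of those descriptors
    render_desc : (M, D) descriptors at annotated keypoints in the template renders
    kp_ids      : (M,)  3D-keypoint index for each render descriptor
    Returns a list of (pixel_location, keypoint_id) correspondences, filtered
    with a Lowe-style ratio test.
    """
    matches = []
    for desc, loc in zip(img_desc, img_locs):
        d = np.linalg.norm(render_desc - desc, axis=1)   # distances to all render descriptors
        first, second = np.argsort(d)[:2]
        if d[first] < ratio * d[second]:                 # keep only unambiguous matches
            matches.append((tuple(loc), int(kp_ids[first])))
    return matches
```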

Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations

K. M. Reddy*, A. Ganeshan*, R. V. Babu

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018

Machine learning models are susceptible to adversarial perturbations: small changes to the input that can cause large changes in the output. It has also been demonstrated that there exist input-agnostic perturbations, called universal adversarial perturbations, which can change the inference of a target model on most of the data samples. However, existing methods to craft universal perturbations are (i) task specific, (ii) require samples from the training data distribution, and (iii) perform complex optimizations. Additionally, because of the data dependence, the fooling ability of the crafted perturbations is proportional to the available training data. In this paper, we present a novel, generalizable, and data-free approach for crafting universal adversarial perturbation...
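
A hedged sketch of the data-free idea: a single perturbation is optimized to inflate activation magnitudes at several layers of the target network, with no data samples involved. The layer selection, optimizer, and loss here are illustrative assumptions; the published objective and its saturation handling differ in detail.

```python
import torch
import torchvision.models as models

def data_free_uap(model, layers, eps=10/255, steps=200, lr=0.01, size=(1, 3, 224, 224)):
    """Craft a universal perturbation without any training data (sketch)."""
    acts = []
    handles = [l.register_forward_hook(lambda m, i, o: acts.append(o)) for l in layers]

    delta = torch.zeros(size).uniform_(-eps, eps).requires_grad_(True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        acts.clear()
        model(delta)                               # only the perturbation is fed, no data
        loss = -sum(a.norm() for a in acts)        # maximize activations at chosen layers
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                # keep the perturbation imperceptible

    for h in handles:
        h.remove()
    return delta.detach()

# Usage with a randomly initialized VGG-16 (keeps the example self-contained):
net = models.vgg16(weights=None).eval()
for p in net.parameters():
    p.requires_grad_(False)                        # only the perturbation is optimized
uap = data_free_uap(net, [net.features[4], net.features[9], net.features[16]])
```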

iSPA-Net: Iterative Semantic Pose Alignment Network

J. N. Kundu*, A. Ganeshan*, M. V. Rahul*, R. V. Babu

ACM International Conference on Multimedia 2018

Understanding and extracting 3D information of objects from monocular 2D images is a fundamental problem in computer vision. In the task of 3D object pose estimation, recent data-driven deep neural network based approaches suffer from scarcity of real images with 3D keypoint and pose annotations. Drawing inspiration from human cognition, where annotators use a 3D CAD model as a structural reference to acquire ground-truth viewpoints for real images, we propose an iterative Semantic Pose Alignment Network, called iSPA-Net. Our approach focuses on exploiting semantic 3D structural regularity to solve the task of fine-grained pose estimation by predicting the viewpoint difference between a given pair of images. Such an image-comparison-based approach also allevia...
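
A minimal sketch of the iterative alignment loop, with a hypothetical `render_fn` for the 3D template and a `pose_diff_net` that predicts the viewpoint difference between a render and the query image; only the loop structure is taken from the description above.

```python
import numpy as np

def iterative_pose_alignment(query_image, render_fn, pose_diff_net,
                             init_pose=(0.0, 0.0, 0.0), n_iters=4):
    """Refine a pose estimate by repeated image comparison (sketch).

    render_fn     : callable pose -> rendered image of the 3D template model
                    (hypothetical placeholder)
    pose_diff_net : callable (render, query_image) -> predicted viewpoint
                    difference (d_azimuth, d_elevation, d_tilt)
    """
    pose = np.asarray(init_pose, dtype=float)
    for _ in range(n_iters):
        render = render_fn(pose)                      # view the template at the current estimate
        delta = pose_diff_net(render, query_image)    # predicted viewpoint difference
        pose = pose + np.asarray(delta, dtype=float)  # move the estimate toward the query view
    return pose
```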