Dhruti Shah

Hi, I'm Dhruti! I am a Machine Learning Engineer at Apple in Zurich, where I work on multimodal foundation models in Apple's AI/ML group.

I obtained my Master's in Data Science from École polytechnique fédérale de Lausanne (EPFL). I did my Master's thesis at IBM Research Zurich, in collaboration with Prof. Rüdiger Urbanke at EPFL.

I completed my Bachelor's + Master's Dual Degree in Electrical Engineering at the Indian Institute of Technology Bombay (IITB), with a specialization in Communication and Signal Processing, in 2019. I was awarded the Shankar Dayal Sharma Institute Gold Medal for general proficiency, excellence in academic performance, extracurricular activities and social service among all graduating students.

I've had the pleasure of working at:

Apple , AI/ML, Zurich.
IBM Research, Zurich in the AI Automation team with Cristiano Malossi and Florian Scheidegger.
Computer Vision Lab, EPFL with Sena Kıcıroğlu, Pascal Fua.
Information Processing Group, EPFL with Prof Rudiger Urbanke.
Signal Processing Lab, IITB with Prof Nikhil Karamchandani.
Qualcomm, India.

I led the 150-athlete IITB sports contingent to victory at the coveted Inter IIT Sports Meet. I love playing tennis and badminton, and I am an avid reader.

Email / CV / LinkedIn

Key Research Projects

	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning [Paper], Apple Zurich, May 2022 - Current. Core author. Led the supervised fine-tuning efforts for multi-image reasoning and in-context learning, and developed efficient data pipelines for scalable model training.
	MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [Paper], ECCV 2024, Apple Zurich, May 2022 - Current. Core author. Led the implementation and benchmarking of the interleaved image-text supervised fine-tuning (SFT) pipeline for MM1. Currently working on core capabilities for in-house multimodal foundation models.
	Interactive Fast Annotation Method for Machine Learning pipelines, Master's thesis, IBM Research Zurich, August 2021 - January 2022. In recent years, deep learning methods have achieved remarkable performance on a wide range of computer vision tasks, thanks to the availability of large, well-curated datasets (e.g., ImageNet, COCO). However, in many application scenarios the quantity or quality of annotations is limited, which creates a need for tools that annotate or re-annotate application-specific data. Existing annotation tools have several limitations, including difficulty handling high-resolution images and annotating rare application data, such as defects on concrete surfaces. To address these issues, we worked towards developing an auto-annotation method.
	Improved Image Stitching for Defect and Anomaly Detection, Research Intern, IBM Research Zurich [Patent], July 2020 - January 2021. Current image stitching algorithms suffer from poor scalability and low speed, and they distort the component images to produce a smooth output. To address these challenges, we developed an OpenCV-based image stitching method for large numbers of planar images captured by a drone. Our algorithm runs on the GPU, providing a 30x speed-up and the ability to stitch more than 100 images in under 5 minutes. We integrated our method with state-of-the-art defect detection methods to localize defects on high-end civil engineering infrastructure, and we used image registration techniques to study how defects evolve over time.
	Long-term motion prediction using keyposes, Semester Project, CV Lab, EPFL (publication in progress), February 2021 - June 2021. Human motion forecasting can be tackled by decomposing the input sequence into a few essential 'keyposes' and predicting over these keyposes. Current work determines the keyposes with traditional k-means clustering and performs sequence prediction with RNN-based architectures. Since transformers are becoming the de facto model for sequence prediction in natural language processing and vision, we study the improvement obtained by replacing the LSTMs with a transformer architecture. Additionally, we explore VQ-VAE-based models to obtain a better set of keyposes. [Report]
	Top-m entity resolution, Dual Degree thesis, Prof. Nikhil Karamchandani, IITB (AAAI 2020), January 2018 - June 2019. We developed information-theoretic bounds and algorithms to identify the top clusters for entity resolution in the presence of an oracle. We considered two cases: one with a noisy oracle and one with a noisy side-information matrix. We provided a theoretical proof and a supporting empirical study (on the Amazon Purchase Dataset) showing that our algorithm reduces the query complexity from O(n^2) to O(n log n) in both cases. AAAI 2020 Paper, NCC 2020 Paper
	Feature enhancement for flash memory communication, Summer Internship, Qualcomm, May 2017 - July 2017. Enhanced the primary tool responsible for sending operating system (OS) images from the source to flash memory on the target, which enables loading and booting the OS on the target. Reduced the time required by 50% by compressing sparse files and sending smaller data chunks. Improved partition handling for NAND targets by integrating with existing GPT partition tables. Tested these enhancements on Qualcomm devices before real-world deployment.
