Dhruti Shah
Hi, I'm Dhruti! I am a Machine Learning Engineer at Apple in Zurich. I work on multimodal foundation models at Apple AI/ML.
I obtained my Master's in Data Science at École polytechnique fédérale de Lausanne (EPFL). I did my Master's thesis at IBM Research Zurich, in collaboration with Prof. Rudiger Urbanke at EPFL.
I completed my Bachelor's + Master's Dual Degree in Electrical Engineering at the Indian Institute of Technology, Bombay (IITB), with a specialization in Communication and Signal Processing, in 2014. I was awarded the Shankar Dayal Sharma Institute Gold Medal for general proficiency, excellence in academic performance, extracurricular activities and social service among all graduating students.
I led the IITB sports contingent of 150 athletes to win the coveted Inter IIT Sports Meet. I love to play tennis, badminton and chess, and I am an avid reader.
Email / CV / LinkedIn
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
[Paper]
Apple Zurich
May 2022 - Current
Core author on MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training. Led the implementation and benchmarking of the image-text interleaved supervised fine-tuning pipeline for MM1. Currently working on core capabilities for in-house multimodal foundation models.
Interactive Fast Annotation Method for Machine Learning pipelines
Master's thesis, IBM Research Zurich
August 2021 - January 2022
Deep learning methods have recently achieved remarkable performance on various computer vision tasks, thanks to the availability of large, well-curated datasets (e.g. ImageNet, COCO). However, there are still many application scenarios where either the amount or the quality of annotations is limited. Novel use cases therefore need a way to annotate or re-annotate application-specific data. Existing annotation tools have several limitations, including difficulty handling high-resolution images and annotating rare application data, such as defects on concrete surfaces. To address these issues, we are working towards developing a method for auto-annotation.
Improved Image Stitching for Defect and Anomaly Detection
Research Intern, IBM Research Zurich
[Patent]
July 2020 - January 2021
Current image stitching algorithms scale poorly, run slowly, and distort component images to produce a smooth output. To address these challenges, we developed an image stitching method, using OpenCV, for a large number of planar images captured by a drone. Our algorithm runs on the GPU, providing a speed-up of 30x, and can stitch more than 100 images in under 5 minutes. Our method was integrated with state-of-the-art defect detection methods to localize defects on high-end civil engineering infrastructure. Further, we used image registration techniques to study the evolution of defects over time.
Long-term motion prediction using keyposes
Semester Project, CV Lab, EPFL
(Publication in progress)
February 2021 - June 2021
Human pose motion forecasting can be tackled by decomposing the input sequence into a few essential 'keyposes' and performing prediction over these keyposes. Current works determine the keyposes using traditional k-means clustering and perform sequence prediction using RNN-based architectures. In natural language processing and vision, transformers are becoming the de facto model for sequence prediction. We therefore study the improvement obtained by replacing the LSTMs with the transformer architecture. Additionally, we explore the use of VQ-VAE based models to obtain a better set of keyposes.
Report
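As a rough illustration of the k-means baseline mentioned above, here is a minimal sketch, not the project's actual code. It assumes each pose is a flattened list of joint coordinates; the function names `kmeans_keyposes` and `to_keypose_sequence` and the first-k initialization are hypothetical choices made for this example.

```python
import math


def kmeans_keyposes(poses, k, iterations=20):
    """Cluster pose vectors with plain k-means and return (centroids, labels).

    poses: list of equal-length lists of floats (flattened joint coordinates).
    Initialization uses the first k poses (an assumption for determinism).
    """
    centroids = [list(p) for p in poses[:k]]
    labels = [0] * len(poses)
    for _ in range(iterations):
        # Assignment step: attach every pose to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(pose, centroids[c]))
            for pose in poses
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(poses, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def to_keypose_sequence(labels):
    """Collapse consecutive repeated labels into a sequence of keypose ids."""
    sequence = []
    for lab in labels:
        if not sequence or sequence[-1] != lab:
            sequence.append(lab)
    return sequence
```

The resulting sequence of keypose ids is what a sequence model (an LSTM or a transformer) would then be trained to predict.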
Top-m entity resolution
Dual Degree thesis, Prof Nikhil Karamchandani, IITB
(AAAI 2020)
January 2018 - June 2019
We developed information-theoretic bounds and algorithms to identify the top clusters for entity resolution in the presence of an oracle. We considered two cases: one with a noisy oracle and one with a noisy side-information matrix. We provided a theoretical proof and a supporting empirical study (on the Amazon Purchase Dataset) showing that our algorithm reduces the query complexity from O(n^2) to O(n log n) in both cases.
AAAI 2020 Paper, NCC 2020 Paper
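To give a feel for the gap between the two growth rates, the short sketch below compares n^2 with n log n for a few values of n. It ignores constant factors and assumes a base-2 logarithm, so the numbers are orders of magnitude only, not the query counts of the actual algorithm; `query_counts` is a hypothetical helper.

```python
import math


def query_counts(n):
    """Return (n^2, n * log2(n)) for n items, with constant factors ignored."""
    return n * n, n * math.log2(n)


for n in (1_000, 100_000):
    quadratic, loglinear = query_counts(n)
    ratio = quadratic / loglinear
    print(f"n={n}: n^2={quadratic:.2e}, n log n={loglinear:.2e}, ratio={ratio:.0f}x")
```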
Feature enhancement for flash memory communication
Summer Internship, Qualcomm
May 2017 - July 2017
Modified and enhanced the primary tool responsible for sending Operating System (OS) images from the source to flash memory on the target, thereby loading and booting the OS on the target. Reduced transfer time by 50% through compression of sparse files and sending smaller data chunks. Innovated the handling of partitions for NAND targets through integration with existing GPT partition tables. Tested these enhancements on Qualcomm devices before real-world deployment.
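The general idea behind sending a sparse image in smaller chunks can be sketched as follows. This is an illustrative assumption about how such a scheme might work, not the tool's actual format or protocol: `pack_sparse`, `unpack_sparse` and the 4096-byte default chunk size are hypothetical.

```python
def pack_sparse(image: bytes, chunk_size: int = 4096):
    """Split an image into chunks and keep only those with non-zero bytes.

    Returns (total_length, [(offset, chunk_bytes), ...]).
    """
    chunks = []
    for offset in range(0, len(image), chunk_size):
        chunk = image[offset:offset + chunk_size]
        if any(chunk):  # all-zero chunks are skipped, not transmitted
            chunks.append((offset, chunk))
    return len(image), chunks


def unpack_sparse(total_length: int, chunks) -> bytes:
    """Rebuild the image on the receiving side; skipped regions stay zero."""
    image = bytearray(total_length)
    for offset, chunk in chunks:
        image[offset:offset + len(chunk)] = chunk
    return bytes(image)
```

Only the non-zero chunks and their offsets need to cross the link, so an image that is mostly empty space transfers far fewer bytes than its nominal size.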