Manasi Sharma

Hi there! I'm a Research Scientist in the Artificial Intelligence Group (Group 1) at MIT Lincoln Laboratory, where I focus on explainability & trustworthiness research and on developing robust evaluation tools for high-stakes applications of Large Language Models (LLMs) [see "Professional / Research Experience" for more details]. Before that, I earned my Master's in Computer Science (Artificial Intelligence track) from Stanford University and my undergraduate degree from Columbia University, where I studied Computer Science and Physics. I consider myself a generalist, passionate about applying AI to a diverse range of real-world challenges—from astronomy to autonomous vehicles. My current work is driven by an interest in harnessing generative models and AI agents for innovative applications while ensuring they are reliable and resilient.

Research: I am fortunate to have done research in the ILIAD Lab under Prof. Dorsa Sadigh, where I worked on robotic learning using language-conditioned diffusion models, and in the Stanford Vision Lab under Prof. Fei-Fei Li and Prof. Jiajun Wu, where I worked on assembling action and goal labels for demonstrations for the BEHAVIOR project in robotic simulation benchmarking.

Side Projects: I have recently become interested in code generation & built an agentic application to convert code between programming languages (code is not publicly available). I also built an end-to-end application for A/B testing social media marketing materials (images & text) that are generated using GPT-4 and DALL-E and displayed on a Streamlit interface (link).

Teaching: I have also been a TA for some of the most popular courses at Stanford, including Prof. Chris Manning's Natural Language Processing course (see my Python tutorial on YouTube), Prof. Fei-Fei Li's Computer Vision course and Prof. Andrew Ng's Deep Learning course.

Internships: During the summer of 2022, I interned on the Autonomous Vehicles team at Nissan, building a LiDAR-only classification system for road objects (including vegetation, trucks, cars and cyclists). The classification system was deployed in Nissan's AV fleet that winter.

Undergraduate Experiences: As an undergrad, I was supervised by Prof. Daniel Hsu and Prof. Zoltan Haiman on interpreting astrophysical deep learning models for weak lensing using an array of saliency map methods. Before that, in 2019, I first developed my passion for AI at Caltech under Prof. Mansi Kasliwal, creating a real/bogus image classifier for the Gattini-IR telescope, which was later integrated into its data processing pipeline.

Areas of Experience: Deep Learning, Natural Language Processing, Generative AI, Agents, Reinforcement Learning & Decision Making, Computer Vision, Diffusion Models, Explainability, Robotics, & Graph Neural Networks.

Everything else: Outside of work, I love dancing (I was Captain of the Columbia Raas dance team), playing the keyboard and exploring the natural landscapes in the Bay Area!

Email  /  CV  /  Github  /  Google Scholar  /  LinkedIn  /  Twitter

Education


Stanford University, School of Engineering

Master's in Computer Science
Graduated in June 2023
Advisors: Dorsa Sadigh (ILIAD Lab), Fei-Fei Li, Jiajun Wu (Stanford Vision Lab)

Columbia University, Columbia College

BA in Computer Science (major) and Physics (minor)
Graduated in May 2021
Advisors: Daniel Hsu, Zoltan Haiman

Professional / Research Experience


MIT Lincoln Laboratory

[August 2023 -- present]

  • Spearheaded an AI project on exploring Large Language Model (LLM) explainability & trust (ICML workshop '24) & built an open-source framework for testing LLM agents (llm-sandbox).
  • Architected an internal tool for efficient evaluation of a Retrieval Augmented Generation (RAG) system with over 15 distinct metrics & assessed it on referencing past recordings in a speech-to-text application, achieving 92% accuracy.
  • Conducted research on the feasibility of LLMs in high-stakes decision-making, as part of a multi-year collaboration between the Lab and MIT Profs. Philip Isola & Jacob Andreas.
  • Launched 4 significant software updates to evaluation metrics in an open-source AI suite (maite), increasing internal use by 18%.
  • Co-leading a key project on adversarial attacks targeting overreliance in interactions with coding agents.

    Stanford University, ILIAD Lab

    [Oct 2022 -- Jun 2023]

  • Co-leading a project on creating a framework for the use of diffusion models for trajectory generation (producing the next action a robot should take) conditioned on a language instruction input (e.g., "go left"), in a shared autonomy setup (the output is a combination of the robot's and the human's actions); see the sketch after this list.
  • Working on another project using LLMs as zero-shot labelers of patterns in trajectory data (e.g., directions or curvature).
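
As a rough illustration of the setup described above, here is a minimal, hypothetical sketch (not the lab's actual code): a small denoiser conditioned on a language embedding proposes a robot action via a heavily simplified reverse-diffusion loop, and the final command blends the robot's and human's actions. All module names, dimensions, and the blending weight are illustrative assumptions.

```python
# Hypothetical sketch of language-conditioned diffusion for shared autonomy;
# names, dimensions, and the blending weight `alpha` are illustrative.
import torch
import torch.nn as nn

class LanguageConditionedDenoiser(nn.Module):
    """Predicts the noise on a candidate action, conditioned on a language embedding."""
    def __init__(self, action_dim=2, lang_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + lang_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, action, lang_emb):
        return self.net(torch.cat([action, lang_emb], dim=-1))

def sample_action(denoiser, lang_emb, steps=10, action_dim=2):
    """Heavily simplified reverse-diffusion loop: start from noise, iteratively denoise."""
    action = torch.randn(1, action_dim)
    for _ in range(steps):
        action = action - 0.1 * denoiser(action, lang_emb)  # crude denoising step
    return action

denoiser = LanguageConditionedDenoiser()
lang_emb = torch.randn(1, 16)              # stand-in for an embedded instruction, e.g. "go left"
robot_action = sample_action(denoiser, lang_emb)
human_action = torch.tensor([[0.5, 0.0]])  # e.g., from a joystick
alpha = 0.7                                # shared-autonomy blending weight (illustrative)
command = alpha * robot_action + (1 - alpha) * human_action
```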

    Pear VC

    [Oct 2022 -- Jun 2023]

  • Member of the Pear Garage cohort (one of 25 entrepreneurial engineering students who work on meaningful problems and build products to solve them); attended many networking and build sessions with VCs and investors in the generative AI space.

    Nissan-Renault-Mitsubishi: Alliance Innovation Laboratory (AIL-SV)

    [June 2022 -- Sep 2022]

  • Internship on the Autonomous Vehicles team at Nissan.
  • Implemented LiDAR point-cloud classification using the 'SimpleView' supervised learning algorithm, which projects a 3D point cloud onto six 2D views and passes them through a CNN (ResNet backbone); see the sketch after this list.
  • The model classified cars, pedestrians, cyclists and vegetation with over 95% accuracy on real-world data, within the 10 Hz LiDAR processing window.
  • Experimented with a number of architectures, including graph-based methods like Grid-GCN and point-wise MLP methods like PointNet++, before settling on the lightweight convolution-based SimpleView.
  • LiDAR point-cloud classification is exceedingly difficult and I encountered numerous challenges; I ultimately improved model performance and speed by sparsifying the network, using saliency-based weight-freezing, and training on hard-negative samples.
  • The model has been deployed in Nissan Autonomous Vehicles since Winter '22.
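
For the curious, below is a minimal, hypothetical sketch of the SimpleView idea (project the point cloud onto six axis-aligned depth images, then classify with a shared CNN), not Nissan's production system; the resolution, network size, and class list are illustrative stand-ins.

```python
# Hypothetical SimpleView-style sketch: 6 orthographic depth views + shared CNN.
import torch
import torch.nn as nn

def project_views(points, res=32):
    """Render an (N, 3) point cloud in [-1, 1]^3 into 6 axis-aligned depth images."""
    views = []
    for axis in range(3):                          # x, y, z
        for sign in (1.0, -1.0):                   # front and back of each axis
            depth = sign * points[:, axis]
            plane = [i for i in range(3) if i != axis]
            # map in-plane coordinates from [-1, 1] to pixel indices
            uv = ((points[:, plane] + 1) / 2 * (res - 1)).long().clamp(0, res - 1)
            img = torch.full((res, res), -1.0)
            img[uv[:, 0], uv[:, 1]] = depth        # approximate z-buffer (last write wins)
            views.append(img)
    return torch.stack(views)                      # (6, res, res)

class SimpleViewClassifier(nn.Module):
    """Tiny stand-in for the ResNet backbone: one small CNN shared across all views."""
    def __init__(self, n_classes=4):               # e.g., car, pedestrian, cyclist, vegetation
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(6 * 16, n_classes)   # fuse per-view features

    def forward(self, views):                      # views: (6, res, res)
        feats = self.cnn(views.unsqueeze(1))       # per-view features, (6, 16, 1, 1)
        return self.head(feats.flatten().unsqueeze(0))

points = torch.rand(1024, 3) * 2 - 1               # toy point cloud
logits = SimpleViewClassifier()(project_views(points))
```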

    Stanford University, Stanford Vision Laboratory

    [Oct 2021 -- June 2022]

    BEHAVIOR Project:
  • Led the development of the Knowledgebase for iGibson and BEHAVIOR-1K, an ImageNet-scale robotic simulation benchmark with a specific focus on human-relevant design. The paper was accepted to the Conference on Robot Learning (CoRL) '22 and presented as a talk at ECCV '22.
  • Mobilized ~20 crowd-workers to categorize ~5,000 "how-to" articles and used zero-shot Natural Language Processing techniques with GPT-3 to annotate the Virtual Reality videos and generate activity definitions in a predicate logic-based language at >97% quality.
    ADDA (Attention-Driven Data Augmentation):
  • Oversaw a 5-person team project on a 'Modulated Attention Dropout' technique that allows for better generalization of RL policies through task-importance-aware dataset augmentation. Results showed a 2% improvement over baseline Behavioral Cloning results.

    Columbia University, Data Science Institute

    [Sep 2019 -- June 2021]
  • Under the guidance of Profs. Daniel Hsu and Zoltan Haiman, worked on the "explainability" & trustworthiness of neural networks in the more traditional field of astronomy.
  • Discovered that 89% of the output of a popular neural network that uses gravitational lensing maps to predict cosmological parameters (omega_m and sigma_8) was counterintuitively attributable to negative image regions (voids, black holes, etc.) as opposed to the bright image regions (stars, galaxies, etc.). Published results in Physical Review D '20.
  • Also ran sanity checks on a number of popular saliency methods and found that gradient-based methods (e.g., Grad-CAM, Input x Gradients, etc.) were most robust to model parameters.

    California Institute of Technology, Division of Physics, Mathematics and Astronomy

    [Jun 2019 -- Aug 2019]
  • Under the guidance of Prof. Mansi Kasliwal at Caltech, pioneered the development of a flagship CNN-based real/bogus image classification system for Caltech's Gattini-IR Telescope using TensorFlow (link), which achieved ~97.5% accuracy on thousands of cosmic transient sources, and published results in PASP '20.
  • The model performed well enough to be deployed in the telescope's data processing pipeline (still active), replacing the manual classification process.
  • I also used the model's outputs to identify high-confidence transient sources, performing optical follow-up by operating the 200-inch telescope at the Palomar Observatory in Southern California.

    Columbia University, Department of Physics

    [Jun 2018 -- May 2019]
  • Worked with Prof. Charles Hailey in the NuSTAR Group as a Laidlaw Research Intern.
  • Modeled/analyzed data from NASA's NuSTAR telescope for 'AM Her' & 'HU Aqr' sources to determine key parameters such as temperature and periodicity.

Publications

    Why Would You Suggest That? Human Trust in Language Model Responses
    Manasi Sharma, Ho Chit Siu, Rohan Paleja, Jaime D. Peña
    ICML Humans, Algorithmic Decision-Making and Society Workshop, 2024
    Paper link

    The emergence of Large Language Models (LLMs) has revealed a growing need for human-AI collaboration, especially in creative decision making scenarios where trust and reliance are paramount. Through human studies and model evaluations on the open-ended News Headline Generation task from the LaMP benchmark, we analyze how the framing and presence of explanations affect user trust and model performance. Overall, we provide evidence that adding an explanation in the model response to justify its reasoning significantly increases self-reported user trust in the model when the user has the opportunity to compare various responses. Position and faithfulness of these explanations are also important factors. However, these gains disappear when users are shown responses independently, suggesting that humans trust all model responses, including deceptive ones, equitably when they are shown in isolation. Our findings urge future research to delve deeper into the nuanced evaluation of trust in human-machine teaming systems.

    Exploring and Improving the Spatial Reasoning Abilities of Large Language Models
    Manasi Sharma
    NeurIPS Instruction Tuning and Following Workshop, 2023
    Paper link

    Large Language Models (LLMs) represent formidable tools for sequence modeling, boasting an innate capacity for general pattern recognition. Nevertheless, their broader spatial reasoning capabilities remain insufficiently explored. In this paper, we investigate the zero-shot performance of LLMs when confronted with a limited dataset comprising 3D robotic trajectory data and associated tasks, such as directional and motion labeling. Additionally, we introduce a novel prefix-based prompting mechanism, which yields a 30% improvement on the 3D trajectory data and an increase of up to 16% on SpartQA tasks when contrasted with the conventional vanilla prompt baseline (with gains over Chain-of-Thought prompting as well). The experimentation with 3D trajectory data offers an intriguing glimpse into the manner in which LLMs engage with numerical and spatial information, thus laying a solid foundation for the identification of target areas for future enhancements.

    BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation
    Chengshu Li, Cem Gokmen, ..., Manasi Sharma, ...
    Conference on Robot Learning (CoRL), 2022
    Paper link

    We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics, motivated by the results of an extensive survey on "what do you want robots to do for you?". It includes the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 3,000 objects annotated with physical and semantic properties. It also includes OmniGibson, a novel simulator that supports these activities via realistic physics simulation and rendering of rigid bodies, deformable bodies, and liquids.

    Interpreting deep learning models for weak lensing
    Jose Manuel Zorrilla Matilla, Manasi Sharma, Daniel Hsu, Zoltan Haiman
    Physical Review D (American Physical Society), 2020
    Paper link

    Deep neural networks (DNNs) are powerful algorithms that have been proven capable of extracting non-Gaussian information from weak lensing (WL) datasets. We apply a series of well-established saliency methods to interpret the DNN and find that the most relevant pixels are those with extreme K values. For noiseless maps, regions with negative K account for 69%-89% of the attribution of the DNN output, defined as the square of the saliency in input space. In the presence of shape noise, the attribution concentrates in high-convergence regions, with 36%-68% of the attribution in regions with K values above the 3rd standard deviation.
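
    For reference, my own rendering of the attribution measure quoted above (not the paper's notation): if the saliency of pixel i is taken to be the input gradient, the attribution is its square,

```latex
A_i(x) = \left( \frac{\partial f(x)}{\partial x_i} \right)^{2}
```

    where f is the DNN output and x_i the value of pixel i; the quoted percentages sum this attribution over pixels grouped by their K values.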

    Palomar Gattini-IR: Survey Overview, Data Processing System, On-Sky Performance and First Results
    Kishalay De, Matthew J. Hankins, ..., Manasi Sharma, ...
    Publications of the Astronomical Society of the Pacific, vol. 132, 2020
    Paper link

    Palomar Gattini-IR is a new wide-field, near-infrared (NIR) robotic time domain survey operating at Palomar Observatory. Using a 30 cm telescope mounted with an H2RG detector, Gattini-IR achieves a field of view (FOV) of 25 sq. deg. with a pixel scale of 8.7″ in J-band. Here, we describe the system design, survey operations, data processing system and on-sky performance of Palomar Gattini-IR. To automatically distinguish between an astrophysical source and image subtraction artifacts, we use an ML-based real-bogus (RB) classification scheme. Bogus candidates were compiled using a labeling scheme on Zooniverse, a citizen science web portal which allows users to set up individual projects, usually pertaining to classification and data visualization. The performance of the model was evaluated using the following metrics: accuracy on the test set of 0.975, a Matthews correlation coefficient of 0.949 and an F1 score of 0.977.

Teaching Experience


    Stanford School of Engineering

    [Mar 2022 -- Jun 2022]
    Graduate Teaching Assistant
  • Managed weekly 'Discussion Sections' of 75+ students for some of the most popular CS courses at Stanford (>500 students); held office hours, constructed & graded HWs. Received >95% excellent reviews ('Very/Extremely Effective').
  • Taught and covered topics such as backpropagation, convolutional neural networks, vision transformers & attention, RNNs, YOLO, object detection, etc.
  • Courses taught:
  • CS 231N (Deep Learning for Computer Vision, Prof. Fei-Fei Li)
  • CS 230 (Deep Learning, Prof. Andrew Ng)
  • CS 224N (Natural Language Processing, Prof. Chris Manning)
  • Some wonderful feedback I received:
  • "I specifically went to Manasi's office hours to ask conceptual related questions because she was always so good at explaining concepts. She never missed a beat and managed to answer all my questions each time such that I fully understood the concept at the end."
  • "Helpful, very clear, always willing to get back to me if she wasn't sure how to answer my questions immediately - appreciate how helpful she was!"

    Columbia University, MATH 1201-1202 (Calculus III and IV)

    [Mar 2022 -- Jun 2022]
    Columbia University, Department of Mathematics
  • Graded assignments, led weekly office hours, etc., for ~150 students in the undergraduate Calculus III class. Consistently received >80% excellent reviews.
  • Taught and covered topics such as multiple integrals, Green's theorem, vector calculus, Fourier analysis, etc.

Community Engagement

    Graduate Community Chair, Stanford Women in Computer Science

    [Jun 2022 -- Jun 2023]

  • Organized multiple events for the graduate community including mixers, industry alumni panels, lunches, talks, etc.; spearheaded a focus on MS students through the setup of an MS student alumni panel and care-package mixer.
  • Managed over $3,000 in funding from the CS department and the School of Engineering for graduate student events.

Stanford Course Projects

      Debiasing Models for Out-of-domain Generalization - CS224N (NLP for Deep Learning)

      Winter '22
    • Exceeded BERT's performance on out-of-domain question-answering data by 2.5% by using debiasing models (link).

      Crowd-Aware Intent-Based Reinforcement Learning - CS333 (Algorithms for Interactive Robotics)

      Winter '22
    • Reduced collision rate in crowd navigation by 50% by leveraging reinforcement learning over human latent intent (link).

      Predicting Drug Interactions with Graph Neural Networks - CS224W (Machine Learning with Graphs)

      Fall '21
    • Used the Graph Isomorphism Network to place above 11th on the ogbl-ddi leaderboard (link, selected for course website).

      Optimizing Wind Turbine Placement Subject to Turbine Wakes - CS238 (Decision Making Under Uncertainty)

      Fall '21
    • Applied Q-learning to wind farms to generate sensible layouts that maximize power, subject to wake constraints (link).

      LIMES: LIME for Image Segmentation - CS329T (Trustworthy Machine Learning)

      Spring '22
    • Devised a LIME algorithm variant for facial segmentation that achieves explainability comparable to gradient-based methods.

      Monte-Carlo Tree Search Player - CS227B (General Game Playing)

      Spring '22
    • Designed a player capable of playing any game using MCTS, multi-threading, grounding, etc.; placed 8th in the class (link).

      TurtleBot Autonomous System - CS237A (Principles of Robot Autonomy)

      Fall '22
    • Deployed an incrementally built autonomy stack on a TurtleBot for self-driving capabilities in a mock urban environment.

Honors & Awards

    • 1 of 18 accepted to the GFSD (Graduate Fellowships for STEM Diversity) Program. -- [Mar '22]
    • 1 of 50 accepted into Google's CS Research Mentorship Program (CSRMP), Class of 2022A. -- [Feb '22]
    • Selected for the final round of the GEM Fellowship. -- [Jan '22]
    • Dean's List (in 6 out of 7 graded semesters, awarded to top 20%), Columbia University. -- [Fall '17 - Fall '20]
    • Columbia Undergraduate Research Fellowship (URF), Columbia College Summer Funding Program. -- [May '20]
    • Visiting Undergraduate Research Program (VURP) Award, California Institute of Technology. -- [May '19]
    • 1 of 25 awarded Laidlaw Undergraduate Research & Leadership Scholarship, Columbia University. -- ['18 - '19]
    • Andy Grove Scholarship for Intel Employees' Children, Intel Foundation. -- [Feb '19]

Leadership Roles and Extra-curriculars

    • Graduate Community Chair, Women in Computer Science, Stanford University. -- [Jun '22 - present]
    • Elected as Social Chair for GradSWE, Stanford University. -- [Jun '22]
    • Founder & Project Leader, COVID-19 Public Hub website highlighting Columbia research. -- [Jan '22]
    • Corporate Chair, Women in Computer Science, Columbia University. -- [Apr '20 - Jun '20]
    • Class 3 Curriculum Developer (AI section), Girls Who Code, Columbia University. -- [Feb '20 - Aug '20]
    • Executive Board UG Student Coordinator, Columbia Society for Women in Physics. -- [Sep '18 - Sep '19]
    • Captain, 'Columbia Raas' Dance Team (member since Sep 2017), Columbia University. -- [Apr '20 - Jun '21]
    • Crew Captain, New Student Orientation Program, Columbia University. -- [Aug '20 - Sep '20]
    • Jam Leader, Columbia Design Jam. -- [May '20 - Jul '20]
    • Co-President, 'Symposium in India' Student Club, Columbia University. -- [Sep '18 - May '19]