Manasi Sharma

Hi there! I'm a Research Scientist in the Artificial Intelligence Group (Group 1) at MIT Lincoln Laboratory, where I focus on explainability & trustworthiness research and on developing robust evaluation tools for high-stakes applications of Large Language Models (LLMs) [see "Professional / Research Experience" for more details]. Before that, I earned my Master's in Computer Science (Artificial Intelligence track) from Stanford University and my undergraduate degree from Columbia University, where I studied Computer Science and Physics. I consider myself a generalist, passionate about applying AI to a diverse range of real-world challenges—from astronomy to autonomous vehicles. My current work is driven by an interest in harnessing generative models and AI agents for innovative applications while ensuring they are reliable and resilient.

Research: I am fortunate to have done research in the ILIAD Lab under Prof. Dorsa Sadigh, where I worked on robotic learning using language-conditioned diffusion models, and in the Stanford Vision Lab under Prof. Fei-Fei Li and Prof. Jiajun Wu, where I worked on assembling action and goal labels for demonstrations for the BEHAVIOR project in robotic simulation benchmarking.

Side Projects: I have recently become interested in code generation & built an agentic application to convert code between programming languages (code is not publicly available). I also built an end-to-end application for A/B testing social media marketing materials (images & text) that are generated using GPT-4 and DALL-E and displayed on a Streamlit interface (link).

Teaching: I have also been a TA for some of the most popular courses at Stanford, including Prof. Chris Manning's Natural Language Processing course (see my Python tutorial on YouTube), Prof. Fei-Fei Li's Computer Vision course and Prof. Andrew Ng's Deep Learning course.

Internships: During the summer of 2022, I interned on the Autonomous Vehicles team at Nissan, building a LiDAR-only classification system for road objects (including vegetation, trucks, cars and cyclists). The classification system was deployed in Nissan's AV fleet that winter.

Undergraduate Experiences: As an undergrad, I was supervised by Prof. Daniel Hsu and Prof. Zoltan Haiman on interpreting astrophysical deep learning models for weak lensing using an array of saliency map methods. Before that, in 2019, I first developed my passion for AI at Caltech under Prof. Mansi Kasliwal, creating a real/bogus image classifier for the Gattini-IR telescope, which was later integrated into its data processing pipeline.

Areas of Experience: Deep Learning, Natural Language Processing, Generative AI, Agents, Reinforcement Learning & Decision Making, Computer Vision, Diffusion Models, Explainability, Robotics, & Graph Neural Networks.

Everything else: Outside of work, I love dancing (I was Captain of the Columbia Raas dance team), playing the keyboard and exploring the natural landscapes in the Bay Area!

Email  /  CV  /  Github  /  Google Scholar  /  LinkedIn  /  Twitter

Education


Stanford University, School of Engineering

Master's in Computer Science
Graduated in June 2023
Advisors: Dorsa Sadigh (ILIAD Lab), Fei-Fei Li, Jiajun Wu (Stanford Vision Lab)

Columbia University, Columbia College

BA in Computer Science (major) and Physics (minor)
Graduated in May 2021
Advisors: Daniel Hsu, Zoltan Haiman

Professional / Research Experience


MIT Lincoln Laboratory

[August 2023 -- present]

  • Spearheaded an AI project on exploring Large Language Model (LLM) explainability & trust (ICML workshop '24) & built an open-source framework for testing LLM agents (llm-sandbox).
  • Architected an internal tool for efficient evaluation of a Retrieval Augmented Generation (RAG) system with over 15 distinct metrics & assessed it on referencing past recordings in a speech-to-text application, achieving 92% accuracy.
  • Conducted research on the feasibility of LLMs in high-stakes decision-making, as part of a multi-year collaboration between the Lab and MIT Profs. Philip Isola & Jacob Andreas.
  • Launched 4 significant software updates to evaluation metrics in an open-source AI suite (maite), increasing internal use by 18%.
  • Co-leading a key project on adversarial attacks targeting overreliance in interactions with coding agents.

    Stanford University, ILIAD Lab

    [Oct 2022 -- Jun 2023]

  • Co-leading a project on creating a framework for the use of diffusion models for trajectory generation (producing the next action a robot should take) conditioned on a language instruction input (e.g., "go left"), in a shared autonomy setup (the output is a combination of the robot's and the human's actions); see the sketch after this list.
  • Working on another project using LLMs as zero-shot labelers of patterns in trajectory data (e.g., directions or curvature).
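
As a rough illustration of the setup described above, here is a minimal, hypothetical sketch (not the lab's actual code): a small denoiser conditioned on a language embedding proposes a robot action via a heavily simplified reverse-diffusion loop, and the final command blends the robot's and human's actions. All module names, dimensions, and the blending weight are illustrative assumptions.

```python
# Hypothetical sketch of language-conditioned diffusion for shared autonomy;
# names, dimensions, and the blending weight `alpha` are illustrative.
import torch
import torch.nn as nn

class LanguageConditionedDenoiser(nn.Module):
    """Predicts the noise on a candidate action, conditioned on a language embedding."""
    def __init__(self, action_dim=2, lang_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + lang_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, action, lang_emb):
        return self.net(torch.cat([action, lang_emb], dim=-1))

def sample_action(denoiser, lang_emb, steps=10, action_dim=2):
    """Heavily simplified reverse-diffusion loop: start from noise, iteratively denoise."""
    action = torch.randn(1, action_dim)
    for _ in range(steps):
        action = action - 0.1 * denoiser(action, lang_emb)  # crude denoising step
    return action

denoiser = LanguageConditionedDenoiser()
lang_emb = torch.randn(1, 16)              # stand-in for an embedded instruction, e.g. "go left"
robot_action = sample_action(denoiser, lang_emb)
human_action = torch.tensor([[0.5, 0.0]])  # e.g., from a joystick
alpha = 0.7                                # shared-autonomy blending weight (illustrative)
command = alpha * robot_action + (1 - alpha) * human_action
```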

    Pear VC

    [Oct 2022 -- Jun 2023]

  • Member of the Pear Garage cohort (one of 25 entrepreneurial engineering students who work on meaningful problems and build products to solve them); attended many networking and build sessions with VCs and investors in the generative AI space.

    Nissan-Renault-Mitsubishi: Alliance Innovation Laboratory (AIL-SV)

    [June 2022 -- Sep 2022]

  • Internship on the Autonomous Vehicles team at Nissan.
  • Implemented LiDAR point-cloud classification using the 'SimpleView' supervised learning algorithm, which projects a 3D point cloud onto six 2D views and passes them through a CNN (ResNet backbone); see the sketch after this list.
  • The model classified cars, pedestrians, cyclists and vegetation with over 95% accuracy on real-world data, within the 10 Hz LiDAR processing window.
  • Experimented with a number of architectures, including graph-based methods like Grid-GCN and point-wise MLP methods like PointNet++, before settling on the lightweight convolution-based SimpleView.
  • LiDAR point-cloud classification is exceedingly difficult and I encountered numerous challenges; I ultimately improved model performance and speed by sparsifying the network, using saliency-based weight-freezing, and training on hard-negative samples.
  • The model has been deployed in Nissan Autonomous Vehicles since Winter '22.
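
For the curious, below is a minimal, hypothetical sketch of the SimpleView idea (project the point cloud onto six axis-aligned depth images, then classify with a shared CNN), not Nissan's production system; the resolution, network size, and class list are illustrative stand-ins.

```python
# Hypothetical SimpleView-style sketch: 6 orthographic depth views + shared CNN.
import torch
import torch.nn as nn

def project_views(points, res=32):
    """Render an (N, 3) point cloud in [-1, 1]^3 into 6 axis-aligned depth images."""
    views = []
    for axis in range(3):                          # x, y, z
        for sign in (1.0, -1.0):                   # front and back of each axis
            depth = sign * points[:, axis]
            plane = [i for i in range(3) if i != axis]
            # map in-plane coordinates from [-1, 1] to pixel indices
            uv = ((points[:, plane] + 1) / 2 * (res - 1)).long().clamp(0, res - 1)
            img = torch.full((res, res), -1.0)
            img[uv[:, 0], uv[:, 1]] = depth        # approximate z-buffer (last write wins)
            views.append(img)
    return torch.stack(views)                      # (6, res, res)

class SimpleViewClassifier(nn.Module):
    """Tiny stand-in for the ResNet backbone: one small CNN shared across all views."""
    def __init__(self, n_classes=4):               # e.g., car, pedestrian, cyclist, vegetation
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(6 * 16, n_classes)   # fuse per-view features

    def forward(self, views):                      # views: (6, res, res)
        feats = self.cnn(views.unsqueeze(1))       # per-view features, (6, 16, 1, 1)
        return self.head(feats.flatten().unsqueeze(0))

points = torch.rand(1024, 3) * 2 - 1               # toy point cloud
logits = SimpleViewClassifier()(project_views(points))
```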

    Stanford University, Stanford Vision Laboratory

    [Oct 2021 -- June 2022]

    BEHAVIOR Project:
  • Led the development of the Knowledgebase for iGibson and BEHAVIOR-1K, an ImageNet-scale robotic simulation benchmark with a specific focus on human-relevant design. The paper was accepted to the Conference on Robot Learning (CoRL) '22 and presented as a talk at ECCV '22.
  • Mobilized ~20 crowd-workers to categorize ~5,000 "how-to" articles and used zero-shot Natural Language Processing techniques with GPT-3 to annotate the Virtual Reality videos and generate activity definitions in a predicate logic-based language at >97% quality.
    ADDA (Attention-Driven Data Augmentation):
  • Oversaw a 5-person team project on a 'Modulated Attention Dropout' technique that allows for better generalization of RL policies through task-importance-aware dataset augmentation. Results showed a 2% improvement over baseline Behavioral Cloning results.

    Columbia University, Data Science Institute

    [Sep 2019 -- June 2021]
  • Under the guidance of Profs. Daniel Hsu and Zoltan Haiman, worked on the "explainability" & trustworthiness of neural networks in the more traditional field of astronomy.
  • Discovered that 89% of the output of a popular neural network that uses gravitational lensing maps to predict cosmological parameters (omega_m and sigma_8) was counterintuitively attributable to negative image regions (voids, black holes, etc.) as opposed to the bright image regions (stars, galaxies, etc.). Published results in Physical Review D '20.
  • Also ran sanity checks on a number of popular saliency methods and found that gradient-based methods (e.g., Grad-CAM, Input x Gradients, etc.) were most robust to model parameters.

    California Institute of Technology, Division of Physics, Mathematics and Astronomy

    [Jun 2019 -- Aug 2019]
  • Under the guidance of Prof. Mansi Kasliwal at Caltech, pioneered the development of a flagship CNN-based real/bogus image classification system for Caltech's Gattini-IR Telescope using TensorFlow (link), which achieved ~97.5% accuracy on thousands of cosmic transient sources, and published results in PASP '20.
  • The model performed well enough to be deployed in the telescope's data processing pipeline (still active), replacing the manual classification process.
  • I also used the model's outputs to identify high-confidence transient sources, performing optical follow-up by operating the 200-inch telescope at the Palomar Observatory in Southern California.

    Columbia University, Department of Physics

    [Jun 2018 -- May 2019]
  • Worked with Prof. Charles Hailey in the NuSTAR Group as a Laidlaw Research Intern.
  • Modeled/analyzed data from NASA's NuSTAR telescope for 'AM Her' & 'HU Aqr' sources to determine key parameters such as temperature and periodicity.

Publications

    Why Would You Suggest That? Human Trust in Language Model Responses
    Manasi Sharma, Ho Chit Siu, Rohan Paleja, Jaime D. Peña
    ICML Humans, Algorithmic Decision-Making and Society Workshop, 2024
    Paper link

    The emergence of Large Language Models (LLMs) has revealed a growing need for human-AI collaboration, especially in creative decision making scenarios where trust and reliance are paramount. Through human studies and model evaluations on the open-ended News Headline Generation task from the LaMP benchmark, we analyze how the framing and presence of explanations affect user trust and model performance. Overall, we provide evidence that adding an explanation in the model response to justify its reasoning significantly increases self-reported user trust in the model when the user has the opportunity to compare various responses. Position and faithfulness of these explanations are also important factors. However, these gains disappear when users are shown responses independently, suggesting that humans trust all model responses, including deceptive ones, equitably when they are shown in isolation. Our findings urge future research to delve deeper into the nuanced evaluation of trust in human-machine teaming systems.

    Exploring and Improving the Spatial Reasoning Abilities of Large Language Models
    Manasi Sharma
    NeurIPS Instruction Tuning and Following Workshop, 2023
    Paper link

    Large Language Models (LLMs) represent formidable tools for sequence modeling, boasting an innate capacity for general pattern recognition. Nevertheless, their broader spatial reasoning capabilities remain insufficiently explored. In this paper, we investigate the zero-shot performance of LLMs when confronted with a limited dataset comprising 3D robotic trajectory data and associated tasks, such as directional and motion labeling. Additionally, we introduce a novel prefix-based prompting mechanism, which yields a 30% improvement on the 3D trajectory data and an increase of up to 16% on SpartQA tasks when contrasted with the conventional vanilla prompt baseline (with gains over Chain-of-Thought prompting as well). The experimentation with 3D trajectory data offers an intriguing glimpse into the manner in which LLMs engage with numerical and spatial information, thus laying a solid foundation for the identification of target areas for future enhancements.

    BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation
    Chengshu Li, Cem Gokmen, ..., Manasi Sharma, ...
    Conference on Robot Learning (CoRL), 2022
    Paper link

    We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics, motivated by the results of an extensive survey on "what do you want robots to do for you?". It includes the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 3,000 objects annotated with physical and semantic properties. It also includes OmniGibson, a novel simulator that supports these activities via realistic physics simulation and rendering of rigid bodies, deformable bodies, and liquids.

    Interpreting deep learning models for weak lensing
    Jose Manuel Zorrilla Matilla, Manasi Sharma, Daniel Hsu, Zoltan Haiman
    Physical Review D (American Physical Society), 2020
    Paper link

    Deep neural networks (DNNs) are powerful algorithms that have been proven capable of extracting non-Gaussian information from weak lensing (WL) datasets. We apply a series of well-established saliency methods to interpret the DNN and find that the most relevant pixels are those with extreme K values. For noiseless maps, regions with negative K account for 69%-89% of the attribution of the DNN output, defined as the square of the saliency in input space. In the presence of shape noise, the attribution concentrates in high-convergence regions, with 36%-68% of the attribution in regions with K values above the 3rd standard deviation.
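
    For reference, my own rendering of the attribution measure quoted above (not the paper's notation): if the saliency of pixel i is taken to be the input gradient, the attribution is its square,

```latex
A_i(x) = \left( \frac{\partial f(x)}{\partial x_i} \right)^{2}
```

    where f is the DNN output and x_i the value of pixel i; the quoted percentages sum this attribution over pixels grouped by their K values.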

    Palomar Gattini-IR: Survey Overview, Data Processing System, On-Sky Performance and First Results
    Kishalay De, Matthew J. Hankins, ..., Manasi Sharma, ...
    Publications of the Astronomical Society of the Pacific, vol. 132, 2020
    Paper link

    Palomar Gattini-IR is a new wide-field, near-infrared (NIR) robotic time domain survey operating at Palomar Observatory. Using a 30 cm telescope mounted with an H2RG detector, Gattini-IR achieves a field of view (FOV) of 25 sq. deg. with a pixel scale of 8.7″ in J-band. Here, we describe the system design, survey operations, data processing system and on-sky performance of Palomar Gattini-IR. To automatically distinguish between an astrophysical source and image subtraction artifacts, we use an ML-based real-bogus (RB) classification scheme. Bogus candidates were compiled using a labeling scheme on Zooniverse, a citizen science web portal which allows users to set up individual projects, usually pertaining to classification and data visualization. The performance of the model was evaluated using the following metrics: accuracy on the test set of 0.975, a Matthews correlation coefficient of 0.949 and an F1 score of 0.977.

Teaching Experience


    Stanford School of Engineering

    [Mar 2022 -- Jun 2022]
    Graduate Teaching Assistant
  • Managed weekly 'Discussion Sections' of 75+ students for some of the most popular CS courses at Stanford (>500 students); held office hours, constructed & graded HWs. Received >95% excellent reviews ('Very/Extremely Effective').
  • Taught and covered topics such as backpropagation, convolutional neural networks, vision transformers & attention, RNNs, YOLO, object detection, etc.
  • Courses taught:
  • CS 231N (Deep Learning for Computer Vision, Prof. Fei-Fei Li)
  • CS 230 (Deep Learning, Prof. Andrew Ng)
  • CS 224N (Natural Language Processing, Prof. Chris Manning)
  • Some wonderful feedback I received:
  • "I specifically went to Manasi's office hours to ask conceptual related questions because she was always so good at explaining concepts. She never missed a beat and managed to answer all my questions each time such that I fully understood the concept at the end."
  • "Helpful, very clear, always willing to get back to me if she wasn't sure how to answer my questions immediately - appreciate how helpful she was!"

    Columbia University, MATH 1201-1202 (Calculus III and IV)

    [Mar 2022 -- Jun 2022]
    Columbia University, Department of Mathematics
  • Graded assignments, led weekly office hours, etc., for ~150 students in the undergraduate Calculus III class. Consistently received >80% excellent reviews.
  • Taught and covered topics such as multiple integrals, Green's theorem, vector calculus, Fourier analysis, etc.

Community Engagement

    Graduate Community Chair, Stanford Women in Computer Science

    [Jun 2022 -- Jun 2023]

  • Organized multiple events for the graduate community including mixers, industry alumni panels, lunches, talks, etc.; spearheaded a focus on MS students through the setup of an MS student alumni panel and care-package mixer.
  • Managed over $3,000 in funding from the CS department and the School of Engineering for graduate student events.

Stanford Course Projects

      Debiasing Models for Out-of-domain Generalization - CS224N (NLP for Deep Learning)

      Winter '22
    • Exceeded BERT's performance on out-of-domain question-answering data by 2.5% by using debiasing models (link).

      Crowd-Aware Intent-Based Reinforcement Learning - CS333 (Algorithms for Interactive Robotics)

      Winter '22
    • Reduced collision rate in crowd navigation by 50% by leveraging reinforcement learning over human latent intent (link).

      Predicting Drug Interactions with Graph Neural Networks - CS224W (Machine Learning with Graphs)

      Fall '21
    • Used the Graph Isomorphism Network to place above 11th on the ogbl-ddi leaderboard (link, selected for course website).

      Optimizing Wind Turbine Placement Subject to Turbine Wakes - CS238 (Decision Making Under Uncertainty)

      Fall '21
    • Applied Q-learning to wind farms to generate sensible layouts that maximize power, subject to wake constraints (link).

      LIMES: LIME for Image Segmentation - CS329T (Trustworthy Machine Learning)

      Spring '22
    • Devised a LIME algorithm variant for facial segmentation that achieves explainability comparable to gradient-based methods.

      Monte-Carlo Tree Search Player - CS227B (General Game Playing)

      Spring '22
    • Designed a player capable of playing any game using MCTS, multi-threading, grounding, etc.; placed 8th in the class (link).

      TurtleBot Autonomous System - CS237A (Principles of Robot Autonomy)

      Fall '22
    • Deployed an incrementally built autonomy stack on a TurtleBot for self-driving capabilities in a mock urban environment.

Honors & Awards

    • 1 of 18 accepted to the GFSD (Graduate Fellowships for STEM Diversity) Program. -- [Mar '22]
    • 1 of 50 accepted into Google's CS Research Mentorship Program (CSRMP), Class of 2022A. -- [Feb '22]
    • Selected for the final round of the GEM Fellowship. -- [Jan '22]
    • Dean's List (in 6 out of 7 graded semesters, awarded to top 20%), Columbia University. -- [Fall '17 - Fall '20]
    • Columbia Undergraduate Research Fellowship (URF), Columbia College Summer Funding Program. -- [May '20]
    • Visiting Undergraduate Research Program (VURP) Award, California Institute of Technology. -- [May '19]
    • 1 of 25 awarded Laidlaw Undergraduate Research & Leadership Scholarship, Columbia University. -- ['18 - '19]
    • Andy Grove Scholarship for Intel Employees' Children, Intel Foundation. -- [Feb '19]

Leadership Roles and Extra-curriculars

    • Graduate Community Chair, Women in Computer Science, Stanford University. -- [Jun '22 - present]
    • Elected as Social Chair for GradSWE, Stanford University. -- [Jun '22]
    • Founder & Project Leader, COVID-19 Public Hub website highlighting Columbia research. -- [Jan '22]
    • Corporate Chair, Women in Computer Science, Columbia University. -- [Apr '20 - Jun '20]
    • Class 3 Curriculum Developer (AI section), Girls Who Code, Columbia University. -- [Feb '20 - Aug '20]
    • Executive Board UG Student Coordinator, Columbia Society for Women in Physics. -- [Sep '18 - Sep '19]
    • Captain, 'Columbia Raas' Dance Team (member since Sep 2017), Columbia University. -- [Apr '20 - Jun '21]
    • Crew Captain, New Student Orientation Program, Columbia University. -- [Aug '20 - Sep '20]
    • Jam Leader, Columbia Design Jam. -- [May '20 - Jul '20]
    • Co-President, 'Symposium in India' Student Club, Columbia University. -- [Sep '18 - May '19]