Manasi Sharma

Hi there! I'm a Research Engineer at Scale AI, where I work on evaluations, benchmarking, and reinforcement learning / post-training research to give frontier models stronger reasoning capabilities for state-of-the-art agents, such as browser- and computer-use agents. My work has been published at top-tier machine learning conferences such as ICLR, ICML, and NeurIPS. Previously, I was a Research Scientist in the Artificial Intelligence Group (Group 1) at MIT Lincoln Laboratory, where I focused on explainability & trustworthiness research and on developing robust evaluation tools for high-stakes applications of Large Language Models (LLMs). Before that, I earned my Master's in Computer Science (Artificial Intelligence track) from Stanford University and my undergraduate degree from Columbia University, where I majored in Computer Science and Physics. I consider myself a generalist, passionate about applying AI to a diverse range of real-world challenges, from astronomy to autonomous vehicles to, now, agents.

I am fortunate to have worked with many different professors over the course of my academic career. At Stanford, I conducted research in the ILIAD Lab under Prof. Dorsa Sadigh, where I worked on robot learning using language-conditioned diffusion models, and in the Stanford Vision Lab under Prof. Fei-Fei Li and Prof. Jiajun Wu, where I assembled action and goal labels for demonstrations for the BEHAVIOR project on robotic simulation benchmarking. As an undergrad, I was supervised by Prof. Daniel Hsu and Prof. Zoltan Haiman, interpreting astrophysical deep learning models for weak lensing using an array of saliency-map methods. Before that, in 2019, I first developed my passion for AI at Caltech under Prof. Mansi Kasliwal, creating a real/bogus image classifier for the Gattini-IR telescope, which was later integrated into its data-processing pipeline.

My research interests include AI evaluation methodologies, agents, reinforcement learning, and AI safety and alignment.

Everything else: Outside of work, I love dancing (I was Captain of the Columbia Raas dance team), playing the keyboard, and exploring the natural landscapes of the Bay Area!

Education

Stanford University

Stanford University, School of Engineering

Master's in Computer Science

Graduated in June 2023

Advisors: Dorsa Sadigh (ILIAD Lab), Fei-Fei Li, Jiajun Wu (Stanford Vision Lab)

Columbia University

Columbia University, Columbia College

BA in Computer Science with a concentration in Physics

Graduated in May 2021

Advisors: Daniel Hsu, Zoltan Haiman

Work Experience

Scale AI

Mar 2025 — Present
ML Research Engineer, Reasoning & Agents Research Team, Scale Labs — San Francisco, CA
  • Research Lead for computer-use & browser-use agents, spearheading 3 first-author publications (7 total) on evaluations at top-tier conferences (ICLR, ICML), including ResearchRubrics, which was referenced in a Perplexity AI agent release.
  • Collaborated with external organizations like the Center for AI Safety at Berkeley on benchmarks for economically useful tasks (Remote Labor Index) and tool-use (MCP-Atlas), referenced in OpenAI's and Anthropic's latest model cards.
  • Post-trained models with RL to analyze strategies for optimal learning from data, and contributed to research on an efficient RL algorithm with intermediate rewards (NeurIPS '25 workshop).
  • Collaborated with researchers at top frontier labs, offering consultation on advanced computer-use data types involving programmatic & agentic verifiers.

MIT Lincoln Laboratory

Aug 2023 — Mar 2025
Research Scientist, Artificial Intelligence Technology Group — San Francisco, CA (Remote)
  • Spearheaded an AI project on LLM explainability & trust and built an open-source framework for testing LLM agents (llm-sandbox), published at ICML '24; collaborated with MIT Profs. Philip Isola & Jacob Andreas.
  • Architected an internal tool for efficient evaluation of a Retrieval Augmented Generation (RAG) system with 15+ distinct metrics, achieving 92% accuracy.

Nissan Research

Jun 2022 — Sep 2022
Research Intern, Autonomous Systems, Alliance Innovation Lab — Santa Clara, CA
  • Engineered an end-to-end LiDAR 3D point-cloud classification system for autonomous vehicles, achieving >95% accuracy, a ~2% false-positive rate, and an 85% reduction in runtime. Deployed in Nissan autonomous vehicles beginning Winter '22.

Google

Mar 2022 — Jun 2022
CS Research Mentorship Program — Mountain View, CA
  • Ideated research projects on language models & diffusion models as part of Google's CSRMP.

Publications

2026
ICLR
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents
Manasi Sharma, C.B. Zhang, Chaitanya Bandi, et al.
The Fourteenth International Conference on Learning Representations (ICLR), 2026
ICML
Rubric Robustness: Evaluating the Sensitivity of Rubrics-Based Benchmarks to Simple Perturbations
Manasi Sharma, Brendon Kenstler, Brendon Liu
ICML Main Conference Poster, 2026
arXiv
Agentic Rubrics for Test-Time Scaling of Computer-Use Agents
Manasi Sharma, Donghan Zhang, Brendon Liu
Preprint, May 2026
arXiv
MCP-Atlas: A Large-Scale Benchmark for Tool-Use Competency with Real MCP Servers
Chaitanya Bandi, Brian Hertzberg, ..., Manasi Sharma, et al.
Preprint, January 2026
ICML
SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks
Sathvika Kundurthy, Catherine Na, ..., Manasi Sharma, et al.
ICML Main Conference Poster, 2026
2025
arXiv
Remote Labor Index: Measuring AI Automation of Remote Work
Mantas Mazeika, A. Gas, ..., Manasi Sharma, et al.
Preprint, October 2025
NeurIPS
Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models
Vaskar Nath, Elaine Lau, Anisha Gunjal, Manasi Sharma, Nikhil Baharte, Sean Hendryx
NeurIPS The First Workshop on Efficient Reasoning, 2025
2024
ICML
Why Would You Suggest That? Human Trust in Language Model Responses
Manasi Sharma, Ho Chit Siu, Rohan Paleja, Jaime D. Peña
ICML Humans, Algorithmic Decision-Making and Society Workshop, 2024
2023
NeurIPS
Exploring and Improving the Spatial Reasoning Abilities of Large Language Models
Manasi Sharma
NeurIPS Instruction Tuning and Following Workshop, 2023
2022
CoRL
BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation
Chengshu Li, Cem Gokmen, ..., Manasi Sharma, ...
Conference on Robot Learning (CoRL), 2022
2020
Phys Rev D
Interpreting deep learning models for weak lensing
Jose Manuel Zorrilla Matilla, Manasi Sharma, Daniel Hsu, Zoltan Haiman
Physical Review D, 2020
PASP
Palomar Gattini-IR: Survey Overview, Data Processing System, On-Sky Performance and First Results
Kishalay De, Matthew J. Hankins, ..., Manasi Sharma, ...
Publications of the Astronomical Society of the Pacific, vol. 132, 2020

Teaching Experience

Stanford Engineering

Stanford School of Engineering

Mar 2022 — Jun 2023

Graduate Teaching Assistant

  • Managed weekly discussion sections of 75+ students for some of the most popular CS courses at Stanford (>600 students each); held office hours and constructed & graded homework assignments. Received >95% "excellent" reviews.
  • Courses: CS 224N (NLP, Prof. Manning), CS 231N (Computer Vision, Prof. Fei-Fei Li), CS 230 (Deep Learning, Prof. Andrew Ng)
Columbia University

Columbia University, Department of Mathematics

Sep 2019 — Jun 2021

Undergraduate Teaching Assistant for Calculus III (across 4 semesters)

  • Graded assignments and led weekly office hours for ~150 students. Consistently received >80% "excellent" reviews.

Community Engagement

Stanford WiCS

Graduate Community Chair, Stanford Women in Computer Science

Jun 2022 — Jun 2023
  • Organized multiple events for the graduate community, including mixers, industry alumni panels, lunches, and talks; spearheaded a focus on MS students by setting up an MS student alumni panel and a care-package mixer.
  • Managed over $3,000 in funds from the CS department and School of Engineering to fund events for graduate students.
Areas of Experience

Deep Learning, Natural Language Processing, Generative AI, Agents, Reinforcement Learning & Decision Making, Computer Vision, Diffusion Models, Explainability, Robotics, and Graph Neural Networks.