Manasi Sharma

Hi there! I'm a Research Engineer at Scale AI, where I work on evaluations, benchmarking, and reinforcement learning / post-training research to train frontier models with better reasoning capabilities for state-of-the-art systems such as browser- and computer-use agents. My work has been published at conferences such as ICLR, ICML, and NeurIPS. Previously, I was a Research Scientist at MIT Lincoln Laboratory. I earned my MS in Computer Science (AI track) from Stanford and my BA in Computer Science and Physics from Columbia. I'm a generalist passionate about applying AI to diverse real-world challenges, from astronomy to autonomous vehicles to agents.

I am fortunate to have worked with many different professors over the course of my academic career. I conducted research in the ILIAD Lab under Prof. Dorsa Sadigh at Stanford on robotic learning using language-conditioned diffusion models, and the Stanford Vision Lab under Prof. Fei-Fei Li and Prof. Jiajun Wu on the BEHAVIOR robotic simulation benchmark. As an undergrad, I worked with Prof. Daniel Hsu and Prof. Zoltan Haiman on interpretable deep learning for astrophysics, and at Caltech with Prof. Mansi Kasliwal on image classification for the Gattini-IR telescope.

My research interests include AI evaluation methodologies, agents, reinforcement learning, and AI safety and alignment.

Everything else: Outside of work, I love dancing (I was Captain of the Columbia Raas dance team), playing the keyboard, and exploring the natural landscapes of the Bay Area!

Education

Stanford University, School of Engineering

Master's in Computer Science

Graduated in June 2023

Advisors: Dorsa Sadigh (ILIAD Lab), Fei-Fei Li, Jiajun Wu (Stanford Vision Lab)

Columbia University, Columbia College

BA in Computer Science with a concentration in Physics

Graduated in May 2021

Advisors: Daniel Hsu, Zoltan Haiman

Work Experience

Scale AI

Mar 2025 — Present
ML Research Engineer, Reasoning & Agents Research Team, Scale Labs — San Francisco, CA
  • Research Lead for computer-use & browser-use agents, spearheading 3 first-author publications (7 total) on evaluations at top-tier conferences (ICLR, ICML), including ResearchRubrics, which was referenced in a Perplexity AI agent release.
  • Collaborated with external organizations such as the Center for AI Safety at Berkeley on benchmarks for economically useful tasks (Remote Labor Index) and tool use (MCP-Atlas), referenced in OpenAI's & Anthropic's latest model cards.
  • Post-trained models with RL to analyze optimal data-learning strategies, and contributed to research on an efficient RL algorithm with intermediate rewards (NeurIPS '25 workshop).
  • Collaborated with researchers at top frontier labs, offering consultation on advanced computer-use data types involving programmatic & agentic verifiers.
MIT Lincoln Laboratory

Aug 2023 — Mar 2025
Research Scientist, Artificial Intelligence Technology Group — San Francisco, CA (Remote)
  • Spearheaded an AI project on LLM explainability & trust and built an open-source framework for testing LLM agents (llm-sandbox), published at ICML '24; collaborated with MIT Profs. Philip Isola & Jacob Andreas.
  • Architected an internal tool for efficient evaluation of a Retrieval Augmented Generation (RAG) system with 15+ distinct metrics, achieving 92% accuracy.
Nissan Research

Jun 2022 — Sep 2022
Research Intern, Autonomous Systems, Alliance Innovation Lab — Santa Clara, CA
  • Engineered an end-to-end LiDAR 3D point-cloud classification system for autonomous vehicles, achieving >95% accuracy, a ~2% false-positive rate, and an 85% reduction in runtime. Deployed in Nissan autonomous vehicles beginning Winter '22.
Google

Mar 2022 — Jun 2022
CS Research Mentorship Program — Mountain View, CA
  • Ideated research projects on language models & diffusion models and attended various mentorship events as part of Google's CSRMP.

Publications

2026
ICLR
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents
Manasi Sharma, Chen Bo Calvin Zhang, Chaitanya Bandi, et al.
The Fourteenth International Conference on Learning Representations (ICLR), 2026
ICML
RubricRobustness: Evaluating the Sensitivity of Rubrics-Based Benchmarks to Simple Perturbations
Manasi Sharma, Brad Kenstler, Bing Liu
International Conference on Machine Learning (ICML), 2026 (arXiv coming soon)
Preprint
Agentic Rubrics for Test-Time Scaling of Computer-Use Agents
Manasi Sharma, Daniel Zhang, Bing Liu
Preprint, May 2026 (arXiv coming soon)
arXiv
MCP-Atlas: A Large-Scale Benchmark for Tool-Use Competency with Real MCP Servers
Chaithanya Bandi, Ben Hertzberg, ..., Manasi Sharma, et al.
Preprint, January 2026
ICML
SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks
Srivatsa Kundurthy, Clara Na, Michael Handley, Zach Kirshner, Chen Bo Calvin Zhang, Manasi Sharma, Emma Strubell, John Ling
International Conference on Machine Learning (ICML), 2026
2025
arXiv
Remote Labor Index: Measuring AI Automation of Remote Work
Mantas Mazeika, Alice Gatti, ..., Manasi Sharma, et al.
Preprint, October 2025
NeurIPS
Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models
Vaskar Nath, Elaine Lau, Anisha Gunjal, Manasi Sharma, Nikhil Baharte, Sean Hendryx
First Workshop on Efficient Reasoning at NeurIPS, 2025
2024
ICML
Why Would You Suggest That? Human Trust in Language Model Responses
Manasi Sharma, Ho Chit Siu, Rohan Paleja, Jaime D. Peña
ICML, Humans, Algorithmic Decision-Making and Society Workshop, 2024
2023
NeurIPS
Exploring and Improving the Spatial Reasoning Abilities of Large Language Models
Manasi Sharma
NeurIPS Instruction Tuning and Following Workshop, 2023
2022
CoRL
BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation
Chengshu Li, Cem Gokmen, ..., Manasi Sharma, ...
Conference on Robot Learning (CoRL), 2022
2020
Phys Rev D
Interpreting deep learning models for weak lensing
Jose Manuel Zorrilla Matilla, Manasi Sharma, Daniel Hsu, Zoltan Haiman
Physical Review D, 102(12), American Physical Society, 2020
PASP
Palomar Gattini-IR: Survey Overview, Data Processing System, on-Sky Performance and First Results
Kishalay De, Matthew J. Hankins, ..., Manasi Sharma, ...
Publications of the Astronomical Society of the Pacific, vol. 132, 2020

Teaching Experience

Stanford School of Engineering

Mar 2022 — Jun 2023

Graduate Teaching Assistant

  • Managed weekly discussion sections of 75+ students for some of the most popular CS courses at Stanford (>600 students); held office hours and constructed & graded homework assignments. Received >95% excellent reviews.
  • Courses: CS 224N (NLP, Prof. Manning), CS 231N (Computer Vision, Prof. Fei-Fei Li), CS 230 (Deep Learning, Prof. Andrew Ng)
  • Recorded the Python tutorial for Stanford's CS 224N course on YouTube
Columbia University, Department of Mathematics

Sep 2019 — Jun 2021

Undergraduate Teaching Assistant for Calculus III (across 4 semesters)

  • Graded assignments and led weekly office hours for ~150 students. Consistently received >80% excellent reviews.

Community Engagement

Graduate Community Chair, Stanford Women in Computer Science

Jun 2022 — Jun 2023
  • Organized multiple events for the graduate community including mixers, industry alumni panels, lunches, talks, etc.; spearheaded a focus on MS students through the setup of an MS student alumni panel and care-package mixer.
  • Managed over $3,000 in funding from the CS department and the School of Engineering to support events for graduate students.

Areas of Experience

Deep Learning, Natural Language Processing, Generative AI, Agents, Reinforcement Learning & Decision Making, Computer Vision, Diffusion Models, Explainability, Robotics, and Graph Neural Networks.