hey, I'm Atrey Desai.

I am a third-year undergraduate studying computer science and linguistics, with a minor in Korean studies, at the University of Maryland.

I am fortunate to be advised by Professors Rachel Rudinger and Jordan Boyd-Graber.

Research interests

Language models are increasingly capable, but our methods for measuring and building that capability lag behind. I work on evaluation and data pipelines for reliable NLP, namely:

1. Benchmark validity and the limits of what our evaluations actually measure

2. Human-AI collaboration in data annotation and curation

3. Evaluation for systems that reason, perceive, and act in the world

Research


Under Review at ARR 2025 · Preprint

Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers

Nishant Balepur, Atrey Desai, Rachel Rudinger

Success on multiple-choice questions from the answer choices alone is often deemed problematic, but reasoning traces reveal that LLMs use less problematic strategies, such as inferring the missing question. This challenges the claim that partial-input success is always a flaw, and suggests that reasoning traces could help separate problematic data from less problematic reasoning.

arXiv PDF

Code

Under Review at ARR 2025 · Preprint

BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks

Nishant Balepur, Bhavya Rajasekaran, Jane Oh, Michael Xie, Atrey Desai, Jordan Boyd-Graber

arXiv PDF

Code

TLS (Oral), Under Review at ARR 2025 · Preprint

Filling in the Mechanisms: How do LMs Learn Filler-Gap Dependencies under Developmental Constraints?

Atrey Desai, Sathvik Nair