Owen Oertell

I study computer science at Cornell University. My research interests are in decision-making (reinforcement learning, bandits) and generative modeling (diffusion models, LLMs). I am fortunate to work with professors Wen Sun, Robert Kleinberg, and Kianté Brantley.

Currently, I am a research scientist intern at Databricks working on deep research. Previously, I was a research intern at NVIDIA and a software engineering intern at DRW.

Outside of research, I enjoy mathematics, art, music, literature, and drone photography. A picture of me can be found here.

ASCII image of the Humble Administrator's Garden, Suzhou, China.

Publications

See my Google Scholar for the most up-to-date list.

KARL: Knowledge Agents via Reinforcement Learning
Jonathan D. Chang, Andrew Drozdov, Shubham Toshniwal, Owen Oertell, Alexander Trott, Jacob Portes, Abhay Gupta, Pallavi Koppol, Ashutosh Baheti, Sean Kulinski, Ivan Zhou, Irene Dea, Krista Opsahl-Ong, Simon Favreau-Lessard, Sean Owen, Jose Javier Gonzalez Ortiz, Arnav Singhvi, Xabi Andrade, Cindy Wang, Kartik Sreenivasan, Sam Havens, Jialu Liu, Peyton DeNiro, Wen Sun, Michael Bendersky, and Jonathan Frankle
Tech Report. [paper] [abstract]

We present a system for training enterprise search agents via reinforcement learning that achieves state-of-the-art performance across a diverse suite of hard-to-verify agentic search tasks. Our work makes four core cont…

Heuristics Considered Harmful: RL With Random Rewards Should Not Make LLMs Reason
Owen Oertell*, Wenhao Zhan*, Gokul Swamy, Zhiwei Steven Wu, Kiante Brantley, Jason Lee, and Wen Sun
NYRL 2024. [paper] [abstract]

Recent work has shown that for particular combinations of base model and training algorithm, *reinforcement learning with random rewards* (RLRR) improves the performance of LLMs on certain math reasoning benchmarks. This…

Efficient Controllable Diffusion via Optimal Classifier Guidance
Owen Oertell*, Shikun Sun*, Yiding Chen*, Jin Peng Zhou, Zhiyong Wang, and Wen Sun

The controllable generation of diffusion models aims to steer the model to generate samples that optimize some given objective functions. It is desirable for a variety of applications including image generation, molecule…

REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, and Wen Sun
NeurIPS 2024. [paper] [abstract]

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generati…