I lead a team of applied scientists at Amazon focused on creating training data for large language models. My work centers on developing methods to improve LLMs’ knowledge and reasoning capabilities, with the goal of making them more useful and reliable tools for people.
More broadly, I’m interested in advancing the capabilities of foundation models through better data, evaluation, and understanding.
I previously worked on information extraction from natural language and semi-structured text. I hold a PhD in Computer Science & Engineering from the University of Washington, where I was advised by Hannaneh Hajishirzi and mentored by Xin Luna Dong. Our work on the Ceres project was the subject of Luna's keynote address at the 2020 Conference on Information and Knowledge Management (CIKM).
My background also includes a BA in English from Harvard University and an MA in Interdisciplinary Computer Science from Mills College. Prior to my PhD, I worked with the Big Data Analytics & Machine Intelligence group at NASA Langley Research Center, taught at Peking University, and spent some time in the business world.
I regularly serve as a reviewer for conferences and journals in AI and NLP, including ACL, NAACL, EMNLP, KDD, AAAI, AKBC, and TKDE.