Weili Cao

Incoming PhD Student, Duke University
M.S. & B.S., UC San Diego
Email: w2cao [at] ucsd [dot] edu

Scholar | LinkedIn | GitHub | CV

My previous research evaluated how well LLMs aggregate information from biomedical reports.
I also worked on a project that trained state space models (Mamba-2, 130M and 1.3B parameters) to efficiently retrieve information from long documents for question answering.

Publications

Single-Pass Document Scanning for Question Answering
Weili Cao*, Jianyou Wang*, Youze Zheng, Longtian Bao, Qirui Zheng, Taylor Berg-Kirkpatrick, Ramamohan Paturi, Leon Bergen
arXiv Preprint, 2025
TL;DR: We trained State Space Models (Mamba-2) for long-context Q&A, achieving performance comparable to GPT-4o on extremely long documents while being more computationally efficient.
@misc{cao2025singlepassdocumentscanningquestion,
  title={Single-Pass Document Scanning for Question Answering},
  author={Weili Cao and Jianyou Wang and Youze Zheng and Longtian Bao and Qirui Zheng and Taylor Berg-Kirkpatrick and Ramamohan Paturi and Leon Bergen},
  year={2025},
  eprint={2504.03101},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.03101},
}
Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark
Jianyou Wang*, Weili Cao*, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, Leon Bergen
arXiv Preprint, 2024
TL;DR: A benchmark for assessing biases and limitations in the methodology of biomedical studies.
@misc{wang2024measuringriskbiasbiomedical,
  title={Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark},
  author={Jianyou Wang and Weili Cao and Longtian Bao and Youze Zheng and Gil Pasternak and Kaicheng Wang and Xiaoyue Wang and Ramamohan Paturi and Leon Bergen},
  year={2024},
  eprint={2411.18831},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.18831},
}