My previous research evaluated how well LLMs aggregate information from biomedical reports.
I also trained state space models (Mamba-2, 130M and 1.3B) to efficiently retrieve information from long documents for question answering.
Publications
Single-Pass Document Scanning for Question Answering
arXiv Preprint, 2025
TL;DR: We trained State Space Models (Mamba-2) for long-context Q&A, achieving performance comparable to GPT-4o on extremely long documents while being more computationally efficient.
@misc{cao2025singlepassdocumentscanningquestion,
  title={Single-Pass Document Scanning for Question Answering},
  author={Weili Cao and Jianyou Wang and Youze Zheng and Longtian Bao and Qirui Zheng and Taylor Berg-Kirkpatrick and Ramamohan Paturi and Leon Bergen},
  year={2025},
  eprint={2504.03101},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.03101}
}
Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark
arXiv Preprint, 2024
TL;DR: A benchmark for assessing biases and limitations in the methodology of biomedical studies.
@misc{wang2024measuringriskbiasbiomedical,
  title={Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark},
  author={Jianyou Wang and Weili Cao and Longtian Bao and Youze Zheng and Gil Pasternak and Kaicheng Wang and Xiaoyue Wang and Ramamohan Paturi and Leon Bergen},
  year={2024},
  eprint={2411.18831},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.18831}
}
EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers
arXiv Preprint, 2025
TL;DR: A benchmark for evaluating models' ability to extract supporting evidence for biomedical claims from research papers.
@misc{wang2024evidencebench,
  title={EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers},
  author={Jianyou Wang and Weili Cao and Kaicheng Wang and Xiaoyue Wang and Ashish Dalvi and Gino Prasad and Qishan Liang and Hsuan-lin Her and Mingwang and Qin Yang and Gene W. Yeo and David E. Neal and Maxim Khan and Christopher D. Rosin and Ramamohan Paturi and Leon Bergen},
  year={2025},
  eprint={2504.18736},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.18736}
}
IR2: Information Regularization for Information Retrieval
LREC-COLING, 2024
TL;DR: We introduce a new regularization method that improves the training of dual-encoder retrieval models by regularizing the information capacity of document representations.
@inproceedings{wang-etal-2024-ir2,
  title = {{IR}2: Information Regularization for Information Retrieval},
  author = {Wang, Jianyou and Wang, Kaicheng and Wang, Xiaoyue and Cao, Weili and Paturi, Ramamohan and Bergen, Leon},
  booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  month = may,
  year = {2024},
  address = {Torino, Italia},
  publisher = {ELRA and ICCL},
  url = {https://aclanthology.org/2024.lrec-main.810/},
  pages = {9261--9284}
}
BIRCO: A Benchmark of Information Retrieval Tasks with Complex Objectives
arXiv Preprint, 2024
TL;DR: We present BIRCO, a new benchmark featuring nine information retrieval tasks that require models to understand complex objectives beyond simple semantic similarity.
@misc{wang2024birco,
  title={BIRCO: A Benchmark of Information Retrieval Tasks with Complex Objectives},
  author={Xiaoyue Wang and Jianyou Wang and Weili Cao and Kaicheng Wang and Ramamohan Paturi and Leon Bergen},
  year={2024},
  eprint={2402.14151},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2402.14151}
}