Xiaohan Wang

Postdoc, Stanford University

xhanwang [AT] stanford.edu

Bio

I am a postdoctoral researcher at Stanford University, affiliated with MARVL and the Stanford AI Lab, where I am fortunate to be advised by Prof. Serena Yeung.

My research focuses on Video Understanding, Multimodal Learning, and AI for Healthcare.

Previously, I received my Ph.D. from the University of Technology Sydney under the supervision of Prof. Yi Yang, and my B.E. from the University of Science and Technology of China. During my Ph.D., I was also fortunate to collaborate with researchers at Baidu Research and Facebook AI Research.

News

Publications

Most recent publications on Google Scholar.
* indicates equal contribution.

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Xiaohan Wang*, Yuhui Zhang*, Orr Zohar, Serena Yeung-Levy

ECCV (2024)

project paper code

Video-STaR: Bootstrapping Weak Video Supervision for Visual Instruction Tuning

Orr Zohar, Xiaohan Wang, Yonatan Bitton, Idan Szpektor, Serena Yeung-Levy

arXiv preprint (2024)

project paper code

Why are Visually-Grounded Language Models Bad at Image Classification?

Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung-Levy

NeurIPS (2024)

project paper code

Describing Differences in Image Sets with Natural Language

Lisa Dunlap*, Yuhui Zhang*, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell*, Jacob Steinhardt*, Joseph E. Gonzalez*, Serena Yeung-Levy*

CVPR (2024) Oral (90/11532)

project paper code

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

ICLR (2024)

project paper code

LANA: A Language-Capable Navigator for Instruction Following and Generation

Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang

CVPR (2023)

paper code

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

CVPR (2023)

paper code

Gloss-Free End-to-End Sign Language Translation

Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, Yi Yang

ACL (2023) Oral

paper code

Action Sensitivity Learning for Temporal Action Localization

Jiayi Shao, Xiaohan Wang, Ruijie Quan, Junjun Zheng, Jiang Yang, Yi Yang

ICCV (2023)

paper code

Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark

Jiaxu Miao, Xiaohan Wang, Yu Wu, Wei Li, Xu Zhang, Yunchao Wei, Yi Yang

CVPR (2022)

paper code

Interactive Prototype Learning for Egocentric Action Recognition

Xiaohan Wang, Linchao Zhu, Heng Wang, Yi Yang

ICCV (2021)

paper

Symbiotic Attention for Egocentric Action Recognition with Object-centric Alignment

Xiaohan Wang, Linchao Zhu, Yu Wu, Yi Yang

T-PAMI (2021)

paper code

T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval

Xiaohan Wang, Linchao Zhu, Yi Yang

CVPR (2021)

paper code

Symbiotic Attention with Privileged Information for Egocentric Action Recognition

Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang

AAAI (2020) Oral

paper code

Acknowledgements

This website uses a design and template by Martin Saveski.