I am a Postdoc at Stanford University. I am affiliated with MARVL and Stanford AI Lab, where I am fortunate to be advised by Prof. Serena Yeung.
My research focuses on Video Understanding, Multimodal Learning, and AI for Healthcare.
Previously, I received my Ph.D. from the University of Technology Sydney, under the supervision of Prof. Yi Yang, and my B.E. from the University of Science and Technology of China. I was also fortunate to collaborate with researchers from Baidu Research and Facebook AI Research during my Ph.D.
Most recent publications on Google Scholar.
* indicates equal contribution.
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Xiaohan Wang*, Yuhui Zhang*, Orr Zohar, Serena Yeung-Levy
ECCV (2024)
Video-STaR: Bootstrapping Weak Video Supervision for Visual Instruction Tuning
Orr Zohar, Xiaohan Wang, Yonatan Bitton, Idan Szpektor, Serena Yeung-Levy
arXiv preprint (2024)
Why are Visually-Grounded Language Models Bad at Image Classification?
Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung-Levy
NeurIPS (2024)
Describing Differences in Image Sets with Natural Language
Lisa Dunlap*, Yuhui Zhang*, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell*, Jacob Steinhardt*, Joseph E. Gonzalez*, Serena Yeung-Levy*
CVPR (2024) Oral (90/11532)
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang
ICLR (2024)
LANA: A Language-Capable Navigator for Instruction Following and Generation
Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang
CVPR (2023)
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang
CVPR (2023)
Gloss-Free End-to-End Sign Language Translation
Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, Yi Yang
ACL (2023) Oral
Action Sensitivity Learning for Temporal Action Localization
Jiayi Shao, Xiaohan Wang, Ruijie Quan, Junjun Zheng, Jiang Yang, Yi Yang
ICCV (2023)
Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark
Jiaxu Miao, Xiaohan Wang, Yu Wu, Wei Li, Xu Zhang, Yunchao Wei, Yi Yang
CVPR (2022)
Interactive Prototype Learning for Egocentric Action Recognition
Xiaohan Wang, Linchao Zhu, Heng Wang, Yi Yang
ICCV (2021)
Symbiotic Attention for Egocentric Action Recognition with Object-centric Alignment
Xiaohan Wang, Linchao Zhu, Yu Wu, Yi Yang
T-PAMI (2021)
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang, Linchao Zhu, Yi Yang
CVPR (2021)
Symbiotic Attention with Privileged Information for Egocentric Action Recognition
Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang
AAAI (2020) Oral
This website uses the website design and template by Martin Saveski.