About me

I am a researcher at the Beijing Academy of Artificial Intelligence (BAAI). I received my Ph.D. from the University of Chinese Academy of Sciences, where I was supervised by Prof. Qixiang Ye.

My research background includes vision foundation models, object detection, and self-supervised pretraining. I currently work on large multimodal models, with a focus on empowering machines with general intelligence.

My email: xszhang@baai.ac.cn

Education

  • Harbin Institute of Technology, Weihai, Sept. 2014 - Jul. 2018
    B.S. in School of Information Science and Engineering
  • University of Chinese Academy of Sciences, Sept. 2018 - Jul. 2023
    Ph.D. in School of Electronic, Electrical and Communication Engineering

Recent Projects

Quan Sun*, Jinsheng Wang*, Qiying Yu*, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Xinlong Wang. EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters. arXiv:2402.04252.
[arxiv]

Quan Sun*, Yufeng Cui*, Xiaosong Zhang*, Fan Zhang*, Qiying Yu*, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang. Generative Multimodal Models are In-Context Learners. CVPR 2024.
[arxiv] [code] [demo] [project page]

Qiying Yu*, Quan Sun*, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Xinlong Wang, Jingjing Liu. CapsFusion: Rethinking Image-Text Data at Scale. CVPR 2024.
[arxiv] [code&data]

Quan Sun*, Qiying Yu*, Yufeng Cui*, Fan Zhang*, Xiaosong Zhang*, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang. Generative Pretraining in Multimodality. ICLR 2024.
[arxiv] [code]

Xinlong Wang*, Xiaosong Zhang*, Yue Cao*, Wen Wang, Chunhua Shen, Tiejun Huang. SegGPT: Segmenting Everything In Context. ICCV 2023.
[arxiv] [code] [demo]

First-author Publications

Feng Liu*, Xiaosong Zhang*, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye. Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection. ICCV 2023.

Xiaosong Zhang*, Yunjie Tian*, Wei Huang, Qixiang Ye, Qi Dai, Lingxi Xie, Qi Tian. HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer. ICLR 2023.

Xiaosong Zhang, Fang Wan, Chang Liu, Xiangyang Ji, Qixiang Ye. Learning to Match Anchors for Visual Object Detection. TPAMI 2021.

Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, Qixiang Ye. FreeAnchor: Learning to Match Anchors for Visual Object Detection. NeurIPS 2019.

Co-author Publications

Zonghao Guo, Chang Liu, Xiaosong Zhang, Jianbin Jiao, Xiangyang Ji, Qixiang Ye. Beyond Bounding-Box: Convex-hull Feature Adaptation for Oriented and Densely Packed Object Detection. CVPR 2021.

Zonghao Guo, Xiaosong Zhang, Chang Liu, Xiangyang Ji, Jianbin Jiao, Qixiang Ye. Convex-hull Feature Adaptation for Oriented and Densely Packed Object Detection. TCSVT 2022.

Feng Liu, Xiaosong Zhang, Fang Wan, Xiangyang Ji, Qixiang Ye. Domain Contrast for Domain Adaptive Object Detection. TCSVT 2021.

Zhiliang Peng, Wei Huang, Zonghao Guo, Xiaosong Zhang, Jianbin Jiao, Qixiang Ye. Long-tailed Distribution Adaptation. ACM MM 2021.

Chang Liu, Fang Wan, Wei Ke, Zhuowei Xiao, Yuan Yao, Xiaosong Zhang, Qixiang Ye. Orthogonal Decomposition Network for Pixel-Wise Binary Classification. CVPR 2019.