
Tiancheng Gu | 顾天承

Ph.D. Student

The University of Sydney

Research Interests

  • Large-scale Synthetic Data
  • Continued Pre-Training & Post-Training (CPT & PT) of LVLMs for Video Understanding
  • CLIP Pre-Training

About Me

I am a third-year Ph.D. student at the University of Sydney, under the supervision of A/Professor Weidong Cai. I previously completed a B.Eng. (Honours) in 2022 at the Australian National University, where I was advised by Professor Hongdong Li. I also hold a B.Sci. in Computer Science, awarded in 2021 by the University of Melbourne.

🔥 News

2025.10
🎉 UniME-V2 has been accepted by AAAI 2026 as an Oral Presentation!
2025.07
🎉 UniME and RealSyn have been accepted by ACM MM 2025, both as Oral Presentations!
2024.10
🎉 CLIP-CID has been accepted by AAAI 2025!
2024.10
🎉 ORID has been accepted by WACV 2025 as an Oral Presentation!
2024.09
🎉 RWKV-CLIP has been accepted by EMNLP 2024 Main!
2024.05
🎉 LaPA has been accepted by CVPR 2024 Workshop!
2023.10
🎉 COMG has been accepted by WACV 2024!
2023.03
🎉 Started my Ph.D. journey at The University of Sydney!

📄 Selected Publications

(* means equal contribution, 📧 means corresponding author)

2026

AAAI'26 Oral UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

Tiancheng Gu*, Kaicheng Yang*, Kaichen Zhang, Xiang An, Yueyi Zhang📧, Weidong Cai, Jiankang Deng📧, Lidong Bing
AAAI Conference on Artificial Intelligence (AAAI'26) CORE A* CCF-A
UniME-V2 Overview
TL;DR

UniME-V2 leverages Multimodal LLMs as judges to identify high-quality hard negatives and learns fine-grained semantic distinctions through soft-label distribution alignment, achieving SOTA performance in universal multimodal retrieval.
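The soft-label distribution alignment can be pictured as a small distribution-matching objective. The snippet below is a minimal sketch for illustration only, assuming the MLLM judge returns one relevance score per candidate; the function name, tensor shapes, and temperature are hypothetical and do not reproduce the paper's actual implementation.

    import torch
    import torch.nn.functional as F

    def soft_label_alignment_loss(query_emb, cand_embs, judge_scores, temperature=0.02):
        """Sketch of soft-label distribution alignment (illustrative form only).

        query_emb:    (D,)   embedding of the query from the embedding model.
        cand_embs:    (N, D) embeddings of the candidate pool (positive + hard negatives).
        judge_scores: (N,)   relevance scores assigned to each candidate by an MLLM judge.
        """
        # Similarity distribution predicted by the embedding model.
        sims = F.cosine_similarity(query_emb.unsqueeze(0), cand_embs, dim=-1) / temperature
        pred_log_dist = F.log_softmax(sims, dim=-1)
        # Soft-label distribution derived from the judge's scores.
        target_dist = F.softmax(judge_scores, dim=-1)
        # Align the two distributions via KL divergence.
        return F.kl_div(pred_log_dist, target_dist, reduction="batchmean")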

Under Review DanQing-100M: A Large-scale Chinese Vision-Language Pre-training Dataset

Hengyu Shen*, Tiancheng Gu*, Bin Qin, Shuo Tan, Zelong Sun, Jun Wang, Nan Wu, Xiang An, Ziyong Feng, Kaicheng Yang📧
Technical Report
DanQing-100M Overview
TL;DR

DanQing is a curated dataset of 100 million Chinese image-text pairs sourced from 2024–2025 web data. By implementing a rigorous multi-stage filtering pipeline that retains only the top 9.54% of raw data, it achieves superior quality and temporal relevance over existing benchmarks. Models pre-trained on DanQing achieve state-of-the-art (SOTA) performance in zero-shot classification, cross-modal retrieval, and Chinese-centric multimodal reasoning.

2025

ACM MM'25 Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs

Tiancheng Gu*, Kaicheng Yang*, Yanzhao Zhang, Yingda Chen, Dingkun Long, Weidong Cai, Jiankang Deng📧
ACM International Conference on Multimedia (ACM MM'25) CORE A* CCF-A
UniME Overview
TL;DR

UniME is a novel two-stage framework that empowers Multimodal Large Language Models (MLLMs) to learn universal and discriminative representations for diverse downstream tasks. By leveraging textual discriminative knowledge distillation and hard negative enhanced instruction tuning, UniME overcomes the limitations of traditional models (like CLIP’s token truncation and bag-of-words behavior), achieving state-of-the-art (SOTA) performance on the MMEB benchmark and various retrieval tasks.
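The hard negative enhanced objective can be sketched as an InfoNCE-style loss whose negative pool mixes in-batch negatives with explicitly mined hard negatives. Everything below (names, shapes, temperature, and the exact loss form) is an assumption for illustration rather than the paper's training recipe.

    import torch
    import torch.nn.functional as F

    def hard_negative_infonce(query_embs, pos_embs, hard_neg_embs, temperature=0.05):
        """Sketch of contrastive learning with mined hard negatives.

        query_embs:    (B, D)    query-side embeddings from the MLLM.
        pos_embs:      (B, D)    embeddings of the matching targets.
        hard_neg_embs: (B, K, D) K mined hard negatives per query.
        """
        q = F.normalize(query_embs, dim=-1)
        p = F.normalize(pos_embs, dim=-1)
        n = F.normalize(hard_neg_embs, dim=-1)

        # In-batch similarities: the diagonal holds the positive pairs.
        in_batch = q @ p.t() / temperature                      # (B, B)
        # Similarities to each query's mined hard negatives.
        hard = torch.einsum("bd,bkd->bk", q, n) / temperature   # (B, K)

        logits = torch.cat([in_batch, hard], dim=1)             # (B, B + K)
        labels = torch.arange(q.size(0), device=q.device)       # positive = diagonal index
        return F.cross_entropy(logits, labels)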

ACM MM'25 RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm

Tiancheng Gu*, Kaicheng Yang*, Chaoyi Zhang, Yin Xie, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai📧, Jiankang Deng📧
ACM International Conference on Multimedia (ACM MM'25) CORE A* CCF-A
RealSyn Overview
TL;DR

RealSyn is a scalable paradigm designed to unlock the potential of unpaired multimodal interleaved documents for vision-language pre-training. By integrating a hierarchical retrieval of realistic texts with LLM-based synthetic caption generation, it constructs a high-quality, semantically balanced dataset of up to 100 million pairs. Models pre-trained on RealSyn consistently achieve state-of-the-art (SOTA) performance across zero-shot transfer, robustness, and retrieval benchmarks, outperforming traditional datasets like LAION and YFCC.

AAAI'25 CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Kaicheng Yang*, Tiancheng Gu*, Xiang An, Haiqiang Jiang, Xiangzi Dai, Ziyong Feng, Weidong Cai📧, Jiankang Deng📧
AAAI Conference on Artificial Intelligence (AAAI'25) CORE A* CCF-A
CLIP-CID Overview
TL;DR

CLIP-CID is an efficient distillation framework for vision-language models that utilizes semantic balancing to prune redundant training data and cluster-instance discrimination to achieve state-of-the-art performance with lower computational costs.
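One way to picture cluster-instance discrimination is as a weighted sum of an instance-level image-text contrastive loss and a cluster-level loss that pushes the student toward the teacher's cluster assignment. The toy sketch below makes that split explicit; the cluster-assignment rule, the weighting, and all names are assumptions, not the paper's exact objective.

    import torch
    import torch.nn.functional as F

    def cluster_instance_loss(student_img, teacher_img, txt, cluster_centers,
                              temperature=0.05, alpha=0.5):
        """Toy combination of instance- and cluster-level discrimination.

        student_img:     (B, D) image embeddings from the student model.
        teacher_img:     (B, D) image embeddings from the frozen teacher (e.g., CLIP).
        txt:             (B, D) text embeddings.
        cluster_centers: (C, D) centroids obtained by clustering teacher embeddings.
        """
        s = F.normalize(student_img, dim=-1)
        t = F.normalize(txt, dim=-1)
        c = F.normalize(cluster_centers, dim=-1)
        labels = torch.arange(s.size(0), device=s.device)

        # Instance-level discrimination: standard image-text contrastive loss.
        inst_logits = s @ t.t() / temperature
        inst_loss = F.cross_entropy(inst_logits, labels)

        # Cluster-level discrimination: the student's image embedding should land in
        # the same cluster as the teacher's embedding of that image.
        teacher_assign = (F.normalize(teacher_img, dim=-1) @ c.t()).argmax(dim=-1)
        clus_logits = s @ c.t() / temperature
        clus_loss = F.cross_entropy(clus_logits, teacher_assign)

        return alpha * inst_loss + (1 - alpha) * clus_loss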

WACV'25 Oral ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

Tiancheng Gu*, Kaicheng Yang*, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai📧
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV'25) CORE A
ORID Overview
TL;DR

We propose ORID, an Organ-Regional Information Driven framework that enhances radiology report generation by fusing organ-specific diagnostic descriptions with visual features and employing a GNN-based importance analysis to focus on clinically relevant regions while filtering out noise.

2024

EMNLP'24 RWKV-CLIP: A Robust Vision-Language Representation Learner

Tiancheng Gu*, Kaicheng Yang*, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai📧, Jiankang Deng📧
Conference on Empirical Methods in Natural Language Processing (EMNLP'24) CORE A* CCF-B
RWKV-CLIP Overview
TL;DR

RWKV-CLIP is an efficient and robust vision-language model that replaces traditional Transformer backbones with an RWKV-driven architecture and incorporates an LLM-based data refinement framework to achieve SOTA performance with significantly reduced memory and inference costs.

CVPR Workshop'24 Oral LaPA: Latent Prompt Assist Model For Medical Visual Question Answering

Tiancheng Gu, Kaicheng Yang, Dongnan Liu, Weidong Cai📧
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPR'24 Workshop)

WACV'24 Complex Organ Mask Guided Radiology Report Generation

Tiancheng Gu, Dongnan Liu, Zhiyuan Li, Weidong Cai📧
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV'24) CORE A

🏆 Honors & Awards

2023

University of Sydney International Stipend Scholarship

The University of Sydney

2023

University of Sydney International Tuition Fee Scholarship

The University of Sydney

💼 Experiences

Qwen Team
Tongyi Lab, Alibaba Group
2025.08 - Now

Miromind AI
Shanda Group
2025.05 - 2025.08

ModelScope Team
Tongyi Lab, Alibaba Group
2025.02 - 2025.05

Glint Lab
DeepGlint Technologies Co. Ltd
2024.02 - 2025.02

Huawei Technologies Co. Ltd
2021.03 - 2021.09

🎓 Education

Ph.D. in Computer Science
The University of Sydney
2023.03 - Now
Supervisor: A/Prof. Weidong Cai & Dr. Dongnan Liu

B.Eng. in Computer Engineering (Honours)
The Australian National University
2022.03 - 2022.12
Advisor: Prof. Hongdong Li

B.Sci. in Computer Science
The University of Melbourne
2019.03 - 2021.12

💼 Academic Service

Conference Reviewer

EMNLP, NAACL, ACM MM, AAAI, ACL, ECCV, CVPR

Served as a reviewer for top-tier computer vision, natural language processing, and machine learning conferences.
