I am a third-year Master’s student in the College of Computer Science and Technology at Zhejiang University, under the supervision of Professor Zhou Zhao. I also completed my Bachelor’s degree at Zhejiang University.
My research focuses on Multi-modal Learning, 3D Scene Understanding, and Embodied AI. Currently, during my internship at OpenRobotLab, I am researching how to enhance the multi-modal perception and reasoning capabilities of robot policies, advised by Yilun Chen and Jiangmiao Pang.
📝 Publications
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers.
Haifeng Huang, Yilun Chen, Zehan Wang, Rongjie Huang, Runsen Xu, Tai Wang, Luping Liu, Xize Cheng, Yang Zhao, Jiangmiao Pang, Zhou Zhao
Grounded 3D-LLM with Referent Tokens.
Yilun Chen*, Shuai Yang*, Haifeng Huang*, Tai Wang, Ruiyuan Lyu, Runsen Xu, Dahua Lin, Jiangmiao Pang
- Grounded 3D-LLM establishes a correspondence between 3D scenes and language phrases through referent tokens.
- Creates a large-scale, phrase-level grounded scene caption dataset.
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes.
Zehan Wang*, Haifeng Huang*, Yang Zhao, Ziang Zhang, Zhou Zhao
- Chat-3D is one of the first 3D LLMs.
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding.
Zehan Wang*, Haifeng Huang*, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao
- The first weakly-supervised 3D visual grounding method.
Towards Effective Multi-modal Interchanges in Zero-resource Sounding Object Localization.
Yang Zhao*, Chen Zhang*, Haifeng Huang*, Haoyuan Li, Zhou Zhao
- A method for sounding object localization that requires no training data from this field.