I am a third-year Master’s student in the College of Computer Science and Technology at Zhejiang University, under the supervision of Professor Zhou Zhao. I also completed my Bachelor’s degree at Zhejiang University.
My research focuses on Multi-modal Learning, 3D Scene Understanding, and Embodied AI. Currently, during my internship at OpenRobotLab, I am researching how to enhance the multi-modal perception and reasoning capabilities of robot policies, advised by Yilun Chen and Jiangmiao Pang.
📝 Publications
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers.
Haifeng Huang, Yilun Chen, Zehan Wang, Rongjie Huang, Runsen Xu, Tai Wang, Luping Liu, Xize Cheng, Yang Zhao, Jiangmiao Pang, Zhou Zhao
Grounded 3D-LLM with Referent Tokens.
Yilun Chen*, Shuai Yang*, Haifeng Huang*, Tai Wang, Ruiyuan Lyu, Runsen Xu, Dahua Lin, Jiangmiao Pang
- Grounded 3D-LLM establishes a correspondence between 3D scenes and language phrases through referent tokens.
- Creates a large-scale, phrase-level grounded scene caption dataset.
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes.
Zehan Wang*, Haifeng Huang*, Yang Zhao, Ziang Zhang, Zhou Zhao
- Chat-3D is one of the first 3D LLMs.
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding.
Zehan Wang*, Haifeng Huang*, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao
- The first weakly-supervised 3D visual grounding method.
Towards Effective Multi-modal Interchanges in Zero-resource Sounding Object Localization.
Yang Zhao*, Chen Zhang*, Haifeng Huang*, Haoyuan Li, Zhou Zhao
- A method for sounding object localization that requires no training data from this field.