Haoyang Huang (黄浩洋)
We are currently seeking highly motivated researchers and interns to work on unified multimodal models. Contact: huanghaoyang.ocean@jd.com (work) or hhy338189@gmail.com (personal).
Haoyang Huang is currently a senior expert at the Image and Multimodal Lab of JD Discovery Academy, leading research on unified multimodal foundation models. He holds a master's degree from Peking University and has won numerous top global data mining competition awards, as well as the Kaggle Master title. At Microsoft Research Asia, he focused on multimodal and multilingual foundation models and was one of the earliest researchers to develop multilingual foundation models. He designed the multilingual model Unicoder, which covers 100 languages, and applied this technology in Bing Search and Microsoft Translator, significantly improving their understanding of low-resource languages. He also participated in the WMT21 machine translation competition, winning first place globally. In addition, he led the development of M3P, the world’s first multilingual multimodal pre-trained model. He has published dozens of papers in top conferences such as CVPR, ACL, EMNLP, and AAAI, with over 2,000 citations on Google Scholar. In 2024, he joined StepFun, leading the development and open-sourcing of the 30B StepVideo model series (Step-Video-T2V, Step-Video-TI2V).
Highlights
- Multimodal Foundation Model: M3P (CVPR, 2021), UniVL (Preprint, 2021), HCN (ACL, 2021), NUWA-LIP (CVPR, 2023), Step-Video-T2V (Technical Report, 2025), Step-Video-TI2V (Technical Report, 2025)
- Multilingual Foundation Model: Unicoder (EMNLP, 2019), XLT (EMNLP, 2023), Div-Ref (NAACL, 2024), CoD (EMNLP, 2024), GLAN (EMNLP, 2024), PED (ACL, 2024), LAPE (ACL, 2024)
- Machine Translation: XLM-T (Preprint, 2020), WMT21 First-Place Report (SIGMT, 2021), MANMT (ACL, 2021), LVP-M3 (EMNLP, 2022)
Reports & Awards
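- Following DeepSeek, two Chinese multimodal large models are open-sourced. 机器之心 (Synced), 量子位 (QbitAI), 2025.
- Multilingual translation at scale: 10000 language pairs and beyond. Microsoft Translator, 2021.
- Rebuilding the Tower of Babel: multilingual translation models. Microsoft Research Asia, 2021.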
- Champion of WMT-21 Machine Translation. Workshop on Machine Translation, 2021.
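- Combining multilingual and multimodal learning in pre-training. PaperWeekly on Bilibili, 2021.
- AI at Scale in Bing. Microsoft, 2020.
- Cross-lingual pre-training to improve the transferability of machine reasoning. Microsoft Research, 2019.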
- Kaggle Master Award. Kaggle, 2017.
- Gold Medal in Kaggle Quora Question Pairs Competition. Kaggle, 2017.
Academic Service
- ACL, EMNLP, NAACL, COLING, AAAI, IJCAI, NeurIPS, ICLR, ICML, CVPR
Publications
-
Step-Video-TI2V Technical Report
Haoyang Huang, Guoqing Ma, Nan Duan, Xing Chen, Changyi Wan, Ranchen Ming, Tianyu Wang, Bo Wang, Zhiying Lu.
Technical Report, 2025.
-
Step-Video-T2V Technical Report
Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan.
Technical Report, 2025.
-
STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives
Bo Wang, Haoyang Huang, Zhiying Lu, Fengyuan Liu, Guoqing Ma, Jianlong Yuan, Yuan Zhang, Nan Duan, Daxin Jiang.
Preprint, 2025.
-
Generative pre-trained autoregressive diffusion transformer
Yuan Zhang, Jiacheng Jiang, Guoqing Ma, Zhiying Lu, Haoyang Huang, Jianlong Yuan, Nan Duan.
Preprint, 2025.
-
Respond in my language: Mitigating language inconsistency in response generation based on large language models
Liang Zhang, Qin Jin, Haoyang Huang, Dongdong Zhang, Furu Wei.
ACL, 2024.
-
Language-specific neurons: The key to multilingual capabilities in large language models
Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong Wen.
ACL, 2024.
-
Synthetic data (almost) from scratch: Generalized instruction tuning for language models
Haoran Li, Qingxiu Dong, Zhengyang Tang, Chaojun Wang, Xingxing Zhang, Haoyang Huang, Shaohan Huang, Xiaolong Huang, Zeqiang Huang, Dongdong Zhang.
EMNLP Findings, 2024.
-
TRIP: Accelerating Document-level Multilingual Pre-training via Triangular Document-level Pre-training on Parallel Data Triplets
Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Zhaochuan Gao, Anthony Aue, Arul Menezes, Furu Wei.
EMNLP, 2024.
-
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
Tianyi Tang, Hongyuan Lu, Yuchen Jiang, Haoyang Huang, Dongdong Zhang, Xin Zhao, Tom Kocmi, Furu Wei.
NAACL, 2024.
-
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models
Hongyuan Lu, Haoran Yang, Haoyang Huang, Dongdong Zhang, Wai Lam, Furu Wei.
EMNLP, 2024.
-
Not All Languages Are Created Equal in LLMs
Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, Furu Wei.
EMNLP, 2023.
-
HanoiT: Enhancing Context-aware Translation via Selective Context
Jian Yang, Yuwei Yin, Shuming Ma, Liqun Yang, Hongcheng Guo, Haoyang Huang, Dongdong Zhang, Yutao Zeng, Zhoujun Li, Furu Wei.
DASFAA, 2023.
-
NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN
Minheng Ni, Chenfei Wu, Haoyang Huang, Daxin Jiang, Wangmeng Zuo, Nan Duan.
CVPR, 2023.
-
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li.
ACL, 2023.
-
LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation
Hongcheng Guo, Jiaheng Liu, Haoyang Huang, Jian Yang, Zhoujun Li, Dongdong Zhang, Zheng Cui.
EMNLP, 2022.
-
BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation
Yuchen Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Jian Yang, Haoyang Huang, Rico Sennrich, Ryan Cotterell, Mrinmaya Sachan, Ming Zhou.
NAACL, 2022.
-
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Dongdong Zhang, Nan Duan.
CVPR, 2021.
-
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Huaishao Luo, Lei Ji, Botian Shi, Haoyang Huang, Nan Duan, Tianrui Li, Jason Li, Taroon Bharti, Ming Zhou.
Preprint, 2021.
-
Hierarchical Context-aware Network for Dense Video Event Captioning
Lei Ji, Xianglin Guo, Haoyang Huang, Xilin Chen.
ACL, 2021.
-
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Qiaolin Xia, Haoyang Huang, Nan Duan, Dongdong Zhang, Lei Ji, Zhifang Sui, Edward Cui, Taroon Bharti, Ming Zhou.
NLPCC, 2021.
-
Multilingual Agreement for Multilingual Neural Machine Translation
Jian Yang, Yuwei Yin, Shuming Ma, Haoyang Huang, Dongdong Zhang, Zhoujun Li, Furu Wei.
ACL, 2021.
-
Improving Multilingual Neural Machine Translation with Auxiliary Source Languages
Weijia Xu, Yuwei Yin, Shuming Ma, Dongdong Zhang, Haoyang Huang.
EMNLP Findings, 2021.
-
Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task
Jian Yang, Haoyang Huang, Shuming Ma, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan, Xia Song, Furu Wei.
SIGMT, 2021.
-
XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei.
arXiv, 2020.
-
Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks
Haoyang Huang, Yaobo Liang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Ming Zhou.
EMNLP, 2019.