Haozhe Ji 计昊哲

I am Haozhe Ji, a final-year Ph.D. student in the CoAI Group at the Dept. of Computer Science and Technology, Tsinghua University, advised by Prof. Minlie Huang. Previously, I received my bachelor's degree from the Dept. of Electronic Engineering, Tsinghua University, and won a gold medal at the Chinese Physics Olympiad. Please find my CV here [English].

My research is driven by the goal of developing theoretically grounded and scalable methods to improve neural language models in the areas of natural language generation and language model alignment. Specifically, my work aims to develop practical algorithms and systems that address the fundamental limitations of the standard paradigm of language modeling in a principled manner.

Firstly, in terms of the choice of modeling, my research explores model families beyond auto-regressive models (ARMs), which possess a strong local inductive bias, to facilitate more accurate modeling of the growing volume of data. This includes the practical realization of theoretically more expressive model families, e.g., energy-based models (Language Model Decoding as Direct Metrics Optimization, ICLR 2024), latent variable models (DiscoDVT, EMNLP 2021), and semi-parametric models (Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph, EMNLP 2020).

Secondly, in terms of the problem of learning, my research advocates quality-aware learning objectives beyond maximum likelihood estimation (MLE), which is biased towards coverage. These new objectives are theoretically grounded in probability metrics that facilitate quality assessment, including the reverse KL divergence (Towards Efficient and Exact Optimization of Language Model Alignment, ICML 2024) and the total variation distance (Tailoring Language Generation Models under Total Variation Distance, ICLR 2023), to accommodate the growth of high-quality data annotations in various forms.

News

  • I am looking for industry or academic positions in the coming year!
  • [06/2024] I gave a talk at ByteDance summarizing my work and thoughts on the Theoretical Limitations of Language Modeling and Beyond [slides].
  • [05/2024] Our EXO paper is accepted at ICML 2024. See you in Vienna 🇦🇹🎡 and feel free to reach out!
  • [03/2024] I gave a talk on our recent work, Towards Efficient and Exact Optimization (EXO) of Language Model Alignment [slides].

Publications

* indicates equal contribution.

  1. Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang.
    Towards Efficient and Exact Optimization of Language Model Alignment.
    International Conference on Machine Learning, ICML 2024.
    [paper] [repo]

  2. Haozhe Ji, Pei Ke, Hongning Wang, Minlie Huang.
    Language Model Decoding as Direct Metrics Optimization.
    International Conference on Learning Representations, ICLR 2024.
    [paper]

  3. Haozhe Ji, Pei Ke, Zhipeng Hu, Rongsheng Zhang, Minlie Huang.
    Tailoring Language Generation Models under Total Variation Distance.
    International Conference on Learning Representations, ICLR 2023.
    (Oral / Notable top 5%)
    [paper] [repo]

  4. Pei Ke, Haozhe Ji, Zhenyu Yang, Yi Huang, Junlan Feng, Xiaoyan Zhu, Minlie Huang.
    Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation.
    Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022.
    [paper]

  5. Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Minlie Huang.
    LaMemo: Language Modeling with Look-Ahead Memory.
    Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2022. (Oral)
    [paper] [repo]

  6. Haozhe Ji, Minlie Huang.
    DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational Transformer.
    Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021. (Oral)
    [paper] [repo]

  7. Pei Ke, Haozhe Ji, Yu Ran, Xin Cui, Liwei Wang, Linfeng Song, Xiaoyan Zhu, Minlie Huang.
    JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs.
    Findings of the Association for Computational Linguistics, Findings of ACL 2021.
    [paper] [repo]

  8. Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
    CPM: A Large-Scale Generative Chinese Pre-trained Language Model.
    AI Open.
    [paper]

  9. Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Xiaoyan Zhu, Minlie Huang.
    Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph.
    Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020. (Oral)
    [paper] [repo]

  10. Pei Ke*, Haozhe Ji*, Siyang Liu, Xiaoyan Zhu, Minlie Huang.
    SentiLARE: Linguistic Knowledge Enhanced Language Representation for Sentiment Analysis.
    Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020.
    [paper] [repo]

  11. Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Minlie Huang.
    Generating Commonsense Explanation by Extracting Bridge Concepts from Reasoning Paths.
    Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, AACL 2020.
    [paper]

  12. Yankai Lin, Haozhe Ji, Zhiyuan Liu, Maosong Sun.
    Denoising Distantly Supervised Open-Domain Question Answering.
    Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018.
    [paper] [repo]

Honors & Awards

  • Tang Junyuan (唐君远) Scholarship, Tsinghua University, 2023
  • Sohu Scholarship, Tsinghua University, 2022
  • Yang Huiyan (杨惠妍) Scholarship, Tsinghua University, 2021
  • Comprehensive Merit Scholarship, Tsinghua University, 2017/2019
  • Gold Medal, 32nd Chinese Physics Olympiad (CPhO), 2015
  • Distinguished Honor Roll (Top 1%), American Mathematics Contest 12A (AMC 12A), 2015

Services

Reviewer/Program Committee: ACL, EMNLP, NAACL, ARR

Teaching

I was the Head TA of the undergraduate course Artificial Neural Network, taught by Prof. Minlie Huang (2021 Fall, 2022 Fall, 2023 Fall).

I co-authored the NLP textbook Modern Natural Language Generation (in Chinese). In particular, I drafted the fourth chapter, Transformer-based Language Generation Model.

Personal

I am a cellist in the Tsinghua University Symphony Orchestra (TUSO). Check out our 30th-anniversary concert, which featured a performance of Antonín Dvořák's Symphony No. 8.