Profile

Overview

I am a Master's student in Cyberspace Security at USTC, advised by Prof. Qi Chu and Prof. Nenghai Yu.

My research focuses on LLM evaluation, safety, alignment, and realistic benchmark design, with recent work on dialogue, search, reasoning, and multimodal systems. From Nov. 2024 to Dec. 2025, I was an algorithm intern at ByteDance Seed, where I worked on evaluation-centric systems for large-model applications.

Position
M.Eng. student at USTC; former algorithm intern at ByteDance Seed (2024.11–2025.12).
Research
LLM evaluation, safety, alignment, reasoning benchmarks, and AI security.
Approach
Build realistic testbeds and evaluation frameworks that reveal hidden model weaknesses.

Highlights

News

  1. 2026-02

    WorldTravel introduced a realistic multimodal travel-planning benchmark spanning 150 real-world scenarios and 2,000+ rendered webpages, revealing a sharp drop in feasibility from text-only to multimodal settings.

  2. 2025-11

    DiscoX introduced an expert-domain discourse-level translation benchmark focused on document coherence, terminology consistency, and cross-sentence faithfulness.

  3. 2025-11

    MME-CC introduced a multimodal benchmark for cognitive-capacity evaluation that stresses reasoning-intensive visual-language understanding rather than shallow perception.

  4. 2025-09

    FinSearchComp introduced a financial search-and-reasoning benchmark with open data, simulating analyst-style workflows such as time-sensitive retrieval, evidence synthesis, and multi-step investigation.

  5. 2025-09

    MARS-Bench was accepted to EMNLP 2025 Findings as a benchmark for long interactive sports-commentary dialogue, emphasizing motivation transfer, cross-turn dependency, and multi-turn robustness.

  6. 2025-02

    CryptoX introduced a compositional reasoning benchmark for LLMs, using cryptography-inspired structure to isolate reasoning gaps that broad QA evaluations often obscure.

  7. 2025-01

    Hello Again! was accepted to NAACL 2025 for its study of long-term personalized dialogue with memory retrieval and dynamic persona modeling across sessions.

  8. 2024-06

    The preprint version of Hello Again! introduced a model-agnostic personalized dialogue agent for long-term memory and persona-aware interaction.

  9. 2023-12

    Joined LDS Lab at USTC and began research on LLM evaluation, dialogue systems, and safety-oriented questions.

Research

Publications

  1. 2026 ICLR

    FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning

    Liang Hu, Jianpeng Jiao, Jiashuo Liu, Yanle Ren, Zhoufutu Wen, Kaiyuan Zhang, Xuanliang Zhang, Xiang Gao, Tianci He, Fei Hu, Yali Liao, Zaiyuan Wang, Chenghao Yang, Qianyu Yang, Mingren Yin, Zhiyuan Zeng, Ge Zhang, Xinyi Zhang, Xiying Zhao, Zhenwei Zhu, Hongseok Namkoong, Wenhao Huang, Yuwen Tang

    ICLR 2026 Poster · ByteDance Seed

  2. 2026 ICLR

    DiscoX: Benchmarking Discourse-Level Translation in Expert Domains

    Xiying Zhao, Zhoufutu Wen, Zhixuan Chen, Jingzhe Ding, Jianpeng Jiao, Shuai Li, Xi Li, Danni Liang, Shengda Long, Qianqian Liu, Xianbo Wu, Hongwan Gao, Xiang Gao, Liang Hu, Jiashuo Liu, Mengyun Liu, Weiran Shi, Chenghao Yang, Qianyu Yang, Xuanliang Zhang, Ge Zhang, Wenhao Huang

    ICLR 2026 Poster

  3. 2025 NAACL

    Lead authorship. Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

    Hao Li*, Chenghao Yang*, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua

    NAACL 2025 (Long Paper) · * equal contribution · † corresponding author

  4. 2025 EMNLP

    Lead authorship. MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation

    Chenghao Yang*, Yinbo Luo*, Zhoufutu Wen, Qi Chu, Tao Gong, Longxiang Liu, Kaiyuan Zhang, Jianpeng Jiao, Ge Zhang, Wenhao Huang, Nenghai Yu

    EMNLP 2025 Findings · * equal contribution · † corresponding author

Preprints & Manuscripts

  1. 2026 arXiv

    Recent preprint

    Lead authorship. WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints

    Zexuan Wang*, Chenghao Yang*, Yingqi Que, Zhenzhu Yang, Huaqing Yuan, Yiwen Wang, Zhengxuan Jiang, Shengjie Fang, Zhenhe Wu, Zhaohui Wang, Zhixin Yao, Jiashuo Liu, Jincheng Ren, Yuzhen Li, Yang Yang, Jiaheng Liu, Jian Yang, Zaiyuan Wang, Ge Zhang, Zhoufutu Wen, Wenhao Huang

    arXiv preprint, 2026 · * equal contribution · † corresponding author

  2. 2026 Manuscript

    Lead authorship. Agent4Weakness: An Agentic Framework for In-Depth Model Weakness Discovery

    Xuanliang Zhang*, Chenghao Yang*, Zhoufutu Wen, Dingzirui Wang, Ge Zhang, Xiying Zhao, Tianren Feng, Jianpeng Jiao, Jingkai Liu, Zaiyuan Wang, Zuo Wang, Wenya Wu, Zhou Huan, Jin Chen, Wenhao Huang, Qingfu Zhu, Wanxiang Che

    under review · * equal contribution · † corresponding author

  3. 2026 Manuscript

    Lead authorship. When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors

    Chenghao Yang, Yuning Zhang, Zhoufutu Wen, Qi Chu, Jiaheng Liu, Tao Gong, Nenghai Yu

    under review · † corresponding author

  4. 2025 arXiv

    Lead authorship. MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity

    Kaiyuan Zhang*, Chenghao Yang*, Zhoufutu Wen, Sihang Yuan, Qiuyue Wang, Chaoyi Huang, Guosheng Zhu, He Wang, Huawenyu Lu, Jianing Wen, Jianpeng Jiao, Lishu Luo, Longxiang Liu, Sijin Wu, Xiaolei Zhu, Xuanliang Zhang, Ge Zhang, Yi Lin, Guang Shi, Chaoyou Fu, Wenhao Huang

    arXiv 2025 · * equal contribution · † corresponding author

  5. 2025 arXiv

    CryptoX: Compositional Reasoning Evaluation of Large Language Models

    Jiajun Shi*, Chaoren Wei*, Liqun Yang*, Zekun Moore Wang, Chenghao Yang, Ge Zhang, Stephen Huang, Tao Peng, Jian Yang, Zhoufutu Wen

    arXiv 2025 · * equal contribution · † corresponding author

Background

Experience

  1. 2025–present

    M.Eng., Cyberspace Security

    University of Science and Technology of China

    Advisors: Prof. Qi Chu & Prof. Nenghai Yu

    Research on LLM safety, evaluation, alignment, and benchmark design.

  2. 2024.11–2025.12

    Algorithm Intern

    ByteDance Seed, Seed-Evaluation Team

    Beijing, China

    Developed realistic evaluation pipelines and benchmark suites for large-model applications.

  3. 2023-2024

    Research Intern

    NExT++ Lab, National University of Singapore

    Remote · Supervised by Research Fellow An Zhang

    Conducted research on LLM-based dialogue agents for long-term personalization and memory-aware response generation.

  4. 2021-2025

    B.Eng., Information Security

    University of Science and Technology of China

    Coursework and early research in information security, machine learning, and AI security.

Honors

Awards

  1. Outstanding Graduate Award · USTC · 2025
  2. Outstanding Student Bronze Award · USTC · 2024
  3. Wang Xiaomo Talent Program Scholarship ×4 · USTC · 2021-2024