Profile
Overview
My research focuses on evaluation of large language models and LLM-based agents. I believe that good evaluation should go beyond ranking — it should reflect how models are actually used and reveal where they fall short in realistic settings, such as multi-turn dialogue, agentic tasks, and multimodal reasoning.
I am currently at USTC, advised by Prof. Qi Chu and Prof. Nenghai Yu. Previously, I was an algorithm intern at ByteDance Seed (Nov. 2024 – Dec. 2025), working on the Seed-Evaluation team.
I am actively looking for research internship opportunities. Feel free to reach out if you are interested in collaboration.
- Research: Evaluation of LLMs and LLM-based agents.
- Approach: Evaluations grounded in realistic scenarios that go beyond ranking.
Highlights
News
-
2026-05
WorldTravel was accepted to ICML 2026 as a poster, presenting a realistic multimodal travel-planning benchmark spanning 150 real-world scenarios and 2,000+ rendered webpages.
Accepted Paper
-
2026-04
ExcelBench was released by Humanlaya as a benchmark report for agentic spreadsheet work, evaluating formula construction, formatting control, cross-sheet dependencies, and safe editing.
Benchmark Report
-
2026-04
When Agents Look the Same was accepted to ACL 2026 Main Conference, introducing RPS and AGS to quantify distillation-induced similarity in LLM agent tool-use behavior.
Accepted Paper
-
2025-11
DiscoX introduced an expert-domain discourse-level translation benchmark focused on document coherence, terminology consistency, and cross-sentence faithfulness.
arXiv Paper
-
2025-11
MME-CC introduced a multimodal benchmark for cognitive-capacity evaluation that stresses reasoning-intensive visual-language understanding rather than shallow perception.
arXiv Paper
-
2025-09
-
2025-09
-
2025-02
-
2025-01
Hello Again! was accepted to NAACL 2025 for its study of long-term personalized dialogue with memory retrieval and dynamic persona modeling across sessions.
Accepted Paper
-
2024-06
The preprint version of Hello Again! introduced a model-agnostic personalized dialogue agent for long-term memory and persona-aware interaction.
-
2023-12
Joined LDS Lab at USTC and began research on LLM evaluation, dialogue systems, and safety-oriented questions.
Research
Publications
"Lead authorship" = first author or co-first author.
-
2026 ICML
-
2026 ACL
-
2026 ICLR
-
2026 ICLR
-
2025 NAACL
Lead authorship. Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
-
2025 EMNLP
Preprints & Manuscripts
-
2026 Benchmark
-
2026 Manuscript
Lead authorship. Agent4Weakness: An Agentic Framework for In-Depth Model Weakness Discovery
-
2025 arXiv
-
2025 arXiv
Background
Experience
-
2025-
M.Eng., Cyberspace Security
University of Science and Technology of China
Research on evaluation of LLMs and LLM-based agents.
-
2024.11–2025.12
Algorithm Intern
ByteDance Seed, Seed-Evaluation Team
Developed realistic evaluation pipelines and benchmark suites for large-model applications.
-
2023-2024
Research Intern
NExT++ Lab, National University of Singapore
-
2021-2025
B.Eng., Information Security
University of Science and Technology of China
Honors
Awards
- Outstanding Graduate Award, USTC, 2025
- Wang Xiaomo Talent Program Scholarship (×4), USTC, 2021-2024