Abstract:
Benchmarks are vital for driving progress across AI applications, serving as a foundation for defining success and inspiring innovation. In the post-ChatGPT era, their design faces new challenges due to the growing capabilities of large models and increasingly complex tasks. This talk highlights two key principles for creating effective benchmarks: comprehensive evaluation metrics and robust dataset design. On the evaluation front, we explore the shift from traditional, objective metrics to human-aligned metrics, exemplified by the "CLAIR-A" case study on LLMs as evaluators. For dataset design, we emphasize diverse, representative, and controlled datasets, illustrated by the "Visual Haystacks" case study for long-context visual understanding. Together, these approaches enable benchmarks to better reflect real-world challenges and drive meaningful AI progress.
Bio:
Tsung-Han (Patrick) Wu is a second-year CS PhD student at UC Berkeley, advised by Prof. Trevor Darrell and Prof. Joseph E. Gonzalez. His recent work focuses on exploring zero-shot applications of Large (Vision) Language Models and addressing their limitations. Before starting his PhD, he earned an MS and a BS in Computer Science and Information Engineering from National Taiwan University (NTU). For more information, please visit his personal website: https://tsunghan-wu.github.io/.