[2024-12-27] Patrick Tsung-Han Wu, PhD student, UC Berkeley: "Rethinking 'Benchmarks' in AI Applications in the Post-ChatGPT Era"

  • 2024-12-02
  • 呂宜娟
Title: Rethinking "Benchmarks" in AI Applications in the Post-ChatGPT Era
Date: 2024/12/27, 15:40-17:00
Location: CSIE R103
Speaker: Patrick Tsung-Han Wu, PhD student, UC Berkeley
Host: Prof. Shang-Tse Chen

Abstract:

Benchmarks are vital for driving progress across AI applications, serving as a foundation for defining success and inspiring innovation. In the post-ChatGPT era, their design faces new challenges due to the growing capabilities of large models and increasingly complex tasks. This talk highlights two key principles for creating effective benchmarks: comprehensive evaluation metrics and robust dataset design. On the evaluation front, we explore the shift from traditional, objective metrics to human-aligned metrics, exemplified by the "CLAIR-A" case study on LLMs as evaluators. For dataset design, we emphasize diverse, representative, and controlled datasets, illustrated by the "Visual Haystacks" case study for long-context visual understanding. Together, these approaches enable benchmarks to better reflect real-world challenges and drive meaningful AI progress.


Bio:
Tsung-Han (Patrick) Wu is a second-year CS PhD student at UC Berkeley, advised by Prof. Trevor Darrell and Prof. Joseph E. Gonzalez. His recent work focuses on exploring zero-shot applications of Large (Vision) Language Models and addressing their limitations. Before starting his PhD, he earned an MS and a BS in Computer Science and Information Engineering from National Taiwan University (NTU). For more information, please visit his personal website: https://tsunghan-wu.github.io/.