Benchmarks in Leipzig
mathematics, research, large language models, artificial intelligence, benchmarking
Author: root-parent
Date: 6/6/2026
Article Summary:
A group of mathematicians created a dataset of research-level mathematics questions with known answers and evaluated how well large language models (LLMs) solve them.