Benchmarks & Metrics: Models Increasingly Overfitted to Leaderboards
Over the past two years, the AI NY community has reviewed and discussed a range of benchmarks while tracking the rapid progress of new models. What has become increasingly clear is that m
Federico Ulfo