

GitGoodBench: A Novel Benchmark For Evaluating Agentic Performance On Git

arXiv.org Artificial Intelligence

Benchmarks for Software Engineering (SE) AI agents, most notably SWE-bench, have catalyzed progress in the programming capabilities of AI agents. However, they overlook critical developer workflows such as Version Control System (VCS) operations. To address this issue, we present GitGoodBench, a novel benchmark for evaluating AI agent performance on VCS tasks. GitGoodBench covers three core Git scenarios extracted from permissive open-source Python, Java, and Kotlin repositories. Our benchmark provides three datasets: a comprehensive evaluation suite (900 samples), a rapid prototyping version (120 samples), and a training corpus (17,469 samples). We establish baseline performance on the prototyping version of our benchmark using GPT-4o equipped with custom tools, achieving a 21.11% solve rate overall. We expect GitGoodBench to serve as a crucial stepping stone toward truly comprehensive SE agents that go beyond mere programming.


I Launched a Project to Determine the NBA's Hottest Players. This Year, We Lost Our Minds.

Slate

Well, the NBA just rolled out its All-Star weekend all over us, culminating in a splash of hardware for Damian Lillard, a lackluster game I turned off in the third quarter, and a surprise spotlight on Mac McClung, who my friend Jennifer said "looks like his name is Air Bud." I'm from L.A.; I can respect the pageantry. But the weekend obscured what many of us appreciate most about NBA players: Many of them are very hot, and they deserve to be recognized for it. As such, I assembled a select committee to honor the year's biggest smokeshows: the All-Hunk NBA teams. The All-Hunk NBA team is a near-annual honor bestowed on the hunkiest players in the league during the NBA season.


AI Genius Created a Virtual Baby Who Can Laugh, Cry and Play the Piano

#artificialintelligence

Mark Sagar, the artificial intelligence genius, has created a virtual reality baby who can read words from a book, laugh, cry, and even play the piano. Sagar's company, Soul Machines Ltd., is trying to humanize AI, writes Bloomberg. Sagar thinks one key to making humans feel more connected to AI is making virtual beings more lifelike, which is why Soul Machines' creations have human voices and can wince and grin. Eventually, Sagar would like to produce the first wave of "likable, believable virtual assistants that work as customer service agents and breathe life into hunks of plastic."