Collaborating Authors

 wayne


GSM8K-V: Can Vision Language Models Solve Grade School Math Word Problems in Visual Contexts

Yuan, Fan, Yan, Yuchen, Jiang, Yifan, Zhao, Haoran, Feng, Tao, Chen, Jinyan, Lou, Yanwei, Zhang, Wenqi, Shen, Yongliang, Lu, Weiming, Xiao, Jun, Zhuang, Yueting

arXiv.org Artificial Intelligence

Vision language models (VLMs) achieve unified modeling of images and text, enabling them to accomplish complex real-world tasks through perception, planning, and reasoning. Among these tasks, reasoning is particularly representative, with mathematical reasoning serving as a prominent example. It highlights the high-level capability of VLMs to comprehend mathematical information in images and to perform sophisticated reasoning. Recently, numerous visual mathematical reasoning benchmarks have been proposed, but they are often restricted to geometry, lack coverage of math word problems, and rarely assess reasoning across multiple images. To address these gaps, we introduce GSM8K-V, a purely visual multi-image mathematical reasoning benchmark. GSM8K-V is built by systematically mapping each sample from the widely used text-based GSM8K into visual form. Through a carefully designed automated image-generation pipeline combined with meticulous human annotation, we curate 1,319 high-quality samples. We evaluate a wide range of open-source and closed-source models on GSM8K-V. Results show that although existing VLMs have nearly saturated performance on text-based GSM8K, there remains substantial room for improvement on GSM8K-V. For example, the best-performing model, Gemini-2.5-Pro, achieves 95.22% accuracy on GSM8K but only 46.93% on GSM8K-V. We conduct a comprehensive analysis of GSM8K-V, examining the limitations of current models as well as potential directions for improvement. GSM8K-V offers a new perspective on visual mathematical reasoning and establishes a benchmark to guide the development of more robust and generalizable VLMs.


When Bond Villain Meets Tech Billionaire

Slate

This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. After the regrettable incidents on the island (the old island), the Doctor kept a low profile. Many thought he was dead. There was safety in that once. Now the greater safety is in being known. What plans he had, back in the day! If only … but no, this is just the sort of negative spiral his therapist has warned him about. He has remade himself as an altruist, a philanthropist, and he means for his efforts to have maximum impact.


Trapped in a Video Game with "Free Guy"

The New Yorker

The hero of "Free Guy" is a guy named Guy (Ryan Reynolds). He has a best buddy named Buddy (Lil Rel Howery), and they live in a city named Free City. What, however, is the nature of their liberty? Guy wakes up every morning, dons an identical blue shirt, buys a cup of coffee, and goes to a bank, where he works as a teller. His customary greeting is "Don't have a good day.


How a ghostly outline revealed the secret of Modigliani's lost lover

#artificialintelligence

No one wants to be reminded of a failed relationship by having the ex's portrait hanging around. After Amedeo Modigliani and his lover, Beatrice Hastings, broke up, the Italian artist is thought to have obliterated her memory by painting another woman's likeness over his portrait of her. So he might not be too happy to learn that science has now brought back that "lost" portrait, using artificial intelligence, an X-ray and 3D-printing to re-create the painting, with full colour and textured brushstrokes. Portrait of a Girl, a 1917 masterpiece, is owned by the Tate, which was taken aback in 2018 to discover an earlier portrait beneath the picture. X-rays revealed the ghostly outlines of a full-length figure, prompting the then curator, Nancy Ireson, to suggest that it was a portrait of Hastings, and that Modigliani "might have painted her out" after their intense two-year relationship ended in 1916.


Batman Movie Script Written By AI After Watching 1000 Hours Footage

#artificialintelligence

Whether or not you are a Batman fan, if you are an AI enthusiast, this Batman movie script written by an AI bot will definitely drive you crazy. The next Batman movie is likely to arrive in 2021, but in the meantime DC fans are doing what they can to get their Caped Crusader fix. One such DC fan took an AI bot and watched the old episodes with it. Keaton Patti is the name of this fan, and he has created an AI bot whose specialty is writing scripts. Patti trained his bot on more than 1,000 hours of Batman films.


5 Steps to Get Started with AI in the Enterprise

#artificialintelligence

Plenty has been written about artificial intelligence (AI) and its game-changing potential. But, like many technologies before it, AI poses an important question. Are the potential benefits enough to get the enterprise on board? And if so, why is everyone so nervous about getting going? It will only take a quick search on your favorite search engine or social networking site before you are sufficiently overwhelmed by the multitude of views on AI, cognitive and automation technologies.


Fortnite Is a Huge Success -- And a Sign of What's to Come in Gaming

TIME - Tech

This year that game is undeniably Fortnite Battle Royale, an online free-for-all that every teen in America suddenly seems to be playing. It's not just kids, though–everyone from rapper Drake to Los Angeles Laker Josh Hart is a fan. That groundswell of support has propelled Fortnite from a simple video game into a cultural sensation, with hundreds of millions of fans worldwide who play the game, wear the gear and even learn the characters' victory dances. "Fortnite is another in a long line of games like World of Warcraft or Guitar Hero or Minecraft that is changing everything underfoot," says Mat Piscatella, a video-game industry analyst with research firm NPD Group. Fortnite's big draw is a madcap multiplayer mode that drops up to 100 players on an island in a last-person-standing showdown.


Danger, danger Will Robinson: Modernizing risk mitigation systems with AI

#artificialintelligence

How do you define artificial intelligence? Would you define it differently if it was your job to prevent fraud and financial crimes, where the risks are constantly shifting? In a recent meeting with banking executives responsible for fraud and financial crimes risk mitigation, Wayne Thompson, Manager of Data Science Technologies at SAS, was discussing AI. Wayne asked each executive to write his or her definition of artificial intelligence (AI) on a note card. All of these definitions are correct.


FlashText - A library faster than Regular Expressions for NLP tasks

@machinelearnbot

As the decades went on, differing interpretations of the character emerged. The late 1960s Bruce Wayne television series used a camp aesthetic, which continued to be associated with the character for years after the show ended. Various creators worked to return the character to his dark roots, culminating in 1986 with The Dark Knight Returns by Frank Miller. The success of Warner Bros.' live-action Bruce Wayne feature films has helped maintain the character's prominence in mainstream culture.[8] An American cultural icon, Bruce Wayne has garnered enormous popularity and is among the most identifiable comic book characters.


'First Apple computer' sells for $815,000

BBC News

A prototype Apple 1, a holy-grail item in electronics memorabilia, has been sold for $815,000 (£618,000). Apple co-founders Steve Jobs and Steve Wozniak built just 200 of the computers in 1976. The model auctioned this week contains tell-tale signs that it is a prototype, probably made prior to its manufacturing run. One computer historian says it is "one of the first, if not the first ever" Apple computer. This "celebration edition" Apple 1 was expected to make $1m, but auctioneer Charitybuzz told the BBC that the final bid was $815,000.