Moroni
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
Lawrence, Logan, Saha, Oindrila, Wei, Megan, Sun, Chen, Maji, Subhransu, Van Horn, Grant
Despite the renewed interest in zero-shot visual classification due to the rise of Multimodal Large Language Models (MLLMs), the problem of evaluating free-form responses of auto-regressive models remains a persistent challenge. Most existing works focus on language-only tasks or do not consider Multiple Choice Questions (MCQs) beyond 5-way options, both of which are critical capabilities for solving tasks in Fine-Grained Visual Classification (FGVC), where choice counts are in the hundreds to thousands and the choices are highly related. Furthermore, in this highly multi-way MCQ setting, it is not clear how to extend LLM choice extraction to retrieval-based problems, where computing probabilities over the choice set is computationally costly. In this work, we investigate nlg2choice, a simple two-stage method which first asks the MLLM an open-ended question for the task with minimal constraints, then uses text-only constrained decoding to predict the most likely choice. In retrieval settings, we compute the probability of the constrained response taking that choice with an early stopping method to significantly improve throughput. Our results show improvement across a suite of seven fine-grained visual datasets when evaluating in terms of classification and retrieval, and show that this performance holds over the various ways that users of LLMs can implement tasks in natural language.
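The second stage described above (text-only constrained decoding over a fixed choice set) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `score_choices` and `toy_logits` are hypothetical, the toy logit table stands in for a language model conditioned on the first-stage free-form response, and the early-stopping rule shown (stop once a prefix matches only one choice, since every remaining token is then forced) is an assumption about what such a rule could look like.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "<eos>"  # end-of-choice marker, so one choice may be a prefix of another

# Hypothetical stand-in for a text-only LM: maps a token prefix to next-token logits.
LogitFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def score_choices(choices: Sequence[Sequence[str]], next_token_logits: LogitFn) -> List[float]:
    """Return the log-probability of each choice under decoding restricted to the choice set."""
    seqs = [tuple(c) + (EOS,) for c in choices]
    scores = []
    for seq in seqs:
        total = 0.0
        for i, tok in enumerate(seq):
            prefix = seq[:i]
            # Choices that are still consistent with the tokens emitted so far.
            live = [s for s in seqs if s[:i] == prefix]
            if len(live) == 1:
                break  # early stop: all remaining tokens are forced (probability 1)
            allowed = {s[i] for s in live}
            if len(allowed) == 1:
                continue  # shared forced token: contributes log(1) = 0, no model call
            logits = next_token_logits(prefix)
            # Renormalise over the allowed next tokens only.
            log_norm = math.log(sum(math.exp(logits[t]) for t in allowed))
            total += logits[tok] - log_norm
        scores.append(total)
    return scores


def toy_logits(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Made-up numbers for illustration only.
    table = {
        (): {"blue": 2.0, "cardinal": 0.0},
        ("blue",): {"jay": 1.0, "grosbeak": 0.0},
    }
    return table[prefix]


choices = [["blue", "jay"], ["blue", "grosbeak"], ["cardinal"]]
scores = score_choices(choices, toy_logits)
best = max(range(len(choices)), key=scores.__getitem__)
```

Because the probabilities are renormalised over the allowed tokens at every step, the scores of all choices sum to one, which makes them usable for ranking in a retrieval setting.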
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
Deep transfer learning for image classification: a survey
Plested, Jo, Phiri, Musa, Gedeon, Tom
Deep neural networks such as convolutional neural networks (CNNs) and transformers have achieved many successes in image classification in recent years. It has been consistently demonstrated that the best practice for image classification is to train large deep models on abundant labelled data. However, there are many real-world scenarios where the requirement for large amounts of training data to get the best performance cannot be met. In these scenarios, transfer learning can help improve performance. To date, there have been no surveys that comprehensively review deep transfer learning as it relates to image classification overall. However, several recent general surveys of deep transfer learning, and ones that relate to particular specialised target image classification tasks, have been published. We believe it is important for future progress in the field that all current knowledge is collated and the overarching patterns analysed and discussed. In this survey, we formally define deep transfer learning and the problem it attempts to solve in relation to image classification. We survey the current state of the field and identify where recent progress has been made. We show where the gaps in current knowledge are and make suggestions for how to progress the field to fill these gaps. We present a new taxonomy of the applications of transfer learning for image classification. This taxonomy makes it easier to see overarching patterns of where transfer learning has been effective and where it has failed to fulfill its potential. It also allows us to suggest where the problems lie and how transfer learning could be used more effectively. We show that under this new taxonomy, many of the applications where transfer learning has been shown to be ineffective, or even to hinder performance, are to be expected when taking into account the source and target datasets and the techniques used.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California (0.04)
- Asia > Middle East > Jordan (0.04)
- Overview (1.00)
- Research Report > New Finding (0.92)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Challenging the Abilities of Large Language Models in Italian: a Community Initiative
Nissim, Malvina, Croce, Danilo, Patti, Viviana, Basile, Pierpaolo, Attanasio, Giuseppe, Musacchio, Elio, Rinaldi, Matteo, Borazio, Federico, Francis, Maria, Gili, Jacopo, Scalena, Daniel, Altuna, Begoña, Azurmendi, Ekhi, Basile, Valerio, Bentivogli, Luisa, Bisazza, Arianna, Bolognesi, Marianna, Brunato, Dominique, Caselli, Tommaso, Casola, Silvia, Cassese, Maria, Cettolo, Mauro, Collacciani, Claudia, De Cosmo, Leonardo, Di Buono, Maria Pia, Esuli, Andrea, Etxaniz, Julen, Ferrando, Chiara, Fidelangeli, Alessia, Frenda, Simona, Fusco, Achille, Gaido, Marco, Galassi, Andrea, Galli, Federico, Giordano, Luca, Goffetti, Mattia, Gonzalez-Dios, Itziar, Gregori, Lorenzo, Grundler, Giulia, Iannaccone, Sandro, Jiang, Chunyang, La Quatra, Moreno, Lagioia, Francesca, Lo, Soda Marem, Madeddu, Marco, Magnini, Bernardo, Manna, Raffaele, Mercorio, Fabio, Merlo, Paola, Muti, Arianna, Nastase, Vivi, Negri, Matteo, Onorati, Dario, Palmieri, Elena, Papi, Sara, Passaro, Lucia, Pensa, Giulia, Piergentili, Andrea, Potertì, Daniele, Puccetti, Giovanni, Ranaldi, Federico, Ranaldi, Leonardo, Ravelli, Andrea Amelio, Rosola, Martina, Ruzzetti, Elena Sofia, Samo, Giuseppe, Santilli, Andrea, Santin, Piera, Sarti, Gabriele, Sartor, Giovanni, Savoldi, Beatrice, Serino, Antonio, Seveso, Andrea, Siciliani, Lucia, Torroni, Paolo, Varvara, Rossella, Zaninello, Andrea, Zanollo, Asya, Zanzotto, Fabio Massimo, Zeinalipour, Kamyar, Zugarini, Andrea
The rapid progress of Large Language Models (LLMs) has transformed natural language processing and broadened its impact across research and society. Yet, systematic evaluation of these models, especially for languages beyond English, remains limited. "Challenging the Abilities of LAnguage Models in ITAlian" (CALAMITA) is a large-scale collaborative benchmarking initiative for Italian, coordinated under the Italian Association for Computational Linguistics. Unlike existing efforts that focus on leaderboards, CALAMITA foregrounds methodology: it federates more than 80 contributors from academia, industry, and the public sector to design, document, and evaluate a diverse collection of tasks, covering linguistic competence, commonsense reasoning, factual consistency, fairness, summarization, translation, and code generation. Through this process, we not only assembled a benchmark of over 20 tasks and almost 100 subtasks, but also established a centralized evaluation pipeline that supports heterogeneous datasets and metrics. We report results for four open-weight LLMs, highlighting systematic strengths and weaknesses across abilities, as well as challenges in task-specific evaluation. Beyond quantitative results, CALAMITA exposes methodological lessons: the necessity of fine-grained, task-representative metrics, the importance of harmonized pipelines, and the benefits and limitations of broad community engagement. CALAMITA is conceived as a rolling benchmark, enabling continuous integration of new tasks and models. This makes it both a resource -- the most comprehensive and diverse benchmark for Italian to date -- and a framework for sustainable, community-driven evaluation. We argue that this combination offers a blueprint for other languages and communities seeking inclusive and rigorous LLM evaluation practices.
- North America > United States > Montana (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Law (1.00)
- Health & Medicine (1.00)
- Information Technology (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Energy-based Autoregressive Generation for Neural Population Dynamics
Ge, Ningling, Dai, Sicheng, Zhu, Yu, Yu, Shan
Understanding brain function represents a fundamental goal in neuroscience, with critical implications for therapeutic interventions and neural engineering applications. Computational modeling provides a quantitative framework for accelerating this understanding, but faces a fundamental trade-off between computational efficiency and high-fidelity modeling. To address this limitation, we introduce a novel Energy-based Autoregressive Generation (EAG) framework that employs an energy-based transformer that learns temporal dynamics in latent space through strictly proper scoring rules, enabling efficient generation with realistic population and single-neuron spiking statistics. Evaluation on synthetic Lorenz datasets and two Neural Latents Benchmark datasets (MC Maze and Area2 bump) demonstrates that EAG achieves state-of-the-art generation quality with substantial computational efficiency improvements, particularly over diffusion-based methods. Beyond optimal performance, conditional generation applications show two capabilities: generalizing to unseen behavioral contexts and improving motor brain-computer interface decoding accuracy using synthetic neural data. These results demonstrate the effectiveness of energy-based modeling for neural population dynamics with applications in neuroscience research and neural engineering.
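The abstract does not say which strictly proper scoring rule EAG uses. As an illustration only, the sketch below computes a sample-based estimate of the energy score, one well-known strictly proper scoring rule for multivariate predictions (lower is better). The function name `energy_score` and the toy inputs are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Sequence


def energy_score(samples: Sequence[Sequence[float]], target: Sequence[float]) -> float:
    """Estimate ES = E||X - y|| - 0.5 * E||X - X'|| from model samples X and an observation y."""
    m = len(samples)
    # Fit term: average distance from each generated sample to the observation.
    fit = sum(math.dist(x, target) for x in samples) / m
    # Spread term: half the average pairwise distance between generated samples.
    spread = sum(math.dist(a, b) for a in samples for b in samples) / (2 * m * m)
    return fit - spread


# Toy one-dimensional examples (made-up numbers).
perfect = energy_score([[1.0], [1.0]], [1.0])      # all samples equal the observation
bracketing = energy_score([[0.0], [2.0]], [1.0])   # samples spread around the observation
off_target = energy_score([[3.0], [3.0]], [1.0])   # confident but wrong
```

The spread term rewards diversity in the samples, while the fit term penalises distance from the observation, so a confident but wrong prediction scores worse than a spread that brackets the truth.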
Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT
Liu, Yesheng, Li, Hao, Xu, Haiyu, Pei, Baoqi, Wang, Jiahao, Zhao, Mingxuan, Zheng, Jingshu, He, Zheqi, Yao, JG, Qin, Bowen, Yang, Xi, Zhang, Jiajun
Multiple-choice question answering (MCQA) has been a popular format for the evaluation and reinforcement fine-tuning (RFT) of modern multimodal language models. Its constrained output format allows for simplified, deterministic automatic verification. However, we find that the options may leak exploitable signals, which makes the accuracy metrics unreliable for indicating real capabilities and encourages explicit or implicit answer-guessing behaviors during RFT. We propose ReVeL (Rewrite and Verify by LLM), a framework that rewrites multiple-choice questions into open-form questions while keeping answers verifiable whenever possible. The framework categorizes questions by answer type and applies a different rewriting and verification scheme to each type. For RFT, we convert 20k MCQA examples and use GRPO to fine-tune Qwen2.5-VL models. Models trained on ReVeL-OpenQA match MCQA accuracy on multiple-choice benchmarks and improve OpenQA accuracy by about six percentage points, indicating better data efficiency and more robust reward signals than MCQA-based training. When used for evaluation, ReVeL also reveals up to 20 percentage points of score inflation in MCQA benchmarks (relative to OpenQA), improves judging accuracy, and reduces both cost and latency. We will release code and data publicly.
- North America > United States > South Dakota (0.05)
- Europe > Spain (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Questionnaire & Opinion Survey (0.56)
- Research Report (0.50)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)