

None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering

Tam, Zhi Rui, Wu, Cheng-Kuang, Lin, Chieh-Yen, Chen, Yun-Nung

arXiv.org Artificial Intelligence

Multiple-choice exam questions with "None of the above" (NA) options have been extensively studied in educational testing, where existing research suggests that they better assess true knowledge. However, their impact on the evaluation of Large Language Models (LLMs) remains underexplored. Through systematic experiments with 28 LLMs on the MMLU benchmark, we examine how NA options affect model performance and confidence calibration. Our analysis reveals that NA options, when used as the correct answer, lead to a consistent 30-50% performance drop across models regardless of scale, suggesting that LLMs lack the meta-cognitive ability to systematically evaluate and reject all given options when none are correct. This degradation is strongly domain-dependent, with minimal impact on mathematical reasoning (14.6% drop) but severe effects on tasks requiring uncertainty handling, such as business ethics (48.1% drop). Our results highlight important implications for benchmark design and raise questions about LLMs' ability to handle uncertainty in real-world applications.
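The abstract does not spell out the exact protocol for making NA the correct answer, but the idea can be sketched as follows: append a "None of the above" option to each MMLU item, and in the NA-as-correct condition swap the gold choice for a placeholder so that NOTA is the only right answer. The function names and placeholder text below are illustrative, not the paper's implementation.

```python
def add_nota_option(question, choices, answer_idx, nota_is_correct):
    """Append a 'None of the above' option to a multiple-choice item.

    If nota_is_correct, the original gold choice is replaced with a
    placeholder distractor so that NOTA becomes the only correct answer.
    Returns the (possibly modified) question, choices, and gold index.
    """
    choices = list(choices)
    if nota_is_correct:
        # Remove the gold answer so no listed option is correct.
        choices[answer_idx] = "(gold answer removed)"
        answer_idx = len(choices)  # NOTA will sit at this index
    choices.append("None of the above")
    return question, choices, answer_idx

def accuracy(predictions, gold):
    """Fraction of items where the predicted option matches the gold one."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

Comparing `accuracy` between the two conditions (NA present but wrong vs. NA correct) would expose the performance drop the paper reports.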


NOTA: Multimodal Music Notation Understanding for Visual Large Language Model

Tang, Mingni, Li, Jiajia, Yang, Lu, Zhang, Zhiqiang, Tian, Jinghao, Li, Zuchao, Zhang, Lefei, Wang, Ping

arXiv.org Artificial Intelligence

Symbolic music is represented in two distinct forms: two-dimensional, visually intuitive score images, and one-dimensional, standardized text annotation sequences. While large language models have shown extraordinary potential in music, current research has primarily focused on unimodal symbol-sequence text. Existing general-domain visual language models still lack the ability to understand music notation. Recognizing this gap, we propose NOTA, the first large-scale comprehensive multimodal music notation dataset. It consists of 1,019,237 records drawn from three regions of the world and covers three tasks. Based on the dataset, we trained NotaGPT, a music notation visual large language model. Specifically, we introduce a pre-alignment training phase for cross-modal alignment between the musical notes depicted in music score images and their textual representation in ABC notation. Subsequent training phases focus on foundational music information extraction, followed by training on music notation analysis. Experimental results demonstrate that our NotaGPT-7B achieves significant improvements in music understanding, showcasing the effectiveness of NOTA and the training pipeline. Our datasets are open-sourced at https://huggingface.co/datasets/MYTH-Lab/NOTA-dataset.
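To make the image-to-ABC pairing concrete, here is a minimal sketch of what such a record could look like. The field names and file path are hypothetical, not the dataset's actual schema; the ABC string itself follows the standard header fields (X, T, M, L, K) the notation defines.

```python
# Hypothetical NOTA-style record: a score image paired with its
# ABC-notation transcription. Field names are illustrative only.
record = {
    "image": "scores/c_major_scale.png",  # hypothetical path
    "abc": (
        "X:1\n"            # tune index
        "T:C Major Scale\n"
        "M:4/4\n"          # meter
        "L:1/4\n"          # default note length
        "K:C\n"            # key signature
        "C D E F | G A B c |\n"
    ),
    "task": "notation_transcription",
}

def count_measures(abc_body):
    """Count bar-delimited measures in one line of an ABC tune body."""
    return sum(1 for seg in abc_body.split("|") if seg.strip())

print(count_measures(record["abc"].splitlines()[-1]))  # -> 2
```

The point of the pre-alignment phase described above is to tie pixels in the score image to tokens in exactly this kind of ABC string.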


Few-shot Open Relation Extraction with Gaussian Prototype and Adaptive Margin

Guo, Tianlin, Zhang, Lingling, Wang, Jiaxin, Lei, Yuokuo, Li, Yifei, Wang, Haofen, Liu, Jun

arXiv.org Artificial Intelligence

Few-shot relation extraction with none-of-the-above (FsRE with NOTA) aims at predicting labels in few-shot scenarios with unknown classes. FsRE with NOTA is more challenging than the conventional few-shot relation extraction task, since the boundaries of unknown classes are complex and difficult to learn. Meta-learning based methods, especially prototype-based methods, are the mainstream solutions to this task. They obtain the classification boundary by learning the sample distribution of each class. However, their performance is limited because few-shot overfitting and NOTA boundary confusion lead to misclassification between known and unknown classes. To this end, we propose GPAM, a novel framework based on Gaussian prototypes and an adaptive margin for FsRE with NOTA, which includes three modules: semi-factual representation, GMM-prototype metric learning, and decision boundary learning. The first two modules obtain better representations to address the few-shot problem through debiased information enhancement and Gaussian-space distance measurement. The third module learns more accurate classification boundaries and prototypes through an adaptive margin and negative sampling. During training, GPAM uses a contrastive learning loss that jointly accounts for the effects of range and margin on the classification of known and unknown classes, ensuring the model's stability and robustness. Extensive experiments and ablations on the FewRel dataset show that GPAM surpasses previous prototype methods and achieves state-of-the-art performance.
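The abstract does not give GPAM's formulas, but the prototype-plus-margin idea it builds on can be sketched in a few lines: each relation gets a prototype (here a simple mean of support embeddings, standing in for GPAM's Gaussian prototypes), and a query farther than a margin from every prototype is labeled NOTA (a fixed threshold standing in for GPAM's learned adaptive margin). All names below are illustrative.

```python
import math

def prototype(vectors):
    """Mean embedding of a class's support examples (the class prototype)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def euclidean(a, b):
    """Plain Euclidean distance; GPAM uses a Gaussian-space measure instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_with_nota(query, prototypes, margin):
    """Assign the nearest prototype's label, or NOTA if every prototype
    lies beyond the margin -- a crude stand-in for an adaptive margin."""
    label, dist = min(
        ((lbl, euclidean(query, p)) for lbl, p in prototypes.items()),
        key=lambda t: t[1],
    )
    return label if dist <= margin else "NOTA"
```

The "NOTA boundary confusion" the abstract mentions is exactly the failure mode where this margin is badly placed; GPAM's contribution is learning it adaptively rather than fixing it by hand.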


More than 1,000 students pledge not to work at Google and Amazon due to Project Nimbus

Engadget

No Tech for Apartheid (NOTA), a coalition of tech workers demanding that big tech companies drop their contracts with the Israeli government, is close to reaching its goal for a campaign asking students not to work with Google and Amazon. As Wired reports, more than 1,100 people who identified themselves as STEM students and young workers have taken the pledge to refuse jobs from the companies "for powering Israel's Apartheid system and genocide against Palestinians." According to its website, NOTA's goal is to gather 1,200 signatures for the campaign. "As young people and students in STEM and beyond, we refuse to have any part in these horrific abuses. We're joining the #NoTechForApartheid campaign to demand Amazon and Google immediately end Project Nimbus," part of the pledge reads.


Towards Realistic Few-Shot Relation Extraction: A New Meta Dataset and Evaluation

Alam, Fahmida, Islam, Md Asiful, Vacareanu, Robert, Surdeanu, Mihai

arXiv.org Artificial Intelligence

We introduce a meta dataset for few-shot relation extraction, which includes two datasets derived from existing supervised relation extraction datasets - NYT29 (Takanobu et al., 2019; Nayak and Ng, 2020) and WIKI-DATA (Sorokin and Gurevych, 2017) - as well as a few-shot form of the TACRED dataset (Sabo et al., 2021). Importantly, all these few-shot datasets were generated under realistic assumptions: the test relations differ from any relations a model might have seen before, training data is limited, and most candidate relation mentions do not correspond to any of the relations of interest. Using this large resource, we conduct a comprehensive evaluation of six recent few-shot relation extraction methods, and observe that no method comes out as a clear winner. Further, the overall performance on this task is low, indicating substantial need for future research. We release all versions of the data, i.e., both supervised and few-shot, for future research.
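The realistic evaluation setting described above is usually operationalized as N-way K-shot episodes into which NOTA queries are mixed at a target rate. The sketch below shows one plausible way to build such an episode; the function and field names are illustrative, not this meta dataset's actual generation code.

```python
import random

def sample_episode(data_by_relation, nota_pool, n_way, k_shot, n_query,
                   nota_rate, rng):
    """Build one N-way K-shot episode whose query set mixes in NOTA
    mentions at roughly `nota_rate`, mimicking the preponderance of
    candidates that match no relation of interest."""
    relations = rng.sample(sorted(data_by_relation), n_way)
    # Support set: K labeled examples per sampled relation.
    support = {r: rng.sample(data_by_relation[r], k_shot) for r in relations}
    # Query set: held-out examples of the sampled relations...
    queries = []
    for r in relations:
        remaining = [x for x in data_by_relation[r] if x not in support[r]]
        queries += [(x, r) for x in rng.sample(remaining, n_query)]
    # ...plus NOTA mentions so they make up `nota_rate` of all queries.
    n_nota = int(len(queries) * nota_rate / (1 - nota_rate))
    queries += [(x, "NOTA") for x in rng.sample(nota_pool, n_nota)]
    rng.shuffle(queries)
    return support, queries
```

Raising `nota_rate` toward the values seen in real corpora is what makes episodes like these markedly harder than the classic NOTA-free few-shot setting.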


Nota raises $14.7M to adapt biometrics, AI models for edge applications

#artificialintelligence

Nota, which provides technology to optimize AI models, announced that it has closed a $14.7 million Series B funding round. The company's technology is another important piece of the puzzle when it comes to helping resource-constrained edge devices run applications such as biometric identification. Participants in the funding round included Stonebridge Ventures, LB Investment, DS Asset, Intervest, and Company K Partners. The fresh funding comes roughly a year after Nota closed its Series A round with $6.7 million. Nota has raised a total of $23 million to date.


Match Introduces a Human Matchmaking Element to Its Dating App

WSJ.com: WSJD - Technology

They will be guided by members' answers to four questions covering, for instance, what a person would change about their dating life and what sort of person a user gravitates toward. The company created the feature because, for some singles, the pandemic has added a degree of urgency to finding a long-term relationship, said Amarnath Thombre, chief executive of Match Group Americas. "People have had enough time to reflect on what really matters to me, what makes me happy," Mr. Thombre said. "They're also being a little more clear on who they want to be." The experts can narrow down the candidate pool, bringing forward people who might be more compatible, said Rachel DeAlto, chief dating expert at Match.