mel
MEL: Legal Spanish Language Model
Sánchez, David Betancur, García, Nuria Aldama, Jiménez, Álvaro Barbero, Nieto, Marta Guerrero, Morales, Patricia Marsà, Salas, Nicolás Serrano, Hernán, Carlos García, Coll, Pablo Haya, Ponsoda, Elena Montiel, Ibáñez, Pablo Calleja
Legal texts, characterized by complex and specialized terminology, present a significant challenge for language models. Adding an underrepresented language, such as Spanish, to the mix makes it even more challenging. While pre-trained models like XLM-RoBERTa have shown capabilities in handling multilingual corpora, their performance on domain-specific documents remains underexplored. This paper presents the development and evaluation of MEL, a legal language model based on XLM-RoBERTa-large, fine-tuned on legal documents such as the BOE (Boletín Oficial del Estado, the official Spanish bulletin of laws) and congress texts. We detail the data collection, processing, training, and evaluation processes. Evaluation benchmarks show a significant improvement over baseline models in understanding the legal Spanish language. We also present case studies demonstrating the model's application to new legal texts, highlighting its potential to achieve top results across different NLP tasks.
- Law (1.00)
- Government > Regional Government (0.46)
A2SB: Audio-to-Audio Schrodinger Bridges
Kong, Zhifeng, Shih, Kevin J, Nie, Weili, Vahdat, Arash, Lee, Sang-gil, Santos, Joao Felipe, Jukic, Ante, Valle, Rafael, Catanzaro, Bryan
Audio in the real world may be perturbed due to numerous factors, causing the audio quality to be degraded. The following work presents an audio restoration model tailored for high-res music at 44.1kHz. Our model, Audio-to-Audio Schrodinger Bridges (A2SB), is capable of both bandwidth extension (predicting high-frequency components) and inpainting (re-generating missing segments). A2SB is end-to-end without the need for a vocoder to predict waveform outputs, able to restore hour-long audio inputs, and trained on permissively licensed music data. A2SB is capable of achieving state-of-the-art bandwidth extension and inpainting quality on several out-of-distribution music test sets. Our demo website is https://research.nvidia.com/labs/adlr/A2SB/. Audio in the real world may be perturbed due to numerous factors such as recording devices, data compression, and online transferring. For instance, certain recording devices and compression methods may result in a low sampling rate, and online transferring may cause a short audio segment to be lost. These problems are usually ill-posed (Narayanaswamy et al., 2021; Moliner et al., 2023) and are usually solved with data-driven generative models. Many of these methods are task-specific, designed for the speech domain, or trained to only restore the degraded magnitude, which requires an additional vocoder to transform the restored magnitude into a waveform.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (3 more...)
- Leisure & Entertainment (0.93)
- Media > Music (0.67)
ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish
Gallego, Fernando, López-García, Guillermo, Gasco-Sánchez, Luis, Krallinger, Martin, Veredas, Francisco J.
Advances in natural language processing techniques, such as named entity recognition and normalization to widely used standardized terminologies like UMLS or SNOMED-CT, along with the digitalization of electronic health records, have significantly advanced clinical text analysis. This study presents ClinLinker, a novel approach employing a two-phase pipeline for medical entity linking that leverages the potential of in-domain adapted language models for biomedical text mining: initial candidate retrieval using a SapBERT-based bi-encoder and subsequent re-ranking with a cross-encoder, trained by following a contrastive-learning strategy to be tailored to medical concepts in Spanish. This methodology, focused initially on content in Spanish, substantially outperforms multilingual language models designed for the same purpose. This holds even in complex scenarios involving heterogeneous medical terminologies and when training on a subset of the original data. Our results, evaluated using top-k accuracy at 25 and other top-k metrics, demonstrate our approach's performance on two distinct clinical entity linking Gold Standard corpora, DisTEMIST (diseases) and MedProcNER (clinical procedures), outperforming previous benchmarks by 40 points in DisTEMIST and 43 points in MedProcNER, both normalized to SNOMED-CT codes. These findings highlight our approach's ability to address language-specific nuances and set a new benchmark in entity linking, offering a potent tool for enhancing the utility of digital medical records. The resulting system is of practical value, both for large-scale automatic generation of structured data derived from clinical records and for exhaustive extraction and harmonization of predefined clinical variables of interest.
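The top-k accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. The function name `top_k_accuracy`, the candidate lists, and the codes `C0`..`C9` are hypothetical placeholders, not ClinLinker's code or real SNOMED-CT identifiers; the sketch only assumes that each mention comes with a ranked list of candidate codes and one gold code.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of mentions whose gold code is among the first k ranked candidates."""
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(gold)


# Hypothetical ranked candidates for three mentions (placeholder codes).
ranked = [
    ["C1", "C2", "C3"],
    ["C7", "C4", "C9"],
    ["C5", "C6", "C8"],
]
gold = ["C1", "C4", "C0"]

acc_at_1 = top_k_accuracy(ranked, gold, 1)  # only the first mention is a hit
acc_at_2 = top_k_accuracy(ranked, gold, 2)  # the second mention is recovered at rank 2
```

In a retrieve-then-rerank pipeline, the same function could be applied once to the retriever's candidates and once to the re-ranked list to see how much the second phase helps at small k.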
- Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Health & Medicine > Health Care Technology > Medical Record (1.00)
- Health & Medicine > Therapeutic Area (0.93)
MEL: Efficient Multi-Task Evolutionary Learning for High-Dimensional Feature Selection
Wang, Xubin, Shangguan, Haojiong, Huang, Fengyi, Wu, Shangrui, Jia, Weijia
Feature selection is a crucial step in data mining to enhance model performance by reducing data dimensionality. However, the increasing dimensionality of collected data exacerbates the challenge known as the "curse of dimensionality", where computation grows exponentially with the number of dimensions. To tackle this issue, evolutionary computational (EC) approaches have gained popularity due to their simplicity and applicability. Unfortunately, the diverse designs of EC methods result in varying abilities to handle different data, often underutilizing and not sharing information effectively. In this paper, we propose a novel approach called PSO-based Multi-task Evolutionary Learning (MEL) that leverages multi-task learning to address these challenges. By incorporating information sharing between different feature selection tasks, MEL achieves enhanced learning ability and efficiency. We evaluate the effectiveness of MEL through extensive experiments on 22 high-dimensional datasets. Comparing against 24 EC approaches, our method exhibits strong competitiveness. Additionally, we have open-sourced our code on GitHub at https://github.com/wangxb96/MEL.
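For readers unfamiliar with PSO-based feature selection, the following is a minimal sketch of a generic single-task binary PSO with a sigmoid transfer function. It is not the MEL algorithm: it has no multi-task information sharing, and the function names, hyperparameters (`w`, `c1`, `c2`, velocity clamp), and the toy fitness function are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Tuple


def binary_pso(fitness: Callable[[List[bool]], float], n_features: int,
               n_particles: int = 10, n_iters: int = 30,
               seed: int = 0) -> Tuple[List[bool], float]:
    """Maximize `fitness` over boolean feature masks with a basic binary PSO."""
    rng = random.Random(seed)
    w, c1, c2, v_max = 0.7, 1.5, 1.5, 6.0  # assumed, untuned hyperparameters
    pos = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid transfer: velocity becomes the probability of selecting feature d.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = rng.random() < prob
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_fitness(mask: List[bool]) -> float:
    """Toy objective: reward two 'relevant' features, penalize mask size."""
    relevant = (0, 3)
    return sum(1 for i in relevant if mask[i]) - 0.1 * sum(mask)


best, best_fit = binary_pso(toy_fitness, n_features=6)
```

In practice the fitness function would wrap a classifier's validation accuracy on the selected features; the toy objective here only keeps the example self-contained.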
- Research Report > Promising Solution (0.48)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
The Courthouse on the Moon
This story is part of Future Tense Fiction, a monthly series of short stories from Future Tense and Arizona State University's Center for Science and the Imagination about how technology and science will change our lives. The other homesteaders, mostly engineers and technicians, seemed to enjoy outings in the lunar rover. But for Eugene, this was a grinding chore that frayed his nerves. Suddenly, Mel's soothing feminine voice reverberated in his cochlear implant. "Would you like some affirmations?" "You are a well-respected judge … You have worked hard to get here, to this special time and place …" As Mel went on, it seemed the suit hugged his chest a little less tightly. He relaxed his grip on the wheel. Why, he wondered, had he not remembered this technique without her prompting? Strange how the basic principles of cognitive psych were always slipping from his mind. Fortunately, she was there to remind him. "You are someone who wants what is best for the American lunar community and ...
- North America > United States > Arizona (0.24)
- Europe > Norway > Svalbard and Jan Mayen > Svalbard > Longyearbyen (0.04)
- Law > Litigation (1.00)
- Law > Government & the Courts (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Open-set Face Recognition with Neural Ensemble, Maximal Entropy Loss and Feature Augmentation
Vareto, Rafael Henrique, Günther, Manuel, Schwartz, William Robson
Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects. Therefore, they are expected to prevent face samples of unregistered subjects from being identified as previously enrolled identities. This watchlist context adds an arduous requirement that calls for the dismissal of irrelevant faces by focusing mainly on subjects of interest. As a response, this work introduces a novel method that associates an ensemble of compact neural networks with a margin-based cost function that explores additional samples. Supplementary negative samples can be obtained from external databases or synthetically built at the representation level in training time with a new mix-up feature augmentation approach. Deep neural networks pre-trained on large face datasets serve as the preliminary feature extraction module. We carry out experiments on well-known LFW and IJB-C datasets where results show that the approach is able to boost closed and open-set identification rates.
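The idea of building synthetic negative samples at the representation level can be sketched with a generic mix-up over feature vectors. The abstract does not spell out the paper's exact procedure, so everything here is an assumption: the names `mixup_features` and `synth_negatives`, the choice to mix only features of different identities, and the mixing range 0.3 to 0.7.

```python
import random
from typing import List, Sequence


def mixup_features(a: Sequence[float], b: Sequence[float], lam: float) -> List[float]:
    """Convex combination lam * a + (1 - lam) * b of two feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def synth_negatives(features: Sequence[Sequence[float]], labels: Sequence[str],
                    n: int, rng: random.Random) -> List[List[float]]:
    """Mix pairs of features from different identities into n synthetic negatives."""
    if len(set(labels)) < 2:
        raise ValueError("need at least two distinct identities to mix")
    negatives: List[List[float]] = []
    while len(negatives) < n:
        i, j = rng.sample(range(len(features)), 2)
        if labels[i] == labels[j]:
            continue  # assumed rule: only mix across identities
        lam = rng.uniform(0.3, 0.7)  # assumed range, keeps the mix away from either source
        negatives.append(mixup_features(features[i], features[j], lam))
    return negatives


# Hypothetical 2-D features for two enrolled identities.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["a", "a", "b", "b"]
negatives = synth_negatives(features, labels, 3, random.Random(0))
```

The resulting vectors could then be labeled as "unknown" during training, alongside negatives drawn from external databases.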
- Europe > Switzerland > Zürich > Zürich (0.05)
- South America > Brazil > Minas Gerais > Belo Horizonte (0.04)
- North America > United States (0.04)
- Research Report > New Finding (0.49)
- Research Report > Promising Solution (0.34)
An elementary belief function logic
Dubois, Didier, Godo, Lluis, Prade, Henri
Non-additive uncertainty theories, typically possibility theory, belief functions and imprecise probabilities share a common feature with modal logic: the duality properties between possibility and necessity measures, belief and plausibility functions as well as between upper and lower probabilities extend the duality between possibility and necessity modalities to the graded environment. It has been shown that the all-or-nothing version of possibility theory can be exactly captured by a minimal epistemic logic (MEL) that uses a very small fragment of the KD modal logic, without resorting to relational semantics. Besides, the case of belief functions has been studied independently, and a belief function logic has been obtained by extending the modal logic S5 to graded modalities using Łukasiewicz logic, albeit using relational semantics. This paper shows that a simpler belief function logic can be devised by adding Łukasiewicz logic on top of MEL. It allows for a more natural semantics in terms of Shafer basic probability assignments.
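The duality between belief and plausibility functions mentioned above can be checked numerically from a Shafer basic probability assignment. This is a minimal sketch of the standard definitions, Bel(A) as the total mass of focal sets contained in A and Pl(A) as the total mass of focal sets intersecting A; the frame `{x, y, z}` and the masses are made-up illustration values, and the empty set is assumed to carry no mass.

```python
from fractions import Fraction
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], Fraction]


def belief(m: Mass, a: FrozenSet[str]) -> Fraction:
    """Bel(A): total mass of focal sets that are subsets of A."""
    return sum((w for s, w in m.items() if s <= a), Fraction(0))


def plausibility(m: Mass, a: FrozenSet[str]) -> Fraction:
    """Pl(A): total mass of focal sets that intersect A."""
    return sum((w for s, w in m.items() if s & a), Fraction(0))


# Hypothetical frame of discernment and mass assignment (masses sum to 1).
frame = frozenset({"x", "y", "z"})
m: Mass = {
    frozenset({"x"}): Fraction(1, 2),
    frozenset({"x", "y"}): Fraction(1, 4),
    frame: Fraction(1, 4),
}

a = frozenset({"x", "y"})
complement = frame - a

bel_a = belief(m, a)
pl_a = plausibility(m, a)

# Duality: Pl(A) = 1 - Bel(complement of A), and Bel(A) = 1 - Pl(complement of A).
duality_holds = (pl_a == 1 - belief(m, complement)
                 and bel_a == 1 - plausibility(m, complement))
```

Using `Fraction` keeps the arithmetic exact, so the duality check is an equality test and not a floating-point comparison.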
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- (5 more...)
DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability
Cheuk, Kin Wai, Sawata, Ryosuke, Uesaka, Toshimitsu, Murata, Naoki, Takahashi, Naoya, Takahashi, Shusuke, Herremans, Dorien, Mitsufuji, Yuki
In this paper we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT). Instead of treating AMT as a discriminative task in which the model is trained to convert spectrograms into piano rolls, we think of it as a conditional generative task where we train our model to generate realistic-looking piano rolls from pure Gaussian noise conditioned on spectrograms. This new AMT formulation enables DiffRoll to transcribe, generate and even inpaint music. Due to its classifier-free nature, DiffRoll can also be trained on unpaired datasets where only piano rolls are available. Our experiments show that DiffRoll outperforms its discriminative counterpart by 19 percentage points (ppt.) and our ablation studies also indicate that it outperforms similar existing methods by 4.8 ppt. Source code and demonstration are available at https://sony.github.io/DiffRoll/.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Singapore (0.05)
- Europe > Netherlands > Gelderland > Nijmegen (0.04)
- (2 more...)
- Media > Music (0.94)
- Leisure & Entertainment (0.94)
A Quantum Natural Language Processing Approach to Musical Intelligence
Miranda, Eduardo Reck, Yeung, Richie, Pearson, Anna, Meichanetzidis, Konstantinos, Coecke, Bob
There has been tremendous progress in Artificial Intelligence (AI) for music, in particular for musical composition and access to large databases for commercialisation through the Internet. We are interested in further advancing this field, focusing on composition. In contrast to current black-box AI methods, we are championing an interpretable compositional outlook on generative music systems. In particular, we are importing methods from the Distributional Compositional Categorical (DisCoCat) modelling framework for Natural Language Processing (NLP), motivated by musical grammars. Quantum computing is a nascent technology, which is very likely to impact the music industry in time to come. Thus, we are pioneering a Quantum Natural Language Processing (QNLP) approach to develop a new generation of intelligent musical systems. This work follows from previous experimental implementations of DisCoCat linguistic models on quantum hardware. In this chapter, we present Quanthoven, the first proof-of-concept ever built, which (a) demonstrates that it is possible to program a quantum computer to learn to classify music that conveys different meanings and (b) illustrates how such a capability might be leveraged to develop a system to compose meaningful pieces of music. After a discussion about our current understanding of music as a communication medium and its relationship to natural language, the chapter focuses on the techniques developed to (a) encode musical compositions as quantum circuits, and (b) design a quantum classifier. The chapter ends with demonstrations of compositions created with the system.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Illinois (0.04)
- (10 more...)
- Research Report (0.50)
- Instructional Material (0.46)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.46)
What Is Virgin River, the Show Topping Netflix's Charts? Who Are the Virgins?
Lately I've come to think of the list Netflix provides on its homepage of its Top 10 most popular shows and movies at any given time as the streamer's version of the roll call at the Democratic National Convention this summer: Taking it in, one can only marvel at what a big country this is and how many, many different people, with very different entertainment preferences, occupy it. Where else does one find prestige programming like The Crown and The Queen's Gambit cheek by jowl with docufiction about aliens, a Christmas movie from 20 years ago, and, always, between one and five options you're convinced don't actually exist beyond their thumbnail images? For the past week or so, the honor of most fake-seeming show on the list has belonged to something called Virgin River. In contrast to the months-long publicity campaigns that precede some Netflix releases, others, like Virgin River, just seem to show up one day, their Rotten Tomatoes pages suspiciously lacking in reviews. With its blandly scenic setting and its generically good-looking leads, Virgin River feels, even more than most Netflix shows, like it could have been generated entirely by artificial intelligence.
- North America > United States > California (0.15)
- North America > United States > Colorado (0.05)
- Media > Television (1.00)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)