moses


Mixtures of SubExperts for Large Language Continual Learning

Kang, Haeyong

arXiv.org Artificial Intelligence

Adapting Large Language Models (LLMs) to a continuous stream of tasks is a critical yet challenging endeavor. While Parameter-Efficient Fine-Tuning (PEFT) methods have become a standard for this, they face a fundamental dilemma in continual learning. Reusing a single set of PEFT parameters for new tasks often leads to catastrophic forgetting of prior knowledge. Conversely, allocating distinct parameters for each task prevents forgetting but results in a linear growth of the model's size and fails to facilitate knowledge transfer between related tasks. To overcome these limitations, we propose \textit{Mixtures of SubExperts (MoSEs)}, a novel adaptive PEFT framework for continual learning designed for minimal forgetting and efficient scalability. MoSEs integrate a sparse Mixture of SubExperts into the transformer layers, governed by a task-specific routing mechanism. This architecture allows the model to isolate and protect knowledge within dedicated SubExperts, thereby minimizing parameter interference and catastrophic forgetting. Crucially, the router can adaptively select and combine previously learned sparse parameters for new tasks, enabling effective knowledge transfer while ensuring that the model's capacity grows sublinearly. We evaluate MoSEs on the comprehensive TRACE benchmark datasets. Our experiments demonstrate that MoSEs significantly outperform conventional continual learning approaches in both knowledge retention and scalability to new tasks, achieving state-of-the-art performance with substantial memory and computational savings.
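The routing idea in the abstract can be illustrated with a minimal sketch. All names here (`MoSELayer`, the binary routing masks) are hypothetical, not from the paper: each task owns a sparse route over a shared pool of low-rank sub-experts, so parameters for old tasks stay untouched while new tasks may reuse earlier sub-experts.

```python
import numpy as np

rng = np.random.default_rng(0)

class MoSELayer:
    """Hypothetical sketch of a Mixture-of-SubExperts adapter layer.

    Each task owns a sparse binary route over a shared pool of small
    sub-experts; routes for new tasks may reuse sub-experts learned
    earlier, so capacity grows sublinearly with the number of tasks.
    """

    def __init__(self, d_model, n_subexperts, d_expert):
        # Shared pool of low-rank sub-experts (down/up projections).
        self.down = rng.standard_normal((n_subexperts, d_model, d_expert)) * 0.02
        self.up = rng.standard_normal((n_subexperts, d_expert, d_model)) * 0.02
        self.routes = {}  # task_id -> binary mask over sub-experts

    def add_task(self, task_id, mask):
        self.routes[task_id] = np.asarray(mask, dtype=bool)

    def forward(self, x, task_id):
        # Only the sub-experts selected for this task contribute, so
        # parameters routed exclusively to other tasks are never touched.
        out = x.copy()
        for i in np.flatnonzero(self.routes[task_id]):
            out = out + x @ self.down[i] @ self.up[i]
        return out

layer = MoSELayer(d_model=8, n_subexperts=4, d_expert=2)
layer.add_task("task_A", [1, 1, 0, 0])
layer.add_task("task_B", [0, 1, 1, 0])  # reuses sub-expert 1 from task A

x = rng.standard_normal((3, 8))
ya = layer.forward(x, "task_A")
yb = layer.forward(x, "task_B")
print(ya.shape, yb.shape)
```

In the real method the routes are learned per task; the sketch only shows why isolation (disjoint mask entries) prevents interference while overlap enables transfer.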


Multi-Task Vehicle Routing Solver via Mixture of Specialized Experts under State-Decomposable MDP

Pan, Yuxin, Cao, Zhiguang, Gu, Chengyang, Liu, Liu, Zhao, Peilin, Chen, Yize, Lin, Fangzhen

arXiv.org Artificial Intelligence

Existing neural methods for multi-task vehicle routing problems (VRPs) typically learn unified solvers to handle multiple constraints simultaneously. However, they often underutilize the compositional structure of VRP variants, each derivable from a common set of basis VRP variants. This critical oversight causes unified solvers to miss out on the potential benefits of basis solvers, each specialized for a basis VRP variant. To overcome this limitation, we propose a framework that enables unified solvers to perceive the shared-component nature across VRP variants by proactively reusing basis solvers, while mitigating the exponential growth of trained neural solvers. Specifically, we introduce a State-Decomposable MDP (SDMDP) that reformulates VRPs by expressing the state space as the Cartesian product of basis state spaces associated with basis VRP variants. More crucially, this formulation inherently yields the optimal basis policy for each basis VRP variant. Furthermore, a Latent Space-based SDMDP extension is developed by incorporating both the optimal basis policies and a learnable mixture function to enable the policy reuse in the latent space. Under mild assumptions, this extension provably recovers the optimal unified policy of SDMDP through the mixture function that computes the state embedding as a mapping from the basis state embeddings generated by optimal basis policies. For practical implementation, we introduce the Mixture-of-Specialized-Experts Solver (MoSES), which realizes basis policies through specialized Low-Rank Adaptation (LoRA) experts, and implements the mixture function via an adaptive gating mechanism. Extensive experiments conducted across VRP variants showcase the superiority of MoSES over prior methods.
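The "LoRA experts plus adaptive gating" construction can be sketched as follows. This is an illustrative toy, not the paper's implementation; all names (`MoSESMixer`, the gate matrix) are assumptions. A gating network maps the state embedding to softmax weights over specialized LoRA experts, and the weighted low-rank updates are added to a frozen base projection.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class MoSESMixer:
    """Hypothetical sketch of mixing specialized LoRA experts.

    Each basis variant has its own LoRA expert (A_i, B_i); a gate maps
    the state embedding to mixture weights, and the mixed low-rank
    update is added to the frozen base projection W.
    """

    def __init__(self, d, r, n_experts):
        self.W = rng.standard_normal((d, d)) * 0.05       # frozen base weight
        self.A = rng.standard_normal((n_experts, d, r)) * 0.05
        self.B = np.zeros((n_experts, r, d))              # standard LoRA init: B = 0
        self.gate = rng.standard_normal((d, n_experts)) * 0.05

    def forward(self, h):
        w = softmax(h @ self.gate)  # adaptive gate over experts
        delta = sum(w[i] * (h @ self.A[i] @ self.B[i]) for i in range(len(w)))
        return h @ self.W + delta

mixer = MoSESMixer(d=6, r=2, n_experts=3)
h = rng.standard_normal(6)
out = mixer.forward(h)
print(out.shape)
```

With the standard LoRA initialization (B = 0) the mixed update starts at zero, so the layer initially reproduces the frozen base projection exactly; training would then move the experts away from zero.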


Discovery in Egypt offers new evidence for the Bible's story of Moses

Daily Mail - Science & tech



MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds

Wu, Junxi, Wang, Jinpeng, Liu, Zheng, Chen, Bin, Hu, Dongjian, Wu, Hao, Xia, Shu-Tao

arXiv.org Artificial Intelligence

The rapid advancement of large language models has intensified public concerns about the potential misuse. Therefore, it is important to build trustworthy AI-generated text detection systems. Existing methods neglect stylistic modeling and mostly rely on static thresholds, which greatly limits the detection performance. In this paper, we propose the Mixture of Stylistic Experts (MoSEs) framework that enables stylistics-aware uncertainty quantification through conditional threshold estimation. MoSEs contain three core components, namely, the Stylistics Reference Repository (SRR), the Stylistics-Aware Router (SAR), and the Conditional Threshold Estimator (CTE). For input text, the SAR activates the appropriate reference data in the SRR and provides it to the CTE. Subsequently, the CTE jointly models the linguistic statistical properties and semantic features to dynamically determine the optimal threshold. With a discrimination score, MoSEs yield prediction labels with the corresponding confidence level. Our framework achieves an average improvement of 11.34% in detection performance compared to baselines. More inspiringly, MoSEs show a more evident improvement of 39.15% in the low-resource case. Our code is available at https://github.com/creator-xi/MoSEs.
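The conditional-threshold idea can be illustrated with a toy sketch (all data and names here are made up for illustration, not taken from the paper or its code): instead of one global cutoff, the detector retrieves the stylistically nearest reference items and sets a cutoff from the detector scores of human-written references in that neighborhood.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for a stylistics reference repository:
# style embeddings plus detector scores on known human-written text.
ref_style = rng.standard_normal((100, 4))
ref_human_score = rng.normal(0.3, 0.1, 100)

def conditional_threshold(style_emb, k=10, fpr=0.05):
    # Route to the k stylistically nearest reference items, then set the
    # cutoff so that at most `fpr` of those human references are flagged.
    d = np.linalg.norm(ref_style - style_emb, axis=1)
    nearest = np.argsort(d)[:k]
    return np.quantile(ref_human_score[nearest], 1 - fpr)

x_style = rng.standard_normal(4)
score = 0.8  # discrimination score of the input text (illustrative)
thr = conditional_threshold(x_style)
label = "AI-generated" if score > thr else "human"
print(thr, label)
```

The point of the sketch is that the threshold varies with the input's style neighborhood, which is what makes the decision rule "conditional" rather than static.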


Hidden 'fingerprints' found in the Bible after thousands of years rewrite the story of the Ark of the Covenant

Daily Mail - Science & tech

Scientists have uncovered hidden patterns in the Bible that challenge ancient beliefs about its origins. Using artificial intelligence, they discovered 'fingerprints' in text throughout the Old Testament, suggesting multiple people wrote the stories. The traditional Jewish and Christian understanding is that Moses wrote the first five books of the Old Testament, including stories about creation, Noah's flood and the Ark of the Covenant. The new study found three distinct writing styles, each with its own vocabulary, tone and focus areas, suggesting that multiple authors and sources contributed to the books over time. Researchers used AI to analyze 50 chapters across five books, uncovering inconsistencies in language and content, repeated stories, shifts in tone and internal contradictions.


Sparse Regression for Machine Translation

Biçici, Ergun

arXiv.org Artificial Intelligence

We use transductive regression techniques to learn mappings between source and target features of given parallel corpora and use these mappings to generate machine translation outputs. We show the effectiveness of $L_1$ regularized regression (\textit{lasso}) to learn the mappings between sparsely observed feature sets versus $L_2$ regularized regression. Proper selection of training instances plays an important role in learning correct feature mappings within limited computational resources and at expected accuracy levels. We introduce the \textit{dice} instance selection method, which improves the source and target coverage of the training set. We show that $L_1$ regularized regression performs better than $L_2$ regularized regression both in regression measurements and in translation experiments using graph decoding. We present encouraging results when translating from German to English and Spanish to English. We also demonstrate results when the phrase table of a phrase-based decoder is replaced with the mappings we find with the regression model.
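The $L_1$ vs $L_2$ contrast the abstract reports can be reproduced on synthetic data. This is a generic sketch of lasso versus ridge on a sparse mapping problem, not the paper's setup; the data and parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic sparse mapping: only 3 of 30 source features matter.
n, p = 80, 30
X = rng.standard_normal((n, p))
true_w = np.zeros(p)
true_w[:3] = [2.0, -1.5, 1.0]
y = X @ true_w + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    # L2: closed form w = (X^T X + lam I)^-1 X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def lasso_ista(X, y, lam, steps=500):
    # L1: proximal gradient descent (ISTA) with soft-thresholding.
    w = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(steps):
        z = w - X.T @ (X @ w - y) / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return w

w_l2 = ridge(X, y, lam=1.0)
w_l1 = lasso_ista(X, y, lam=5.0)
n_l2 = int((np.abs(w_l2) > 1e-6).sum())
n_l1 = int((np.abs(w_l1) > 1e-6).sum())
print(n_l2, n_l1)  # ridge keeps all 30 coefficients; lasso keeps far fewer
```

The soft-thresholding step drives irrelevant coefficients to exactly zero, which is why $L_1$ regularization suits sparsely observed feature sets better than $L_2$, whose closed-form solution shrinks but never zeroes coefficients.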


Computer vision-based model for detecting turning lane features on Florida's public roadways

Antwi, Richard Boadu, Takyi, Samuel, Michael, Kimollo, Karaer, Alican, Ozguven, Eren Erman, Moses, Ren, Dulebenets, Maxim A., Sando, Thobias

arXiv.org Artificial Intelligence

Efficient and current roadway geometry data collection is a critical task for transportation agencies to undertake effective road planning, maintenance, design, and rehabilitation efforts. The methods for gathering such data can be broadly classified into two categories: a) land-based methods, which encompass field inventory, mobile mapping, and image logging, and b) aerial-based methods, which involve satellite imagery, drones, and laser scanning. However, employing land-based techniques for extensive highway networks covering thousands of miles proves arduous and costly, and poses safety risks for crew members. Consequently, there exists a pressing need to develop more efficient methodologies for acquiring this data promptly, safely, and economically. Fortunately, with the increasing availability of high-resolution images and recent strides in computer vision and object detection technologies, automated extraction of roadway geometry features has become feasible.


Marginalization Consistent Mixture of Separable Flows for Probabilistic Irregular Time Series Forecasting

Yalavarthi, Vijaya Krishna, Scholz, Randolf, Madhusudhanan, Kiran, Born, Stefan, Schmidt-Thieme, Lars

arXiv.org Artificial Intelligence

Probabilistic forecasting models for joint distributions of targets in irregular time series are a heavily under-researched area in machine learning with, to the best of our knowledge, only three models researched so far: GPR, the Gaussian Process Regression model [16], TACTiS, the Transformer-Attentional Copulas for Time Series [14, 2] and ProFITi [43], a multivariate normalizing flow model based on invertible attention layers. While ProFITi, thanks to using multivariate normalizing flows, is the more expressive model with a better predictive performance, we will show that it suffers from marginalization inconsistency: it does not guarantee that the marginal distributions of a subset of variables in its predictive distributions coincide with the directly predicted distributions of these variables. Also, TACTiS does not provide any guarantees for marginalization consistency. We develop a novel probabilistic irregular time series forecasting model, Marginalization Consistent Mixtures of Separable Flows (moses), that mixes several normalizing flows with (i) Gaussian Processes with full covariance matrix as source distributions and (ii) a separable invertible transformation, aiming to combine the expressivity of normalizing flows with the marginalization consistency of Gaussians. In experiments on four different datasets we show that moses outperforms other state-of-the-art marginalization-consistent models and performs on par with ProFITi but, unlike ProFITi, guarantees marginalization consistency.
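The marginalization-consistency property that moses inherits from its Gaussian source distributions can be checked numerically on a small example: for a mixture of Gaussians, integrating the joint density over one variable gives exactly the mixture of the components' 1-D marginals. The mixture below is invented for illustration.

```python
import numpy as np

def gauss2d(x1, x2, mu, cov):
    # Bivariate Gaussian density evaluated on a grid.
    d = np.stack([x1 - mu[0], x2 - mu[1]], axis=-1)
    inv = np.linalg.inv(cov)
    q = np.einsum("...i,ij,...j->...", d, inv, d)
    return np.exp(-0.5 * q) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

def gauss1d(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

comps = [  # (weight, mean, covariance)
    (0.4, np.array([0.0, 1.0]), np.array([[1.0, 0.6], [0.6, 2.0]])),
    (0.6, np.array([2.0, -1.0]), np.array([[0.5, -0.2], [-0.2, 1.0]])),
]

x1 = np.linspace(-4.0, 6.0, 100)
x2 = np.linspace(-10.0, 10.0, 4001)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")

# Numerically marginalize the joint mixture over x2 (trapezoid rule) ...
joint = sum(w * gauss2d(X1, X2, mu, cov) for w, mu, cov in comps)
dx = x2[1] - x2[0]
marg_numeric = 0.5 * dx * (joint[:, :-1] + joint[:, 1:]).sum(axis=1)

# ... and compare with the mixture of the analytic 1-D marginals,
# which for a Gaussian component is N(mu[0], cov[0, 0]).
marg_analytic = sum(w * gauss1d(x1, mu[0], cov[0, 0]) for w, mu, cov in comps)
err = np.max(np.abs(marg_numeric - marg_analytic))
print(err)
```

Flows such as ProFITi do not have this closed-form marginal structure, which is exactly the gap the abstract describes: the same check would fail for a generic multivariate normalizing flow.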


Preference Optimization for Molecular Language Models

Park, Ryan, Theisen, Ryan, Sahni, Navriti, Patek, Marcel, Cichońska, Anna, Rahman, Rayees

arXiv.org Machine Learning

Molecular language modeling is an effective approach to generating novel chemical structures. However, these models do not \emph{a priori} encode certain preferences a chemist may desire. We investigate the use of fine-tuning using Direct Preference Optimization to better align generated molecules with chemist preferences. Our findings suggest that this approach is simple, efficient, and highly effective.
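For readers unfamiliar with Direct Preference Optimization, its loss is simple to state. The sketch below is a generic scalar version with invented log-probability values, not the paper's code: given log-probabilities of a chosen and a rejected molecule under the policy and a frozen reference model, the loss is the negative log-sigmoid of the scaled preference margin.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # DPO: -log sigmoid(beta * [(logpi - logref)_chosen - (logpi - logref)_rejected])
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy already prefers the chosen molecule more strongly than
# the reference does, the margin is positive and the loss drops below log 2.
better = dpo_loss(-10.0, -14.0, -12.0, -13.0)
neutral = dpo_loss(-12.0, -13.0, -12.0, -13.0)  # policy == reference
print(better, neutral)
```

The appeal noted in the abstract follows from this form: the objective needs only paired preference data and a frozen reference model, with no reward model or reinforcement-learning loop.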


The First Parallel Corpora for Kurdish Sign Language

Kamal, Zina, Hassani, Hossein

arXiv.org Artificial Intelligence

Kurdish Sign Language (KuSL) is the natural language of the Kurdish Deaf people. We work on automatic translation between spoken Kurdish and KuSL. Sign languages evolve rapidly and follow grammatical rules that differ from spoken languages. Consequently, those differences should be considered during any translation. We propose an avatar-based automatic translation of Kurdish texts in the Sorani (Central Kurdish) dialect into Kurdish Sign Language. We developed the first parallel corpora for that pair, which we use to train a Statistical Machine Translation (SMT) engine. We tested the output's understandability and evaluated it using the Bilingual Evaluation Understudy (BLEU). Results showed 53.8% accuracy. Compared to the previous experiments in the field, the result is considerably high. We suspect the reason to be the similarity between the structure of the two pairs. We plan to make the resources publicly available under the CC BY-NC-SA 4.0 license on Kurdish-BLARK (https://kurdishblark.github.io/).