
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Kitouni, Ouail, Nolte, Niklas, Bouchacourt, Diane, Williams, Adina, Rabbat, Mike, Ibrahim, Mark

Neural Information Processing Systems

Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorization curse -- a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism, including WikiReversal, a setting we introduce to closely simulate a knowledge-intensive finetuning task, we find that the factorization curse is an inherent failure of the next-token prediction objective used in popular large language models. Moreover, we demonstrate that reliable information retrieval cannot be solved with scale, reversed tokens, or even naive bidirectional-attention training. Consequently, various approaches to finetuning on specialized data would necessarily provide mixed results on downstream tasks, unless the model has already seen the right sequence of tokens. Across five tasks of varying levels of complexity, our results uncover a promising path forward: factorization-agnostic objectives can significantly mitigate the reversal curse and hint at improved knowledge storage and planning capabilities.
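The factorization framing above can be made concrete with a toy example: any joint distribution admits multiple autoregressive factorizations, and a model trained with next-token prediction on a single token order only ever fits one of them. A minimal numpy sketch (not from the paper) verifying that the forward and reverse factorizations of a two-token joint encode exactly the same distribution:

```python
import numpy as np

# Toy joint distribution over two tokens (a, b): rows index a, columns index b.
joint = np.array([[0.10, 0.30],
                  [0.40, 0.20]])

# Forward factorization: p(a, b) = p(a) * p(b | a)  (next-token order a -> b)
p_a = joint.sum(axis=1)
p_b_given_a = joint / p_a[:, None]
forward = p_a[:, None] * p_b_given_a

# Reverse factorization: p(a, b) = p(b) * p(a | b)  (order b -> a)
p_b = joint.sum(axis=0)
p_a_given_b = joint / p_b[None, :]
reverse = p_a_given_b * p_b[None, :]

# Both factorizations recover the identical joint distribution, yet a
# next-token model trained only on the (a -> b) order never fits p(a | b).
assert np.allclose(forward, joint) and np.allclose(reverse, joint)
```

A factorization-agnostic objective, in this framing, is one whose training signal does not privilege either of the two conditionals above.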


Fake news detection using parallel BERT deep neural networks

Farokhian, Mahmood, Rafe, Vahid, Veisi, Hadi

arXiv.org Artificial Intelligence

Fake news is a growing challenge for social networks and media. Detecting fake news has been a problem for many years, but with the evolution of social networks and the increasing speed of news dissemination, it has recently drawn renewed attention. There are several approaches to this problem, one of which is to detect fake news from its text style using deep neural networks. In recent years, one of the most widely used forms of deep neural networks for natural language processing has been transfer learning with transformers. BERT is one of the most promising transformers, outperforming other models on many NLP benchmarks. In this article, we introduce MWPBert, which uses two parallel BERT networks to perform veracity detection on full-text news articles. One BERT network encodes the news headline, and the other encodes the news body. Since the input length of a BERT network is limited and fixed, and the news body is usually a long text, we cannot feed the whole news text into BERT. Therefore, using the MaxWorth algorithm, we select the part of the news text that is most valuable for fact-checking and feed it into the BERT network. Finally, we pass the outputs of the two BERT networks to an output network that classifies the news. Experimental results show that the proposed model outperformed previous models in terms of accuracy and other performance measures.
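The parallel-encoder architecture described above can be sketched schematically. In the following toy Python sketch, random projections of hashed bag-of-words vectors stand in for the two BERT encoders; the stand-in encoders, weights, and the `classify` helper are illustrative assumptions, not the actual MWPBert implementation (which uses real BERT networks and the MaxWorth selection step):

```python
import numpy as np

rng = np.random.default_rng(0)

EMB = 8  # stand-in embedding size (a real BERT encoder outputs 768 dims)

def encode(text, w):
    """Stand-in for a BERT encoder: hash tokens into a bag-of-words
    vector and project it, yielding a fixed-size embedding."""
    vec = np.zeros(EMB)
    for tok in text.lower().split():
        vec[hash(tok) % EMB] += 1.0
    return np.tanh(w @ vec)

w_head = rng.standard_normal((EMB, EMB))  # headline-encoder weights
w_body = rng.standard_normal((EMB, EMB))  # body-encoder weights (separate, parallel)
w_out = rng.standard_normal(2 * EMB)      # output-network weights

def classify(headline, body):
    # Encode headline and body with the two parallel encoders,
    # concatenate, and score with a linear output network.
    z = np.concatenate([encode(headline, w_head), encode(body, w_body)])
    return 1.0 / (1.0 + np.exp(-(w_out @ z)))  # a probability in (0, 1)

p = classify("Miracle cure found", "Doctors report no evidence for the claim.")
assert 0.0 < p < 1.0
```

The design point the sketch preserves is that headline and body get independent encoders whose outputs are fused only in the final classification network.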


Recent advancement in Disease Diagnostic using machine learning: Systematic survey of decades, comparisons, and challenges

Tajidini, Farzaneh, Kheiri, Mohammad-Javad

arXiv.org Artificial Intelligence

Computer-aided diagnosis (CAD), a vibrant medical imaging research field, is expanding quickly. Because errors in medical diagnostic systems can lead to seriously misleading treatments, major efforts have been made in recent years to improve computer-aided diagnosis applications. Machine learning is crucial to computer-aided diagnosis: a simple hand-crafted rule can falsely flag structures such as organs, so learning from examples is a vital component of pattern recognition. Pattern recognition and machine learning in the biomedical area promise to increase the precision of disease detection and diagnosis, and they also support the objectivity of the decision-making process. Machine learning provides a practical method for creating elegant, autonomous algorithms to analyze high-dimensional and multimodal biomedical data. This review examines machine-learning algorithms for detecting diseases including hepatitis, diabetes, liver disease, dengue fever, and heart disease, and surveys the machine-learning techniques and algorithms employed in studying these conditions and in the ensuing decision-making process.


Despite Iranian attack killing American abroad, Biden pursues nuclear deal with ayatollah's regime

FOX News

National security analyst Dr. Rebecca Grant joins "Fox News Live" to weigh in on what steps President Biden can take to rein in Iranian-backed militia strikes on U.S. bases in Syria. The Iranian regime's recent drone attack on an American base in Syria, which resulted in the murder of a U.S. contractor, has not deterred the Biden administration from pursuing the controversial nuclear pact with Tehran that would dramatically enrich the coffers of the Islamic Republic. The White House remains wedded to the Joint Comprehensive Plan of Action (JCPOA) – the formal name for the Iran nuclear deal – that "would allow Tehran to access up to $275 billion in financial benefits during its first year in effect and $1 trillion by 2030." Veteran Iran experts have argued that the JCPOA is no longer tenable because it is riddled with serious defects about deterring Iran's malign behavior, including failing to stop Tehran's ongoing drone attacks against Americans. Iran's regime was caught enriching uranium to 84% purity in February – just 6 percentage points short of the roughly 90% purity considered weapons-grade for a nuclear weapon.


Modeling bank performance: A novel fuzzy two-stage DEA approach

Izadikhah, Mohammad

arXiv.org Artificial Intelligence

Evaluating banks' performance has always been of interest due to their crucial role in the economic development of each country. Data envelopment analysis (DEA) has been widely used for measuring the performance of bank branches. In the conventional DEA approach, decision-making units (DMUs) are regarded as black boxes that transform sets of inputs into sets of outputs without considering the internal interactions taking place within each DMU. Two-stage DEA models are designed to overcome this shortfall. This paper therefore presents a new two-stage DEA model based on a modification of the Enhanced Russell Model. Moreover, in many situations, such as a manufacturing system, a production process, or a service system, inputs, intermediates, and outputs may be given as fuzzy variables. The main aim of this paper is to build a new fuzzy two-stage DEA model and use it to measure the efficiency of 15 branches of Melli Bank in Hamedan province.
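For readers unfamiliar with DEA, the black-box baseline the paper starts from reduces to one small linear program per DMU. Below is a minimal crisp, single-stage, input-oriented CCR sketch with scipy on made-up data, shown only to illustrate the mechanics; the paper's fuzzy two-stage Enhanced Russell model is considerably richer than this:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up data: one input, one output, three bank branches (DMUs).
X = np.array([[2.0], [4.0], [5.0]])  # inputs  (rows: DMUs)
Y = np.array([[2.0], [2.0], [5.0]])  # outputs (rows: DMUs)
n = len(X)

def ccr_efficiency(o):
    """Input-oriented CCR envelopment LP for DMU o:
    min theta  s.t.  sum_j lam_j x_ij <= theta x_io,
                     sum_j lam_j y_rj >= y_ro,  lam >= 0."""
    m, s = X.shape[1], Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                  # variables: [theta, lam_1..lam_n]
    A_ub = np.vstack([
        np.hstack([-X[o][:, None], X.T]),        # input constraints
        np.hstack([np.zeros((s, 1)), -Y.T]),     # output constraints (flipped to <=)
    ])
    b_ub = np.r_[np.zeros(m), -Y[o]]
    bounds = [(None, None)] + [(0.0, None)] * n  # theta free, lam >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.fun                               # optimal theta (efficiency score)

effs = [ccr_efficiency(o) for o in range(n)]
# Branches with the best output/input ratio score 1.0 (efficient);
# the dominated branch scores 0.5.
```

A two-stage model replaces this single black box with two linked LPs sharing intermediate variables, which is the shortfall the abstract describes.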


Proposing a two-step Decision Support System (TPIS) based on Stacked ensemble classifier for early and low cost (step-1) and final (step-2) differential diagnosis of Mycobacterium Tuberculosis from non-tuberculosis Pneumonia

Khatibi, Toktam, Farahani, Ali, Sarmadian, Hossein

arXiv.org Machine Learning

Background: Mycobacterium tuberculosis (TB) is an infectious bacterial disease that presents symptoms similar to pneumonia, so differentiating between TB and pneumonia is challenging. The main aim of this study is to propose an automatic method for the differential diagnosis of TB from pneumonia. Methods: A two-step decision support system named TPIS is proposed for the differential diagnosis of TB from pneumonia based on stacked ensemble classifiers. The first step of the proposed model aims at early diagnosis based on 18 low-cost features, including demographic characteristics and patient symptoms. The second step makes the final decision based on the meta-features extracted in the first step, laboratory tests, and chest radiography reports. This retrospective study considers the medical records of 199 patients suffering from TB or pneumonia, registered at a hospital in Arak, Iran. Results: Experimental results show that TPIS outperforms the compared machine learning methods, with an AUC of 90.26 and accuracy of 91.37 for early differential diagnosis of pulmonary tuberculosis from pneumonia, and an AUC of 92.81 and accuracy of 93.89 for the final decision. Conclusions: The main advantage of early diagnosis is that treatment can begin as soon as a confident diagnosis is made, preventing delays in treatment and the complications that late treatment of either disease entails.
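The two-step stacking idea generalizes beyond this study. The following hedged sklearn sketch runs on synthetic data: the feature counts mirror the abstract (18 cheap features, plus later lab/radiography features), but the base learners, gains, and data are illustrative assumptions, not the authors' actual TPIS configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 patients, 18 cheap features (symptoms,
# demographics) and 5 expensive features (labs, radiography reports).
n = 200
cheap = rng.standard_normal((n, 18))
labs = rng.standard_normal((n, 5))
y = (cheap[:, 0] + labs[:, 0] + 0.3 * rng.standard_normal(n) > 0).astype(int)

# Step 1: stacked ensemble on low-cost features only (early diagnosis).
step1 = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(),
)
step1.fit(cheap, y)

# Step 2: final decision from the step-1 meta-feature (predicted
# probability) combined with the expensive lab/radiography features.
meta = step1.predict_proba(cheap)[:, [1]]
step2 = LogisticRegression().fit(np.hstack([meta, labs]), y)

acc = step2.score(np.hstack([meta, labs]), y)  # in-sample sanity check only
```

The structural point is that the expensive features are consulted only in step 2, so confidently diagnosed patients can in principle start treatment after step 1.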


Optimal Network Topology for Effective Collective Response

Mateo, David, Horsevad, Nikolaj, Hassani, Vahid, Chamanbaz, Mohammadreza, Bouffanais, Roland

arXiv.org Artificial Intelligence

Natural, social, and artificial multi-agent systems usually operate in dynamic environments, where the ability to respond to changing circumstances is a crucial feature. An effective collective response requires suitable information transfer among agents, and thus is critically dependent on the agents' interaction network. In order to investigate the influence of the network topology on collective response, we consider an archetypal model of distributed decision-making, the leader-follower linear consensus, and study the collective capacity of the system to follow a dynamic driving signal (the "leader") for a range of topologies and system sizes. The analysis reveals a nontrivial relationship between optimal topology and frequency of the driving signal. Interestingly, the response is optimal when each individual interacts with a certain number of agents which decreases monotonically with the frequency and, for large enough systems, is independent of the size of the system. This phenomenology is investigated in experiments of collective motion using a swarm of land robots. The emergent collective response to both a slow- and a fast-changing leader is measured and analyzed for a range of interaction topologies. These results have far-reaching practical implications for the design and understanding of distributed systems, since they highlight that a dynamic rewiring of the interaction network is paramount to the effective collective operations of multi-agent systems at different timescales.
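To give a flavor of the leader-follower linear consensus model, here is a toy numpy simulation (not the paper's setup: the ring topology, single pinned follower, and all gains are arbitrary illustrative choices). It shows the frequency dependence the abstract highlights: for a fixed topology, the collective tracks a slow leader far better than a fast one:

```python
import numpy as np

def tracking_error(omega, k=2, n=20, dt=0.01, steps=4000, pin=10.0):
    """Leader-follower linear consensus on a ring of n followers, each
    coupled to its k nearest neighbours per side; follower 0 is additionally
    pinned to a sinusoidal leader signal of frequency omega. Returns the
    mean squared tracking error averaged over agents and time."""
    A = np.zeros((n, n))
    for i in range(n):
        for d in range(1, k + 1):
            A[i, (i + d) % n] = A[i, (i - d) % n] = 1.0
    deg = A.sum(axis=1)
    x = np.zeros(n)
    err = 0.0
    for t in range(steps):
        leader = np.sin(omega * t * dt)
        dx = A @ x - deg * x              # diffusive consensus coupling
        dx[0] += pin * (leader - x[0])    # follower 0 tracks the leader
        x = x + dt * dx                   # forward-Euler integration
        err += np.mean((x - leader) ** 2)
    return err / steps

e_slow = tracking_error(omega=0.2)  # slow driving signal: well tracked
e_fast = tracking_error(omega=5.0)  # fast driving signal: strongly attenuated
```

Sweeping `k` at fixed `omega` in this sketch is the kind of experiment from which the paper extracts its optimal-topology-versus-frequency relationship.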