
Collaborating Authors

 Mousavi, Ali


Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

arXiv.org Artificial Intelligence

We introduce Phi-4-Mini and Phi-4-Multimodal, compact yet highly capable language and multimodal models. Phi-4-Mini is a 3.8-billion-parameter language model trained on high-quality web and synthetic data, significantly outperforming recent open-source models of similar size and matching the performance of models twice its size on math and coding tasks requiring complex reasoning. This achievement is driven by a carefully curated synthetic data recipe emphasizing high-quality math and coding datasets. Compared to its predecessor, Phi-3.5-Mini, Phi-4-Mini features an expanded vocabulary size of 200K tokens to better support multilingual applications, as well as group query attention for more efficient long-sequence generation. Phi-4-Multimodal is a multimodal model that integrates text, vision, and speech/audio input modalities into a single model. Its novel modality extension approach leverages LoRA adapters and modality-specific routers to allow multiple inference modes combining various modalities without interference. For example, it now ranks first on the OpenASR leaderboard to date, even though the LoRA component of the speech/audio modality has just 460 million parameters. Phi-4-Multimodal supports scenarios involving (vision + language), (vision + speech), and (speech/audio) inputs, outperforming larger vision-language and speech-language models on a wide range of tasks. Additionally, we conduct experiments to further train Phi-4-Mini to enhance its reasoning capabilities. Despite its compact 3.8-billion-parameter size, this experimental version achieves reasoning performance on par with or surpassing significantly larger models, including DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B.
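The modality-extension idea above, a frozen base projection plus modality-specific LoRA adapters chosen by a router, can be illustrated with a minimal sketch. The module names, ranks, and routing-by-lookup below are illustrative assumptions, not the actual Phi-4-Multimodal implementation:

```python
import torch
import torch.nn as nn

class LoRAAdapter(nn.Module):
    """Low-rank update W*x + (alpha/r) * B(A(x)) applied on top of a frozen linear layer."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

class MixtureOfLoRAs(nn.Module):
    """One shared frozen projection with per-modality LoRA adapters; a router picks the adapter."""
    def __init__(self, d_model: int = 64, modalities=("text", "vision", "speech")):
        super().__init__()
        base = nn.Linear(d_model, d_model)               # shared, frozen inside each adapter
        self.adapters = nn.ModuleDict({m: LoRAAdapter(base) for m in modalities})

    def forward(self, x, modality: str):
        # Routing here is simply a lookup keyed on the declared input modality.
        return self.adapters[modality](x)

if __name__ == "__main__":
    layer = MixtureOfLoRAs()
    h = torch.randn(2, 10, 64)
    print(layer(h, "speech").shape)                      # torch.Size([2, 10, 64])
```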


Time Sensitive Knowledge Editing through Efficient Finetuning

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated impressive capabilities on a range of tasks and are bringing transformative changes to many domains. However, keeping the knowledge in LLMs up-to-date remains a challenge once pretraining is complete. It is thus essential to design effective methods both to update obsolete knowledge and to inject new knowledge into LLMs. Existing locate-and-edit knowledge editing (KE) methods suffer from two limitations. First, LLMs edited by such methods generally perform poorly on complex queries that require multi-hop reasoning. Second, the long run-time these locate-and-edit methods need to perform knowledge edits makes them infeasible for large-scale KE in practice. In this paper, we explore Parameter-Efficient Fine-Tuning (PEFT) techniques as an alternative for KE. We curate a more comprehensive temporal KE dataset with both knowledge-update and knowledge-injection examples for benchmarking KE performance. We further probe the effect of fine-tuning different ranges of layers in an LLM on the multi-hop QA task. We find that PEFT performs better than locate-and-edit techniques for time-sensitive knowledge edits.
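A minimal sketch of the PEFT-for-KE setup described above, using the Hugging Face peft library; the base model, LoRA hyperparameters, and the phrasing of the edit example are placeholders rather than the paper's exact configuration:

```python
# Hedged sketch: LoRA fine-tuning a causal LM on temporal knowledge-edit examples.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "gpt2"                      # placeholder; the paper edits larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, config)    # only the low-rank adapters are trainable
model.print_trainable_parameters()

# One knowledge-update example: the new fact phrased as plain text (illustrative).
edit = "As of 2023, the prime minister of the United Kingdom is Rishi Sunak."
batch = tokenizer(edit, return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()                      # a real run would loop over the whole edit dataset
```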


Entity Disambiguation via Fusion Entity Decoding

arXiv.org Artificial Intelligence

Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark. Nevertheless, generative approaches suffer from the need for large-scale pre-training and inefficient generation. Most importantly, entity descriptions, which could contain crucial information to distinguish similar entities from each other, are often overlooked. We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions. Given text and candidate entities, the encoder learns interactions between the text and each candidate entity, producing representations for each entity candidate. The decoder then fuses the representations of entity candidates together and selects the correct entity. Our experiments, conducted on various entity disambiguation benchmarks, demonstrate the strong and robust performance of this model, notably a +1.5% improvement on the ZELDA benchmark compared with GENRE. Furthermore, we integrate this approach into the retrieval/reader framework and observe a +1.5% improvement in end-to-end entity linking on the GERBIL benchmark compared with EntQA.
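The encode-then-fuse idea can be sketched as follows; the architecture sizes, mean pooling, and scoring head are illustrative assumptions, not the paper's exact model:

```python
# Toy sketch: encode each (mention context + candidate description) pair, then let the
# candidate representations attend to each other before scoring one of them.
import torch
import torch.nn as nn

class FusionDisambiguator(nn.Module):
    def __init__(self, vocab_size=1000, d=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        enc_layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.fuse = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.score = nn.Linear(d, 1)

    def forward(self, cand_token_ids):
        # cand_token_ids: (num_candidates, seq_len); each row is the mention context
        # concatenated with one candidate's description.
        h = self.encoder(self.embed(cand_token_ids))       # (C, L, d)
        cand_repr = h.mean(dim=1)                          # (C, d) one vector per candidate
        fused = self.fuse(cand_repr.unsqueeze(0))          # candidates attend to each other
        return self.score(fused).squeeze(-1).squeeze(0)    # (C,) one score per candidate

model = FusionDisambiguator()
logits = model(torch.randint(0, 1000, (5, 32)))            # 5 candidates, 32 tokens each
print(logits.argmax().item())                              # index of the predicted entity
```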


Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation

arXiv.org Artificial Intelligence

Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be used to train forward and reverse neural models that generate text from KG and vice versa. However, models trained on datasets where the KG and text pairs are not equivalent can suffer from more hallucination and poorer recall. In this paper, we verify this empirically by generating datasets with different levels of noise and find that noisier datasets do indeed lead to more hallucination. We argue that the ability of forward and reverse models trained on a dataset to cyclically regenerate the source KG or text is a proxy for the equivalence between the KG and the text in the dataset. Using cyclic evaluation, we find that the manually created WebNLG is much better than the automatically created TeKGen and T-REx. Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text, and we show the impact of each heuristic on cyclic evaluation. We also construct two synthetic datasets using large language models (LLMs), and observe that models trained on them perform well on cyclic generation of text but less so on cyclic generation of KGs, probably because of the lack of a consistent underlying ontology.
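Cyclic evaluation on the KG side can be sketched as a round trip KG -> text -> KG scored by triple-level F1; the forward and reverse model calls below are toy placeholders standing in for trained KG-to-text and text-to-KG models:

```python
# Hedged sketch of cyclic evaluation: regenerate the source KG through text and score
# how much of it survives the round trip.
def triple_f1(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    if not gold or not predicted:
        return 0.0
    precision = len(gold & predicted) / len(predicted)
    recall = len(gold & predicted) / len(gold)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def cyclic_kg_score(kg_triples, kg_to_text, text_to_kg):
    """KG -> text -> KG; higher F1 suggests the dataset's KG and text are more equivalent."""
    text = kg_to_text(kg_triples)
    regenerated = text_to_kg(text)
    return triple_f1(kg_triples, regenerated)

# Toy stand-ins so the sketch runs end to end.
kg = [("Alan_Turing", "birthPlace", "London"), ("Alan_Turing", "field", "Computer_Science")]
fake_forward = lambda triples: " ".join(f"{s} {p} {o}." for s, p, o in triples)
fake_reverse = lambda text: [("Alan_Turing", "birthPlace", "London")]   # drops one triple
print(cyclic_kg_score(kg, fake_forward, fake_reverse))                  # about 0.67
```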


Growing and Serving Large Open-domain Knowledge Graphs

arXiv.org Artificial Intelligence

Applications of large open-domain knowledge graphs (KGs) to real-world problems pose many unique challenges. In this paper, we present extensions to Saga, our platform for continuous construction and serving of knowledge at scale. In particular, we describe a pipeline for training knowledge graph embeddings that powers key capabilities such as fact ranking, fact verification, a related-entities service, and support for entity linking. We then describe how our platform, including graph embeddings, can be leveraged to create a Semantic Annotation service that links unstructured Web documents to entities in our KG. Semantic annotation of the Web effectively expands our knowledge graph with edges to open-domain Web content, which can be used in various search and ranking problems. Next, we leverage annotated Web documents to drive Open-domain Knowledge Extraction. This targeted extraction framework identifies important coverage issues in the KG, then finds relevant data sources for target entities on the Web and extracts missing information to enrich the KG. Finally, we describe adaptations to our knowledge platform needed to construct and serve private personal knowledge on-device. This includes private incremental KG construction, cross-device knowledge sync, and global knowledge enrichment.
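Fact ranking with knowledge graph embeddings can be illustrated with a minimal TransE-style scoring sketch; the scoring function, dimensions, and entities below are illustrative assumptions rather than Saga's actual embedding pipeline:

```python
# Minimal TransE-style fact ranking: a fact (h, r, t) scores higher when h + r is close to t.
import numpy as np

rng = np.random.default_rng(0)
dim = 32
entities = {name: rng.normal(size=dim) for name in ["Paris", "France", "Berlin"]}
relations = {"capital_of": rng.normal(size=dim)}

def fact_score(head, relation, tail):
    """Higher (less negative) scores mean a more plausible fact under TransE."""
    return -np.linalg.norm(entities[head] + relations[relation] - entities[tail])

candidates = [("Paris", "capital_of", "France"), ("Berlin", "capital_of", "France")]
ranked = sorted(candidates, key=lambda t: fact_score(*t), reverse=True)
# With random vectors this ordering is arbitrary; trained embeddings would rank the
# true fact ("Paris", "capital_of", "France") first.
print(ranked[0])
```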


High-Throughput Vector Similarity Search in Knowledge Graphs

arXiv.org Artificial Intelligence

There is an increasing adoption of machine learning for encoding data into vectors to serve online recommendation and search use cases. As a result, recent data management systems propose augmenting query processing with online vector similarity search. In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). Motivated by the tasks of finding related KG queries and entities for past KG query workloads, we focus on hybrid vector similarity search (hybrid queries for short) where part of the query corresponds to vector similarity search and part of the query corresponds to predicates over relational attributes associated with the underlying data vectors. For example, given past KG queries for a song entity, we want to construct new queries for new song entities whose vector representations are close to the vector representation of the entity in the past KG query. But entities in a KG also have non-vector attributes, such as the artist, genre, and release date associated with a song. Therefore, suggested entities must also satisfy query predicates over non-vector attributes beyond a vector-based similarity predicate. While these tasks are central to KGs, our contributions are generally applicable to hybrid queries. In contrast to prior works that optimize online queries, we focus on enabling efficient batch processing of past hybrid query workloads. We present our system, HQI, for high-throughput batch processing of hybrid queries. We introduce a workload-aware vector data partitioning scheme to tailor the vector index layout to the given workload and describe a multi-query optimization technique to reduce the overhead of vector similarity computations. We evaluate our methods on industrial workloads and demonstrate that HQI yields a 31x improvement in throughput for finding related KG queries compared to existing hybrid query processing approaches.
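A single hybrid query can be sketched as a relational predicate filter followed by vector similarity ranking over the survivors; HQI's workload-aware partitioning and multi-query optimization are not reproduced in this toy example:

```python
# Toy hybrid query: filter by a relational attribute, then rank by cosine similarity.
import numpy as np

rng = np.random.default_rng(1)
num_songs, dim = 1000, 64
vectors = rng.normal(size=(num_songs, dim))                 # learned song embeddings
genres = rng.choice(["pop", "rock", "jazz"], size=num_songs)  # a non-vector attribute

def hybrid_query(query_vec, genre, k=5):
    idx = np.where(genres == genre)[0]                      # relational predicate
    sims = vectors[idx] @ query_vec / (
        np.linalg.norm(vectors[idx], axis=1) * np.linalg.norm(query_vec) + 1e-9
    )                                                        # cosine similarity on survivors
    return idx[np.argsort(-sims)[:k]]

print(hybrid_query(vectors[0], "jazz"))                      # ids of the 5 closest jazz songs
```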


Adaptive Neuro Fuzzy Networks based on Quantum Subtractive Clustering

arXiv.org Artificial Intelligence

Data mining techniques can be used to discover useful patterns by exploring and analyzing data, and it is feasible to synergistically combine machine learning tools to discover fuzzy classification rules. In this paper, an adaptive neuro-fuzzy network of TSK type with an improved quantum subtractive clustering has been developed. Quantum clustering (QC) draws its intuition from quantum mechanics, using the Schrödinger potential and a time-consuming gradient descent method. The principal advantage and shortcomings of QC are analyzed, and based on those shortcomings, an improved algorithm using a subtractive clustering method is proposed. Cluster centers represent a general model with the essential characteristics of the data, which can be used as the premise part of fuzzy rules. The experimental results revealed that the proposed ANFIS based on quantum subtractive clustering yielded good approximation and generalization capabilities, an impressive decrease in the number of fuzzy rules, and competitive network output accuracy in comparison with traditional methods.
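The classical subtractive clustering that the quantum-inspired variant builds on can be sketched as follows; the radii and stopping threshold are the commonly used defaults, not values from the paper:

```python
# Sketch of classical subtractive clustering (Chiu, 1994): each point gets a density-like
# potential, the highest-potential point becomes a cluster center, and its influence is
# subtracted before the next center is chosen.
import numpy as np

def subtractive_clustering(X, ra=0.5, rb=0.75, eps=0.15):
    alpha, beta = 4.0 / ra**2, 4.0 / rb**2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)     # pairwise squared distances
    potential = np.exp(-alpha * d2).sum(axis=1)             # potential of each point
    centers, first_peak = [], potential.max()
    while True:
        c = int(potential.argmax())
        if potential[c] < eps * first_peak:
            break
        centers.append(X[c])
        # Subtract the chosen center's influence so nearby points lose potential.
        potential -= potential[c] * np.exp(-beta * d2[:, c])
        potential = np.maximum(potential, 0.0)
    return np.array(centers)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.2, 0.05, (50, 2)), rng.normal(0.8, 0.05, (50, 2))])
print(subtractive_clustering(X).shape)    # roughly (2, 2): one center per cluster
```

The centers found this way would then seed the premise (membership-function) part of the TSK fuzzy rules.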


Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders

arXiv.org Artificial Intelligence

Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings where experimentation is limited, such as education and healthcare. But, in these very same settings, observed actions are often confounded by unobserved variables making OPE even more difficult. We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders, where states and actions can act as proxies for the unobserved confounders. We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data. Our method involves two stages. In the first, we show how to use proxies to estimate stationary distribution ratios, extending recent work on breaking the curse of horizon to the confounded setting. In the second, we show optimal balancing can be combined with such learned ratios to obtain policy value while avoiding direct modeling of reward functions. We establish theoretical guarantees of consistency, and benchmark our method empirically.
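Once the stationary distribution ratios w(s, a) have been estimated, the second stage described above reduces to a weighted average of observed rewards; the sketch below uses random placeholder ratios in place of the paper's proxy-based estimator:

```python
# Hedged sketch of ratio-weighted off-policy evaluation: the target policy's long-run
# average reward is estimated as a self-normalized weighted mean of observed rewards.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
rewards = rng.normal(loc=1.0, scale=0.5, size=n)    # rewards observed under the behavior policy
ratios = rng.gamma(shape=2.0, scale=0.5, size=n)    # stand-in for estimated ratios w(s, a)

# Self-normalization keeps the estimate stable when the ratios do not sum exactly to n.
policy_value = np.sum(ratios * rewards) / np.sum(ratios)
print(f"estimated long-run average reward: {policy_value:.3f}")
```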


Unsupervised Learning with Stein's Unbiased Risk Estimator

arXiv.org Machine Learning

Learning from unlabeled and noisy data is one of the grand challenges of machine learning. As such, it has seen a flurry of research with new ideas proposed continuously. In this work, we revisit a classical idea: Stein's Unbiased Risk Estimator (SURE). We show that, in the context of image recovery, SURE and its generalizations can be used to train convolutional neural networks (CNNs) for a range of image denoising and recovery problems {\em without any ground truth data.} Specifically, our goal is to reconstruct an image $x$ from a {\em noisy} linear transformation (measurement) of the image. We consider two scenarios: one where no additional data is available and one where we have measurements of other images that are drawn from the same noisy distribution as $x$, but have no access to the clean images. Such is the case, for instance, in the context of medical imaging, microscopy, and astronomy, where noise-less ground truth data is rarely available. We show that in this situation, SURE can be used to estimate the mean-squared-error loss associated with an estimate of $x$. Using this estimate of the loss, we train networks to perform denoising and compressed sensing recovery. In addition, we also use the SURE framework to partially explain and improve upon an intriguing result presented by Ulyanov et al. in "Deep Image Prior": that a network initialized with random weights and fit to a single noisy image can effectively denoise that image.
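For Gaussian denoising, the SURE objective can be written as $\frac{1}{n}\|f(y)-y\|^2 - \sigma^2 + \frac{2\sigma^2}{n}\,\mathrm{div}_y f(y)$, with the divergence estimated by the standard Monte Carlo trick; the network and image sizes below are toy placeholders, not the paper's CNNs:

```python
# Sketch of a SURE training loss for Gaussian denoising: an unbiased estimate of the MSE
# to the clean image, computed from the noisy image alone.
import torch
import torch.nn as nn

def sure_loss(net, y, sigma, eps=1e-3):
    n = y.numel()
    f_y = net(y)
    data_term = ((f_y - y) ** 2).sum() / n - sigma ** 2
    # Monte Carlo divergence estimate: b^T (f(y + eps*b) - f(y)) / eps with random b.
    b = torch.randn_like(y)
    div = (b * (net(y + eps * b) - f_y)).sum() / (eps * n)
    return data_term + 2 * sigma ** 2 * div

net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1))
sigma = 0.1
clean = torch.rand(1, 1, 32, 32)
noisy = clean + sigma * torch.randn_like(clean)
loss = sure_loss(net, noisy, sigma)
loss.backward()                          # gradients flow without ever seeing `clean`
print(float(loss))
```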


Learned D-AMP: Principled Neural Network based Compressive Image Recovery

Neural Information Processing Systems

Compressive image recovery is a challenging problem that requires fast and accurate algorithms. Recently, neural networks have been applied to this problem with promising results. By exploiting massively parallel GPU processing architectures and oodles of training data, they can run orders of magnitude faster than existing techniques. However, these methods are largely unprincipled black boxes that are difficult to train and often-times specific to a single measurement matrix. It was recently demonstrated that iterative sparse-signal-recovery algorithms can be "unrolled" to form interpretable deep networks. Taking inspiration from this work, we develop a novel neural network architecture that mimics the behavior of the denoising-based approximate message passing (D-AMP) algorithm. We call this new network {\em Learned} D-AMP (LDAMP). The LDAMP network is easy to train, can be applied to a variety of different measurement matrices, and comes with a state-evolution heuristic that accurately predicts its performance. Most importantly, it outperforms the state-of-the-art BM3D-AMP and NLR-CS algorithms in terms of both accuracy and run time. At high resolutions, and when used with sensing matrices that have fast implementations, LDAMP runs over $50\times$ faster than BM3D-AMP and hundreds of times faster than NLR-CS.
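One D-AMP iteration of the kind LDAMP unrolls can be sketched as follows; the soft-shrinkage denoiser is a stand-in for the learned CNN denoiser, and the Onsager correction term uses the usual Monte Carlo divergence estimate:

```python
# Hedged sketch of (D-)AMP for compressive recovery y = A x with a plug-in denoiser.
import numpy as np

def denoise(v, sigma):
    # Placeholder denoiser (soft shrinkage); LDAMP plugs in a trained CNN denoiser here.
    return np.sign(v) * np.maximum(np.abs(v) - sigma, 0.0)

def damp_iteration(y, A, x, z, rng, eps=1e-3):
    m, n = A.shape
    sigma = np.linalg.norm(z) / np.sqrt(m)             # effective noise level estimate
    pseudo_data = x + A.T @ z
    x_new = denoise(pseudo_data, sigma)
    # Onsager correction via a Monte Carlo estimate of the denoiser's divergence.
    b = rng.standard_normal(n)
    div = b @ (denoise(pseudo_data + eps * b, sigma) - x_new) / eps
    z_new = y - A @ x_new + (z / m) * div
    return x_new, z_new

rng = np.random.default_rng(4)
n, m = 256, 128
x_true = np.zeros(n); x_true[rng.choice(n, 10, replace=False)] = 1.0   # sparse signal
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true
x, z = np.zeros(n), y.copy()
for _ in range(30):
    x, z = damp_iteration(y, A, x, z, rng)
print(f"relative error: {np.linalg.norm(x - x_true) / np.linalg.norm(x_true):.3f}")
```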