AITopics | Mikolov, Tomas

Collaborating Authors

Mikolov, Tomas

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time

Herel, David, Bartek, Vojtech, Mikolov, Tomas

arXiv.org Artificial IntelligenceSep-20-2024

Who is the US President? The answer changes depending on when the question is asked. While large language models (LLMs) are evaluated on various reasoning tasks, they often miss a crucial dimension: time. In real-world scenarios, the correctness of answers is frequently tied to temporal context. In this paper, we introduce a novel dataset designed to rigorously test LLMs' ability to handle time-sensitive facts. Our benchmark offers a systematic way to measure how well LLMs align their knowledge with the correct time context, filling a key gap in current evaluation methods and offering a valuable tool for improving real-world applicability in future models.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2409.13338

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry: Government > Regional Government > North America Government > United States Government (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Thinking Tokens for Language Modeling

Herel, David, Mikolov, Tomas

arXiv.org Artificial IntelligenceMay-14-2024

How much is 56 times 37? Language models often make mistakes in these types of difficult calculations. This is usually explained by their inability to perform complex reasoning. Since language models rely on large training sets and great memorization capability, naturally they are not equipped to run complex calculations. However, one can argue that humans also cannot perform this calculation immediately and require a considerable amount of time to construct the solution. In order to enhance the generalization capability of language models, and as a parallel to human behavior, we propose to use special 'thinking tokens' which allow the model to perform much more calculations whenever a complex problem is encountered.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2405.08644

Country:

Europe (0.15)
North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Add feedback

Collapse of Self-trained Language Models

Herel, David, Mikolov, Tomas

arXiv.org Artificial IntelligenceApr-2-2024

In various fields of knowledge creation, including science, new ideas often build on pre-existing information. In this work, we explore this concept within the context of language models. Specifically, we explore the potential of self-training models on their own outputs, akin to how humans learn and build on their previous thoughts and actions. While this approach is intuitively appealing, our research reveals its practical limitations. We find that extended self-training of the GPT-2 model leads to a significant degradation in performance, resulting in repetitive and collapsed token output.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.02305

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Add feedback

Large Language Models: A Survey

Minaee, Shervin, Mikolov, Tomas, Nikzad, Narjes, Chenaghlu, Meysam, Socher, Richard, Amatriain, Xavier, Gao, Jianfeng

arXiv.org Artificial IntelligenceFeb-9-2024

Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks, since the release of ChatGPT in November 2022. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffmann2022training}. The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations. We also give an overview of techniques developed to build, and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.06196

Country:

Europe (1.00)
North America > United States > Texas (0.14)
North America > United States > Pennsylvania (0.14)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Advancing State of the Art in Language Modeling

Herel, David, Mikolov, Tomas

arXiv.org Artificial IntelligenceNov-28-2023

Generalization is arguably the most important goal of statistical language modeling research. Publicly available benchmarks and papers published with an open-source code have been critical to advancing the field. However, it is often very difficult, and sometimes even impossible, to reproduce the results fully as reported in publications. In this paper, we propose a simple framework that should help advance the state of the art in language modeling in terms of generalization. We propose to publish not just the code, but also probabilities on dev and test sets with future publications so that one can easily add the new model into an ensemble. This has crucial advantages: it is much easier to determine whether a newly proposed model is actually complementary to the current baseline. Therefore, instead of inventing new names for the old tricks, the scientific community can advance faster. Finally, this approach promotes diversity of ideas: one does not need to create an individual model that is the new state of the art to attract attention; it will be sufficient to develop a new model that learns patterns which other models do not. Thus, even a suboptimal model can be found to have value. Remarkably, our approach has yielded new state-of-the-art results across various language modeling benchmarks up to 10%.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2312.03735

Country: Europe > Belgium (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback

Preserving Semantics in Textual Adversarial Attacks

Herel, David, Cisneros, Hugo, Mikolov, Tomas

arXiv.org Artificial IntelligenceOct-5-2023

The growth of hateful online content, or hate speech, has been associated with a global increase in violent crimes against minorities [23]. Harmful online content can be produced easily, automatically and anonymously. Even though, some form of auto-detection is already achieved through text classifiers in NLP, they can be fooled by adversarial attacks. To strengthen existing systems and stay ahead of attackers, we need better adversarial attacks. In this paper, we show that up to 70% of adversarial examples generated by adversarial attacks should be discarded because they do not preserve semantics. We address this core weakness and propose a new, fully supervised sentence embedding technique called Semantics-Preserving-Encoder (SPE). Our method outperforms existing sentence encoders used in adversarial attacks by achieving 1.2x - 5.1x better real attack success rate. We release our code as a plugin that can be used in any existing adversarial attack to improve its quality and speed up its execution.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA230376

2211.04205

Country:

Europe > Italy (0.14)
Oceania > Australia (0.14)
North America > United States (0.14)
Europe > Belgium (0.14)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
(2 more...)

Add feedback

Visualizing computation in large-scale cellular automata

Cisneros, Hugo, Sivic, Josef, Mikolov, Tomas

arXiv.org Artificial IntelligenceApr-1-2021

Emergent processes in complex systems such as cellular automata can perform computations of increasing complexity, and could possibly lead to artificial evolution. Such a feat would require scaling up current simulation sizes to allow for enough computational capacity. Understanding complex computations happening in cellular automata and other systems capable of emergence poses many challenges, especially in large-scale systems. We propose methods for coarse-graining cellular automata based on frequency analysis of cell states, clustering and autoencoders. These innovative techniques facilitate the discovery of large-scale structure formation and complexity analysis in those systems. They emphasize interesting behaviors in elementary cellular automata while filtering out background patterns. Moreover, our methods reduce large 2D automata to smaller sizes and enable identifying systems that behave interestingly at multiple scales.

artificial intelligence, cellular automata, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1162/isal_a_00277

2104.01008

Country:

Europe (0.14)
Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Classification of Complex Systems Based on Transients

Hudcova, Barbora, Mikolov, Tomas

arXiv.org Artificial IntelligenceAug-31-2020

In order to develop systems capable of modeling artificial life, we need to identify, which systems can produce complex behavior. We present a novel classification method applicable to any class of deterministic discrete space and time dynamical systems. The method distinguishes between different asymptotic behaviors of a system's average computation time before entering a loop. When applied to elementary cellular automata, we obtain classification results, which correlate very well with Wolfram's manual classification. Further, we use it to classify 2D cellular automata to show that our technique can easily be applied to more complex models of computation. We believe this classification method can help to develop systems, in which complex structures emerge.

artificial intelligence, evolutionary algorithm, log 2 1, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1162/isal_a_00260

2008.13503

Country: North America > United States > New York (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.48)

Add feedback

Efficient Large-Scale Multi-Modal Classification

Kiela, Douwe (Facebook AI Research) | Grave, Edouard (Facebook AI Research) | Joulin, Armand (Facebook AI Research) | Mikolov, Tomas (Facebook AI Research)

AAAI ConferencesFeb-8-2018

While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability.

deep learning, neural network, proceedings, (21 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Fast Linear Model for Knowledge Graph Embeddings

Joulin, Armand, Grave, Edouard, Bojanowski, Piotr, Nickel, Maximilian, Mikolov, Tomas

arXiv.org Machine LearningOct-30-2017

This paper shows that a simple baseline based on a Bag-of-Words (BoW) representation learns surprisingly good knowledge graph embeddings. By casting knowledge base completion and question answering as supervised classification problems, we observe that modeling co-occurences of entities and relations leads to state-of-the-art performance with a training time of a few minutes using the open sourced library fastText.

artificial intelligence, arxiv preprint arxiv, text processing, (19 more...)

arXiv.org Machine Learning

1710.10881

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.71)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

Add feedback