AITopics | elmo

elmo

Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Mariya Toneva, Leila Wehbe

Neural Information Processing Systems, Feb-12-2026, 15:07:17 GMT

We use brain imaging recordings of subjects reading complex natural text to interpret word and sequence embeddings from 4 recent NLP models - ELMo, USE, BERT and Transformer-XL. We study how their representations differ across layer depth, context length, and attention type.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.98)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces

Zhang, Jinbin, Ullah, Nasib, Schultheis, Erik, Babbar, Rohit

arXiv.org Artificial Intelligence, Oct-14-2025

Large output spaces, also referred to as Extreme multilabel classification (XMC), is a setting that arises, e.g., in large-scale tagging and product-to-product recommendation, and is characterized by the number of labels ranging from hundreds of thousands to millions. This means that the linear classification head, usually only a tiny fraction of the overall model, turns into the main driver for compute and memory demand. Current state-of-the-art XMC methods predominantly rely on FP16-FP32 mixed-precision training, which we show can be unstable and inefficient in terms of memory usage and computational overhead. Meanwhile, existing low-precision methods typically retain higher precision for the classification layer. In this work, we propose ELMO, a pure low-precision training framework for XMC models using BFloat16 and Float8 data types. By leveraging Kahan summation and stochastic rounding, we demonstrate that XMC models can be effectively trained entirely in Float8, without relying on single-precision master weights or tensor scaling. Low-precision training, combined with our proposed memory optimizations (gradient fusion and chunking), enables significant reductions in GPU memory usage. For example, we train a 3-million-label XMC model with only 6.6 GiB of GPU memory, compared to the 39.7 GiB required by the optimized SOTA method, Renee, without compromising accuracy.
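The abstract names two numerical techniques, Kahan summation and stochastic rounding, without showing them. The following is a minimal sketch of both in plain Python and is not the authors' implementation. It assumes ordinary 64-bit Python floats rather than BFloat16 or Float8, and the function names (`naive_sum`, `kahan_sum`, `stochastic_round`) and the `step` grid parameter are hypothetical names chosen for illustration.

```python
import random


def naive_sum(values):
    """Plain left-to-right accumulation; small addends can be rounded away."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan (compensated) summation: carry the rounding error forward."""
    total = 0.0
    compensation = 0.0  # estimate of low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject the previously lost part
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


def stochastic_round(x, step, rng):
    """Round x to a multiple of `step` (a hypothetical grid spacing).

    Rounds up with probability equal to the fractional remainder, so the
    result equals x in expectation.
    """
    lower = (x // step) * step
    frac = (x - lower) / step
    return lower + step if rng.random() < frac else lower


if __name__ == "__main__":
    # Each addend is below half the spacing of floats near 1.0,
    # so naive accumulation never moves away from 1.0.
    values = [1.0] + [1e-16] * 100_000
    print("naive:", naive_sum(values))
    print("kahan:", kahan_sum(values))

    rng = random.Random(0)
    samples = [stochastic_round(0.3, 1.0, rng) for _ in range(10_000)]
    print("mean of stochastic roundings of 0.3:", sum(samples) / len(samples))
```

The intuition carries over to low-precision training: an update too small to change a coarsely quantized weight is either tracked in a compensation term (Kahan) or applied with matching probability (stochastic rounding), rather than silently dropped.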

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.11168

Country:

Europe (0.28)
North America (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Hardware (0.69)
(3 more...)

Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Mariya Toneva, Leila Wehbe

Neural Information Processing Systems, Oct-3-2025, 00:33:26 GMT

We study how their representations differ across layer depth, context length, and attention type.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (0.71)
Health & Medicine > Diagnostic Medicine > Imaging (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Sesame Street puppet Elmo's X account posts anti-Jewish rant after hacking

Al Jazeera, Jul-15-2025, 01:20:49 GMT

The makers of Sesame Street have deleted a slew of offensive social media posts after hackers hijacked the puppet Elmo's X account to launch a tirade about Jews and Jeffrey Epstein. The posts on Elmo's account on Sunday called for the extermination of Jewish people, referred to United States President Donald Trump as a "puppet" of Israeli Prime Minister Benjamin Netanyahu and demanded the release of law enforcement files about Epstein, the accused sex trafficker who died in 2019. The posts attracted a flurry of attention online before being deleted a short time after they were uploaded on Sunday. "Elmo's X account was compromised by an unknown hacker who posted disgusting messages, including antisemitic and racist posts," a spokesperson for the Sesame Workshop told Al Jazeera in a statement on Monday. "The account has since been secured."

account post anti-jewish rant, artificial intelligence, elmo, (3 more...)

Al Jazeera

Country: North America > United States (0.98)

Industry: Government > Regional Government > North America Government > United States Government (0.60)

Technology: Information Technology > Artificial Intelligence (0.38)

ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling

Jang, Deok-Kyeong, Yang, Dongseok, Jang, Deok-Yun, Choi, Byeoli, Shin, Donghoon, Lee, Sung-hee

arXiv.org Artificial Intelligence, Oct-11-2024

This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton offsets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point cloud-based motion capture. We further conduct an ablation study to validate our design principles. ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/

artificial intelligence, machine learning, point cloud, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3687991

2410.06963

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Reviews: GLoMo: Unsupervised Learning of Transferable Relational Graphs

Neural Information Processing Systems, Oct-7-2024, 10:58:40 GMT

This paper presents a method to transfer graph structures learned on unlabeled data to downstream tasks, which is a conceptual shift from existing research that aims to transfer features (e.g., embeddings). The method consists of jointly training a feature predictor and a graph predictor (which are decoupled) using an unsupervised objective and then extracting only the output of the graph predictor for downstream tasks, where it is multiplicatively applied to arbitrary features. The method yields small improvements on a variety of NLP and vision tasks, and the qualitative analysis of the learned graphs does not convince me that it learns "meaningful" substructures. Overall, however, the paper has a compelling and promising idea (graph transfer), and it seems like there is room to improve on its results, so I'm a weak accept.

Detailed comments:
- Is "unsupervisedly" a word? It sounds weird...
- The objective function in eq 3 is interesting and could have potential uses outside of just graph induction, as it seems especially powerful from the ablations in table 2...

feature predictor, graph predictor, predictor, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.71)

Decoding Probing: Revealing Internal Linguistic Structures in Neural Language Models using Minimal Pairs

He, Linyang, Chen, Peili, Nie, Ercong, Li, Yuanning, Brennan, Jonathan R.

arXiv.org Artificial Intelligence, Mar-25-2024

Inspired by cognitive neuroscience studies, we introduce a novel 'decoding probing' method that uses a minimal-pairs benchmark (BLiMP) to probe internal linguistic characteristics in neural language models layer by layer. By treating the language model as the 'brain' and its representations as 'neural activations', we decode grammaticality labels of minimal pairs from the intermediate layers' representations. This approach reveals: 1) Self-supervised language models capture abstract linguistic structures in intermediate layers that GloVe and RNN language models cannot learn. 2) Information about syntactic grammaticality is robustly captured through the first third of GPT-2's layers and is also distributed in later layers. As sentence complexity increases, more layers are required for learning grammatical capabilities. 3) Morphological and semantics/syntax interface-related features are harder to capture than syntax. 4) For Transformer-based models, both embeddings and attentions capture grammatical features but show distinct patterns. Different attention heads exhibit similar tendencies toward various linguistic phenomena, but with varied contributions.

gpt-2 xl, information, representation, (15 more...)

arXiv.org Artificial Intelligence

2403.17299

Country:

North America > United States > Michigan (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

A Context-Sensitive Word Embedding Approach for The Detection of Troll Tweets

Yilmaz, Seyhmus, Zavrak, Sultan

arXiv.org Artificial Intelligence, Jun-7-2023

In this study, we aimed to address the growing concern of trolling behavior on social media by developing and evaluating a set of model architectures for the automatic detection of troll tweets. Utilizing deep learning techniques and pre-trained word embedding methods such as BERT, ELMo, and GloVe, we evaluated the performance of each architecture using metrics such as classification accuracy, F1 score, AUC, and precision. Our results indicate that BERT and ELMo embedding methods performed better than the GloVe method, likely due to their ability to provide contextualized word embeddings that better capture the nuances and subtleties of language use in online social media. Additionally, we found that CNN and GRU encoders performed similarly in terms of F1 score and AUC, suggesting their effectiveness in extracting relevant information from input text. The best-performing method was found to be an ELMo-based architecture that employed a GRU classifier, with an AUC score of 0.929. This research highlights the importance of utilizing contextualized word embeddings and appropriate encoder methods in the task of troll tweet detection, which can assist social-based systems in improving their performance in identifying and addressing trolling behavior on their platforms.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2207.0823

Country:

North America > United States (0.14)
Asia > Middle East > Republic of Türkiye > Duzce Province > Duzce (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.68)
Health & Medicine > Therapeutic Area (0.46)
Government > Regional Government (0.46)
Media > News (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training Research

Nityasya, Made Nindyatama, Wibowo, Haryo Akbarianto, Aji, Alham Fikri, Winata, Genta Indra, Prasojo, Radityo Eko, Blunsom, Phil, Kuncoro, Adhiguna

arXiv.org Artificial Intelligence, Jun-5-2023

This evidence-based position paper critiques current research practices within the language model pre-training literature. Despite rapid recent progress afforded by increasingly better pre-trained language models (PLMs), current PLM research practices often conflate different possible sources of model improvement, without conducting proper ablation studies and principled comparisons between different models under comparable conditions. These practices (i) leave us ill-equipped to understand which pre-training approaches should be used under what circumstances; (ii) impede reproducibility and credit assignment; and (iii) render it difficult to understand: "How exactly does each factor contribute to the progress that we have today?" We provide a case in point by revisiting the success of BERT over its baselines, ELMo and GPT-1, and demonstrate how -- under comparable conditions where the baselines are tuned to a similar extent -- these baselines (and even-simpler variants thereof) can, in fact, achieve competitive or better performance than BERT. These findings demonstrate how disentangling different factors of model improvements can lead to valuable new insights. We conclude with recommendations for how to encourage and incentivize this line of work, and accelerate progress towards a better and more systematic understanding of what factors drive the progress of our foundation models today.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2306.0287

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches

Zaland, Obaidullah, Abulaish, Muhammad, Fazil, Mohd.

arXiv.org Artificial Intelligence, Mar-13-2023

Vector-based word representations help countless Natural Language Processing (NLP) tasks capture both semantic and syntactic regularities of the language. In this paper, we present the characteristics of existing word embedding approaches and analyze them with regard to many classification tasks. We categorize the methods into two main groups. Traditional approaches mostly use matrix factorization to produce word representations, and they are not able to capture the semantic and syntactic regularities of the language very well. Neural-network-based approaches, on the other hand, can capture sophisticated regularities of the language and preserve the word relationships in the generated word representations. We report experimental results on multiple classification tasks and highlight the scenarios where one approach performs better than the rest.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2303.07196

Country:

North America > United States > Georgia > Chatham County > Savannah (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)