MERLOT: Multimodal Neural Script Knowledge Models

Neural Information Processing Systems

As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future. We introduce MERLOT, a model that learns multimodal script knowledge by watching millions of YouTube videos with transcribed speech -- in an entirely label-free, self-supervised manner. By pretraining with a mix of both frame-level (spatial) and video-level (temporal) objectives, our model not only learns to match images to temporally corresponding words, but also to contextualize what is happening globally over time. As a result, MERLOT exhibits strong out-of-the-box representations of temporal commonsense, and achieves state-of-the-art performance on 12 different video QA datasets when finetuned. It also transfers well to the world of static images, allowing models to reason about the dynamic context behind visual scenes. On Visual Commonsense Reasoning, MERLOT answers questions correctly with 80.6% accuracy, outperforming state-of-the-art models of similar size by over 3%, even those that make heavy use of auxiliary supervised data (like object bounding boxes). Ablation analyses demonstrate the complementary importance of: 1) training on videos versus static images; 2) scaling the magnitude and diversity of the pretraining video corpus; and 3) using diverse objectives that encourage full-stack multimodal reasoning, from the recognition to cognition level.
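
The frame-level objective described above, matching each frame to the transcript words spoken at the same moment, can be read as a contrastive alignment loss. Below is a minimal sketch of that idea, assuming a symmetric InfoNCE-style formulation over precomputed frame and text embeddings; the function name, arguments, and temperature value are illustrative assumptions, not MERLOT's actual training code.

```python
import torch
import torch.nn.functional as F

def contrastive_frame_text_loss(frame_emb, text_emb, temperature=0.05):
    """Symmetric InfoNCE-style loss pairing each video frame with the
    transcript segment spoken at the same time.

    frame_emb, text_emb: (N, D) tensors whose i-th rows are temporally aligned.
    """
    frame_emb = F.normalize(frame_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = frame_emb @ text_emb.t() / temperature              # (N, N) cosine similarities
    targets = torch.arange(frame_emb.size(0), device=frame_emb.device)
    # Alignment is encouraged in both directions: frame -> text and text -> frame.
    loss_f2t = F.cross_entropy(logits, targets)
    loss_t2f = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_f2t + loss_t2f)

# Example with random embeddings standing in for encoder outputs.
frames = torch.randn(8, 256)
texts = torch.randn(8, 256)
print(contrastive_frame_text_loss(frames, texts))
```

The video-level (temporal) objectives operate over the full sequence of frames and transcript segments and are not sketched here.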




MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification

Chen, Yuxuan, Li, Rongpeng, Zhao, Zhifeng, Zhang, Honggang

arXiv.org Artificial Intelligence

We present MERLOT, a scalable mixture-of-experts (MoE) based refinement of distilled large language models, optimized for encrypted traffic classification. By applying model distillation techniques in a teacher-student paradigm, compact models derived from GPT-2-base retain high classification accuracy while minimizing computational costs. These models function as specialized experts in an MoE architecture, dynamically assigned via a gating network. Unlike generation-based methods, our approach directly classifies encrypted traffic using the final decoder token, with contextual feature embeddings as input. Experiments on 10 datasets show superior or competitive performance relative to state-of-the-art models while significantly reducing resource demands, underscoring its effectiveness and robustness.
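
The approach described here combines compact distilled experts through a learned gate and reads the classification decision off the final decoder token. The sketch below illustrates that routing pattern under those assumptions; the class name, dimensions, and expert structure are hypothetical stand-ins, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class GatedMoEClassifier(nn.Module):
    """Illustrative gated mixture-of-experts classifier: a soft gate routes the
    final-token embedding of an encrypted flow to compact expert heads, whose
    weighted mixture feeds a shared classification layer."""

    def __init__(self, feature_dim, expert_dim, num_experts, num_classes):
        super().__init__()
        # Each "expert" stands in for a small distilled model head.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feature_dim, expert_dim), nn.GELU())
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(feature_dim, num_experts)
        self.head = nn.Linear(expert_dim, num_classes)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, feature_dim) contextual features of a flow.
        final_token = token_embeddings[:, -1, :]                      # classify from the last token
        gate_weights = torch.softmax(self.gate(final_token), dim=-1)  # (batch, num_experts)
        expert_out = torch.stack([e(final_token) for e in self.experts], dim=1)
        mixed = (gate_weights.unsqueeze(-1) * expert_out).sum(dim=1)  # weighted expert mixture
        return self.head(mixed)                                       # class logits

# Usage with random features standing in for a flow's contextual embeddings.
model = GatedMoEClassifier(feature_dim=768, expert_dim=256, num_experts=4, num_classes=10)
logits = model(torch.randn(2, 32, 768))  # 2 flows, 32 tokens each -> (2, 10)
```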


Multimodal models are fast becoming a reality -- consequences be damned

#artificialintelligence

Roughly a year ago, VentureBeat wrote about progress in the AI and machine learning field toward developing multimodal models, or models that can understand the meaning of text, videos, audio, and images together in context. Back then, the work was in its infancy and faced formidable challenges, not least of which concerned biases amplified in training datasets. But breakthroughs have been made. This year, OpenAI released DALL-E and CLIP, two multimodal models that the research lab claims are "a step toward systems with [a] deeper understanding of the world." DALL-E, inspired by the surrealist artist Salvador Dalí, was trained to generate images from simple text descriptions.


Local Nonparametric Meta-Learning

Goo, Wonjoon, Niekum, Scott

arXiv.org Machine Learning

A central goal of meta-learning is to find a learning rule that enables fast adaptation across a set of tasks, by learning the appropriate inductive bias for that set. Most meta-learning algorithms try to find a global learning rule that encodes this inductive bias. However, a global learning rule represented by a fixed-size representation is prone to meta-underfitting or meta-overfitting, since the right representational power for a task set is difficult to choose a priori. Even when chosen correctly, we show that global, fixed-size representations often fail when confronted with certain types of out-of-distribution tasks, even when the same inductive bias is appropriate. To address these problems, we propose a novel nonparametric meta-learning algorithm that utilizes a meta-trained local learning rule, building on recent ideas in attention-based and functional gradient-based meta-learning. In several meta-regression problems, we show improved meta-generalization results using our local, nonparametric approach and achieve state-of-the-art results on the robotics benchmark Omnipush.
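
To make the "local, nonparametric" idea concrete, the sketch below implements the simplest attention-based predictor of this kind: each query output is a similarity-weighted combination of support-set targets (kernel regression). It is an assumed stand-in for illustration, not the authors' meta-trained local learning rule; the function name and temperature are hypothetical.

```python
import torch

def attention_regression(support_x, support_y, query_x, temperature=1.0):
    """Nonparametric prediction: attend over the support set with weights given
    by negative squared distance in input space, then average the targets.

    support_x: (N, D), support_y: (N, K), query_x: (M, D) -> (M, K)
    """
    dists = torch.cdist(query_x, support_x) ** 2            # (M, N) pairwise squared distances
    weights = torch.softmax(-dists / temperature, dim=-1)   # attention over support points
    return weights @ support_y                               # (M, K) predictions

# Toy 1-D regression: predict sin(x) at two query points from 20 support pairs.
xs = torch.linspace(-3, 3, 20).unsqueeze(-1)
ys = torch.sin(xs)
xq = torch.tensor([[0.5], [1.5]])
print(attention_regression(xs, ys, xq, temperature=0.1))
```

A meta-learned variant would replace this fixed distance kernel with a learned similarity or a learned local update, which is roughly where the paper's meta-training enters.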