AITopics | Media

Media

170 Best Prime Day Deals of 2025, Vetted by Our Amazon Experts

WIRED, Jul-8-2025, 08:01:20 GMT

Amazon Prime Day is here, and the deals are dropping like the bass at a Skrillex show. This year, Amazon has expanded the event to four days, with most of the best Prime Day deals launching in the early hours of Tuesday morning and sitting on the metaphorical shelf through 11:59 pm PT on Friday, July 11. There are tens of thousands of products on sale this week, but comparatively few deserve your time and money. The WIRED Reviews team spends weeks prepping for Prime Day and will be working in shifts for 20 hours a day throughout the event to keep our coverage updated, all so you can score real savings on products we've tested and approved. If you're looking for up-to-the-minute coverage of lightning deals, this year's trending products, and fast sellers, along with what will surely be increasingly unhinged commentary, check out our Amazon Prime Day liveblog, which will run from 5 am to midnight daily. How Does WIRED Spot Deals? We start searching for the best Prime Day deals weeks ...

headphone & speaker, nena farrell, wired recommend, (12 more...)

WIRED

Country:

Europe > United Kingdom (0.28)
North America > United States > California (0.04)

Industry:

Media (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology > Security & Privacy (1.00)
(8 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Hardware (1.00)
Information Technology > Communications > Networks (1.00)
(3 more...)

Demystifying ChatGPT: How It Masters Genre Recognition

Raj, Subham, Saha, Sriparna, Singh, Brijraj, Pedanekar, Niranjan

arXiv.org Artificial Intelligence, Jul-8-2025

The introduction of ChatGPT has garnered significant attention within the NLP community and beyond. Previous studies have demonstrated ChatGPT's substantial advancements across various downstream NLP tasks, highlighting its adaptability and potential to revolutionize language-related applications. However, its capabilities and limitations in genre prediction remain unclear. This work analyzes three Large Language Models (LLMs) using the MovieLens-100K dataset to assess their genre prediction capabilities. Our findings show that ChatGPT, without fine-tuning, outperformed other LLMs, and fine-tuned ChatGPT performed best overall. We set up zero-shot and few-shot prompts using audio transcripts/subtitles from movie trailers in the MovieLens-100K dataset, covering 1682 movies of 18 genres, where each movie can have multiple genres. Additionally, we extended our study by extracting IMDb movie posters to utilize a Vision Language Model (VLM) with prompts for poster information. This fine-grained information was used to enhance existing LLM prompts. In conclusion, our study reveals ChatGPT's remarkable genre prediction capabilities, surpassing other language models. The integration of VLM further enhances our findings, showcasing ChatGPT's potential for content-related applications by incorporating visual information from movie posters.
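
The abstract does not publish its prompts or its scoring metric, so the following is only a sketch of how a zero-shot or few-shot multi-label genre query over the 18 MovieLens-100K genres might be built and scored. The prompt wording, the helper names `build_prompt`, `parse_genres`, and `micro_f1`, and the choice of micro-averaged F1 are all assumptions, not the authors' setup.

```python
from typing import Iterable

# The 18 MovieLens-100K genres (the dataset's extra "unknown" flag is left out).
GENRES = [
    "Action", "Adventure", "Animation", "Children's", "Comedy", "Crime",
    "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical",
    "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def build_prompt(transcript: str, examples: list[tuple[str, list[str]]]) -> str:
    """Zero-shot prompt if `examples` is empty, few-shot otherwise (hypothetical wording)."""
    parts = [
        "Classify the movie trailer transcript into one or more genres.",
        "Allowed genres: " + ", ".join(GENRES) + ".",
        "Answer with a comma-separated list of genres only.",
    ]
    for text, labels in examples:
        parts.append(f"Transcript: {text}\nGenres: {', '.join(labels)}")
    parts.append(f"Transcript: {transcript}\nGenres:")
    return "\n\n".join(parts)


def parse_genres(reply: str) -> set[str]:
    """Keep only labels from the allowed list, matched case-insensitively."""
    lookup = {g.lower(): g for g in GENRES}
    tokens = (t.strip().lower() for t in reply.split(","))
    return {lookup[t] for t in tokens if t in lookup}


def micro_f1(gold: Iterable[set[str]], pred: Iterable[set[str]]) -> float:
    """Micro-averaged F1 over every (movie, genre) decision."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Restricting parsed labels to the fixed genre list keeps free-form replies from introducing categories outside the dataset, and micro-F1 weights every (movie, genre) decision equally, which suits a task where each movie can carry several genres.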

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.03875

Country: Asia > India > Bihar > Patna (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Personalized Image Generation from an Author Writing Style

Gandhi, Sagar, Gandhi, Vishal

arXiv.org Artificial Intelligence, Jul-8-2025

Translating nuanced, textually-defined authorial writing styles into compelling visual representations presents a novel challenge in generative AI. This paper introduces a pipeline that leverages Author Writing Sheets (AWS) - structured summaries of an author's literary characteristics - as input to a Large Language Model (LLM, Claude 3.7 Sonnet). The LLM interprets the AWS to generate three distinct, descriptive text-to-image prompts, which are then rendered by a diffusion model (Stable Diffusion 3.5 Medium). We evaluated our approach using 49 author styles from Reddit data, with human evaluators assessing the stylistic match and visual distinctiveness of the generated images. Results indicate a good perceived alignment between the generated visuals and the textual authorial profiles (mean style match: 4.08/5), with images rated as moderately distinctive. Qualitative analysis further highlighted the pipeline's ability to capture mood and atmosphere, while also identifying challenges in representing highly abstract narrative elements. This work contributes a novel end-to-end methodology for visual authorial style personalization and provides an initial empirical validation, opening avenues for applications in creative assistance and cross-modal understanding.
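
A minimal sketch of the described flow (writing sheet, LLM-generated prompts, image rendering). The `AuthorWritingSheet` fields, the instruction text, and `run_pipeline` are hypothetical; the two model calls are injected as plain functions so the structure can be read and tested without assuming any particular Claude or Stable Diffusion client API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuthorWritingSheet:
    """Hypothetical fields; the paper's actual AWS schema may differ."""
    author: str
    tone: str
    themes: list[str]
    imagery: str


def aws_to_instruction(aws: AuthorWritingSheet) -> str:
    """Ask the LLM for exactly three distinct text-to-image prompts, one per line."""
    return (
        "From this summary of an author's writing style, write exactly three "
        "distinct text-to-image prompts, one per line.\n"
        f"Author: {aws.author}\nTone: {aws.tone}\n"
        f"Themes: {', '.join(aws.themes)}\nImagery: {aws.imagery}"
    )


def run_pipeline(
    aws: AuthorWritingSheet,
    llm: Callable[[str], str],       # stand-in for the LLM call
    render: Callable[[str], bytes],  # stand-in for the diffusion model call
) -> list[tuple[str, bytes]]:
    reply = llm(aws_to_instruction(aws))
    prompts = [line.strip() for line in reply.splitlines() if line.strip()]
    if len(prompts) != 3:
        raise ValueError(f"expected 3 prompts, got {len(prompts)}")
    return [(p, render(p)) for p in prompts]
```

Validating that exactly three prompts come back makes a malformed LLM reply fail loudly before any images are rendered.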

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2507.03313

Genre: Research Report (1.00)

Industry: Media (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

MusGO: A Community-Driven Framework For Assessing Openness in Music-Generative AI

Batlle-Roca, Roser, Ibáñez-Martínez, Laura, Serra, Xavier, Gómez, Emilia, Rocamora, Martín

arXiv.org Artificial Intelligence, Jul-8-2025

Since 2023, generative AI has rapidly advanced in the music domain. Despite significant technological advancements, music-generative models raise critical ethical challenges, including a lack of transparency and accountability, along with risks such as the replication of artists' works, which highlights the importance of fostering openness. With upcoming regulations such as the EU AI Act encouraging open models, many generative models are being released labelled as 'open'. However, the definition of an open model remains widely debated. In this article, we adapt a recently proposed evidence-based framework for assessing openness in LLMs to the music domain. Using feedback from a survey of 110 participants from the Music Information Retrieval (MIR) community, we refine the framework into MusGO (Music-Generative Open AI), which comprises 13 openness categories: 8 essential and 5 desirable. We evaluate 16 state-of-the-art generative models and provide an openness leaderboard that is fully open to public scrutiny and community contributions. Through this work, we aim to clarify the concept of openness in music-generative AI and promote its transparent and responsible development.
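
To illustrate how a 13-category framework split into 8 essential and 5 desirable categories could feed a leaderboard, here is a sketch under loud assumptions: the category IDs `E1`..`E8` and `D1`..`D5` are placeholders (the abstract does not name them), the open/partial/closed scale is assumed, and ranking essential categories before desirable ones is an illustrative choice, not MusGO's published scoring rule.

```python
from dataclasses import dataclass, field

ESSENTIAL = [f"E{i}" for i in range(1, 9)]   # 8 essential categories (placeholder IDs)
DESIRABLE = [f"D{i}" for i in range(1, 6)]   # 5 desirable categories (placeholder IDs)
LEVELS = {"open": 1.0, "partial": 0.5, "closed": 0.0}  # assumed three-level scale


@dataclass
class Assessment:
    model: str
    ratings: dict[str, str] = field(default_factory=dict)  # category ID -> level

    def score(self, categories: list[str]) -> float:
        # Categories with no evidence default to "closed".
        return sum(LEVELS[self.ratings.get(c, "closed")] for c in categories)


def leaderboard(assessments: list[Assessment]) -> list[tuple[str, float, float]]:
    """Rank by essential score; desirable score and then model name break ties."""
    rows = [(a.model, a.score(ESSENTIAL), a.score(DESIRABLE)) for a in assessments]
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))
```

Keeping the two scores separate, rather than summing them, prevents strong desirable-category openness from masking gaps in the essential categories.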

machine learning, natural language, openness, (19 more...)

arXiv.org Artificial Intelligence

2507.03599

Country:

North America > United States (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Spain > Andalusia > Seville Province > Seville (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)

Exploring Object Status Recognition for Recipe Progress Tracking in Non-Visual Cooking

Li, Franklin Mingzhe, Ng, Kaitlyn, Zhu, Bin, Carrington, Patrick

arXiv.org Artificial Intelligence, Jul-8-2025

Cooking plays a vital role in everyday independence and well-being, yet remains challenging for people with vision impairments due to limited support for tracking progress and receiving contextual feedback. Object status - the condition or transformation of ingredients and tools - offers a promising but underexplored foundation for context-aware cooking support. In this paper, we present OSCAR (Object Status Context Awareness for Recipes), a technical pipeline that explores the use of object status recognition to enable recipe progress tracking in non-visual cooking. OSCAR integrates recipe parsing, object status extraction, visual alignment with cooking steps, and time-causal modeling to support real-time step tracking. We evaluate OSCAR on 173 instructional videos and a real-world dataset of 12 non-visual cooking sessions recorded by blind and low-vision (BLV) individuals in their homes. Our results show that object status consistently improves step prediction accuracy across vision-language models, and reveal key factors that impact performance in real-world conditions, such as implicit tasks, camera placement, and lighting. We contribute the pipeline of context-aware recipe progress tracking, an annotated real-world non-visual cooking dataset, and design insights to guide future context-aware assistive cooking systems.
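
The abstract mentions time-causal modeling for real-time step tracking without giving a formulation. One simple way to make tracking causal is sketched below under stated assumptions: per-frame step scores are taken as given (for instance, vision-language alignment scores that already fold in object-status cues), and the tracker uses only frames seen so far, may only stay on the current step or advance by one, and advances only after the next step wins for `patience` consecutive frames. `track_steps` and `patience` are hypothetical names.

```python
def track_steps(frame_scores: list[list[float]], patience: int = 2) -> list[int]:
    """Causal, monotonic recipe-step tracking.

    frame_scores[t][s] is an assumed alignment score between frame t and step s.
    """
    current, streak, path = 0, 0, []
    for scores in frame_scores:
        nxt = current + 1
        if nxt < len(scores) and scores[nxt] > scores[current]:
            streak += 1                 # next step is winning on this frame
            if streak >= patience:      # ...consistently enough to advance
                current, streak = nxt, 0
        else:
            streak = 0                  # reset on any frame where it does not win
        path.append(current)            # decision uses only frames up to now
    return path
```

Because the step index never moves backward, a single frame that looks like an earlier step (a common artifact of camera placement or lighting) cannot undo progress, at the cost of a short delay of `patience` frames before each advance.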

machine learning, natural language, oscar, (12 more...)

arXiv.org Artificial Intelligence

2507.0333

Country:

North America > United States > Florida > Hillsborough County > University (0.40)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Media (0.93)
Education > Educational Technology > Audio & Video (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.46)

EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation

Izzati, Fathinah, Li, Xinyue, Xia, Gus

arXiv.org Artificial Intelligence, Jul-8-2025

We propose Expotion (Facial Expression and Motion Control for Multimodal Music Generation), a generative model leveraging multimodal visual controls - specifically, human facial expressions and upper-body motion - as well as text prompts to produce expressive and temporally accurate music. We adopt parameter-efficient fine-tuning (PEFT) on the pretrained text-to-music generation model, enabling fine-grained adaptation to the multimodal controls using a small dataset. To ensure precise synchronization between video and music, we introduce a temporal smoothing strategy to align multiple modalities. Experiments demonstrate that integrating visual features alongside textual descriptions enhances the overall quality of generated music in terms of musicality, creativity, beat-tempo consistency, temporal alignment with the video, and text adherence, surpassing both proposed baselines and existing state-of-the-art video-to-music generation models. Additionally, we introduce a novel dataset consisting of 7 hours of synchronized video recordings capturing expressive facial and upper-body gestures aligned with corresponding music, providing significant potential for future research in multimodal and interactive music generation.
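
The temporal smoothing strategy is not detailed in the abstract. As an illustration only, the sketch below smooths a per-frame visual feature track (for example, a facial-expression intensity) with a centered moving average and then linearly resamples it from the video frame rate to the music model's frame rate. The function names, the window size, and any specific rates are assumptions.

```python
def moving_average(xs: list[float], window: int) -> list[float]:
    """Centered moving average; the window is truncated at the sequence edges."""
    half = window // 2
    out = []
    for i in range(len(xs)):
        lo, hi = max(0, i - half), min(len(xs), i + half + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out


def resample(xs: list[float], src_rate: float, dst_rate: float) -> list[float]:
    """Linearly interpolate a track sampled at src_rate (Hz) onto a dst_rate (Hz) grid."""
    if len(xs) < 2:
        return list(xs)
    last = len(xs) - 1
    n_out = int(last / src_rate * dst_rate) + 1
    out = []
    for k in range(n_out):
        pos = min(k / dst_rate * src_rate, last)  # fractional index into xs
        i = min(int(pos), last - 1)
        frac = pos - i
        out.append(xs[i] * (1 - frac) + xs[i + 1] * frac)
    return out


def align_visual_track(xs: list[float], video_fps: float, music_rate: float,
                       window: int = 5) -> list[float]:
    """Smooth first to suppress frame-level jitter, then resample to the music frame rate."""
    return resample(moving_average(xs, window), video_fps, music_rate)
```

For example, a one-second, 31-frame track at 30 fps resampled to an illustrative 50 Hz grid yields 51 values, one per music frame including both endpoints.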

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.04955

Country:

Asia > South Korea > Daejeon > Daejeon (0.04)
Asia > Middle East > UAE (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Four Shades of Life Sciences: A Dataset for Disinformation Detection in the Life Sciences

Seidlmayer, Eva, Galke, Lukas, Förstner, Konrad U.

arXiv.org Artificial Intelligence, Jul-8-2025

Disseminators of disinformation often seek to attract attention or evoke emotions - typically to gain influence or generate revenue - resulting in distinctive rhetorical patterns that can be exploited by machine learning models. In this study, we explore linguistic and rhetorical features as proxies for distinguishing disinformative texts from other health and life-science text genres, applying both large language models and classical machine learning classifiers. Given the limitations of existing datasets, which mainly focus on fact-checking misinformation, we introduce Four Shades of Life Sciences (FSoLS): a novel, labeled corpus of 2,603 texts on 14 life-science topics, retrieved from 17 diverse sources and classified into four categories of life-science publications. The source code for replicating and updating the dataset is available on GitHub: https://github.com/EvaSeidlmayer/FourShadesofLifeSciences
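
To show what "rhetorical features as proxies" can look like with a classical classifier, here is a toy sketch: three hand-picked surface features (exclamation density, ALL-CAPS word ratio, direct second-person address) fed to a nearest-centroid classifier. The features, the two labels, and the helper names are hypothetical; they are not the FSoLS feature set or its four categories.

```python
import re
from statistics import mean

SECOND_PERSON = {"you", "your", "yours"}


def rhetorical_features(text: str) -> list[float]:
    """Three illustrative surface features, each normalized by word count."""
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(words), 1)
    return [
        text.count("!") / n,                                      # exclamation density
        sum(1 for w in words if len(w) > 1 and w.isupper()) / n,  # ALL-CAPS word ratio
        sum(1 for w in words if w.lower() in SECOND_PERSON) / n,  # direct address to reader
    ]


class NearestCentroid:
    """Classical nearest-centroid classifier with squared Euclidean distance."""

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestCentroid":
        groups: dict[str, list[list[float]]] = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {
            label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()
        }
        return self

    def predict(self, x: list[float]) -> str:
        def dist(label: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=dist)
```

Normalizing every feature by word count keeps long and short texts on a comparable scale, which matters for a distance-based classifier like this one.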

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.03488

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York > New York County > New York City (0.04)
(10 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Vaccines (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu

Pinto, António Sá

arXiv.org Artificial Intelligence, Jul-8-2025

We explore transfer learning strategies for musical onset detection in the Afro-Brazilian Maracatu tradition, which features complex rhythmic patterns that challenge conventional models. We adapt two Temporal Convolutional Network architectures: one pre-trained for onset detection (intra-task) and another for beat tracking (inter-task). Using only 5-second annotated snippets per instrument, we fine-tune these models through layer-wise retraining strategies for five traditional percussion instruments. Our results demonstrate significant improvements over baseline performance, with F1 scores reaching up to 0.998 in the intra-task setting and improvements of over 50 percentage points in best-case scenarios. The cross-task adaptation proves particularly effective for time-keeping instruments, where onsets naturally align with beat positions. The optimal fine-tuning configuration varies by instrument, highlighting the importance of instrument-specific adaptation strategies. This approach addresses the challenges of underrepresented musical traditions, offering an efficient human-in-the-loop methodology that minimizes annotation effort while maximizing performance. Our findings contribute to more inclusive music information retrieval tools applicable beyond Western musical contexts.
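
Onset detection F1 is typically computed by matching each estimated onset to at most one reference onset within a tolerance window. A window of ±50 ms is a common convention in MIR evaluation, though the abstract does not state the tolerance used. The sketch below uses a greedy one-to-one matching over sorted onset times; `onset_f1` is a hypothetical helper name.

```python
def onset_f1(reference: list[float], estimated: list[float],
             window: float = 0.05) -> tuple[float, float, float]:
    """Precision, recall, and F1 for onset times in seconds."""
    ref, est = sorted(reference), sorted(estimated)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= window:
            hits += 1          # matched pair; neither onset can be reused
            i += 1
            j += 1
        elif diff < 0:
            j += 1             # estimate too early for this and every later reference
        else:
            i += 1             # no remaining estimate is close enough to this reference
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

With this metric, a false detection lowers precision and a missed stroke lowers recall, so the one-to-one constraint prevents a single detected onset from being credited against two closely spaced reference strokes.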

artificial intelligence, instrument, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2507.04858

Country:

Europe > Portugal > Porto > Porto (0.76)
South America > Brazil > Pernambuco (0.04)
North America > United States > Illinois (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.64)

Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models

Haxholli, Etrit, Gürbüz, Yeti Z., Can, Oğul, Waxman, Eli

arXiv.org Machine Learning, Jul-8-2025

While continuous diffusion models excel in modeling continuous distributions, their application to categorical data has been less effective. Recent work has shown that ratio matching through score entropy within a continuous-time discrete Markov chain (CTMC) framework serves as a competitive alternative to autoregressive models in language modeling. To enhance this framework, we first introduce three new theorems concerning the KL divergence between the data and learned distributions. Our results serve as the discrete counterpart to those established for continuous diffusion models and allow us to derive an improved upper bound on the perplexity. Second, we empirically show that ratio matching performed by minimizing the denoising cross-entropy between the clean and corrupted data enables models to outperform those using score entropy, with up to 10% lower perplexity and generative perplexity, and 15% faster training steps. To further support our findings, we introduce and evaluate a novel CTMC transition-rate matrix that allows prediction refinement, and we derive the analytic expression for its matrix exponential, which facilitates the computation of conditional ratios and thus enables efficient training and generation.
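
The abstract's novel transition-rate matrix is not specified, so the sketch below substitutes the well-known absorbing ("mask") rate matrix with a constant rate sigma, whose matrix exponential also has a closed form: a token survives to time t with probability exp(-sigma*t) and is otherwise absorbed into the mask state. Checking that closed form against a truncated Taylor series shows why an analytic exp(tQ) is convenient: the forward transition probabilities are rows of exp(tQ) and can be read off directly instead of computed numerically. All function names are hypothetical, and this is not the paper's matrix.

```python
import math

Matrix = list[list[float]]


def absorbing_rate_matrix(n_tokens: int, sigma: float) -> Matrix:
    """Rate matrix Q on n_tokens token states plus one absorbing mask state (last index)."""
    size = n_tokens + 1
    Q = [[0.0] * size for _ in range(size)]
    for i in range(n_tokens):
        Q[i][i] = -sigma        # leave token i at rate sigma...
        Q[i][n_tokens] = sigma  # ...always jumping to the mask state
    return Q


def closed_form_transition(n_tokens: int, sigma: float, t: float) -> Matrix:
    """Analytic exp(tQ): a token survives with prob. exp(-sigma*t), else it is masked."""
    size = n_tokens + 1
    keep = math.exp(-sigma * t)
    P = [[0.0] * size for _ in range(size)]
    for i in range(n_tokens):
        P[i][i] = keep
        P[i][n_tokens] = 1.0 - keep
    P[n_tokens][n_tokens] = 1.0  # the mask state is absorbing
    return P


def expm_taylor(A: Matrix, terms: int = 40) -> Matrix:
    """Numerical reference: truncated series sum_k A^k / k! (fine for small ||A||)."""
    size = len(A)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[sum(term[i][m] * A[m][j] for m in range(size)) / k
                 for j in range(size)] for i in range(size)]
        result = [[result[i][j] + term[i][j] for j in range(size)] for i in range(size)]
    return result
```

Assuming the forward process acts on each sequence position independently, as is common in this line of work, corrupting a sequence to time t then reduces to masking each token with probability 1 - exp(-sigma*t), with no numerical matrix exponential in the training loop.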

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2507.04341

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > Scotland (0.04)
(15 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Leisure & Entertainment > Sports (0.92)
Law (0.92)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.48)

An Evaluation of Large Language Models on Text Summarization Tasks Using Prompt Engineering Techniques

Aly, Walid Mohamed, Soliman, Taysir Hassan A., AbdelAziz, Amr Mohamed

arXiv.org Artificial Intelligence, Jul-8-2025

Large Language Models (LLMs) continue to advance natural language processing with their ability to generate human-like text across a range of tasks. Despite the remarkable success of LLMs in Natural Language Processing (NLP), their performance in text summarization across various domains and datasets has not been comprehensively evaluated. At the same time, the ability to summarize text effectively without relying on extensive training data has become a crucial bottleneck. To address these issues, we present a systematic evaluation of six LLMs across four datasets: CNN/Daily Mail and NewsRoom (news), SAMSum (dialog), and ArXiv (scientific). By leveraging prompt engineering techniques including zero-shot and in-context learning, our study evaluates the performance using the ROUGE and BERTScore metrics. In addition, a detailed analysis of inference times is conducted to better understand the trade-off between summarization quality and computational efficiency. For long documents, we introduce a sentence-based chunking strategy that enables LLMs with shorter context windows to summarize extended inputs in multiple stages. The findings reveal that while LLMs perform competitively on news and dialog tasks, their performance on long scientific documents improves significantly when aided by chunking strategies. In addition, notable performance variations were observed based on model parameters, dataset properties, and prompt design. These results offer actionable insights into how different LLMs behave across task types, contributing to ongoing research in efficient, instruction-based NLP systems.
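
A minimal sketch of sentence-based chunking with two-stage summarization, assuming word counts as a stand-in for the model's token budget and a naive punctuation-based sentence splitter. `summarize` is any callable wrapping an LLM; the helper names and the 512-word default are hypothetical, not the paper's settings.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(sentences: list[str], max_words: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words still becomes its own (oversized) chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for s in sentences:
        n = len(s.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int = 512) -> str:
    """Stage 1: summarize each chunk. Stage 2: summarize the joined partial summaries.

    Stage 2 assumes the joined partial summaries fit within the context window.
    """
    chunks = chunk_sentences(split_sentences(text), max_words)
    if len(chunks) == 1:
        return summarize(chunks[0])  # short input: a single pass is enough
    partials = [summarize(c) for c in chunks]
    return summarize(" ".join(partials))
```

Splitting only at sentence boundaries keeps each chunk grammatical, so no partial summary is asked to interpret a sentence cut in half.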

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.05123

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Media > News (0.49)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
