AITopics

Country:

Europe (0.28)
North America > United States > California > Santa Clara County > Palo Alto (0.15)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry:

Information Technology (0.92)
Law > Intellectual Property & Technology Law (0.68)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.95)
(2 more...)

Neural Information Processing Systems, Feb-17-2026, 12:24:59 GMT

Holistic Evaluation of Text-to-Image Models

Tony Lee

We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark.

large language model, machine learning, natural language, (22 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.15)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry:

Information Technology (0.92)
Law > Intellectual Property & Technology Law (0.68)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.49)

Neural Information Processing Systems, Feb-13-2026, 15:37:50 GMT

b0ba5c44aaf65f6ca34cf116e6d82ebf-AuthorFeedback.pdf

entity resolution, machine learning, natural language, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.51)
Information Technology > Artificial Intelligence > Natural Language (0.36)

WIRED, Dec-8-2025, 20:40:18 GMT

OpenAI Should Stop Naming Its Creations After Products That Already Exist

From "cameo" to "io," OpenAI keeps trying to call its new and upcoming releases by names that resemble existing trademarks. In September, OpenAI launched a way for users to generate a digital likeness of themselves they could use to create personalized deepfake videos. This is one of the core features in Sora, OpenAI's app for sharing AI videos inside a TikTok-style feed. The self-deepfaking feature was called "cameo," and with that standout feature, Sora quickly rose to the top of Apple's iOS download charts. This feature name led to a trademark lawsuit with Cameo, the app where fans can pay celebrities to record personalized videos.

large language model, machine learning, natural language, (18 more...)

WIRED

Country:

Asia > Nepal (0.15)
North America > United States > California (0.05)
Europe > Slovakia (0.05)
(2 more...)

Industry:

Law (1.00)
Information Technology > Services (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Almohaimeed, Saleh, Almohaimeed, Saad, Jari, Mousa, Alobaid, Khaled A., Alotaibi, Fahad

AI Text Detectors and the Misclassification of Slightly Polished Arabic Text

arXiv.org Artificial Intelligence, Dec-3-2025

Many AI detection models have been developed to counter the presence of articles created by artificial intelligence (AI). However, if a human-authored article is slightly polished by AI, these AI detection models' borderline decisions shift, leading them to classify it as an AI-generated article. This misclassification may result in falsely accusing authors of AI plagiarism and harm the credibility of AI detectors. Some efforts have been made to address this challenge in English, but not in Arabic. In this paper, we generate two datasets. The first dataset contains 800 Arabic articles, half AI-generated and half human-authored. We use it to evaluate 14 large language models (LLMs) and commercial AI detectors on their ability to distinguish between human-authored and AI-generated articles. The best 8 models were chosen to act as detectors for our primary concern, which is whether they would consider slightly polished human-authored text as AI-generated. The second dataset, Ar-APT, contains 400 Arabic human-authored articles polished by 10 LLMs using 4 polishing settings, totaling 16400 samples. We use it to evaluate the 8 selected models and determine whether slight polishing affects their performance. The results reveal that all AI detectors incorrectly attribute a significant number of articles to AI. The best-performing LLM, Claude-4 Sonnet, achieved 83.51%, but its performance decreased to 57.63% for articles slightly polished by LLaMA-3. The best-performing commercial model, originality.AI, achieved 92% accuracy, which dropped to 12% for articles slightly polished by Mistral or Gemma-3.

detector, large language model, machine learning, (19 more...)

2511.1669

Country: Asia > Middle East > Saudi Arabia (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Nov-7-2025

Laugh, Relate, Engage: Stylized Comment Generation for Short Videos

Ouyang, Xuan, Wang, Senan, Wang, Bouzhou, Xiahou, Siyuan, Zhou, Jinrong, Li, Yuekang

Short-video platforms have become a central medium in the modern Internet landscape, where efficient information delivery and strong interactivity are reshaping user engagement and cultural dissemination. Among the various forms of user interaction, comments play a vital role in fostering community participation and enabling content re-creation. However, generating comments that are both compliant with platform guidelines and capable of exhibiting stylistic diversity and contextual awareness remains a significant challenge. We introduce LOLGORITHM, a modular multi-agent system (MAS) designed for controllable short-video comment generation. The system integrates video segmentation, contextual and affective analysis, and style-aware prompt construction. It supports six distinct comment styles: puns (homophones), rhyming, meme application, sarcasm (irony), plain humor, and content extraction. Powered by a multimodal large language model (MLLM), LOLGORITHM directly processes video inputs and achieves fine-grained style control through explicit prompt markers and few-shot examples. To support development and evaluation, we construct a bilingual dataset using official APIs from Douyin (Chinese) and YouTube (English), covering five popular video genres: comedy skits, daily life jokes, funny animal clips, humorous commentary, and talk shows. Evaluation combines automated metrics (originality, relevance, and style conformity) with a large-scale human preference study involving 40 videos and 105 participants. Results show that LOLGORITHM significantly outperforms baseline models, achieving preference rates of over 90% on Douyin and 87.55% on YouTube. This work presents a scalable and culturally adaptive framework for stylized comment generation on short-video platforms, offering a promising path to enhance user engagement and creative interaction.

artificial intelligence, machine learning, natural language, (21 more...)

2511.03757

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Oct-29-2025

Can LLMs Write Faithfully? An Agent-Based Evaluation of LLM-generated Islamic Content

Mushtaq, Abdullah, Naeem, Rafay, Elmahjub, Ezieddin, Ghaznavi, Ibrahim, Al-Maliki, Shawqi, Abdallah, Mohamed, Al-Fuqaha, Ala, Qadir, Junaid

Large language models are increasingly used for Islamic guidance, but risk misquoting texts, misapplying jurisprudence, or producing culturally inconsistent responses. We pilot an evaluation of GPT-4o, Ansari AI, and Fanar on prompts from authentic Islamic blogs. Our dual-agent framework uses a quantitative agent for citation verification and six-dimensional scoring (e.g., Structure, Islamic Consistency, Citations) and a qualitative agent for five-dimensional side-by-side comparison (e.g., Tone, Depth, Originality). GPT-4o scored highest in Islamic Accuracy (3.93) and Citation (3.38), Ansari AI followed (3.68, 3.32), and Fanar lagged (2.76, 1.82). Despite relatively strong performance, models still fall short in reliably producing accurate Islamic content and citations, a paramount requirement in faith-sensitive writing. GPT-4o had the highest mean quantitative score (3.90/5), while Ansari AI led qualitative pairwise wins (116/200). Fanar, though trailing, introduces innovations for Islamic and Arabic contexts. This study underscores the need for community-driven benchmarks centering Muslim perspectives, offering an early step toward more reliable AI in Islamic knowledge and other high-stakes domains such as medicine, law, and journalism.

large language model, machine learning, natural language, (21 more...)

2510.24438

Country: Asia > Middle East (0.14)

Genre: Research Report (0.82)

Industry:

Law (0.95)
Education (0.95)
Media > News (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Thelwall, Mike, Mohammadi, Ehsan

Can Small and Reasoning Large Language Models Score Journal Articles for Research Quality and Do Averaging and Few-shot Help?

arXiv.org Artificial Intelligence, Oct-28-2025

Assessing published academic journal articles is a common task for evaluations of departments and individuals. Whilst it is sometimes supported by citation data, Large Language Models (LLMs) may give more useful indications of article quality. Evidence of this capability exists for two of the largest LLM families, ChatGPT and Gemini, and the medium sized LLM Gemma3 27b, but it is unclear whether smaller LLMs and reasoning models have similar abilities. This is important because larger models may be slow and impractical in some situations, and reasoning models may perform differently. Four relevant questions are addressed with Gemma3 variants, Llama4 Scout, Qwen3, Magistral Small and DeepSeek R1, on a dataset of 2,780 medical, health and life science papers in 6 fields, with two different gold standards, one novel. The results suggest that smaller (open weights) and reasoning LLMs have similar performance to ChatGPT 4o-mini and Gemini 2.0 Flash, but that 1b parameters may often, and 4b sometimes, be too few. Moreover, averaging scores from multiple identical queries seems to be a universally successful strategy, and few-shot prompts (four examples) tended to help but the evidence was equivocal. Reasoning models did not have a clear advantage. Overall, the results show, for the first time, that smaller LLMs >4b, including reasoning models, have a substantial capability to score journal articles for research quality, especially if score averaging is used.

correlation, large language model, machine learning, (22 more...)