
Collaborating Authors: You, Quanzeng


Asymmetrical Reciprocity-based Federated Learning for Resolving Disparities in Medical Diagnosis

arXiv.org Artificial Intelligence

Geographic health disparities pose a pressing global challenge, particularly in underserved regions of low- and middle-income nations. Addressing this issue requires a collaborative approach that enhances healthcare quality by leveraging support from medically more developed areas, and federated learning emerges as a promising tool for this purpose. However, the scarcity of medical data and limited computational resources in underserved regions make collaborative training of powerful machine learning models challenging. Furthermore, the reciprocity between underserved and developed regions is asymmetric: the two sides differ in what they can contribute to, and gain from, the collaboration. To overcome these challenges, we propose a novel cross-silo federated learning framework, named FedHelp, aimed at alleviating geographic health disparities and fortifying the diagnostic capabilities of underserved regions. Specifically, FedHelp leverages foundation model knowledge via one-time API access to guide the learning process of underserved small clients, addressing the challenge of insufficient data. Additionally, we introduce a novel asymmetric dual knowledge distillation module to manage the issue of asymmetric reciprocity, facilitating the exchange of necessary knowledge between developed large clients and underserved small clients. We validate the effectiveness and utility of FedHelp through extensive experiments on both medical image classification and segmentation tasks. The experimental results demonstrate significant performance improvements over state-of-the-art baselines, particularly benefiting clients in underserved regions.
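Since the abstract does not spell out the distillation losses, the following minimal PyTorch sketch only illustrates what an asymmetric dual knowledge-distillation step could look like: both clients distill from each other's logits on shared data, but with unequal weights. The function names, temperature, and the weights `w_small`/`w_large` are hypothetical, not FedHelp's actual formulation.

```python
# Hypothetical sketch of an asymmetric dual knowledge-distillation step.
# Temperatures and weights are illustrative assumptions, not FedHelp's losses.
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation loss (Hinton et al. style)."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

def asymmetric_dual_kd(small_logits, large_logits, labels,
                       w_small=1.0, w_large=0.1):
    """Both clients distill from each other on shared data, but the small
    (underserved) client leans on the large client far more strongly than
    the reverse: the asymmetry in 'asymmetric reciprocity'."""
    loss_small = (F.cross_entropy(small_logits, labels)
                  + w_small * kd_loss(small_logits, large_logits.detach()))
    loss_large = (F.cross_entropy(large_logits, labels)
                  + w_large * kd_loss(large_logits, small_logits.detach()))
    return loss_small, loss_large
```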


BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data

arXiv.org Artificial Intelligence

Large language models (LLMs) have become increasingly pivotal across various domains, especially in handling complex data types: structured data processing, as exemplified by ChartQA and ChatGPT-Ada, and multimodal unstructured data processing, as seen in Visual Question Answering (VQA). These areas have attracted significant attention from both industry and academia. Despite this, there remains a lack of unified evaluation methodologies for these diverse data-handling scenarios. In response, we introduce BabelBench, an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal, multistructured data with code execution. BabelBench comprises 247 meticulously curated problems that challenge models with tasks in perception, commonsense reasoning, logical reasoning, and more. Beyond the basic capabilities of multimodal understanding, structured data processing, and code generation, these tasks demand advanced capabilities in exploration, planning, reasoning, and debugging. Our experimental findings on BabelBench indicate that even cutting-edge models such as GPT-4 exhibit substantial room for improvement. The insights derived from our comprehensive analysis offer valuable guidance for future research within the community. The benchmark data can be found at https://github.com/FFD8FFE/babelbench.
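For illustration only, a harness in the spirit of a code-driven benchmark might look like the sketch below. The problem schema (`question`/`files`/`answer`) and the execute-then-compare scoring are assumptions; the real data format and scorer live in the repository linked above.

```python
# Illustrative-only harness for a code-driven benchmark of this kind.
import subprocess
import tempfile

def run_candidate_code(code: str, timeout: int = 30) -> str:
    """Execute model-generated Python in a subprocess and capture stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(["python", path], capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout.strip()

def evaluate(problems, generate_code):
    """`generate_code(problem) -> str` wraps whichever LLM is under test."""
    correct = 0
    for p in problems:  # assumed fields: "question", "files", "answer"
        try:
            pred = run_candidate_code(generate_code(p))
        except subprocess.TimeoutExpired:
            pred = ""
        correct += int(pred == str(p["answer"]))
    return correct / len(problems)
```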


Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

arXiv.org Artificial Intelligence

Strong Artificial Intelligence (Strong AI), or Artificial General Intelligence (AGI), with abstract reasoning ability is the goal of next-generation AI. Recent advancements in Large Language Models (LLMs), along with the emerging field of Multimodal Large Language Models (MLLMs), have demonstrated impressive capabilities across a wide range of multimodal tasks and applications. In particular, various MLLMs, each with distinct model architectures, training data, and training stages, have been evaluated across a broad range of MLLM benchmarks. These studies have, to varying degrees, revealed different aspects of the current capabilities of MLLMs. However, the reasoning abilities of MLLMs have not been systematically investigated. In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in the application of MLLMs to reasoning-intensive tasks, and finally discuss current practices and future directions. We believe our survey establishes a solid base and sheds light on this important topic of multimodal reasoning.


Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

arXiv.org Machine Learning

Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling. While various techniques address these computational challenges, a less explored issue is designing an efficient and adaptable network backbone for iterative refinement. Current options such as the U-Net and the Vision Transformer often rely on resource-intensive deep networks and lack the flexibility needed to generate images at variable resolutions or with a smaller network than the one used in training. This study introduces LEGO bricks, which seamlessly integrate Local-feature Enrichment and Global-content Orchestration. These bricks can be stacked to create a test-time reconfigurable diffusion backbone, allowing selective skipping of bricks to reduce sampling costs and the generation of higher-resolution images than the training data. Each LEGO brick enriches local regions with an MLP and transforms them with a Transformer block, while maintaining a consistent full-resolution image across all bricks. Experimental results demonstrate that LEGO bricks enhance training efficiency, expedite convergence, and facilitate variable-resolution image generation while maintaining strong generative performance. Moreover, LEGO significantly reduces sampling time compared to other methods, establishing it as a valuable enhancement for diffusion models.
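A loose PyTorch sketch of the brick idea follows, under assumed layer sizes and tokenization: each brick pairs a per-token MLP (local enrichment) with a Transformer encoder layer (global orchestration), and bricks can be skipped at sampling time. This is a sketch of the stack-and-skip pattern, not the paper's configuration.

```python
# Loose sketch of stackable, skippable bricks. Sizes, tokenization,
# and the skipping policy are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class LegoBrick(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.local_mlp = nn.Sequential(            # Local-feature Enrichment
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.global_block = nn.TransformerEncoderLayer(  # Global-content Orchestration
            d_model=dim, nhead=8, batch_first=True)

    def forward(self, x):                          # x: (batch, tokens, dim)
        x = x + self.local_mlp(x)                  # per-token local refinement
        return self.global_block(x)                # full-sequence global mixing

class LegoBackbone(nn.Module):
    def __init__(self, num_bricks=8, dim=256):
        super().__init__()
        self.bricks = nn.ModuleList([LegoBrick(dim) for _ in range(num_bricks)])

    def forward(self, x, skip=()):
        for i, brick in enumerate(self.bricks):
            if i in skip:                          # test-time reconfiguration:
                continue                           # skip bricks to cut sampling cost
            x = brick(x)
        return x

features = LegoBackbone()(torch.randn(2, 64, 256), skip={5, 6, 7})
```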


Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

arXiv.org Artificial Intelligence

Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals based on text prompts. Despite these advances, diffusion models sometimes struggle to fully translate the semantic content of the text into images. While layout conditioning has been shown to improve the compositional ability of T2I diffusion models, such approaches typically require manually specified layouts. In this work, we introduce a novel approach to improving T2I diffusion models using Large Language Models (LLMs) as layout generators. Our method leverages Chain-of-Thought prompting of LLMs to interpret text and generate spatially reasonable object layouts. The generated layout is then used to enhance the composition and spatial accuracy of the generated images. Moreover, we propose an efficient adapter based on a cross-attention mechanism, which explicitly integrates the layout information into Stable Diffusion models. Our experiments demonstrate significant improvements in image quality and layout accuracy, showcasing the potential of LLMs in augmenting generative image models.
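The sketch below illustrates just the layout-planning half of the recipe, with a hypothetical `ask_llm` wrapper and prompt wording. Only the overall pattern, prompting the LLM to reason step by step and emit normalized bounding boxes that then condition the diffusion model, follows the abstract.

```python
# Hypothetical layout-planning step; any chat-completion client would do.
import json

LAYOUT_PROMPT = """You are a layout planner for an image generator.
Prompt: "{prompt}"
Think step by step about which objects appear and how they relate
spatially, then output only JSON: a list of
{{"object": <name>, "box": [x0, y0, x1, y1]}} with coordinates in [0, 1].
"""

def plan_layout(prompt: str, ask_llm) -> list:
    """`ask_llm(text) -> str` wraps any chat-completion API."""
    reply = ask_llm(LAYOUT_PROMPT.format(prompt=prompt))
    start, end = reply.find("["), reply.rfind("]") + 1   # strip CoT text
    return json.loads(reply[start:end])  # e.g. [{"object": "dog", "box": [...]}]
```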


Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation

arXiv.org Artificial Intelligence

Patients with low health literacy usually have difficulty understanding medical jargon and the complex structure of professional medical language. Although some studies have proposed methods to automatically translate expert language into layperson-understandable language, only a few of them address both accuracy and readability simultaneously in the clinical domain. Clinical language simplification therefore remains a challenging task that previous work has not fully addressed. To benchmark this task, we construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. In addition, we propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance compared with eight strong baselines. To evaluate performance fairly, we also propose three evaluation metrics specific to this task. Experimental results demonstrate the utility of the annotated MedLane dataset and the effectiveness of the proposed model DECLARE.
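The three metrics themselves are defined in the paper, not in this abstract, so the toy check below substitutes two generic proxies: token overlap as a crude content-preservation (accuracy) signal and mean word length as a crude readability signal. It is illustrative only and is not DECLARE's evaluation.

```python
# Illustrative proxies only, not the paper's three metrics.
def token_overlap(reference: str, simplified: str) -> float:
    ref, out = set(reference.lower().split()), set(simplified.lower().split())
    return len(ref & out) / max(len(ref), 1)

def mean_word_length(text: str) -> float:
    words = text.split()
    return sum(len(w) for w in words) / max(len(words), 1)

expert = "The patient exhibited dyspnea and bilateral lower-extremity edema."
lay = "The patient had trouble breathing and swelling in both legs."
print(token_overlap(expert, lay), mean_word_length(lay))
```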


Cultural Diffusion and Trends in Facebook Photographs

AAAI Conferences

Online social media is a vehicle through which people share moments of their lives with their friends, such as playing sports, cooking dinner, or just taking a selfie for fun, via visual means, that is, photographs. Our study takes a closer look at the popular visual concepts illustrating various cultural lifestyles in aggregated, de-identified photographs. We perform analysis at both macroscopic and microscopic levels to gain novel insights into global and local visual trends as well as the dynamics of interpersonal cultural exchange and diffusion among Facebook friends. We process images by automatically classifying their visual content with a convolutional neural network (CNN). Through various statistical tests, we find that socially tied individuals are more likely to post images showing similar cultural lifestyles. To further identify the main cause of the observed social correlation, we use the Shuffle test and the Preference-based Matched Estimation (PME) test to distinguish the effects of influence and homophily. The results indicate that the visual content of each user's photographs is temporally, although not necessarily causally, correlated with the photographs of their friends, which may suggest an effect of influence. Our paper demonstrates that Facebook photographs exhibit diverse cultural lifestyles and preferences and that social interaction mediated through the visual channel in social media can be an effective mechanism for cultural diffusion.
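As a rough illustration of the Shuffle-test logic, the sketch below compares an observed social-correlation statistic against its distribution under randomly permuted adoption times: if only homophily is at work, permuting the times should barely change the correlation, whereas influence should make the observed value stand out. The data layout and the correlation statistic are simplifying assumptions.

```python
# Minimal sketch of the Shuffle test; data layout is a simplifying assumption.
import random

def social_correlation(adopt_time, edges):
    """Fraction of friend pairs adopting a concept within one time step."""
    hits = sum(1 for u, v in edges
               if u in adopt_time and v in adopt_time
               and abs(adopt_time[u] - adopt_time[v]) <= 1)
    return hits / max(len(edges), 1)

def shuffle_test(adopt_time, edges, trials=1000, seed=0):
    rng = random.Random(seed)
    observed = social_correlation(adopt_time, edges)
    users, times = list(adopt_time), list(adopt_time.values())
    null = []
    for _ in range(trials):
        rng.shuffle(times)
        null.append(social_correlation(dict(zip(users, times)), edges))
    p_value = sum(s >= observed for s in null) / trials  # one-sided
    return observed, p_value
```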


Visual Sentiment Analysis by Attending on Local Image Regions

AAAI Conferences

Visual sentiment analysis, which studies the emotional response of humans to visual stimuli such as images and videos, is an interesting and challenging problem: it tries to understand the high-level content of visual data. The success of current models can be attributed to the development of robust algorithms from computer vision. Most existing models try to solve the problem by proposing either robust features or more complex models; in particular, visual features from the whole image or video are the main inputs. Little attention has been paid to local areas, which we believe are highly relevant to humans' emotional response to the whole image. In this work, we study the impact of local image regions on visual sentiment analysis. Our proposed model utilizes the recently studied attention mechanism to jointly discover the relevant local regions and build a sentiment classifier on top of them. The experimental results suggest that 1) our model is capable of automatically discovering the sentimental local regions of given images and 2) it outperforms existing state-of-the-art algorithms for visual sentiment analysis.
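A rough PyTorch sketch of attention-weighted pooling over local region features follows. The feature dimension, the 3x3 region grid, and the single-layer scorer are assumptions; the abstract fixes only the overall design of jointly attending to regions and classifying on top of them.

```python
# Rough sketch of attention-weighted pooling over local region features.
import torch
import torch.nn as nn

class RegionAttentionSentiment(nn.Module):
    def __init__(self, feat_dim=512, num_classes=2):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)        # relevance score per region
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, regions):                    # (batch, n_regions, feat_dim)
        attn = torch.softmax(self.score(regions), dim=1)
        pooled = (attn * regions).sum(dim=1)       # attention-weighted feature
        return self.classifier(pooled), attn.squeeze(-1)

logits, weights = RegionAttentionSentiment()(torch.randn(4, 9, 512))  # 3x3 grid
```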


Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark

AAAI Conferences

Psychological research has confirmed that people can have different emotional reactions to different visual stimuli, and several papers have been published on the problem of visual emotion analysis. In particular, attempts have been made to analyze and predict people's emotional reactions to images, and different kinds of hand-tuned features have been proposed to this end. The results reported on several carefully selected and labeled small image data sets have confirmed the promise of such features. While the recent successes of many computer-vision tasks are due to the adoption of Convolutional Neural Networks (CNNs), visual emotion analysis has not achieved the same level of success, which may be primarily due to the unavailability of confidently labeled, relatively large image data sets. In this work, we introduce a new data set, which started from 3+ million weakly labeled images of different emotions and ended up 30 times as large as the current largest publicly available visual emotion data set. We hope that this data set encourages further research on visual emotion analysis. We also perform extensive benchmarking analyses on this large data set using state-of-the-art methods, including CNNs.


Robust Image Sentiment Analysis Using Progressively Trained and Domain Transferred Deep Networks

AAAI Conferences

Sentiment analysis of online user-generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems that predict political elections, measure economic indicators, and so on. Recently, social media users have increasingly been using images and videos to express their opinions and share their experiences. Sentiment analysis of such large-scale visual content can help better extract user sentiments toward events or topics, such as those in image tweets, so that prediction of sentiment from visual content complements textual sentiment analysis. Motivated by the need to leverage large-scale yet noisy training data to solve the extremely challenging problem of image sentiment analysis, we employ Convolutional Neural Networks (CNNs). We first design a suitable CNN architecture for image sentiment analysis. We obtain half a million training samples by using a baseline sentiment algorithm to label Flickr images. To make use of such noisy machine-labeled data, we employ a progressive strategy to fine-tune the deep network. Furthermore, we improve performance on Twitter images by inducing domain transfer with a small number of manually labeled Twitter images. We have conducted extensive experiments on manually labeled Twitter images; the results show that the proposed CNN achieves better performance in image sentiment analysis than competing algorithms.
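The abstract leaves the progressive selection rule unspecified; one plausible reading, sketched below with a hypothetical confidence threshold, is to keep only samples whose current prediction confidently agrees with the noisy machine label, then fine-tune on that cleaner subset (for example via `torch.utils.data.Subset`).

```python
# One plausible reading of the progressive strategy, with assumed thresholds:
# keep samples whose prediction confidently agrees with the noisy label,
# then fine-tune on that cleaner subset.
import torch
import torch.nn.functional as F

def select_confident(model, loader, threshold=0.9, device="cpu"):
    """Indices of samples to keep; assumes a non-shuffled DataLoader."""
    keep, offset = [], 0
    model.eval()
    with torch.no_grad():
        for images, noisy_labels in loader:
            probs = F.softmax(model(images.to(device)), dim=-1)
            conf, pred = probs.max(dim=-1)
            agree = (pred.cpu() == noisy_labels) & (conf.cpu() > threshold)
            keep.extend(offset + i for i, a in enumerate(agree) if a)
            offset += len(noisy_labels)
    return keep  # feed to torch.utils.data.Subset(dataset, keep) for round two
```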