AITopics

2412.03092

Country:

North America > United States > Illinois (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Materials > Chemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL

Koh, Woosung, Oh, Wonbeen, Kim, Siyeol, Shin, Suhin, Kim, Hyeongjin, Jang, Jaein, Lee, Junghyun, Yun, Se-Young

Multi-agent reinforcement learning (MARL) has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory, a common occurrence in real-world environments such as search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space so that, when inference is performed OOD, the agent's input emulates the in-domain setting. The results show that FlickerFusion not only achieves superior inference rewards but also uniquely reduces uncertainty vis-à-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.
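A minimal sketch of the observation-dropout idea described above, under our own assumptions rather than the authors' code: an observation is a list of per-entity feature vectors, and whenever more entities are observed than the policy saw in training, a fresh random subset of the excess is dropped at every step. The function name flicker_drop and the parameter max_train_entities are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical representation: one feature vector per observed entity.
Entity = Sequence[float]


def flicker_drop(entities: List[Entity], max_train_entities: int,
                 rng: random.Random) -> List[Entity]:
    """Keep a random subset of at most `max_train_entities` entities.

    If the observation already fits the training-time entity count it is
    returned unchanged; otherwise the excess entities are dropped at random,
    preserving the original order of the survivors.
    """
    if len(entities) <= max_train_entities:
        return list(entities)
    kept = sorted(rng.sample(range(len(entities)), max_train_entities))
    return [entities[i] for i in kept]


rng = random.Random(0)
obs = [[float(i), 0.0] for i in range(6)]  # six entities seen at inference (OOD)
for step in range(3):
    # A fresh subset is drawn each step, so which entities are dropped "flickers".
    reduced = flicker_drop(obs, max_train_entities=4, rng=rng)
    print(step, [e[0] for e in reduced])
```

Dropping uniformly at random keeps the sketch task-agnostic; the paper's actual selection rule and where the dropout is applied may differ.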

machine learning, natural language, reinforcement learning

2410.15876

Country:

North America > United States (0.28)
Asia (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (0.67)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.67)
Energy > Oil & Gas > Midstream (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Learning, Machine Learning, Advancing Big Data Analytics and Management

Hsieh, Weiche, Bi, Ziqian, Chen, Keyu, Peng, Benji, Zhang, Sen, Xu, Jiawei, Wang, Jinlang, Yin, Caitlyn Heqi, Zhang, Yichao, Feng, Pohsun, Wen, Yizhu, Wang, Tianyang, Li, Ming, Liang, Chia Xin, Ren, Jintao, Niu, Qian, Chen, Silin, Yan, Lawrence K. Q., Xu, Han, Tseng, Hong-Ming, Song, Xinyuan, Jing, Bowen, Yang, Junjie, Song, Junhao, Liu, Junyu, Liu, Ming

Advancements in artificial intelligence, machine learning, and deep learning have catalyzed the transformation of big data analytics and management into pivotal domains for research and application. This work explores the theoretical foundations, methodological advancements, and practical implementations of these technologies, emphasizing their role in uncovering actionable insights from massive, high-dimensional datasets. The study presents a systematic overview of data preprocessing techniques, including data cleaning, normalization, integration, and dimensionality reduction, to prepare raw data for analysis. Core analytics methodologies such as classification, clustering, regression, and anomaly detection are examined, with a focus on algorithmic innovation and scalability. Furthermore, the text delves into state-of-the-art frameworks for data mining and predictive modeling, highlighting the role of neural networks, support vector machines, and ensemble methods in tackling complex analytical challenges. Special emphasis is placed on the convergence of big data with distributed computing paradigms, including cloud and edge computing, to address challenges in storage, computation, and real-time analytics. The integration of ethical considerations, including data privacy and compliance with global standards, ensures a holistic perspective on data management. Practical applications across healthcare, finance, marketing, and policy-making illustrate the real-world impact of these technologies. Through comprehensive case studies and Python-based implementations, this work equips researchers, practitioners, and data enthusiasts with the tools to navigate the complexities of modern data analytics. It bridges the gap between theory and practice, fostering the development of innovative solutions for managing and leveraging data in the era of artificial intelligence.
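As an illustration of two of the preprocessing steps listed above (data cleaning and normalization), here is a minimal pure-Python sketch. Dropping incomplete rows and using z-score standardization are our choices for the example, not necessarily the text's; the function names clean and zscore are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def clean(rows: List[Row]) -> List[List[float]]:
    """Data cleaning: drop every row that contains a missing value (None)."""
    return [[float(v) for v in row] for row in rows if all(v is not None for v in row)]


def zscore(rows: List[List[float]]) -> List[List[float]]:
    """Normalization: rescale each column to zero mean and unit (population) variance."""
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        for i, x in enumerate(col):
            # A constant column carries no information, so map it to 0.
            out[i][j] = (x - mean) / std if std > 0 else 0.0
    return out


raw = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [5.0, 50.0]]
X = zscore(clean(raw))  # the row with a missing value is removed first
print(X)
```

Imputing missing values instead of dropping rows, or min-max scaling instead of z-scores, are equally common choices; which one fits depends on the downstream model.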

data mining, information retrieval, machine learning

2412.02187

Country:

Europe (0.67)
Asia > China (0.45)
North America > United States > Wisconsin (0.14)

Genre:

Workflow (1.00)
Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Transportation (1.00)
Leisure & Entertainment (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Bridging Hard and Soft: Mechanical Metamaterials Enable Rigid Torque Transmission in Soft Robots

Carton, Molly, Kowalewski, Jakub F., Guo, Jiani, Alpert, Jacob F., Garg, Aman, Revier, Daniel, Lipton, Jeffrey Ian

Torque and continuous rotation are fundamental methods of actuation and manipulation in rigid robots. Soft robot arms use soft materials and structures to mimic the passive compliance of biological arms that bend and extend. This use of compliance prevents soft arms from continuously transmitting and exerting torques to interact with their environment. Here, we show how relying on patterning structures instead of inherent material properties allows soft robotic arms to remain compliant while continuously transmitting torque to their environment. We demonstrate a soft robotic arm made from a pair of mechanical metamaterials that act as compliant constant-velocity joints. The joints are up to 52 times stiffer in torsion than bending and can bend up to 45°. This robot arm can continuously transmit torque while deforming in all other directions. The arm's mechanical design achieves high motion repeatability (0.4 mm and 0.1°) when tracking trajectories. We then trained a neural network to learn the inverse kinematics, enabling us to program the arm to complete tasks that are challenging for existing soft robots such as installing light bulbs, fastening bolts, and turning valves. The arm's passive compliance makes it safe around humans and provides a source of mechanical intelligence, enabling it to adapt to misalignment when manipulating objects. This work will bridge the gap between hard and soft robotics with applications in human assistance, warehouse automation, and extreme environments.

artificial intelligence, machine learning, trunc

2412.0265

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Materials (0.93)
Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

An Automated Data Mining Framework Using Autoencoders for Feature Extraction and Dimensionality Reduction

Liang, Yaxin, Li, Xinshi, Huang, Xin, Zhang, Ziqi, Yao, Yue

This study proposes an automated data mining framework based on autoencoders and experimentally verifies its effectiveness in feature extraction and data dimensionality reduction. Through its encoding-decoding structure, the autoencoder can capture the data's latent characteristics and achieve noise reduction and anomaly detection, providing an efficient and stable solution for the data mining process. The experiment compared the performance of the autoencoder with traditional dimensionality reduction methods (PCA, FA, t-SNE, and UMAP). The results showed that the autoencoder performed best in terms of reconstruction error and root mean square error, better retained the data structure, and enhanced the generalization ability of the model. The autoencoder-based framework not only reduces manual intervention but also significantly improves the automation of data processing. In the future, with the advancement of deep learning and big data technology, autoencoder methods combined with generative adversarial networks (GANs) or graph neural networks (GNNs) are expected to be more widely used in complex data processing, real-time data analysis, and intelligent decision-making.
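To make the reconstruction-error comparison concrete, here is a minimal sketch of a linear autoencoder with a one-dimensional bottleneck, trained by full-batch gradient descent on toy two-dimensional data. This is an illustrative assumption, not the study's architecture, which is presumably deeper and nonlinear; a linear autoencoder trained on squared error spans, at its optimum, the same subspace as PCA, which makes it a convenient toy. All function names and hyperparameters here are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def encode(w: Vec, x: Vec) -> float:
    """Linear encoder with a 1-D bottleneck: z = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def decode(v: Vec, z: float) -> Vec:
    """Linear decoder: reconstruction x_hat = z * v."""
    return [z * vi for vi in v]


def rmse(data: List[Vec], w: Vec, v: Vec) -> float:
    """Root mean square reconstruction error over all samples and features."""
    sq, count = 0.0, 0
    for x in data:
        x_hat = decode(v, encode(w, x))
        sq += sum((a - b) ** 2 for a, b in zip(x, x_hat))
        count += len(x)
    return math.sqrt(sq / count)


def train(data: List[Vec], epochs: int = 500, lr: float = 0.05) -> Tuple[Vec, Vec]:
    """Full-batch gradient descent on the mean squared reconstruction loss."""
    dim, n = len(data[0]), len(data)
    w = [0.1] * dim  # encoder weights (fixed init keeps the run reproducible)
    v = [0.1] * dim  # decoder weights
    for _ in range(epochs):
        gw, gv = [0.0] * dim, [0.0] * dim
        for x in data:
            z = encode(w, x)
            r = [xi - hi for xi, hi in zip(x, decode(v, z))]  # residual x - x_hat
            rv = sum(ri * vi for ri, vi in zip(r, v))
            for k in range(dim):
                gv[k] += -2.0 * r[k] * z   # d loss / d v_k
                gw[k] += -2.0 * rv * x[k]  # d loss / d w_k
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        v = [vi - lr * g / n for vi, g in zip(v, gv)]
    return w, v


# Toy data lying on the line x2 = 2 * x1, so one latent dimension suffices.
data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
before = rmse(data, [0.1] * 2, [0.1] * 2)
w, v = train(data)
after = rmse(data, w, v)
print(f"RMSE before: {before:.4f}, after: {after:.6f}")
```

On real data the bottleneck would be wider, the layers nonlinear, and the comparison against PCA, FA, t-SNE, or UMAP made on held-out samples.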

autoencoder, data mining, machine learning

2412.02211

Country:

North America > United States > Virginia (0.04)
North America > United States > New Jersey (0.04)
North America > United States > California > San Bernardino County > Montclair (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.69)
Materials > Metals & Mining (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.84)

3D Interaction Geometric Pre-training for Molecular Relational Learning

Lee, Namkyeong, Oh, Yunhak, Noh, Heewoong, Na, Gyoung S., Xu, Minkai, Wang, Hanchen, Fu, Tianfan, Park, Chanyoung

Molecular relational learning (MRL) focuses on understanding the interaction dynamics between molecules and has gained significant attention from researchers thanks to its diverse applications [20]. For instance, understanding how a medication dissolves in different solvents (medication-solvent interaction) is vital in pharmacy [30, 26, 3], while predicting the optical and photophysical properties of chromophores in various solvents (chromophore-solvent interaction) is essential for material discovery [16]. Because wet-lab experiments that test the interaction behavior of all possible molecular pairs are expensive in both time and money [31], machine learning methods have been quickly embraced for MRL. Despite recent advancements in MRL, previous works tend to ignore molecules' 3D geometric information and instead focus solely on their 2D topological structures. However, in molecular science, the 3D geometric information of molecules (Figure 1 (a)) is crucial for understanding and predicting molecular behavior across various contexts, ranging from physical properties [1] to biological functions [10, 46]. This is particularly important in MRL, as geometric information plays a key role in molecular interactions by determining how molecules recognize, interact, and bind with one another in their interaction environment [34]. In traditional molecular dynamics simulations, explicit solvent models, which directly consider the detailed environment of molecular interaction, have demonstrated superior performance compared to implicit solvent models, which simplify the solvent as a continuous medium, highlighting the significance of explicitly modeling the complex geometries of interaction environments [47]. However, acquiring stereochemical structures of molecules is often very costly, resulting in limited availability of such 3D geometric information for downstream tasks [23].

artificial intelligence, machine learning, molecule

2412.02957

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Minnesota (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Materials > Chemicals (1.00)
Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Measuring Bias of Web-filtered Text Datasets and Bias Propagation Through Training

Mansour, Youssef, Heckel, Reinhard

We investigate biases in pretraining datasets for large language models (LLMs) through dataset classification experiments. Building on prior work demonstrating the existence of biases in popular computer vision datasets, we analyze popular open-source pretraining datasets for LLMs derived from CommonCrawl, including C4, RefinedWeb, DolmaCC, RedPajama-V2, FineWeb, and DCLM-Baseline. Although these datasets were obtained with similar filtering and deduplication steps, neural networks can classify surprisingly well which dataset a single text sequence belongs to, significantly better than a human can. This indicates that popular pretraining datasets have their own unique biases or fingerprints. These biases remain even when the text is rewritten with LLMs. Moreover, the biases propagate through training: random sequences generated by models trained on those datasets can be classified well by a classifier trained on the original datasets.
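The dataset classification setup can be illustrated with a much simpler classifier than the neural networks used in the paper. The sketch below swaps in multinomial naive Bayes over bag-of-words counts, with two made-up toy "sources" standing in for real pretraining datasets; the class name and the corpora are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization (deliberately simple)."""
    return text.lower().split()


class NaiveBayesSourceClassifier:
    """Predicts which source corpus a text came from (multinomial NB, add-one smoothing)."""

    def fit(self, corpora: Dict[str, List[str]]) -> "NaiveBayesSourceClassifier":
        self.counts = {name: Counter(t for doc in docs for t in tokenize(doc))
                       for name, docs in corpora.items()}
        total_docs = sum(len(docs) for docs in corpora.values())
        self.log_prior = {name: math.log(len(docs) / total_docs)
                          for name, docs in corpora.items()}
        self.vocab = set().union(*self.counts.values())
        self.totals = {name: sum(c.values()) for name, c in self.counts.items()}
        return self

    def predict(self, text: str) -> str:
        v = len(self.vocab)

        def score(name: str) -> float:
            c, total = self.counts[name], self.totals[name]
            # log P(source) + sum over tokens of log P(token | source), smoothed
            return self.log_prior[name] + sum(
                math.log((c[t] + 1) / (total + v)) for t in tokenize(text))

        return max(self.counts, key=score)


corpora = {
    "source_a": ["click here to subscribe now", "subscribe for more deals now"],
    "source_b": ["the theorem follows from the lemma", "we prove the lemma by induction"],
}
clf = NaiveBayesSourceClassifier().fit(corpora)
print(clf.predict("subscribe now for deals"), clf.predict("we prove the theorem"))
```

On real web-filtered corpora, such surface-level word statistics are only one possible fingerprint; the paper's neural classifiers can pick up subtler cues.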

large language model, machine learning, natural language

2412.02857

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Mexico (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Materials > Chemicals (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Law Enforcement & Public Safety (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback

Furuta, Hiroki, Zen, Heiga, Schuurmans, Dale, Faust, Aleksandra, Matsuo, Yutaka, Liang, Percy, Yang, Sherry

Large text-to-video models hold immense potential for a wide range of downstream applications. However, these models struggle to accurately depict dynamic object interactions, often resulting in unrealistic movements and frequent violations of real-world physics. One solution inspired by large language models is to align generated outputs with desired outcomes using external feedback. This enables the model to refine its responses autonomously, without extensive manual data collection. In this work, we investigate the use of feedback to enhance the object dynamics in text-to-video models. We aim to answer a critical question: what types of feedback, paired with which specific self-improvement algorithms, can most effectively improve text-video alignment and realistic object interactions? We begin by deriving a unified probabilistic objective for offline RL finetuning of text-to-video models. This perspective highlights how design elements in existing algorithms, such as KL regularization and policy projection, emerge as specific choices within a unified framework. We then use the derived methods to optimize a set of text-video alignment metrics (e.g., CLIP scores, optical flow), but find that these metrics often fail to align with human perceptions of generation quality. To address this limitation, we propose leveraging vision-language models to provide more nuanced feedback specifically tailored to object dynamics in videos. Our experiments demonstrate that our method can effectively optimize a wide variety of rewards, with binary AI feedback driving the most significant improvements in video quality for dynamic interactions, as confirmed by both AI and human evaluations. Notably, we observe substantial gains when using reward signals derived from AI feedback, particularly in scenarios involving complex interactions between multiple objects and realistic depictions of objects falling.
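For orientation, the widely used KL-regularized finetuning objective that the mention of KL regularization alludes to, together with its standard closed-form optimum, can be written as below. The notation (prompt c, video x, reward r, reference model p_ref, coefficient β) is ours, not the paper's, and the paper's unified objective may be more general.

```latex
\max_{\theta}\;
\mathbb{E}_{c \sim \mathcal{D}}\Big[
  \mathbb{E}_{x \sim p_\theta(\cdot \mid c)}\big[r(c, x)\big]
  - \beta\, \mathrm{KL}\big(p_\theta(\cdot \mid c)\,\big\|\,p_{\mathrm{ref}}(\cdot \mid c)\big)
\Big]
\quad\Longrightarrow\quad
p^{*}(x \mid c) = \frac{p_{\mathrm{ref}}(x \mid c)\,\exp\!\big(r(c, x)/\beta\big)}{Z(c)},
\qquad
Z(c) = \mathbb{E}_{x \sim p_{\mathrm{ref}}(\cdot \mid c)}\Big[\exp\!\big(r(c, x)/\beta\big)\Big]
```

Because the optimum reweights the reference model by exp(r/β), a larger β keeps the finetuned model closer to the reference, and a binary reward r ∈ {0, 1} simply upweights the videos judged acceptable by the constant factor e^{1/β}.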

arxivpreprintarxiv, dynamic object interaction, text-to-video generation

2412.02617

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.50)

Industry: Materials > Containers & Packaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial Intelligence, Dec-2-2024

VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning

Jiang, Zhihuan, Yang, Zhen, Chen, Jinhao, Du, Zhengxiao, Wang, Weihan, Xu, Bin, Tang, Jie

Multi-modal large language models (MLLMs) have demonstrated promising capabilities across various tasks by integrating textual and visual information to achieve visual understanding in complex scenarios. Although several benchmarks aim to evaluate MLLMs on tasks ranging from visual question answering to complex problem-solving, most focus predominantly on mathematics or general visual understanding. This reveals a critical gap in current benchmarks, which often overlook other key scientific disciplines such as physics and chemistry. To address this gap, we construct a comprehensive benchmark, named VisScience, to assess multi-modal scientific reasoning across three disciplines: mathematics, physics, and chemistry. The benchmark comprises 3,000 questions drawn from K12 education, spanning elementary school through high school, equally distributed across the three disciplines with 1,000 questions per discipline. The questions within VisScience span 21 distinct subjects and are categorized into five difficulty levels, offering a broad spectrum of topics within each discipline. With VisScience, we present a detailed evaluation of the performance of 25 representative MLLMs in scientific reasoning. Experimental results demonstrate that closed-source MLLMs generally outperform open-source models. The best results include 53.4% accuracy in mathematics by Claude3.5-Sonnet, 38.2% in physics by GPT-4o, and 47.0% in chemistry by Gemini-1.5-Pro. These results underscore the strengths and limitations of MLLMs, suggesting areas for future improvement and highlighting the importance of developing models that can effectively handle the diverse demands of multi-modal scientific reasoning.

large language model, machine learning, natural language

2409.1373

Genre: Research Report > New Finding (0.85)

Industry:

Education > Educational Setting > K-12 Education (1.00)
Materials > Chemicals > Industrial Gases (0.93)
Energy > Oil & Gas > Upstream (0.67)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > Polymers & Plastics (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Dec-2-2024

Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds

Li, Shuangqi, Le, Hieu, Xu, Jingyi, Salzmann, Mathieu

Text-to-image diffusion models have demonstrated remarkable capability in generating realistic images from arbitrary text prompts. However, they often produce inconsistent results for compositional prompts such as "two dogs" or "a penguin on the right of a bowl". Understanding these inconsistencies is crucial for reliable image generation. In this paper, we highlight the significant role of initial noise in these inconsistencies, where certain noise patterns are more reliable for compositional prompts than others. Our analyses reveal that different initial random seeds tend to guide the model to place objects in distinct image areas, potentially adhering to specific patterns of camera angles and image composition associated with the seed. To improve the model's compositional ability, we propose a method for mining these reliable cases, resulting in a curated training set of generated images without requiring any manual annotation. By fine-tuning text-to-image models on these generated images, we significantly enhance their compositional capabilities. For numerical composition, we observe relative increases of 29.3% and 19.5% for Stable Diffusion and PixArt-α, respectively. Spatial composition sees even larger gains, with 60.7% for Stable Diffusion and 21.1% for PixArt-α.
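The seed-mining step can be sketched as follows, under loudly stated assumptions: generate stands in for a text-to-image model called with a fixed seed, is_correct stands in for an automatic compositional check (for example, an object counter), and seeds are ranked by their success rate across prompts. None of these names come from the paper, and the toy generator below is a deterministic stand-in, not a real model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = object  # placeholder type for a generated image


def mine_reliable_seeds(
    seeds: Sequence[int],
    prompts: Sequence[str],
    generate: Callable[[str, int], Image],     # hypothetical (prompt, seed) -> image
    is_correct: Callable[[str, Image], bool],  # hypothetical compositional check
    top_k: int,
) -> List[int]:
    """Return the top_k seeds with the highest success rate over the prompts."""
    success: Dict[int, float] = {}
    for seed in seeds:
        hits = sum(is_correct(p, generate(p, seed)) for p in prompts)
        success[seed] = hits / len(prompts)
    # Highest success rate first; ties broken by seed value for determinism.
    return sorted(seeds, key=lambda s: (-success[s], s))[:top_k]


def curate_training_set(
    prompts: Sequence[str],
    seeds: Sequence[int],
    generate: Callable[[str, int], Image],
    is_correct: Callable[[str, Image], bool],
) -> List[Tuple[str, Image]]:
    """Collect (prompt, image) pairs that pass the check, with no manual labels."""
    pairs: List[Tuple[str, Image]] = []
    for seed in seeds:
        for prompt in prompts:
            image = generate(prompt, seed)
            if is_correct(prompt, image):
                pairs.append((prompt, image))
    return pairs


# Toy stand-ins: the "image" is just the number of objects drawn.
PROMPTS = ["one cat", "two dogs"]


def fake_generate(prompt: str, seed: int) -> int:
    return (seed + len(prompt)) % 3


def fake_is_correct(prompt: str, image: int) -> bool:
    wanted = {"one": 1, "two": 2}[prompt.split()[0]]
    return image == wanted


reliable = mine_reliable_seeds(range(6), PROMPTS, fake_generate, fake_is_correct, top_k=2)
train_set = curate_training_set(PROMPTS, reliable, fake_generate, fake_is_correct)
print(reliable, len(train_set))
```

The curated pairs would then serve as fine-tuning data; how the paper actually scores seeds and filters images may differ from this ranking rule.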

artificial intelligence, deep learning, machine learning

2411.1881

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (1.00)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (1.00)
Energy > Oil & Gas > Midstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)