AITopics

Data-driven approaches to philosophy have emerged as a valuable tool for studying the history of the discipline. However, most studies in this area have focused on a limited number of journals from specific regions and subfields. We expand the scope of this research by applying dynamic topic modelling techniques to explore the history of philosophy in Colombia and Latin America. Our study examines the Colombian philosophy journal Ideas y Valores, founded in 1951 and currently one of the most influential academic philosophy journals in the region. By analyzing the evolution of topics across the journal's history, we identify various trends and specific dynamics in philosophical discourse within the Colombian and Latin American context. Our findings reveal that the most prominent topics are value theory (including ethics, political philosophy, and aesthetics), epistemology, and the philosophy of science. We also trace the evolution of articles focusing on the historical and interpretive aspects of philosophical texts, and we note a notable emphasis on German philosophers such as Kant, Husserl, and Hegel on various topics throughout the journal's lifetime. Additionally, we investigate whether articles with a historical focus have decreased over time due to editorial pressures. Our analysis suggests no significant decline in such articles. Finally, we propose ideas for extending this research to other Latin American journals and suggest improvements for natural language processing workflows in non-English languages.

main area, philosophy, proportion, (15 more...)

2412.04236

Country:

South America > Colombia (0.61)
North America > Central America (0.25)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(4 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Tessera, Kale-ab Abebe, Rahman, Arrasy, Albrecht, Stefano V.

HyperMARL: Adaptive Hypernetworks for Multi-Agent RL

Balancing individual specialisation and shared behaviours is a critical challenge in multi-agent reinforcement learning (MARL). Existing methods typically focus on encouraging diversity or leveraging shared representations. Full parameter sharing (FuPS) improves sample efficiency but struggles to learn diverse behaviours when required, while no parameter sharing (NoPS) enables diversity but is computationally expensive and sample inefficient. To address these challenges, we introduce HyperMARL, a novel approach using hypernetworks to balance efficiency and specialisation. HyperMARL generates agent-specific actor and critic parameters, enabling agents to adaptively exhibit diverse or homogeneous behaviours as needed, without modifying the learning objective or requiring prior knowledge of the optimal diversity. Furthermore, HyperMARL decouples agent-specific and state-based gradients, which empirically correlates with reduced policy gradient variance, potentially offering insights into its ability to capture diverse behaviours. Across MARL benchmarks requiring homogeneous, heterogeneous, or mixed behaviours, HyperMARL consistently matches or outperforms FuPS, NoPS, and diversity-focused methods, achieving NoPS-level diversity with a shared architecture. These results highlight the potential of hypernetworks as a versatile approach to the trade-off between specialisation and shared behaviours in MARL.

agent, hypermarl, hypernetwork, (13 more...)

2412.04233

Country:

South America > Brazil > São Paulo (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

Lohr, Dominic, Berges, Marc, Chugh, Abhishek, Kohlhase, Michael, Müller, Dennis

Leveraging Large Language Models to Generate Course-specific Semantically Annotated Learning Objects

Background: Over the past few decades, the process and methodology of automated question generation (AQG) have undergone significant transformations. Recent progress in generative natural language models has opened up new potential in the generation of educational content. Objectives: This paper explores the potential of large language models (LLMs) for generating computer science questions that are sufficiently annotated for automatic learner model updates, are fully situated in the context of a particular course, and address the cognitive dimension understand. Methods: Unlike previous attempts that might use basic methods like ChatGPT, our approach involves more targeted strategies such as retrieval-augmented generation (RAG) to produce contextually relevant and pedagogically meaningful learning objects. Results and Conclusions: Our results show that generating structural, semantic annotations works well. However, this success was not reflected in the case of relational annotations. The quality of the generated questions often did not meet educational standards, highlighting that although LLMs can contribute to the pool of learning materials, their current level of performance requires significant human intervention to refine and validate the generated content.

annotation, dimension, student, (16 more...)

2412.04185

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
Europe > Finland > Southwest Finland > Turku (0.04)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting (0.68)
Education > Assessment & Standards (0.49)
Education > Curriculum (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Text Change Detection in Multilingual Documents Using Image Comparison

Park, Doyoung, Yarram, Naresh Reddy, Kim, Sunjin, Kim, Minkyu, Cho, Seongho, Lee, Taehee

Document comparison typically relies on optical character recognition (OCR) as its core technology. However, OCR requires the selection of appropriate language models for each document and the performance of multilingual or hybrid models remains limited. To overcome these challenges, we propose text change detection (TCD) using an image comparison model tailored for multilingual documents. Unlike OCR-based approaches, our method employs word-level text image-to-image comparison to detect changes. Our model generates bidirectional change segmentation maps between the source and target documents. To enhance performance without requiring explicit text alignment or scaling preprocessing, we employ correlations among multi-scale attention features. We also construct a benchmark dataset comprising actual printed and scanned word pairs in various languages to evaluate our model. We validate our approach using our benchmark dataset and public benchmarks Distorted Document Images and the LRDE Document Binarization Dataset. We compare our model against state-of-the-art semantic segmentation and change detection models, as well as to conventional OCR-based models.

dataset, detection, recognition, (15 more...)

2412.04137

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

MTMT: Consolidating Multiple Thinking Modes to Form a Thought Tree for Strengthening LLM

Li, Changcheng, Wang, Xiangyu, Chen, Qiuju, Zhou, Xiren, Chen, Huanhuan

Large language models (LLMs) have shown limitations in tasks requiring complex logical reasoning and multi-step problem-solving. To address these challenges, researchers have employed carefully designed prompts and flowcharts, simulating human cognitive processes to enhance LLM performance, such as the Chain of Thought approach. In this paper, we introduce MTMT (Multi-thinking Modes Tree), a novel method that interacts with LLMs to construct a thought tree, simulating various advanced cognitive processes, including but not limited to association, counterfactual thinking, task decomposition, and comparison. By breaking down the original complex task into simpler sub-questions, MTMT facilitates easier problem-solving for LLMs, enabling more effective utilization of the latent knowledge within LLMs. We evaluate the performance of MTMT under different parameter configurations, using GPT-4o mini as the base model. Our results demonstrate that integrating multiple modes of thinking significantly enhances the ability of LLMs to handle complex tasks.

dataset, information, node, (16 more...)

2412.03987

Country:

South America > Venezuela (0.04)
Asia > China (0.04)

Genre:

Research Report > New Finding (0.54)
Research Report > Promising Solution (0.34)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Karabulut, Erkan, Groth, Paul, Degeler, Victoria

Learning Semantic Association Rules from Internet of Things Data

Association Rule Mining (ARM) is the task of discovering commonalities in data in the form of logical implications. ARM is used in the Internet of Things (IoT) for different tasks including monitoring and decision-making. However, existing methods give limited consideration to IoT-specific requirements such as heterogeneity and volume. Furthermore, they do not utilize important static domain-specific description data about IoT systems, which is increasingly represented as knowledge graphs. In this paper, we propose a novel ARM pipeline for IoT data that utilizes both dynamic sensor data and static IoT system metadata. Furthermore, we propose an Autoencoder-based Neurosymbolic ARM method (Aerial) as part of the pipeline to address the high volume of IoT data and reduce the total number of rules that are resource-intensive to process. Aerial learns a neural representation of a given data and extracts association rules from this representation by exploiting the reconstruction (decoding) mechanism of an autoencoder. Extensive evaluations on 3 IoT datasets from 2 domains show that ARM on both static and dynamic IoT data results in more generically applicable rules while Aerial can learn a more concise set of high-quality association rules than the state-of-the-art with full coverage over the datasets.

algorithm, association rule, dataset, (13 more...)

2412.03417

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > Canada > Ontario > Kingston (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Smart Houses & Appliances (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)

HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation

Chen, Yuhan, Lv, Ang, Luan, Jian, Wang, Bin, Liu, Wei

Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term decay is outdated in the era of LLMs, as LLMs are now applied to tasks demanding precise retrieval of in-context information from arbitrary positions. Firstly, we present empirical analyses on various PEs, demonstrating that models inherently learn attention with only a local-decay pattern while forming a U-shape pattern globally, contradicting the principle of long-term decay. Furthermore, we conduct a detailed analysis of rotary position encoding (RoPE, a prevalent relative positional encoding in LLMs), and found that the U-shape attention is caused by some learned components, which are also the key factor limiting RoPE's expressiveness and extrapolation.Inspired by these insights, we propose High-frequency rotary Position Encoding (HoPE). HoPE replaces the specific components in RoPE with position-independent ones, retaining only high-frequency signals, which also breaks the principle of long-term decay in theory. HoPE achieves two major advantages: (1) Without constraints imposed by long-term decay, contradictory factors that limit spontaneous attention optimization and model extrapolation performance are removed. (2) Components representing positions and semantics are are optimized. These enhances model's context awareness and extrapolation, as validated by extensive experiments.

attention pattern, extrapolation, zhang, (16 more...)

2410.21216

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.04)
Asia > China (0.04)
Asia > British Indian Ocean Territory > Diego Garcia (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts

Ge, Suyu, Lin, Xihui, Zhang, Yunan, Han, Jiawei, Peng, Hao

Training and serving long-context large language models (LLMs) incurs substantial overhead. To address this, two critical steps are often required: a pretrained LLM typically undergoes a separate stage for context length extension by training on long-context data, followed by architectural modifications to reduce the overhead of KV cache during serving. This paper argues that integrating length extension with a GPU-friendly KV cache reduction architecture not only reduces training overhead during length extension, but also achieves better long-context performance. This leads to our proposed LongGen, which finetunes a pretrained LLM into an efficient architecture during length extension. LongGen builds on three key insights: (1) Sparse attention patterns, such as window attention (attending to recent tokens), attention sink (initial ones), and blockwise sparse attention (strided token blocks) are well-suited for building efficient long-context models, primarily due to their GPU-friendly memory access patterns, enabling efficiency gains not just theoretically but in practice as well. (2) It is essential for the model to have direct access to all tokens. A hybrid architecture with 1/3 full attention layers and 2/3 efficient ones achieves a balanced trade-off between efficiency and long-context performance. (3) Lightweight training on 5B long-context data is sufficient to extend the hybrid model's context length from 4K to 128K. We evaluate LongGen on both Llama-2 7B and Llama-2 70B, demonstrating its effectiveness across different scales. During training with 128K-long contexts, LongGen achieves 1.55x training speedup and reduces wall-clock time by 36%, compared to a full-attention baseline. During inference, LongGen reduces KV cache memory by 62%, achieving 1.67x prefilling speedup and 1.41x decoding speedup.

architecture, arxiv preprint arxiv, language model, (15 more...)

2410.01485

Country:

Europe > Austria > Vienna (0.15)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Suwanwimolkul, Suwichaya, Tongamrak, Natanon, Thungka, Nuttamon, Hoonchareon, Naebboon, Songsiri, Jitkomut

Developing a Thailand solar irradiance map using Himawari-8 satellite imageries and deep learning models

Thailand has targeted to achieve carbon neutrality by 2050 when the power grid will need to accommodate 50% share of renewable electricity generation capacity; see [Ene21]. The most recent draft of Power Development Plan 2024 (PDP2024) for 2024 - 2037 from [Ene24] proposes to add a new solar generation capacity of approximately 24,400 MWp (more than 4 times the amount issued in the previous Alternative Energy Development Plan 2015-2036 (AEDP2015) at 6,000 MWp, shown in [Dep15, p.9]. This amount does not yet include behind-the-meter, self-generation solar installed capacities of the prosumers, which is expected to increase at an accelerating rate. Solar integration into the power grid with such a sharprising amount will pose technical challenges to the operation and control of the transmission and distribution networks, carried out by the transmission system operator (TSO) and distribution system operator (DSO), as presented in [OB16]. Hence, TSO in Thailand will need an effective means to estimate the solar power generation across the entire transmission network, on an hourly basis, or even finer time resolution, to provide economic hour-to-hour generation dispatch for load following the total net load of the transmission, and to prepare sufficient system flexibility (i.e., ramp-rate capability of the thermal and hydropower plants, or energy storage systems) to cope with the net load fluctuation due to solar generation intermittency for maintaining system frequency stability, concurrently, in its operation. For DSO, a significant amount of reverse power flow when self-generation from solar exceeds self-consumption can lead to technical concerns of voltage regulation and equipment overloading problems. The near real-time estimation of solar generation in each distribution area will enable DSO to activate proper network switching or reconfiguring to mitigate such fundamental concerns to ensure its reliable operation.

information, irradiance, thailand, (14 more...)

2409.1632

Country:

North America > United States (0.67)
Oceania > Australia (0.28)
Asia > Middle East > UAE (0.14)
(42 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Renewable > Solar (1.00)
Energy > Power Industry (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.50)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Shin, Changho, Cooper, John, Sala, Frederic

Weak-to-Strong Generalization Through the Data-Centric Lens

arXiv.org Machine LearningDec-5-2024

The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While decades of research have resulted in numerous algorithms that produce strong empirical performance, understanding what aspects of data enable weak-to-strong generalization has been understudied. We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density. Intuitively, generalization tracks the number of points that contain overlaps, i.e., both easy patterns (learnable by a weak model) and challenging patterns (only learnable by a stronger model), as with such points, weak predictions can be used to learn challenging patterns by stronger models. We provide a practical overlap detection algorithm to find such points in datasets and leverage them to learn, among multiple sources of data, which to query when seeking to maximize overlap density and thereby enhance weak-to-strong generalization. We present a theoretical result showing that the generalization benefit is a function of the overlap density and a regret bound for our data selection algorithm. Empirically, we validate the mechanism and the overlap detection algorithm on a wide array of settings.

hard only, overlap, weak-to-strong generalization, (15 more...)

arXiv.org Machine Learning

2412.03881

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)