AITopics

Assigning a subset of labels from a fixed pool of labels to a given input text is a text classification problem with many real-world applications, such as in recommender systems. Two separate research streams address this issue. Hierarchical Text Classification (HTC) focuses on datasets with smaller label pools of hundreds of entries, accompanied by a semantic label hierarchy. In contrast, eXtreme Multi-Label Text Classification (XML) considers very large label pools with up to millions of entries, in which the labels are not arranged in any particular manner. However, in XML, a common approach is to construct an artificial hierarchy without any semantic information before or during the training process. Here, we investigate how state-of-the-art models from one domain perform when trained and tested on datasets from the other domain. The HBGL and HGLCR models from the HTC domain are trained and tested on the datasets Wiki10-31K, AmazonCat-13K, and Amazon-670K from the XML domain. On the other side, the XML models CascadeXML and XR-Transformer are trained and tested on the datasets Web of Science, The New York Times Annotated Corpus, and RCV1-V2 from the HTC domain. HTC models, on the other hand, are not equipped to handle the size of XML datasets and achieve poor transfer results. The code and numerous files that are needed to reproduce our results can be obtained from https://github.com/FloHauss/XMC_HTC

classification, dataset, hierarchy, (13 more...)

2411.13687

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
(14 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
(2 more...)

Improved Multi-Task Brain Tumour Segmentation with Synthetic Data Augmentation

Ferreira, André, Jesus, Tiago, Puladi, Behrus, Kleesiek, Jens, Alves, Victor, Egger, Jan

This paper presents the winning solution of task 1 and the third-placed solution of task 3 of the BraTS challenge. The use of automated tools in clinical practice has increased due to the development of more and more sophisticated and reliable algorithms. However, achieving clinical standards and developing tools for real-life scenarios is a major challenge. To this end, BraTS has organised tasks to find the most advanced solutions for specific purposes. In this paper, we propose the use of synthetic data to train state-of-the-art frameworks in order to improve the segmentation of adult gliomas in a post-treatment scenario, and the segmentation of meningioma for radiotherapy planning. Our results suggest that the use of synthetic data leads to more robust algorithms, although the synthetic data generation pipeline is not directly suited to the meningioma task.

dataset, segmentation, tumour, (12 more...)

2411.04632

Country:

Europe > Portugal > Braga > Braga (0.05)
Europe > Austria > Styria > Graz (0.05)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
(4 more...)

Genre: Research Report > New Finding (0.54)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Nuclear Medicine (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Brain Tumour Removing and Missing Modality Generation using 3D WDM

Ferreira, André, Luijten, Gijs, Puladi, Behrus, Kleesiek, Jens, Alves, Victor, Egger, Jan

This paper presents the second-placed solution for task 8 and the participation solution for task 7 of BraTS 2024. The adoption of automated brain analysis algorithms to support clinical practice is increasing. However, many of these algorithms struggle with the presence of brain lesions or the absence of certain MRI modalities. The alterations in the brain's morphology leads to high variability and thus poor performance of predictive models that were trained only on healthy brains. The lack of information that is usually provided by some of the missing MRI modalities also reduces the reliability of the prediction models trained with all modalities. In order to improve the performance of these models, we propose the use of conditional 3D wavelet diffusion models. The wavelet transform enabled full-resolution image training and prediction on a GPU with 48 GB VRAM, without patching or downsampling, preserving all information for prediction. The code for these tasks is available at https://github.com/ShadowTwin41/BraTS_2023_2024_solutions.

modality, removing and missing modality generation, synthesis, (11 more...)

2411.0463

Country:

Europe > Austria > Styria > Graz (0.05)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
South America > Peru > Lima Department > Lima Province > Lima (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition

Kim, Jiyeon, Lee, Hyunji, Cho, Hyowon, Jang, Joel, Hwang, Hyeonbin, Won, Seungpil, Ahn, Youbin, Lee, Dohaeng, Seo, Minjoon

In this work, we investigate how a model's tendency to broadly integrate its parametric knowledge evolves throughout pretraining, and how this behavior affects overall performance, particularly in terms of knowledge acquisition and forgetting. We introduce the concept of knowledge entropy, which quantifies the range of memory sources the model engages with; high knowledge entropy indicates that the model utilizes a wide range of memory sources, while low knowledge entropy suggests reliance on specific sources with greater certainty. Our analysis reveals a consistent decline in knowledge entropy as pretraining advances. We also find that the decline is closely associated with a reduction in the model's ability to acquire and retain knowledge, leading us to conclude that diminishing knowledge entropy (smaller number of active memory sources) impairs the model's knowledge acquisition and retention capabilities. We find further support for this by demonstrating that increasing the activity of inactive memory sources enhances the model's capacity for knowledge acquisition and retention.

knowledge, knowledge acquisition, knowledge entropy, (13 more...)

2410.0138

Country:

North America > Dominican Republic (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

CRISP: Object Pose and Shape Estimation with Test-Time Adaptation

Shi, Jingnan, Talak, Rajat, Zhang, Harry, Jin, David, Carlone, Luca

We consider the problem of estimating object pose and shape from an RGB-D image. Our first contribution is to introduce CRISP, a category-agnostic object pose and shape estimation pipeline. The pipeline implements an encoder-decoder model for shape estimation. It uses FiLM-conditioning for implicit shape reconstruction and a DPT-based network for estimating pose-normalized points for pose estimation. As a second contribution, we propose an optimization-based pose and shape corrector that can correct estimation errors caused by a domain gap. Observing that the shape decoder is well behaved in the convex hull of known shapes, we approximate the shape decoder with an active shape model, and show that this reduces the shape correction problem to a constrained linear least squares problem, which can be solved efficiently by an interior point algorithm. Third, we introduce a self-training pipeline to perform self-supervised domain adaptation of CRISP. The self-training is based on a correct-and-certify approach, which leverages the corrector to generate pseudo-labels at test time, and uses them to self-train CRISP. We demonstrate CRISP (and the self-training) on YCBV, SPE3R, and NOCS datasets. CRISP shows high performance on all the datasets. Moreover, our self-training is capable of bridging a large domain gap. Finally, CRISP also shows an ability to generalize to unseen objects. Code and pre-trained models will be available on https://web.mit.edu/sparklab/research/crisp_object_pose_shape/.

artificial intelligence, estimation, machine learning, (18 more...)

2412.01052

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.24)
South America > Brazil (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Qharabagh, Muhammad Fetrat, Ghofrani, Mohammadreza, Fountoulakis, Kimon

LVLM-COUNT: Enhancing the Counting Ability of Large Vision-Language Models

Counting is a fundamental skill for various visual tasks in real-life applications, requiring both object recognition and robust counting capabilities. Despite their advanced visual perception, large vision-language models (LVLMs) struggle with counting tasks, especially when the number of objects exceeds those commonly encountered during training. We enhance LVLMs' counting abilities using a divide-and-conquer approach, breaking counting problems into sub-counting tasks. Unlike prior methods, which do not generalize well to counting datasets on which they have not been trained, our method performs well on new datasets without any additional training or fine-tuning. We demonstrate that our approach enhances counting capabilities across various datasets and benchmarks.

large language model, lvlm-count, machine learning, (20 more...)

2412.00686

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Antarctica (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

HT-HEDL: High-Throughput Hypothesis Evaluation in Description Logic

Algahtani, Eyad

We present High-Throughput Hypothesis Evaluation in Description Logic (HT-HEDL). HT-HEDL is a high-performance hypothesis evaluation engine that accelerates hypothesis evaluation computations for inductive logic programming (ILP) learners using description logic (DL) for their knowledge representation; in particular, HT-HEDL targets accelerating computations for the $\mathcal{ALCQI}^{\mathcal{(D)}}$ DL language. HT-HEDL aggregates the computing power of multi-core CPUs with multi-GPUs to improve hypothesis computations at two levels: 1) the evaluation of a single hypothesis and 2) the evaluation of multiple hypotheses (i.e., batch of hypotheses). In the first level, HT-HEDL uses a single GPU or a vectorized multi-threaded CPU to evaluate a single hypothesis. In vectorized multi-threaded CPU evaluation, classical (scalar) CPU multi-threading is combined with CPU's extended vector instructions set to extract more CPU-based performance. The experimental results revealed that HT-HEDL increased performance using CPU-based evaluation (on a single hypothesis): from 20.4 folds using classical multi-threading to $\sim85$ folds using vectorized multi-threading. In the GPU-based evaluation, HT-HEDL achieved speedups of up to $\sim38$ folds for single hypothesis evaluation using a single GPU. To accelerate the evaluation of multiple hypotheses, HT-HEDL combines, in parallel, GPUs with multi-core CPUs to increase evaluation throughput (number of evaluated hypotheses per second). The experimental results revealed that HT-HEDL increased evaluation throughput by up to 29.3 folds using two GPUs and up to $\sim44$ folds using two GPUs combined with a CPU's vectorized multi-threaded evaluation.

artificial intelligence, description logic, logic & formal reasoning, (21 more...)

2412.00802

Country:

Asia > Middle East > Saudi Arabia (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Orange County > Newport Beach (0.04)
(7 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Description Logic (1.00)

The Advancement of Personalized Learning Potentially Accelerated by Generative AI

Wei, Yuang, Jiang, Yuan-Hao, Liu, Jiayi, Qi, Changyong, Jia, Rui

The rapid development of Generative AI (GAI) has sparked revolutionary changes across various aspects of education. Personalized learning, a focal point and challenge in educational research, has also been influenced by the development of GAI. To explore GAI's extensive impact on personalized learning, this study investigates its potential to enhance various facets of personalized learning through a thorough analysis of existing research. The research comprehensively examines GAI's influence on personalized learning by analyzing its application across different methodologies and contexts, including learning strategies, paths, materials, environments, and specific analyses within the teaching and learning processes. Through this in-depth investigation, we find that GAI demonstrates exceptional capabilities in providing adaptive learning experiences tailored to individual preferences and needs. Utilizing different forms of GAI across various subjects yields superior learning outcomes. The article concludes by summarizing scenarios where GAI is applicable in educational processes and discussing strategies for leveraging GAI to enhance personalized learning, aiming to guide educators and learners in effectively utilizing GAI to achieve superior learning objectives.

large language model, machine learning, natural language, (15 more...)

2412.00691

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > Middle East > Jordan (0.04)
South America > Suriname > Marowijne District > Albina (0.04)
(4 more...)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Research Report > New Finding (0.93)
Research Report > Promising Solution (0.68)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.86)

Fairstein, Roy, Vilenchik, Dan, Gal, Kobi

Learning Aggregation Rules in Participatory Budgeting: A Data-Driven Approach

Participatory Budgeting (PB) offers a democratic process for communities to allocate public funds across various projects through voting. In practice, PB organizers face challenges in selecting aggregation rules either because they are not familiar with the literature and the exact details of every existing rule or because no existing rule echoes their expectations. This paper presents a novel data-driven approach utilizing machine learning to address this challenge. By training neural networks on PB instances, our approach learns aggregation rules that balance social welfare, representation, and other societal beneficial goals. It is able to generalize from small-scale synthetic PB examples to large, real-world PB instances. It is able to learn existing aggregation rules but also generate new rules that adapt to diverse objectives, providing a more nuanced, compromise-driven solution for PB processes. The effectiveness of our approach is demonstrated through extensive experiments with synthetic and real-world PB data, and can expand the use and deployment of PB solutions.

aggregation rule, artificial intelligence, machine learning, (13 more...)

2412.01864

Country:

South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.88)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Circuit Complexity Bounds for RoPE-based Transformer Architecture

Chen, Bo, Li, Xiaoyu, Liang, Yingyu, Long, Jiangxuan, Shi, Zhenmei, Song, Zhao

Characterizing the express power of the Transformer architecture is critical to understanding its capacity limits and scaling law. Recent works provide the circuit complexity bounds to Transformer-like architecture. On the other hand, Rotary Position Embedding ($\mathsf{RoPE}$) has emerged as a crucial technique in modern large language models, offering superior performance in capturing positional information compared to traditional position embeddings, which shows great potential in application prospects, particularly for the long context scenario. Empirical evidence also suggests that $\mathsf{RoPE}$-based Transformer architectures demonstrate greater generalization capabilities compared to conventional Transformer models. In this work, we establish a circuit complexity bound for Transformers with $\mathsf{RoPE}$ attention. Our key contribution is that we show that unless $\mathsf{TC}^0 = \mathsf{NC}^1$, a $\mathsf{RoPE}$-based Transformer with $\mathrm{poly}(n)$-precision, $O(1)$ layers, hidden dimension $d \leq O(n)$ cannot solve the Arithmetic formula evaluation problem or the Boolean formula value problem. This result significantly demonstrates the fundamental limitation of the expressivity of the $\mathsf{RoPE}$-based Transformer architecture, although it achieves giant empirical success. Our theoretical result not only establishes the complexity bound but also may instruct further work on the $\mathsf{RoPE}$-based Transformer.

large language model, machine learning, natural language, (18 more...)

2411.07602

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > Virginia (0.04)
(5 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)