AITopics

The field of text privatization often leverages the notion of $\textit{Differential Privacy}$ (DP) to provide formal guarantees in the rewriting or obfuscation of sensitive textual data. A common and nearly ubiquitous form of DP application necessitates the addition of calibrated noise to vector representations of text, either at the data- or model-level, which is governed by the privacy parameter $\varepsilon$. However, noise addition almost undoubtedly leads to considerable utility loss, thereby highlighting one major drawback of DP in NLP. In this work, we introduce a new sentence infilling privatization technique, and we use this method to explore the effect of noise in DP text rewriting. We empirically demonstrate that non-DP privatization techniques excel in utility preservation and can find an acceptable empirical privacy-utility trade-off, yet cannot outperform DP methods in empirical privacy protections. Our results highlight the significant impact of noise in current DP rewriting mechanisms, leading to a discussion of the merits and challenges of DP in NLP, as well as the opportunities that non-DP methods present.
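The noise-addition step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Laplace mechanism applied to an embedding vector clipped to a fixed L1 norm; the function names, the clipping bound, and the choice of Laplace noise are illustrative assumptions and not the mechanisms evaluated in the paper.

```python
import random


def clip_l1(vec, clip_norm):
    """Scale vec so that its L1 norm is at most clip_norm."""
    norm = sum(abs(x) for x in vec)
    if norm <= clip_norm or norm == 0.0:
        return list(vec)
    return [x * clip_norm / norm for x in vec]


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_vector(vec, epsilon, clip_norm=1.0, rng=None):
    """Hypothetical helper: clip a vector, then add noise calibrated to epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    clipped = clip_l1(vec, clip_norm)
    # Two clipped vectors differ by at most 2 * clip_norm in L1 norm,
    # so the Laplace scale is that sensitivity divided by epsilon.
    scale = 2.0 * clip_norm / epsilon
    return [x + laplace_noise(scale, rng) for x in clipped]
```

Because the noise scale is inversely proportional to $\varepsilon$, a smaller $\varepsilon$ (stronger privacy) perturbs the representation more, which is one way to picture the utility loss the abstract refers to.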

large language model, machine learning, natural language

2501.19022

Country:

North America > United States > Washington > King County > Seattle (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Media > Music (1.00)
Leisure & Entertainment > Sports > Soccer (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Roshanzamir, Mohamad, Alizadehsani, Roohallah, Zarepur, Ehsan, Mohammadifard, Noushin, Nouri, Fatemeh, Roshanzamir, Mahdi, Khosravi, Alireza, Nouhi, Fereidoon, Sarrafzadegan, Nizal

A machine learning approach for Premature Coronary Artery Disease Diagnosis according to Different Ethnicities in Iran

Premature coronary artery disease (PCAD) refers to the early onset of the disease, usually before the age of 55 for men and 65 for women. Coronary Artery Disease (CAD) develops when coronary arteries, the major blood vessels supplying the heart with blood, oxygen, and nutrients, become clogged or diseased. This is often due to many risk factors, including lifestyle and cardiometabolic ones, but few studies were done on ethnicity as one of these risk factors, especially in PCAD. In this study, we tested the rank of ethnicity among the major risk factors of PCAD, including age, gender, body mass index (BMI), visceral obesity presented as waist circumference (WC), diabetes mellitus (DM), high blood pressure (HBP), high low-density lipoprotein cholesterol (LDL-C), and smoking in a large national sample of patients with PCAD from different ethnicities. All patients who met the age criteria underwent coronary angiography to confirm CAD diagnosis. The weight of ethnicity was compared to the other eight features using feature weighting algorithms in PCAD diagnosis. In addition, we conducted an experiment where we ran predictive models (classification algorithms) to predict PCAD. We compared the performance of these models under two conditions: we trained the classification algorithms, including or excluding ethnicity. This study analyzed various factors to determine their predictive power influencing PCAD prediction. Among these factors, gender and age were the most significant predictors, with ethnicity being the third most important. The results also showed that if ethnicity is used as one of the input risk factors for classification algorithms, it can improve their efficiency. Our results show that ethnicity ranks as an influential factor in predicting PCAD. Therefore, it needs to be addressed in the PCAD diagnostic and preventive measures.

artificial intelligence, deep learning, machine learning

2501.18893

Country:

North America > United States > California (0.14)
Asia > Middle East > Iran > Isfahan Province > Isfahan (0.05)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Chan, Brian J, Cheng, Jui-Hung, Huang, Mao Xun, Chen, Chao-Ting, Huang, Hen-Hsen

Efficient Beam Search for Large Language Models Using Trie-Based Decoding

In Transformer-based sequence-to-sequence generation, beam search has proven effective in enhancing the quality of generated sequences compared to greedy decoding. Conventional beam search methods typically adopt either a sequential or batch-based approach. The sequential approach, while memory-efficient, requires multiple decoding passes to construct a complete search tree, leading to significantly slower inference. On the other hand, the batch-based approach enables parallel computation across beams, but at the expense of high memory consumption due to the need to maintain separate key-value (KV) caches for each beam. In this study, we introduce a novel trie (prefix-tree)-based parallel decoding method that addresses the memory inefficiency of batch-based beam search. By sharing a single KV cache among all beams that share the same prefix, the proposed method not only reduces memory consumption dramatically but also enables parallel decoding across all branches. This innovative use of a prefix tree offers an efficient alternative for beam search, achieving significant memory savings while preserving inference speed, making it particularly well-suited for memory-constrained environments or large-scale model deployments.
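The prefix-sharing idea in the abstract can be sketched as follows. This is a toy illustration, not the authors' implementation: each trie node stands in for one token whose key-value entry would be cached once, and the names `TrieNode`, `build_trie`, and `count_cached_tokens` are hypothetical.

```python
class TrieNode:
    """One decoded token; a real decoder would attach that token's KV entry here."""

    def __init__(self, token=None):
        self.token = token
        self.children = {}


def build_trie(beams):
    """Insert every beam (a list of token ids) into a shared prefix tree."""
    root = TrieNode()
    for beam in beams:
        node = root
        for token in beam:
            if token not in node.children:
                node.children[token] = TrieNode(token)
            node = node.children[token]
    return root


def count_cached_tokens(node):
    """Count non-root nodes, i.e. KV entries stored when prefixes are shared."""
    return sum(1 + count_cached_tokens(child) for child in node.children.values())


beams = [[1, 2, 3], [1, 2, 4], [1, 5, 6]]
shared = count_cached_tokens(build_trie(beams))  # entries with prefix sharing
separate = sum(len(beam) for beam in beams)      # entries with one cache per beam
```

In this example the three beams share the prefix `[1]` and two of them share `[1, 2]`, so the trie holds fewer entries than three independent per-beam caches would.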

large language model, machine learning, natural language

2502.00085

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Superhuman AI Disclosure: Impacts on Toxicity, Fairness, and Trust Vary by Expertise and Persona Attributes

Chua, Jaymari, Wang, Chen, Yao, Lina

As artificial intelligence demonstrates performance surpassing that of humans across real-world tasks, disclosing superhuman capabilities poses challenges for fairness, accountability, and trust. To investigate how transparency impacts attitudes and perceptions, we introduce a grounded and validated set of synthetic personas reflecting diverse fairness concerns and technology acceptance levels. Then we evaluate responses in two contrasting domains: (1) a competitive player in StarCraft II, where strategy and high-skill gameplay often elicit toxic interactions, and (2) a cooperative personal assistant providing information. Across numerous interactions spanning persona profiles, we test non-disclosure versus explicit superhuman labelling under controlled game outcomes and usage contexts. Our findings reveal sharp domain-specific effects: in StarCraft II, when the AI was explicitly labelled as superhuman, novice personas who learned of it reported lower toxicity and higher fairness, attributing defeat to advanced skill rather than hidden cheating, whereas expert personas found the disclosure statements irksome but still less deceptive than non-disclosure. Conversely, in the LLM-as-personal-assistant setting, disclosure of superhuman capabilities improved perceived trustworthiness, though it risked AI overreliance among certain persona segments. We release Dataset X, containing persona cards, including profile attributes, disclosure prompts, and detailed interaction logs, accompanied by reproducible protocols and disclaimers for adapting them to diverse tasks. Our results demonstrate that transparency is not a cure-all: while it reduces suspicion and enhances trust in cooperative contexts, it may inflame resistance or disappointment in competitive domains.

disclosure, fairness, persona

2503.15514

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Science Jan-30-2025, 19:00:00 GMT

News at a glance: Trump turmoil, New Zealand's funding overhaul, and an AI expert tripped by AI

Following through on his vows to shake up the U.S. government, President Donald Trump's new administration quickly issued a flurry of executive orders and other decisions, some with big implications for research and global health, sowing worry and confusion among many scientists. The White House this week proposed--and 2 days later rescinded--an unprecedented order to freeze huge chunks of federal spending, including research grants. The 27 January budget memo directed political appointees at every agency to decide whether the funds "conform with administrative priorities" as spelled out in a slew of executive orders Trump has issued since taking office. Despite withdrawing the memo, the White House said agencies must still comply with the executive orders, which ban support for programs that include promoting "Marxist equity, transgenderism, and Green New Deal social engineering policies." A federal judge had already temporarily halted implementation of the memo, which generated a public outcry.

executive order, funding overhaul, trump

Science

Country:

North America > United States (1.00)
Oceania > New Zealand (0.40)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence (0.66)

A binary PSO based ensemble under-sampling model for rebalancing imbalanced training data

Li, Jinyan, Wu, Yaoyang, Fong, Simon, Tallón-Ballesteros, Antonio J., Yang, Xin-she, Mohammed, Sabah, Wu, Feng

Ensemble learning and under-sampling are both effective techniques for classifying imbalanced datasets. In this paper, we propose a novel ensemble method that combines the advantages of ensemble learning for biasing classifiers with a new under-sampling method. The under-sampling method, named Binary PSO instance selection, works together with ensemble classifiers to find the most suitable size and combination of majority class samples to build a new dataset with the minority class samples. The proposed method adopts a multi-objective strategy; its contribution is a notable improvement in imbalanced classification performance while preserving the integrity of the original dataset as far as possible. We evaluated the proposed method and compared its performance on imbalanced datasets with several conventional basic ensemble methods. Experiments were also conducted on these imbalanced datasets using an improved version in which the ensemble classifiers are wrapped in the Binary PSO instance selection. According to the experimental results, our proposed methods outperform single ensemble methods, state-of-the-art under-sampling methods, and combinations of these methods with the traditional PSO instance selection algorithm.
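As a rough illustration of binary PSO instance selection, the sketch below shows a single particle update in the standard binary PSO formulation, where each bit of the position marks whether one majority-class sample is kept. The coefficients, the function names, and the omission of the fitness evaluation (which in the paper involves ensemble classifiers) are assumptions made for brevity; this is not the authors' algorithm.

```python
import math
import random


def sigmoid(v):
    """Logistic transfer function; the clamp avoids overflow in math.exp."""
    v = max(-60.0, min(60.0, v))
    return 1.0 / (1.0 + math.exp(-v))


def update_particle(position, velocity, personal_best, global_best, rng,
                    inertia=0.7, c1=1.5, c2=1.5):
    """One binary PSO step: update velocities, then resample each bit."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        v = (inertia * v
             + c1 * rng.random() * (p - x)
             + c2 * rng.random() * (g - x))
        new_velocity.append(v)
        # The bit becomes 1 with probability sigmoid(v).
        new_position.append(1 if rng.random() < sigmoid(v) else 0)
    return new_position, new_velocity


def select_majority(majority_samples, mask):
    """Keep the majority-class samples whose mask bit is 1."""
    return [s for s, keep in zip(majority_samples, mask) if keep]
```

A full run would score each mask with a fitness function (for example, classifier performance on the rebalanced data), track personal and global bests, and repeat the update until a stopping criterion is met.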

artificial intelligence, evolutionary algorithm, machine learning

2502.01655

Country:

Asia > Macao (0.04)
Oceania > Australia (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Omer, Skala Kamaran, Hassani, Hossein

Idiom Detection in Sorani Kurdish Texts

Idiom detection using Natural Language Processing (NLP) is the computerized process of recognizing figurative expressions within a text that convey meanings beyond the literal interpretation of the words. While idiom detection has seen significant progress across various languages, the Kurdish language faces a considerable research gap in this area despite the importance of idioms in tasks like machine translation and sentiment analysis. This study addresses idiom detection in Sorani Kurdish by approaching it as a text classification task using deep learning techniques. To tackle this, we developed a dataset containing 10,580 sentences embedding 101 Sorani Kurdish idioms across diverse contexts. Using this dataset, we developed and evaluated three deep learning models: KuBERT-based transformer sequence classification, a Recurrent Convolutional Neural Network (RCNN), and a BiLSTM model with an attention mechanism. The evaluations revealed that the transformer model, the fine-tuned BERT, consistently outperformed the others, achieving nearly 99% accuracy while the RCNN achieved 96.5% and the BiLSTM 80%. These results highlight the effectiveness of Transformer-based architectures in low-resource languages like Kurdish. This research provides a dataset, three optimized models, and insights into idiom detection, laying a foundation for advancing Kurdish NLP.

artificial intelligence, machine learning, natural language

2501.14528

Country:

Asia > Middle East > Iraq > Kurdistan Region (0.14)
Europe > Germany > Berlin (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Evolving Hard Maximum Cut Instances for Quantum Approximate Optimization Algorithms

Pan, Shuaiqun, Patel, Yash J., Neumann, Aneta, Neumann, Frank, Bäck, Thomas, Wang, Hao

Variational quantum algorithms, such as the Recursive Quantum Approximate Optimization Algorithm (RQAOA), have become increasingly popular, offering promising avenues for employing Noisy Intermediate-Scale Quantum devices to address challenging combinatorial optimization tasks like the maximum cut problem. In this study, we utilize an evolutionary algorithm equipped with a unique fitness function. This approach targets hard maximum cut instances within the latent space of a Graph Autoencoder, identifying those that pose significant challenges or are particularly tractable for RQAOA, in contrast to the classic Goemans and Williamson algorithm. Our findings not only delineate the distinct capabilities and limitations of each algorithm but also expand our understanding of RQAOA's operational limits. Furthermore, the diverse set of graphs we have generated serves as a crucial benchmarking asset, emphasizing the need for more advanced algorithms to tackle combinatorial optimization challenges. Additionally, our results pave the way for new avenues in graph generation research, offering exciting opportunities for future explorations.
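For readers unfamiliar with the maximum cut problem, the sketch below computes the cut value of a node partition and finds the exact optimum of a small graph by enumeration. Such an exact reference is one plausible ingredient when comparing approximate solvers on small instances; the abstract does not specify the fitness function used in the paper, and the function names here are hypothetical.

```python
from itertools import product


def cut_value(edges, assignment):
    """Total weight of edges whose endpoints lie on different sides."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def brute_force_max_cut(num_nodes, edges):
    """Exact maximum cut by enumeration; only feasible for small graphs."""
    best_value, best_assignment = 0, (0,) * num_nodes
    # Fix node 0 on side 0: flipping every node leaves the cut unchanged.
    for rest in product((0, 1), repeat=num_nodes - 1):
        assignment = (0,) + rest
        value = cut_value(edges, assignment)
        if value > best_value:
            best_value, best_assignment = value, assignment
    return best_value, best_assignment


# Edges are (u, v, weight) triples on nodes numbered from 0.
triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
four_cycle = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
```

The enumeration visits 2^(n-1) partitions, so it serves only as a ground truth for small graphs, not as a practical solver.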

artificial intelligence, evolutionary algorithm, machine learning

2502.12012

Country:

Europe > Spain > Andalusia > Málaga Province > Málaga (0.05)
Europe > Netherlands > South Holland > Leiden (0.05)
Oceania > Australia > South Australia > Adelaide (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Memory-Efficient Fine-Tuning of Transformers via Token Selection

Simoulin, Antoine, Park, Namyong, Liu, Xiaoyi, Yang, Grey

Fine-tuning provides an effective means to specialize pre-trained models for various downstream tasks. However, fine-tuning often incurs high memory overhead, especially for large transformer-based models, such as LLMs. While existing methods may reduce certain parts of the memory required for fine-tuning, they still require caching all intermediate activations computed in the forward pass to update weights during the backward pass. In this work, we develop TokenTune, a method to reduce memory usage, specifically the memory to store intermediate activations, in the fine-tuning of transformer-based models. During the backward pass, TokenTune approximates the gradient computation by backpropagating through just a subset of input tokens. Thus, with TokenTune, only a subset of intermediate activations are cached during the forward pass. Also, TokenTune can be easily combined with existing methods like LoRA, further reducing the memory cost. We evaluate our approach on pre-trained transformer models with up to billions of parameters, considering the performance on multiple downstream tasks such as text classification and question answering in a few-shot learning setup. Overall, TokenTune achieves performance on par with full fine-tuning or representative memory-efficient fine-tuning methods, while greatly reducing the memory footprint, especially when combined with other methods with complementary memory reduction mechanisms. We hope that our approach will facilitate the fine-tuning of large transformers, in specializing them for specific domains or co-training them with other neural components from a larger system. Our code is available at https://github.com/facebookresearch/tokentune.

large language model, machine learning, natural language

2501.18824

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.05)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Baumeister, Jan, Finkbeiner, Bernd, Scheerer, Frederik, Siber, Julian, Wagenpfeil, Tobias

Stream-Based Monitoring of Algorithmic Fairness

Automatic decision and prediction systems are increasingly deployed in applications where they significantly impact the livelihood of people, such as for predicting the creditworthiness of loan applicants or the recidivism risk of defendants. These applications have given rise to a new class of algorithmic-fairness specifications that require the systems to decide and predict without bias against social groups. Verifying these specifications statically is often out of reach for realistic systems, since the systems may, e.g., employ complex learning components, and reason over a large input space. In this paper, we therefore propose stream-based monitoring as a solution for verifying the algorithmic fairness of decision and prediction systems at runtime. Concretely, we present a principled way to formalize algorithmic fairness over temporal data streams in the specification language RTLola and demonstrate the efficacy of this approach on a number of benchmarks. Besides synthetic scenarios that particularly highlight its efficiency on streams with a scaling amount of data, we notably evaluate the monitor on real-world data from the recidivism prediction tool COMPAS.
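The idea of checking a fairness specification at runtime can be pictured with a small sketch. It is written in Python rather than RTLola, and it assumes demographic parity (the gap between per-group positive-decision rates) as the fairness notion; the class name, the threshold parameter, and the choice of metric are illustrative assumptions, not the paper's formalization.

```python
class DemographicParityMonitor:
    """Tracks per-group positive-decision rates over a stream of decisions."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.total = {}     # decisions seen per group
        self.positive = {}  # positive decisions seen per group

    def observe(self, group, decision):
        """Consume one event and return True if the specification is violated."""
        self.total[group] = self.total.get(group, 0) + 1
        self.positive[group] = self.positive.get(group, 0) + (1 if decision else 0)
        return self.violated()

    def gap(self):
        """Largest difference in positive rates between any two observed groups."""
        rates = [self.positive[g] / self.total[g] for g in self.total]
        return max(rates) - min(rates) if len(rates) > 1 else 0.0

    def violated(self):
        return self.gap() > self.threshold


monitor = DemographicParityMonitor(threshold=0.2)
stream = [("a", True), ("b", True), ("a", True), ("b", False)]
verdicts = [monitor.observe(group, decision) for group, decision in stream]
```

The monitor keeps only constant-size counters per group, so each event is processed in time independent of the stream length, which is the property that makes this style of check usable on long-running streams.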

artificial intelligence, machine learning, specification

2501.18331

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Florida > Broward County (0.04)
North America > United States > Colorado (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.48)

Industry:

Law (0.68)
Information Technology > Security & Privacy (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)