AITopics

The rapid advancement of large language models (LLMs) has opened new boundaries in the extraction and synthesis of medical knowledge, particularly within evidence synthesis. This paper reviews the state-of-the-art applications of LLMs in the biomedical domain, exploring their effectiveness in automating complex tasks such as evidence synthesis and data extraction from a biomedical corpus of documents. While LLMs demonstrate remarkable potential, significant challenges remain, including issues related to hallucinations, contextual understanding, and the ability to generalize across diverse medical tasks. We highlight critical gaps in the current research literature, particularly the need for unified benchmarks to standardize evaluations and ensure reliability in real-world applications. In addition, we propose directions for future research, emphasizing the integration of state-of-the-art techniques such as retrieval-augmented generation (RAG) to enhance LLM performance in evidence synthesis. By addressing these challenges and utilizing the strengths of LLMs, we aim to improve access to medical literature and facilitate meaningful discoveries in healthcare.

large language model, machine learning, natural language, (19 more...)

2412.03531

Country: South America > Brazil > São Paulo (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.71)
Health & Medicine > Therapeutic Area > Immunology (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Gutiérrez-Zaballa, Jon, Basterretxea, Koldo, Echanobe, Javier

Evaluating Single Event Upsets in Deep Neural Networks for Semantic Segmentation: an embedded system perspective

As the deployment of artifical intelligence (AI) algorithms at edge devices becomes increasingly prevalent, enhancing the robustness and reliability of autonomous AI-based perception and decision systems is becoming as relevant as precision and performance, especially in applications areas considered safety-critical such as autonomous driving and aerospace. This paper delves into the robustness assessment in embedded Deep Neural Networks (DNNs), particularly focusing on the impact of parameter perturbations produced by single event upsets (SEUs) on convolutional neural networks (CNN) for image semantic segmentation. By scrutinizing the layer-by-layer and bit-by-bit sensitivity of various encoder-decoder models to soft errors, this study thoroughly investigates the vulnerability of segmentation DNNs to SEUs and evaluates the consequences of techniques like model pruning and parameter quantization on the robustness of compressed models aimed at embedded implementations. The findings offer valuable insights into the mechanisms underlying SEU-induced failures that allow for evaluating the robustness of DNNs once trained in advance. Moreover, based on the collected data, we propose a set of practical lightweight error mitigation techniques with no memory or computational cost suitable for resource-constrained deployments. The code used to perform the fault injection (FI) campaign is available at https://github.com/jonGuti13/TensorFI2 , while the code to implement proposed techniques is available at https://github.com/jonGuti13/parameterProtection .

artificial intelligence, deep learning, machine learning, (20 more...)

doi: 10.1016/j.sysarc.2024.103242

2412.0363

Country:

Europe > Spain > Basque Country (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > British Columbia (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ocheja, Patrick, Flanagan, Brendan, Dai, Yiling, Ogata, Hiroaki

How Good is ChatGPT in Giving Adaptive Guidance Using Knowledge Graphs in E-Learning Environments?

E-learning environments are increasingly harnessing large language models (LLMs) like GPT-3.5 and GPT-4 for tailored educational support. This study introduces an approach that integrates dynamic knowledge graphs with LLMs to offer nuanced student assistance. By evaluating past and ongoing student interactions, the system identifies and appends the most salient learning context to prompts directed at the LLM. Central to this method is the knowledge graph's role in assessing a student's comprehension of topic prerequisites. Depending on the categorized understanding (good, average, or poor), the LLM adjusts its guidance, offering advanced assistance, foundational reviews, or in-depth prerequisite explanations, respectively. Preliminary findings suggest students could benefit from this tiered support, achieving enhanced comprehension and improved task outcomes. However, several issues related to potential errors arising from LLMs were identified, which can potentially mislead students. This highlights the need for human intervention to mitigate these risks. This research aims to advance AI-driven personalized learning while acknowledging the limitations and potential pitfalls, thus guiding future research in technology and data-driven education.

large language model, machine learning, natural language, (20 more...)

2412.03856

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
South America > Uruguay > Maldonado > Maldonado (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(3 more...)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.88)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Blanco, Víctor, Japón, Alberto, Puerto, Justo, Zhang, Peter

Optimal probabilistic feature shifts for reclassification in tree ensembles

arXiv.org Machine LearningDec-4-2024

In this paper we provide a novel mathematical optimization based methodology to perturb the features of a given observation to be re-classified, by a tree ensemble classification rule, to a certain desired class. The method is based on these facts: the most viable changes for an observation to reach the desired class do not always coincide with the closest distance point (in the feature space) of the target class; individuals put effort on a few number of features to reach the desired class; and each individual is endowed with a probability to change each of its features to a given value, which determines the overall probability of changing to the target class. Putting all together, we provide different methods to find the features where the individuals must exert effort to maximize the probability to reach the target class. Our method also allows us to rank the most important features in the tree-ensemble. The proposed methodology is tested on a real dataset, validating the proposal.

classification, feature shift, probability, (15 more...)

arXiv.org Machine Learning

2412.03722

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
South America > Peru (0.04)
South America > Colombia (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Consumer Health (1.00)
Transportation (0.93)
Government (0.68)
Education > Health & Safety > School Nutrition (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.46)

Mackraz, Natalie, Sivakumar, Nivedha, Khorshidi, Samira, Patel, Krishna, Theobald, Barry-John, Zappella, Luca, Apostoloff, Nicholas

Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models

Large language models (LLMs) are increasingly being adapted to achieve task-specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness to find that fairness in pre-trained masked language models have limited effect on the fairness of models when adapted using fine-tuning. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible, and compute-efficient way to deploy models in real-world systems. In contrast to previous works, we establish that intrinsic biases in pre-trained Mistral, Falcon and Llama models are strongly correlated (rho >= 0.94) with biases when the same models are zero- and few-shot prompted, using a pronoun co-reference resolution task. Further, we find that bias transfer remains strongly correlated even when LLMs are specifically prompted to exhibit fair or biased behavior (rho >= 0.92), and few-shot length and stereotypical composition are varied (rho >= 0.97). Our findings highlight the importance of ensuring fairness in pre-trained LLMs, especially when they are later used to perform downstream tasks via prompt adaptation.

bias transfer, fairness, right option, (12 more...)

2412.03537

Country:

North America > United States (0.14)
South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Government (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Zheng, Ge, Brintrup, Alexandra

Enhancing Supply Chain Visibility with Generative AI: An Exploratory Case Study on Relationship Prediction in Knowledge Graphs

A key stumbling block in effective supply chain risk management for companies and policymakers is a lack of visibility on interdependent supply network relationships. Relationship prediction, also called link prediction is an emergent area of supply chain surveillance research that aims to increase the visibility of supply chains using data-driven techniques. Existing methods have been successful for predicting relationships but struggle to extract the context in which these relationships are embedded - such as the products being supplied or locations they are supplied from. Lack of context prevents practitioners from distinguishing transactional relations from established supply chain relations, hindering accurate estimations of risk. In this work, we develop a new Generative Artificial Intelligence (Gen AI) enhanced machine learning framework that leverages pre-trained language models as embedding models combined with machine learning models to predict supply chain relationships within knowledge graphs. By integrating Generative AI techniques, our approach captures the nuanced semantic relationships between entities, thereby improving supply chain visibility and facilitating more precise risk management. Using data from a real case study, we show that GenAI-enhanced link prediction surpasses all benchmarks, and demonstrate how GenAI models can be explored and effectively used in supply chain risk management.

knowledge graph, prediction, quintuplet, (12 more...)

2412.0339

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Brazil (0.04)
(26 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Automobiles & Trucks > Manufacturer (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Singh, Shivalika, Romanou, Angelika, Fourrier, Clémentine, Adelani, David I., Ngui, Jian Gang, Vila-Suero, Daniel, Limkonchotiwat, Peerat, Marchisio, Kelly, Leong, Wei Qi, Susanto, Yosephine, Ng, Raymond, Longpre, Shayne, Ko, Wei-Yin, Smith, Madeline, Bosselut, Antoine, Oh, Alice, Martins, Andre F. T., Choshen, Leshem, Ippolito, Daphne, Ferrante, Enzo, Fadaee, Marzieh, Ermis, Beyza, Hooker, Sara

Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks. These biases stem not only from language but also from the cultural knowledge required to interpret questions, reducing the practical utility of translated datasets like MMLU. Furthermore, translation often introduces artifacts that can distort the meaning or clarity of questions in the target language. A common practice in multilingual evaluation is to rely on machine-translated evaluation sets, but simply translating a dataset is insufficient to address these challenges. In this work, we trace the impact of both of these issues on multilingual evaluations and ensuing model performances. Our large-scale evaluation of state-of-the-art open and proprietary models illustrates that progress on MMLU depends heavily on learning Western-centric concepts, with 28% of all questions requiring culturally sensitive knowledge. Moreover, for questions requiring geographic knowledge, an astounding 84.9% focus on either North American or European regions. Rankings of model evaluations change depending on whether they are evaluated on the full portion or the subset of questions annotated as culturally sensitive, showing the distortion to model rankings when blindly relying on translated MMLU. We release Global-MMLU, an improved MMLU with evaluation coverage across 42 languages -- with improved overall quality by engaging with compensated professional and community annotators to verify translation quality while also rigorously evaluating cultural biases present in the original dataset. This comprehensive Global-MMLU set also includes designated subsets labeled as culturally sensitive and culturally agnostic to allow for more holistic, complete evaluation.

dataset, evaluation, knowledge, (14 more...)

2412.03304

Country:

Europe > United Kingdom (0.14)
North America > Central America (0.14)
Asia > Middle East > Israel (0.04)
(54 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Law (1.00)
Government (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Shao, Minghao, Basit, Abdul, Karri, Ramesh, Shafique, Muhammad

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models far exceed the complexity of conventional neural networks, often encompassing dozens of neural network layers and containing billions to trillions of parameters. They are typically trained on vast datasets, utilizing architectures based on transformer blocks. Present-day LLMs are multi-functional, capable of performing a range of tasks from text generation and language translation to question answering, as well as code generation and analysis. An advanced subset of these models, known as Multimodal Large Language Models (MLLMs), extends LLM capabilities to process and interpret multiple data modalities, including images, audio, and video. This enhancement empowers MLLMs with capabilities like video editing, image comprehension, and captioning for visual content. This survey provides a comprehensive overview of the recent advancements in LLMs. We begin by tracing the evolution of LLMs and subsequently delve into the advent and nuances of MLLMs. We analyze emerging state-of-the-art MLLMs, exploring their technical features, strengths, and limitations. Additionally, we present a comparative analysis of these models and discuss their challenges, potential limitations, and prospects for future development.

arxiv preprint arxiv, language model, wang, (12 more...)

doi: 10.1109/ACCESS.2024.3482107

2412.0322

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(8 more...)

Genre:

Research Report > Promising Solution (0.92)
Overview > Innovation (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Large Language Models show both individual and collective creativity comparable to humans

Sun, Luning, Yuan, Yuzhuo, Yao, Yuan, Li, Yanyan, Zhang, Hao, Xie, Xing, Wang, Xiting, Luo, Fang, Stillwell, David

Artificial intelligence has, so far, largely automated routine tasks, but what does it mean for the future of work if Large Language Models (LLMs) show creativity comparable to humans? To measure the creativity of LLMs holistically, the current study uses 13 creative tasks spanning three domains. We benchmark the LLMs against individual humans, and also take a novel approach by comparing them to the collective creativity of groups of humans. We find that the best LLMs (Claude and GPT-4) rank in the 52nd percentile against humans, and overall LLMs excel in divergent thinking and problem solving but lag in creative writing. When questioned 10 times, an LLM's collective creativity is equivalent to 8-10 humans. When more responses are requested, two additional responses of LLMs equal one extra human. Ultimately, LLMs, when optimally applied, may compete with a small group of humans in the future of work.

creativity, llm, task 1, (15 more...)

2412.03151

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > China > Beijing > Beijing (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.45)

Industry: Banking & Finance (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Preference-based opponent shaping in differentiable games

Qiao, Xinyu, Hu, Yudong, Han, Congying, Wu, Weiyan, Guo, Tiande

Multi-agent reinforcement learning (MARL), as a theoretical framework for modeling agent behavior in complex game environments, has become a significant area of research [42, 37]. Unlike traditional game theory, MARL typically allows agents to learn strategies through repeated interactions to achieve equilibrium [34]. By relaxing the assumptions of agent rationality and independence, MARL can learn strategies efficiently with arbitrary environments and opponents [10, 20, 17]. Current applications of MARL in game environments are primarily focused on zero-sum games (fully competitive) [10, 41] and fully cooperative games [12, 38], since the behavioral preferences of opponent agents in these environments are relatively easy to predict. Nevertheless, the environments in practical applications, e.g., economic markets, robotics and distributed control, may have multiple equilibrium [16, 40], and opponent agents may not exhibit clear preferences for different strategies, thus agents need to learn strategies in general-sum games [8, 7]. The Prisoner's dilemma [3, 14] is a classic example of the tension between mutual cooperation leading to a win-win situation and focusing solely on self-interest leading to a lose-lose situation. Therefore, modeling and shaping the behavior of opponent agents is the main challenge for the application of MARL in these environments [11]. Recent advancements in MARL have introduced opponent modeling and shaping techniques that allow agents to learn not just their own strategies, but also to predict and influence the strategies of the opponent, such as [10, 20, 36]. These methods show promise in improving the efficiency of strategy learning by incorporating the behavior of other agents into the learning process.

agent, algorithm, loss function, (16 more...)

2412.03072

Country:

Asia > China (0.04)
South America > Brazil > São Paulo (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Games > Computer Games (0.89)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)