AITopics

Deep neural networks provide reliable solutions for many classification and regression tasks; however, their application in real-time wireless systems with simple sensor networks is limited due to high energy consumption and significant bandwidth needs. This study proposes a multi-sensor wireless inference system with memristor-based analog computing. Given the sensors' limited computational capabilities, the features from the network's front end are transmitted to a central device where an $L_p$-norm inspired approximation of the maximum operation is employed to achieve transformation-invariant features, enabling efficient over-the-air transmission. We also introduce a trainable over-the-air sensor fusion method based on an $L_p$-norm inspired combining function that customizes sensor fusion to match the network and sensor distribution characteristics, enhancing adaptability. To address the energy constraints of sensors, we utilize memristors, known for their energy-efficient in-memory computing, enabling analog-domain computations that reduce energy use and computational overhead in edge computing. This dual approach of memristors and $L_p$-norm inspired sensor fusion fosters energy-efficient computational and transmission paradigms and serves as a practical energy-efficient solution with minimal performance loss.
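The $L_p$-norm approximation of the maximum mentioned in this abstract can be illustrated with a minimal sketch. It assumes non-negative feature values, and the function name `lp_soft_max` is hypothetical rather than taken from the paper. The relevant property is that the inner sum is additive across sensors, which is presumably what makes it a natural fit for over-the-air aggregation.

```python
def lp_soft_max(values, p):
    """Approximate max(values) for non-negative inputs with the L_p norm.

    (sum_i v_i**p) ** (1/p) is never below the true maximum and
    approaches it as p grows. Hypothetical helper for illustration only.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    if any(v < 0 for v in values):
        raise ValueError("this sketch assumes non-negative features")
    return sum(v ** p for v in values) ** (1.0 / p)


# Toy feature values from three sensors (made-up numbers).
features = [0.2, 0.9, 0.5]
for p in (1, 4, 16, 64):
    print(p, lp_soft_max(features, p))
```

With `p = 1` the function returns the plain sum, and larger `p` moves the result toward the largest feature. A trainable `p` would therefore interpolate between sum-like and max-like fusion.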

artificial intelligence, machine learning, sensor, (19 more...)

doi: 10.1016/j.phycom.2024.102582

2501.10245

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology (0.93)
Energy (0.87)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Rahman, Mohammad Wali Ur, Lin, Yu-Zheng, Weeks, Carter, Ruddell, David, Gabriellini, Jeff, Hayes, Bill, Hariri, Salim, Ziegler, Edward V. Jr

AI/ML Based Detection and Categorization of Covert Communication in IPv6 Network

The flexibility and complexity of IPv6 extension headers allow attackers to create covert channels or bypass security mechanisms, leading to potential data breaches or system compromises. Machine learning has matured into the primary detection technology for mitigating covert communication threats. However, the complexity of detecting covert communication, evolving injection techniques, and the scarcity of data make building machine-learning models challenging. In previous related research, machine learning has shown good performance in detecting covert communications, but those studies relied on oversimplified attack scenarios that do not represent the complexity of modern covert techniques and that make detection easier than it is in practice. To bridge this gap, we analyzed the packet structure and network traffic behavior of IPv6, used encryption algorithms, and injected covert communication without changing network packet behavior, bringing the experiments closer to real attack scenarios. Beyond analyzing and injecting covert communications, this study trains a range of machine learning models to detect these threats, including tree-based ensembles such as random forests and gradient boosting as well as neural network architectures such as CNNs and LSTMs, and achieves detection accuracy of over 90%. This study details the methods used for dataset augmentation and the comparative performance of the applied models, reinforcing insights into the adaptability and resilience of machine learning in IPv6 covert communication detection. We also proposed a Generative AI-assisted interpretation concept based on prompt engineering as a preliminary study of the role of Generative AI agents in covert communication.

artificial intelligence, deep learning, machine learning, (16 more...)

2501.10627

Country: North America > United States > Arizona > Pima County > Tucson (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Iverson, Valentio, Vavasis, Stephen

Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization

Parameter estimation is a fundamental challenge in machine learning, crucial for tasks such as neural network weight fitting and Bayesian inference. This paper focuses on the complexity of estimating, from $n$ samples, the translation $\boldsymbol{\mu} \in \mathbb{R}^l$ and shrinkage $\sigma \in \mathbb{R}_{++}$ parameters of a distribution of the form $\frac{1}{\sigma^l} f_0 \left( \frac{\boldsymbol{x} - \boldsymbol{\mu}}{\sigma} \right)$, where $f_0$ is a known density in $\mathbb{R}^l$. We highlight that while the problem is NP-hard for Maximum Likelihood Estimation (MLE), it is possible to obtain $\varepsilon$-approximations for arbitrary $\varepsilon > 0$ within $\text{poly} \left( \frac{1}{\varepsilon} \right)$ time using the Wasserstein distance.
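For intuition only, the one-dimensional case ($l = 1$) admits a simple sketch. This is not the paper's algorithm. In 1D the squared 2-Wasserstein distance compares quantile functions, so minimizing it over $\mu$ and $\sigma$ reduces to a least-squares fit of the sorted samples against the reference quantiles of $f_0$. The function name `fit_location_scale_w2`, the midpoint quantile grid, and the choice of a standard normal $f_0$ are all assumptions made for this illustration.

```python
import random
from statistics import NormalDist, fmean


def fit_location_scale_w2(samples, ref_inv_cdf):
    """Fit x ~ mu + sigma * z in 1D by matching quantiles (discretized W2).

    samples: observed real values.
    ref_inv_cdf: inverse CDF of the known reference density f_0.
    Hypothetical helper; a quadrature approximation, not an exact W2 solver.
    """
    n = len(samples)
    xs = sorted(samples)
    # Reference quantiles at the midpoints (i + 0.5) / n of the unit interval.
    qs = [ref_inv_cdf((i + 0.5) / n) for i in range(n)]
    q_mean, x_mean = fmean(qs), fmean(xs)
    # Ordinary least squares of sorted samples on reference quantiles.
    cov = sum((q - q_mean) * (x - x_mean) for q, x in zip(qs, xs))
    var = sum((q - q_mean) ** 2 for q in qs)
    sigma = cov / var
    mu = x_mean - sigma * q_mean
    return mu, sigma


# Synthetic data with made-up parameters mu = 3, sigma = 2.
random.seed(0)
data = [random.gauss(3.0, 2.0) for _ in range(2000)]
print(fit_location_scale_w2(data, NormalDist().inv_cdf))
```

The closed form exists only because sorting solves optimal transport on the real line. In higher dimensions no such shortcut applies.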

artificial intelligence, bayesian inference, machine learning, (18 more...)

2501.10172

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)

Accelerating Large Language Models through Partially Linear Feed-Forward Network

Hu, Gansen, Wang, Zhaoguo, Wei, Jinglin, Huang, Wei, Chen, Haibo

Large language models (LLMs) demonstrate remarkable capabilities but face deployment challenges due to their massive parameter counts. While existing compression techniques like pruning can reduce model size, they lead to significant accuracy degradation under high compression ratios. We present a novel perspective inspired by constant folding in compiler optimization. Our approach enables parameter reduction by treating activation functions in LLMs as linear functions. However, recent LLMs use complex non-linear activations like GELU that prevent direct application of this technique. We propose TARDIS, which enables optimization of LLMs with non-linear activations by partially approximating them with linear functions in frequently occurring input ranges. For outlier inputs, TARDIS employs an online predictor to dynamically fall back to original computations. Our experiments demonstrate that TARDIS achieves 80% parameter reduction in feed-forward networks, while significantly outperforming state-of-the-art pruning methods Wanda and RIA with up to 65% higher accuracy. In practical deployments for a 7B model, TARDIS achieves 1.6x end-to-end inference speedup when integrated with the vLLM serving system, and 1.4x speedup with the widely adopted HuggingFace implementation, while incurring only a 10.9% accuracy trade-off.
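The folding idea in this abstract can be sketched for a single hidden unit. This is an illustration under stated assumptions, not the TARDIS implementation: the names `fit_line` and `hybrid_neuron` are hypothetical, the input range is chosen by hand, and a plain range check stands in for the paper's online predictor. Once GELU is replaced by a line `a*x + b` on a range, the two surrounding linear maps collapse into one, as in constant folding.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def fit_line(lo, hi, steps=200):
    """Least-squares line a*x + b for GELU on a uniform grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    ys = [gelu(x) for x in xs]
    mx, my = sum(xs) / steps, sum(ys) / steps
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return a, my - a * mx


def hybrid_neuron(x, w1, w2, lo, hi, a, b):
    """One hidden unit: folded linear path in range, exact GELU otherwise.

    Simplification: the pre-activation is computed to decide which path to
    take, whereas the abstract describes a predictor for that decision.
    """
    pre = sum(wi * xi for wi, xi in zip(w1, x))
    if lo <= pre <= hi:
        # Folded form: w2 * (a * (w1 . x) + b) == (a * w2 * w1) . x + w2 * b.
        # In practice folded_w and the constant would be precomputed offline.
        folded_w = [a * w2 * wi for wi in w1]
        return sum(fw * xi for fw, xi in zip(folded_w, x)) + w2 * b
    return w2 * gelu(pre)


a, b = fit_line(1.0, 3.0)
print(hybrid_neuron([1.0, 1.0], [1.0, 1.0], 0.5, 1.0, 3.0, a, b))
```

Across a whole feed-forward layer, the same folding turns two weight matrices into one smaller product for the units that stay in range, which is where the parameter reduction would come from.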

large language model, machine learning, natural language, (18 more...)

2501.10054

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.60)

MSTS: A Multimodal Safety Test Suite for Vision-Language Models

Röttger, Paul, Attanasio, Giuseppe, Friedrich, Felix, Goldzycher, Janis, Parrish, Alicia, Bhardwaj, Rishabh, Di Bonaventura, Chiara, Eng, Roman, Geagea, Gaia El Khoury, Goswami, Sujata, Han, Jieun, Hovy, Dirk, Jeong, Seogyeong, Jeretič, Paloma, Plaza-del-Arco, Flor Miriam, Rooein, Donya, Schramowski, Patrick, Shaitarova, Anastassia, Shen, Xudong, Willats, Richard, Zugarini, Andrea, Vidgen, Bertie

Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-harm) or encourage unsafe behaviours (e.g. to consume drugs). Despite these clear hazards, little work so far has evaluated VLM safety and the novel risks created by multimodal inputs. To address this gap, we introduce MSTS, a Multimodal Safety Test Suite for VLMs. MSTS comprises 400 test prompts across 40 fine-grained hazard categories. Each test prompt consists of a text and an image that only in combination reveal their full unsafe meaning. With MSTS, we find clear safety issues in several open VLMs. We also find some VLMs to be safe by accident, meaning that they are safe because they fail to understand even simple test prompts. We translate MSTS into ten languages, showing non-English prompts to increase the rate of unsafe model responses. We also show models to be safer when tested with text only rather than multimodal prompts. Finally, we explore the automation of VLM safety assessments, finding even the best safety classifiers to be lacking.

artificial intelligence, multimodal safety test suite, natural language, (2 more...)

2501.10057

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (0.60)
Information Technology > Artificial Intelligence > Natural Language (0.60)

Tabular-TX: Theme-Explanation Structure-based Table Summarization via In-Context Learning

Kwack, TaeYoon, Kim, Jisoo, Jung, Ki Yong, Lee, DongGeon, Park, Heesun

This paper proposes a Theme-Explanation Structure-based Table Summarization (Tabular-TX) pipeline designed to efficiently process table data. Tabular-TX preprocesses table data by focusing on highlighted cells and then generates summary sentences structured with a Theme Part in the form of adverbial phrases followed by an Explanation Part in the form of clauses. In this process, customized analysis is performed by considering the structural characteristics and comparability of the table. Additionally, by utilizing In-Context Learning, Tabular-TX optimizes the analytical capabilities of large language models (LLMs) without the need for fine-tuning, effectively handling the structural complexity of table data. Results from applying the proposed Tabular-TX to generate table-based summaries demonstrated superior performance compared to existing fine-tuning-based methods, despite limitations in dataset size. Experimental results confirmed that Tabular-TX can process complex table data more effectively and established it as a new alternative for table-based question answering and summarization tasks, particularly in resource-constrained environments.

large language model, machine learning, natural language, (16 more...)

2501.10487

Country:

Europe > Austria > Vienna (0.15)
North America > Mexico > Mexico City > Mexico City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (1.00)

Industry: Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

GAWM: Global-Aware World Model for Multi-Agent Reinforcement Learning

Shi, Zifeng, Liu, Meiqin, Zhang, Senlin, Zheng, Ronghao, Dong, Shanling, Wei, Ping

In recent years, Model-based Multi-Agent Reinforcement Learning (MARL) has demonstrated significant advantages over model-free methods in terms of sample efficiency by using independent environment dynamics world models for data sample augmentation. However, without considering the limited sample size, these methods still lag behind model-free methods in terms of final convergence performance and stability. This is primarily due to the world model's insufficient and unstable representation of global states in partially observable environments. This limitation hampers the ability to ensure global consistency in the data samples and results in a time-varying and unstable distribution mismatch between the pseudo data samples generated by the world model and the real samples. This issue becomes particularly pronounced in more complex multi-agent environments. To address this challenge, we propose a model-based MARL method called GAWM, which enhances the centralized world model's ability to achieve globally unified and accurate representation of state information while adhering to the CTDE paradigm. GAWM uniquely leverages an additional Transformer architecture to fuse local observation information from different agents, thereby improving its ability to extract and represent global state information. This enhancement not only improves sample efficiency but also enhances training stability, leading to superior convergence performance, particularly in complex and challenging multi-agent environments. This advancement enables model-based methods to be effectively applied to more complex multi-agent environments. Experimental results demonstrate that GAWM outperforms various model-free and model-based approaches, achieving exceptional performance in the challenging domains of SMAC.

machine learning, reinforcement learning, world model, (18 more...)

2501.10116

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

AI Technicians: Developing Rapid Occupational Training Methods for a Competitive AI Workforce

Savelka, Jaromir, Kultur, Can, Agarwal, Arav, Bogart, Christopher, Burte, Heather, Zhang, Adam, Sakr, Majd

The accelerating pace of developments in Artificial Intelligence (AI) and the increasing role that technology plays in society necessitate substantial changes in the structure of the workforce. Besides scientists and engineers, there is a need for a very large workforce of competent AI technicians (i.e., maintainers, integrators) and users (i.e., operators). As traditional 4-year and 2-year degree-based education cannot fill this quickly opening gap, alternative training methods have to be developed. We present the results of the first four years of the AI Technicians program, a unique collaboration between the U.S. Army's Artificial Intelligence Integration Center (AI2C) and Carnegie Mellon University to design, implement and evaluate novel rapid occupational training methods to create a competitive AI workforce at the technician level. Through this multi-year effort we have already trained 59 AI Technicians. A key observation is that ongoing frequent updates to the training are necessary as the adoption of AI in the U.S. Army and within society at large is evolving rapidly. A tight collaboration among the stakeholders from the army and the university is essential for successful development and maintenance of the training for the evolving role. Our findings can be leveraged by large organizations that face the challenge of developing a competent AI workforce as well as educators and researchers engaged in solving the challenge.

artificial intelligence, machine learning, trainee, (14 more...)

doi: 10.1145/3641554.3701935

2501.10579

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.18)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.68)

Industry:

Government > Military > Army (1.00)
Education > Educational Setting > Higher Education (1.00)
Education > Curriculum (0.94)
Government > Regional Government > North America Government > United States Government (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Gaines, Dylan, Vertanen, Keith

Adapting Large Language Models for Character-based Augmentative and Alternative Communication

Users of Augmentative and Alternative Communication (AAC) may write letter-by-letter via an interface that uses a character language model. However, most state-of-the-art large pretrained language models predict subword tokens of variable length. We investigate how to practically use such models to make accurate and efficient character predictions. We fine-tune models using a large dataset of sentences we curated in which each sentence is rated according to how useful it might be for spoken or written AAC communication. We find that using an algorithm to produce character predictions from a subword large language model provides more accurate predictions than adding a classification layer or using a byte-level model. We also find that our domain adaptation curriculum is effective at improving model performance on simple, conversational text.
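One simple way to obtain character predictions from a subword model is to sum the probability of every candidate token that agrees with what has been typed, grouped by its next character. The sketch below illustrates only that marginalization. It is not the authors' algorithm, the function name `next_char_distribution` is hypothetical, and the token probabilities are made-up numbers rather than model output.

```python
from collections import defaultdict


def next_char_distribution(token_probs, typed=""):
    """Marginalize a subword distribution into a next-character distribution.

    token_probs: mapping from candidate token text to probability, where each
        token is a candidate for the text starting at the last token boundary.
    typed: characters already entered since that boundary.
    Simplification: tokens that end exactly at the typed text are ignored,
    so continuation across a token boundary is not modeled.
    """
    char_mass = defaultdict(float)
    for token, prob in token_probs.items():
        if token.startswith(typed) and len(token) > len(typed):
            char_mass[token[len(typed)]] += prob
    total = sum(char_mass.values())
    if total == 0.0:
        return {}
    # Renormalize over the tokens that remain consistent with the input.
    return {ch: mass / total for ch, mass in char_mass.items()}


# Toy next-token distribution (illustrative values only).
toy = {"the": 0.4, "then": 0.1, "to": 0.2, "a": 0.2, "they": 0.1}
print(next_char_distribution(toy))
print(next_char_distribution(toy, typed="the"))
```

A practical system would also need to handle predictions that span several tokens, which this sketch leaves out.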

large language model, machine learning, natural language, (21 more...)

2501.10582

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre:

Workflow (0.93)
Personal > Interview (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Gladyshev, Maksim, Alechina, Natasha, Dastani, Mehdi, Doder, Dragan, Logan, Brian

Temporal Causal Reasoning with (Non-Recursive) Structural Equation Models

Structural Equation Models (SEMs) are the standard approach to representing causal dependencies between variables in causal models. In this paper we propose a new interpretation of SEMs when reasoning about Actual Causality, in which SEMs are viewed as mechanisms transforming the dynamics of exogenous variables into the dynamics of endogenous variables. This allows us to combine counterfactual causal reasoning with existing temporal logic formalisms, and to introduce a temporal logic, CPLTL, for causal reasoning about such structures. We show that the standard restriction to so-called recursive models (with no cycles in the dependency graph) is not necessary in our approach, allowing us to reason about mutually dependent processes and feedback loops. Finally, we introduce new notions of model equivalence for temporal causal models, and show that CPLTL has an efficient model-checking procedure.

artificial intelligence, causal model, reasoning, (12 more...)

2501.1019

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (1.00)