AITopics

Autonomous driving training requires a diverse range of datasets encompassing various traffic conditions, weather scenarios, and road types. Traditional data augmentation methods often struggle to generate datasets that represent rare occurrences. To address this challenge, we propose GenDDS, a novel approach for generating driving scenarios generation by leveraging the capabilities of Stable Diffusion XL (SDXL), an advanced latent diffusion model. Our methodology involves the use of descriptive prompts to guide the synthesis process, aimed at producing realistic and diverse driving scenarios. With the power of the latest computer vision techniques, such as ControlNet and Hotshot-XL, we have built a complete pipeline for video generation together with SDXL. We employ the KITTI dataset, which includes real-world driving videos, to train the model. Through a series of experiments, we demonstrate that our model can generate high-quality driving videos that closely replicate the complexity and variability of real-world driving scenarios. This research contributes to the development of sophisticated training data for autonomous driving systems and opens new avenues for creating virtual environments for simulation and validation purposes.

diffusion model, scenario, video, (12 more...)

2408.15868

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Industry:

Transportation > Ground > Road (0.71)
Information Technology > Robotics & Automation (0.57)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.89)

Short-Term Electricity-Load Forecasting by Deep Learning: A Comprehensive Survey

Dong, Qi, Huang, Rubing, Cui, Chenhui, Towey, Dave, Zhou, Ling, Tian, Jinyu, Wang, Jianzhou

Short-Term Electricity-Load Forecasting (STELF) refers to the prediction of the immediate demand (in the next few hours to several days) for the power system. Various external factors, such as weather changes and the emergence of new electricity consumption scenarios, can impact electricity demand, causing load data to fluctuate and become non-linear, which increases the complexity and difficulty of STELF. In the past decade, deep learning has been applied to STELF, modeling and predicting electricity demand with high accuracy, and contributing significantly to the development of STELF. This paper provides a comprehensive survey on deep-learning-based STELF over the past ten years. It examines the entire forecasting process, including data pre-processing, feature extraction, deep-learning modeling and optimization, and results evaluation. This paper also identifies some research challenges and potential research directions to be further investigated in future work.

forecasting, load forecasting, short-term load forecasting, (13 more...)

2408.16202

Country:

Asia > Macao (0.05)
Asia > China > Zhejiang Province > Ningbo (0.04)
Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
(10 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fahimi, Miriam, Russo, Mayra, Scott, Kristen M., Vidal, Maria-Esther, Berendt, Bettina, Kinder-Kurlanda, Katharina

Articulation Work and Tinkering for Fairness in Machine Learning

The field of fair AI aims to counter biased algorithms through computational modelling. However, it faces increasing criticism for perpetuating the use of overly technical and reductionist methods. As a result, novel approaches appear in the field to address more socially-oriented and interdisciplinary (SOI) perspectives on fair AI. In this paper, we take this dynamic as the starting point to study the tension between computer science (CS) and SOI research. By drawing on STS and CSCW theory, we position fair AI research as a matter of 'organizational alignment': what makes research 'doable' is the successful alignment of three levels of work organization (the social world, the laboratory, and the experiment). Based on qualitative interviews with CS researchers, we analyze the tasks, resources, and actors required for doable research in the case of fair AI. We find that CS researchers engage with SOI research to some extent, but organizational conditions, articulation work, and ambiguities of the social world constrain the doability of SOI research for them. Based on our findings, we identify and discuss problems for aligning CS and SOI as fair AI continues to evolve.

alignment, articulation work, fair ai research, (11 more...)

2407.16496

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.46)
North America > United States > New York > New York County > New York City (0.05)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)
(23 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Regional Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings

Gao, Lingyu

Text classification, a classic task in natural language processing (NLP), involves assigning predefined categories to textual data and is crucial for applications ranging from sentiment analysis to spam detection. This thesis advances text classification by harnessing the intrinsic knowledge of Pretrained Language Models (PLMs) to address three challenging scenarios: distractor selection for multiple-choice cloze questions, improving robustness for prompt-based zero-shot text classification, and demonstration selection for retrieval-based in-context learning. Firstly, we focus on selecting distractors for multiple-choice cloze questions, ensuring that they are misleading yet incorrect. We assess the relationship between human experts' annotations (accept/reject) and various features, including context-free features (e.g., word frequency) and context-sensitive features (e.g., conditional probabilities of fillin-the-blank words). We utilize pretrained embeddings and follow annotation instructions for context-free feature design, and we find that using contextualized word representations from PLMs as features drastically improves performance over traditional feature-based models, even rivaling human performance (Chapter 3).

artificial intelligence, natural language processing, text classification task, (16 more...)

2408.1565

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.13)
North America > United States > California > Los Angeles County > Long Beach (0.13)
(46 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Automobiles & Trucks (1.00)
Leisure & Entertainment > Sports (0.45)
Media > Film (0.45)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Huang, Jiaxing, Zhang, Jingyi

A Survey on Evaluation of Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) mimic human perception and reasoning system by integrating powerful Large Language Models (LLMs) with various modality encoders (e.g., vision, audio), positioning LLMs as the "brain" and various modality encoders as sensory organs. This framework endows MLLMs with human-like capabilities, and suggests a potential pathway towards achieving artificial general intelligence (AGI). With the emergence of all-round MLLMs like GPT-4V and Gemini, a multitude of evaluation methods have been developed to assess their capabilities across different dimensions. This paper presents a systematic and comprehensive review of MLLM evaluation methods, covering the following key aspects: (1) the background of MLLMs and their evaluation; (2) "what to evaluate" that reviews and categorizes existing MLLM evaluation tasks based on the capabilities assessed, including general multimodal recognition, perception, reasoning and trustworthiness, and domain-specific applications such as socioeconomic, natural sciences and engineering, medical usage, AI agent, remote sensing, video and audio processing, 3D point cloud analysis, and others; (3) "where to evaluate" that summarizes MLLM evaluation benchmarks into general and specific benchmarks; (4) "how to evaluate" that reviews and illustrates MLLM evaluation steps and metrics; Our overarching goal is to provide valuable insights for researchers in the field of MLLM evaluation, thereby facilitating the development of more capable and reliable MLLMs. We emphasize that evaluation should be regarded as a critical discipline, essential for advancing the field of MLLMs.

arxiv preprint arxiv, benchmark, mllm, (14 more...)

2408.15769

Country: Asia > Singapore (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (1.00)
Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Khan, Zohaib, Khaquan, Muhammad, Tafveez, Omer, Samiwala, Burhanuddin, Raza, Agha Ali

Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention

The Transformer architecture has revolutionized deep learning through its Self-Attention mechanism, which effectively captures contextual information. However, the memory footprint of Self-Attention presents significant challenges for long-sequence tasks. Grouped Query Attention (GQA) addresses this issue by grouping queries and mean-pooling the corresponding key-value heads - reducing the number of overall parameters and memory requirements in a flexible manner without adversely compromising model accuracy. In this work, we introduce enhancements to GQA, focusing on two novel approaches that deviate from the static nature of grouping: Key-Distributed GQA (KDGQA) and Dynamic Key-Distributed GQA (DGQA), which leverage information from the norms of the key heads to inform query allocation. Specifically, KDGQA looks at the ratios of the norms of the key heads during each forward pass, while DGQA examines the ratios of the norms as they evolve through training. Additionally, we present Perturbed GQA (PGQA) as a case-study, which introduces variability in (static) group formation via subtracting noise from the attention maps. Our experiments with up-trained Vision Transformers, for Image Classification on datasets such as CIFAR-10, CIFAR-100, Food101, and Tiny ImageNet, demonstrate the promise of these variants in improving upon the original GQA through more informed and adaptive grouping mechanisms: specifically ViT-L experiences accuracy gains of up to 8% when utilizing DGQA in comparison to GQA and other variants. We further analyze the impact of the number of Key-Value Heads on performance, underscoring the importance of utilizing query-key affinities. Code is available on GitHub.

gqa, transformer, variant, (17 more...)

2408.08454

Country: Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)

Singh, Hrishikesh, Sharma, Aarti, Pant, Millie

Pixels to Prose: Understanding the art of Image Captioning

In the era of evolving artificial intelligence, machines are increasingly emulating human-like capabilities, including visual perception and linguistic expression. Image captioning stands at the intersection of these domains, enabling machines to interpret visual content and generate descriptive text. This paper provides a thorough review of image captioning techniques, catering to individuals entering the field of machine learning who seek a comprehensive understanding of available options, from foundational methods to state-of-the-art approaches. Beginning with an exploration of primitive architectures, the review traces the evolution of image captioning models to the latest cutting-edge solutions. By dissecting the components of these architectures, readers gain insights into the underlying mechanisms and can select suitable approaches tailored to specific problem requirements without duplicating efforts. The paper also delves into the application of image captioning in the medical domain, illuminating its significance in various real-world scenarios. Furthermore, the review offers guidance on evaluating the performance of image captioning systems, highlighting key metrics for assessment. By synthesizing theoretical concepts with practical application, this paper equips readers with the knowledge needed to navigate the complex landscape of image captioning and harness its potential for diverse applications in machine learning and beyond.

architecture, caption, representation, (15 more...)

2408.15714

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
(4 more...)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Nuclear Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(5 more...)

Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks

Dai, Yan, Huang, Longbo

Stochastic Network Optimization (SNO) concerns scheduling in stochastic queueing systems. It has been widely studied in network theory. Classical SNO algorithms require network conditions to be stationary with time, which fails to capture the non-stationary components in many real-world scenarios. Many existing algorithms also assume knowledge of network conditions before decision, which rules out applications where unpredictability presents. Motivated by these issues, we consider Adversarial Network Optimization (ANO) under bandit feedback. Specifically, we consider the task of *i)* maximizing some unknown and time-varying utility function associated to scheduler's actions, where *ii)* the underlying network is a non-stationary multi-hop one whose conditions change arbitrarily with time, and *iii)* only bandit feedback (effect of actually deployed actions) is revealed after decisions. Our proposed `UMO2` algorithm ensures network stability and also matches the utility maximization performance of any "mildly varying" reference policy up to a polynomially decaying gap. To our knowledge, no previous ANO algorithm handled multi-hop networks or achieved utility guarantees under bandit feedback, whereas ours can do both. Technically, our method builds upon a novel integration of online learning into Lyapunov analyses: To handle complex inter-dependencies among queues in multi-hop networks, we propose meticulous techniques to balance online learning and Lyapunov arguments. To tackle the learning obstacles due to potentially unbounded queue sizes, we design a new online linear optimization algorithm that automatically adapts to loss magnitudes. To maximize utility, we propose a bandit convex optimization algorithm with novel queue-dependent learning rate scheduling that suites drastically varying queue lengths. Our new insights in online learning can be of independent interest.

algorithm, non-stationary multi-hop network, utility maximization, (13 more...)

2408.16215

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Sudan (0.04)

Genre:

Overview (0.67)
Research Report (0.64)
Workflow (0.46)

Industry:

Education > Educational Setting (0.75)
Information Technology (0.67)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Rodriguez, Charlotte, Degioanni, Laura, Kameni, Laetitia, Vidal, Richard, Neglia, Giovanni

Evaluating the Energy Consumption of Machine Learning: Systematic Literature Review and Experiments

arXiv.org Artificial IntelligenceAug-27-2024

Monitoring, understanding, and optimizing the energy consumption of Machine Learning (ML) are various reasons why it is necessary to evaluate the energy usage of ML. However, there exists no universal tool that can answer this question for all use cases, and there may even be disagreement on how to evaluate energy consumption for a specific use case. Tools and methods are based on different approaches, each with their own advantages and drawbacks, and they need to be mapped out and explained in order to select the most suitable one for a given situation. We address this challenge through two approaches. First, we conduct a systematic literature review of all tools and methods that permit to evaluate the energy consumption of ML (both at training and at inference), irrespective of whether they were originally designed for machine learning or general software. Second, we develop and use an experimental protocol to compare a selection of these tools and methods. The comparison is both qualitative and quantitative on a range of ML tasks of different nature (vision, language) and computational complexity. The systematic literature review serves as a comprehensive guide for understanding the array of tools and methods used in evaluating energy consumption of ML, for various use cases going from basic energy monitoring to consumption optimization. Two open-source repositories are provided for further exploration. The first one contains tools that can be used to replicate this work or extend the current review. The second repository houses the experimental protocol, allowing users to augment the protocol with new ML computing tasks and additional energy evaluation tools.

document title, energy consumption, estimation model, (14 more...)

2408.15128

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Jamshidi, Bitasadat, Ghorbani, Nastaran, Rostamy-Malkhalifeh, Mohsen

Optimizing Lung Cancer Detection in CT Imaging: A Wavelet Multi-Layer Perceptron (WMLP) Approach Enhanced by Dragonfly Algorithm (DA)

arXiv.org Artificial IntelligenceAug-27-2024

Early-stage detection is critical, as it significantly improves the five-year survival rate from a dismal 5% in late-stage diagnoses to over 50% [2]. The advent of advanced screening technologies promises to substantially improve patient prognoses. The field of medical imaging has been revolutionized by recent strides in deep learning, yielding significant enhancements in the detection and classification of lung cancer from CT images. Innovations such as the 3D Convolutional Neural Network (CNN) approach by Diviya et al. (2024) and the LCD-Capsule Network by Bushara et al. (2023) have demonstrated the potential of these models to transform early detection and diagnosis [3, 4]. X-ray and computed tomography (CT) scans are pivotal in lung cancer diagnostics, offering high-resolution imagery that outperforms traditional radiography in detecting small and low-contrast pulmonary nodules [5, 6, 7].

artificial intelligence, classification, machine learning, (14 more...)

2408.15355

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.94)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.98)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.90)