AITopics | Overview

PDF-Malware: An Overview on Threats, Detection and Evasion Attacks

Fleury, Nicolas, Dubrunquez, Theo, Alouani, Ihsen

arXiv.org Artificial Intelligence, Jul-27-2021

In recent years, the Portable Document Format, commonly known as PDF, has become a widely adopted standard for document exchange and dissemination. This trend is due to characteristics such as its flexibility and portability across platforms. The widespread use of PDF has instilled a false impression of inherent safety among benign users. However, the characteristics of PDF have motivated hackers to exploit various types of vulnerabilities and overcome security safeguards, thereby making the PDF format one of the most efficient malicious code attack vectors. Therefore, efficiently detecting malicious PDF files is crucial for information security. Several analysis techniques, both static and dynamic, have been proposed in the literature to extract the main features that allow the discrimination of malware files from benign ones. Since classical analysis techniques may be limited in the case of zero-days, machine-learning-based techniques have recently emerged as an automatic PDF-malware detection method that is able to generalize from a set of training samples. These techniques themselves face the challenge of evasion attacks, where a malicious PDF is transformed to look benign. In this work, we give an overview of the PDF-malware detection problem. We give a perspective on the new challenges and emerging solutions.

classifier, detection, pdf file, (16 more...)

arXiv.org Artificial Intelligence

2107.12873

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > France > Hauts-de-France (0.05)
Asia (0.05)
North America > United States > Hawaii (0.04)

Genre: Overview (0.89)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

Rogers, Anna, Gardner, Matt, Augenstein, Isabelle

arXiv.org Artificial Intelligence, Jul-27-2021

Alongside huge volumes of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "reasoning types" in question answering and propose a new taxonomy. We also discuss the implications of over-focusing on English, and survey the current monolingual resources for other languages and multilingual resources. The study is aimed both at practitioners looking for pointers to the wealth of existing data and at researchers working on new resources.

arxiv, dataset, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2107.12708

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(34 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Education > Assessment & Standards > Student Performance (0.72)
Health & Medicine > Health Care Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.94)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Federated Learning Meets Natural Language Processing: A Survey

Liu, Ming, Ho, Stella, Wang, Mengqi, Gao, Longxiang, Jin, Yuan, Zhang, He

arXiv.org Artificial Intelligence, Jul-27-2021

Federated Learning aims to learn machine learning models from multiple decentralized edge devices (e.g. mobiles) or servers without sacrificing local data privacy. Recent Natural Language Processing techniques rely on deep learning and large pre-trained language models. However, both large deep neural models and language models are trained with huge amounts of data, which often lies on the server side. Since text data largely originates from end users, in this work we look into recent NLP models and techniques that use federated learning as the learning framework. Our survey discusses major challenges in federated natural language processing, including algorithmic challenges, system challenges, and privacy issues. We also provide a critical review of the existing Federated NLP evaluation methods and tools. Finally, we highlight the current research gaps and future directions.

arxiv preprint arxiv, federated learning, learning, (11 more...)

arXiv.org Artificial Intelligence

2107.12603

Country: Asia > China > Hong Kong (0.04)

Genre:

Overview (1.00)
Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How Artificial Intelligence Will Shape Our Future

#artificialintelligence, Jul-26-2021, 12:00:18 GMT

As AI improves and becomes more powerful, its impact on the world economy will become vastly more significant. It will affect virtually every aspect of the world economy -- from unemployment rates to economic growth, productivity, income inequality and more. Some argue that so far, AI has not had a large enough impact, but as its development accelerates, its effects will grow exponentially. Whether we like it or not, automation and job displacement are already here, slowly pushing the human workforce into different domains. Similar patterns can be found throughout history; new technology made certain products and jobs obsolete, and eventually humans were forced to switch to more innovative products and new jobs.

computational power, productivity, world economy, (13 more...)

#artificialintelligence

Genre: Overview > Innovation (0.36)

Industry: Banking & Finance > Economy (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

Predicting Game Engagement and Difficulty Using AI Players

Roohi, Shaghayegh, Guckelsberger, Christian, Relas, Asko, Heiskanen, Henri, Takatalo, Jari, Hämäläinen, Perttu

arXiv.org Artificial Intelligence, Jul-26-2021

This paper presents a novel approach to automated playtesting for the prediction of human player behavior and experience. It has previously been demonstrated that Deep Reinforcement Learning (DRL) game-playing agents can predict both game difficulty and player engagement, operationalized as average pass and churn rates. We improve this approach by enhancing DRL with Monte Carlo Tree Search (MCTS). We also motivate an enhanced selection strategy for predictor features, based on the observation that an AI agent's best-case performance can yield stronger correlations with human data than the agent's average performance. Both additions consistently improve the prediction accuracy, and the DRL-enhanced MCTS outperforms both DRL and vanilla MCTS in the hardest levels. We conclude that player modelling via automated playtesting can benefit from combining DRL and MCTS. Moreover, it can be worthwhile to investigate a subset of repeated best AI agent runs, if AI gameplay does not yield good predictions on average.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3474658

2107.12061

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
Europe > Finland (0.05)
(11 more...)

Genre:

Research Report > Promising Solution (0.49)
Overview > Innovation (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

SVEva Fair: A Framework for Evaluating Fairness in Speaker Verification

Toussaint, Wiebke, Ding, Aaron Yi

arXiv.org Artificial Intelligence, Jul-26-2021

Despite the success of deep neural networks (DNNs) in enabling on-device voice assistants, increasing evidence of bias and discrimination in machine learning is raising the urgency of investigating the fairness of these systems. Speaker verification is a form of biometric identification that gives access to voice assistants. Due to a lack of fairness metrics and evaluation frameworks that are appropriate for testing the fairness of speaker verification components, little is known about how model performance varies across subgroups and what factors influence performance variation. To tackle this emerging challenge, we design and develop SVEva Fair, an accessible, actionable and model-agnostic framework for evaluating the fairness of speaker verification components. The framework provides evaluation measures and visualisations to interrogate model performance across speaker subgroups and compare fairness between models. We demonstrate SVEva Fair in a case study with end-to-end DNNs trained on the VoxCeleb datasets to reveal potential bias in existing embedded speech recognition systems based on the demographic attributes of speakers. Our evaluation shows that publicly accessible benchmark models are not fair and consistently produce worse predictions for some nationalities, and for female speakers of most nationalities. To pave the way for fair and reliable embedded speaker verification, SVEva Fair has been implemented as an open-source Python library and can be integrated into the embedded ML development pipeline to support developers and researchers in troubleshooting unreliable speaker verification performance and selecting high-impact approaches for mitigating fairness challenges.

fairness, speaker verification component, subgroup, (10 more...)

arXiv.org Artificial Intelligence

2107.12049

Country:

North America > Canada (0.05)
Oceania > Australia (0.05)
Europe > Norway (0.05)
(9 more...)

Genre:

Research Report (1.00)
Overview (0.68)

Industry: Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

A Survey of Monte Carlo Methods for Parameter Estimation

Luengo, D., Martino, L., Bugallo, M., Elvira, V., Särkkä, S.

arXiv.org Artificial Intelligence, Jul-25-2021

Statistical signal processing applications usually require the estimation of some parameters of interest given a set of observed data. These estimates are typically obtained either by solving a multi-variate optimization problem, as in the maximum likelihood (ML) or maximum a posteriori (MAP) estimators, or by performing a multi-dimensional integration, as in the minimum mean squared error (MMSE) estimators. Unfortunately, analytical expressions for these estimators cannot be found in most real-world applications, and the Monte Carlo (MC) methodology is one feasible approach. MC methods proceed by drawing random samples, either from the desired distribution or from a simpler one, and using them to compute consistent estimators. The most important families of MC algorithms are Markov chain MC (MCMC) and importance sampling (IS). On the one hand, MCMC methods draw candidate samples from a proposal density and then build an ergodic Markov chain whose stationary distribution is the desired distribution by accepting or rejecting those candidates as the new state of the chain. On the other hand, IS techniques draw samples from a simple proposal density and then assign them suitable weights that measure their quality in some appropriate way. In this paper, we perform a thorough review of MC methods for the estimation of static parameters in signal processing applications. A historical note on the development of MC schemes is also provided, followed by the basic MC method and a brief description of the rejection sampling (RS) algorithm, as well as three sections describing many of the most relevant MCMC and IS algorithms and their combined use.
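
The two families named in this abstract can be illustrated with a minimal sketch. It is not taken from the survey: the toy target (an unnormalized Gaussian with mean 3), the proposal choices, and the function names `log_target`, `importance_sampling_mean` and `metropolis_mean` are all assumptions made for illustration. Both functions estimate the mean of the target, which plays the role of an MMSE-style estimate.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a toy target, N(3, 1) (an assumption)."""
    return -0.5 * (x - 3.0) ** 2


def importance_sampling_mean(n, rng, scale=4.0):
    """Self-normalized IS estimate of the target mean.

    Samples come from a simple N(0, scale^2) proposal; each sample gets a
    weight target/proposal. Normalizing constants cancel in the ratio.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, scale)
        log_q = -0.5 * (x / scale) ** 2  # proposal log-density up to a constant
        w = math.exp(log_target(x) - log_q)
        weighted_sum += w * x
        weight_total += w
    return weighted_sum / weight_total


def metropolis_mean(n, rng, step=1.0, burn_in=2000):
    """Random-walk Metropolis estimate of the target mean.

    A candidate is proposed around the current state and accepted with
    probability min(1, target(candidate) / target(current)).
    """
    x = 0.0
    total = 0.0
    kept = 0
    for i in range(n):
        candidate = x + rng.gauss(0.0, step)
        log_ratio = log_target(candidate) - log_target(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = candidate  # accept; otherwise the chain stays where it is
        if i >= burn_in:
            total += x
            kept += 1
    return total / kept


if __name__ == "__main__":
    print(importance_sampling_mean(20000, random.Random(0)))
    print(metropolis_mean(20000, random.Random(1)))
```

Both estimates should land close to the true mean of 3 for this toy target. The IS version weights independent draws, whereas the Metropolis version produces correlated draws, so early iterations are discarded as burn-in.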

algorithm, iteration, proposal, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1186/s13634-020-00675-6

2107.1182

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
(26 more...)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Health & Medicine (0.67)
Government > Regional Government > North America Government > United States Government (0.67)
Energy (0.67)

A Survey on Data-driven Software Vulnerability Assessment and Prioritization

Le, Triet H. M., Chen, Huaming, Babar, M. Ali

arXiv.org Artificial Intelligence, Jul-25-2021

Software Vulnerabilities (SVs) are increasing in complexity and scale, posing great security risks to many software systems. Given the limited resources in practice, SV assessment and prioritization help practitioners devise optimal SV mitigation plans based on various SV characteristics. The surge in SV data sources and data-driven techniques such as Machine Learning and Deep Learning has taken SV assessment and prioritization to the next level. Our survey provides a taxonomy of past research efforts and highlights the best practices for data-driven SV assessment and prioritization. We also discuss the current limitations and propose potential solutions to address such issues.

classification, international conference, svs, (11 more...)

arXiv.org Artificial Intelligence

2107.08364

Country:

Asia > China (0.04)
Oceania > Australia > South Australia > Adelaide (0.04)
North America > United States > Hawaii (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Software (0.92)

Technology:

Information Technology > Software (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
(7 more...)

A Review of Bangla Natural Language Processing Tasks and the Utility of Transformer Models

Alam, Firoj, Hasan, Arid, Alam, Tanvirul, Khan, Akib, Tajrin, Janntatul, Khan, Naira, Chowdhury, Shammur Absar

arXiv.org Artificial Intelligence, Jul-25-2021

Bangla -- ranked as the 6th most widely spoken language in the world (https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers -- is still considered a low-resource language in the natural language processing (NLP) community. After three decades of research, Bangla NLP (BNLP) is still lagging behind, mainly due to the scarcity of resources and the challenges that come with it. There is sparse work in different areas of BNLP; however, a thorough survey reporting previous work and recent advances is yet to be done. In this study, we first provide a review of Bangla NLP tasks, resources, and tools available to the research community; we then benchmark datasets collected from various platforms for nine NLP tasks using current state-of-the-art algorithms (i.e., transformer-based models). We provide comparative results for the studied NLP tasks by comparing monolingual vs. multilingual models of varying sizes. We report our results using both individual and consolidated datasets and provide data splits for future research. We reviewed a total of 108 papers and conducted 175 sets of experiments. Our results show promising performance using transformer-based models while highlighting the trade-off with computational costs. We hope that such a comprehensive survey will motivate the community to build on and further advance the research on Bangla NLP.

dataset, international conference, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2107.03844

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
Asia > India > West Bengal > Kolkata (0.04)
(11 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Media > News (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(2 more...)

Best practices to build data literacy into your Gen Z workforce - Data Dreamer

#artificialintelligence, Jul-24-2021, 22:05:39 GMT

This is a guest post by Kirk Borne, Ph.D., Chief Science Officer at DataPrime.ai. Kirk is also a consultant, astrophysicist, data scientist, blogger, data literacy advocate and renowned speaker, and is one of the most recognized names in the industry. A survey of 1,100 data practitioners and business leaders reported that 84% of organizations consider data literacy to be a core business skill, agreeing with the statement that the inability of the workforce to use and analyze data effectively can hamper their business success. In addition, 36% said data literacy is crucial to future-proofing their business. Another survey found that 75% of employees are not comfortable using data.

build data literacy, data literacy, workforce, (13 more...)

#artificialintelligence

Genre: Overview (0.55)

Industry: Information Technology (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)
