AITopics

2112.08645

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > France (0.04)
(22 more...)

Genre: Research Report (0.82)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

#artificialintelligence Dec-15-2021, 12:10:47 GMT

Data Engineer - Support

It turns out… unicorns are real! In fact, FreshBooks just became one after raising our valuation to more than $1 billion. And ever since launching in 2003, we've been on a steady incline towards one goal: Building easy-to-use accounting software for small business owners. It's the goal that's driven us to expand into five offices, serving customers in over 160 countries. And it's the goal we'd love for you to be a part of as a member of our global team as we continue our journey.

accessibility, data engineer, small business owner, (4 more...)

#artificialintelligence

Country:

North America > Canada > Ontario > Toronto (0.09)
South America > Bolivia > Potosí Department > Tomás Frías Province > Potosí (0.06)
North America > United States (0.06)
(4 more...)

Technology: Information Technology > Artificial Intelligence (0.40)

#artificialintelligence Dec-15-2021, 08:46:14 GMT

Unit8: Artificial Intelligence for Good Causes – And Not Just at Christmas - Fintech Schweiz Digital Finance News - FintechNewsCH

The leading Swiss AI start-up Unit8 gives something back to society – not just at Christmas, but throughout the year, too. Employees use 10% of their working time for projects dedicated to charitable purposes. Refugees, homeless people, children and young people, and forest fire prevention benefit from these efforts – and last but not least, the employees and customers of Unit8 do, too. Artificial intelligence (AI) does not have the best reputation everywhere and is repeatedly stigmatised as unethical. 'Like any other technology, AI is neutral in nature. Whether it has a positive or negative impact depends on the decisions made by the people who develop and use AI,' says Dr Marcin Pietrzyk, CEO and co-founder of Unit8.

artificial intelligence, christmas, unit8, (6 more...)

#artificialintelligence

Country:

South America > Bolivia (0.06)
Europe > Poland > Lesser Poland Province > Kraków (0.06)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (0.85)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.51)

Shenkman, Carey, Thakur, Dhanaraj, Llansó, Emma

Do You See What I See? Capabilities and Limits of Automated Multimedia Content Analysis

The ever-increasing amount of user-generated content online has led, in recent years, to an expansion in research and investment in automated content analysis tools. Scrutiny of automated content analysis has accelerated during the COVID-19 pandemic, as social networking services have placed a greater reliance on these tools due to concerns about health risks to their moderation staff from in-person work. At the same time, there are important policy debates around the world about how to improve content moderation while protecting free expression and privacy. In order to advance these debates, we need to understand the potential role of automated content analysis tools. This paper explains the capabilities and limitations of tools for analyzing online multimedia content and highlights the potential risks of using these tools at scale without accounting for their limitations. It focuses on two main categories of tools: matching models and computer prediction models. Matching models include cryptographic and perceptual hashing, which compare user-generated content with existing and known content. Predictive models (including computer vision and computer audition) are machine learning techniques that aim to identify characteristics of new or previously unknown content.
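The contrast between the two kinds of matching models can be sketched with a toy example. This is only an illustration under stated assumptions: the 2x2 grayscale "image", the `average_hash` function, and the brightness tweak are hypothetical and are not taken from the paper; production perceptual hashes are considerably more elaborate.

```python
import hashlib

def crypto_hash(pixels):
    """SHA-256 over the raw pixel bytes: any change gives an unrelated digest."""
    data = bytes(v for row in pixels for v in row)
    return hashlib.sha256(data).hexdigest()

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]

def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 2x2 grayscale image and a slightly brightened copy of it.
original = [[10, 200], [220, 30]]
brightened = [[v + 5 for v in row] for row in original]

same_crypto = crypto_hash(original) == crypto_hash(brightened)
perceptual_distance = hamming(average_hash(original), average_hash(brightened))
```

In this sketch the cryptographic digests differ after the small edit, while the toy perceptual hash is unchanged, which is the property that lets perceptual matching tolerate minor modifications of known content.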

arxiv, automated multimedia content analysis, capability and limit, (13 more...)

2201.11105

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.14)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(5 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Media > News (1.00)
Media > Music (1.00)
Law > Civil Rights & Constitutional Law (1.00)
(7 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.68)

Kraiem, Mohamed S., Sánchez-Hernández, Fernando, Moreno-García, María N.

Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties

In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples. Thus, the prediction model is unreliable although the overall model accuracy can be acceptable. Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class. However, their effectiveness depends on several factors mainly related to data intrinsic characteristics, such as imbalance ratio, dataset size and dimensionality, overlapping between classes or borderline examples. In this work, the impact of these factors is analyzed through a comprehensive comparative study involving 40 datasets from different application areas. The objective is to obtain models for automatic selection of the best resampling strategy for any dataset based on its characteristics. These models allow us to check several factors simultaneously considering a wide range of values since they are induced from very varied datasets that cover a broad spectrum of conditions. This differs from most studies that focus on the individual analysis of the characteristics or cover a small range of values. In addition, the study encompasses both basic and advanced resampling strategies that are evaluated by means of eight different performance metrics, including new measures specifically designed for imbalanced data classification. The general nature of the proposal allows the choice of the most appropriate method regardless of the domain, avoiding the search for special purpose techniques that could be valid for the target data.
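As a rough illustration of the two basic balancing strategies mentioned above, the following sketch implements random oversampling and random undersampling in plain Python. The function names, the seed parameter, and the toy data are assumptions made for this example; they are not the resampling methods or datasets evaluated in the study.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority examples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for x in rng.sample(pool, target):
            out_x.append(x)
            out_y.append(cls)
    return out_x, out_y

# Hypothetical dataset: 8 majority examples and 2 minority examples.
X = list(range(10))
y = [0] * 8 + [1] * 2
imbalance_ratio = max(Counter(y).values()) / min(Counter(y).values())

over_x, over_y = random_oversample(X, y)
under_x, under_y = random_undersample(X, y)
```

Oversampling keeps all original examples at the cost of duplicates, while undersampling discards majority examples; which trade-off is preferable depends on the dataset properties the abstract lists.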

classification, dataset, imbalance ratio, (15 more...)

doi: 10.3390/app11188546

2201.07932

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
North America > United States > District of Columbia > Washington (0.04)
(16 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Information Technology > Security & Privacy (0.87)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.93)

Textless Speech-to-Speech Translation on Real Data

Lee, Ann, Gong, Hongyu, Duquenne, Paul-Ambroise, Schwenk, Holger, Chen, Peng-Jen, Wang, Changhan, Popuri, Sravya, Pino, Juan, Gu, Jiatao, Hsu, Wei-Ning

We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language and can be built without the need for any text data. Different from existing work in the literature, we tackle the challenge in modeling multi-speaker target speech and train the systems with real-world S2ST data. The key to our approach is a self-supervised unit-based speech normalization technique, which finetunes a pre-trained speech encoder with paired audios from multiple speakers and a single reference speaker to reduce the variations due to accents, while preserving the lexical content. With only 10 minutes of paired data for speech normalization, we obtain on average 3.2 BLEU gain when training the S2ST model on the VoxPopuli S2ST dataset, compared to a baseline trained on un-normalized speech target. We also incorporate automatically mined S2ST data and show an additional 2.0 BLEU gain. To our knowledge, we are the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs.

speech, speech normalizer, translation, (14 more...)

2112.08352

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

MMO: Meta Multi-Objectivization for Software Configuration Tuning

Chen, Tao, Li, Miqing

Software configuration tuning is essential for optimizing a given performance objective (e.g., minimizing latency). Yet, due to the software's intrinsically complex configuration landscape and expensive measurement, there has been rather mild success, particularly in preventing the search from being trapped in local optima. To address this issue, in this paper we take a different perspective. Instead of focusing on improving the optimizer, we work on the level of optimization model and propose a meta multi-objectivization (MMO) model that considers an auxiliary performance objective (e.g., throughput in addition to latency). What makes this model unique is that we do not optimize the auxiliary performance objective, but rather use it to make similarly-performing while different configurations less comparable (i.e., Pareto nondominated to each other), thus preventing the search from being trapped in local optima. Importantly, through a new normalization method we show how to effectively use the MMO model without worrying about its weight -- the only yet highly sensitive parameter that can affect its effectiveness. Experiments on 22 cases from 11 real-world software systems/environments confirm that our MMO model with the new normalization performs better than its state-of-the-art single-objective counterparts on 82% of the cases while achieving up to 2.09x speedup. For 67% of the cases, the new normalization also enables the MMO model to outperform the instance when using it with the normalization used in our prior FSE work under pre-tuned best weights, saving a great amount of resources which would be otherwise necessary to find a good weight. We also demonstrate that the MMO model with the new normalization can consolidate Flash, a recent model-based tuning tool, on 68% of the cases with 1.22x speedup in general.
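The notion of configurations being "Pareto nondominated to each other" can be illustrated with a generic dominance check. This sketch is not the MMO transformation itself, whose formula the abstract does not give; the objective tuples and the assumption that both objectives are minimized are hypothetical choices for this example.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one.

    Assumes every objective is to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def nondominated(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (target objective, auxiliary objective) pairs for three configurations.
configs = [(10.0, 5.0), (10.5, 3.0), (12.0, 6.0)]
front = nondominated(configs)
```

Here the first two configurations perform similarly on the first objective but neither dominates the other, so a Pareto-based search would keep both, whereas a single-objective comparison would discard the second.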

configuration, objective, performance objective, (13 more...)

2112.07303

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Spain > Galicia > Madrid (0.04)
(34 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.70)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hanaka, Tesshu, Kobayashi, Yasuaki, Kurita, Kazuhiro, Lee, See Woo, Otachi, Yota

Computing Diverse Shortest Paths Efficiently: A Theoretical and Experimental Study

Finding diverse solutions in combinatorial problems has recently received considerable attention (Baste et al. 2020; Fomin et al. 2020; Hanaka et al. 2021). In this paper we study the following type of problems: given an integer $k$, the problem asks for $k$ solutions such that the sum of pairwise (weighted) Hamming distances between these solutions is maximized. Such solutions are called diverse solutions. We present a polynomial-time algorithm for finding diverse shortest $st$-paths in weighted directed graphs. Moreover, we study the diverse version of other classical combinatorial problems such as diverse weighted matroid bases, diverse weighted arborescences, and diverse bipartite matchings. We show that these problems can be solved in polynomial time as well. To evaluate the practical performance of our algorithm for finding diverse shortest $st$-paths, we conduct a computational experiment with synthetic and real-world instances. The experiment shows that our algorithm successfully computes diverse solutions within reasonable computational time.
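The diversity measure described above can be sketched as follows. The sketch only evaluates the objective for a given set of solutions; it does not reproduce the paper's polynomial-time algorithm. It assumes the unweighted case, where each path is represented by its set of edges and the Hamming distance is the size of the symmetric difference; the example paths are hypothetical.

```python
from itertools import combinations

def hamming(p, q):
    """Unweighted Hamming distance between two solutions given as edge collections."""
    return len(set(p) ^ set(q))

def diversity(solutions):
    """Sum of pairwise Hamming distances over all unordered pairs of solutions."""
    return sum(hamming(p, q) for p, q in combinations(solutions, 2))

# Hypothetical st-paths, each listed as directed edges.
path_via_a = [("s", "a"), ("a", "t")]
path_via_b = [("s", "b"), ("b", "t")]

score_distinct = diversity([path_via_a, path_via_b])
score_with_repeat = diversity([path_via_a, path_via_b, path_via_a])
```

Repeating a path adds a zero-distance pair, so a set of $k$ solutions scores higher when its members share fewer edges.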

algorithm, graph, polynomial time, (13 more...)

2112.05403

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
(8 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

#artificialintelligence Dec-14-2021, 20:30:42 GMT

Get used to hearing about machine learning operations startups – TechCrunch

Welcome to The TechCrunch Exchange, a weekly startups-and-markets newsletter. It's inspired by the daily TechCrunch column where it gets its name. If you aren't in the United States, it's a little hard to explain. In short, certain deficiencies in our policing and judicial systems flared brightly as the week came to a close. So, today's Exchange newsletter will be shorter than intended. Hug the people you love, and everyone else.

software, startup, techcrunch, (9 more...)

#artificialintelligence

Country:

North America > United States (0.27)
South America (0.05)
North America > Central America (0.05)
(3 more...)

Genre: Personal > Interview (0.32)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.56)
Banking & Finance (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Shevtsov, Alexander, Tzagkarakis, Christos, Antonakaki, Despoina, Ioannidis, Sotiris

Identification of Twitter Bots Based on an Explainable Machine Learning Framework: The US 2020 Elections Case Study

arXiv.org Artificial Intelligence Dec-14-2021

Twitter is one of the most popular social networks, attracting millions of users and capturing a considerable proportion of online discourse. It provides a simple usage framework with short messages and an efficient application programming interface (API) enabling the research community to study and analyze several aspects of this social network. However, the simplicity of Twitter usage can lead to malicious handling by various bots. The malicious handling phenomenon expands in online discourse, especially during electoral periods, where, apart from the legitimate bots used for dissemination and communication purposes, the goal is to manipulate public opinion and the electorate towards a certain direction, specific ideology, or political party. This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data. To this end, a supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm, where the hyper-parameters are tuned via cross-validation. Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions by calculating feature importance, using the game theoretic-based Shapley values. Experimental evaluation on distinct Twitter datasets demonstrates the superiority of our approach, in terms of bot detection accuracy, when compared against a recent state-of-the-art Twitter bot detection method.
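The game-theoretic Shapley value that underlies this kind of feature attribution can be computed exactly for a very small example. The sketch below is a brute-force illustration only: the `shapley` function, the feature names, and the toy value function are assumptions for this example, and it is not how the authors' pipeline or the SHAP library computes attributions.

```python
from itertools import permutations

def shapley(players, value):
    """Exact Shapley values: average each player's marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}

# Hypothetical features and value function: the "model output" is 1 only when
# both features are present, so the credit should be split evenly between them.
features = ["followers", "tweet_rate"]

def toy_value(coalition):
    return 1.0 if {"followers", "tweet_rate"} <= coalition else 0.0

attributions = shapley(features, toy_value)
```

The attributions sum to the value of the full coalition, which is the efficiency property that makes Shapley values usable as a feature importance breakdown. The enumeration grows factorially with the number of features, so it is only practical for toy cases.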

artificial intelligence, machine learning, social media, (20 more...)

doi: 10.1609/icwsm.v16i1.19349

2112.04913

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)