 curate


BIGOS V2 Benchmark for Polish ASR: Curated Datasets and Tools for Reproducible Evaluation

Neural Information Processing Systems

Speech datasets available in the public domain are often underutilized because of challenges in accessibility and interoperability. To address this, a system to survey, catalog, and curate existing speech datasets was developed, enabling reproducible evaluation of automatic speech recognition (ASR) systems. The system was applied to curate over 24 datasets and evaluate 25 ASR models, with a specific focus on Polish. This research represents the most extensive comparison to date of commercial and free ASR systems for the Polish language, drawing insights from 600 system-model-test set evaluations across 8 analysis scenarios. Curated datasets and benchmark results are available publicly. The evaluation tools are open-sourced to support reproducibility of the benchmark, encourage community-driven improvements, and facilitate adaptation for other languages.


UniTox: Leveraging LLMs to Curate a Unified Dataset of Drug-Induced Toxicity from FDA Labels

Neural Information Processing Systems

Drug-induced toxicity is one of the leading reasons new drugs fail clinical trials. Machine learning models that predict drug toxicity from molecular structure could help researchers prioritize less toxic drug candidates. However, current toxicity datasets are typically small and limited to a single organ system (e.g., cardio, renal, or liver). Creating these datasets often involved time-intensive expert curation by parsing drug labelling documents that can exceed 100 pages per drug. Here, we introduce UniTox, a unified dataset of 2,418 FDA-approved drugs with drug-induced toxicity summaries and ratings created by using GPT-4o to process FDA drug labels.


CURATE: Scaling-up Differentially Private Causal Graph Discovery

Bhattacharjee, Payel, Tandon, Ravi

arXiv.org Artificial Intelligence

Causal Graph Discovery (CGD) is the process of estimating the underlying probabilistic graphical model that represents the joint distribution of the features of a dataset. CGD algorithms are broadly classified into two categories: (i) constraint-based algorithms, whose outcome depends on conditional independence (CI) tests, and (ii) score-based algorithms, whose outcome depends on an optimized score function. Since sensitive features of observational data are prone to privacy leakage, Differential Privacy (DP) has been adopted to ensure user privacy in CGD. Adding the same amount of noise at every step of this sequential estimation process affects the predictive performance of the algorithms. Because the initial CI tests in constraint-based algorithms and the later iterations of the optimization process in score-based algorithms are crucial, they need to be more accurate and less noisy. Based on this key observation, we present CURATE (CaUsal gRaph AdapTivE privacy), a DP-CGD framework with adaptive privacy budgeting. In contrast to existing DP-CGD algorithms, which use uniform privacy budgeting across all iterations, CURATE allows adaptive privacy budgeting by minimizing the error probability (for constraint-based algorithms) and maximizing the number of iterations of the optimization problem (for score-based algorithms) while keeping the cumulative leakage bounded. To validate our framework, we present a comprehensive set of experiments on several datasets and show that CURATE achieves higher utility than existing DP-CGD algorithms with less privacy leakage.
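The contrast between uniform and adaptive budgeting can be illustrated with a minimal sketch. It is not the CURATE algorithm: the abstract says CURATE chooses its budgets by optimization, whereas the geometric decay below, the function names, and the parameter values are all assumptions made for illustration. The sketch relies only on basic sequential composition (per-step budgets add up to the total leakage) and on the Laplace mechanism (noise scale equals sensitivity divided by epsilon). It front-loads the budget, which matches the constraint-based case where the initial CI tests matter most; a score-based variant would favor later steps instead.

```python
def uniform_budget(total_eps: float, n_steps: int) -> list[float]:
    """Split the total privacy budget evenly across all steps."""
    return [total_eps / n_steps] * n_steps


def front_loaded_budget(total_eps: float, n_steps: int, decay: float = 0.5) -> list[float]:
    """Hypothetical adaptive split: geometric weights give earlier steps more budget.

    The weights are normalized so the per-step budgets still sum to total_eps.
    """
    weights = [decay ** i for i in range(n_steps)]
    norm = sum(weights)
    return [total_eps * w / norm for w in weights]


def laplace_scale(sensitivity: float, eps: float) -> float:
    """Scale of the Laplace noise for an eps-DP release of a query
    with the given L1 sensitivity (larger scale means noisier output)."""
    return sensitivity / eps


if __name__ == "__main__":
    total_eps, n_steps = 1.0, 4  # illustrative values, not from the paper
    for name, budget in [
        ("uniform", uniform_budget(total_eps, n_steps)),
        ("front-loaded", front_loaded_budget(total_eps, n_steps)),
    ]:
        scales = [laplace_scale(1.0, eps) for eps in budget]
        print(name, [round(e, 3) for e in budget], [round(s, 2) for s in scales])
```

Both schedules spend the same total budget, but the front-loaded one gives the first step a smaller noise scale at the cost of noisier later steps.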


T-curator: a trust based curation tool for LOD logs

Lanasri, Dihia

arXiv.org Artificial Intelligence

Nowadays, companies are racing towards Linked Open Data (LOD) to improve their added value, but they are ignoring their SPARQL query logs. If well curated, these logs can be an asset for decision makers. A naive and straightforward use of these logs is too risky because their provenance and quality are highly questionable. To use these logs in a trusted way, users need in-depth knowledge of the whole LOD environment and tools to curate the logs. In this paper, we propose an interactive and intuitive trust-based tool that can be used to curate these LOD logs before exploiting them. This tool supports the approach proposed in our previous work, Lanasri et al. [2020].


AI Generates Haunting New Tarot Cards

#artificialintelligence

The Swedish musician and AI enthusiast known as Supercomposite used an AI to create hundreds of creepy new tarot cards -- and has been blasting them on Twitter for days, in a delightful barrage of occult-flavored machine learning. The artist is using an AI called Looking Glass, which debuted last year and was made by Twitter user ai.curio. Some cards have humanoid characters with holes for faces, some feature monstrous-looking creatures in bloody shades of red, and some are creepy simply because they seem uncannily like tarot cards at first glance. These tarot cards do not exist. I generated 500 of these and I'm not stopping.


The Rise Of The Machines; Analogue Meets Artificial intelligence - Which-50

#artificialintelligence

When Southern Cross Austereo (SCA) became an early-stage investor in Melbourne-based Sonnant, Which-50.com Surprisingly, Sonnant CEO Tony Simmons responded by offering a demonstration to explain how it all worked. Sonnant styles itself as a "transformational artificial intelligence (AI) and machine learning (ML) company that provides content discovery for the spoken word". Simmons explained that the key to Sonnant's success was their initial decision to train the AI to understand the Australian accent in phase one. "Anyone who's tried to make a booking at a restaurant while in America will know what I mean." Australian English is most associated with monophthongs (single vowels), where there are approximately 20 distinct sounds compared to American English, with only 16 sounds. Also difficult for the AI are Australian diphthongs, the timing between two vowel sounds and the tendency for a falling second sound. An accurate transcript of an analogue recording is necessary to map the ...


How to Build Lean AI Startups (Including Real-World Case Studies)

#artificialintelligence

This article will share insights on how to build lean startups that change society for the better and leave a positive impact on the planet. There are hundreds of use cases where AI can help to do exactly this. Impact-driven startups have the potential to solve real-world problems, tackle environmental problems, and improve the lives of many people, especially vulnerable populations. Billions of dollars are already flowing into AI ventures, which primarily target profit gains and industrial automation. The AI for Good movement, where commercial value often meets social value, is slowly picking up. Now, in order to build impact-driven AI startups, there are a few essential steps to follow.


Art and artifice – IAM Network

#artificialintelligence

An AI developed in Vienna is now debuting in the art business, and will curate the Bucharest Biennale. Practitioners in the arts labour under the misapprehension that the human factor of creativity would shield them from the depredations of artificial intelligence. It is assumed that, just as machines freed us from physical labour, machine intelligence would rid us of intellectual chores. They would put production line workers, bookkeepers, bank tellers and inventory managers out of work, but novelists and artists, and the marketing networks which have developed around their products, would be unharmed. A computer at Stanford which has digested the complete works of Shakespeare does almost passable knockoffs.


Is AI Right for Your Enterprise?

#artificialintelligence

AI is poised to deliver measurable value for a variety of public and private sector applications, and is just beginning to make inroads in the enterprise. This article offers best practices for picking an AI use case, points out barriers to AI success, and advises on how to find the best AI talent. Artificial intelligence (AI) technology dominates the headlines, but it's still not widely used. According to Gartner, between 2018 and 2019, the share of organizations deploying AI grew to just 14 percent. This may lead some enterprises to wonder: is the transition to AI necessary?


The Rise of the Machines and the Impact of Artificial Intelligence on Digital Marketing

#artificialintelligence

It's the year 2019--and the rise of the machines is transforming digital marketing. Nowadays, AI makes it much easier to generate information for online shoppers directly from a website or in the search results. Yes, that horror movie recommended in your streaming library is generated by AI. Artificial intelligence can both generate and curate content to personalize the user experience. The system is able to generate content based on matching data and information that has been indexed across the internet. For instance, you may notice this as a pop-up that appears as soon as a user visits your website.