AITopics | big data and data science

big data and data science

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

From User Preferences to Optimization Constraints Using Large Language Models

Sanguinetti, Manuela, Perniciano, Alessandra, Zedda, Luca, Loddo, Andrea, Di Ruberto, Cecilia, Atzori, Maurizio

arXiv.org Artificial Intelligence, Mar-27-2025

This work explores using Large Language Models (LLMs) to translate user preferences into energy optimization constraints for home appliances. We describe a task where natural language user utterances are converted into formal constraints for smart appliances, within the broader context of a renewable energy community (REC) and in the Italian scenario. We evaluate how effectively various LLMs currently available for Italian translate these preferences in classical zero-shot, one-shot, and few-shot learning settings, using a pilot dataset of Italian user requests paired with their corresponding formal constraint representations. Our contributions include establishing a baseline performance for this task, publicly releasing the dataset and code for further research, and providing insights on observed best practices and limitations of LLMs in this particular domain.
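As a rough illustration of the zero-shot, one-shot, and few-shot settings mentioned above, the following minimal sketch assembles a prompt with a configurable number of worked examples. The example requests (given in English here), the constraint syntax, and the names `EXAMPLES`, `INSTRUCTION`, and `build_prompt` are all hypothetical; they are not taken from the paper's dataset or code.

```python
from typing import List, Tuple

# Hypothetical (request, constraint) pairs. The constraint syntax is invented
# for illustration and is NOT the formal representation used in the paper.
EXAMPLES: List[Tuple[str, str]] = [
    ("Run the dishwasher after 10 pm.", "dishwasher.start >= 22:00"),
    ("The washing machine must finish before 7 am.", "washing_machine.end <= 07:00"),
    ("Do not use the oven between noon and 2 pm.", "oven.off in [12:00, 14:00]"),
]

INSTRUCTION = "Translate the user request into a formal appliance constraint."


def build_prompt(utterance: str, n_shots: int = 0) -> str:
    """Assemble a zero-shot (n_shots=0), one-shot (1), or few-shot (>1) prompt."""
    if not 0 <= n_shots <= len(EXAMPLES):
        raise ValueError("n_shots must be between 0 and the number of examples")
    parts = [INSTRUCTION]
    # Each worked example shows the model one request with its expected output.
    for request, constraint in EXAMPLES[:n_shots]:
        parts.append(f"Request: {request}\nConstraint: {constraint}")
    # The final block is left open for the model to complete.
    parts.append(f"Request: {utterance}\nConstraint:")
    return "\n\n".join(parts)
```

Calling `build_prompt("Charge the car overnight.", n_shots=2)` would yield the instruction, two worked examples, and the open request; the resulting string could then be sent to whichever LLM is being evaluated.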

constraint, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.2136

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Sardinia > Cagliari (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Energy > Renewable (1.00)
Appliances & Durable Goods (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

A Methodology to extract Geo-Referenced Standard Routes from AIS Data

Corvino, Michela, Daffinà, Filippo, Francalanci, Chiara, Giacomazzi, Paolo, Magliani, Martina, Ravanelli, Paolo, Stahl, Torbjorn

arXiv.org Artificial Intelligence, Mar-26-2025

Maritime AIS (Automatic Identification Systems) data serve as a valuable resource for studying vessel behavior. This study proposes a methodology to analyze routes between maritime points of interest and extract geo-referenced standard routes, as maritime patterns of life, from raw AIS data. The underlying assumption is that ships adhere to consistent patterns when travelling in certain maritime areas due to geographical, environmental, or economic factors. Deviations from these patterns may be attributed to weather conditions, seasonality, or illicit activities. This enables maritime surveillance authorities to analyze the navigational behavior between ports, providing insights on vessel route patterns, possibly categorized by vessel characteristics (type, flag, or size). Our methodological process begins by segmenting AIS data into distinct routes using a finite state machine (FSM), which describes routes as segments connecting pairs of points of interest. The extracted segments are aggregated based on their departure and destination ports and then modelled using iterative density-based clustering to connect these ports. The clustering parameters are assigned manually to a sample and then extended to the entire dataset using linear regression. Overall, the approach proposed in this paper is unsupervised and does not require any ground truth to be trained. The approach has been tested on a six-year AIS dataset covering the Arctic region and the Europe, Middle East, and North Africa areas. The total size of our dataset is 1.15 Tbytes. The approach has proved effective in extracting standard routes, with less than 5% outliers, mostly due to routes with either their departure or their destination port not included in the test areas.
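The FSM segmentation step described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: each AIS fix is assumed to be already labelled with the point of interest that contains it (or `None` at sea), and the state names and the types `AisPoint` and `Route` are hypothetical. The clustering and regression steps are not covered.

```python
from typing import List, NamedTuple, Optional


class AisPoint(NamedTuple):
    t: int                # timestamp in seconds
    lat: float
    lon: float
    poi: Optional[str]    # point of interest containing the fix, or None at sea


class Route(NamedTuple):
    origin: str
    destination: str
    points: List[AisPoint]


def segment_routes(track: List[AisPoint]) -> List[Route]:
    """Split one vessel's time-ordered track into POI-to-POI segments.

    States: "no_origin" (no POI seen yet), "at_poi", and "underway".
    Fixes seen before the first POI are discarded because their origin is unknown.
    """
    routes: List[Route] = []
    state = "no_origin"
    origin = ""
    current: List[AisPoint] = []
    for p in track:
        if p.poi is None:
            # At sea: extend the open segment if a departure POI is known.
            if state != "no_origin":
                state = "underway"
                current.append(p)
        else:
            # Inside a POI: close the segment if this POI differs from the origin.
            if state != "no_origin" and p.poi != origin:
                routes.append(Route(origin, p.poi, current + [p]))
            # The latest fix inside a POI becomes the start of the next segment.
            state, origin, current = "at_poi", p.poi, [p]
    return routes
```

A vessel that leaves a port and returns to the same port produces no segment in this sketch; whether such round trips should count as routes is a design choice the abstract does not specify.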

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.22734

Country:

Africa > North Africa (0.25)
Europe > Sweden > Örebro County > Örebro (0.04)
Europe > Italy > Lombardy > Milan (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry:

Government > Military (0.66)
Transportation > Marine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Active Data Sampling and Generation for Bias Remediation

Maratea, Antonio, Perna, Rita

arXiv.org Artificial Intelligence, Mar-26-2025

Adequate sampling space coverage is the keystone to effectively train trustworthy Machine Learning models. Unfortunately, real data carry several inherent risks due to the many potential biases they exhibit when gathered without proper random sampling over the reference population, and most of the time such sampling is too expensive or time consuming to be a viable option. Depending on how training data have been gathered, unmitigated biases can lead to harmful or discriminatory consequences that ultimately hinder large scale applicability of pre-trained models and undermine their truthfulness or fairness expectations. In this paper, a mixed active sampling and data generation strategy, called samplation, is proposed as a means to compensate, during fine-tuning of a pre-trained classifier, for the unfair classifications it produces, assuming that the training data come from a non-probabilistic sampling schema. Given a pre-trained classifier, first a fairness metric is evaluated on a test set, then new reservoirs of labeled data are generated and finally a number of reversely-biased artificial samples are generated for the fine-tuning of the model. Using Deep Models for visual semantic role labeling as a case study, the proposed method has been able to fully cure a simulated gender bias starting from a 90/10 imbalance, with only a small percentage of new data and with a minor effect on accuracy.
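The abstract does not name the fairness metric evaluated on the test set. As one possible stand-in, the sketch below computes the demographic parity difference, that is, the largest gap in positive-prediction rate between groups. The function name and the choice of metric are assumptions made for illustration only.

```python
from typing import Dict, Sequence


def demographic_parity_difference(y_pred: Sequence[int], group: Sequence[str]) -> float:
    """Largest gap in positive-prediction rate between any two groups.

    y_pred holds binary predictions (1 = positive); group holds the protected
    attribute value of each sample. A value of 0.0 means equal rates.
    """
    if len(y_pred) != len(group) or len(y_pred) == 0:
        raise ValueError("y_pred and group must be non-empty and of equal length")
    totals: Dict[str, int] = {}
    positives: Dict[str, int] = {}
    for pred, g in zip(y_pred, group):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + (1 if pred == 1 else 0)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)
```

A value measured before fine-tuning could be compared with the value measured afterwards to check whether the gap has narrowed.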

artificial intelligence, big data and data science, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2503.20414

Country: North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Symbolic Audio Classification via Modal Decision Tree Learning

Marzano, Enrico, Pagliarini, Giovanni, Pasini, Riccardo, Sciavicco, Guido, Stan, Ionel Eduard

arXiv.org Artificial Intelligence, Mar-21-2025

The range of potential applications of acoustic analysis is wide. Classification of sounds, in particular, is a typical machine learning task that has received a lot of attention in recent years. The most common approaches to sound classification are sub-symbolic, typically based on neural networks, and result in black-box models with high performance but very low transparency. In this work, we consider several audio tasks, namely, age and gender recognition, emotion classification, and respiratory disease diagnosis, and we approach them with a symbolic technique, that is, (modal) decision tree learning. We prove that such tasks can be solved using the same symbolic pipeline, which allows the extraction of simple rules with very high accuracy and low complexity. In principle, all such tasks could be associated with an autonomous conversation system, which could be useful in different contexts, such as an automatic reservation agent for a hospital or a clinic.

artificial intelligence, big data and data science, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2503.17018

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Friuli Venezia Giulia (0.04)
Europe > Greece > Central Macedonia > Thessaloniki (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Interpretable Machine Learning for Oral Lesion Diagnosis through Prototypical Instances Identification

Cascione, Alessio, Setzu, Mattia, Galatolo, Federico A., Cimino, Mario G. C. A., Guidotti, Riccardo

arXiv.org Artificial Intelligence, Mar-21-2025

Decision-making processes in healthcare can be highly complex and challenging. Machine Learning tools offer significant potential to assist in these processes. However, many current methodologies rely on complex models that are not easily interpretable by experts. This underscores the need to develop interpretable models that can provide meaningful support in clinical decision-making. When approaching such tasks, humans typically compare the situation at hand to a few key examples and representative cases imprinted in their memory. An approach that selects such exemplary cases and grounds its predictions on them could contribute to obtaining high-performing interpretable solutions to such problems. To this end, we evaluate PivotTree, an interpretable prototype selection model, on an oral lesion detection problem, specifically trying to detect the presence of neoplastic, aphthous and traumatic ulcerated lesions from oral cavity images. We demonstrate the efficacy of such a method in terms of performance and offer a qualitative and quantitative comparison between exemplary cases and ground-truth prototypes selected by experts.

artificial intelligence, inductive learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.16938

Country:

North America > United States > Wisconsin (0.04)
Europe > Italy > Tuscany (0.04)

Genre:

Overview (0.68)
Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (0.93)
Health & Medicine > Therapeutic Area > Dermatology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.47)
(2 more...)

From Text to Talent: A Pipeline for Extracting Insights from Candidate Profiles

Frazzetto, Paolo, Haq, Muhammad Uzair Ul, Fabris, Flavia, Sperduti, Alessandro

arXiv.org Artificial Intelligence, Mar-21-2025

The recruitment process is undergoing a significant transformation with the increasing use of machine learning and natural language processing techniques. While previous studies have focused on automating candidate selection, the role of multiple vacancies in this process remains understudied. This paper addresses this gap by proposing a novel pipeline that leverages Large Language Models and graph similarity measures to suggest ideal candidates for specific job openings. Our approach represents candidate profiles as multimodal embeddings, enabling the capture of nuanced relationships between job requirements and candidate attributes. The proposed approach has significant implications for the recruitment industry, enabling companies to streamline their hiring processes and identify top talent more efficiently. Our work contributes to the growing body of research on the application of machine learning in human resources, highlighting the potential of LLMs and graph-based methods in revolutionizing the recruitment landscape.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.17438

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Multivariate Time Series Anomaly Detection in Industry 5.0

Colombi, Lorenzo, Vespa, Michela, Belletti, Nicolas, Brina, Matteo, Dahdal, Simon, Tabanelli, Filippo, Bellodi, Elena, Tortonesi, Mauro, Stefanelli, Cesare, Vignoli, Massimiliano

arXiv.org Artificial Intelligence, Mar-20-2025

Industry 5.0 environments present a critical need for effective anomaly detection methods that can indicate equipment malfunctions, process inefficiencies, or potential safety hazards. The ever-increasing sensorization of manufacturing lines makes processes more observable, but also poses the challenge of continuously analyzing vast amounts of multivariate time series data. These challenges include data quality, since data may be noisy, unlabeled, or even mislabeled. A promising approach consists of combining an embedding model with other Machine Learning algorithms to enhance the overall performance in detecting anomalies. Moreover, representing time series as vectors brings many advantages, such as higher flexibility and improved ability to capture complex temporal dependencies. We tested our solution in a real industrial use case, using data collected from a Bonfiglioli plant. The results demonstrate that, unlike traditional reconstruction-based autoencoders, which often struggle in the presence of sporadic noise, our embedding-based framework maintains high performance across various noise conditions.
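The general idea of embedding time series windows as vectors and then applying a separate detector to those vectors can be sketched as follows. This is only an illustration: the hand-made embedding (per-channel mean and standard deviation) stands in for the learned embedding model, the distance-to-centroid score stands in for the downstream algorithm, and all function names are hypothetical rather than taken from the paper.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def embed(window: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in embedding: per-channel mean and population standard deviation.

    window is a list of time steps, each a list of channel readings.
    """
    channels = list(zip(*window))
    return [stat(ch) for ch in channels for stat in (mean, pstdev)]


def fit_centroid(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average the embeddings of windows assumed to be mostly normal."""
    return [mean(dim) for dim in zip(*embeddings)]


def anomaly_score(embedding: Sequence[float], centroid: Sequence[float]) -> float:
    """Euclidean distance to the centroid; larger means more unusual."""
    return math.dist(embedding, centroid)
```

In use, a threshold on `anomaly_score` would have to be chosen from held-out data; the abstract does not say how the authors set theirs.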

anomaly, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.15946

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Visually Linking AI, Machine Learning, Deep Learning, Big Data and Data Science

#artificialintelligence, Dec-18-2017, 11:16:18 GMT

AI has exploded over the past few years, especially since 2015. Much of that has to do with the wide availability of GPUs that make parallel processing ever faster, cheaper, and more powerful. It also has to do with the simultaneous one-two punch of practically infinite storage and a flood of data of every stripe (that whole Big Data movement) – images, text, transactions, mapping data, you name it.

artificial intelligence, data mining, machine learning, (3 more...)

#artificialintelligence

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Quantum Computing - How it may Revolutionize Big Data?

@machinelearnbot, Nov-8-2017, 19:20:12 GMT

Quantum computing is round the corner and uses in machine learning are already being hypothesised [2]. Big Data is involved in virtually every industry, from health care to finance. Each industry and task requires different approaches to how data is utilised. One thing that remains consistent is the increasing volume, velocity and variety of data (the 3V's of Big Data) involved and the challenge to put as much data to use for real time, effective solutions [3]. The involvement of machine learning has undoubtedly revolutionised this process, leading to far more accurate models and solutions to challenges in more unique ways [4].

artificial intelligence, data mining, machine learning, (14 more...)

@machinelearnbot

Country: North America > United States (0.16)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.96)
Information Technology > Artificial Intelligence > Machine Learning (0.81)

Visually Linking AI, Machine Learning, Deep Learning, Big Data and Data Science

#artificialintelligence, Nov-26-2016, 14:10:16 GMT

artificial intelligence, data mining, machine learning, (3 more...)

#artificialintelligence

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)
