A Survey on Data Markets

arXiv.org Artificial Intelligence

Data is the new oil of the 21st century. The growing trend of trading data for greater welfare has led to the emergence of data markets. A data market is any mechanism whereby the exchange of data products, including datasets and data derivatives, takes place as a result of data buyers and data sellers being in contact with one another, either directly or through mediating agents. It serves as a coordinating mechanism through which several functions, most importantly the pricing and the distribution of data, interact so that the value of data is fully exploited and enhanced. In this article, we present a comprehensive survey of this important and emerging direction from the aspects of data search, data productization, data transaction, data pricing, and revenue allocation, as well as privacy, security, and trust issues. We also investigate the government policies and industry status of data markets across different countries and domains. Finally, we identify the unresolved challenges and discuss possible future directions for the development of data markets.
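
The survey names pricing and revenue allocation as core market functions. As a minimal illustration of the latter (not drawn from the survey itself), the sketch below computes a Shapley-value-based revenue split among data sellers, assuming a hypothetical coalition_value function that maps a set of sellers to the revenue their combined data generates.

```python
from itertools import permutations

def coalition_value(sellers):
    """Hypothetical valuation: revenue generated by a coalition of sellers.
    In practice this would come from measured model utility or sales figures."""
    base = {"A": 40.0, "B": 25.0, "C": 10.0}
    synergy = 15.0 if {"A", "B"} <= set(sellers) else 0.0  # A and B complement each other
    return sum(base[s] for s in sellers) + synergy

def shapley_allocation(sellers):
    """Exact Shapley values by enumerating all orderings (fine for a handful of sellers)."""
    payoff = {s: 0.0 for s in sellers}
    orderings = list(permutations(sellers))
    for order in orderings:
        coalition = []
        for s in order:
            before = coalition_value(coalition)
            coalition.append(s)
            payoff[s] += coalition_value(coalition) - before
    return {s: v / len(orderings) for s, v in payoff.items()}

print(shapley_allocation(["A", "B", "C"]))  # {'A': 47.5, 'B': 32.5, 'C': 10.0}
```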


Sentiment Analysis of Cyberbullying Data in Social Media

arXiv.org Artificial Intelligence

Social media has become an integral part of modern life, but it has also brought with it the pervasive issue of cyberbullying, a serious menace in today's digital age. Cyberbullying, a form of harassment that occurs on social networks, has escalated alongside the growth of these platforms. Sentiment analysis holds significant potential not only for detecting bullying phrases but also for identifying victims who are at high risk of harm, whether to themselves or others. Our work focuses on leveraging deep learning and natural language understanding techniques to detect traces of bullying in social media posts. We developed a Recurrent Neural Network with Long Short-Term Memory (LSTM) cells, using different embeddings. One approach utilizes BERT embeddings, while the other replaces the embedding layer with the recently released embeddings API from OpenAI. We conducted a performance comparison between these two approaches to evaluate their effectiveness in sentiment analysis of the Formspring cyberbullying data. Our code is available at https://github.com/ppujari/xcs224u
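
As a rough sketch of the architecture described above, and not the authors' released code, the snippet below shows an LSTM classifier over precomputed token embeddings (standing in for BERT hidden states or embeddings fetched from an external API); the layer sizes and binary bullying label are assumptions.

```python
import torch
import torch.nn as nn

class BullyingClassifier(nn.Module):
    """Minimal sketch: an LSTM over precomputed token embeddings followed by a
    binary bullying/non-bullying head. Dimensions are illustrative assumptions."""

    def __init__(self, embed_dim: int = 768, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, embed_dim), computed upstream
        _, (h_n, _) = self.lstm(token_embeddings)
        return self.head(h_n[-1]).squeeze(-1)  # raw logits; apply sigmoid for probabilities

# Usage with dummy data: a batch of 4 posts, 32 tokens each, 768-dim embeddings
model = BullyingClassifier()
logits = model(torch.randn(4, 32, 768))
loss = nn.BCEWithLogitsLoss()(logits, torch.tensor([1.0, 0.0, 0.0, 1.0]))
```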


The State and Fate of Summarization Datasets

arXiv.org Artificial Intelligence

Automatic summarization has consistently attracted attention due to its versatility and wide application in various downstream tasks. Despite its popularity, we find that annotation efforts have largely been disjointed and have lacked common terminology. Consequently, it is challenging to discover existing resources or identify coherent research directions. To address this, we survey a large body of work spanning 133 datasets in over 100 languages, creating a novel ontology covering sample properties, collection methods, and distribution. With this ontology we make key observations, including the lack of accessible high-quality datasets for low-resource languages and the field's over-reliance on the news domain and on automatically collected distant supervision. Finally, we make available a web interface that allows users to interact with and explore our ontology and dataset collection, as well as a template for a summarization data card, which can be used to streamline future research into a more coherent body of work.
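
The data card template itself is not reproduced here; as a purely illustrative stand-in, the fields such a card might record can be captured as a small structured record along these lines (field names are assumptions, not the authors' schema).

```python
from dataclasses import dataclass

@dataclass
class SummarizationDataCard:
    """Illustrative data card fields for a summarization dataset.
    Field names are assumptions, not the template released with the survey."""
    name: str
    languages: list[str]
    domain: str                 # e.g. news, scientific, dialogue
    collection_method: str      # e.g. manual annotation, distant supervision
    summary_style: str          # e.g. extractive, abstractive, headline
    license: str
    size: int                   # number of document-summary pairs
    notes: str = ""

card = SummarizationDataCard(
    name="ExampleSum",          # hypothetical dataset
    languages=["sw"],
    domain="news",
    collection_method="distant supervision",
    summary_style="abstractive",
    license="CC BY 4.0",
    size=12000,
)
```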


Survey on Semantic Interpretation of Tabular Data: Challenges and Directions

arXiv.org Artificial Intelligence

Tabular data plays a pivotal role in various fields, making it a popular format for data manipulation and exchange, particularly on the web. The interpretation, extraction, and processing of tabular information are invaluable for knowledge-intensive applications. Notably, significant efforts have been invested in annotating tabular data with ontologies and entities from background knowledge graphs, a process known as Semantic Table Interpretation (STI). STI automation aids in building knowledge graphs, enriching data, and enhancing web-based question answering. This survey aims to provide a comprehensive overview of the STI landscape. It starts by categorizing approaches using a taxonomy of 31 attributes, allowing for comparisons and evaluations. It also examines available tools, assessing them based on 12 criteria. Furthermore, the survey offers an in-depth analysis of the Gold Standards used for evaluating STI approaches. Finally, it provides practical guidance to help end-users choose the most suitable approach for their specific tasks while also discussing unresolved issues and suggesting potential future research directions.
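
As a toy illustration of the core STI step of cell-entity annotation (not any specific surveyed system), the sketch below generates entity candidates for table cells from a small local lookup and keeps the candidate whose type agrees with the column's majority type; the lookup dictionary and scoring are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical local knowledge snippet: surface form -> list of (entity_id, type)
KB = {
    "Paris":  [("Q90", "city"), ("Q167646", "person")],
    "Berlin": [("Q64", "city")],
    "Turing": [("Q7251", "person")],
}

def annotate_column(cells):
    """Link each cell to an entity, preferring the type most common across the column."""
    candidates = {cell: KB.get(cell, []) for cell in cells}
    type_votes = Counter(t for cands in candidates.values() for _, t in cands)
    column_type = type_votes.most_common(1)[0][0] if type_votes else None
    annotations = {}
    for cell, cands in candidates.items():
        preferred = [e for e, t in cands if t == column_type]
        annotations[cell] = preferred[0] if preferred else (cands[0][0] if cands else None)
    return column_type, annotations

print(annotate_column(["Paris", "Berlin"]))  # ('city', {'Paris': 'Q90', 'Berlin': 'Q64'})
```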


A Guide to Misinformation Detection Datasets

arXiv.org Artificial Intelligence

Misinformation is a complex societal issue, and mitigating solutions are difficult to create due to data deficiencies. To address this problem, we have curated the largest collection of (mis)information datasets in the literature, totaling 75. From these, we evaluated the quality of all 36 datasets that consist of statements or claims. We assess these datasets to identify those with solid foundations for empirical work and those with flaws that could result in misleading and non-generalizable results, such as insufficient label quality, spurious correlations, or political bias. We further provide state-of-the-art baselines on all these datasets, but show that, regardless of label quality, categorical labels may no longer give an accurate evaluation of detection model performance. We discuss alternatives to mitigate this problem. Overall, this guide aims to provide a roadmap for obtaining higher quality data and conducting more effective evaluations, ultimately improving research in misinformation detection. All datasets and other artifacts are available at https://misinfo-datasets.complexdatalab.com/.
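
One common way to probe for the spurious correlations mentioned above is a claim-only baseline: if a shallow model reaches high accuracy from the claim wording alone, the labels are likely predictable from surface artifacts rather than veracity. The sketch below is a generic illustration with scikit-learn, assuming a list of claim strings and binary labels; it is not the baseline code released with the guide.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def claim_only_probe(claims: list[str], labels: list[int]) -> float:
    """Mean cross-validated accuracy of a bag-of-words model trained on claims alone.
    Accuracy far above the majority-class rate hints at surface-level shortcuts."""
    probe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                          LogisticRegression(max_iter=1000))
    return cross_val_score(probe, claims, labels, cv=5).mean()

# Hypothetical usage:
# score = claim_only_probe(dataset["claim"], dataset["label"])
```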


YouTube Comments Decoded: Leveraging LLMs for Low Resource Language Classification

arXiv.org Artificial Intelligence

Sarcasm detection is a significant challenge in sentiment analysis, particularly because the intended meaning of a sarcastic opinion deviates from its literal expression. This challenge is heightened in social media contexts where code-mixing, especially in Dravidian languages, is prevalent. Code-mixing involves the blending of multiple languages within a single utterance, often with non-native scripts, complicating the task for systems trained on monolingual data. This shared task introduces a novel gold standard corpus designed for sarcasm and sentiment detection within code-mixed texts, specifically in Tamil-English and Malayalam-English. The primary objective of this task is to identify sarcasm and sentiment polarity within a code-mixed dataset of Tamil-English and Malayalam-English comments and posts collected from social media platforms. Each comment or post is annotated at the message level for sentiment polarity, with particular attention to the challenges posed by class imbalance, reflecting real-world scenarios. In this work, we experiment with state-of-the-art large language models such as GPT-3.5 Turbo via prompting to classify comments into sarcastic or non-sarcastic categories. We obtained macro-F1 scores of 0.61 for Tamil and 0.50 for Malayalam.
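
The prompting setup is only summarized above; the sketch below shows one plausible way to run it with the openai Python client and scikit-learn's macro-F1. The prompt wording and label parsing are assumptions and may differ from the authors' exact setup.

```python
from openai import OpenAI
from sklearn.metrics import f1_score

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_comment(comment: str) -> str:
    """Ask the model for a one-word sarcasm label; prompt wording is an assumption."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Label the code-mixed comment as 'sarcastic' or 'non-sarcastic'. "
                        "Reply with the label only."},
            {"role": "user", "content": comment},
        ],
        temperature=0,
    )
    reply = response.choices[0].message.content.strip().lower()
    return "non-sarcastic" if "non" in reply else "sarcastic"

# Hypothetical evaluation over a labelled test split:
# preds = [classify_comment(c) for c in test_comments]
# print(f1_score(test_labels, preds, average="macro"))
```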


Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) research often aims to develop models that can generalize reliably across complex datasets, yet this remains challenging in fields where data is scarce, intricate, or inaccessible. This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize one of the most demanding structured datasets: malicious network traffic. Our approach uniquely transforms numerical data into text, re-framing data generation as a language modeling task, which not only enhances data regularization but also significantly improves generalization and the quality of the synthetic data. Extensive statistical analyses demonstrate that our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data. Additionally, we conduct a comprehensive study on synthetic data applications, effectiveness, and evaluation strategies, offering valuable insights into its role across various domains. Our code and pre-trained models are openly accessible on GitHub, enabling further exploration and application of our methodology. Index Terms: Data synthesis, machine learning, traffic generation, privacy-preserving data, generative models.
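
The key idea above is serializing numerical records into text so that a language model can learn their distribution. As a minimal, hedged illustration (the field names and delimiter format are assumptions, not the paper's exact scheme), a network-flow record might be flattened into a text line and parsed back as follows.

```python
def flow_to_text(flow: dict) -> str:
    """Serialize a (hypothetical) network-flow record into a flat text string,
    so that data synthesis can be treated as a language modeling task."""
    fields = ["proto", "src_port", "dst_port", "duration_ms", "bytes", "label"]
    return " | ".join(f"{k}={flow[k]}" for k in fields)

def text_to_flow(text: str) -> dict:
    """Inverse mapping: parse a generated text line back into a structured record."""
    return dict(part.split("=", 1) for part in text.split(" | "))

record = {"proto": "tcp", "src_port": 443, "dst_port": 51322,
          "duration_ms": 87, "bytes": 5120, "label": "benign"}
line = flow_to_text(record)          # 'proto=tcp | src_port=443 | ... | label=benign'
assert text_to_flow(line)["proto"] == "tcp"
```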


International Scientific Report on the Safety of Advanced AI (Interim Report)

arXiv.org Artificial Intelligence

I am honoured to be chairing the delivery of the inaugural International Scientific Report on Advanced AI Safety. I am proud to publish this interim report which is the culmination of huge efforts by many experts over the six months since the work was commissioned at the Bletchley Park AI Safety Summit in November 2023. We know that advanced AI is developing very rapidly, and that there is considerable uncertainty over how these advanced AI systems might affect how we live and work in the future. AI has tremendous potential to change our lives for the better, but it also poses risks of harm. That is why having this thorough analysis of the available scientific literature and expert opinion is essential. The more we know, the better equipped we are to shape our collective destiny.


Digital Twin for Autonomous Surface Vessels: Enabler for Safe Maritime Navigation

arXiv.org Artificial Intelligence

Autonomous surface vessels (ASVs) are becoming increasingly significant in enhancing the safety and sustainability of maritime operations. To ensure the reliability of modern control algorithms utilized in these vessels, digital twins (DTs) provide a robust framework for conducting safe and effective simulations within a virtual environment. Digital twins are generally classified on a scale from 0 to 5, with each level representing a progression in complexity and functionality: Level 0 (Standalone) employs offline modeling techniques; Level 1 (Descriptive) integrates sensors and online modeling to enhance situational awareness; Level 2 (Diagnostic) focuses on condition monitoring and cybersecurity; Level 3 (Predictive) incorporates predictive analytics; Level 4 (Prescriptive) embeds decision-support systems; and Level 5 (Autonomous) enables advanced functionalities such as collision avoidance and path following. These digital representations not only provide insights into the vessel's current state and operational efficiency but also predict future scenarios and assess life endurance. By continuously updating with real-time sensor data, the digital twin effectively corrects modeling errors and enhances decision-making processes. Since DTs are key enablers for complex autonomous systems, this paper introduces a comprehensive methodology for establishing a digital twin framework specifically tailored for ASVs. Through a detailed literature survey, we explore existing state-of-the-art enablers across the defined levels, offering valuable recommendations for future research and development in this rapidly evolving field.
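
For quick reference, the 0-5 capability scale described above can be written down as a simple enumeration; the short descriptions are paraphrased from the abstract, and the class itself is only an illustrative data structure.

```python
from enum import IntEnum

class DigitalTwinLevel(IntEnum):
    """Digital twin capability scale as summarized in the abstract."""
    STANDALONE = 0    # offline modeling
    DESCRIPTIVE = 1   # sensors and online modeling for situational awareness
    DIAGNOSTIC = 2    # condition monitoring and cybersecurity
    PREDICTIVE = 3    # predictive analytics
    PRESCRIPTIVE = 4  # embedded decision-support systems
    AUTONOMOUS = 5    # collision avoidance, path following, etc.

assert DigitalTwinLevel.PREDICTIVE > DigitalTwinLevel.DIAGNOSTIC
```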


Foundations and Recent Trends in Multimodal Mobile Agents: A Survey

arXiv.org Artificial Intelligence

Mobile agents are essential for automating tasks in complex and dynamic mobile environments. As foundation models evolve, the demands for agents that can adapt in real-time and process multimodal data have grown. This survey provides a comprehensive review of mobile agent technologies, focusing on recent advancements that enhance real-time adaptability and multimodal interaction. Recent evaluation benchmarks have been developed to better capture the static and interactive environments of mobile tasks, offering more accurate assessments of agents' performance. We then categorize these advancements into two main approaches: prompt-based methods, which utilize large language models (LLMs) for instruction-based task execution, and training-based methods, which fine-tune multimodal models for mobile-specific applications. Additionally, we explore complementary technologies that augment agent performance. By discussing key challenges and outlining future research directions, this survey offers valuable insights for advancing mobile agent technologies. A comprehensive resource list is available at https://github.com/aialt/awesome-mobile-agents
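
As a schematic of the prompt-based approach mentioned above (not any specific surveyed agent), such agents typically loop over observe, prompt the LLM, and act. The sketch below uses hypothetical stand-in helpers for the screen reader, LLM call, and device controller so it runs as-is.

```python
# Hypothetical stand-ins for a real screen reader, LLM client, and device controller.
def get_screen_description() -> str:
    return "home screen with a Settings icon"

def llm_complete(prompt: str) -> str:
    return "DONE"  # a real agent would call an LLM here

def execute(action: str) -> None:
    print("executing:", action)

def run_mobile_agent(task: str, max_steps: int = 10) -> list[str]:
    """Schematic observe -> prompt -> act loop of a prompt-based mobile agent."""
    history: list[str] = []
    for _ in range(max_steps):
        screen = get_screen_description()
        prompt = (
            f"Task: {task}\nPrevious actions: {history}\nCurrent screen: {screen}\n"
            "Respond with one action such as TAP(<element>), TYPE(<text>), or DONE."
        )
        action = llm_complete(prompt).strip()
        if action == "DONE":
            break
        execute(action)
        history.append(action)
    return history

run_mobile_agent("Open Wi-Fi settings")
```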