global data
- Europe (0.04)
- South America > Brazil > São Paulo (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Tuning LLMs by RAG Principles: Towards LLM-native Memory
Wei, Jiale, Wu, Shuchi, Liu, Ruochen, Ying, Xiang, Shang, Jingbo, Tao, Fangbo
Memory, additional information beyond the training of large language models (LLMs), is crucial to various real-world applications, such as personal assistants. The two mainstream solutions for incorporating memory into the generation process are long-context LLMs and retrieval-augmented generation (RAG). In this paper, we first systematically compare these two types of solutions on three renovated/new datasets and show that (1) long-context solutions, although more expensive, find it easier to capture the big picture and better answer queries that require considering the memory as a whole; and (2) when queries concern specific information, RAG solutions are more competitive, especially when the keywords can be explicitly matched. We therefore propose a novel method, RAG-Tuned-LLM, which fine-tunes a relatively small (e.g., 7B) LLM using data generated following RAG principles, so that it combines the advantages of both solutions. Extensive experiments on three datasets demonstrate that RAG-Tuned-LLM can beat long-context LLMs and RAG methods across a wide range of query types.
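The data-generation step described in this abstract can be pictured with a short sketch. This is a minimal, hypothetical illustration rather than the paper's pipeline: retrieval is approximated by a toy lexical scorer, and the function names (score, build_rag_tuning_records) are invented for the example; the resulting records would then be fed to a standard SFT pipeline for a ~7B model.

```python
# Minimal sketch, assuming the paper's recipe of turning memory + queries into
# RAG-formatted fine-tuning records; the retriever here is a crude stand-in.
from collections import Counter
import json, math

def score(query, chunk):
    """Crude lexical overlap score standing in for a real retriever (e.g. BM25/dense)."""
    q, c = Counter(query.lower().split()), Counter(chunk.lower().split())
    overlap = sum(min(q[t], c[t]) for t in q)
    return overlap / (math.sqrt(len(chunk.split())) + 1e-9)

def build_rag_tuning_records(memory_chunks, qa_pairs, top_k=3):
    """Turn (query, answer) pairs plus a memory corpus into SFT records whose
    prompts embed the top-k retrieved chunks, mimicking the RAG input format."""
    records = []
    for query, answer in qa_pairs:
        ranked = sorted(memory_chunks, key=lambda ch: score(query, ch), reverse=True)
        context = "\n".join(f"- {ch}" for ch in ranked[:top_k])
        prompt = f"Memory:\n{context}\n\nQuestion: {query}\nAnswer:"
        records.append({"prompt": prompt, "completion": " " + answer})
    return records

if __name__ == "__main__":
    memory = ["The user moved to Lisbon in 2022.", "The user is allergic to peanuts."]
    qa = [("Where does the user live?", "Lisbon, since 2022.")]
    print(json.dumps(build_rag_tuning_records(memory, qa), indent=2))
    # The resulting records would then be used to fine-tune a relatively small LLM.
```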
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Europe > Romania > Sud-Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > China > Guangxi Province > Nanning (0.04)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
Contrastive Federated Learning with Tabular Data Silos
Ginanjar, Achmad, Li, Xue, Hua, Wen
Learning from data silos is a difficult task for organizations that need to obtain knowledge of objects that appear in multiple independent data silos. Objects held by multiple organizations, such as government agencies, are referred to by different identifiers, such as driver's license, passport number, and tax file number. The data distributions in data silos are mostly non-IID (Independently and Identically Distributed), unlabeled, and vertically partitioned (i.e., having different attributes). Privacy concerns compound these issues, and together these conditions inhibit enthusiasm for collaborative work. While Federated Learning (FL) has been proposed to address these issues, the difficulty of labeling, namely label costliness, often hinders optimal model performance. A potential solution lies in contrastive learning, an unsupervised self-learning technique that represents semantic data by contrasting similar data pairs. However, contrastive learning is currently not designed to handle tabular data silos that exist across multiple organizations, where data linkage by quasi-identifiers is needed. To address these challenges, we propose using semi-supervised contrastive federated learning, which we refer to as Contrastive Federated Learning with Data Silos (CFL). Our approach tackles the aforementioned issues with an integrated solution. Our experimental results demonstrate that CFL outperforms current methods in addressing these challenges and provides improvements in accuracy. Additionally, we present positive results that showcase the advantages of our contrastive federated learning approach in complex client environments.
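The cross-silo contrastive objective can be sketched in a few lines. This is a minimal illustration under assumptions the abstract does not spell out: two vertically partitioned parties whose rows are already aligned by quasi-identifier linkage, invented module names (TabularEncoder, nt_xent), and an NT-Xent-style loss standing in for the paper's actual objective.

```python
# Minimal sketch: each party encodes its own attribute slice; rows describing the
# same linked entity form positive pairs, all other rows in the batch are negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularEncoder(nn.Module):
    """Maps one party's feature slice to a shared embedding space."""
    def __init__(self, in_dim, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def nt_xent(z_a, z_b, temperature=0.1):
    """Contrastive loss: row i of party A should match row i of party B (same entity)."""
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy batch: 16 linked records, party A holds 5 attributes, party B holds 7.
enc_a, enc_b = TabularEncoder(5), TabularEncoder(7)
x_a, x_b = torch.randn(16, 5), torch.randn(16, 7)
loss = nt_xent(enc_a(x_a), enc_b(x_b))
loss.backward()  # in FL, each party would update its encoder locally and share only updates
print(float(loss))
```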
- Research Report > New Finding (0.89)
- Research Report > Promising Solution (0.88)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
Pouget, Angéline, Beyer, Lucas, Bugliarello, Emanuele, Wang, Xiao, Steiner, Andreas Peter, Zhai, Xiaohua, Alabdulmohsin, Ibrahim
We study cultural and socioeconomic diversity in contrastive vision-language models (VLMs). Using a broad range of benchmark datasets and evaluation metrics, we bring to attention several important findings. First, the common filtering of training data to English image-text pairs disadvantages communities of lower socioeconomic status and negatively impacts cultural understanding. Notably, this performance gap is not captured by - and even at odds with - the currently popular evaluation metrics derived from the Western-centric ImageNet and COCO datasets. Second, pretraining with global, unfiltered data before fine-tuning on English content can improve cultural understanding without sacrificing performance on said popular benchmarks. Third, we introduce the task of geo-localization as a novel evaluation metric to assess cultural diversity in VLMs. Our work underscores the value of using diverse data to create more inclusive multimodal systems and lays the groundwork for developing VLMs that better represent global perspectives.
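The geo-localization evaluation the abstract introduces can be approximated with an off-the-shelf contrastive VLM. This is only a zero-shot sketch: the public openai/clip-vit-base-patch32 checkpoint stands in for the models studied in the paper, and a five-country list stands in for a real benchmark.

```python
# Minimal sketch of geo-localization as a zero-shot evaluation for a contrastive VLM.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

countries = ["Brazil", "India", "Nigeria", "Japan", "Germany"]
prompts = [f"a photo taken in {c}" for c in countries]

def predict_country(image: Image.Image) -> str:
    """Return the country whose text prompt best matches the image embedding."""
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_countries)
    return countries[int(logits.argmax())]

# Geo-localization accuracy over a geographically labelled image set would then serve
# as the cultural-diversity metric:
# acc = mean(predict_country(img) == true_country for img, true_country in eval_set)
```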
- Europe (0.04)
- South America > Brazil > São Paulo (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (9 more...)
Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
Yang, Karren, Hu, Ting-Yao, Chang, Jen-Hao Rick, Koppula, Hema Swetha, Tuzel, Oncel
Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases? To address the first question, we adapt a state-of-the-art automatic speech recognition (ASR) model to target speakers from four benchmark datasets representative of different speaker types. We show that ASR personalization with synthetic data is effective in all cases, but particularly when (i) the target speaker is underrepresented in the global data, and (ii) the capacity of the global model is limited. To address the second question of why personalized synthetic data is effective, we use controllable speech synthesis to generate speech with varied styles and content. Surprisingly, we find that the text content of the synthetic data, rather than style, is important for speaker adaptation. These results lead us to propose a data selection strategy for ASR personalization based on speech content.
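The content-based data selection idea can be illustrated with a toy ranking function. The paper's actual criterion is not given in the abstract, so this sketch simply ranks candidate sentences by vocabulary overlap with the target speaker's domain; select_texts and the example data are hypothetical.

```python
# Minimal sketch of content-based text selection for ASR personalization;
# the TTS synthesis and fine-tuning steps are only indicated in comments.
from collections import Counter

def select_texts(candidates, speaker_domain_texts, k=2):
    """Pick the k candidate sentences whose content best matches the speaker's domain."""
    domain_vocab = Counter(w for t in speaker_domain_texts for w in t.lower().split())
    def content_score(sentence):
        words = sentence.lower().split()
        return sum(domain_vocab[w] for w in words) / len(words)
    return sorted(candidates, key=content_score, reverse=True)[:k]

candidates = [
    "turn on the living room lights",
    "schedule my physical therapy session",
    "what is the weather tomorrow",
]
domain = ["remind me about physical therapy", "book a therapy appointment"]
print(select_texts(candidates, domain))
# Each selected text would be passed to a controllable TTS (varying content, fixed style),
# and the synthetic (audio, text) pairs used to fine-tune the ASR model for the speaker.
```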
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Middle East > Jordan (0.04)
Leveraging the power of AI and machine learning for more resilient data centers - ET CIO
By Sachin Bhalla

According to market research company Technavio, the global data center market is poised to grow by $304.87 billion between 2020 and 2024, and it will grow at an even faster pace in the Asia-Pacific region. An S&P study estimates that between 2017 and 2022 the Asia-Pacific region would reach a 10% CAGR, compared with the 7% CAGR expected for the global data center industry. The data center industry is changing to serve the needs of today's business landscape. It comes as no surprise, then, when organisations discuss plans to enhance their data center infrastructure with technologies like Artificial Intelligence (AI) and to focus on automation to improve uptime while controlling costs--all of which are important for companies to drive operational efficiency and business resiliency.
- North America > United States (0.18)
- Asia > India (0.06)
On-Device Learning with Cloud-Coordinated Data Augmentation for Extreme Model Personalization in Recommender Systems
Gu, Renjie, Niu, Chaoyue, Yan, Yikai, Wu, Fan, Tang, Shaojie, Jia, Rongfeng, Lyu, Chengfei, Chen, Guihai
Data heterogeneity is an intrinsic property of recommender systems, making models trained over the global data on the cloud, which is the mainstream in industry, non-optimal for each individual user's local data distribution. To deal with data heterogeneity, model personalization with on-device learning is a potential solution. However, on-device training using a user's small number of local samples will incur severe overfitting and undermine the model's generalization ability. In this work, we propose a new device-cloud collaborative learning framework, called CoDA, to break the dilemmas of purely cloud-based learning and on-device learning. The key principle of CoDA is to retrieve similar samples from the cloud's global pool to augment each user's local dataset for training the recommendation model. Specifically, after coarse-grained sample matching on the cloud, a personalized sample classifier is further trained on each device for fine-grained sample filtering, which can learn the boundary between the local data distribution and the outside data distribution. We also build an end-to-end pipeline to support the flows of data, model, computation, and control between the cloud and each device. We have deployed CoDA in a recommendation scenario of Mobile Taobao. Online A/B testing results show the remarkable performance improvement of CoDA over both cloud-based learning without model personalization and on-device training without data augmentation. Overhead testing on a real device demonstrates the computation, storage, and communication efficiency of the on-device tasks in CoDA.
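The two-stage retrieve-then-filter augmentation can be sketched as follows. This is a schematic stand-in, not the deployed system: cosine similarity to the user's centroid plays the role of the cloud's coarse-grained matcher, and a logistic-regression classifier plays the role of the on-device personalized sample filter.

```python
# Minimal sketch of cloud-coordinated augmentation with on-device filtering.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
global_pool = rng.normal(size=(1000, 16))           # cloud's global sample embeddings
local_samples = rng.normal(loc=0.5, size=(50, 16))  # one user's on-device samples

# Stage 1 (cloud): coarse-grained matching - retrieve candidates nearest to the user's centroid.
centroid = local_samples.mean(axis=0)
sims = global_pool @ centroid / (np.linalg.norm(global_pool, axis=1) * np.linalg.norm(centroid))
candidates = global_pool[np.argsort(-sims)[:200]]

# Stage 2 (device): fine-grained filtering - train a classifier to separate the local
# distribution from outside data, then keep only candidates it accepts as "local-like".
X = np.vstack([local_samples, candidates])
y = np.array([1] * len(local_samples) + [0] * len(candidates))
clf = LogisticRegression(max_iter=1000).fit(X, y)
accepted = candidates[clf.predict_proba(candidates)[:, 1] > 0.5]
augmented_local_set = np.vstack([local_samples, accepted])
print(len(accepted), "retrieved samples kept for on-device training")
```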
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Texas (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Services (0.69)
What's Ahead for Data Processing, Hosting Industry?
Artificial intelligence (AI) may have popularized big data, but applications like the internet of things (IoT) are taking it to the next level. The need to analyze large datasets for insights, together with rising internet traffic, is increasing the demand for data and digital storage services. The global data center market should expand by $304.87 billion between 2020 and 2024, according to a Technavio report, putting the current global data center market size at more than $300 billion. According to Internet World Stats, there were more than 4.5 billion Internet users worldwide in mid-2019. That's 58.8% of the global population with Internet access, and the number is growing as internet penetration improves.
- Europe > Lithuania (0.07)
- Asia > Singapore (0.07)
- North America > United States (0.05)
- Information Technology > Cloud Computing (0.84)
- Information Technology > Communications > Networks (0.58)
- Information Technology > Artificial Intelligence (0.53)
How India can become an AI powerhouse
Data is turning out to be more valuable than we thought. Google and Facebook's ad revenues exceeded $200 billion last year. They can soon hope for an even bigger source of income: the Artificial Intelligence (AI) business built using the data of billions of individuals. No wonder getting hold of data by paying top dollar is the new game in the digital world. This may explain the sudden investments of Google, Facebook, Intel, and many others in India, one of the world's largest data generators.
- Health & Medicine > Therapeutic Area (0.52)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.52)
- Information Technology > Services (0.51)