AITopics | open-source data

open-source data

7efe88bb4138d602e56637cfcf713654-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 05:04:49 GMT

accuracy, dataset, efficiency, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.68)
Information Technology > Services (0.67)
Social Sector (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Cloud Computing (1.00)
(6 more...)

7efe88bb4138d602e56637cfcf713654-Paper-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 05:04:45 GMT

dataset, learning, open-source data, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Social Sector (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(6 more...)

Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling

Neural Information Processing Systems | Dec-24-2025, 14:32:04 GMT

As deep learning blooms with growing demand for computation and data resources, outsourcing model training to a powerful cloud server becomes an attractive alternative to training at a low-power and cost-effective end device. Traditional outsourcing requires uploading device data to the cloud server, which can be infeasible in many real-world applications due to the often sensitive nature of the collected data and the limited communication bandwidth. To tackle these challenges, we propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources (e.g., Internet images). We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training, in lieu of client data. ECOS probes open-source data on the cloud server to sense the distribution of client data via a communication- and computation-efficient sampling process, which only communicates a few compressed public features and client scalar responses. Extensive empirical studies show that the proposed ECOS improves the quality of automated client labeling, model compression, and label outsourcing when applied in various learning scenarios. Source code will be released.
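
The sampling loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the "compressed public features" are k-means centroids of the public data and the "client scalar responses" are per-centroid counts of nearby private points. All function names (`kmeans`, `client_votes`, `sample_proxy`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans(points, k, iters=10, seed=0):
    """Cloud side: summarise the public pool as k centroids (assumed compression)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def client_votes(client_points, centroids):
    """Client side: one scalar per centroid, the number of private points nearest to it."""
    counts = Counter(nearest(p, centroids) for p in client_points)
    return [counts.get(i, 0) for i in range(len(centroids))]


def sample_proxy(public_points, centroids, votes, budget, seed=0):
    """Cloud side: draw public samples from each cluster in proportion to its votes."""
    rng = random.Random(seed)
    clusters = [[] for _ in centroids]
    for idx, p in enumerate(public_points):
        clusters[nearest(p, centroids)].append(idx)
    total = sum(votes)
    chosen = []
    for members, v in zip(clusters, votes):
        quota = min(len(members), round(budget * v / total)) if total else 0
        chosen.extend(rng.sample(members, quota))
    return chosen


def blob(rng, center, n):
    """Toy 2-D feature vectors scattered around (center, center)."""
    return [[rng.gauss(center, 0.5), rng.gauss(center, 0.5)] for _ in range(n)]


data_rng = random.Random(1)
public = blob(data_rng, 0.0, 50) + blob(data_rng, 10.0, 50)  # open-source pool, two kinds of data
client = blob(data_rng, 10.0, 30)                            # private data resembles the second kind

centroids = kmeans(public, k=2)             # only these centroids go to the client
votes = client_votes(client, centroids)     # only these counts go back to the cloud
proxy = sample_proxy(public, centroids, votes, budget=20)
print(votes, sorted(proxy))
```

In this toy setup the client never uploads a data point: the cloud sends two centroids, receives two integers, and picks its proxy set from the public cluster that the client's data resembles.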

efficient collaborative open-source sampling, outsourcing training, uploading data, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling

Neural Information Processing Systems | Aug-16-2025, 10:38:48 GMT

accuracy, dataset, efficiency, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.68)
Information Technology > Services (0.67)
Social Sector (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications (1.00)
(5 more...)

Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling

Neural Information Processing Systems | Aug-16-2025, 10:38:40 GMT

artificial intelligence, cloud computing, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Social Sector (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Cloud Computing (1.00)
(4 more...)

Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling

Neural Information Processing Systems | Jan-15-2025, 16:35:31 GMT

As deep learning blooms with growing demand for computation and data resources, outsourcing model training to a powerful cloud server becomes an attractive alternative to training at a low-power and cost-effective end device. Traditional outsourcing requires uploading device data to the cloud server, which can be infeasible in many real-world applications due to the often sensitive nature of the collected data and the limited communication bandwidth. To tackle these challenges, we propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources (e.g., Internet images). We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training, in lieu of client data. ECOS probes open-source data on the cloud server to sense the distribution of client data via a communication- and computation-efficient sampling process, which only communicates a few compressed public features and client scalar responses.

efficient collaborative open-source sampling, open-source data, outsourcing training, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Software (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.43)

Koala: A dialogue model for academic research

AIHub | Apr-18-2023, 13:12:00 GMT

In this post, we introduce Koala, a chatbot trained by fine-tuning Meta's LLaMA on dialogue data gathered from the web. We describe the dataset curation and training process of our model, and also present the results of a user study that compares our model to ChatGPT and Stanford's Alpaca. Our results show that Koala can effectively respond to a variety of user queries, generating responses that are often preferred over Alpaca, and at least tied with ChatGPT in over half of the cases. We hope that these results contribute further to the discourse around the performance of large closed-source models relative to smaller public models. In particular, they suggest that models that are small enough to be run locally can capture much of the performance of their larger cousins if trained on carefully sourced data.

large language model, machine learning, natural language, (20 more...)

AIHub

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Valuation of Public Bus Electrification with Open Data

Vijay, Upadhi, Woo, Soomin, Moura, Scott J., Jain, Akshat, Rodriguez, David, Gambacorta, Sergio, Ferrara, Giuseppe, Lanuzza, Luigi, Zulberti, Christian, Mellekas, Erika, Papa, Carlo

arXiv.org Artificial Intelligence | Sep-24-2022

This research provides a novel framework to estimate the economic, environmental, and social values of electrifying public transit buses, for cities across the world, based on open-source data. Electric buses are a compelling candidate to replace diesel buses for their environmental and social benefits. However, the state-of-the-art models to evaluate the value of bus electrification are limited in applicability because they require granular and bespoke data on bus operation that can be difficult to procure. Our valuation tool uses General Transit Feed Specification, a standard data format used by transit agencies worldwide, to provide high-level guidance on developing a prioritization strategy for electrifying a bus fleet. We develop physics-informed machine learning models to evaluate the energy consumption, the carbon emissions, the health impacts, and the total cost of ownership for each transit route. We demonstrate the scalability of our tool with a case study of the bus lines in the Greater Boston and Milan metropolitan areas.

Detailed Affiliation: U. Vijay, S. Woo, and S. J. Moura are at the Department of Civil and Environmental Engineering, University of California-Berkeley, Davis Hall, Berkeley, California, 94720, USA. A. Jain is at the Department of Electrical Engineering and Computer Sciences, University of California-Berkeley, Soda Hall, Berkeley, California, 94720, USA. D. Rodriguez and E. Mellekas are at Enel X, North America, Inc., One Marina Park Drive, Boston, MA, 02210, USA. S. Gambacorta is at Enel X, Innovation and Sustainability Global, Smart City, Viale Tor di Quinto, Rome, 00191, Italy. G. Ferrara is at Enel X, Innovation and Sustainability Global, Smart City, Passo Martino, Catania, 95121, Italy. L. Lanuzza is at Enel X, Innovation and Sustainability B2C & B2B Innovation Factory, Viale Tor di Quinto, Rome, 00191, Italy. C. Zulberti and C. Papa are at Enel Foundation, Via Bellini, Rome, 00198, Italy.

Vehicle electrification is crucial for reducing the climate impact of the transportation sector, which currently accounts for 16.2% of global greenhouse gas emissions [22]. Zero-emission electric vehicles can significantly improve air quality, health, and environmental equity [23], [24].
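
As a rough illustration of how a GTFS feed can supply per-route inputs, the sketch below computes the length of each route shape from `shapes.txt`-style rows. It is not the paper's model: route length is only one quantity an energy or cost estimate would likely need. The column names follow the GTFS `shapes.txt` fields; the sample rows and the helper names (`haversine_km`, `shape_lengths_km`) are hypothetical.

```python
import csv
import io
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def shape_lengths_km(shapes_file):
    """Sum segment lengths for every shape_id in a GTFS shapes.txt-style CSV."""
    points = defaultdict(list)
    for row in csv.DictReader(shapes_file):
        points[row["shape_id"]].append(
            (
                int(row["shape_pt_sequence"]),
                float(row["shape_pt_lat"]),
                float(row["shape_pt_lon"]),
            )
        )
    lengths = {}
    for shape_id, pts in points.items():
        pts.sort()  # order the points by shape_pt_sequence before measuring
        lengths[shape_id] = sum(
            haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(pts, pts[1:])
        )
    return lengths


# Hypothetical sample rows (deliberately out of order), not taken from a real feed.
SAMPLE_SHAPES = (
    "shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence\n"
    "demo,0.0,2.0,3\n"
    "demo,0.0,0.0,1\n"
    "demo,0.0,1.0,2\n"
)

lengths = shape_lengths_km(io.StringIO(SAMPLE_SHAPES))
print(lengths)
```

For a real feed, the same function could be given an open file handle for the feed's `shapes.txt`; joining shapes to routes would additionally go through `trips.txt`.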

artificial intelligence, electric bus, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2209.12107

Country:

North America > United States > California > Alameda County > Berkeley (0.94)
North America > United States > Massachusetts (0.05)
Europe > Italy > Lombardy > Milan (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.87)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Facebook is trying to make AI fairer by paying people to give it data

#artificialintelligence | Jul-16-2021, 03:50:22 GMT

Artificial intelligence systems are often criticized for built-in biases. Commercial facial-recognition software, for instance, may fail when attempting to classify women and people of color. In an effort to help make AI fairer in a variety of ways, Facebook (FB) is rolling out a new data set for AI researchers that includes a diverse group of paid actors who were explicitly asked to provide their own ages and genders. Facebook hopes researchers will use the open-source data set, which it announced Thursday, to help judge whether AI systems work well for people of different ages, genders, skin tones, and in different types of lighting. Facebook also released the data set internally for use within Facebook itself; the company said in a blog post that it is "encouraging" teams to use it.

facebook, gender, make ai fairer, (12 more...)

#artificialintelligence

Country: North America > United States (0.06)

Industry: Information Technology > Services (0.53)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

Council Post: How Open-Source Data Can Drive Automotive Innovation

#artificialintelligence | Jun-1-2020, 15:02:31 GMT

Sophisticated innovations in artificial intelligence, computer vision, tactile sensing and more are the driving forces behind smart and autonomous vehicles. Progress on these fronts will be crucial to making smart cars even more intelligent and bringing self-driving cars to fruition, but industry stakeholders are also concentrating on another key to automotive innovation: open-source data, which can provide more shared tools to propel innovative developments. In one of the most prominent illustrations of this trend, Waymo, the AV subsidiary of Google's parent company, Alphabet, made the Waymo Open Dataset public in 2019. Collected by sensors in Waymo's self-driving cars, the dataset features high-resolution multimodal sensor data that covers a variety of environments, from dense urban centers to suburban landscapes, offering insights into a wide range of driving conditions. Its release came on the heels of Lyft and Argo AI's rollouts of their own open-source datasets, and has since then been followed by the release of the Ford Autonomous Vehicle Dataset and Google's open-sourced Android Automotive OS, among others.

artificial intelligence, automotive industry, open-source data, (13 more...)

#artificialintelligence

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.77)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
