open-source data
- North America > United States > Michigan (0.04)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Security & Privacy (1.00)
- Government (0.68)
- Information Technology > Services (0.67)
- Social Sector (0.67)
Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling
As deep learning blooms with growing demand for computation and data resources, outsourcing model training to a powerful cloud server becomes an attractive alternative to training at a low-power and cost-effective end device. Traditional outsourcing requires uploading device data to the cloud server, which can be infeasible in many real-world applications due to the often sensitive nature of the collected data and the limited communication bandwidth. To tackle these challenges, we propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources (e.g., Internet images). We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training, in lieu of client data. ECOS probes open-source data on the cloud server to sense the distribution of client data via a communication- and computation-efficient sampling process, which communicates only a few compressed public features and client scalar responses. Extensive empirical studies show that the proposed ECOS improves the quality of automated client labeling, model compression, and label outsourcing when applied in various learning scenarios. Source code will be released.
- Information Technology > Software (1.00)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.43)
Koala: A dialogue model for academic research
In this post, we introduce Koala, a chatbot trained by fine-tuning Meta's LLaMA on dialogue data gathered from the web. We describe the dataset curation and training process of our model, and also present the results of a user study that compares our model to ChatGPT and Stanford's Alpaca. Our results show that Koala can effectively respond to a variety of user queries, generating responses that are often preferred over Alpaca's and are at least tied with ChatGPT's in over half of the cases. We hope that these results contribute further to the discourse around the performance of large closed-source models relative to smaller public models. In particular, they suggest that models small enough to be run locally can capture much of the performance of their larger cousins if trained on carefully sourced data.
Valuation of Public Bus Electrification with Open Data
Vijay, Upadhi; Woo, Soomin; Moura, Scott J.; Jain, Akshat; Rodriguez, David; Gambacorta, Sergio; Ferrara, Giuseppe; Lanuzza, Luigi; Zulberti, Christian; Mellekas, Erika; Papa, Carlo
This research provides a novel framework to estimate the economic, environmental, and social values of electrifying public transit buses, for cities across the world, based on open-source data. Electric buses are a compelling candidate to replace diesel buses because of their environmental and social benefits. However, the state-of-the-art models for evaluating the value of bus electrification are limited in applicability because they require granular and bespoke data on bus operation that can be difficult to procure. Our valuation tool uses the General Transit Feed Specification, a standard data format used by transit agencies worldwide, to provide high-level guidance on developing a prioritization strategy for electrifying a bus fleet. We develop physics-informed machine learning models to evaluate the energy consumption, the carbon emissions, the health impacts, and the total cost of ownership for each transit route. We demonstrate the scalability of our tool with a case study of the bus lines in the Greater Boston and Milan metropolitan areas.
Detailed Affiliation: U. Vijay, S. Woo, and S. J. Moura are at Department of Civil and Environmental Engineering, University of California-Berkeley, Davis Hall, Berkeley, California, 94720, USA. A. Jain is at Department of Electrical Engineering and Computer Sciences, University of California-Berkeley, Soda Hall, Berkeley, California, 94720, USA. D. Rodriguez and E. Mellekas are at Enel X, North America, Inc., One Marina Park Drive, Boston, 02210, MA, USA. S. Gambacorta is at Enel X, Innovation and Sustainability Global, Smart City, Viale Tor di Quinto, Rome, 00191, Italy. G. Ferrara is at Enel X, Innovation and Sustainability Global, Smart City, Passo Martino, Catania, 95121, Italy. L. Lanuzza is at Enel X, Innovation and Sustainability B2C & B2B Innovation Factory, Viale Tor di Quinto, Rome, 00191, Italy. C. Zulberti and C. Papa are at Enel Foundation, Via Bellini, Rome, 00198, Italy.
Vehicle electrification is crucial for reducing the climate impact of the transportation sector, which currently accounts for 16.2% of global greenhouse gas emissions [22]. Zero-emission electric vehicles can significantly improve air quality, health, and environmental equity [23], [24].
- North America > United States > California > Alameda County > Berkeley (0.94)
- North America > United States > Massachusetts (0.05)
- Europe > Italy > Lombardy > Milan (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.87)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Transportation > Electric Vehicle (1.00)
Facebook is trying to make AI fairer by paying people to give it data
Artificial intelligence systems are often criticized for built-in biases. Commercial facial-recognition software, for instance, may fail when attempting to classify women and people of color. In an effort to help make AI fairer in a variety of ways, Facebook (FB) is rolling out a new data set for AI researchers that includes a diverse group of paid actors who were explicitly asked to provide their own ages and genders. Facebook hopes researchers will use the open-source data set, which it announced Thursday, to help judge whether AI systems work well for people of different ages, genders, skin tones, and in different types of lighting. Facebook also released the data set internally for use within Facebook itself; the company said in a blog post that it is "encouraging" teams to use it.
Council Post: How Open-Source Data Can Drive Automotive Innovation
Sophisticated innovations in artificial intelligence, computer vision, tactile sensing and more are the driving forces behind smart and autonomous vehicles. Progress on these fronts will be crucial to making smart cars even more intelligent and bringing self-driving cars to fruition, but industry stakeholders are also concentrating on another key to automotive innovation: open-source data, which can provide more shared tools to propel innovative developments. In one of the most prominent illustrations of this trend, Waymo, the AV subsidiary of Google's parent company, Alphabet, made the Waymo Open Dataset public in 2019. Collected by sensors in Waymo's self-driving cars, the dataset features high-resolution multimodal sensor data that covers a variety of environments, from dense urban centers to suburban landscapes, offering insights into a wide range of driving conditions. Its release came on the heels of Lyft's and Argo AI's rollouts of their own open-source datasets, and it has since been followed by the release of the Ford Autonomous Vehicle Dataset and Google's open-sourced Android Automotive OS, among others.
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Robotics & Automation (0.77)