PeLLE: Encoder-based language models for Brazilian Portuguese based on open data
de Mello, Guilherme Lamartine, Finger, Marcelo, Serras, Felipe, Carpi, Miguel de Mello, Jose, Marcos Menon, Domingues, Pedro Henrique, Cavalim, Paulo
In this paper we present PeLLE, a family of large language models based on the RoBERTa architecture, for Brazilian Portuguese, trained on curated, open data from the Carolina corpus. Aiming at reproducible results, we describe details of the pretraining of the models. We also evaluate PeLLE models against a set of existing multilingual and PT-BR refined pretrained Transformer-based LLM encoders, contrasting the performance of large versus smaller-but-curated pretrained models in several downstream tasks. We conclude that several tasks perform better with larger models, but some tasks benefit from smaller-but-curated data in their pretraining.
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- South America > Brazil > São Paulo (0.04)
- (4 more...)
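The RoBERTa-style pretraining mentioned above relies on a masked-language-modeling objective with dynamic masking. A minimal sketch of that corruption step follows; it is a simplification (real RoBERTa operates on subword tokens and sometimes substitutes random tokens instead of the mask symbol), and the example sentence is illustrative:

```python
import random

def mask_tokens(tokens, mask_token="<mask>", mask_prob=0.15, seed=0):
    """Dynamically mask a fraction of tokens for MLM pretraining.

    Returns the corrupted sequence and the (position, original-token)
    pairs the model would be trained to reconstruct.
    """
    rng = random.Random(seed)
    corrupted, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(mask_token)
            targets.append((i, tok))
        else:
            corrupted.append(tok)
    return corrupted, targets

sentence = "um modelo de linguagem treinado com dados abertos".split()
corrupted, targets = mask_tokens(sentence)
```

Because the masking is re-sampled each epoch (here, via the seed), the model sees different corruptions of the same sentence over training.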
Towards building a monitoring platform for a challenge-oriented smart specialisation with RIS3-MCAT
Fuster, Enric, Fernández, Tatiana, Carretero, Hermes, Duran-Silva, Nicolau, Guixé, Roger, Pujol, Josep, Rondelli, Bernardo, Rull, Guillem, Cortijo, Marta, Romagosa, Montserrat
In the new research and innovation (R&I) paradigm, aimed at a transformation towards more sustainable, inclusive and fair pathways to address societal and environmental challenges, and at generating new patterns of specialisation and new trajectories for socioeconomic development, it is essential to provide monitoring systems and tools to map and understand the contribution of R&I policies and projects. To address this transformation, we present the RIS3-MCAT platform, the result of a line of work aimed at exploring the potential of open data, semantic analysis, and data visualisation for monitoring challenge-oriented smart specialisation in Catalonia. RIS3-MCAT is an interactive platform that facilitates access to R&I project data in formats that allow for sophisticated analyses of a large volume of texts, enabling the detailed study of thematic specialisations and challenges beyond classical classification systems. Its conceptualisation, development framework and use are presented in this paper.

Keywords: open data, research and innovation policy, smart specialisation strategies, text mining, data visualisation, scientometrics

1. INTRODUCTION

The challenges posed by globalisation, technology, climate change, and the COVID-19 pandemic require significant changes in our way of living. Although large transition costs are associated with a successful attainment of all those challenges, the potential opportunities brought about are enormous (Bigas et al., 2021).
- North America > Montserrat (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Government (1.00)
- Health & Medicine (0.74)
Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
del Rio-Chanona, Maria, Laurentsyeva, Nadzeya, Wachs, Johannes
Large language models like ChatGPT efficiently provide users with information about various topics, presenting a potential substitute for searching the web and asking people for help online. But since users interact privately with the model, these models may drastically reduce the amount of publicly available human-generated data and knowledge resources. This substitution can present a significant problem in securing training data for future models. In this work, we investigate how the release of ChatGPT changed human-generated open data on the web by analyzing the activity on Stack Overflow, the leading online Q&A platform for computer programming. We find that relative to its Russian and Chinese counterparts, where access to ChatGPT is limited, and to similar forums for mathematics, where ChatGPT is less capable, activity on Stack Overflow significantly decreased. A difference-in-differences model estimates a 16% decrease in weekly posts on Stack Overflow. This effect increases in magnitude over time, and is larger for posts related to the most widely used programming languages. Posts made after ChatGPT's release receive similar voting scores to those made before, suggesting that ChatGPT is not merely displacing duplicate or low-quality content. These results suggest that more users are adopting large language models to answer questions, and that they are better substitutes for Stack Overflow for languages for which they have more training data. Using models like ChatGPT may be more efficient for solving certain programming problems, but its widespread adoption and the resulting shift away from public exchange on the web will limit the open data people and models can learn from in the future.
- Asia > Russia (0.28)
- Europe > Russia (0.14)
- North America > United States > New York (0.04)
- (7 more...)
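The difference-in-differences estimator behind the 16% figure compares the before/after change in a treated group against the same change in a control group; the interaction coefficient isolates the treatment effect. A self-contained sketch with synthetic numbers (illustrative only, not the paper's data):

```python
import numpy as np

# Synthetic log weekly post counts for a treated forum (ChatGPT
# accessible) and a control forum (access limited). Values are invented.
baseline = {"treated": 10.0, "control": 9.5}
trend = 0.02     # shift common to both groups in the post period
effect = -0.16   # simulated treatment effect (~16% drop in logs)

rows = []
for group in ("treated", "control"):
    for post in (0, 1):
        y = baseline[group] + trend * post
        if group == "treated" and post:
            y += effect
        treated = 1 if group == "treated" else 0
        # design: intercept, treated, post, treated*post
        rows.append((1.0, treated, post, treated * post, y))

data = np.array(rows)
X, y = data[:, :4], data[:, 4]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
did_estimate = beta[3]   # the interaction term is the DiD effect
```

Because the common trend cancels in the double difference, `did_estimate` recovers the simulated -0.16 exactly here; with real data one would add controls and standard errors.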
Open Data on GitHub: Unlocking the Potential of AI
Roman, Anthony Cintron, Xu, Kevin, Smith, Arfon, Vega, Jehu Torres, Robinson, Caleb, Ferres, Juan M Lavista
GitHub is the world's largest platform for collaborative software development, with over 100 million users. GitHub is also used extensively for open data collaboration, hosting more than 800 million open data files, totaling 142 terabytes of data. This study highlights the potential of open data on GitHub and demonstrates how it can accelerate AI research. We analyze the existing landscape of open data on GitHub and the patterns of how users share datasets. Our findings show that GitHub is one of the largest hosts of open data in the world and has experienced an accelerated growth of open data assets over the past four years. By examining the open data landscape on GitHub, we aim to empower users and organizations to leverage existing open datasets and improve their discoverability -- ultimately contributing to the ongoing AI revolution to help address complex societal issues. We release the three datasets that we have collected to support this analysis as open datasets at https://github.com/github/open-data-on-github.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Greece > Attica > Athens (0.04)
Towards the Automatic Generation of Conversational Interfaces to Facilitate the Exploration of Tabular Data
Gomez, Marcos, Cabot, Jordi, Clarisó, Robert
Tabular data is the most common format to publish and exchange structured data online. A clear example is the growing number of open data portals published by all types of public administrations. However, exploitation of these data sources is currently limited to technical people able to programmatically manipulate and digest such data. As an alternative, we propose the use of chatbots to offer a conversational interface that facilitates the exploration of tabular data sources. With our approach, any regular citizen can benefit from and leverage them. Moreover, our chatbots are not manually created: instead, they are automatically generated from the data source itself thanks to the instantiation of a configurable collection of conversation patterns.
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Jack County (0.04)
- North America > United States > Texas > Chambers County (0.04)
- (3 more...)
- Research Report (0.64)
- Workflow (0.46)
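The pattern-instantiation idea above can be sketched in a few lines: inspect the columns of a tabular source and stamp out one intent per (column, pattern) pair. This is not the authors' implementation; the patterns and intent names below are hypothetical, and a real system would attach NLU training phrases rather than bare callables:

```python
import csv
import io
import statistics

def generate_chatbot(csv_text):
    """Instantiate simple conversation patterns from a tabular source.

    Numeric columns get 'max'/'average' intents; text columns get a
    'values' intent. Each intent maps to a callable answering it.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    intents = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        try:
            nums = [float(v) for v in values]   # raises for text columns
            intents[f"max of {col}"] = lambda ns=nums: max(ns)
            intents[f"average of {col}"] = lambda ns=nums: statistics.mean(ns)
        except ValueError:
            intents[f"values of {col}"] = lambda vs=values: sorted(set(vs))
    return intents

data = "city,population\nBarcelona,1620000\nGirona,103000\n"
bot = generate_chatbot(data)
answer = bot["max of population"]()
```

The key design point is that nothing here is hand-written per dataset: the same pattern collection regenerates a different chatbot for any CSV fed in.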
Trends and Challenges Towards an Effective Data-Driven Decision Making in UK SMEs: Case Studies and Lessons Learnt from the Analysis of 85 SMEs
Tawil, Abdel-Rahman, Mohamed, Muhidin, Schmoor, Xavier, Vlachos, Konstantinos, Haidar, Diana
The adoption of data science brings vast benefits to Small and Medium-sized Enterprises (SMEs), including business productivity, economic growth, innovation and job creation. Data science can support SMEs to optimise production processes, anticipate customers' needs, predict machinery failures and deliver efficient smart services. Businesses can also harness the power of Artificial Intelligence (AI) and Big Data and the smart use of digital technologies to enhance productivity and performance, paving the way for innovation. However, integrating data science decisions into an SME requires both skills and IT investments. In most cases, such expenses are beyond the means of SMEs due to limited resources and restricted access to financing. This paper presents trends and challenges towards effective data-driven decision making for organisations, based on a case study of 85 SMEs, mostly from the West Midlands region of England. The work is supported as part of a three-year ERDF (European Regional Development Fund) project in the areas of big data management, analytics and business intelligence. We present two case studies that demonstrate the potential of digitisation, AI and machine learning, and use these as examples to unveil challenges and showcase the wealth of opportunities currently available to SMEs.
- North America > United States > Hawaii (0.04)
- Europe > United Kingdom > Scotland (0.04)
- Europe > United Kingdom > Northern Ireland (0.04)
- (2 more...)
- Workflow (1.00)
- Research Report > New Finding (0.93)
- Overview (0.93)
- Law (1.00)
- Health & Medicine (1.00)
- Banking & Finance > Economy (1.00)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.87)
The Water Health Open Knowledge Graph
Carletti, Gianluca, Giulianelli, Elio, Lippolis, Anna Sofia, Lodi, Giorgia, Nuzzolese, Andrea Giovanni, Picone, Marco, Settanta, Giulio
Recently, an increasing interest in the management of water and health resources has been recorded. This interest is fed by the global sustainability challenges posed to humanity that have water scarcity and quality at their core. Thus, the availability of effective, meaningful and open data is crucial to address those issues in the broader context of the Sustainable Development Goals of clean water and sanitation as targeted by the United Nations. In this paper, we present the Water Health Open Knowledge Graph (WHOW-KG) along with its design methodology and an analysis of its impact. WHOW-KG is a semantic knowledge graph that models data on water consumption, pollution, infectious disease rates and drug distribution. The WHOW-KG is developed in the context of the EU-funded WHOW (Water Health Open Knowledge) project and aims at supporting a wide range of applications, from knowledge discovery to decision-making, making it a valuable resource for researchers, policymakers, and practitioners in the water and health domains. The WHOW-KG consists of a network of five ontologies and related linked open data, modelled according to those ontologies.
- Europe > Italy > Lazio > Rome (0.04)
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
- South America > Colombia > Bogotá D.C. > Bogotá (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Health & Medicine (1.00)
- Government (1.00)
- Water & Waste Management > Water Management > Water Supplies & Services (0.30)
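At its core, a knowledge graph like WHOW-KG is a set of subject-predicate-object triples queried by pattern matching. A minimal sketch follows; the `ex:` terms are invented stand-ins, not the actual WHOW ontology vocabulary:

```python
# Illustrative triples linking a monitoring station, a water-quality
# indicator, and a health outcome (all identifiers are hypothetical).
triples = {
    ("ex:station42", "ex:measures", "ex:nitrateLevel"),
    ("ex:station42", "ex:locatedIn", "ex:riverPo"),
    ("ex:nitrateLevel", "ex:relatedDisease", "ex:methemoglobinemia"),
}

def query(s=None, p=None, o=None):
    """Match triples against a pattern; None acts as a wildcard,
    like a variable in a SPARQL basic graph pattern."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

station_facts = query(s="ex:station42")          # everything about one station
disease_links = query(p="ex:relatedDisease")     # all indicator-disease edges
```

Chaining such patterns is what lets a graph join water data to health data across the project's five ontologies, which separate tabular sources cannot do directly.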
Whose Text Is It Anyway? Exploring BigCode, Intellectual Property, and Ethics
Choksi, Madiha Zahrah, Goedicke, David
Intelligent or generative writing tools rely on large language models that recognize, summarize, translate, and predict content. This position paper probes the copyright interests of open data sets used to train large language models (LLMs). Our paper asks: how do LLMs trained on open data sets circumvent the copyright interests of the data they are trained on? We start by defining software copyright and tracing its history. We rely on GitHub Copilot as a modern case study challenging software copyright. Our conclusion outlines obstacles that generative writing assistants create for copyright, and offers a practical road map for copyright analysis for developers, software law experts, and general users to consider in the context of intelligent LLM-powered writing tools.
Mapping STI ecosystems via Open Data: overcoming the limitations of conflicting taxonomies. A case study for Climate Change Research in Denmark
Bovenzi, Nicandro, Duran-Silva, Nicolau, Massucci, Francesco Alessandro, Multari, Francesco, Parra-Rojas, Cèsar, Pujol-Llatse, Josep
Science, Technology and Innovation (STI) decision-makers often need a clear vision of what is researched and by whom in order to design effective policies. Such a vision is provided by effective and comprehensive mappings of the research activities carried out within their institutional boundaries. A major challenge in this context is the difficulty of accessing the relevant data and of combining information from different sources: indeed, STI data has traditionally been confined within closed data sources and, when available, is categorised with different taxonomies. Here, we present a proof-of-concept study of the use of open resources to map the research landscape on Sustainable Development Goal (SDG) 13 - Climate Action for an entire country, Denmark, and we map it onto the 25 ERC panels.
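Reconciling conflicting taxonomies ultimately requires scoring each document against a target scheme. A toy sketch of keyword-overlap scoring against ERC-style panels follows; the keyword lists are hypothetical placeholders, whereas real mappings would come from curated vocabularies or trained classifiers:

```python
# Hypothetical keyword sets per target panel (panel codes follow the
# ERC naming style; the keywords themselves are invented).
PANEL_KEYWORDS = {
    "PE10 Earth System Science": {"climate", "emissions", "ocean"},
    "LS8 Ecology and Evolution": {"biodiversity", "species", "habitat"},
    "SH7 Human Mobility, Environment and Space": {"urban", "mobility"},
}

def classify(abstract):
    """Score a text against each panel by fractional keyword overlap."""
    words = set(abstract.lower().split())
    scores = {
        panel: len(words & kws) / len(kws)
        for panel, kws in PANEL_KEYWORDS.items()
    }
    return max(scores, key=scores.get), scores

panel, scores = classify(
    "Reducing emissions to limit climate change in ocean systems"
)
```

The same scoring function can be run against any source taxonomy's documents, which is what makes a shared target scheme a bridge between otherwise incompatible classifications.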
Understanding the Ethical Use of Open Data While Protecting PII
People have been wondering for years when, and sometimes even if, artificial intelligence would live up to its incredible potential. The technology is finally beginning to change industries and lives. Now implemented across everything from smartphone cameras and self-driving vehicles to manufacturing facilities, AI has racked up numerous high-profile success stories: people now rely on AI to silently optimize photos, perfect their parallel parking, and discover product defects. AI can be either cool or creepy, and it is currently on the right side of that line. At the same time, however, the public is becoming increasingly aware of AI ethics, as researchers and journalists question the sources of data powering AI innovations and spotlight ways AI data is being misused by tech giants.
- Information Technology > Security & Privacy (1.00)
- Law (0.92)
- Government > Regional Government > North America Government > United States Government (0.48)