AITopics

2510.08609

Country: North America > United States (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Software (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Pautsch, Erik, Singla, Tanmay, Jiang, Wenxin, Peng, Huiyun, Hassanshahi, Behnaz, Läufer, Konstantin, Thiruvathukal, George K., Davis, James C.

AgentHub: A Research Agenda for Agent Sharing Infrastructure

arXiv.org Artificial Intelligence, Oct-7-2025

LLM-based agents are rapidly proliferating, yet the infrastructure for discovering, evaluating, and governing them remains fragmented compared to mature ecosystems like software package registries (e.g., npm) and model hubs (e.g., Hugging Face). Recent research and engineering works have begun to consider the requisite infrastructure, but so far they focus narrowly -- on distribution, naming, or protocol negotiation. However, considering broader software engineering requirements would improve open-source distribution and ease reuse. We therefore propose AgentHub, a research agenda for agent sharing. By framing the key challenges of capability clarity, lifecycle transparency, interoperability, governance, security, and workflow integration, AgentHub charts a community-wide agenda for building reliable and scalable agent ecosystems. Our vision is a future where agents can be shared, trusted, and composed as seamlessly as today's software libraries.

large language model, machine learning, natural language, (20 more...)

2510.03495

Country: North America > United States (1.00)

Genre: Research Report (0.53)

Industry:

Government > Regional Government > North America Government > United States Government (0.47)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Almeida, Thales Sales, Nogueira, Rodrigo, Pedrini, Helio

Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora

arXiv.org Artificial Intelligence, Sep-11-2025

The performance of large language models (LLMs) is deeply influenced by the quality and composition of their training data. While much of the existing work has centered on English, there remains a gap in understanding how to construct effective training corpora for other languages. We explore scalable methods for building web-based corpora for LLMs. We apply them to build a new 120B token corpus in Portuguese that achieves results competitive with an industrial-grade corpus. Using a continual pretraining setup, we study how different data selection and preprocessing strategies affect LLM performance when transitioning a model originally trained in English to another language. Our findings demonstrate the value of language-specific filtering pipelines, including classifiers for education, science, technology, engineering, and mathematics (STEM), as well as toxic content. We show that adapting a model to the target language leads to performance improvements, reinforcing the importance of high-quality, language-specific data. While our case study focuses on Portuguese, our methods are applicable to other languages, offering insights for multilingual LLM development.

large language model, machine learning, natural language, (17 more...)

2509.08824

Country:

Asia (0.46)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Luo, Haochen, Zhou, Yi, Bollegala, Danushka

Together We Make Sense -- Learning Meta-Sense Embeddings from Pretrained Static Sense Embeddings

arXiv.org Artificial Intelligence, May-30-2023

Sense embedding learning methods learn multiple vectors for a given ambiguous word, corresponding to its different word senses. For this purpose, different methods have been proposed in prior work on sense embedding learning that use different sense inventories, sense-tagged corpora and learning methods. However, not all existing sense embeddings cover all senses of ambiguous words equally well due to the discrepancies in their training resources. To address this problem, we propose the first-ever meta-sense embedding method -- Neighbour Preserving Meta-Sense Embeddings, which learns meta-sense embeddings by combining multiple independently trained source sense embeddings such that the sense neighbourhoods computed from the source embeddings are preserved in the meta-embedding space. Our proposed method can combine source sense embeddings that cover different sets of word senses. Experimental results on Word Sense Disambiguation (WSD) and Word-in-Context (WiC) tasks show that the proposed meta-sense embedding method consistently outperforms several competitive baselines.

artificial intelligence, machine learning, natural language, (19 more...)

2305.19092

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas (0.04)
North America > United States > New Mexico (0.04)
(14 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.87)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)

#artificialintelligence, Sep-9-2022, 19:20:40 GMT

GitHub - RubensZimbres/best-of-ml-python: 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

A ranked list of awesome machine learning Python libraries. This curated list contains 830 awesome open-source projects with a total of 2.6M stars grouped into 32 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Discover other best-of lists or create your own.

conda, github, pypi, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial Intelligence, Jul-8-2022

Towards Semantic Communication Protocols: A Probabilistic Logic Perspective

Seo, Sejin, Park, Jihong, Ko, Seung-Woo, Choi, Jinho, Bennis, Mehdi, Kim, Seong-Lyun

Classical medium access control (MAC) protocols are interpretable, yet their task-agnostic control signaling messages (CMs) are ill-suited for emerging mission-critical applications. By contrast, neural network (NN) based protocol models (NPMs) learn to generate task-specific CMs, but their rationale and impact lack interpretability. To fill this void, in this article we propose, for the first time, a semantic protocol model (SPM) constructed by transforming an NPM into an interpretable symbolic graph written in the probabilistic logic programming language (ProbLog). This transformation is made viable by extracting and merging common CMs and their connections while treating the NPM as a CM generator. By extensive simulations, we corroborate that the SPM tightly approximates its original NPM while occupying only 0.02% memory. By leveraging its interpretability and memory-efficiency, we demonstrate several SPM-enabled applications such as SPM reconfiguration for collision-avoidance, as well as comparing different SPMs via semantic entropy calculation and storing multiple SPMs to cope with non-stationary environments. Traditionally, cellular medium access control (MAC) protocols have been designed primarily for general purposes. While handshaking rules and scheduling policies can partly be manipulated (e.g., grant-free access prioritization [2]), their control signaling messages (CMs) remain unchanged even when tasks and other environmental characteristics vary over time. Ko is with Inha University, Incheon, Korea (e-mail: swko@inha.ac.kr). This work has been submitted to the IEEE for possible publication.

npm, protocol, spm, (15 more...)

2207.0392

Country:

Asia > South Korea > Incheon > Incheon (0.24)
Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
Oceania > Australia (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Information Technology (0.68)
Transportation (0.66)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

#artificialintelligence, Apr-28-2022, 23:14:32 GMT

GitHub - ml-tooling/best-of-ml-python: 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

A ranked list of awesome machine learning Python libraries. This curated list contains 920 awesome open-source projects with a total of 3.4M stars grouped into 34 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Discover other best-of lists or create your own.

conda, github, pypi, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

#artificialintelligence, Feb-20-2019, 06:32:02 GMT

The Benefits of AI and Machine Learning in Network Monitoring

Artificial intelligence – also commonly known as AI – has revolutionized the technology world. Companies both inside and outside the tech circle are introducing AI into their work suite. AI takes the basic principles of computing and processing and applies intelligent environment analysis on top of it. For industries, AI analyzes the data they generate and provides them with insights based on its findings. AI can also apply machine learning to examine historical data in order to perform tasks without human input.

ai and machine learning, network monitoring, npm, (3 more...)

Industry:

Telecommunications > Networks (0.71)
Information Technology > Networks (0.71)
Information Technology > Smart Houses & Appliances (0.57)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence (1.00)

#artificialintelligence, Feb-1-2017, 16:15:25 GMT

Understanding How Machines Learn, Through Prototyping – Big Tomorrow

This is the second article in a larger series exploring the intersection of design and existing artificial intelligence technology through experiments, prototypes and concepts. We believe this is a critically important topic for the design community and beyond, so we're sharing what we learn along the way. Let's start by getting something out of the way: we're not machine learning experts -- we don't publish research about new algorithmic breakthroughs and we're not especially good at math. But we're curious about what to do with all the machine learning capability already floating around out in the world, and we're bullish about how far a 'good enough' understanding can often take you. So how might non-experts begin to play with machine learning?

artificial intelligence, brain, machine learning, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.84)

Iso, Ken-ichi, Watanabe, Takao

Speech Recognition Using Demi-Syllable Neural Prediction Model

Neural Information Processing Systems, Dec-31-1991

The Neural Prediction Model is a speech recognition model based on pattern prediction by multilayer perceptrons. Its effectiveness was confirmed by speaker-independent digit recognition experiments. This paper presents an improvement in the model and its application to large vocabulary speech recognition, based on subword units. The improvement involves the introduction of "backward prediction," which further improves the prediction accuracy of the original model, which used only "forward prediction". In applying the model to speaker-dependent large vocabulary speech recognition, the demi-syllable unit is used as a subword recognition unit.

feature vector, prediction, recognition, (12 more...)

Neural Information Processing Systems

Country: Asia > Japan (0.04)

Genre: Research Report > Promising Solution (0.35)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.31)