AITopics | Oceania

As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Slovenia (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
(29 more...)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
(4 more...)

f8e6ba1db0f3c4054afec1684ba8fb26-Supplemental.pdf

Neural Information Processing Systems, Aug-19-2025, 00:13:31 GMT

artificial intelligence, dataset, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.05)
North America > United States > California > San Francisco County > San Francisco (0.05)
Oceania > New Zealand (0.04)
(8 more...)

Genre: Research Report (0.47)

Industry: Energy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

TGEA 2.0 Supplementary Materials A Appendix

Neural Information Processing Systems, Aug-19-2025, 00:11:46 GMT

Table 2: The number of erroneous texts generated with different decoding strategies. Figure 2: The distribution of MiSEW over the number of tokens contained in each MiSEW. We have fine-tuned several commonly used Chinese PLMs as baselines. All models have 12 attention heads and a hidden size of 768. We train these models on 8 Tesla P100 GPUs with 16 GB memory.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > China > Anhui Province (0.04)
(2 more...)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Advancing Data Equity: Practitioner Responsibility and Accountability in NLP Data Practices

Cunningham, Jay L., Shao, Kevin Zhongyang, Pang, Rock Yuren, Mengist, Nathaniel

arXiv.org Artificial Intelligence, Aug-19-2025

While research has focused on surfacing and auditing algorithmic bias to ensure equitable AI development, less is known about how NLP practitioners - those directly involved in dataset development, annotation, and deployment - perceive and navigate issues of NLP data equity. This study is among the first to center practitioners' perspectives, linking their experiences to a multi-scalar AI governance framework and advancing participatory recommendations that bridge technical, policy, and community domains. Drawing on a 2024 questionnaire and focus group, we examine how U.S.-based NLP data practitioners conceptualize fairness, contend with organizational and systemic constraints, and engage emerging governance efforts such as the U.S. AI Bill of Rights. Findings reveal persistent tensions between commercial objectives and equity commitments, alongside calls for more participatory and accountable data workflows. We critically engage debates on data diversity and diversity washing, arguing that improving NLP equity requires structural governance reforms that support practitioner agency and community consent.

data mining, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2508.10071

Country:

Europe > Austria > Vienna (0.14)
Oceania > New Zealand (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government (1.00)
(2 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(4 more...)

SL-ACC: A Communication-Efficient Split Learning Framework with Adaptive Channel-wise Compression

Lin, Zehang, Lin, Zheng, Yang, Miao, Huang, Jianhao, Zhang, Yuxin, Fang, Zihan, Du, Xia, Chen, Zhe, Zhu, Shunzhi, Ni, Wei

arXiv.org Artificial Intelligence, Aug-19-2025

The increasing complexity of neural networks poses a significant barrier to deploying distributed machine learning (ML), such as federated learning (FL), on resource-constrained devices. Split learning (SL) offers a promising solution by offloading the primary computing load from edge devices to a server via model partitioning. However, as the number of participating devices increases, the transmission of excessive smashed data (i.e., activations and gradients) becomes a major bottleneck for SL, slowing down model training. To tackle this challenge, we propose a communication-efficient SL framework, named SL-ACC, which comprises two key components: adaptive channel importance identification (ACII) and channel grouping compression (CGC). ACII first identifies the contribution of each channel in the smashed data to model training using Shannon entropy. Following this, CGC groups the channels based on their entropy and performs group-wise adaptive compression to shrink the transmission volume without compromising training accuracy. Extensive experiments across various datasets validate that our proposed SL-ACC framework takes considerably less time to achieve a target accuracy than state-of-the-art benchmarks.
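
The two steps named in the abstract (entropy-based channel scoring, then group-wise compression) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how entropy is estimated, how groups are formed, or which compressor is used. The sketch assumes a histogram-based entropy estimate, equal-sized groups ordered by entropy, and uniform quantization with more bits for higher-entropy groups. All function names and the `bit_widths` parameter are hypothetical.

```python
import math
from collections import Counter


def channel_entropy(values, bins=16):
    """Shannon entropy (in bits) of a histogram over one channel's values.
    Assumption: a fixed-bin histogram stands in for the paper's estimator."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant channel carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def group_channels(entropies, num_groups):
    """Split channel indices into near-equal groups, ordered from low to high
    entropy. Assumption: rank-based grouping; the paper may group differently."""
    order = sorted(range(len(entropies)), key=lambda i: entropies[i])
    size = math.ceil(len(order) / num_groups)
    return [order[i:i + size] for i in range(0, len(order), size)]


def quantize(values, bits):
    """Uniform quantization to 2**bits levels, returned as dequantized floats."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((v - lo) / step) * step for v in values]


def compress(channels, bit_widths=(2, 4, 8)):
    """Score channels by entropy, group them, and quantize each group with its
    own bit width (hypothetical policy: higher entropy gets more bits)."""
    entropies = [channel_entropy(c) for c in channels]
    groups = group_channels(entropies, len(bit_widths))
    compressed = [None] * len(channels)
    bits_used = [0] * len(channels)
    for group, bits in zip(groups, bit_widths):
        for idx in group:
            compressed[idx] = quantize(channels[idx], bits)
            bits_used[idx] = bits
    return compressed, bits_used, entropies
```

Calling `compress(channels)` on a list of per-channel value lists returns the dequantized channels, the bit width assigned to each channel, and the entropy scores. In a real split learning setup the quantized integer codes, not the dequantized floats, would be what is transmitted.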

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.12984

Country: