AITopics | standard test

standard test

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings

Charlot, Théo, Kunze, Tarek, Poli, Maxime, Cristia, Alejandrina, Dupoux, Emmanuel, Lavechin, Marvin

arXiv.org Artificial Intelligence | Sep-19-2025

Child-centered long-form recordings are essential for studying early language development, but existing speech models trained on clean adult data perform poorly due to acoustic and linguistic differences. We introduce BabyHuBERT, the first self-supervised speech representation model trained on 13,000 hours of multilingual child-centered long-form recordings spanning over 40 languages. We evaluate BabyHuBERT on speaker segmentation, that is, identifying when target children speak versus female adults, male adults, or other children, which is a fundamental preprocessing step for analyzing naturalistic language experiences. BabyHuBERT achieves F1-scores from 52.1% to 74.4% across six diverse datasets, consistently outperforming W2V2-LL4300 (trained on English long-forms) and standard HuBERT (trained on clean adult speech). Notable improvements include 13.2 absolute F1 points over HuBERT on Vanuatu and 15.9 points on Solomon Islands corpora, demonstrating effectiveness on underrepresented languages. By sharing code and models, BabyHuBERT serves as a foundation model for child speech research, enabling fine-tuning on diverse downstream tasks.
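The F1-scores quoted in this abstract can be made concrete with a small sketch. It assumes a simplified setup that the abstract does not specify: one speaker label per fixed-length frame, with hypothetical label names (`child`, `female`, `male`, `other_child`, `silence`). Real speaker segmentation may allow overlapping speakers, so treat this as an illustration of the metric rather than the authors' evaluation code.

```python
def f1_score(reference, predicted, label):
    """Frame-level F1 for one speaker label.

    `reference` and `predicted` are equal-length sequences holding one
    label per frame (a simplifying assumption for this sketch).
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    # Count true positives, false positives, and false negatives for `label`.
    tp = sum(1 for r, p in zip(reference, predicted) if r == label and p == label)
    fp = sum(1 for r, p in zip(reference, predicted) if r != label and p == label)
    fn = sum(1 for r, p in zip(reference, predicted) if r == label and p != label)

    if tp == 0:
        return 0.0  # no correct frames, so precision or recall is zero

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical four-frame example.
reference = ["child", "child", "female", "silence"]
predicted = ["child", "female", "female", "silence"]
child_f1 = f1_score(reference, predicted, "child")
female_f1 = f1_score(reference, predicted, "female")
```

An "absolute" improvement of 13.2 F1 points, as reported above, is a plain difference between two such scores expressed in percent, not a relative change.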

artificial intelligence, long-form recording, machine learning, (18 more...)

2509.15001

Country:

Europe (0.69)
North America > United States (0.29)
Oceania > Solomon Islands (0.25)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.41)

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

Yuan, Tongxin, He, Zhiwei, Dong, Lingzhong, Wang, Yiming, Zhao, Ruijie, Xia, Tian, Xu, Lizhen, Zhou, Binglin, Li, Fangqi, Zhang, Zhuosheng, Wang, Rui, Liu, Gongshen

arXiv.org Artificial Intelligence | Jan-18-2024

Large language models (LLMs) have exhibited great potential in autonomously completing tasks across real-world applications. Despite this, these LLM agents introduce unexpected safety risks when operating in interactive environments. Unlike most prior studies, which center on the safety of LLM-generated content, this work addresses the need for benchmarking the behavioral safety of LLM agents within diverse environments. We introduce R-Judge, a benchmark crafted to evaluate the proficiency of LLMs in judging safety risks given agent interaction records. R-Judge comprises 162 agent interaction records, encompassing 27 key risk scenarios among 7 application categories and 10 risk types. It incorporates human consensus on safety with annotated safety risk labels and high-quality risk descriptions. Utilizing R-Judge, we conduct a comprehensive evaluation of 8 prominent LLMs commonly employed as the backbone for agents. The best-performing model, GPT-4, achieves 72.29% in contrast to the human score of 89.38%, showing considerable room for enhancing the risk awareness of LLMs. Notably, leveraging risk descriptions as environment feedback significantly improves model performance, revealing the importance of salient safety risk feedback. Furthermore, we design an effective chain-of-safety-analysis technique to help the judgment of safety risks and conduct an in-depth case study to facilitate future research. R-Judge is publicly available at https://github.com/Lordog/R-Judge.

agent, llm, scenario, (15 more...)

2401.10019

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
Banking & Finance (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Genentech, Winterlight study finds speech analysis AI monitors Alzheimer's as well as standard tests

#artificialintelligence | Oct-11-2022, 14:24:28 GMT

Vocal biomarkers have been linked to diseases ranging from COVID-19 to dementia, the latter of which is in the crosshairs for Winterlight Labs.

ai and machine learning ai, biotech medtech ai, speech analysis ai monitor alzheimer, (7 more...)

Country: Europe (0.12)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (0.74)
Health & Medicine > Therapeutic Area (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

Artificial intelligence 'better at diagnosing heart failure' than standard test

#artificialintelligence | Jun-15-2022, 08:49:45 GMT

Dr Ken Lee, cardiology specialist registrar and clinical lecturer at Edinburgh University, said: "Heart failure can be a very challenging diagnosis to make in practice. We have shown that CoDE-HF, our decision-support tool, can substantially improve the accuracy of diagnosing heart failure compared to current blood tests." Previous research has shown that patients who are diagnosed quickly benefit the most from treatment. Acute heart failure affects nearly one million people in the UK and accounts for five per cent of all unplanned hospital admissions. The prevalence is projected to rise by approximately 50 per cent over the next 25 years owing to the ageing population. It is a sudden, life-threatening condition caused when the heart is suddenly unable to pump enough oxygen-rich blood around the body to meet its needs. It can be brought on by coronary heart disease – where the arteries become blocked, limiting blood flow – or by other ongoing conditions such as diabetes which damage cardiac ...

edinburgh university, heart failure, intelligence, (10 more...)

Country:

Europe > United Kingdom > Scotland > North Lanarkshire (0.05)
Europe > United Kingdom > Scotland > Lanarkshire (0.05)

Genre: Research Report (0.32)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

IBM's latest AI predicts Alzheimer's better than standard tests

#artificialintelligence | Nov-8-2020, 23:25:15 GMT

IBM has developed a new AI model which predicts the onset of Alzheimer's better than standard clinical tests. The AI is designed to be non-invasive and uses a short language sample from a verbal cognitive test given to a patient. Using this sample, the AI model is able to predict the onset of Alzheimer's with around 71 percent accuracy. For comparison, standard clinical tests are correct approximately 59 percent of the time and take much longer to diagnose. Current tests analyse the descriptive abilities of people as they age for potential warning signs.

alzheimer, ibm, standard test, (8 more...)

Country:

North America > United States > Massachusetts (0.06)
North America > United States > California (0.06)
Europe > Netherlands > North Holland > Amsterdam (0.06)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

Testing for Normality with Neural Networks

Simić, Miloš

arXiv.org Machine Learning | Oct-7-2020

In this paper, we treat the problem of testing for normality as a binary classification problem and construct a feedforward neural network that can successfully detect normal distributions by inspecting small samples from them. The numerical experiments conducted on small samples with no more than 100 elements indicated that the neural network which we trained was more accurate and far more powerful than the most frequently used and most powerful standard tests of normality: Shapiro-Wilk, Anderson-Darling, Lilliefors and Jarque-Bera, as well as the kernel tests of goodness-of-fit. The neural network had an AUROC score of almost 1, which corresponds to a perfect binary classifier. Additionally, the network's accuracy was higher than 96% on a set of larger samples with 250-1000 elements. Since the normality of data is an assumption of numerous techniques for analysis and inference, the neural network constructed in this study has a very high potential for use in the everyday practice of statistics, data analysis and machine learning in both science and industry.
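To illustrate what "testing for normality as binary classification" can look like, here is a minimal sketch of one of the standard baselines named above, the Jarque-Bera test, wrapped as a yes/no classifier. It is not the paper's neural network. The function names and the 0.05 threshold are choices made for this example, and the p-value uses the asymptotic chi-square approximation with two degrees of freedom, which is known to be rough for small samples.

```python
import math


def jarque_bera(sample):
    """Return (statistic, asymptotic p-value) of the Jarque-Bera test."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")

    mean = sum(sample) / n
    # Biased central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    if m2 == 0:
        raise ValueError("sample is constant")

    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2

    # JB = n/6 * (S^2 + (K - 3)^2 / 4)
    statistic = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


def classify_as_normal(sample, alpha=0.05):
    """Binary decision: True means normality is not rejected at level alpha."""
    _, p_value = jarque_bera(sample)
    return p_value >= alpha


statistic, p_value = jarque_bera([1, 2, 3, 4, 5])
```

Framing the decision as a classifier output is what makes metrics such as accuracy and AUROC, quoted in the abstract, applicable to both the classical tests and a trained network.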

artificial intelligence, machine learning, normality, (18 more...)

2009.13831

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Serbia > Central Serbia > Belgrade (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Question Difficulty Prediction for READING Problems in Standard Tests

Huang, Zhenya (University of Science and Technology of China) | Liu, Qi (University of Science and Technology of China) | Chen, Enhong (University of Science and Technology of China) | Zhao, Hongke (University of Science and Technology of China) | Gao, Mingyong (iFLYTEK Co., Ltd.) | Wei, Si (iFLYTEK Co., Ltd.) | Su, Yu (Anhui University) | Hu, Guoping (iFLYTEK Co., Ltd.)

AAAI Conferences | Feb-14-2017

Standard tests aim to evaluate the performance of examinees using different tests with consistent difficulties. Thus, a critical demand is to predict the difficulty of each test question before the test is conducted. Existing studies are usually based on the judgments of education experts (e.g., teachers), which may be subjective and labor intensive. In this paper, we propose a novel Test-aware Attention-based Convolutional Neural Network (TACNN) framework to automatically solve this Question Difficulty Prediction (QDP) task for READING problems (a typical problem style in English tests) in standard tests. Specifically, given the abundant historical test logs and text materials of questions, we first design a CNN-based architecture to extract sentence representations for the questions. Then, we utilize an attention strategy to qualify the difficulty contribution of each sentence to questions. Considering the incomparability of question difficulties in different tests, we propose a test-dependent pairwise strategy for training TACNN and generating the difficulty prediction value. Extensive experiments on a real-world dataset not only show the effectiveness of TACNN, but also give interpretable insights to track the attention information for questions.
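The test-dependent pairwise idea mentioned in this abstract can be sketched as follows: because difficulties from different tests are not comparable, training pairs are formed only between questions of the same test. The record layout, the function names, and the squared-gap loss below are assumptions made for illustration; the abstract does not give the exact objective used to train TACNN.

```python
from collections import defaultdict
from itertools import combinations


def within_test_pairs(records):
    """Build (test_id, question_a, question_b, difficulty_gap) tuples.

    `records` is an iterable of (test_id, question_id, observed_difficulty).
    Questions are paired only with other questions from the same test.
    """
    by_test = defaultdict(list)
    for test_id, question_id, difficulty in records:
        by_test[test_id].append((question_id, difficulty))

    pairs = []
    for test_id, questions in by_test.items():
        for (qa, da), (qb, db) in combinations(questions, 2):
            pairs.append((test_id, qa, qb, da - db))
    return pairs


def pairwise_loss(pairs, predicted):
    """Sum of squared errors between predicted and observed difficulty gaps.

    `predicted` maps question_id to a model output (hypothetical values here).
    """
    return sum(
        ((predicted[qa] - predicted[qb]) - gap) ** 2
        for _, qa, qb, gap in pairs
    )


# Hypothetical logs: two questions in test t1, one in test t2.
records = [("t1", "q1", 0.2), ("t1", "q2", 0.5), ("t2", "q3", 0.9)]
pairs = within_test_pairs(records)
loss = pairwise_loss(pairs, {"q1": 0.1, "q2": 0.4, "q3": 0.0})
```

In this toy example the question from test t2 never enters a pair, so its prediction is unconstrained by the loss; only gaps within a test are compared.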

machine learning, natural language, tacnn, (19 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.62)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AI scores higher than the average person on standard test

Daily Mail - Science & tech | Jan-19-2017, 20:45:09 GMT

Artificial intelligence can now outperform humans on a standard intelligence test. A new computational model scores within the 75th percentile, better than the average person, on a test known as Raven's Progressive Matrices. Researchers say this demonstrates that it can take on abstract visual reasoning tasks, and is a major step toward AI that can see and understand the world the way we do. Using Raven's Progressive Matrices, a nonverbal standardized test that measures abstract reasoning, the team found that their model is not only on par with humans, but performs better than many. In this example, participants choose which shape should come next in the sequence.

artificial intelligence, fluid intelligence, reasoning, (13 more...)

Country: North America > United States (0.17)

Genre: Research Report > New Finding (0.57)

Industry:

Education > Assessment & Standards (0.58)
Government > Military > Navy (0.32)

Technology: Information Technology > Artificial Intelligence > Cognitive Science (0.38)

Validation of nonlinear PCA

Scholz, Matthias

arXiv.org Artificial Intelligence | Apr-3-2012

Linear principal component analysis (PCA) can be extended to a nonlinear PCA by using artificial neural networks. But the benefit of curved components requires a careful control of the model complexity. Moreover, standard techniques for model selection, including cross-validation and more generally the use of an independent test set, fail when applied to nonlinear PCA because of its inherent unsupervised characteristics. This paper presents a new approach for validating the complexity of nonlinear PCA models by using the error in missing data estimation as a criterion for model selection. It is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy. While standard test set validation usually favours over-fitted nonlinear PCA models, the proposed model validation approach correctly selects the optimal model complexity.
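One way to write down the selection criterion described in this abstract is given below. The notation is introduced here for illustration and is not taken from the paper: $x_{ij}$ is an entry of the data matrix, $\mathcal{M}$ is a set of entries artificially marked as missing, and $\hat{x}^{(c)}_{ij}$ is the value estimated by a nonlinear PCA model of complexity $c$. The use of mean squared error is an assumption; the abstract only refers to "the error in missing data estimation".

```latex
E(c) = \frac{1}{|\mathcal{M}|} \sum_{(i,j) \in \mathcal{M}}
       \left( x_{ij} - \hat{x}^{(c)}_{ij} \right)^{2},
\qquad
c^{\ast} = \arg\min_{c} E(c)
```

Under this reading, the model whose complexity $c^{\ast}$ minimizes the missing-value error is the one selected, in contrast to standard test set validation, which the abstract reports as favouring over-fitted models.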

complexity, nonlinear pca, validation, (12 more...)

doi: 10.1007/s11063-012-9220-6

1204.0684

Country:

North America > United States > New York (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
