Sikkim
SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge
Haas, Lukas, Yona, Gal, D'Antonio, Giovanni, Goldshtein, Sasha, Das, Dipanjan
We introduce SimpleQA Verified, a 1,000-prompt benchmark for evaluating Large Language Model (LLM) short-form factuality based on OpenAI's SimpleQA. It addresses critical limitations in OpenAI's benchmark, including noisy and incorrect labels, topical biases, and question redundancy. SimpleQA Verified was created through a rigorous multi-stage filtering process involving de-duplication, topic balancing, and source reconciliation to produce a more reliable and challenging evaluation set, alongside improvements in the autorater prompt. On this new benchmark, Gemini 2.5 Pro achieves a state-of-the-art F1-score of 55.6, outperforming other frontier models, including GPT-5. This work provides the research community with a higher-fidelity tool to track genuine progress in parametric model factuality and to mitigate hallucinations. The benchmark dataset, evaluation code, and leaderboard are available at: https://www.kaggle.com/benchmarks/deepmind/simpleqa-verified.
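As a rough illustration of the headline metric: in the original SimpleQA setup, the F-score is the harmonic mean of overall accuracy (correct answers over all questions) and accuracy on attempted questions (correct answers over correct plus incorrect). The sketch below assumes SimpleQA Verified keeps that definition; the function name and the grade counts are hypothetical and are not taken from the benchmark's evaluation code.

```python
def simpleqa_f1(correct: int, incorrect: int, not_attempted: int) -> float:
    """Harmonic mean of overall accuracy and accuracy on attempted questions.

    Assumed definition, following the original SimpleQA description.
    """
    total = correct + incorrect + not_attempted
    attempted = correct + incorrect
    # With no correct answers, both accuracies are zero, so the score is zero.
    if total == 0 or attempted == 0 or correct == 0:
        return 0.0
    overall = correct / total              # correct over all questions
    given_attempted = correct / attempted  # correct over attempted questions
    return 2 * overall * given_attempted / (overall + given_attempted)


# Hypothetical grade counts for a 1,000-prompt run (illustrative only).
score = simpleqa_f1(correct=400, incorrect=400, not_attempted=200)
print(f"F1 = {100 * score:.1f}")
```

Under this definition, declining to answer lowers overall accuracy but not accuracy on attempted questions, so the score rewards models that abstain rather than guess wrongly.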
- North America > United States > California > San Francisco County > San Francisco (0.14)
- South America > Colombia (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Leisure & Entertainment (1.00)
- Government (0.69)
- Media > Television (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)
Zero-Shot KWS for Children's Speech using Layer-Wise Features from SSL Models
Kutum, Subham, Sinha, Abhijit, Kathania, Hemant Kumar, Kadiri, Sudarsana Reddy, Govil, Mahesh Chandra
Numerous methods have been proposed to enhance Keyword Spotting (KWS) in adult speech, but children's speech presents unique challenges for KWS systems due to its distinct acoustic and linguistic characteristics. This paper introduces a zero-shot KWS approach that leverages state-of-the-art self-supervised learning (SSL) models, including Wav2Vec2, HuBERT, and Data2Vec. Features are extracted layer-wise from these SSL models and used to train a Kaldi-based DNN KWS system. The WSJCAM0 adult speech dataset was used for training, while the PFSTAR children's speech dataset was used for testing, demonstrating the zero-shot capability of our method. Our approach achieved state-of-the-art results across all keyword sets for children's speech. Notably, the Wav2Vec2 model, particularly layer 22, performed best, delivering an ATWV score of 0.691, an MTWV score of 0.7003, a probability of false alarm of 0.0164, and a probability of miss of 0.0547 for a set of 30 keywords. Furthermore, age-specific performance evaluation confirmed the system's effectiveness across different age groups of children. To assess the system's robustness against noise, additional experiments were conducted using the best-performing layer of the best-performing Wav2Vec2 model. The results demonstrated a significant improvement over the traditional MFCC-based baseline, emphasizing the potential of SSL embeddings even in noisy conditions. To further generalize the KWS framework, the experiments were repeated on an additional CMU dataset. Overall, the results highlight the significant contribution of SSL features in enhancing zero-shot KWS performance for children's speech, effectively addressing the challenges associated with the distinct characteristics of child speakers.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > India > Sikkim (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Can Layer-wise SSL Features Improve Zero-Shot ASR Performance for Children's Speech?
Sinha, Abhijit, Kathania, Hemant Kumar, Kadiri, Sudarsana Reddy, Narayanan, Shrikanth
Automatic Speech Recognition (ASR) systems often struggle to accurately process children's speech due to its distinct and highly variable acoustic and linguistic characteristics. While recent advancements in self-supervised learning (SSL) models have greatly enhanced the transcription of adult speech, accurately transcribing children's speech remains a significant challenge. This study investigates the effectiveness of layer-wise features extracted from state-of-the-art SSL pre-trained models (specifically Wav2Vec2, HuBERT, Data2Vec, and WavLM) in improving ASR performance for children's speech in zero-shot scenarios. A detailed analysis of features extracted from these models was conducted by integrating them into a simplified DNN-based ASR system built with the Kaldi toolkit. The analysis identified the most effective layers for enhancing ASR performance on children's speech in a zero-shot scenario, where WSJCAM0 adult speech was used for training and PFSTAR children's speech for testing. Experimental results indicated that layer 22 of the Wav2Vec2 model achieved the lowest Word Error Rate (WER) of 5.15%, representing a 51.64% relative improvement over direct zero-shot decoding using Wav2Vec2 (WER of 10.65%). Additionally, age group-wise analysis demonstrated consistent performance improvements with increasing age, along with significant gains even in younger age groups when using the SSL features. Further experiments on the CMU Kids dataset confirmed similar trends, highlighting the generalizability of the proposed approach.
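The relative improvement quoted above can be checked from the two WER figures in the abstract. This is a minimal arithmetic sketch; the function name is hypothetical.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, in percent, with respect to the baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# WER values as reported in the abstract (in percent).
rel = relative_improvement(baseline_wer=10.65, new_wer=5.15)
print(f"{rel:.2f}%")  # 51.64%
```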
- North America > United States > California (0.14)
- Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
- Asia > Russia (0.04)
- Asia > India > Sikkim (0.04)
Layer-Wise Analysis of Self-Supervised Representations for Age and Gender Classification in Children's Speech
Sinha, Abhijit, Kumar, Harishankar, Joshi, Mohit, Kathania, Hemant Kumar, Narayanan, Shrikanth, Kadiri, Sudarsana Reddy
Children's speech presents challenges for age and gender classification due to high variability in pitch, articulation, and developmental traits. While self-supervised learning (SSL) models perform well on adult speech tasks, their ability to encode speaker traits in children remains underexplored. This paper presents a detailed layer-wise analysis of four Wav2Vec2 variants using the PFSTAR and CMU Kids datasets. Results show that early layers (1-7) capture speaker-specific cues more effectively than deeper layers, which increasingly focus on linguistic information. Applying PCA further improves classification, reducing redundancy and highlighting the most informative components. The Wav2Vec2-large-lv60 model achieves 97.14% (age) and 98.20% (gender) on CMU Kids; base-100h and large-lv60 models reach 86.05% and 95.00% on PFSTAR. These results reveal how speaker traits are structured across SSL model depth and support more targeted, adaptive strategies for child-aware speech interfaces.
- North America > United States > California (0.14)
- Europe > Spain (0.04)
- Asia > India > Sikkim (0.04)
- Asia > China > Liaoning Province > Dalian (0.04)
Indian Voters Are Being Bombarded With Millions of Deepfakes. Political Candidates Approve
On a stifling April afternoon in Ajmer, in the Indian state of Rajasthan, local politician Shakti Singh Rathore sat down in front of a greenscreen to shoot a short video. It was his first time being cloned. Wearing a crisp white shirt and a ceremonial saffron scarf bearing a lotus flower--the logo of the BJP, the country's ruling party--Rathore pressed his palms together and greeted his audience in Hindi. Before he could continue, the director of the shoot walked into the frame. Divyendra Singh Jadoun, a 31-year-old with a bald head and a thick black beard, told Rathore he was moving around too much on camera.
- Government > Voting & Elections (1.00)
- Government > Regional Government > Asia Government > India Government (0.84)
Comparing skill of historical rainfall data based monsoon rainfall prediction in India with NCEP-NWP forecasts
Narula, Apoorva, Jain, Aastha, Batra, Jatin, Juneja, Sandeep
In this draft, we consider the problem of forecasting rainfall across India during the four monsoon months, one day as well as three days in advance. We train neural networks using historical daily gridded precipitation data for India obtained from IMD for the period 1901-2022, at a spatial resolution of $1^{\circ} \times 1^{\circ}$. These forecasts are compared with the numerical weather prediction (NWP) forecasts obtained from NCEP (National Centers for Environmental Prediction), available for the period 2011-2022. We conduct a detailed countrywide analysis and separately analyze some of the most populated cities in India. Our conclusion is that forecasts obtained by applying deep learning to historical rainfall data are more accurate than NWP forecasts as well as predictions based on persistence. On average, compared to our predictions, forecasts from the NCEP-NWP model have about 34% higher error for a one-day prediction and over 68% higher error for a three-day prediction. Similarly, persistence estimates show a 29% higher error for a one-day forecast and over 54% higher error for a three-day forecast. We further observe that data from up to 20 days in the past is useful in reducing the errors of one- and three-day forecasts when a transformer-based learning architecture is used and, to a lesser extent, when an LSTM is used. A key conclusion suggested by our preliminary analysis is that NWP forecasts can be substantially improved upon through more, and more diverse, data relevant to monsoon prediction combined with a carefully selected neural network architecture.
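To make the persistence comparison concrete, the sketch below builds a persistence forecast (the value observed on day t is used as the forecast for day t + lead) and expresses one error as a percentage above another. The error metric (mean absolute error), the rainfall values, and all function names are assumptions for illustration; the abstract does not state which error measure the authors use.

```python
def persistence_forecast(series: list[float], lead: int) -> list[float]:
    """Forecast for day t + lead is the value observed on day t (lead >= 1)."""
    return series[:-lead]  # aligned element-wise with the targets series[lead:]


def mean_abs_error(pred: list[float], target: list[float]) -> float:
    """Mean absolute error between aligned forecasts and observations."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def percent_higher_error(baseline_err: float, model_err: float) -> float:
    """How much larger the baseline error is than the model error, in percent."""
    return 100.0 * (baseline_err - model_err) / model_err


# Illustrative daily rainfall at a single grid cell, in mm (made-up values).
rain = [0.0, 2.0, 5.0, 1.0, 0.0, 3.0, 4.0]
lead = 1

targets = rain[lead:]
persist_err = mean_abs_error(persistence_forecast(rain, lead), targets)
print(f"persistence MAE: {persist_err:.3f} mm")

# A learned model's error would be computed the same way from its forecasts;
# percent_higher_error(persist_err, model_err) then gives a figure comparable
# in form to the percentages quoted in the abstract.
```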
- Asia > India > Maharashtra > Mumbai (0.05)
- Asia > India > Tamil Nadu > Chennai (0.05)
- Asia > India > West Bengal > Kolkata (0.05)
Constrained Twin Variational Auto-Encoder for Intrusion Detection in IoT Systems
Dinh, Phai Vu, Nguyen, Quang Uy, Hoang, Dinh Thai, Nguyen, Diep N., Bao, Son Pham, Dutkiewicz, Eryk
Intrusion detection systems (IDSs) play a critical role in protecting billions of IoT devices from malicious attacks. However, IDSs for IoT devices face challenges inherent to IoT systems, including the heterogeneity of IoT data and devices, the high dimensionality of training data, and imbalanced data. Moreover, deploying IDSs on IoT systems is challenging, and sometimes impossible, because typical IoT devices have limited resources such as memory, storage, and computing capability. To tackle these challenges, this article proposes a novel deep neural network architecture called Constrained Twin Variational Auto-Encoder (CTVAE) that can feed the classifiers of IDSs with more separable and lower-dimensional representation data. Additionally, compared with the state-of-the-art neural networks used in IDSs, CTVAE requires less memory, storage, and computing power, making it more suitable for IoT IDS systems. Extensive experiments with the 11 most popular IoT botnet datasets show that CTVAE improves accuracy and F-score in attack detection by around 1% compared to state-of-the-art machine learning and representation learning methods, while the running time for attack detection is lower than 2E-6 seconds and the model size is lower than 1 MB. We also investigate various characteristics of CTVAE in the latent space and in the reconstruction representation to demonstrate its efficacy compared with current well-known methods.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Vietnam > Hanoi > Hanoi (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Machine Learning, Deep Learning and Data Preprocessing Techniques for Detection, Prediction, and Monitoring of Stress and Stress-related Mental Disorders: A Scoping Review
Razavi, Moein, Ziyadidegan, Samira, Jahromi, Reza, Kazeminasab, Saber, Janfaza, Vahid, Mahmoudzadeh, Ahmadreza, Baharlouei, Elaheh, Sasangohar, Farzan
This comprehensive review systematically evaluates Machine Learning (ML) methodologies employed in the detection, prediction, and analysis of mental stress and its consequent mental disorders (MDs). Utilizing a rigorous scoping review process, the investigation delves into the latest ML algorithms, preprocessing techniques, and data types employed in the context of stress and stress-related MDs. The findings highlight that Support Vector Machine (SVM), Neural Network (NN), and Random Forest (RF) models consistently exhibit superior accuracy and robustness among all machine learning algorithms examined. Furthermore, the review underscores that physiological parameters, such as heart rate measurements and skin response, are prevalently used as stress predictors in ML algorithms. This is attributed to their rich explanatory information concerning stress and stress-related MDs, as well as the relative ease of data acquisition. Additionally, the application of dimensionality reduction techniques, including mappings, feature selection, filtering, and noise reduction, is frequently observed as a crucial step preceding the training of ML algorithms. The synthesis of this review identifies significant research gaps and outlines future directions for the field. These encompass areas such as model interpretability, model personalization, the incorporation of naturalistic settings, and real-time processing capabilities for detection and prediction of stress and stress-related MDs.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Singapore (0.04)
- North America > United States > Texas (0.04)
- Overview (1.00)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
A look at how AI supports your smartphone, from voice recognition to photography
Pakyong, 13 Feb: You might not realize it right away, but artificial intelligence (AI) actually powers many of your phone's features. Your phone's technology is always working in the background, handling various duties, even while you are not using it. It examines how your phone is used to maximize battery life, helps you take clear photographs, recognizes music, aids with language translation, and much more. AI was previously only found in pricey devices that incorporated the most cutting-edge technology. However, since AI is now such a crucial component of mobile applications, chipmakers saw the need to create AI processors specifically for machine learning and deep learning activities to speed up processing. The most widely used voice assistants at the moment are Google Assistant, Siri, and Bixby, and you can use at least one of them on any smartphone.
A generative, predictive model for menstrual cycle lengths that accounts for potential self-tracking artifacts in mobile health data
Li, Kathy, Urteaga, Iñigo, Shea, Amanda, Vitzthum, Virginia J., Wiggins, Chris H., Elhadad, Noémie
Mobile health (mHealth) apps such as menstrual trackers provide a rich source of self-tracked health observations that can be leveraged for health-relevant research. However, such data streams have questionable reliability since they hinge on user adherence to the app. Therefore, it is crucial for researchers to separate true behavior from self-tracking artifacts. By taking a machine learning approach to modeling self-tracked cycle lengths, we can both make more informed predictions and learn the underlying structure of the observed data. In this work, we propose and evaluate a hierarchical, generative model for predicting next cycle length based on previously-tracked cycle lengths that accounts explicitly for the possibility of users skipping tracking their period. Our model offers several advantages: 1) accounting explicitly for self-tracking artifacts yields better prediction accuracy as likelihood of skipping increases; 2) because it is a generative model, predictions can be updated online as a given cycle evolves, and we can gain interpretable insight into how these predictions change over time; and 3) its hierarchical nature enables modeling of an individual's cycle length history while incorporating population-level information. Our experiments using mHealth cycle length data encompassing over 186,000 menstruators with over 2 million natural menstrual cycles show that our method yields state-of-the-art performance against neural network-based and summary statistic-based baselines, while providing insights on disentangling menstrual patterns from self-tracking artifacts. This work can benefit users, mHealth app developers, and researchers in better understanding cycle patterns and user adherence.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Germany > Berlin (0.04)
- South America > Bolivia (0.04)
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.46)