Kakinada
BhashaBench V1: A Comprehensive Benchmark for the Quadrant of Indic Domains
Devane, Vijay, Nauman, Mohd, Patel, Bhargav, Wakchoure, Aniket Mahendra, Sant, Yogeshkumar, Pawar, Shyam, Thakur, Viraj, Godse, Ananya, Patra, Sunil, Maurya, Neha, Racha, Suraj, Singh, Nitish Kamal, Nagpal, Ajay, Sawarkar, Piyush, Pundalik, Kundeshwar Vijayrao, Saluja, Rohit, Ramakrishnan, Ganesh
The rapid advancement of large language models (LLMs) has intensified the need for domain- and culture-specific evaluation. Existing benchmarks are largely Anglocentric and domain-agnostic, limiting their applicability to India-centric contexts. To address this gap, we introduce BhashaBench V1, the first domain-specific, multi-task, bilingual benchmark focusing on critical Indic knowledge systems. BhashaBench V1 contains 74,166 meticulously curated question-answer pairs, with 52,494 in English and 21,672 in Hindi, sourced from authentic government and domain-specific exams. It spans four major domains: Agriculture, Legal, Finance, and Ayurveda, comprising 90+ subdomains and covering 500+ topics, enabling fine-grained evaluation. Evaluation of 29+ LLMs reveals significant domain- and language-specific performance gaps, with especially large disparities in low-resource domains. For instance, GPT-4o achieves 76.49% overall accuracy in Legal but only 59.74% in Ayurveda. Models consistently perform better on English content than on Hindi across all domains. Subdomain-level analysis shows that areas such as Cyber Law and International Finance perform relatively well, while Panchakarma, Seed Science, and Human Rights remain notably weak. BhashaBench V1 provides a comprehensive dataset for evaluating large language models across India's diverse knowledge domains. It enables assessment of models' ability to integrate domain-specific knowledge with bilingual understanding. All code, benchmarks, and resources are publicly available to support open research.
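The per-domain and per-language accuracies quoted in this abstract come down to grouping answers by a field and counting matches. A minimal sketch of that bookkeeping follows; the record layout (`domain`, `language`, `gold`, `predicted`) and the four toy records are assumptions made up for illustration, not the benchmark's actual schema or data.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group: fraction of correct answers} for the given grouping key."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[key]
        total[group] += 1
        # A prediction counts as correct only on an exact match with the gold label.
        correct[group] += int(rec["predicted"] == rec["gold"])
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy records, NOT taken from BhashaBench V1.
records = [
    {"domain": "Legal", "language": "en", "gold": "A", "predicted": "A"},
    {"domain": "Legal", "language": "hi", "gold": "B", "predicted": "B"},
    {"domain": "Ayurveda", "language": "en", "gold": "C", "predicted": "C"},
    {"domain": "Ayurveda", "language": "hi", "gold": "D", "predicted": "A"},
]

by_domain = accuracy_by(records, "domain")
by_language = accuracy_by(records, "language")
```

Calling the same function with a subdomain or topic key would give the finer-grained breakdown the abstract mentions, assuming such a field exists in the data.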
- North America > United States (0.14)
- Asia > India > Maharashtra (0.04)
- Asia > Middle East > Jordan (0.04)
- Law > Statutes (1.00)
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (1.00)
- Government > Regional Government > Asia Government > India Government (0.46)
Statistical Comparative Analysis of Semantic Similarities and Model Transferability Across Datasets for Short Answer Grading
Bonthu, Sridevi, Sree, S. Rama, Prasad, M. H. M. Krishna
Developing dataset-specific models involves iterative fine-tuning and optimization, incurring significant costs over time. This study investigates the transferability of state-of-the-art (SOTA) models trained on established datasets to an unexplored text dataset. The key question is whether the knowledge embedded within SOTA models from existing datasets can be harnessed to achieve high-performance results on a new domain. In pursuit of this inquiry, two well-established benchmarks, the STSB and Mohler datasets, are selected, while the recently introduced SPRAG dataset serves as the unexplored domain. By employing robust similarity metrics and statistical techniques, a meticulous comparative analysis of these datasets is conducted. The primary goal of this work is to yield comprehensive insights into the potential applicability and adaptability of SOTA models. The outcomes of this research have the potential to reshape the landscape of natural language processing (NLP) by unlocking the ability to leverage existing models for diverse datasets. This may lead to a reduction in the demand for resource-intensive, dataset-specific training, thereby accelerating advancements in NLP and paving the way for more efficient model deployment.
- Research Report > Experimental Study (0.69)
- Research Report > New Finding (0.49)
Constrained Centroid Clustering: A Novel Approach for Compact and Structured Partitioning
Veeramachaneni, Sowmini Devi, Garimella, Ramamurthy
This paper presents Constrained Centroid Clustering (CCC), a method that extends classical centroid-based clustering by enforcing a constraint on the maximum distance between the cluster center and the farthest point in the cluster. Using a Lagrangian formulation, we derive a closed-form solution that maintains interpretability while controlling cluster spread. To evaluate CCC, we conduct experiments on synthetic circular data with radial symmetry and uniform angular distribution. Using ring-wise, sector-wise, and joint entropy as evaluation metrics, we show that CCC achieves more compact clusters by reducing radial spread while preserving angular structure, outperforming standard methods such as K-means and GMM. The proposed approach is suitable for applications requiring structured clustering with spread control, including sensor networks, collaborative robotics, and interpretable pattern analysis.
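The quantity that CCC constrains, the distance from a cluster's center to its farthest member, is straightforward to measure. The sketch below only computes that spread and checks it against a bound; it is not the paper's Lagrangian closed-form solution, and the names `max_radius`, `violates_spread`, and `r_max` are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def max_radius(points, center):
    """Euclidean distance from the center to the farthest point."""
    return max(math.dist(p, center) for p in points)


def violates_spread(points, r_max):
    """True if the cluster's farthest point lies more than r_max from its centroid."""
    return max_radius(points, centroid(points)) > r_max


# Toy cluster: corners of a 2 x 2 square (illustrative data only).
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = centroid(cluster)
radius = max_radius(cluster, center)
```

A check like this could be used to compare the spread of clusters produced by K-means, GMM, or a constrained method on the same data.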
- Asia > India > Telangana > Hyderabad (0.05)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- Asia > India > Andhra Pradesh > Kakinada (0.04)
- Research Report > Promising Solution (0.50)
- Overview > Innovation (0.40)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data
Pothula, Aishwarya, Akkiraju, Bhavana, Bandarupalli, Srihari, D, Charan, Kesiraju, Santosh, Vuppala, Anil Kumar
The scarcity of high-quality annotated data presents a significant challenge in developing effective end-to-end speech-to-text translation (ST) systems, particularly for low-resource languages. This paper explores the hypothesis that weakly labeled data can be used to build ST models for low-resource language pairs. We constructed speech-to-text translation datasets with the help of bitext mining using state-of-the-art sentence encoders. We mined the multilingual Shrutilipi corpus to build Shrutilipi-anuvaad, a dataset comprising ST data for language pairs Bengali-Hindi, Malayalam-Hindi, Odia-Hindi, and Telugu-Hindi. We created multiple versions of training data with varying degrees of quality and quantity to investigate the effect of quality versus quantity of weakly labeled data on ST model performance. Results demonstrate that ST systems can be built using weakly labeled data, with performance comparable to massive multi-modal multilingual baselines such as SONAR and SeamlessM4T.
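Bitext mining with sentence encoders is commonly done by scoring candidate pairs with cosine similarity and keeping those above a threshold. The sketch below illustrates that general idea on toy vectors; the abstract does not specify the encoder, the scoring function, or the thresholds, so `mine_pairs` and the values 0.9 and 0.5 are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mine_pairs(src_vecs, tgt_vecs, threshold):
    """For each source vector, keep its best-scoring target if the score clears the threshold."""
    pairs = []
    for i, src in enumerate(src_vecs):
        scores = [cosine(src, tgt) for tgt in tgt_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Toy 2-d "embeddings" standing in for real sentence-encoder outputs.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.9, 0.1], [-1.0, 0.2]]

strict = mine_pairs(src, tgt, threshold=0.9)  # fewer, higher-confidence pairs
loose = mine_pairs(src, tgt, threshold=0.5)   # more pairs, noisier alignment
```

Raising or lowering the threshold is one simple way to produce training sets that trade quality against quantity, which is the comparison the abstract describes.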
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.05)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.88)
Performance Evaluation of Sentiment Analysis on Text and Emoji Data Using End-to-End, Transfer Learning, Distributed and Explainable AI Models
Velampalli, Sirisha, Muniyappa, Chandrashekar, Saxena, Ashutosh
Emojis are used more than ever in today's digital world to express thoughts ranging from simple to complex. Hence, they are also being used in sentiment analysis and targeted marketing campaigns. In this work, we performed sentiment analysis on Tweets as well as on an emoji dataset from Kaggle. Since tweets are sentences, we used the Universal Sentence Encoder (USE) and Sentence Bidirectional Encoder Representations from Transformers (SBERT) end-to-end sentence embedding models to generate the embeddings, which are used to train standard fully connected Neural Network (NN) and LSTM NN models. We observe that the text classification accuracy was almost the same for both models, around 98 percent. In contrast, when the validation set was built using emojis that were not present in the training set, the accuracy of both models dropped drastically to 70 percent. In addition, the models were also trained using a distributed training approach instead of a traditional single-threaded model for better scalability. Using the distributed training approach, we were able to reduce the run-time by roughly 15% without compromising on accuracy. Finally, as part of explainable AI, the SHAP algorithm was used to explain the model behaviour and check for model biases for the given feature set.
- Oceania > Australia > Queensland > Brisbane (0.04)
- North America > United States > Wisconsin > Brown County > Green Bay (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Services (0.69)
- Information Technology > Security & Privacy (0.46)
Information Security and Privacy in the Digital World: Some Selected Topics
Sen, Jaydip, Mayer, Joceli, Dasgupta, Subhasis, Nandi, Subrata, Krishnaswamy, Srinivasan, Mitra, Pinaki, Singh, Mahendra Pratap, Kundeti, Naga Prasanthi, MVP, Chandra Sekhara Rao, Chekuri, Sudha Sree, Pallapothu, Seshu Babu, Nanjundan, Preethi, George, Jossy P., Allahi, Abdelhadi El, Morino, Ilham, Oussous, Salma AIT, Beloualid, Siham, Tamtaoui, Ahmed, Bajit, Abderrahim
Recent developments in hardware and information technology have enabled the emergence of billions of connected, intelligent devices around the world exchanging information with minimal human involvement. This paradigm, known as the Internet of Things (IoT), is progressing quickly, with an estimated 27 billion devices by 2025 (almost four devices per person) [1, 2]. These smart devices help improve our quality of life, with wearables to monitor health, vehicles that interact with traffic centers and other vehicles to ensure safety, and various home appliances offering comfort. This increase in the number of IoT devices and successful IoT services has generated tremendous data. The International Data Corporation report estimates that by 2025 this data will grow from 4 to 140 zettabytes [3].
- Europe > United Kingdom > England > Greater London > London (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- Summary/Review (1.00)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Workflow (0.92)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.92)
- Information Technology > Services > e-Commerce Services (0.67)
From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy
Gupta, Maanak, Akiri, CharanKumar, Aryal, Kshitiz, Parker, Eli, Praharaj, Lopamudra
Undoubtedly, the evolution of Generative AI (GenAI) models has been the highlight of digital transformation in the year 2022. As GenAI models like ChatGPT and Google Bard continue to grow in complexity and capability, it is critical to understand their consequences from a cybersecurity perspective. Several recent instances have demonstrated the use of GenAI tools on both the defensive and offensive sides of cybersecurity, drawing attention to the social, ethical, and privacy implications of this technology. This research paper highlights the limitations, challenges, potential risks, and opportunities of GenAI in the domain of cybersecurity and privacy. The work presents the vulnerabilities of ChatGPT, which can be exploited by malicious users to exfiltrate malicious information, bypassing the ethical constraints on the model. This paper demonstrates successful example attacks on ChatGPT, such as jailbreaks, reverse psychology, and prompt injection attacks. The paper also investigates how cyber offenders can use GenAI tools to develop cyber attacks, and explores scenarios where ChatGPT can be used by adversaries to create social engineering attacks, phishing attacks, automated hacking, attack payload generation, malware creation, and polymorphic malware. This paper then examines defense techniques and uses GenAI tools to improve security measures, including cyber defense automation, reporting, threat intelligence, secure code generation and detection, attack identification, developing ethical guidelines, incident response plans, and malware detection. We also discuss the social, legal, and ethical implications of ChatGPT. In conclusion, the paper highlights open challenges and future directions to make GenAI secure, safe, trustworthy, and ethical as the community understands its cybersecurity impacts.
- Europe > Italy (0.04)
- North America > United States > Tennessee > Putnam County > Cookeville (0.04)
- North America > United States > Texas (0.04)
- Research Report > New Finding (0.45)
- Research Report > Experimental Study (0.34)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military > Cyberwarfare (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.75)
Applying Machine Learning to DevOps
The era of static tooling for packaging, provisioning, deployment, monitoring, APM, and log management is coming to an end. With Docker adoption, cloud and API-driven approaches, and microservices for deploying applications at large scale, ensuring high reliability requires a fresh approach. So, it's essential to adopt smarter management tools for the cloud instead of reinventing the wheel every time. With the rise of ML and AI, more DevOps tooling vendors are building intelligence into their offerings to further simplify the work of engineers. Machine Learning is the practical application of Artificial Intelligence (AI) in the form of a set of programs or algorithms. The learning aspect relies on training time and data.
An Introduction to LUIS (Language Understanding Intelligent Service) - DZone AI
Language Understanding Intelligent Service (LUIS) enables developers to build smart applications that can understand human language and respond accordingly to user requests. Let's first try to understand why we need LUIS. In a web application, you search for a functionality in the menu. The menu, menu items, screen layout, and navigation vary for each application. Before you use any application, you need to familiarize yourself with the menu items and the navigation. The same functionality may have different names in different applications.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England (0.04)
- Asia > India > Telangana (0.04)
An Extensive Report on Cellular Automata Based Artificial Immune System for Strengthening Automated Protein Prediction
Sree, Pokkuluri Kiran, Babuhor, Inampudi Ramesh, N3, SSSN Usha Devi
Artificial Immune System (AIS-MACA), a novel computational intelligence technique, can be used to strengthen automated protein prediction systems with greater adaptability and parallelism. Most existing approaches are sequential, classify the input into four major classes, and are designed for similar sequences. AIS-MACA is designed to identify ten classes from sequences that share twilight-zone similarity and identity with the training sequences, with mixed and hybrid variations. This method also predicts three states (helix, strand, and coil) for the secondary structure. Our comprehensive design considers 10 feature selection methods and 4 classifiers to develop MACA (Multiple Attractor Cellular Automata) based classifiers that are built for each of the ten classes. Testing the proposed classifier on twilight-zone and 1-high-similarity benchmark datasets against over three dozen modern competing predictors shows that AIS-MACA provides the best overall accuracy, ranging between 80% and 89.8% depending on the dataset.
- North America > United States (0.05)
- Asia > India > Telangana > Hyderabad (0.04)
- Asia > India > Andhra Pradesh > Kakinada (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.64)