Accuracy
Student sentiment Analysis Using Classification With Feature Extraction Techniques
Tamrakar, Latika, Shrivastava, Dr. Padmavati, Ghosh, Dr. S. M.
Technical growths have empowered, numerous revolutions in the educational system by acquainting with technology into the classroom and by elevating the learning experience. Nowadays Web-based learning is getting much popularity. This paper describes the web-based learning and their effectiveness towards students. One of the prime factors in education or learning system is feedback; it is beneficial to learning if it must be used effectively. In this paper, we worked on how machine learning techniques like Logistic Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT) can be applied over Web-based learning, emphasis given on sentiment present in the feedback students. We also work on two types of Feature Extraction Technique (FETs) namely Count Vector (CVr) or Bag of Words) (BoW) and Term Frequency and Inverse Document Frequency (TF-IDF) Vector. In the research study, it is our goal for our proposed LR, SVM, NB, and DT models to classify the presence of Student Feedback Dataset (SFB) with improved accuracy with cleaned dataset and feature extraction techniques. The SFB is one of the significant concerns among the student sentimental analysis.
MultiRocket: Effective summary statistics for convolutional outputs in time series classification
Tan, Chang Wei, Dempster, Angus, Bergmeir, Christoph, Webb, Geoffrey I.
Rocket and MiniRocket, while two of the fastest methods for time series classification, are both somewhat less accurate than the current most accurate methods (namely, HIVE-COTE and its variants). We show that it is possible to significantly improve the accuracy of MiniRocket (and Rocket), with some additional computational expense, by expanding the set of features produced by the transform, making MultiRocket (for MiniRocket with Multiple Features) overall the single most accurate method on the datasets in the UCR archive, while still being orders of magnitude faster than any algorithm of comparable accuracy other than its precursors.
MalNet: A Large-Scale Cybersecurity Image Database of Malicious Software
Freitas, Scott, Duggal, Rahul, Chau, Duen Horng
Computer vision is playing an increasingly important role in automated malware detection with to the rise of the image-based binary representation. These binary images are fast to generate, require no feature engineering, and are resilient to popular obfuscation methods. Significant research has been conducted in this area, however, it has been restricted to small-scale or private datasets that only a few industry labs and research teams have access to. This lack of availability hinders examination of existing work, development of new research, and dissemination of ideas. We introduce MalNet, the largest publicly available cybersecurity image database, offering 133x more images and 27x more classes than the only other public binary-image database. MalNet contains over 1.2 million images across a hierarchy of 47 types and 696 families. We provide extensive analysis of MalNet, discussing its properties and provenance. The scale and diversity of MalNet unlocks new and exciting cybersecurity opportunities to the computer vision community--enabling discoveries and research directions that were previously not possible. The database is publicly available at www.mal-net.org.
Taxonomic survey of Hindi Language NLP systems
Desai, Nikita P., Prof., null, Dabhi, Vipul K.
The field of Natural language processing can be formally defined as - "A theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications"[69]. The naturally occurring text can be in written or spoken form.A wide array of domains contribute to NLP development like linguistics, computer science and psychology.The linguistics field helps to understand the formal structure of language while computer science domain helps to find efficient internal representations and data structures.The study of "Psychology" can be useful to understand the methodology used by humans for dealing with languages. NLP can be considered to be having two distinct focus namely (1)Natural Language Generation(NLG) and (2)Natural Language Understanding(NLU). The NLG deals with planning to use the representation of language to decide what should be generated at each point in interaction, while NLU needs to analyze language and decide which is best way to represent it meaningfully.We, in this survey paper, concentrate on area of NLU for written text.Hence the NLP henceforth might be considered as NLU and vice versa. Motivation for designing Indian NLP systems Hindi and English are the official languages in central government of India(GOI). Indian community faces a "Digital Divide" due to dominance of English as mode of communication in higher education, judiciary, corporate sector and Public administration at Central level whereas the government in states work in their respective regional languages [67].The expansion of Internet has inter-connected the socioeconomic environment of the world and redefined the concept of global culture.As per a report in 2017 by the companies kpmg and Google
200+ Machine Learning Interview Questions and Answer for 2021
A Machine Learning interview calls for a rigorous interview process where the candidates are judged on various aspects such as technical and programming skills, knowledge of methods and clarity of basic concepts. If you aspire to apply for machine learning jobs, it is crucial to know what kind of interview questions generally recruiters and hiring managers may ask. This is an attempt to help you crack the machine learning interviews at major product based companies and start-ups. Usually, machine learning interviews at major companies require a thorough knowledge of data structures and algorithms. In the upcoming series of articles, we shall start from the basics of concepts and build upon these concepts to solve major interview questions. Machine learning interviews comprise of many rounds, which begin with a screening test. This comprises solving questions either on the white-board, or solving it on online platforms like HackerRank, LeetCode etc. Here, we have compiled a list of ...
Statistical Inference after Kernel Ridge Regression Imputation under item nonresponse
Wang, Hengfang, Kim, Jae-Kwang
Imputation is a popular technique for handling missing data. We consider a nonparametric approach to imputation using the kernel ridge regression technique and propose consistent variance estimation. The proposed variance estimator is based on a linearization approach which employs the entropy method to estimate the density ratio. The root-n consistency of the imputation estimator is established when a Sobolev space is utilized in the kernel ridge regression imputation, which enables us to develop the proposed variance estimator. Synthetic data experiments are presented to confirm our theory.
Tree-based Node Aggregation in Sparse Graphical Models
High-dimensional graphical models are often estimated using regularization that is aimed at reducing the number of edges in a network. In this work, we show how even simpler networks can be produced by aggregating the nodes of the graphical model. We develop a new convex regularized method, called the tree-aggregated graphical lasso or tag-lasso, that estimates graphical models that are both edge-sparse and node-aggregated. The aggregation is performed in a data-driven fashion by leveraging side information in the form of a tree that encodes node similarity and facilitates the interpretation of the resulting aggregated nodes. We provide an efficient implementation of the tag-lasso by using the locally adaptive alternating direction method of multipliers and illustrate our proposal's practical advantages in simulation and in applications in finance and biology.
Beyond traditional assumptions in fair machine learning
After challenging the validity of these assumptions in real-world applications, we propose ways to move forward when they are violated. First, we show that group fairness criteria purely based on statistical properties of observed data are fundamentally limited. Revisiting this limitation from a causal viewpoint we develop a more versatile conceptual framework, causal fairness criteria, and first algorithms to achieve them. We also provide tools to analyze how sensitive a believed-to-be causally fair algorithm is to misspecifications of the causal graph. Second, we overcome the assumption that sensitive data is readily available in practice. To this end we devise protocols based on secure multi-party computation to train, validate, and contest fair decision algorithms without requiring users to disclose their sensitive data or decision makers to disclose their models. Finally, we also accommodate the fact that outcome labels are often only observed when a certain decision has been made. We suggest a paradigm shift away from training predictive models towards directly learning decisions to relax the traditional assumption that labels can always be recorded. The main contribution of this thesis is the development of theoretically substantiated and practically feasible methods to move research on fair machine learning closer to real-world applications.
Multi-Threshold Attention U-Net (MTAU) based Model for Multimodal Brain Tumor Segmentation in MRI scans
Awasthi, Navchetan, Pardasani, Rohit, Gupta, Swati
Gliomas are one of the most frequent brain tumors and are classified into high grade and low grade gliomas. The segmentation of various regions such as tumor core, enhancing tumor etc. plays an important role in determining severity and prognosis. Here, we have developed a multi-threshold model based on attention U-Net for identification of various regions of the tumor in magnetic resonance imaging (MRI). We propose a multi-path segmentation and built three separate models for the different regions of interest. The proposed model achieved mean Dice Coefficient of 0.59, 0.72, and 0.61 for enhancing tumor, whole tumor and tumor core respectively on the training dataset. The same model gave mean Dice Coefficient of 0.57, 0.73, and 0.61 on the validation dataset and 0.59, 0.72, and 0.57 on the test dataset.
Better sampling in explanation methods can prevent dieselgate-like deception
Vreš, Domen, Šikonja, Marko Robnik
Machine learning models are used in many sensitive areas where besides predictive accuracy their comprehensibility is also important. Interpretability of prediction models is necessary to determine their biases and causes of errors, and is a necessary prerequisite for users' confidence. For complex state-of-the-art black-box models post-hoc model-independent explanation techniques are an established solution. Popular and effective techniques, such as IME, LIME, and SHAP, use perturbation of instance features to explain individual predictions. Recently, Slack et al. (2020) put their robustness into question by showing that their outcomes can be manipulated due to poor perturbation sampling employed. This weakness would allow dieselgate type cheating of owners of sensitive models who could deceive inspection and hide potentially unethical or illegal biases existing in their predictive models. This could undermine public trust in machine learning models and give rise to legal restrictions on their use. We show that better sampling in these explanation methods prevents malicious manipulations. The proposed sampling uses data generators that learn the training set distribution and generate new perturbation instances much more similar to the training set. We show that the improved sampling increases the robustness of the LIME and SHAP, while previously untested method IME is already the most robust of all.