logreg
M-DAIGT: A Shared Task on Multi-Domain Detection of AI-Generated Text
Lamsiyah, Salima, Ezzini, Saad, Mahdaouy, Abdelkader El, Alami, Hamza, Benlahbib, Abdessamad, Amrany, Samir El, Chafik, Salmane, Hammouchi, Hicham
The generation of highly fluent text by Large Language Models (LLMs) poses a significant challenge to information integrity and academic research. In this paper, we introduce the Multi-Domain Detection of AI-Generated Text (M-DAIGT) shared task, which focuses on detecting AI-generated text across multiple domains, particularly in news articles and academic writing. M-DAIGT comprises two binary classification subtasks: News Article Detection (NAD) (Subtask 1) and Academic Writing Detection (AWD) (Subtask 2). To support this task, we developed and released a new large-scale benchmark dataset of 30,000 samples, balanced between human-written and AI-generated texts. The AI-generated content was produced using a variety of modern LLMs (e.g., GPT-4, Claude) and diverse prompting strategies. A total of 46 unique teams registered for the shared task, of which four teams submitted final results. All four teams participated in both Subtask 1 and Subtask 2. We describe the methods employed by these participating teams and briefly discuss future directions for M-DAIGT.
- Europe > Austria > Vienna (0.17)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
A model-free approach to fingertip slip and disturbance detection for grasp stability inference
Kitouni, Dounia, Khoramshahi, Mahdi, Perdereau, Veronique
Robotic capabilities in object manipulation still fall far short of those of humans. Besides years of learning, humans rely heavily on the richness of information from physical interaction with the environment. In particular, tactile sensing is crucial in providing such rich feedback. Despite its potential contributions to robotic manipulation, tactile sensing remains underexploited, mainly due to the complexity of the time series provided by tactile sensors. In this work, we propose a method for assessing grasp stability using tactile sensing. More specifically, we propose a methodology to extract task-relevant features and design efficient classifiers to detect object slippage with respect to individual fingertips. We compare two classification models: support vector machine and logistic regression. We use highly sensitive Uskin tactile sensors mounted on an Allegro hand to test and validate our method. Our results demonstrate that the proposed method is effective for online slippage detection.
- Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Comparing Linear and Logistic Regression - KDnuggets
Data Science interviews vary in their depth. Some interviews go really deep and test the candidates on their knowledge of advanced models or tricky fine-tuning. But many interviews are conducted at an entry level, trying to test the basic knowledge of the candidate. In this article we will see a question that can be discussed in such an interview. Even though the question is very simple, the discussion brings up many interesting aspects of the fundamentals of machine learning. Question: What is the difference between Linear Regression and Logistic Regression? There are actually many similarities between the two, starting with the fact that their names are very similar sounding.
- Research Report > New Finding (0.62)
- Research Report > Experimental Study (0.62)
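The core of the difference raised in the interview question above can be shown in a few lines. The following is a minimal sketch, not taken from the article: both models compute the same linear score w*x + b, but logistic regression passes that score through a sigmoid so the output can be read as a probability. The function names and the example weights are hypothetical and chosen only for illustration.

```python
import math


def linear_predict(w, b, x):
    """Linear regression: the prediction is the raw linear score (unbounded)."""
    return w * x + b


def sigmoid(z):
    """Numerically stable logistic function; mathematically maps z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_predict(w, b, x):
    """Logistic regression: the same linear score, squashed by the sigmoid."""
    return sigmoid(linear_predict(w, b, x))


if __name__ == "__main__":
    w, b = 2.0, -1.0  # hypothetical weights, for illustration only
    for x in (-3.0, 0.5, 3.0):
        print(x, linear_predict(w, b, x), logistic_predict(w, b, x))
```

The linear output grows without bound as x grows, which suits a continuous target, while the logistic output stays between 0 and 1, which suits a binary label.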
Decision Rule Elicitation for Domain Adaptation
Nikitin, Alexander, Kaski, Samuel
Human-in-the-loop machine learning is widely used in artificial intelligence (AI) to elicit labels for data points from experts or to provide feedback on how close the predicted results are to the target. This simplifies away all the details of the decision-making process of the expert. In this work, we allow the experts to additionally produce decision rules describing their decision-making; the rules are expected to be imperfect but to give additional information. In particular, the rules can extend to new distributions, and hence can significantly improve performance in cases where the training and testing distributions differ, such as in domain adaptation. We apply the proposed method to lifelong learning and domain adaptation problems and discuss applications in other branches of AI, such as knowledge acquisition problems in expert systems. In simulated and real-user studies, we show that decision rule elicitation improves domain adaptation of the algorithm and helps to propagate the expert's knowledge to the AI model.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Texas > Brazos County > College Station (0.06)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Leisure & Entertainment (0.93)
- Media (0.68)
- Education > Educational Setting > Continuing Education (0.34)
- Information Technology > Knowledge Management > Knowledge Engineering (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
A Ranking Approach to Fair Classification
Schoeffer, Jakob, Kuehl, Niklas, Valera, Isabel
Algorithmic decision systems are increasingly used in areas such as hiring, school admission, or loan approval. Typically, these systems rely on labeled data for training a classification model. However, in many scenarios, ground-truth labels are unavailable, and instead we have access only to imperfect labels resulting from (potentially biased) human-made decisions. Despite being imperfect, historical decisions often contain some useful information on the unobserved true labels. In this paper, we focus on scenarios where only imperfect labels are available and propose a new fair ranking-based decision system, as an alternative to traditional classification algorithms. Our approach is both intuitive and easy to implement, and thus particularly suitable for adoption in real-world settings. In more detail, we introduce a distance-based decision criterion, which incorporates useful information from historical decisions and accounts for unwanted correlation between protected and legitimate features. Through extensive experiments on synthetic and real-world data, we show that our method is fair, as it a) assigns the desirable outcome to the most qualified individuals, and b) removes the effect of stereotypes in decision-making, thereby outperforming traditional classification algorithms. Additionally, we are able to show theoretically that our method is consistent with a prominent concept of individual fairness which states that "similar individuals should be treated similarly."
- Europe > Germany > Saarland > Saarbrücken (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Asia (0.04)
Learning, transferring, and recommending performance knowledge with Monte Carlo tree search and neural networks
Making changes to a program to optimize its performance is an unscalable task that relies entirely upon human intuition and experience. In addition, companies operating at large scale are at a stage where no single individual understands the code controlling their systems, and for this reason, making changes to improve performance can become intractably difficult. In this paper, a learning system is introduced that provides AI assistance for finding recommended changes to a program. Specifically, it is shown how the evaluative feedback, delayed-reward performance programming domain can be effectively formulated via the Monte Carlo tree search (MCTS) framework. It is then shown that established methods from computational games for using learning to expedite tree-search computation can be adapted to speed up computing recommended program alterations. Estimates of expected utility from MCTS trees built for previous problems are used to learn a sampling policy that remains effective across new problems, thus demonstrating transferability of optimization knowledge. This formulation is applied to the Apache Spark distributed computing environment, and a preliminary result is observed that the time required to build a search tree for finding recommendations is reduced by up to a factor of 10.
- Research Report (0.51)
- Overview (0.34)
Consistent Classification with Generalized Metrics
Wang, Xiaoyan, Li, Ran, Yan, Bowei, Koyejo, Oluwasanmi
We propose a framework for constructing and analyzing multiclass and multioutput classification metrics, i.e., involving multiple, possibly correlated multiclass labels. Our analysis reveals novel insights on the geometry of feasible confusion tensors -- including necessary and sufficient conditions for the equivalence between optimizing an arbitrary non-decomposable metric and learning a weighted classifier. Further, we analyze averaging methodologies commonly used to compute multioutput metrics and characterize the corresponding Bayes optimal classifiers. We show that the plug-in estimator based on this characterization is consistent and is easily implemented as a post-processing rule. Empirical results on synthetic and benchmark datasets support the theoretical findings.
- North America > United States > Illinois (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- North America > United States > California > Monterey County > Pacific Grove (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Bridging the Generalization Gap: Training Robust Models on Confounded Biological Data
Liu, Tzu-Yu, Kannan, Ajay, Drake, Adam, Bertin, Marvin, Wan, Nathan
Statistical learning on biological data can be challenging due to confounding variables in sample collection and processing. Confounders can cause models to generalize poorly and result in inaccurate prediction performance metrics if models are not validated thoroughly. In this paper, we propose methods to control for confounding factors and further improve prediction performance. We introduce OrthoNormal basis construction In cOnfounding factor Normalization (ONION) to remove confounding covariates and use the Domain-Adversarial Neural Network (DANN) to penalize models for encoding confounder information. We apply the proposed methods to simulated and empirical patient data and show significant improvements in generalization.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > San Mateo County > South San Francisco (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
An Introduction to Redis-ML (Part Three) - Redis Labs
This post is part three of a series of posts introducing the Redis-ML module. The first article in the series can be found here. The sample code for this post requires several Python libraries and a Redis instance running Redis-ML. Detailed setup instructions to run the code can be found in either part one or part two of the series. Logistic regression is another linear model for building predictive models from observed data.
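To make "linear model" concrete, here is a minimal sketch of fitting a one-feature logistic regression by batch gradient descent. It is independent of Redis-ML and does not use any of its commands; the function names, learning rate, epoch count, and toy data are all assumptions made for illustration, not taken from the post.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=200):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals (p - y) drive the gradient of the mean log loss.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class for input x."""
    return sigmoid(w * x + b)


if __name__ == "__main__":
    xs = [-2.0, -1.0, 1.0, 2.0]  # toy one-feature inputs (hypothetical)
    ys = [0, 0, 1, 1]            # toy binary labels (hypothetical)
    w, b = fit_logistic(xs, ys)
    print(w, b, predict_proba(w, b, 1.5))
```

The model is linear in the sense that the decision depends on the data only through the score w*x + b; the sigmoid only converts that score into a probability.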
Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
Kanamori, Takafumi, Hido, Shohei, Sugiyama, Masashi
We address the problem of estimating the ratio of two probability density functions (a.k.a. the importance). The importance values can be used for various succeeding tasks such as non-stationarity adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally very efficient and numerically stable. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bound. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
- Information Technology > Data Science > Data Mining > Anomaly Detection (0.62)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)
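The abstract above does not spell out the estimator, so the following is only a sketch of one way a density ratio can have a closed-form solution. It assumes a Gaussian-kernel model w(x) = sum_i alpha_i * k(x, c_i) fitted by regularized least squares, in the spirit of least-squares importance fitting. The function names, kernel width `sigma`, ridge parameter `lam`, and toy samples are hypothetical, and the analytic leave-one-out score is not shown.

```python
import math


def gaussian_kernel(x, c, sigma):
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def solve_linear(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_density_ratio(x_nu, x_de, centers, sigma=0.5, lam=0.1):
    """Closed-form least-squares fit of w(x) ~ p_nu(x) / p_de(x)."""
    b = len(centers)

    def phi(x):
        return [gaussian_kernel(x, c, sigma) for c in centers]

    phi_de = [phi(x) for x in x_de]
    phi_nu = [phi(x) for x in x_nu]
    # H: second moment of the basis over denominator samples, plus a ridge term.
    h_mat = [[sum(p[i] * p[j] for p in phi_de) / len(x_de)
              + (lam if i == j else 0.0)
              for j in range(b)] for i in range(b)]
    # h: mean of the basis over numerator samples.
    h_vec = [sum(p[i] for p in phi_nu) / len(x_nu) for i in range(b)]
    # Solve (H + lam*I) alpha = h, then clip so the ratio stays non-negative.
    alpha = [max(0.0, a) for a in solve_linear(h_mat, h_vec)]

    def ratio(x):
        return sum(a * k for a, k in zip(alpha, phi(x)))

    return ratio


if __name__ == "__main__":
    x_de = [-2.0 + 0.5 * i for i in range(9)]  # spread-out denominator samples
    x_nu = [-0.5, -0.25, 0.0, 0.25, 0.5]       # numerator samples near zero
    ratio = fit_density_ratio(x_nu, x_de, centers=x_nu)
    print(ratio(0.0), ratio(2.0))
```

Because the objective is quadratic in alpha, fitting reduces to one linear solve, which is what makes this family of estimators cheap. Points where the estimated ratio is small are those that look unlike the numerator sample, which is the intuition behind using the ratio for outlier detection.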