AITopics

2402.02364

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.63)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Mizutani, Koyu, Mitarai, Haruki, Miyazaki, Kakeru, Kumano, Soichiro, Yamasaki, Toshihiko

Data-Driven Prediction of Seismic Intensity Distributions Featuring Hybrid Classification-Regression Models

arXiv.org Artificial IntelligenceFeb-3-2024

Earthquakes are among the most immediate and deadly natural disasters that humans face. Accurately forecasting the extent of earthquake damage and assessing potential risks can be instrumental in saving numerous lives. In this study, we developed linear regression models capable of predicting seismic intensity distributions based on earthquake parameters: location, depth, and magnitude. Because it is completely data-driven, it can predict intensity distributions without geographical information. The dataset comprises seismic intensity data from earthquakes that occurred in the vicinity of Japan between 1997 and 2020, specifically containing 1,857 instances of earthquakes with a magnitude of 5.0 or greater, sourced from the Japan Meteorological Agency. We trained both regression and classification models and combined them to take advantage of both to create a hybrid model. The proposed model outperformed commonly used Ground Motion Prediction Equations (GMPEs) in terms of the correlation coefficient, F1 score, and MCC. Furthermore, the proposed model can predict even abnormal seismic intensity distributions, a task at conventional GMPEs often struggle.

artificial intelligence, intensity distribution, machine learning, (18 more...)

2402.0215

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.16)

Genre: Research Report (0.84)

Industry: Energy > Oil & Gas > Upstream (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)

Tapo, Allahsera Auguste, Traore, Ali, Danioko, Sidy, Tembine, Hamidou

Machine Intelligence in Africa: a survey

arXiv.org Artificial IntelligenceFeb-3-2024

In the last 5 years, the availability of large audio datasets in African countries has opened unlimited opportunities to build machine intelligence (MI) technologies that are closer to the people and speak, learn, understand, and do businesses in local languages, including for those who cannot read and write. Unfortunately, these audio datasets are not fully exploited by current MI tools, leaving several Africans out of MI business opportunities. Additionally, many state-of-the-art MI models are not culture-aware, and the ethics of their adoption indexes are questionable. The lack thereof is a major drawback in many applications in Africa. This paper summarizes recent developments in machine intelligence in Africa from a multi-layer multiscale and culture-aware ethics perspective, showcasing MI use cases in 54 African countries through 400 articles on MI research, industry, government actions, as well as uses in art, music, the informal economy, and small businesses in Africa. The survey also opens discussions on the reliability of MI rankings and indexes in the African continent as well as algorithmic definitions of unclear terms used in MI.

data mining, natural language, pattern recognition, (25 more...)

2402.02218

Country:

Africa > Nigeria (1.00)
Africa > Democratic Republic of the Congo (0.92)
Africa > Cameroon (0.67)
(45 more...)

Genre:

Summary/Review (1.00)
Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
(5 more...)

Industry:

Water & Waste Management > Water Management > Water Supplies & Services (1.00)
Transportation > Ground > Road (1.00)
Telecommunications (1.00)
(47 more...)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
(20 more...)

arXiv.org Machine LearningFeb-3-2024

A flexible Bayesian g-formula for causal survival analyses with time-dependent confounding

Chen, Xinyuan, Hu, Liangyuan, Li, Fan

In longitudinal observational studies with a time-to-event outcome, a common objective in causal analysis is to estimate the causal survival curve under hypothetical intervention scenarios within the study cohort. The g-formula is a particularly useful tool for this analysis. To enhance the traditional parametric g-formula approach, we developed a more adaptable Bayesian g-formula estimator. This estimator facilitates both longitudinal predictive and causal inference. It incorporates Bayesian additive regression trees in the modeling of the time-evolving generative components, aiming to mitigate bias due to model misspecification. Specifically, we introduce a more general class of g-formulas for discrete survival data. These formulas can incorporate the longitudinal balancing scores, which serve as an effective method for dimension reduction and are vital when dealing with an expanding array of time-varying confounders. The minimum sufficient formulation of these longitudinal balancing scores is linked to the nature of treatment regimes, whether static or dynamic. For each type of treatment regime, we provide posterior sampling algorithms, which are grounded in the Bayesian additive regression trees framework. We have conducted simulation studies to illustrate the empirical performance of our proposed Bayesian g-formula estimators, and to compare them with existing parametric estimators. We further demonstrate the practical utility of our methods in real-world scenarios using data from the Yale New Haven Health System's electronic health records.

confounder, longitudinal, treatment regime, (14 more...)

arXiv.org Machine Learning

2402.02306

Country:

North America > United States > North Carolina > Vance County > Henderson (0.04)
North America > United States > Mississippi (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Health Care Providers & Services (0.86)
Health & Medicine > Health Care Technology > Medical Record (0.54)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

van der Laan, Lars, Carone, Marco, Luedtke, Alex

Combining T-learning and DR-learning: a framework for oracle-efficient estimation of causal contrasts

We introduce efficient plug-in (EP) learning, a novel framework for the estimation of heterogeneous causal contrasts, such as the conditional average treatment effect and conditional relative risk. The EP-learning framework enjoys the same oracle-efficiency as Neyman-orthogonal learning strategies, such as DR-learning and R-learning, while addressing some of their primary drawbacks, including that (i) their practical applicability can be hindered by loss function non-convexity; and (ii) they may suffer from poor performance and instability due to inverse probability weighting and pseudo-outcomes that violate bounds. To avoid these drawbacks, EP-learner constructs an efficient plug-in estimator of the population risk function for the causal contrast, thereby inheriting the stability and robustness properties of plug-in estimation strategies like T-learning. Under reasonable conditions, EP-learners based on empirical risk minimization are oracle-efficient, exhibiting asymptotic equivalence to the minimizer of an oracle-efficient one-step debiased estimator of the population risk function. In simulation experiments, we illustrate that EP-learners of the conditional average treatment effect and conditional relative risk outperform state-of-the-art competitors, including T-learner, R-learner, and DR-learner. Open-source implementations of the proposed methods are available in our R package hte3.

ep-learner, estimator, mean squared error, (16 more...)

2402.01972

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.67)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Díaz, Laura Fernández, Díaz, Miriam Fernández, Quevedo, José Ramón, Montañés, Elena

Capturing waste collection planning expert knowledge in a fitness function through preference learning

This paper copes with the COGERSA waste collection process. Up to now, experts have been manually designed the process using a trial and error mechanism. This process is not globally optimized, since it has been progressively and locally built as council demands appear. Planning optimization algorithms usually solve it, but they need a fitness function to evaluate a route planning quality. The drawback is that even experts are not able to propose one in a straightforward way due to the complexity of the process. Hence, the goal of this paper is to build a fitness function though a preference framework, taking advantage of the available expert knowledge and expertise. Several key performance indicators together with preference judgments are carefully established according to the experts for learning a promising fitness function. Particularly, the additivity property of them makes the task be much more affordable, since it allows to work with routes rather than with route plannings. Besides, a feature selection analysis is performed over such indicators, since the experts suspect of a potential existing (but unknown) redundancy among them. The experiment results confirm this hypothesis, since the best $C-$index ($98\%$ against around $94\%$) is reached when 6 or 8 out of 21 indicators are taken. Particularly, truck load seems to be a highly promising key performance indicator, together to the travelled distance along non-main roads. A comparison with other existing approaches shows that the proposed method clearly outperforms them, since the $C-$index goes from $72\%$ or $90\%$ to $98\%$.

collection point, fitness function, route planning, (14 more...)

doi: 10.1016/j.engappai.2020.104113

2402.01849

Country:

North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
Europe > Spain > Asturias (0.04)
(2 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Water & Waste Management > Solid Waste Management (1.00)
Government > Regional Government > North America Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Matabuena, Marcos, Vidal, Juan C., Padilla, Oscar Hernan Madrid, Onnela, Jukka-Pekka

kNN Algorithm for Conditional Mean and Variance Estimation with Automated Uncertainty Quantification and Variable Selection

In this paper, we introduce a kNN-based regression method that synergizes the scalability and adaptability of traditional non-parametric kNN models with a novel variable selection technique. This method focuses on accurately estimating the conditional mean and variance of random response variables, thereby effectively characterizing conditional distributions across diverse scenarios.Our approach incorporates a robust uncertainty quantification mechanism, leveraging our prior estimation work on conditional mean and variance. The employment of kNN ensures scalable computational efficiency in predicting intervals and statistical accuracy in line with optimal non-parametric rates. Additionally, we introduce a new kNN semi-parametric algorithm for estimating ROC curves, accounting for covariates. For selecting the smoothing parameter k, we propose an algorithm with theoretical guarantees.Incorporation of variable selection enhances the performance of the method significantly over conventional kNN techniques in various modeling tasks. We validate the approach through simulations in low, moderate, and high-dimensional covariate spaces. The algorithm's effectiveness is particularly notable in biomedical applications as demonstrated in two case studies. Concluding with a theoretical analysis, we highlight the consistency and convergence rate of our method over traditional kNN models, particularly when the underlying regression model takes values in a low-dimensional space.

algorithm, estimation, selection, (13 more...)

2402.01635

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > India (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.95)
Health & Medicine > Public Health (0.93)
Government (0.93)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

Bertsimas, Dimitris, Delarue, Arthur, Pauphilet, Jean

Adaptive Optimization for Prediction with Missing Data

When training predictive models on data with missing entries, the most widely used and versatile approach is a pipeline technique where we first impute missing entries and then compute predictions. In this paper, we view prediction with missing data as a two-stage adaptive optimization problem and propose a new class of models, adaptive linear regression models, where the regression coefficients adapt to the set of observed features. We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously instead of sequentially. We leverage this joint-impute-then-regress interpretation to generalize our framework to non-linear models. In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.

adaptive linear regression model, dataset, regression model, (14 more...)

2402.01543

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Wisconsin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Baylor, Emi, Ploeger, Esther, Bjerva, Johannes

Multilingual Gradient Word-Order Typology from Universal Dependencies

While information from the field of linguistic typology has the potential to improve performance on NLP tasks, reliable typological data is a prerequisite. Existing typological databases, including WALS and Grambank, suffer from inconsistencies primarily caused by their categorical format. Furthermore, typological categorisations by definition differ significantly from the continuous nature of phenomena, as found in natural language corpora. In this paper, we introduce a new seed dataset made up of continuous-valued data, rather than categorical data, that can better reflect the variability of language. While this initial dataset focuses on word-order typology, we also present the methodology used to create the dataset, which can be easily adapted to generate data for a broader set of features and languages.

computational linguistic, dataset, typology, (14 more...)

2402.01513

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(12 more...)

Genre: Research Report (0.53)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.33)

Fdez-Díaz, Laura, Quevedo, José Ramón, Montañés, Elena

Regularized boosting with an increasing coefficient magnitude stop criterion as meta-learner in hyperparameter optimization stacking ensemble

In Hyperparameter Optimization (HPO), only the hyperparameter configuration with the best performance is chosen after performing several trials, then, discarding the effort of training all the models with every hyperparameter configuration trial and performing an ensemble of all them. This ensemble consists of simply averaging the model predictions or weighting the models by a certain probability. Recently, other more sophisticated ensemble strategies, such as the Caruana method or the stacking strategy has been proposed. On the one hand, the Caruana method performs well in HPO ensemble, since it is not affected by the effects of multicollinearity, which is prevalent in HPO. It just computes the average over a subset of predictions with replacement. But it does not benefit from the generalization power of a learning process. On the other hand, stacking methods include a learning procedure since a meta-learner is required to perform the ensemble. Yet, one hardly finds advice about which meta-learner is adequate. Besides, some meta-learners may suffer from the effects of multicollinearity or need to be tuned to reduce them. This paper explores meta-learners for stacking ensemble in HPO, free of hyperparameter tuning, able to reduce the effects of multicollinearity and considering the ensemble learning process generalization power. At this respect, the boosting strategy seems promising as a stacking meta-learner. In fact, it completely removes the effects of multicollinearity. This paper also proposes an implicit regularization in the classical boosting method and a novel non-parametric stop criterion suitable only for boosting and specifically designed for HPO. The synergy between these two improvements over boosting exhibits competitive and promising predictive power performance compared to other existing meta-learners and ensemble approaches for HPO other than the stacking ensemble.

ensemble, rboost, stop criterion, (15 more...)

doi: 10.1016/j.neucom.2023.126516

2402.01379

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)