AITopics

2410.08783

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Government Relations & Public Policy (0.92)
Health & Medicine > Therapeutic Area > Gastroenterology (0.67)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)

arXiv.org Artificial IntelligenceOct-16-2024

Personalized Prediction Models for Changes in Knee Pain among Patients with Osteoarthritis Participating in Supervised Exercise and Education

Rafiei, M., Das, S., Bakhtiari, M., Roos, E. M., Skou, S. T., Grønne, D. T., Baumbach, J., Baumbach, L.

Knee osteoarthritis (OA) is a widespread chronic condition that impairs mobility and diminishes quality of life. Despite the proven benefits of exercise therapy and patient education in managing the OA symptoms pain and functional limitations, these strategies are often underutilized. Personalized outcome prediction models can help motivate and engage patients, but the accuracy of existing models in predicting changes in knee pain remains insufficiently examined. To validate existing models and introduce a concise personalized model predicting changes in knee pain before to after participating in a supervised education and exercise therapy program (GLA:D) for knee OA patients. Our models use self-reported patient information and functional measures. To refine the number of variables, we evaluated the variable importance and applied clinical reasoning. We trained random forest regression models and compared the rate of true predictions of our models with those utilizing average values. We evaluated the performance of a full, continuous, and concise model including all 34, all 11 continuous, and the six most predictive variables respectively. All three models performed similarly and were comparable to the existing model, with R-squares of 0.31-0.32 and RMSEs of 18.65-18.85 - despite our increased sample size. Allowing a deviation of 15 VAS points from the true change in pain, our concise model and utilizing the average values estimated the change in pain at 58% and 51% correctly, respectively. Our supplementary analysis led to similar outcomes. Our concise personalized prediction model more accurately predicts changes in knee pain following the GLA:D program compared to average pain improvement values. Neither the increase in sample size nor the inclusion of additional variables improved previous models. To improve predictions, new variables beyond those in the GLA:D are required.

artificial intelligence, machine learning, prediction, (16 more...)

2410.12597

Country:

North America > United States > Massachusetts > Worcester County > Holden (0.04)
Europe > Denmark > Southern Denmark (0.04)
Europe > Portugal > Braga > Braga (0.04)
Europe > Germany > Hamburg (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Abedsoltan, Amirhesam, Radhakrishnan, Adityanarayanan, Wu, Jingfeng, Belkin, Mikhail

Context-Scaling versus Task-Scaling in In-Context Learning

arXiv.org Machine LearningOct-16-2024

Transformers exhibit In-Context Learning (ICL), where these models solve new tasks by using examples in the prompt without additional training. In our work, we identify and analyze two key components of ICL: (1) context-scaling, where model performance improves as the number of in-context examples increases and (2) task-scaling, where model performance improves as the number of pre-training tasks increases. While transformers are capable of both context-scaling and task-scaling, we empirically show that standard Multi-Layer Perceptrons (MLPs) with vectorized input are only capable of task-scaling. To understand how transformers are capable of context-scaling, we first propose a significantly simplified transformer architecture without key, query, value weights. We show that it performs ICL comparably to the original GPT-2 model in various statistical learning tasks including linear regression, teacher-student settings. Furthermore, a single block of our simplified transformer can be viewed as data dependent feature map followed by an MLP. This feature map on its own is a powerful predictor that is capable of context-scaling but is not capable of task-scaling. We show empirically that concatenating the output of this feature map with vectorized data as an input to MLPs enables both context-scaling and task-scaling. This finding provides a simple setting to study context and task-scaling for ICL.

context length, regression, transformer, (13 more...)

2410.12783

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.37)

RIbeiro, Antônio H., Schön, Thomas B., Zahariah, Dave, Bach, Francis

Efficient Optimization Algorithms for Linear Adversarial Training

arXiv.org Machine LearningOct-16-2024

Adversarial training can be used to learn models that are robust against perturbations. For linear models, it can be formulated as a convex optimization problem. Compared to methods proposed in the context of deep learning, leveraging the optimization structure allows significantly faster convergence rates. Still, the use of generic convex solvers can be inefficient for large-scale problems. Here, we propose tailored optimization algorithms for the adversarial training of linear models, which render large-scale regression and classification problems more tractable. For regression problems, we propose a family of solvers based on iterative ridge regression and, for classification, a family of solvers based on projected gradient descent. The methods are based on extended variable reformulations of the original problem. We illustrate their efficiency in numerical examples.

adversarial training, dataset, regression, (11 more...)

2410.12677

Country:

Europe > Sweden > Uppsala County > Uppsala (0.04)
North America > United States > Indiana > Hamilton County > Fishers (0.04)
Europe > Portugal (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
(2 more...)

Sushma, Neeraj Mohan, Tian, Yudou, Mestha, Harshvardhan, Colombo, Nicolo, Kappel, David, Subramoney, Anand

State-space models can learn in-context by gradient descent

arXiv.org Artificial IntelligenceOct-15-2024

Deep state-space models (Deep SSMs) have shown capabilities for in-context learning on autoregressive tasks, similar to transformers. However, the architectural requirements and mechanisms enabling this in recurrent networks remain unclear. This study demonstrates that state-space model architectures can perform gradient-based learning and use it for in-context learning. We prove that a single structured state-space model layer, augmented with local self-attention, can reproduce the outputs of an implicit linear model with least squares loss after one step of gradient descent. Our key insight is that the diagonal linear recurrent layer can act as a gradient accumulator, which can be `applied' to the parameters of the implicit regression model. We validate our construction by training randomly initialized augmented SSMs on simple linear regression tasks. The empirically optimized parameters match the theoretical ones, obtained analytically from the implicit model construction. Extensions to multi-step linear and non-linear regression yield consistent results. The constructed SSM encompasses features of modern deep state-space models, with the potential for scalable training and effectiveness even in general tasks. The theoretical construction elucidates the role of local self-attention and multiplicative interactions in recurrent architectures as the key ingredients for enabling the expressive power typical of foundation models.

artificial intelligence, gradient descent, machine learning, (12 more...)

2410.11687

Country:

Europe > Germany (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > India (0.04)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Vávra, Jan, Prostmaier, Bernd Hans-Konrad, Grün, Bettina, Hofmarcher, Paul

A Structural Text-Based Scaling Model for Analyzing Political Discourse

arXiv.org Artificial IntelligenceOct-14-2024

Estimating ideological positions of lawmakers has a long tradition in political science. Poole & Rosenthal (1985) proposed a "scaling procedure" to estimate ideological positions of lawmakers based on their voting behavior. Dynamic weighted nominal three-step estimation (McCarty et al. 1997), an extension of this procedure, results in the DW-Nominate scores that are widely accepted as benchmark ideological positions both on party level as well as on individual level (see, e.g., Poole et al. 2011, Lewis et al. 2022, Boche et al. 2018). Legislative votes, however, provide limited information on the latent ideological positions because voting behavior on individual level is often not documented and lawmakers rarely diverge from party-line voting due to robust party discipline (Hug 2010). Consequently, roll-call analysis for inferring the ideological positions adopted by legislators both within and across parties is of limited value (see, e.g., Lauderdale & Herzog 2016). Text-based scaling models are a promising alternative method to discern ideological stances based on political discussions.

artificial intelligence, machine learning, natural language, (19 more...)

2410.11897

Country:

North America > United States > Massachusetts (0.04)
Asia > Vietnam (0.04)
North America > United States > New Hampshire > Merrimack County > Concord (0.04)
(10 more...)

Genre: Research Report (0.82)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Therapeutic Area (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Anceschi, Niccolò, Rigon, Tommaso, Zanella, Giacomo, Durante, Daniele

Optimal lower bounds for logistic log-likelihoods

arXiv.org Machine LearningOct-14-2024

The logit transform is arguably the most widely-employed link function beyond linear settings. This transformation routinely appears in regression models for binary data and provides, either explicitly or implicitly, a core building-block within state-of-the-art methodologies for both classification and regression. Its widespread use, combined with the lack of analytical solutions for the optimization of general losses involving the logit transform, still motivates active research in computational statistics. Among the directions explored, a central one has focused on the design of tangent lower bounds for logistic log-likelihoods that can be tractably optimized, while providing a tight approximation of these log-likelihoods. Although progress along these lines has led to the development of effective minorize-maximize (MM) algorithms for point estimation and coordinate ascent variational inference schemes for approximate Bayesian inference under several logit models, the overarching focus in the literature has been on tangent quadratic minorizers. In fact, it is still unclear whether tangent lower bounds sharper than quadratic ones can be derived without undermining the tractability of the resulting minorizer. This article addresses such a challenging question through the design and study of a novel piece-wise quadratic lower bound that uniformly improves any tangent quadratic minorizer, including the sharpest ones, while admitting a direct interpretation in terms of the classical generalized lasso problem. As illustrated in a ridge logistic regression, this unique connection facilitates more effective implementations than those provided by available piece-wise bounds, while improving the convergence speed of quadratic ones.

minorizer, regression, tangent minorizer, (16 more...)

2410.10309

Country:

Europe > Italy > Lombardy > Milan (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > North Carolina > Durham County > Durham (0.04)

Genre: Research Report > Experimental Study (0.35)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Rügamer, David, Liew, Bernard X. W., Altai, Zainab, Stöcker, Almond

A Functional Extension of Semi-Structured Networks

arXiv.org Machine LearningOct-13-2024

Semi-structured networks (SSNs) merge the structures familiar from additive models with deep neural networks, allowing the modeling of interpretable partial feature effects while capturing higher-order non-linearities at the same time. A significant challenge in this integration is maintaining the interpretability of the additive model component. Inspired by large-scale biomechanics datasets, this paper explores extending SSNs to functional data. Existing methods in functional data analysis are promising but often not expressive enough to account for all interactions and non-linearities and do not scale well to large datasets. Although the SSN approach presents a compelling potential solution, its adaptation to functional data remains complex. In this work, we propose a functional SSN method that retains the advantageous properties of classical functional regression approaches while also improving scalability. Our numerical experiments demonstrate that this approach accurately recovers underlying signals, enhances predictive performance, and performs favorably compared to competing methods.

application, neural network, regression, (17 more...)

2410.0543

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Musculoskeletal (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

arXiv.org Artificial IntelligenceOct-12-2024

Information Discovery in e-Commerce

Ren, Zhaochun, He, Xiangnan, Yin, Dawei, de Rijke, Maarten

Electronic commerce, or e-commerce, is the buying and selling of goods and services, or the transmitting of funds or data online. E-commerce platforms come in many kinds, with global players such as Amazon, Airbnb, Alibaba, eBay and platforms targeting specific geographic regions. Information retrieval has a natural role to play in e-commerce, especially in connecting people to goods and services. Information discovery in e-commerce concerns different types of search (e.g., exploratory search vs. lookup tasks), recommender systems, and natural language processing in e-commerce portals. The rise in popularity of e-commerce sites has made research on information discovery in e-commerce an increasingly active research area. This is witnessed by an increase in publications and dedicated workshops in this space. Methods for information discovery in e-commerce largely focus on improving the effectiveness of e-commerce search and recommender systems, on enriching and using knowledge graphs to support e-commerce, and on developing innovative question answering and bot-based solutions that help to connect people to goods and services. In this survey, an overview is given of the fundamental infrastructure, algorithms, and technical solutions for information discovery in e-commerce. The topics covered include user behavior and profiling, search, recommendation, and language technology in e-commerce.

conversational recommendation system, multi-turn chinese dialogue dataset, post-click conversion rate estimation, (15 more...)

2410.05763

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Africa > Togo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Research Report > Promising Solution (0.92)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > e-Commerce (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
(13 more...)

Neural Information Processing SystemsOct-11-2024, 14:02:43 GMT

FasterRisk: Fast and Accurate Interpretable Risk Scores

Over the last century, risk scores have been the most popular form of predictive model used in healthcare and criminal justice. Risk scores are sparse linear models with integer coefficients; often these models can be memorized or placed on an index card. Typically, risk scores have been created either without data or by rounding logistic regression coefficients, but these methods do not reliably produce high-quality risk scores. Recent work used mathematical programming, which is computationally slow. We introduce an approach for efficiently producing a collection of high-quality risk scores learned from data.

accurate interpretable risk score, high-quality risk score, risk score, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.63)