AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

Fast Evaluation of Additive Kernels: Feature Arrangement, Fourier Methods, and Kernel Derivatives

Wagner, Theresa, Nestler, Franziska, Stoll, Martin

arXiv.org Artificial IntelligenceApr-26-2024

One of the main computational bottlenecks when working with kernel based learning is dealing with the large and typically dense kernel matrix. Techniques dealing with fast approximations of the matrix vector product for these kernel matrices typically deteriorate in their performance if the feature vectors reside in higher-dimensional feature spaces. We here present a technique based on the non-equispaced fast Fourier transform (NFFT) with rigorous error analysis. We show that this approach is also well suited to allow the approximation of the matrix that arises when the kernel is differentiated with respect to the kernel hyperparameters; a problem often found in the training phase of methods such as Gaussian processes. We also provide an error analysis for this case. We illustrate the performance of the additive kernel scheme with fast matrix vector products on a number of data sets.

approximation, fourier coefficient, kernel, (14 more...)

arXiv.org Artificial Intelligence

2404.17344

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Indiana > Hamilton County > Fishers (0.04)
Europe > Germany (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre:

Overview (0.67)
Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Automated Model Selection for Generalized Linear Models

Schwendinger, Benjamin, Schwendinger, Florian, Vana-Gür, Laura

arXiv.org Machine LearningApr-25-2024

In this paper, we show how mixed-integer conic optimization can be used to combine feature subset selection with holistic generalized linear models to fully automate the model selection process. Concretely, we directly optimize for the Akaike and Bayesian information criteria while imposing constraints designed to deal with multicollinearity in the feature selection task. Specifically, we propose a novel pairwise correlation constraint that combines the sign coherence constraint with ideas from classical statistical models like Ridge regression and the OSCAR model.

constraint, regression, selection, (14 more...)

arXiv.org Machine Learning

2404.1656

Country:

Europe > Austria > Vienna (0.04)
North America > United States > Kansas > Riley County > Manhattan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Using Artificial Intelligence to Unlock Crowdfunding Success for Small Businesses

Ye, Teng, Zheng, Jingnan, Jin, Junhui, Qiu, Jingyi, Ai, Wei, Mei, Qiaozhu

arXiv.org Artificial IntelligenceApr-24-2024

While small businesses are increasingly turning to online crowdfunding platforms for essential funding, over 40% of these campaigns may fail to raise any money, especially those from low socio-economic areas. We utilize the latest advancements in AI technology to identify crucial factors that influence the success of crowdfunding campaigns and to improve their fundraising outcomes by strategically optimizing these factors. Our best-performing machine learning model accurately predicts the fundraising outcomes of 81.0% of campaigns, primarily based on their textual descriptions. Interpreting the machine learning model allows us to provide actionable suggestions on improving the textual description before launching a campaign. We demonstrate that by augmenting just three aspects of the narrative using a large language model, a campaign becomes more preferable to 83% human evaluators, and its likelihood of securing financial support increases by 11.9%. Our research uncovers the effective strategies for crafting descriptions for small business fundraising campaigns and opens up a new realm in integrating large language models into crowdfunding methodologies.

campaign description, participant, small business, (17 more...)

arXiv.org Artificial Intelligence

2407.0948

Country:

Europe (0.05)
North America > United States > Illinois (0.04)
North America > United States > Michigan (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (0.93)

Industry:

Banking & Finance > Economy (1.00)
Education (0.68)
Government > Regional Government > North America Government > United States Government (0.68)
(2 more...)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Autonomous LLM-driven research from data to human-verifiable research papers

Ifargan, Tal, Hafner, Lukas, Kern, Maor, Alcalay, Ori, Kishony, Roy

arXiv.org Artificial IntelligenceApr-24-2024

As AI promises to accelerate scientific discovery, it remains unclear whether fully AI-driven research is possible and whether it can adhere to key scientific values, such as transparency, traceability and verifiability. Mimicking human scientific practices, we built data-to-paper, an automation platform that guides interacting LLM agents through a complete stepwise research process, while programmatically back-tracing information flow and allowing human oversight and interactions. In autopilot mode, provided with annotated data alone, data-to-paper raised hypotheses, designed research plans, wrote and debugged analysis codes, generated and interpreted results, and created complete and information-traceable research papers. Even though research novelty was relatively limited, the process demonstrated autonomous generation of de novo quantitative insights from data. For simple research goals, a fully-autonomous cycle can create manuscripts which recapitulate peer-reviewed publications without major errors in about 80-90%, yet as goal complexity increases, human co-piloting becomes critical for assuring accuracy. Beyond the process itself, created manuscripts too are inherently verifiable, as information-tracing allows to programmatically chain results, methods and data. Our work thereby demonstrates a potential for AI-driven acceleration of scientific discovery while enhancing, rather than jeopardizing, traceability, transparency and verifiability.

dataset, llm response, research goal, (16 more...)

arXiv.org Artificial Intelligence

2404.17605

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Consumer Health (0.69)
Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.68)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Differentially Private Federated Learning: Servers Trustworthiness, Estimation, and Statistical Inference

Zhang, Zhe, Nakada, Ryumei, Zhang, Linjun

arXiv.org Machine LearningApr-24-2024

Differentially private federated learning is crucial for maintaining privacy in distributed environments. This paper investigates the challenges of high-dimensional estimation and inference under the constraints of differential privacy. First, we study scenarios involving an untrusted central server, demonstrating the inherent difficulties of accurate estimation in high-dimensional problems. Our findings indicate that the tight minimax rates depends on the high-dimensionality of the data even with sparsity assumptions. Second, we consider a scenario with a trusted central server and introduce a novel federated estimation algorithm tailored for linear regression models. This algorithm effectively handles the slight variations among models distributed across different machines. We also propose methods for statistical inference, including coordinate-wise confidence intervals for individual parameters and strategies for simultaneous inference. Extensive simulation experiments support our theoretical advances, underscoring the efficacy and reliability of our approaches.

algorithm, confidence interval, federated learning, (14 more...)

arXiv.org Machine Learning

2404.16287

Country:

North America > United States > Virginia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Add feedback

Na\"ive Bayes and Random Forest for Crop Yield Prediction

Maazallahi, Abbas, Thota, Sreehari, Kondaboina, Naga Prasad, Muktineni, Vineetha, Annem, Deepthi, Rokkam, Abhi Stephen, Amini, Mohammad Hossein, Salari, Mohammad Amir, Norouzzadeh, Payam, Snir, Eli, Rahmani, Bahareh

arXiv.org Artificial IntelligenceApr-23-2024

This study analyzes crop yield prediction in India from 1997 to 2020, focusing on various crops and key environmental factors. It aims to predict agricultural yields by utilizing advanced machine learning techniques like Linear Regression, Decision Tree, KNN, Na\"ive Bayes, K-Mean Clustering, and Random Forest. The models, particularly Na\"ive Bayes and Random Forest, demonstrate high effectiveness, as shown through data visualizations. The research concludes that integrating these analytical methods significantly enhances the accuracy and reliability of crop yield predictions, offering vital contributions to agricultural data science.

dataset, prediction, random forest, (16 more...)

arXiv.org Artificial Intelligence

2404.15392

Country:

North America > United States > Missouri > St. Louis County > St. Louis (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)
Asia > India > Maharashtra (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)
(2 more...)

Add feedback

Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares

Brown, Gavin, Hayase, Jonathan, Hopkins, Samuel, Kong, Weihao, Liu, Xiyang, Oh, Sewoong, Perdomo, Juan C., Smith, Adam

arXiv.org Machine LearningApr-23-2024

We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of $X^\top X$, where $X$ is the design matrix. All prior private algorithms for this task require either $d^{3/2}$ examples, error growing polynomially with the condition number, or exponential time. Our near-optimal accuracy guarantee holds for any dataset with bounded statistical leverage and bounded residuals. Technically, we build on the approach of Brown et al. (2023) for private mean estimation, adding scaled noise to a carefully designed stable nonprivate estimator of the empirical regression vector.

algorithm, dataset, supp, (17 more...)

arXiv.org Machine Learning

2404.15409

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.45)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Security & Privacy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

Wang, Yuxia, Mansurov, Jonibek, Ivanov, Petar, Su, Jinyan, Shelmanov, Artem, Tsvigun, Akim, Afzal, Osama Mohammed, Mahmoud, Tarek, Puccetti, Giovanni, Arnold, Thomas, Whitehouse, Chenxi, Aji, Alham Fikri, Habash, Nizar, Gurevych, Iryna, Nakov, Preslav

arXiv.org Artificial IntelligenceApr-22-2024

We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine. This subtask has two tracks: a monolingual track focused solely on English texts and a multilingual track. Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM. Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine. The task attracted a large number of participants: subtask A monolingual (126), subtask A multilingual (59), subtask B (70), and subtask C (30). In this paper, we present the task, analyze the results, and discuss the system submissions and the methods they used. For all subtasks, the best systems used LLMs.

18th international workshop, detection, mexico city, (11 more...)

arXiv.org Artificial Intelligence

2404.14183

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > Mexico > Mexico City > Mexico City (0.07)
(8 more...)

Genre:

Research Report (1.00)
Overview (0.93)

Industry:

Media > News (0.67)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)

Add feedback

Differential contributions of machine learning and statistical analysis to language and cognitive sciences

Sun, Kun, Wang, Rong

arXiv.org Artificial IntelligenceApr-22-2024

Data-driven approaches have revolutionized scientific research. Machine learning and statistical analysis are commonly utilized in this type of research. Despite their widespread use, these methodologies differ significantly in their techniques and objectives. Few studies have utilized a consistent dataset to demonstrate these differences within the social sciences, particularly in language and cognitive sciences. This study leverages the Buckeye Speech Corpus to illustrate how both machine learning and statistical analysis are applied in data-driven research to obtain distinct insights. This study significantly enhances our understanding of the diverse approaches employed in data-driven strategies.

dataset, machine learning, word duration, (15 more...)

arXiv.org Artificial Intelligence

2404.14052

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Cognitive Architectures (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

What do Transformers Know about Government?

Hou, Jue, Katinskaia, Anisia, Kotilainen, Lari, Trangcasanchai, Sathianpong, Vu, Anh-Duc, Yangarber, Roman

arXiv.org Artificial IntelligenceApr-22-2024

This paper investigates what insights about linguistic features and what knowledge about the structure of natural language can be obtained from the encodings in transformer language models.In particular, we explore how BERT encodes the government relation between constituents in a sentence. We use several probing classifiers, and data from two morphologically rich languages. Our experiments show that information about government is encoded across all transformer layers, but predominantly in the early layers of the model. We find that, for both languages, a small number of attention heads encode enough information about the government relations to enable us to train a classifier capable of discovering new, previously unknown types of government, never seen in the training data. Currently, data is lacking for the research community working on grammatical constructions, and government in particular. We release the Government Bank -- a dataset defining the government relations for thousands of lemmas in the languages in our experiments.

classifier, computational linguistic, government, (16 more...)

arXiv.org Artificial Intelligence

2404.1427

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.88)
Research Report > Experimental Study (0.69)

Industry: Government > Regional Government > Europe Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback