AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

Anomaly Detection by Robust Statistics

arXiv.org Machine LearningOct-14-2017

Real data often contain anomalous cases, also known as outliers. These may spoil the resulting analysis but they may also contain valuable information. In either case, the ability to detect such anomalies is essential. A useful tool for this purpose is robust statistics, which aims to detect the outliers by first fitting the majority of the data and then flagging data points that deviate from it. We present an overview of several robust methods and the resulting graphical outlier detection tools. We discuss robust procedures for univariate, low-dimensional, and high-dimensional data, such as estimating location and scatter, linear regression, principal component analysis, classification, clustering, and functional data analysis. Also the challenging new topic of cellwise outliers is introduced.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

1707.09752

Country: Europe > Austria (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Add feedback

24 Uses of Statistical Modeling (Part I)

@machinelearnbotOct-12-2017, 11:35:17 GMT

Here we discuss general applications of statistical models, whether they arise from data science, operations research, engineering, machine learning or statistics. We do not discuss specific algorithms such as decision trees, logistic regression, Bayesian modeling, Markov models, data reduction or feature selection. Instead, I discuss frameworks - each one using its own types of techniques and algorithms - to solve real life problems. Most of the entries below are found in Wikipedia, and I have used a few definitions or extracts from the relevant Wikipedia articles, in addition to personal contributions. Spatial dependency is the co-variation of properties within geographic space: characteristics at proximal locations appear to be correlated, either positively or negatively.

artificial intelligence, machine learning, social media, (16 more...)

@machinelearnbot

Industry: Banking & Finance (0.98)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Add feedback

Restricted Strong Convexity Implies Weak Submodularity

Elenberg, Ethan R., Khanna, Rajiv, Dimakis, Alexandros G., Negahban, Sahand

arXiv.org Machine LearningOct-12-2017

We connect high-dimensional subset selection and submodular maximization. Our results extend the work of Das and Kempe (2011) from the setting of linear regression to arbitrary objective functions. For greedy feature selection, this connection allows us to obtain strong multiplicative performance bounds on several methods without statistical modeling assumptions. We also derive recovery guarantees of this form under standard assumptions. Our work shows that greedy algorithms perform within a constant factor from the best possible subset-selection solution for a broad class of general objective functions. Our methods allow a direct control over the number of obtained features as opposed to regularization parameters that only implicitly control sparsity. Our proof technique uses the concept of weak submodularity initially defined by Das and Kempe. We draw a connection between convex analysis and submodular set function theory which may be of independent interest for other statistical learning applications that have combinatorial structure.

algorithm, assumption, regression, (11 more...)

arXiv.org Machine Learning

1612.00804

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

The Social Bow Tie

Mattie, Heather, Engø-Monsen, Kenth, Ling, Rich, Onnela, Jukka-Pekka

arXiv.org Machine LearningOct-12-2017

Understanding tie strength in social networks, and the factors that influence it, have received much attention in a myriad of disciplines for decades. Several models incorporating indicators of tie strength have been proposed and used to quantify relationships in social networks, and a standard set of structural network metrics have been applied to predominantly online social media sites to predict tie strength. Here, we introduce the concept of the "social bow tie" framework, a small subgraph of the network that consists of a collection of nodes and ties that surround a tie of interest, forming a topological structure that resembles a bow tie. We also define several intuitive and interpretable metrics that quantify properties of the bow tie. We use random forests and regression models to predict categorical and continuous measures of tie strength from different properties of the bow tie, including nodal attributes. We also investigate what aspects of the bow tie are most predictive of tie strength in two distinct social networks: a collection of 75 rural villages in India and a nationwide call network of European mobile phone users. Our results indicate several of the bow tie metrics are highly predictive of tie strength, and we find the more the social circles of two individuals overlap, the stronger their tie, consistent with previous findings. However, we also find that the more tightly-knit their non-overlapping social circles, the weaker the tie. This new finding complements our current understanding of what drives the strength of ties in social networks.

artificial intelligence, machine learning, tie strength, (20 more...)

arXiv.org Machine Learning

1710.04177

Country:

Asia > India (0.35)
North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Services (0.96)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

Predicting House Price Using Regression Algorithm for Machine Learning

#artificialintelligenceOct-11-2017, 19:55:23 GMT

Machine learning has been used for years to offer image recognition, spam detection, natural speech comprehension, product recommendations, and medical diagnoses. Today, machine learning algorithms can help us enhance cybersecurity, ensure public safety, and improve medical outcomes. Machine learning systems can also make customer service better and automobiles safer. When I started experimenting with machine learning, I wanted to come up with an application that would solve a real-world problem but would not be too complicated to implement. I also wanted to practice working with regression algorithms.

artificial intelligence, leasehold, machine learning, (15 more...)

#artificialintelligence

Country: Europe > Netherlands > North Holland > Amsterdam (0.05)

Industry:

Transportation (0.55)
Banking & Finance > Real Estate (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.74)

Add feedback

Efficient Data-Driven Geologic Feature Detection from Pre-stack Seismic Measurements using Randomized Machine-Learning Algorithm

Lin, Youzuo, Wang, Shusen, Thiagarajan, Jayaraman, Guthrie, George, Coblentz, David

arXiv.org Machine LearningOct-11-2017

Conventional seismic techniques for detecting the subsurface geologic features are challenged by limited data coverage, computational inefficiency, and subjective human factors. We developed a novel data-driven geological feature detection approach based on pre-stack seismic measurements. Our detection method employs an efficient and accurate machine-learning detection approach to extract useful subsurface geologic features automatically. Specifically, our method is based on kernel ridge regression model. The conventional kernel ridge regression can be computationally prohibited because of the large volume of seismic measurements. We employ a data reduction technique in combination with the conventional kernel ridge regression method to improve the computational efficiency and reduce memory usage. In particular, we utilize a randomized numerical linear algebra technique, named Nystr\"om method, to effectively reduce the dimensionality of the feature space without compromising the information content required for accurate detection. We provide thorough computational cost analysis to show efficiency of our new geological feature detection methods. We further validate the performance of our new subsurface geologic feature detection method using synthetic surface seismic data for 2D acoustic and elastic velocity models. Our numerical examples demonstrate that our new detection method significantly improves the computational efficiency while maintaining comparable accuracy. Interestingly, we show that our method yields a speed-up ratio on the order of $\sim10^2$ to $\sim 10^3$ in a multi-core computational environment.

approximation, health & medicine, upstream oil & gas, (22 more...)

arXiv.org Machine Learning

1710.04329

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

Decentralized Online Learning with Kernels

Koppel, Alec, Paternain, Santiago, Richard, Cedric, Ribeiro, Alejandro

arXiv.org Machine LearningOct-11-2017

We consider multi-agent stochastic optimization problems over reproducing kernel Hilbert spaces (RKHS). In this setting, a network of interconnected agents aims to learn decision functions, i.e., nonlinear statistical models, that are optimal in terms of a global convex functional that aggregates data across the network, with only access to locally and sequentially observed samples. We propose solving this problem by allowing each agent to learn a local regression function while enforcing consensus constraints. We use a penalized variant of functional stochastic gradient descent operating simultaneously with low-dimensional subspace projections. These subspaces are constructed greedily by applying orthogonal matching pursuit to the sequence of kernel dictionaries and weights. By tuning the projection-induced bias, we propose an algorithm that allows for each individual agent to learn, based upon its locally observed data stream and message passing with its neighbors only, a regression function that is close to the globally optimal regression function. That is, we establish that with constant step-size selections agents' functions converge to a neighborhood of the globally optimal one while satisfying the consensus constraints as the penalty parameter is increased. Moreover, the complexity of the learned regression functions is guaranteed to remain finite. On both multi-class kernel logistic regression and multi-class kernel support vector classification with data generated from class-dependent Gaussian mixture models, we observe stable function estimation and state of the art performance for distributed online multi-class classification. Experiments on the Brodatz textures further substantiate the empirical validity of this approach.

agent, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1710.04062

Country: North America > United States (1.00)

Genre:

Instructional Material > Course Syllabus & Notes (0.48)
Research Report > New Finding (0.34)
Instructional Material > Online (0.34)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Machine Learning for Investors: A Primer -

@machinelearnbotOct-10-2017, 21:10:14 GMT

If you are out to describe the truth, leave elegance to the tailor. Machine learning is everywhere now, from self-driving cars to Siri and Google Translate, to news recommendation systems and, of course, trading. In the investing world, machine learning is at an inflection point. What was bleeding edge is rapidly going mainstream. It's being incorporated into mainstream tools, news recommendation engines, sentiment analysis, stock screeners. And the software frameworks are increasingly commoditized, so you don't need to be a machine learning specialist to make your own models and predictions. If you're an old-school quant investor, you may have been trained in traditional statistics paradigms and want to see if machine learning can improve your models and predictions. If so, then this primer is for you! Even if you're not planning to build your own models, AI tools are proliferating, and investors who use them will want to know the concepts behind them. And machine learning is transforming society with huge investing implications, so investors should know basically how it works. In school, when we studied modeling and forecasting, we were probably studying statistical methods. Those methods were created by geniuses like Pascal, Gauss, and Bernoulli.

artificial intelligence, machine learning, natural language, (17 more...)

@machinelearnbot

Industry:

Education (0.66)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
(2 more...)

Add feedback

Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer

Isele, David, Rostami, Mohammad, Eaton, Eric

arXiv.org Machine LearningOct-10-2017

Knowledge transfer between tasks can improve the performance of learned models, but requires an accurate estimate of the inter-task relationships to identify the relevant knowledge to transfer. These inter-task relationships are typically estimated based on training data for each task, which is inefficient in lifelong learning settings where the goal is to learn each consecutive task rapidly from as little data as possible. To reduce this burden, we develop a lifelong learning method based on coupled dictionary learning that utilizes high-level task descriptions to model the inter-task relationships. We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of learning problems. Given only the descriptor for a new task, the lifelong learner is also able to accurately predict a model for the new task through zero-shot learning using the coupled dictionary, eliminating the need to gather training data before addressing the task.

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

1710.0385

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Education > Educational Setting (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
(2 more...)

Add feedback

LinXGBoost: Extension of XGBoost to Generalized Local Linear Models

de Vito, Laurent

arXiv.org Machine LearningOct-10-2017

XGBoost is often presented as the algorithm that wins every ML competition. Surprisingly, this is true even though predictions are piecewise constant. This might be justified in high dimensional input spaces, but when the number of features is low, a piecewise linear model is likely to perform better. XGBoost was extended into LinXGBoost that stores at each leaf a linear model. This extension, equivalent to piecewise regularized least-squares, is particularly attractive for regression of functions that exhibits jumps or discontinuities. Those functions are notoriously hard to regress. Our extension is compared to the vanilla XGBoost and Random Forest in experiments on both synthetic and real-world data sets.

artificial intelligence, machine learning, xgboost, (18 more...)

arXiv.org Machine Learning

1710.03634

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback