Optimal flow analysis, prediction and application

arXiv.org Artificial Intelligence

This thesis employs statistical learning techniques to analyze, predict, and solve the fixed charge network flow (FCNF) problem, which is commonly encountered in real-world network problems. The cost structure for flows in the FCNF involves both fixed and variable costs. The FCNF problem is modeled as a mixed binary linear program and can be solved with standard commercial solvers, which use branch-and-bound algorithms. The problem is important because of its wide range of applications and the difficulty of solving it. No efficient algorithm exists to solve this problem optimally, owing to the lack of tight bounds. To the best of our knowledge, this is the first work that employs statistical learning techniques to analyze the optimal flow of the FCNF problem. Most algorithms developed to solve the FCNF problem are based on the cost structure, relaxation, etc. We start from the network characteristics and explore the relationship between the properties of nodes, arcs, and networks and the optimal flow. This is a bidirectional approach, and the findings can be used to locate the features that affect the optimal flow most significantly, predict the optimal arcs, and provide information for solving the FCNF problem. In particular, we define 33 features based on the network characteristics, from which, using stepwise regression, we identify 26 statistically significant predictors for a logistic regression that predicts which arcs will have positive flow in the optimal solutions. The predictive model achieves 88% accuracy, and the area under the receiver operating characteristic curve is 0.95. Two applications are investigated. First, the predictive results can be used directly as a component critical index. The failure of arcs with a higher critical index results in a greater cost increase over the entire network.
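For readers unfamiliar with the problem, the mixed binary linear program mentioned above is commonly written as follows. This is the standard textbook form, not necessarily the exact formulation used in the thesis; the symbols (node set $N$, arc set $A$, variable cost $c_{ij}$, fixed cost $f_{ij}$, node supply $b_i$, arc capacity $M_{ij}$) are assumed notation.

```latex
\begin{aligned}
\min \quad & \sum_{(i,j)\in A} \left( c_{ij}\, x_{ij} + f_{ij}\, y_{ij} \right) \\
\text{s.t.} \quad & \sum_{j:(i,j)\in A} x_{ij} \;-\; \sum_{j:(j,i)\in A} x_{ji} = b_i, && \forall i \in N, \\
& 0 \le x_{ij} \le M_{ij}\, y_{ij}, && \forall (i,j)\in A, \\
& y_{ij} \in \{0,1\}, && \forall (i,j)\in A.
\end{aligned}
```

Here $x_{ij}$ is the flow on arc $(i,j)$ and $y_{ij}$ indicates whether the arc is opened. Under this reading, predicting "which arcs will have positive flow" amounts to predicting the values of $y_{ij}$ in an optimal solution.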


Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space

arXiv.org Machine Learning

Finding the diameter of a dataset in multidimensional Euclidean space is a well-established problem with well-known algorithms. However, most of the algorithms found in the literature do not scale well with the data dimension: their time complexity grows exponentially in most cases, which makes them impractical. Therefore, we implemented four simple greedy algorithms for approximating the diameter of a multidimensional dataset; these are based on minimum/maximum l2 norms, hill climbing search, Tabu search, and Beam search, respectively. The time complexity of the implemented algorithms is near-linear in both the data size and the number of dimensions. The results of the experiments (conducted on different machine learning datasets) demonstrate the efficiency of the implemented algorithms, which can therefore be recommended for finding the diameter in machine learning applications that need it.
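A minimal sketch of how a greedy diameter approximation of this kind might look is given below. It is not the authors' implementation: the function names, the choice of starting point (the point with the largest l2 norm), and the "jump to the farthest point" improvement step are assumptions made for illustration. Each pass costs time proportional to the number of points times the dimension.

```python
import math


def dist(p, q):
    """Euclidean (l2) distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def approx_diameter(points, max_iters=10):
    """Greedy lower bound on the diameter (hypothetical sketch).

    Start at the point with the largest l2 norm, then repeatedly jump to
    the point farthest from the current one while the distance improves.
    """
    origin = [0.0] * len(points[0])
    current = max(points, key=lambda p: dist(p, origin))
    best = 0.0
    for _ in range(max_iters):
        farthest = max(points, key=lambda p: dist(p, current))
        d = dist(current, farthest)
        if d <= best:  # no improvement: stop climbing
            break
        best, current = d, farthest
    return best


def exact_diameter(points):
    """Brute-force O(n^2) diameter, used only to check the approximation."""
    return max(
        (dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        default=0.0,
    )
```

Because the greedy value is always the distance between two actual data points, it can never exceed the true diameter; how close it gets depends on the data.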


New design of the Rubik's cube lets you battle other players online using Bluetooth

Daily Mail - Science & tech

One of the world's oldest and most popular toys is getting a face-lift. Israel-based startup Particula has unveiled its spin on the Rubik's cube, dubbed the GoCube, that can connect to your phone and enables users to play against other people. It marks a major step up from when the original Rubik's cube, developed by Hungarian sculptor Erno Rubik, was first released in 1974. Since then, over 350 million Rubik's cubes have been sold worldwide. The GoCube syncs up with smartphones and tablets using a Bluetooth connection, giving users access to the Battle feature, which lets them 'play friends (or enemies) across the world,' according to Particula.


Are data scientists the highest paid professionals?

#artificialintelligence

There is considerable hype around data science. Websites and social media are flooded with articles on Big Data, Data Science, and Data Analytics. These are portrayed as top fields, while data scientists are regarded as saviours of the world and hence are supposed to be the highest-paid professionals. Most articles describe how, using data science, companies have become super-intelligent in understanding their customers and hence are now able to sell products and services in a sophisticated manner. Marketing is now based on customers' differential purchasing interest and buying behaviour.


Zen and the art of data structures: From self-tuning to self-designing data systems ZDNet

#artificialintelligence

What if the huge design space for data-driven software could be efficiently mapped and explored in order to have tailor-made, optimized solutions? Researchers from Harvard combine analytical models, benchmarks, and machine learning to make this possible. Stratos Idreos is an Assistant Professor at Harvard, and leads the Data Systems Laboratory (DASlab).


Predicting Solution Summaries to Integer Linear Programs under Imperfect Information with Machine Learning

arXiv.org Machine Learning

The paper provides a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict solution summaries (i.e., solution descriptions at a given level of detail) to discrete stochastic optimization problems. We approximate the solutions based on supervised learning, and the training dataset consists of a large number of deterministic problems that have been solved independently and offline. Uncertainty regarding a missing subset of the inputs is addressed through sampling and aggregation methods. Our motivating application concerns booking decisions of intermodal containers on double-stack trains. Under perfect information, this is the so-called load planning problem, and it can be formulated by means of integer linear programming. However, the formulation cannot be used for the application at hand because of the restricted computational budget and unknown container weights. The results show that standard deep learning algorithms allow one to predict descriptions of solutions with high accuracy in a very short time (milliseconds or less).


A Fuzzy-Rough based Binary Shuffled Frog Leaping Algorithm for Feature Selection

arXiv.org Artificial Intelligence

Feature selection and attribute reduction are crucial problems and widely used techniques in machine learning, data mining, and pattern recognition for overcoming the well-known curse of dimensionality, either by selecting a subset of features or by removing unrelated ones. This paper presents a new feature selection method that efficiently carries out attribute reduction, thereby selecting the most informative features of a dataset. It consists of two components: 1) a measure for feature subset evaluation, and 2) a search strategy. For the evaluation measure, we have employed the fuzzy-rough dependency degree (FRDD) in the lower approximation-based fuzzy-rough feature selection (L-FRFS) due to its effectiveness in feature selection. As for the search strategy, a new version of a binary shuffled frog leaping algorithm (B-SFLA) is proposed. The new feature selection method is obtained by hybridizing the B-SFLA with the FRDD. Non-parametric statistical tests are conducted to compare the proposed approach with several existing methods over twenty-two datasets, including nine high-dimensional and large ones, from the UCI repository. The experimental results demonstrate that the B-SFLA approach significantly outperforms other metaheuristic methods in terms of the number of selected features and the classification accuracy.


Composable Core-sets for Determinant Maximization Problems via Spectral Spanners

arXiv.org Machine Learning

We study a generalization of classical combinatorial graph spanners to the spectral setting. Given a set of vectors $V\subseteq \mathbb{R}^d$, we say a set $U\subseteq V$ is an $\alpha$-spectral spanner if for all $v\in V$ there is a probability distribution $\mu_v$ supported on $U$ such that $$vv^\intercal \preceq \alpha\cdot\mathbb{E}_{u\sim\mu_v} uu^\intercal.$$ We show that any set $V$ has an $\tilde{O}(d)$-spectral spanner of size $\tilde{O}(d)$ and that this bound is almost optimal in the worst case. We use spectral spanners to study composable core-sets for spectral problems. We show that for many objective functions one can use a spectral spanner, independent of the underlying functions, as a core-set and obtain almost optimal composable core-sets. For example, for the determinant maximization problem we obtain an $\tilde{O}(k)^k$-composable core-set, and we show that this is almost optimal in the worst case. Our algorithm is a spectral analogue of the classical greedy algorithm for finding (combinatorial) spanners in graphs. We expect that our spanners will find many other applications in distributed or parallel models of computation. Our proof is spectral. As a side result of our techniques, we show that the rank of diagonally dominant lower-triangular matrices is robust under "small perturbations", which could be of independent interest.


While Tuning is Good, No Tuner is Best

arXiv.org Artificial Intelligence

Hyperparameter tuning is the black art of automatically finding a good combination of control parameters for a data miner. While widely applied in Software Engineering, there has not been much discussion on which hyperparameter tuner is best for software analytics. To address this gap in the literature, this paper applied a range of hyperparameter optimizers (grid search, differential evolution, random search, SMAC) to defect prediction. No hyperparameter optimizer was observed to be "best" and, for one of the two evaluation measures studied here (F-measure), hyperparameter optimization was, in 50% of cases, no better than using default configurations. We conclude that hyperparameter optimization is more nuanced than previously believed. While such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be endorsed.
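To make the comparison concrete, the sketch below shows random search, one of the optimizers named above, set against a default configuration. Everything in it is hypothetical: `score` is a toy stand-in for a validation metric (not a defect predictor), and the parameter names `depth` and `rate`, the search space, and the default values are assumptions chosen only for illustration.

```python
import random


def score(params):
    """Toy stand-in for a validation metric (hypothetical).

    Peaks at depth=6, rate=0.1; a real study would train and evaluate
    a classifier here instead.
    """
    depth, rate = params["depth"], params["rate"]
    return 1.0 - 0.01 * (depth - 6) ** 2 - 4.0 * (rate - 0.1) ** 2


def random_search(score_fn, space, budget, seed=0):
    """Sample `budget` random configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(budget):
        candidate = {
            "depth": rng.randint(*space["depth"]),   # inclusive integer range
            "rate": rng.uniform(*space["rate"]),     # continuous range
        }
        s = score_fn(candidate)
        if s > best_score:
            best_params, best_score = candidate, s
    return best_params, best_score


# Hypothetical default configuration and search space.
DEFAULT = {"depth": 3, "rate": 0.3}
SPACE = {"depth": (1, 12), "rate": (0.01, 0.5)}

if __name__ == "__main__":
    tuned_params, tuned_score = random_search(score, SPACE, budget=30)
    print("default score:", score(DEFAULT))
    print("tuned score:  ", tuned_score, "with", tuned_params)
```

Whether the tuned configuration beats the default depends on the budget, the seed, and the metric, which is the kind of sensitivity the abstract describes.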


ScottyActivity: Mixed Discrete-Continuous Planning with Convex Optimization

Journal of Artificial Intelligence Research

The state-of-the-art practice in robotics planning is to script behaviors manually, where each behavior is typically generated using trajectory optimization. However, in order for robots to be able to act robustly and adapt to novel situations, they need to plan these activity sequences autonomously. Since the conditions and effects of these behaviors are tightly coupled through time, state, and control variables, many problems require that the tasks of activity planning and trajectory optimization be considered together. There are two key issues underlying effective hybrid activity and trajectory planning: the sufficiently accurate modeling of robot dynamics and the capability of planning over long horizons. Hybrid activity and trajectory planners that employ mixed integer programming within a discrete time formulation are able to accurately model complex dynamics for robot vehicles, but are often restricted to relatively short horizons. On the other hand, current hybrid activity planners that employ continuous time formulations can handle longer horizons, but they only allow actions to have continuous effects with a constant rate of change, and they restrict the allowed state constraints to linear inequalities. This is insufficient for many robotic applications and greatly limits the expressivity of the problems that these approaches can solve. In this work we present the ScottyActivity planner, which is able to generate practical hybrid activity and motion plans over long horizons by employing recent methods in convex optimization combined with methods for planning with relaxed plan graphs and heuristic forward search. Unlike other continuous time planners, ScottyActivity can solve a broad class of robotic planning problems by supporting convex quadratic constraints on state variables and control variables that are jointly constrained and that affect multiple state variables simultaneously. In order to support planning over long horizons, ScottyActivity does not resort to time, state, or control variable discretization. While straightforward formulations of consistency checks are not convex and do not scale, we present an efficient convex formulation, in the form of a Second Order Cone Program (SOCP), that is very fast to solve. We also introduce several new realistic domains that demonstrate the capabilities and scalability of our approach, along with simplified linear versions that we use to compare against other state-of-the-art planners. This work demonstrates the power of integrating advanced convex optimization techniques with discrete search methods and paves the way for extensions dealing with non-convex disjoint constraints, such as obstacle avoidance.