Information Technology
Collective Mind: cleaning up the research and experimentation mess in computer engineering using crowdsourcing, big data and machine learning
Software and hardware co-design and optimization of HPC systems have become intolerably complex, ad hoc, time-consuming and error-prone due to the enormous number of available design and optimization choices, the complex interactions between all software and hardware components, and the multiple strict requirements placed on performance, power consumption, size, reliability and cost. We present a novel, long-term, holistic and practical solution to this problem based on the customizable, plugin-based, schema-free, heterogeneous, open-source Collective Mind repository and infrastructure with unified web interfaces and an online advice system. This collaborative framework distributes the analysis and multi-objective offline and online auto-tuning of computer systems among many participants while utilizing any available smartphone, tablet, laptop, cluster or data center, and continuously observing, classifying and modeling their realistic behavior. Any unexpected behavior is analyzed using shared data mining and predictive modeling plugins, or exposed to the community at cTuning.org for collaborative explanation, top-down complexity reduction, incremental problem decomposition and detection of correlating program, architecture or run-time properties (features). Gradually increasing optimization knowledge helps to continuously improve the optimization heuristics of any compiler, predict optimizations for new programs, or suggest efficient run-time (online) tuning and adaptation strategies depending on end-user requirements. We have decided to share all our past research artifacts, including hundreds of codelets, numerical applications, data sets, models, universal experimental analysis and auto-tuning pipelines, a self-tuning machine-learning-based meta-compiler, and unified statistical analysis and machine learning plugins, in a public repository to initiate systematic, reproducible and collaborative research, development and experimentation with a new publication model in which experiments and techniques are validated, ranked and improved by the community.
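To make the auto-tuning idea concrete, here is a minimal sketch of the kind of experiment pipeline the abstract describes: randomly sample compiler flag combinations, measure run time, and keep the fastest configuration. The flag list, the `benchmark.c` source file, and the use of gcc are illustrative assumptions, not part of the Collective Mind framework itself.

```python
# Minimal random-search auto-tuning loop (a sketch, not Collective Mind).
# Assumes gcc is installed and a benchmark.c exists in the working directory.
import random
import subprocess
import time

FLAGS = ["-O2", "-O3", "-funroll-loops", "-ftree-vectorize", "-fomit-frame-pointer"]

def build_and_run(flags):
    """Compile the benchmark with the given flags and time one run."""
    subprocess.run(["gcc", *flags, "benchmark.c", "-o", "bench"], check=True)
    start = time.perf_counter()
    subprocess.run(["./bench"], check=True)
    return time.perf_counter() - start

best = (float("inf"), None)
for _ in range(20):                        # random search over the flag space
    flags = [f for f in FLAGS if random.random() < 0.5]
    elapsed = build_and_run(flags)
    if elapsed < best[0]:
        best = (elapsed, flags)
print("best time %.3fs with flags %s" % best)
```

A real framework would replace the random sampler with shared predictive models and distribute these measurements across many participating machines, as the abstract describes.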
Near-Optimal Algorithms for Differentially-Private Principal Components
Chaudhuri, Kamalika, Sarwate, Anand D., Sinha, Kaushik
Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data in high dimension. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method.
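As a rough illustration of the general flavor of differentially-private PCA (not the authors' proposed algorithm), the following sketch perturbs the sample covariance matrix with symmetric Gaussian noise before the eigendecomposition; the noise scale `sigma` is an illustrative placeholder rather than a calibrated privacy parameter.

```python
# Noisy-covariance PCA sketch: perturb A = X^T X / n, then take the
# top-k eigenvectors. sigma is NOT calibrated to any (eps, delta) here.
import numpy as np

def private_pca(X, k, sigma):
    n, d = X.shape
    A = X.T @ X / n                      # sample covariance (unit-norm rows assumed)
    noise = np.random.normal(0, sigma, (d, d))
    noise = (noise + noise.T) / 2        # keep the perturbed matrix symmetric
    vals, vecs = np.linalg.eigh(A + noise)
    return vecs[:, np.argsort(vals)[::-1][:k]]   # top-k eigenvectors

X = np.random.randn(500, 10)
X /= np.linalg.norm(X, axis=1, keepdims=True)    # normalize rows to unit norm
U = private_pca(X, k=2, sigma=0.05)
print(U.shape)                                    # (10, 2)
```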
POMDPs Make Better Hackers: Accounting for Uncertainty in Penetration Testing
Sarraute, Carlos, Buffet, Olivier, Hoffmann, Joerg
Penetration testing is a methodology for assessing network security by generating and executing possible hacking attacks. Doing so automatically allows for regular and systematic testing. A key question is how to generate the attacks. This is naturally formulated as planning under uncertainty, i.e., under incomplete knowledge about the network configuration. Previous work uses classical planning and requires a costly pre-process that reduces this uncertainty through extensive application of scanning methods. By contrast, we herein model the attack planning problem in terms of partially observable Markov decision processes (POMDPs). This allows us to reason about the knowledge available and to intelligently employ scanning actions as part of the attack. As one would expect, this accurate solution does not scale. We devise a method that relies on POMDPs to find good attacks on individual machines, which are then composed into an attack on the network as a whole. This decomposition exploits network structure to the extent possible, making targeted approximations (only) where needed. Evaluating this method on a suitably adapted industrial test suite, we demonstrate its effectiveness in both runtime and solution quality.
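The belief reasoning that distinguishes the POMDP formulation from classical planning can be illustrated with a toy Bayes update: a scan action returns a noisy observation about a host, and the attacker's belief about its vulnerability is updated accordingly. The sensor probabilities below are illustrative assumptions.

```python
# Toy POMDP-style belief update for a scan action (illustrative numbers).
def update_belief(belief_vuln, observation, true_pos=0.9, false_pos=0.1):
    """Posterior P(vulnerable | scan observation) via Bayes' rule."""
    if observation == "open":
        likelihood_vuln, likelihood_safe = true_pos, false_pos
    else:
        likelihood_vuln, likelihood_safe = 1 - true_pos, 1 - false_pos
    joint_vuln = likelihood_vuln * belief_vuln
    joint_safe = likelihood_safe * (1 - belief_vuln)
    return joint_vuln / (joint_vuln + joint_safe)

b = 0.3                                   # prior belief the host is vulnerable
for obs in ["open", "open"]:              # two consistent scan results
    b = update_belief(b, obs)
    print(f"belief after '{obs}' observation: {b:.3f}")
```

A POMDP policy weighs the cost of such scans against the value of the sharpened belief, which is exactly what lets it employ scanning intelligently as part of the attack.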
Learning an Integrated Distance Metric for Comparing Structure of Complex Networks
Aliakbary, Sadegh, Motallebi, Sadegh, Habibi, Jafar, Movaghar, Ali
Graph comparison plays a major role in many network applications. We often need a similarity metric for comparing networks according to their structural properties. Various network features - such as degree distribution and clustering coefficient - provide measurements for comparing networks from different points of view, but a global and integrated distance metric is still missing. In this paper, we employ distance metric learning algorithms to construct an integrated distance metric for comparing the structural properties of complex networks. The distance metric is learned from a dataset of labeled real networks, using natural witnesses of network similarity (such as network categories) as supervision. To evaluate our proposed method, called NetDistance, we applied it as the distance metric in K-nearest-neighbors classification. Empirical results show that NetDistance outperforms previous methods by at least 20 percent with respect to precision.
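A minimal sketch of the overall pipeline, with networkx/scikit-learn stand-ins rather than NetDistance itself: represent each network by a vector of structural features, weight the features to form a crude learned diagonal metric, and classify with K-nearest-neighbors. The feature set and the inverse-standard-deviation weighting are illustrative assumptions.

```python
# Feature-vector network comparison with a toy learned diagonal metric.
import networkx as nx
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def features(G):
    degs = [d for _, d in G.degree()]
    return np.array([np.mean(degs), np.std(degs),
                     nx.average_clustering(G), nx.density(G)])

# Toy corpus: two structural categories, labeled by their generators.
graphs = [nx.erdos_renyi_graph(100, 0.05) for _ in range(10)] + \
         [nx.barabasi_albert_graph(100, 3) for _ in range(10)]
labels = [0] * 10 + [1] * 10
X = np.array([features(G) for G in graphs])

# Scale each feature by its inverse std: a crude stand-in for metric learning.
w = 1.0 / (X.std(axis=0) + 1e-9)
knn = KNeighborsClassifier(n_neighbors=3).fit(X * w, labels)
print(knn.predict([features(nx.barabasi_albert_graph(100, 3)) * w]))  # -> [1]
```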
Sensitivity of Diffusion Dynamics to Network Uncertainty
Adiga, Abhijin (Virginia Tech) | Kuhlman, Chris (Virginia Tech) | Mortveit, Henning (Virginia Tech) | Vullikanti, Anil Kumar S (Virginia Tech)
Simple diffusion processes on networks have been used to model, analyze and predict diverse phenomena such as the spread of diseases, information and memes. More often than not, the underlying network data is noisy and sampled. This prompts the following natural question: how sensitive are the diffusion dynamics and subsequent conclusions to uncertainty in the network structure? In this paper, we consider two popular diffusion models: the Independent Cascades (IC) model and the Linear Threshold (LT) model. We study how the expected number of vertices that are influenced/infected, given some initial conditions, is affected by network perturbation. By rigorous analysis under the assumption of a reasonable perturbation model, we establish the following main results. (1) For the IC model, we characterize the susceptibility to network perturbation in terms of the critical probability for phase transition of the network. We find that the expected number of infections is quite stable unless the transmission probability is close to the critical probability. (2) We show that the standard LT model with uniform edge weights is relatively stable under network perturbations. (3) Empirically, the transient behavior, i.e., the time series of the number of infections, in both models appears to be more sensitive to network perturbations. We also study these questions using extensive simulations on diverse real-world networks, and find that our theoretical predictions for both models match the empirical observations quite closely.
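The empirical side of this question can be sketched as follows: estimate the expected number of infections under the IC model by Monte Carlo simulation, then repeat on a perturbed copy of the network. The graph generator, seed set, transmission probability, and degree-preserving edge-swap perturbation below are illustrative choices, not the paper's exact setup.

```python
# Monte Carlo estimate of expected IC-model spread, before and after
# a random degree-preserving perturbation of the network.
import random
import networkx as nx

def ic_spread(G, seeds, p, trials=200):
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in G.neighbors(u):
                if v not in active and random.random() < p:  # edge fires w.p. p
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / trials

G = nx.erdos_renyi_graph(500, 0.01)
H = nx.double_edge_swap(G.copy(), nswap=50, max_tries=1000)  # perturbed copy
seeds = list(G.nodes)[:5]
print("original: ", ic_spread(G, seeds, p=0.05))
print("perturbed:", ic_spread(H, seeds, p=0.05))
```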
Locate the Hate: Detecting Tweets against Blacks
Kwok, Irene (Wellesley College) | Wang, Yuzhou (Wellesley College)
Although the social medium Twitter grants users freedom of speech, its instantaneous nature and retweeting features also amplify hate speech. Because Twitter has a sizeable black constituency, racist tweets against blacks are especially detrimental in the Twitter community, though this effect may not be obvious against a backdrop of half a billion tweets a day. We apply a supervised machine learning approach, employing inexpensively acquired labeled data from diverse Twitter accounts to learn a binary classifier for the labels "racist" and "nonracist." The classifier has a 76% average accuracy on individual tweets, suggesting that with further improvements, our work can contribute data on the sources of anti-black hate speech.
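A minimal sketch of this kind of supervised setup (not the authors' exact features or model): bag-of-words n-gram features extracted from labeled tweets feed a binary classifier. The tiny inline dataset is a placeholder for the paper's labeled corpus.

```python
# Binary tweet classifier sketch: n-gram counts + logistic regression.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["example racist tweet text", "ordinary tweet about sports",
          "another hateful message", "a friendly greeting"]
labels = [1, 0, 1, 0]                     # 1 = racist, 0 = nonracist

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(tweets, labels)
print(clf.predict(["a new unseen tweet"]))
```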
From Interest to Function: Location Estimation in Social Media
Chen, Yan (Beihang University) | Zhao, Jichang (Beihang University) | Hu, Xia (Arizona State University) | Zhang, Xiaoming (Beihang University) | Li, Zhoujun (Beihang University) | Chua, Tat-Seng (National University of Singapore)
Recent years have witnessed the tremendous development of social media, which attracts a vast number of Internet users. The high-dimensional content generated by these users provides a unique opportunity to understand their behavior in depth. As one of the most fundamental topics, location estimation has attracted more and more research effort. In contrast to the previous literature, we find that a user's location is strongly related to user interest. Based on this observation, we first build a detection model to mine user interest from short text. We then establish the mapping between location function and user interest before presenting an efficient framework to predict the user's location with convincing fidelity. Thorough evaluations and comparisons on an authentic data set show that our proposed model significantly outperforms state-of-the-art approaches. Moreover, the high efficiency of our model also guarantees its applicability in real-world scenarios.
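The two-stage idea can be sketched as follows: infer an interest distribution from a user's short texts, then pick the location whose function profile is closest to that distribution. The interest categories, keyword lists, and location profiles below are invented placeholders, not the paper's model.

```python
# Interest-to-location sketch: keyword-based interest vector, then
# nearest "location function" profile. All profiles are made up.
import numpy as np

INTERESTS = ["food", "tech", "sports"]
KEYWORDS = {"food": {"restaurant", "coffee"},
            "tech": {"startup", "code"},
            "sports": {"match", "gym"}}
# Hypothetical location-function profiles over the same interest axes.
LOCATIONS = {"downtown":         np.array([0.5, 0.4, 0.1]),
             "stadium_district": np.array([0.2, 0.1, 0.7])}

def interest_vector(posts):
    counts = np.array([sum(w in KEYWORDS[i] for p in posts for w in p.split())
                       for i in INTERESTS], dtype=float)
    return counts / counts.sum() if counts.sum() else counts

def predict_location(posts):
    v = interest_vector(posts)
    return min(LOCATIONS, key=lambda loc: np.linalg.norm(LOCATIONS[loc] - v))

print(predict_location(["great match at the gym", "match tonight"]))
# -> stadium_district
```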
Search More, Disclose Less
Hajaj, Chen (Bar-Ilan University) | Hazon, Noam (Bar-Ilan University) | Sarne, David (Bar-Ilan University) | Elmalech, Avshalom (Bar-Ilan University)
The blooming of comparison shopping agents (CSAs) in recent years enables buyers in today's markets to query more than a single CSA while shopping, thus substantially expanding the list of sellers whose prices they obtain. From the individual CSA's point of view, however, multi-CSA querying is decidedly unfavorable, as most of today's CSAs' revenue depends on payments they receive from sellers upon transferring buyers to their websites (and the buyers making a purchase). The most straightforward way for a CSA to improve its competitiveness is to spend more resources on obtaining more sellers' prices, potentially resulting in a more attractive "best price". In this paper we suggest a complementary approach that improves the attractiveness of the best price returned to the buyer without having to extend the CSA's price database. This approach, which we term "selective price disclosure", relies on removing some of the prices known to the CSA from the list of results returned to the buyer. The advantage of this approach lies in its ability to affect the buyer's beliefs regarding the probability of obtaining more attractive prices by querying additional CSAs. The paper presents two methods for choosing the subset of prices to be presented to a fully rational buyer, attempting to overcome the computational complexity associated with evaluating all possible subsets. The effectiveness and efficiency of the methods are demonstrated using real data collected from five CSAs for four products. Furthermore, since people are known to have inherently bounded rationality, the two methods are also evaluated with human buyers, demonstrating that selective price disclosure can be highly effective with people as well; however, the subset of prices to be used should then be extracted in a different (and simpler) manner.
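The computational difficulty the paper tackles is easy to see in the brute-force version: enumerate subsets of the known prices and keep one under which a model of the rational buyer no longer benefits from querying another CSA. The buyer model below (expected best price from one more query versus a fixed query cost) is a deliberately simplistic assumption, not the paper's buyer model.

```python
# Brute-force selective price disclosure over a toy rational-buyer model.
from itertools import combinations
import statistics

prices = [99, 103, 110, 120]          # prices known to this CSA
QUERY_COST = 2.0

def buyer_queries_again(disclosed):
    """Toy rule: query another CSA if its expected price, plus the query
    cost, still beats the best price disclosed so far."""
    expected_new = statistics.mean(prices)    # crude market expectation
    return expected_new + QUERY_COST < min(disclosed)

best_subset = None
for r in range(1, len(prices) + 1):           # O(2^n) subsets in general
    for subset in combinations(prices, r):
        if not buyer_queries_again(subset):
            best_subset = subset
            break
    if best_subset:
        break
print("disclose:", best_subset)
```

The exponential number of subsets is exactly why the paper's two methods avoid evaluating them all.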
Hypothesis Exploration for Malware Detection Using Planning
Sohrabi, Shirin (IBM T. J. Watson Research Center) | Udrea, Octavian (IBM T. J. Watson Research Center) | Riabov, Anton (IBM T. J. Watson Research Center)
In this paper we apply AI planning to address the hypothesis exploration problem and provide assistance to network administrators in detecting malware based on unreliable observations derived from network traffic. Building on the already established characterization and use of AI planning for similar problems, we propose a formulation of the hypothesis generation problem for malware detection as an AI planning problem with temporally extended goals and action costs. Furthermore, we propose a notion of hypothesis "plausibility" under unreliable observations, which we model as plan quality. We then show that in the presence of unreliable observations, simply finding the single most "plausible" hypothesis, although challenging, is not sufficient for effective malware detection. To that end, we propose a method for applying a state-of-the-art planner within a principled exploration process to generate multiple distinct high-quality plans. We experimentally evaluate this approach by generating random problems of varying hardness with respect to both the number of observations and the degree of unreliability. Based on these experiments, we argue that our approach presents a significant improvement over prior work focused on finding a single optimal plan, and that our hypothesis exploration application can motivate the development of new planners capable of generating the top high-quality plans.
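The "multiple distinct high-quality plans" idea can be illustrated with a toy hypothesis-ranking sketch: each hypothesis pays an additive cost for every observation it must discard as noise (echoing plan quality), and the top-k hypotheses are kept rather than only the single cheapest. The observations, hypotheses, and costs are illustrative assumptions.

```python
# Rank hypotheses by an additive "discarded observation" cost and keep
# the top-k, rather than committing to the single most plausible one.
import heapq

observations = ["dns_burst", "odd_port", "beacon"]
hypotheses = {
    "malware_A": {"explains": {"dns_burst", "beacon"}},
    "malware_B": {"explains": {"odd_port"}},
    "benign":    {"explains": set()},
}
DISCARD_COST = 5          # penalty for treating an observation as noise

def cost(h):
    unexplained = [o for o in observations if o not in hypotheses[h]["explains"]]
    return DISCARD_COST * len(unexplained)

top_k = heapq.nsmallest(2, hypotheses, key=cost)   # multiple, not just one
print(top_k)              # -> ['malware_A', 'malware_B']
```

In the paper's setting the planner searches over observation explanations rather than a fixed table, but the ranking-by-cost principle is the same.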
The Automated Acquisition of Suggestions from Tweets
Dong, Li (Beihang University) | Wei, Furu (Microsoft Research Asia) | Duan, Yajuan (University of Science and Technology of China) | Liu, Xiaohua (Microsoft Research Asia) | Zhou, Ming (Microsoft Research Asia) | Xu, Ke (Beihang University)
This paper addresses the automatic detection and classification of users' suggestions in tweets. The short and informal nature of tweets, along with the imbalanced characteristics of suggestion tweets, makes the task extremely challenging. To this end, we develop a classification framework based on Factorization Machines, which are effective and efficient especially in classification tasks with sparse features. Moreover, we tackle the imbalance problem by introducing cost-sensitive learning techniques into Factorization Machines. Extensive experimental studies on a manually annotated real-life data set show that the proposed approach significantly improves over the baseline approach, yielding a precision of 71.06% and a recall of 67.86%. We also investigate why Factorization Machines perform better. Finally, we introduce the first manually annotated dataset for suggestion classification.
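For readers unfamiliar with Factorization Machines, here is a compact sketch of a second-order FM trained with a class-weighted (cost-sensitive) logistic loss via plain SGD; the toy data, hyperparameters, and class weights are illustrative assumptions, not the paper's configuration.

```python
# Second-order Factorization Machine with a class-weighted logistic loss.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 30, 5                              # samples, features, latent dim
X = (rng.random((n, d)) < 0.1).astype(float)      # sparse binary features
y = (X[:, 0] + X[:, 1] > 0).astype(float)         # rare positive class

w0, w, V = 0.0, np.zeros(d), 0.01 * rng.standard_normal((d, k))
CLASS_WEIGHT = {0: 1.0, 1: 5.0}                   # up-weight the rare class

def fm_predict(x):
    # O(d*k) pairwise term: 0.5 * sum_f ((V^T x)_f^2 - ((V^2)^T x^2)_f)
    xv = x @ V
    pair = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return w0 + w @ x + pair

lr = 0.05
for _ in range(30):                               # SGD on weighted log-loss
    for i in rng.permutation(n):
        x, t = X[i], y[i]
        p = 1.0 / (1.0 + np.exp(-fm_predict(x)))
        g = CLASS_WEIGHT[int(t)] * (p - t)        # cost-sensitive gradient scale
        xv = x @ V
        w0 -= lr * g
        w -= lr * g * x
        V -= lr * g * (np.outer(x, xv) - (x ** 2)[:, None] * V)

preds = np.array([1.0 / (1.0 + np.exp(-fm_predict(x))) > 0.5 for x in X])
print("train accuracy:", np.mean(preds == y))
```

The factorized pairwise term is what lets FMs estimate feature interactions even when individual feature pairs rarely co-occur, which is why they suit sparse tweet features; the per-class gradient scaling is one standard way to make the loss cost-sensitive.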