Getoor, Lise


Hinge-Loss Markov Random Fields and Probabilistic Soft Logic

arXiv.org Artificial Intelligence

A fundamental challenge in developing high-impact machine learning technologies is balancing the need to model rich, structured domains with the ability to scale to big data. Many important problem areas are both richly structured and large scale, from social and biological networks, to knowledge graphs and the Web, to images, video, and natural language. In this paper, we introduce two new formalisms for modeling structured data, and show that they can both capture rich structure and scale to big data. The first, hinge-loss Markov random fields (HL-MRFs), is a new kind of probabilistic graphical model that generalizes different approaches to convex inference. We unite three approaches from the randomized algorithms, probabilistic graphical models, and fuzzy logic communities, showing that all three lead to the same inference objective. We then define HL-MRFs by generalizing this unified objective. The second new formalism, probabilistic soft logic (PSL), is a probabilistic programming language that makes HL-MRFs easy to define using a syntax based on first-order logic. We introduce an algorithm for inferring most-probable variable assignments (MAP inference) that is much more scalable than general-purpose convex optimization methods, because it uses message passing to take advantage of sparse dependency structures. We then show how to learn the parameters of HL-MRFs. The learned HL-MRFs are as accurate as analogous discrete models, but much more scalable. Together, these algorithms enable HL-MRFs and PSL to model rich, structured data at scales not previously possible.


Capturing Planned Protests from Open Source Indicators

AI Magazine

Civil unrest events (protests, strikes, and “occupy” events) are common occurrences in both democracies and authoritarian regimes. The study of civil unrest is a key topic for political scientists as it helps capture an important mechanism by which citizenry express themselves. In countries where civil unrest is lawful, qualitative analysis has revealed that more than 75 percent of the protests are planned, organized, or announced in advance; therefore detecting references to future planned events in relevant news and social media is a direct way to develop a protest forecasting system. We report on a system for doing that in this article. It uses a combination of keyphrase learning to identify what to look for, probabilistic soft logic to reason about location occurrences in extracted results, and time normalization to resolve future time mentions. We illustrate the application of our system to 10 countries in Latin America: Argentina, Brazil, Chile, Colombia, Ecuador, El Salvador, Mexico, Paraguay, Uruguay, and Venezuela. Results demonstrate our successes in capturing significant societal unrest in these countries with an average lead time of 4.08 days. We also study the selective superiorities of news media versus social media (Twitter, Facebook) to identify relevant trade-offs.


Using Semantics and Statistics to Turn Data into Knowledge

AI Magazine

Many information extraction and knowledge base construction systems are addressing the challenge of deriving knowledge from text. A key problem in constructing these knowledge bases from sources like the web is overcoming the erroneous and incomplete information found in millions of candidate extractions. To solve this problem, we turn to semantics, using ontological constraints between candidate facts to eliminate errors. In this article, we represent the desired knowledge base as a knowledge graph and introduce the problem of knowledge graph identification, collectively resolving the entities, labels, and relations present in the knowledge graph. Knowledge graph identification requires reasoning jointly over millions of extractions simultaneously, posing a scalability challenge to many approaches. We use probabilistic soft logic (PSL), a recently introduced statistical relational learning framework, to implement an efficient solution to knowledge graph identification and present state-of-the-art results for knowledge graph construction while performing an order of magnitude faster than competing methods.


Active Surveying: A Probabilistic Approach for Identifying Key Opinion Leaders

AAAI Conferences

Opinion leaders play an important role in influencing people’s beliefs, actions and behaviors. Although a number of methods have been proposed for identifying influentials using secondary sources of information, the use of primary sources, such as surveys, is still favored in many domains. In this work we present a new surveying method which combines secondary data with partial knowledge from primary sources to guide the information gathering process. We apply our proposed active surveying method to the problem of identifying key opinion leaders in the medical field, and show how we are able to accurately identify the opinion leaders while minimizing the amount of primary data required, which results in significant cost reduction in data acquisition without sacrificing its integrity.


AI Theory and Practice: A Discussion on Hard Challenges and Opportunities Ahead

AI Magazine

The Microsoft Research Faculty Summit brought together eight experts in different areas of AI to share their thoughts about the key challenges ahead in theory and/or practice in the broad constellation of artificial intelligence. This article summarizes their conversation.


Collective Classification in Network Data

AI Magazine

Many real-world applications produce networked data such as the world-wide web (hypertext documents connected via hyperlinks), social networks (for example, people connected by friendship links), communication networks (computers connected via communication links) and biological networks (for example, protein interaction networks). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such networks. In this article, we provide a brief introduction to this area of research and how it has progressed during the past decade. We introduce four of the most widely used inference algorithms for classifying networked data and empirically compare them on both synthetic and real-world data.


AAAI 2000 Workshop Reports

AI Magazine

The AAAI-2000 Workshop Program was held Sunday and Monday, 30–31 July 2000 at the Hyatt Regency Austin and the Austin Convention Center in Austin, Texas. The 15 workshops held were (1) Agent-Oriented Information Systems, (2) Artificial Intelligence and Music, (3) Artificial Intelligence and Web Search, (4) Constraints and AI Planning, (5) Integration of AI and OR: Techniques for Combinatorial Optimization, (6) Intelligent Lessons Learned Systems, (7) Knowledge-Based Electronic Markets, (8) Learning from Imbalanced Data Sets, (9) Learning Statistical Models from Relational Data, (10) Leveraging Probability and Uncertainty in Computation, (11) Mobile Robotic Competition and Exhibition, (12) New Research Problems for Machine Learning, (13) Parallel and Distributed Search for Reasoning, (14) Representational Issues for Real-World Planning Systems, and (15) Spatial and Temporal Granularity.

