cedar
Explainable Distributed Constraint Optimization Problems
Rachmut, Ben, Vasileiou, Stylianos Loukas, Weinstein, Nimrod Meir, Zivan, Roie, Yeoh, William
The Distributed Constraint Optimization Problem (DCOP) formulation is a powerful tool to model cooperative multi-agent problems that need to be solved distributively. A core assumption of existing approaches is that DCOP solutions can be easily understood, accepted, and adopted, which may not hold, as evidenced by the large body of literature on Explainable AI. In this paper, we propose the Explainable DCOP (X-DCOP) model, which extends a DCOP to include its solution and a contrastive query for that solution. We formally define key properties that contrastive explanations must satisfy to be considered valid solutions to X-DCOPs, and we present theoretical results on the existence of such valid explanations. To solve X-DCOPs, we propose a distributed framework as well as several optimizations and suboptimal variants to find valid explanations. We also include a human user study that showed that users, not surprisingly, prefer shorter explanations over longer ones. Our empirical evaluations showed that our approach can scale to large problems, and that the different variants provide different options for trading longer explanations for shorter runtimes. Thus, our model and algorithmic contributions extend the state of the art by reducing the barrier for users to understand DCOP solutions, facilitating their adoption in more real-world applications.
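To make the idea of a contrastive query concrete, here is a minimal sketch, assuming a toy three-variable problem solved centrally by brute force. It is not the paper's distributed framework, and it does not implement the paper's formal validity properties. The variables, cost functions, and helper names (`total_cost`, `best_assignment`, `contrast`) are all hypothetical. The query asks "why x2 = b rather than x2 = a?", and the answer lists the constraints whose cost differs between the solution and the best alternative consistent with the query.

```python
from itertools import product

# Toy problem (hypothetical): three variables, each taking value "a" or "b".
DOMAIN = ("a", "b")
VARIABLES = ("x1", "x2", "x3")

# Each constraint is (scope, cost function). Lower total cost is better.
CONSTRAINTS = [
    (("x1", "x2"), lambda u, v: 3 if u == v else 0),
    (("x2", "x3"), lambda u, v: 2 if u == v else 0),
    (("x1", "x3"), lambda u, v: 1 if u != v else 0),
    (("x2",), lambda u: 4 if u == "a" else 0),
]


def total_cost(assignment):
    """Sum the cost of every constraint under a complete assignment."""
    return sum(fn(*(assignment[v] for v in scope)) for scope, fn in CONSTRAINTS)


def best_assignment(fixed=None):
    """Brute-force the cheapest complete assignment, honoring any fixed values."""
    fixed = fixed or {}
    free = [v for v in VARIABLES if v not in fixed]
    candidates = (
        {**fixed, **dict(zip(free, values))}
        for values in product(DOMAIN, repeat=len(free))
    )
    return min(candidates, key=total_cost)


def contrast(solution, query):
    """List constraints whose cost differs between the solution and the
    best alternative that satisfies the contrastive query."""
    alternative = best_assignment(fixed=query)
    differing = []
    for scope, fn in CONSTRAINTS:
        cost_solution = fn(*(solution[v] for v in scope))
        cost_alternative = fn(*(alternative[v] for v in scope))
        if cost_solution != cost_alternative:
            differing.append((scope, cost_solution, cost_alternative))
    return alternative, differing


solution = best_assignment()
alternative, explanation = contrast(solution, {"x2": "a"})
print("solution:", solution, "cost", total_cost(solution))
print("alternative:", alternative, "cost", total_cost(alternative))
print("differing constraints:", explanation)
```

In this toy instance the explanation contains a single constraint, which matches the intuition in the abstract that shorter explanations are easier for users to digest.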
cedar: Composable and Optimized Machine Learning Input Data Pipelines
Zhao, Mark, Adamiak, Emanuel, Kozyrakis, Christos
The input data pipeline is an essential component of each machine learning (ML) training job. It is responsible for reading massive amounts of training data, processing batches of samples using complex transformations, and loading them onto training nodes at low latency and high throughput. Performant input data systems are becoming increasingly critical, driven by skyrocketing data volumes and training throughput demands. Unfortunately, current input data systems cannot fully leverage key performance optimizations, resulting in hugely inefficient infrastructures that require significant resources -- or worse -- underutilize expensive accelerators. To address these demands, we present cedar, a programming model and framework that allows users to easily build, optimize, and execute input data pipelines. cedar presents an easy-to-use programming interface, allowing users to define input data pipelines using composable operators that support arbitrary ML frameworks and libraries. Meanwhile, cedar transparently applies a complex and extensible set of optimization techniques (e.g., offloading, caching, prefetching, fusion, and reordering). It then orchestrates processing across a customizable set of local and distributed compute resources in order to maximize processing performance and efficiency, all without user input. On average across six diverse input data pipelines, cedar achieves a 2.49x, 1.87x, 2.18x, and 2.74x higher performance compared to tf.data, tf.data service, Ray Data, and PyTorch's DataLoader, respectively.
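As an illustration of what "composable operators" and "prefetching" can mean in an input data pipeline, here is a minimal sketch built only on Python generators and the standard library. It does not use or reproduce cedar's API. The operator names (`pipe_map`, `batch`, `prefetch`, `build_pipeline`) and the stand-in transform are assumptions made for this example.

```python
import queue
import threading


def pipe_map(fn, source):
    """Apply fn lazily to every sample from source."""
    for item in source:
        yield fn(item)


def batch(size, source):
    """Group samples into lists of at most `size` items."""
    buf = []
    for item in source:
        buf.append(item)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf  # final, possibly smaller, batch


def prefetch(source, depth=2):
    """Produce items on a background thread so up to `depth` items are
    ready before the consumer asks for them. Errors raised inside
    `source` are not forwarded in this simplified sketch."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def worker():
        for item in source:
            q.put(item)
        q.put(sentinel)  # signal end of stream

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item


def build_pipeline(samples):
    """Compose the operators: transform, then batch, then prefetch."""
    transformed = pipe_map(lambda x: x * 2, samples)  # stand-in transform
    batched = batch(3, transformed)
    return prefetch(batched)


if __name__ == "__main__":
    for b in build_pipeline(range(7)):
        print(b)
```

Because each operator only consumes and yields an iterator, operators can be reordered or swapped without touching the others, which is the property that makes automatic optimizations such as those listed in the abstract possible in principle.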
CEDAR: Communication Efficient Distributed Analysis for Regressions
Chang, Changgee, Bu, Zhiqi, Long, Qi
Electronic health records (EHRs) offer great promise for advancing precision medicine and, at the same time, present significant analytical challenges. In particular, patient-level data in EHRs often cannot be shared across institutions (data sources) due to government regulations and/or institutional policies. As a result, there is growing interest in distributed learning over multiple EHR databases without sharing patient-level data. To tackle such challenges, we propose a novel communication-efficient method that aggregates the local optimal estimates by turning the problem into a missing data problem. In addition, we propose incorporating posterior samples of remote sites, which can provide partial information on the missing quantities and improve the efficiency of parameter estimates while having the differential privacy property and thus reducing the risk of information leakage. The proposed approach, without sharing the raw patient-level data, allows for proper statistical inference and can accommodate sparse regressions. We provide a theoretical investigation of the asymptotic properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses in comparison with several recently developed methods.
About CEDAR
Welcome to the website of the Center of Excellence for Document Analysis and Recognition (CEDAR). Each of us encounters a wide variety of documents every day. They cover all spheres of our lives, including commerce, education, law, health, religion, music and entertainment. Some of these documents have a simple and predictable structure, such as a page in a printed book. Others have a much more complex structure, such as those involving figures, tables, logos, signatures, handwriting, etc. Discovering methods and algorithms for analyzing the structure and content of complex documents, and their generalization to related domains, is the focus of research at CEDAR.
UB's Srihari Wins Major International Computer Science Award - University at Buffalo
Sargur N. Srihari, director of the University at Buffalo's Center of Excellence in Document Analysis and Recognition (CEDAR) and SUNY Distinguished Professor of Computer Science and Engineering, has won the 2011 International Conference on Document Analysis and Recognition (ICDAR) Outstanding Achievements award. He is being honored with the award for his outstanding and continued contributions to research and education in handwriting recognition and document analysis, and for his service to the community. Srihari recently traveled to Beijing to accept the award and serve as a keynote speaker at the conference, held biennially by the International Association for Pattern Recognition. His speech, entitled "Probabilistic Graphical Models in Machine Learning," focused on the design of computer programs that learn and are able to modify their behavior in an environment of constantly changing information. Without machine learning, many computers that deal with rapidly changing data would require constant reprogramming.
Machine learning platform minimized Brexit fallout for investors
The U.K.'s Brexit vote was something few prognosticators saw coming prior to the June 23 referendum. But once the results were in, it was clear the vote to leave the European Union would have a major impact on financial markets. The pound sterling fell in value by 11% two days after the vote, and both the Dow Jones Industrial Average and the London Stock Exchange's FTSE 100 index lost more than 2% of their total value. This left millions of traders all over the world scrambling to find safer investment positions. But at least one group of investors was relatively calm, according to Omer Cedar, CEO of Omega Point Research Inc., a New York-based software company that sells analytics tools to help investment managers review their portfolios for risks.