Generating Representative Headlines for News Stories

arXiv.org Artificial Intelligence

Millions of news articles are published online every day, which can be overwhelming for readers to follow. Grouping articles that report the same event into news stories is a common way of assisting readers in their news consumption. However, it remains a challenging research problem to efficiently and effectively generate a representative headline for each story. Automatic summarization of a document set has been studied for decades, while few studies have focused on generating representative headlines for a set of articles. Unlike summaries, which aim to capture the most information with the least redundancy, headlines aim to capture, in a short span of text, the information jointly shared by the story's articles, and to exclude information that is too specific to any individual article. In this work, we study the problem of generating representative headlines for news stories. We develop a distant supervision approach to train large-scale generation models without any human annotation. This approach centers on two technical components. First, we propose a multi-level pre-training framework that incorporates a massive unlabeled corpus with different quality-vs.-quantity balances at different levels. We show that models trained within this framework outperform those trained on a purely human-curated corpus. Second, we propose a novel self-voting-based article attention layer to extract salient information shared by multiple articles. We show that models incorporating this layer are robust to potential noise in news stories and outperform existing baselines with or without such noise. We can further enhance our model by incorporating human labels, and we show that our distant supervision approach significantly reduces the demand for labeled data.
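To make the self-voting idea more concrete, here is a minimal sketch assuming a similarity-based voting scheme; the `self_voting_attention` helper and the voting rule are illustrative assumptions, not the paper's exact layer. Each article votes for its peers by similarity, so articles that agree with many others receive higher attention weight and off-topic outliers are down-weighted.

```python
# Hypothetical sketch of a self-voting article attention layer (not the paper's exact formulation).
import numpy as np

def self_voting_attention(article_embeddings: np.ndarray) -> np.ndarray:
    """article_embeddings: (n_articles, dim) -> a single story representation (dim,)."""
    # Cosine-normalize article representations.
    normed = article_embeddings / np.linalg.norm(article_embeddings, axis=1, keepdims=True)
    # Pairwise similarities; an article cannot vote for itself.
    sims = normed @ normed.T
    np.fill_diagonal(sims, 0.0)
    # Each article's vote score is the total similarity received from its peers.
    votes = sims.sum(axis=1)
    # Softmax the votes into attention weights; noisy outliers get low weight.
    weights = np.exp(votes - votes.max())
    weights /= weights.sum()
    # Story representation: attention-weighted sum of article embeddings.
    return weights @ article_embeddings

story_articles = np.random.randn(4, 128)             # four articles from one story
print(self_voting_attention(story_articles).shape)   # (128,)
```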


Chinese court rules AI-written article is protected by copyright

#artificialintelligence

A court in Shenzhen, China, has ruled that an article generated by artificial intelligence (AI) is protected by copyright, according to state news outlet China News Service, representing a notable milestone for AI's credentials as a creative force. For the past five years Chinese tech titan Tencent has published content produced by automated software called Dreamwriter, with a focus on business and financial stories. In 2018, an online platform operated by a company called Shanghai Yingxun Technology Company replicated an AI-generated financial report from Tencent on its own website. The article included a disclaimer that said it was "automatically written by Tencent Robot Dreamwriter"; however, the court found that the article's articulation and expression had a "certain originality" and met the legal requirements to be classed as a written work -- thus it qualified for copyright protection. While the defendant had already removed the article from its own website, it was still required to pay a fine of 1,500 yuan ($217).


On Understanding Knowledge Graph Representation

arXiv.org Machine Learning

Many methods have been developed to represent knowledge graph data, which implicitly exploit low-rank latent structure in the data to encode known information and enable unknown facts to be inferred. To predict whether a relationship holds between entities, their embeddings are typically compared in the latent space following a relation-specific mapping. Whilst link prediction has steadily improved, the latent structure, and hence why such models capture semantic information, remains unexplained. We build on a recent theoretical interpretation of word embeddings as a basis for considering an explicit structure for representations of relations between entities. For identifiable relation types, we are able to predict properties and justify the relative performance of leading knowledge graph representation methods, including their often overlooked ability to make independent predictions.
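As a concrete illustration of the relation-specific comparison described above, here is a small sketch using a TransE-style translation, one of the standard scoring functions such analyses cover rather than a method proposed here; the entity and relation names and dimensions are made up.

```python
# TransE-style scoring: a fact (head, relation, tail) is plausible when
# head + relation lands close to tail in the latent space.
import numpy as np

def transe_score(head: np.ndarray, relation: np.ndarray, tail: np.ndarray) -> float:
    """Lower score means the triple is considered more likely to hold."""
    return float(np.linalg.norm(head + relation - tail))

dim = 50
rng = np.random.default_rng(0)
entities = {name: rng.normal(size=dim) for name in ("paris", "france", "berlin")}
relations = {"capital_of": rng.normal(size=dim)}

# Link prediction: rank candidate tails for (paris, capital_of, ?).
scores = {name: transe_score(entities["paris"], relations["capital_of"], emb)
          for name, emb in entities.items() if name != "paris"}
print(sorted(scores.items(), key=lambda kv: kv[1]))
```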


Topic Modeling with Wasserstein Autoencoders

arXiv.org Artificial Intelligence

We propose a novel neural topic model in the Wasserstein autoencoder (WAE) framework. Unlike existing variational autoencoder based models, we directly enforce a Dirichlet prior on the latent document-topic vectors. We exploit the structure of the latent space and apply a suitable kernel in minimizing the Maximum Mean Discrepancy (MMD) to perform distribution matching. We discover that MMD performs much better than the Generative Adversarial Network (GAN) in matching the high-dimensional Dirichlet distribution. We further discover that incorporating randomness in the encoder output during training leads to significantly more coherent topics. To measure the diversity of the produced topics, we propose a simple topic uniqueness metric. Together with the widely used coherence measure NPMI, this offers a more holistic evaluation of topic quality. Experiments on several real datasets show that our model produces significantly better topics than existing topic models.
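For readers unfamiliar with MMD-based distribution matching, the following sketch shows the core penalty term: the discrepancy between encoded document-topic vectors and samples from the Dirichlet prior. An RBF kernel, the batch sizes, and the concentration value are assumptions for illustration; the choice of kernel on the simplex in the actual model may differ.

```python
# Sketch of MMD matching between encoded document-topic vectors and Dirichlet prior samples.
import torch

def rbf_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))

def mmd(encoded: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    """Biased MMD^2 estimate between encoder outputs and prior samples."""
    return (rbf_kernel(encoded, encoded).mean()
            + rbf_kernel(prior, prior).mean()
            - 2 * rbf_kernel(encoded, prior).mean())

batch, n_topics = 64, 20
doc_topic = torch.softmax(torch.randn(batch, n_topics), dim=-1)   # stand-in encoder output
prior = torch.distributions.Dirichlet(torch.full((n_topics,), 0.1)).sample((batch,))
loss = mmd(doc_topic, prior)   # added to the reconstruction loss during training
print(loss.item())
```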


Neural Consciousness Flow

arXiv.org Artificial Intelligence

The ability to reason beyond data fitting is essential if deep learning systems are to take a leap forward towards artificial general intelligence. Much effort has been made to model neural-based reasoning as an iterative decision-making process based on recurrent networks and reinforcement learning. Instead, inspired by the consciousness prior proposed by Yoshua Bengio, we explore reasoning with the notion of attentive awareness from a cognitive perspective and formulate it as attentive message passing on graphs, called neural consciousness flow (NeuCFlow). Aiming to bridge the gap between deep learning systems and reasoning, we propose an attentive computation framework with a three-layer architecture, consisting of an unconsciousness flow layer, a consciousness flow layer, and an attention flow layer. We implement the NeuCFlow model with graph neural networks (GNNs) and conditional transition matrices. Our attentive computation greatly reduces the complexity of vanilla GNN-based methods, making it capable of running on large-scale graphs. We validate our model for knowledge graph reasoning by solving a series of knowledge base completion (KBC) tasks. The experimental results show that NeuCFlow significantly outperforms previous state-of-the-art KBC methods, including embedding-based and path-based approaches. Reproducible code can be found at the link below.
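To make the notion of attentive message passing concrete, here is a generic single-step sketch: an attention-weighted GNN update in plain NumPy, not the actual NeuCFlow architecture. The weight matrices, graph, and update rule are illustrative assumptions.

```python
# One step of attention-weighted message passing over a small directed graph.
import numpy as np

def attentive_message_passing(h, edges, w_att, w_msg):
    """h: (n_nodes, dim) node states; edges: iterable of (src, dst) pairs; returns new states."""
    n, dim = h.shape
    new_h = h.copy()
    for dst in range(n):
        srcs = [s for s, d in edges if d == dst]
        if not srcs:
            continue
        # Attention logits from source/destination compatibility.
        logits = np.array([h[s] @ w_att @ h[dst] for s in srcs])
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        # Aggregate transformed neighbour messages, weighted by attention.
        messages = np.stack([w_msg @ h[s] for s in srcs])
        new_h[dst] = np.tanh(h[dst] + weights @ messages)
    return new_h

dim = 8
h = np.random.randn(4, dim)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
w_att, w_msg = np.random.randn(dim, dim), np.random.randn(dim, dim)
print(attentive_message_passing(h, edges, w_att, w_msg).shape)  # (4, 8)
```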


Feature and TV films

Los Angeles Times

Mr. Smith Goes to Washington (1939) TCM Tue. 7 p.m.
Mean Streets (1973) Cinemax Sun. 6 a.m.
Batman Begins (2005) AMC Sun.
Throw Momma From the Train (1987) EPIX Sun.
Die Hard (1988) IFC Sun.
I Know What You Did Last Summer (1997) Starz Tue.
Gone in 60 Seconds (2000) CMT Wed. 8 p.m., Thur.
Total Recall (1990) Encore Thur. 2 a.m.
A Fish Called Wanda (1988) Encore Thur. 2 p.m., 9 p.m.
The World Is Not Enough (1999) EPIX Sat. 4 p.m.
Look Who's Talking (1989) OVA Sun.
Die Hard With a Vengeance (1995) IFC Thur.

Oil-platform workers, including an estranged couple, and a Navy SEAL make a startling deep-sea discovery. A clueless politician falls in love with a waitress whose erratic behavior is caused by a nail stuck in her head. After glimpsing his future, an ambitious politician battles the agents of Fate itself to be with the woman he loves. To help a friend, a suburban baby sitter drives into downtown Chicago with her two charges and a neighbor. Two teenage baby sitters and a group of children spend a wild night ...


Prediction by the Numbers -- NOVA PBS

@machinelearnbot

NARRATOR: The future unfolds before our eyes, but is it always beyond our grasp? What was once the province of the gods has now come more clearly into view, through mathematics and data. Out of some early observations about gambling arose tools that guide our scientific understanding of the world and more, through the power of prediction. BOATSWAIN'S MATE 1 LUKE SCHAFFER (United States Coast Guard): Keep a good lookout. NARRATOR: …every day mathematics and data combine to help us envision what might be. LIBERTY VITTERT (University of Glasgow): It's the best crystal ball that humankind can have. NARRATOR: Take a trip on the wings of probability, into the future. MONA CHALABI (The Guardian, United States Edition): We are thinking about luck or misfortune, but they just, basically, are a question of math, right? The Orange County Fair, held in Southern California: in theory, these crowds hold a predictive power that can have startling accuracy, but it doesn't belong to any individual, only the group. And even then, it has to be viewed through the lens of mathematics. The theory is known as the "wisdom of crowds," a phenomenon first documented about a hundred years ago. Statistician Talithia Williams is here to see if the theory checks out and to spend some time with the Fair's most beloved animal, Patches, a 14-year-old ox. TALITHIA WILLIAMS (Harvey Mudd College): It was a fair, kind of like this one, where, in 1906, Sir Francis Galton came across a contest where you had to guess the weight of an ox, like Patches, you see here behind me. NARRATOR: After the ox weight-guessing contest was over, Galton took all the entries home and analyzed them statistically. To his surprise, while none of the individual guesses were correct, the average of all the guesses was off by less than one percent. But is it still true? TALITHIA WILLIAMS: So, here's how I think we can test that today. What if we ask a random sample of people here at the fair if they can guess how many jellybeans they think are in the jar, and then we take those numbers, average them, and see if that's actually close to the true number of jellybeans?
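As a back-of-the-envelope illustration of the jellybean test described in the transcript above, the sketch below averages a crowd's guesses and reports the percent error; all numbers are made up, not taken from the program.

```python
# Average a crowd's jellybean guesses and compare to the true count (hypothetical data).
true_count = 1200                                        # hypothetical jar count
guesses = [800, 950, 1400, 1100, 1650, 1230, 975, 1320]  # hypothetical crowd guesses

crowd_estimate = sum(guesses) / len(guesses)
error_pct = abs(crowd_estimate - true_count) / true_count * 100
print(f"crowd average: {crowd_estimate:.0f} (off by {error_pct:.1f}%)")
```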


5 important stories that have (almost) nothing to do with politics

PBS NewsHour

Atlanta Braves coaches and players wearing the No. 42 in honor of Jackie Robinson stand during the national anthem before a game against the San Diego Padres at SunTrust Park. If you ask the media, who took the informal marker of the presidency as an opportunity to dive into Donald Trump's early record in office, it was a lot of talk and international outreach, but not much movement on the domestic issues -- like healthcare and tax reform -- that made him popular as a candidate. If you ask budget chief Mick Mulvaney, as NewsHour's Judy Woodruff did on air last week, the first hundred days was spent undoing damage from the previous administration. As for the chief: The presidency is harder than he thought, he told Reuters. No matter how you feel about the administration's first three-and-a-half months in the Oval, here are five important stories overlooked in the 100-day fanfare that are still worth your attention.


Digital Commerce Success in 2017 - IBM Commerce

#artificialintelligence

For many, the holiday season is a time of reflection, both from a personal and a professional standpoint. I won't get too deep into politics and pop culture, but I can't reflect on 2016 without thinking about the 2016 US presidential election and the Chicago Cubs finally winning the World Series after 108 years. In my professional life I recall the challenges faced and overcome, the triumphs, and even the missed opportunities. Hopefully there were more triumphs than missed opportunities in your professional life, but reflecting on both can help you prepare and potentially create your own opportunities in the new year – particularly if you work in digital commerce. If you're a digital commerce professional, either an online retailer or a B2B seller, last year saw a few milestones that are sure to impact your business in 2017 and beyond.


Three Dimensions of Design Development

AAAI Conferences

Formal specifications are difficult to understand for a number of reasons. When the developer of a large specification explains it to another person, he typically includes information in his explanation that is not present in the specification, even implicitly.