




dataset release, tournament evaluation, architectural design, input representation, and other insights

Neural Information Processing Systems

We want to thank the reviewers for their helpful comments. The dataset will be made available to any interested researchers. We agree with R3 that there are many non-trivial modeling choices in our architecture. We call the former unit-based and the latter token-based. We apologize for stating some claims without referring to the evidence, such as "orders from the last movement". Our input representation is a result of both empirical findings and domain knowledge.


Reviews: XLNet: Generalized Autoregressive Pretraining for Language Understanding

Neural Information Processing Systems

Originality: The architecture is novel compared to recent lines of language-model work, which all use variations of BERT or GPT (SciBERT, MT-DNN, MASS, etc.). The example (the "New York is a city" one) makes sense, but considering that the permutation is random when computing the objective function, I still don't see why it works better than sequential order, since humans speak and write sequentially. Could you add more intuition to the paper? Or have you tried predicting n-grams, for comparison with permutation? Quality: Very high, considering they did extensive studies on multiple benchmarks; the ablation study is nicely done as well.
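The permutation objective the review asks about can be sketched in a few lines. This is an illustrative toy, not XLNet's actual implementation: sample one random factorization order and list, for each predicted token, the context it is allowed to condition on (all function names here are my own).

```python
import random

def permutation_lm_targets(tokens, seed=0):
    """Sample one factorization order and return (target, visible context)
    pairs: the model predicts each token given only the tokens that come
    earlier in the sampled order, regardless of their surface positions."""
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)                     # one sampled factorization order
    steps = []
    for i, pos in enumerate(order):
        context = sorted(order[:i])        # positions already "revealed"
        steps.append((tokens[pos], [tokens[c] for c in context]))
    return steps

for target, context in permutation_lm_targets(["New", "York", "is", "a", "city"]):
    print(f"predict {target!r} given {context}")
```

Averaged over many sampled orders, every token is eventually predicted from every possible subset of the others, which is the bidirectional-context property the sequential left-to-right objective lacks.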


Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation

An, Chenyang, Imani, Shima, Yao, Feng, Dong, Chengyu, Abbasi, Ali, Shrivastava, Harsh, Buss, Samuel, Shang, Jingbo, Mahalingam, Gayathri, Sharma, Pramod, Diesendruck, Maurice

arXiv.org Artificial Intelligence

In the field of large language model (LLM)-based proof generation, models trained on extensive corpora such as OpenWebMath and arXiv still exhibit only modest performance on proving tasks of moderate difficulty. We believe this is partly due to the suboptimal ordering of the proof data used in training. Published proofs often follow a purely logical order, where each step proceeds from the previous steps according to deductive rules. However, this order is meant to facilitate verification of the proof's soundness, not to help people and models learn the discovery process behind the proof. In proof generation, we argue that the optimal order for a training sample is one in which the relevant intermediate supervision for a given proof step is always positioned to the left of that step. We call such an order the intuitively sequential order. We validate our claims using two tasks: intuitionistic propositional logic theorem proving and digit multiplication. Our experiments verify the order effect and support our explanations. We demonstrate that training is most effective when the proof is in the intuitively sequential order. Moreover, the order effect and the performance gap between models trained on different data orders are substantial: in the propositional logic theorem-proving task, models trained on the optimal order achieve an 11 percent higher proof success rate than models trained on the worst order.
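The "supervision to the left of the step" requirement amounts to a dependency ordering over proof steps. As a minimal sketch (my own construction, not the paper's algorithm), a topological sort over step dependencies produces an order with the stated property:

```python
from graphlib import TopologicalSorter

def intuitively_sequential(steps):
    """Reorder proof steps so that every premise a step depends on appears
    to its left. `steps` maps step_id -> set of premise step_ids.
    Names and the toy proof below are illustrative, not from the paper."""
    return list(TopologicalSorter(steps).static_order())

# Toy proof: step "c" is derived from "a" and "b"; "d" is derived from "c".
proof = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
print(intuitively_sequential(proof))
```

Any order returned this way places each step after all of its premises, which is exactly the left-context condition a next-token predictor needs at training time.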


How AI Localization Differs from Traditional Localization

#artificialintelligence

Localizing content delivers strong business benefits. According to a white paper released by Pactera EDGE and Nimdzi Insights, companies that localize the user experience see a 100%–400% increase in sales, and by localizing into just 10 languages, a brand's message will effectively reach 90% of online customers. As brands appreciate the business benefits of localization, they are increasingly turning to artificial intelligence to make localization more effective. This is especially true for large, complex, multinational businesses that need to adapt multiple products and services across hundreds of geographic markets and cultures. In fact, we believe AI can unlock hyperlocal and hyper-personalized experiences that are culturally aware, as my colleague Ilia Shifrin blogged recently.


Coordinating complex behaviors between hundreds of robots: A new approach to designing motion plans for multiple robots grows

#artificialintelligence

In a building several stories tall with numerous rooms, hundreds of obstacles and thousands of places to inspect, the several dozen robots move as one cohesive unit. They spread out in a search pattern to thoroughly check the entire building while simultaneously splitting tasks so as to not waste time doubling back on their own paths or re-checking places other robots have already visited. Such cohesion would be difficult for human controllers to achieve, let alone for an artificial controller to compute in real-time. "If a control problem has three or four robots that live in a world with only a handful of rooms, and if the collaborative task is specified by simple logic rules, there are state-of-the-art tools that can compute an optimal solution that satisfies the task in a reasonable amount of time," said Michael M. Zavlanos, the Mary Milus Yoh and Harold L. Yoh, Jr. Associate Professor of Mechanical Engineering and Materials Science at Duke University. "And if you don't care about the best solution possible, you can solve for a few more rooms and more complex tasks in a matter of minutes, but still only a dozen robots tops," Zavlanos said.
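The scaling wall Zavlanos describes comes from the joint planning space. A back-of-the-envelope illustration (my own, not from the article): if each robot can occupy any of R locations, the joint space a centralized planner must reason over grows as R to the power of the number of robots, before task logic is even considered.

```python
def joint_states(rooms, robots):
    """Size of the joint configuration space when each robot can
    independently occupy any of `rooms` locations."""
    return rooms ** robots

for robots in (3, 12, 100):
    print(f"{robots:>3} robots, 10 rooms -> {joint_states(10, robots):.2e} joint states")
```

Three robots in ten rooms is a thousand joint states; a hundred robots is 10^100, which is why optimal centralized solutions stop at a dozen robots and larger fleets need a different approach.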


Learning Hierarchical Discourse-level Structure for Fake News Detection

Karimi, Hamid, Tang, Jiliang

arXiv.org Machine Learning

On the one hand, fake news articles are nowadays easily propagated through various online media platforms and have become a grave threat to the trustworthiness of information. On the other hand, our understanding of the language of fake news is still minimal. Incorporating the hierarchical discourse-level structure of fake and real news articles is one crucial step toward a better understanding of how these articles are constructed. Nevertheless, this has rarely been investigated in the fake news detection domain and faces tremendous challenges. First, existing methods for capturing discourse-level structure rely on annotated corpora, which are not available for fake news datasets. Second, extracting useful information from the discovered structures is another challenge. To address these challenges, we propose Hierarchical Discourse-level Structure for Fake news detection (HDSF). HDSF learns and constructs a discourse-level structure for fake/real news articles in an automated, data-driven manner. Moreover, we identify insightful structure-related properties that can explain the discovered structures and boost our understanding of fake news. Experiments show the effectiveness of the proposed approach. Further structural analysis suggests that real and fake news differ substantially in their hierarchical discourse-level structures.


A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes

Xu, Kui, Wang, Zhe, Shi, Jiangping, Li, Hongsheng, Zhang, Qiangfeng Cliff

arXiv.org Machine Learning

Constructing molecular structural models from Cryo-Electron Microscopy (Cryo-EM) density volumes is the critical last step of structure determination by Cryo-EM technologies. Methods have evolved from manual construction by structural biologists to automated approaches that perform 6D translation-rotation search, which is extremely compute-intensive. In this paper, we propose a learning-based method, formulating the problem as a vision-inspired 3D detection and pose estimation task. We develop a deep learning framework for amino acid determination in a 3D Cryo-EM density volume, and we design a sequence-guided Monte Carlo Tree Search (MCTS) to thread over the candidate amino acids and form the molecular structure. This framework achieves 91% coverage on our newly proposed dataset and takes only a few minutes for a typical structure with a thousand amino acids. Our method is hundreds of times faster and several times more accurate than existing automated solutions, without any human intervention.