Goto

Collaborating Authors

 Commonsense Reasoning


Some things to know about achieving artificial general intelligence

arXiv.org Artificial Intelligence

Current and foreseeable GenAI models are not capable of achieving artificial general intelligence because they are burdened with anthropogenic debt. They depend heavily on human input to provide well-structured problems, architecture, and training data. They cast every problem as a language pattern learning problem and are thus not capable of the kind of autonomy needed to achieve artificial general intelligence. Current models succeed at their tasks because people solve most of the problems to which these models are directed, leaving only simple computations for the model to perform, such as gradient descent. Another barrier is the need to recognize that there are multiple kinds of problems, some of which cannot be solved by available computational methods (for example, "insight problems"). Current methods for evaluating models (benchmarks and tests) are not adequate to identify the generality of the solutions, because it is impossible to infer the means by which a problem was solved from the fact of its solution. A test could be passed, for example, by a test-specific or a test-general method. It is a logical fallacy (affirming the consequent) to infer a method of solution from the observation of success.


The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation

arXiv.org Artificial Intelligence

In a widely popular analogy by Turing Award Laureate Yann LeCun, machine intelligence has been compared to cake --where unsupervised learning forms the base, supervised learning adds the icing, and reinforcement learning is the cherry on top. We expand this "cake that is intelligence" analogy from a simple structural metaphor to the full life-cycle of AI systems, extending it to sourcing of ingredients (data), conception of recipes (instructions), the baking process (training), and the tasting and selling of the cake (evaluation and distribution). Leveraging our re-conceptualization, we describe each step's entailed social ramifications and how they are bounded by statistical assumptions within machine learning. Whereas these technical foundations and social impacts are deeply intertwined, they are often studied in isolation, creating barriers that restrict meaningful participation. Our re-conceptualization paves the way to bridge this gap by mapping where technical foundations interact with social outcomes, highlighting opportunities for cross-disciplinary dialogue. Finally, we conclude with actionable recommendations at each stage of the metaphorical AI cake's life-cycle, empowering prospective AI practitioners, users, and researchers, with increased awareness and ability to engage in broader AI discourse.


ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

arXiv.org Artificial Intelligence

The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose that 1.58-bit offers superior results. However, the lack of a cohesive framework for different bits has left such conclusions relatively tenuous. We present ParetoQ, the first unified framework that facilitates rigorous comparisons across 1-bit, 1.58-bit, 2-bit, 3-bit, and 4-bit quantization settings. Our findings reveal a notable learning transition between 2 and 3 bits: For 3-bits and above, the fine-tuned models stay close to their original pre-trained distributions, whereas for learning 2-bit networks or below, the representations change drastically. By optimizing training schemes and refining quantization functions, ParetoQ surpasses all previous methods tailored to specific bit widths. Remarkably, our ParetoQ ternary 600M-parameter model even outperforms the previous SoTA ternary 3B-parameter model in accuracy, using only one-fifth of the parameters. Extensive experimentation shows that ternary, 2-bit, and 3-bit quantization maintains comparable performance in the size-accuracy trade-off and generally exceeds 4-bit and binary quantization. Considering hardware constraints, 2-bit quantization offers promising potential for memory reduction and speedup.


Reviews: Heterogeneous Graph Learning for Visual Commonsense Reasoning

Neural Information Processing Systems

Originality: The VCR task is a novel task (proposed by Zellers et al, CVPR19). The proposed HGL framework for this interesting task is novel and interesting. The paper applies the HGL framework on top of the baseline model (R2C from Zellers et al., CVPR19) and shows significant improvements. The paper compares other existing graph learning approaches. The main difference between the proposed approach and other graph learning approaches is the heterogeneous nature (across domains โ€“ vision and language) of the graph learning framework. Quality: The paper does a good job of evaluating the propsed approach and its ablations.


Reviews: Heterogeneous Graph Learning for Visual Commonsense Reasoning

Neural Information Processing Systems

After considering the author response and discussing this submission, all reviewers recommend acceptance -- including two high ratings. The reviewers generally found the approach novel but were interested in how it applies outside of the VCR task to other question answering datasets. With the addition of the experiments from the rebuttal, this is a strong submission.


Reviews: Connective Cognition Network for Directional Visual Commonsense Reasoning

Neural Information Processing Systems

Originality: The paper proposes a novel model for the recently introduced VCR task. The main novelty of the proposed model lies in the component GraphVLAD and directional GCN modules. The paper describes that one of the closest works to this work is that of Narsimhan et al., NeurIPS 2018 that used GCN to infer answers in VQA, however that work constructs an undirected graph, ignoring the directional information between the graph nodes. This paper uses directed graph instead and shows the usefulness of incorporating directional information. It would be good for this paper to include more related work on GraphVLAD front. Quality: The paper evaluates the proposed approach on the VCR dataset and compares with the baselines and previous state-of-the-art, demonstrating how the proposed work improves the previous best performance significantly.


Reviews: Connective Cognition Network for Directional Visual Commonsense Reasoning

Neural Information Processing Systems

After considering the author response and discussing the submission, all reviewers voted to accept. Most reviewers shared a concern that the neurological connection is relatively weak and encourage authors to discuss it in looser terms like "inspired by". Reviewers generally found the paper's claims well established.


Figurative-cum-Commonsense Knowledge Infusion for Multimodal Mental Health Meme Classification

arXiv.org Artificial Intelligence

The expression of mental health symptoms through non-traditional means, such as memes, has gained remarkable attention over the past few years, with users often highlighting their mental health struggles through figurative intricacies within memes. While humans rely on commonsense knowledge to interpret these complex expressions, current Multimodal Language Models (MLMs) struggle to capture these figurative aspects inherent in memes. To address this gap, we introduce a novel dataset, AxiOM, derived from the GAD anxiety questionnaire, which categorizes memes into six fine-grained anxiety symptoms. Next, we propose a commonsense and domain-enriched framework, M3H, to enhance MLMs' ability to interpret figurative language and commonsense knowledge. The overarching goal remains to first understand and then classify the mental health symptoms expressed in memes. We benchmark M3H against 6 competitive baselines (with 20 variations), demonstrating improvements in both quantitative and qualitative metrics, including a detailed human evaluation. We observe a clear improvement of 4.20% and 4.66% on weighted-F1 metric. To assess the generalizability, we perform extensive experiments on a public dataset, RESTORE, for depressive symptom identification, presenting an extensive ablation study that highlights the contribution of each module in both datasets. Our findings reveal limitations in existing models and the advantage of employing commonsense to enhance figurative understanding.


SituatedGen: Incorporating Geographical and Temporal Contexts into Generative Commonsense Reasoning

Neural Information Processing Systems

Recently, commonsense reasoning in text generation has attracted much attention. Generative commonsense reasoning is the task that requires machines, given a group of keywords, to compose a single coherent sentence with commonsense plausibility. While existing datasets targeting generative commonsense reasoning focus on everyday scenarios, it is unclear how well machines reason under specific geographical and temporal contexts. We formalize this challenging task as SituatedGen, where machines with commonsense should generate a pair of contrastive sentences given a group of keywords including geographical or temporal entities. We introduce a corresponding English dataset consisting of 8,268 contrastive sentence pairs, which are built upon several existing commonsense reasoning benchmarks with minimal manual labor.


SalKG: Learning From Knowledge Graph Explanations for Commonsense Reasoning

Neural Information Processing Systems

Augmenting pre-trained language models with knowledge graphs (KGs) has achieved success on various commonsense reasoning tasks. However, for a given task instance, the KG, or certain parts of the KG, may not be useful. Although KG-augmented models often use attention to focus on specific KG components, the KG is still always used, and the attention mechanism is never explicitly taught which KG components should be used. Meanwhile, saliency methods can measure how much a KG feature (e.g., graph, node, path) influences the model to make the correct prediction, thus explaining which KG features are useful. This paper explores how saliency explanations can be used to improve KG-augmented models' performance.