AITopics

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Neural Information Processing Systems, Oct-9-2025, 17:05:18 GMT

Unified Lexical Representation for Interpretable Visual-Language Alignment

Visual-Language Alignment (VLA) has gained a lot of attention since CLIP's groundbreaking work.

arxiv preprint arxiv, lexical representation, lexvla, (12 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Neural Information Processing Systems, May-26-2025, 14:51:47 GMT

Unified Lexical Representation for Interpretable Visual-Language Alignment

artificial intelligence, natural language, visual-language alignment, (6 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Ji, Xingyu, Glenn, Parker, Parameswaran, Aditya G., Hulsebos, Madelon

TARGET: Benchmarking Table Retrieval for Generative Tasks

arXiv.org Artificial Intelligence, May-20-2025

The data landscape is rich with structured data, often of high value to organizations, driving important applications in data analysis and machine learning. Recent progress in representation learning and generative models for such data has led to the development of natural language interfaces to structured data, including those leveraging text-to-SQL. Contextualizing interactions with structured data through retrieval-augmented generation, whether via conversational interfaces or agentic components, can provide substantial benefits in the freshness, accuracy, and comprehensiveness of answers. The key question is: how do we retrieve the right table(s) for the analytical query or task at hand? To this end, we introduce TARGET: a benchmark for evaluating TAble Retrieval for GEnerative Tasks. With TARGET, we analyze the retrieval performance of different retrievers in isolation, as well as their impact on downstream tasks. We find that dense embedding-based retrievers far outperform a BM25 baseline, which is less effective here than it is for retrieval over unstructured text. We also surface the sensitivity of retrievers to variations in metadata (e.g., missing table titles), and demonstrate a stark variation in retrieval performance across datasets and tasks. TARGET is available at https://target-benchmark.github.io.
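
For readers unfamiliar with the BM25 baseline mentioned in this abstract, the following is a minimal sketch of BM25 scoring over tables that have been flattened to text. The flattening, the toy tables, the function name `bm25_scores`, and the parameter defaults `k1=1.5` and `b=0.75` are all assumptions made for illustration; this is not the TARGET benchmark's own pipeline.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # purely lexical: unmatched terms contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical tables, each flattened to a title plus its column names.
tables = [
    "orders order_id customer total",
    "weather station temperature humidity",
    "employees name department salary",
]
scores = bm25_scores("customer total", tables)
best = max(range(len(tables)), key=lambda i: scores[i])
print(tables[best])
```

Because the score depends only on exact term overlap, a table whose title or column names are missing or worded differently from the query receives no credit, which is one plausible reading of why a lexical baseline can struggle on tables.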

information retrieval, large language model, machine learning, (21 more...)

2505.11545

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Pendleton, Clive, Harrington, Ewan, Fairbrother, Giles, Arkwright, Jasper, Fenwick, Nigel, Katrix, Richard

Probabilistic Lexical Manifold Construction in Large Language Models via Hierarchical Vector Field Interpolation

arXiv.org Artificial Intelligence, Feb-14-2025

Hierarchical vector field interpolation introduces a structured probabilistic framework for lexical representation, ensuring that word embeddings transition smoothly across a continuous manifold rather than being constrained to discrete token mappings. The proposed methodology constructs a probabilistic function space where word representations adhere to topological consistency, mitigating representational discontinuities commonly observed in transformer-based embeddings. Empirical evaluations reveal that probabilistic constraints enhance lexical coherence by refining contextual relationships, leading to improvements in semantic stability across multiple linguistic distributions. The application of divergence minimization techniques ensures that interpolated embeddings maintain probabilistic consistency while preserving computational feasibility for large-scale implementations. Experimental findings demonstrate that interpolated lexical manifolds improve representation density alignment, reducing anisotropic distortions in contextual embedding distributions. Comparative analyses with standard transformer-based models highlight that structured interpolation yields more stable representations, particularly in tasks requiring fine-grained semantic differentiation. The statistical evaluation of embedding divergence confirms that probabilistic lexical manifolds reduce representational inconsistencies while maintaining coherence across varying scales of contextual abstraction. An assessment of computational efficiency reveals that while interpolation introduces minor processing overhead, the structured representation learning approach remains scalable for practical deployment.

large language model, machine learning, natural language, (15 more...)

2502.10013

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Martus, Natasha, Crowther, Sebastian, Dorrington, Maxwell, Applethwaite, Jonathan, Tillinghurst, Edgar, Birkenshaw, Quentin, Petrov, Lukas, Willoughby, Constance

Hierarchical Lexical Manifold Projection in Large Language Models: A Novel Mechanism for Multi-Scale Semantic Representation

arXiv.org Artificial Intelligence, Feb-7-2025

The integration of structured hierarchical embeddings into transformer-based architectures introduces a refined approach to lexical representation, ensuring that multi-scale semantic relationships are preserved without compromising computational efficiency. A projection mechanism that maps tokens onto a structured manifold provides improved lexical alignment, enhancing the adaptability of word representations across diverse linguistic tasks. The structured encoding framework ensures that hierarchical embeddings maintain coherence across varying abstraction levels, allowing for stable transitions between localized syntactic features and global semantic structures. Experimental evaluations indicate that hierarchical embeddings consistently outperform conventional token representations, improving accuracy in linguistic benchmarks while maintaining lower computational overhead. Comparative analysis across multiple domains highlights the ability of hierarchical embeddings to retain contextual consistency, particularly in specialized language applications where structured lexical alignment is essential. Statistical assessments further demonstrate that hierarchical embeddings exhibit enhanced robustness under perturbation conditions, ensuring that linguistic structures remain stable across adversarial text modifications. The integration of hierarchical projections with transformer attention mechanisms enables improved contextual adaptation, ensuring that token representations are dynamically adjusted based on varying linguistic distributions. The refined hierarchical organization of embeddings provides greater interpretability in lexical modeling, facilitating enhanced generalization capabilities across diverse text processing tasks.

large language model, machine learning, natural language, (21 more...)

2502.05395

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jul-25-2024

Unified Lexical Representation for Interpretable Visual-Language Alignment

Li, Yifan, Wang, Yikai, Fu, Yanwei, Ru, Dongyu, Zhang, Zheng, He, Tong

Visual-Language Alignment (VLA) has gained a lot of attention since CLIP's groundbreaking work. Although CLIP performs well, the typical direct latent feature alignment lacks clarity in its representation and similarity scores. In contrast, a lexical representation, a vector in which each element represents the similarity between the sample and a word from the vocabulary, is naturally sparse and interpretable, providing exact matches for individual words. However, lexical representations are difficult to learn because there is no ground-truth supervision and because of false-discovery issues, and thus require complex designs to train effectively. In this paper, we introduce LexVLA, a more interpretable VLA framework that learns a unified lexical representation for both modalities without complex design. We use DINOv2 as our visual model for its local-inclined features and Llama 2, a generative language model, to leverage its in-context lexical prediction ability. To avoid false discovery, we propose an overuse penalty that keeps the lexical representation from frequently and falsely activating meaningless words. We demonstrate that these two pre-trained uni-modal models can be well aligned by fine-tuning on a modest multi-modal dataset, avoiding intricate training configurations. On cross-modal retrieval benchmarks, LexVLA, trained on the CC-12M multi-modal dataset, outperforms baselines fine-tuned on larger datasets (e.g., YFCC15M) and those trained from scratch on even bigger datasets (e.g., 1.1B data, including CC-12M). We conduct extensive experiments to analyze LexVLA.
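
The idea of a lexical representation described in this abstract can be illustrated with a toy sketch. Everything here is an assumption for illustration: the five-word vocabulary, the hand-picked scores, the top-k sparsification in `sparsify`, and the helper names. It does not reproduce LexVLA's encoders or its overuse penalty; it only shows why a shared vocabulary-sized vector makes a similarity score inspectable word by word.

```python
# Toy vocabulary; a real model would use its tokenizer's vocabulary.
VOCAB = ["dog", "cat", "grass", "ball", "the"]

def sparsify(scores, k):
    """Keep the k largest vocabulary scores and zero out the rest."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] if i in top else 0.0 for i in range(len(scores))]

def lexical_similarity(a, b):
    """Dot product; only words active in both vectors contribute."""
    return sum(x * y for x, y in zip(a, b))

def matched_words(a, b):
    """Words active in both vectors, i.e. the terms that explain the score."""
    return [VOCAB[i] for i, (x, y) in enumerate(zip(a, b)) if x > 0 and y > 0]

# Hypothetical per-word scores for one image and one caption.
image_vec = sparsify([0.9, 0.1, 0.7, 0.6, 0.3], k=3)
text_vec = sparsify([0.8, 0.0, 0.2, 0.9, 0.4], k=3)

print(matched_words(image_vec, text_vec))
print(lexical_similarity(image_vec, text_vec))
```

In this toy case the caption vector keeps the uninformative word "the" active, which is the kind of activation the abstract's overuse penalty is meant to discourage; how that penalty is computed is not stated in the abstract and is not modeled here.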

arxiv preprint arxiv, lexical representation, lexvla, (12 more...)

2407.17827

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Biswas, Biplob, Ramnath, Rajiv

Efficient and Interpretable Information Retrieval for Product Question Answering with Heterogeneous Data

arXiv.org Artificial Intelligence, May-21-2024

Expansion-enhanced sparse lexical representation improves information retrieval (IR) by minimizing vocabulary mismatch problems during lexical matching. In this paper, we explore the potential of jointly learning a dense semantic representation and combining it with the lexical one for ranking candidate information. We present a hybrid information retrieval mechanism that maximizes lexical and semantic matching while minimizing their shortcomings. Our architecture consists of dual hybrid encoders that independently encode queries and information elements. Each encoder jointly learns, through contrastive learning, a dense semantic representation and a sparse lexical representation augmented by a learnable term expansion of the corresponding text. We demonstrate the efficacy of our model in single-stage ranking on a benchmark product question-answering dataset containing the typical heterogeneous information available on online product pages. Our evaluation demonstrates that our hybrid approach outperforms independently trained retrievers by 10.95% (sparse) and 2.7% (dense) in MRR@5 score. Moreover, our model offers better interpretability and performs comparably to state-of-the-art cross encoders while reducing response time by 30% (latency) and cutting computational load by approximately 38% (FLOPs).
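
The MRR@5 metric reported in this abstract can be computed as in the sketch below. The function name `mrr_at_k` and the toy rankings are hypothetical; the code shows the standard definition of mean reciprocal rank with a cutoff, not the authors' evaluation script.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=5):
    """Mean reciprocal rank of the first relevant item within the top k results."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranked[:k], start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)

# Three hypothetical queries with their ranked candidates and relevant items.
rankings = [
    ["a", "b", "c"],                 # first relevant item at rank 2
    ["x", "y", "z", "w", "v", "u"],  # relevant item at rank 6, outside the cutoff
    ["p"],                           # relevant item at rank 1
]
relevant = [{"b"}, {"u"}, {"p"}]

score = mrr_at_k(rankings, relevant, k=5)
print(score)
```

A query whose first relevant item falls below rank 5 contributes zero, so the metric rewards placing a correct answer near the top rather than merely retrieving it somewhere.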

information, lexical representation, representation, (16 more...)

2405.13173

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.91)

Stern, Michael C., Shaw, Jason A.

Neural inhibition during speech planning contributes to contrastive hyperarticulation

arXiv.org Artificial Intelligence, Mar-14-2023

Previous work has demonstrated that words are hyperarticulated on dimensions of speech that differentiate them from a minimal pair competitor. This phenomenon has been termed contrastive hyperarticulation (CH). We present a dynamic neural field (DNF) model of voice onset time (VOT) planning that derives CH from an inhibitory influence of the minimal pair competitor during planning. We test some predictions of the model with a novel experiment investigating CH of voiceless stop consonant VOT in pseudowords. The results demonstrate a CH effect in pseudowords, consistent with a basis for the effect in the real-time planning and production of speech. The scope and magnitude of CH in pseudowords were reduced compared to CH in real words, consistent with a role for interactive activation between lexical and phonological levels of planning. We discuss the potential of our model to unify an apparently disparate set of phenomena, from CH to phonological neighborhood effects to phonetic trace effects in speech errors.

artificial intelligence, machine learning, natural language, (17 more...)

2209.12278

Country:

North America > United States > California (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)