Bridgeport
Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models
DiGiugno, Andrew, Mahmood, Ausif
Transformer models typically calculate attention matrices using dot products, which have limitations when capturing nonlinear relationships between embedding vectors. We propose Neural Attention, a technique that replaces dot products with feed-forward networks, enabling a more expressive representation of relationships between tokens. This approach modifies only the attention matrix calculation while preserving the matrix dimensions, making it easily adaptable to existing transformer-based architectures. We provide a detailed mathematical justification for why Neural Attention increases representational capacity and conduct controlled experiments to validate this claim. When comparing Neural Attention and Dot-Product Attention, NLP experiments on WikiText-103 show a reduction in perplexity of over 5 percent. Similarly, experiments on CIFAR-10 and CIFAR-100 show comparable improvements for image classification tasks. While Neural Attention introduces higher computational demands, we develop techniques to mitigate these challenges, ensuring practical usability without sacrificing the increased expressivity it provides. This work establishes Neural Attention as an effective means of enhancing the predictive capabilities of transformer models across a variety of applications.
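As a rough illustration of the idea in this abstract, the sketch below contrasts scaled dot-product scores with scores produced by a small feed-forward network applied to each query/key pair. The abstract does not specify the network, so everything here is an assumption made for illustration: the concatenation of query and key, the single ReLU hidden layer, and the names `neural_scores`, `w1`, `b1`, `w2`, `b2`. It should not be read as the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_scores(q, k):
    # Standard scaled dot-product scores, shape (n_q, n_k).
    return q @ k.T / np.sqrt(q.shape[-1])

def neural_scores(q, k, w1, b1, w2, b2):
    # Hypothetical feed-forward scorer: each (query, key) pair is
    # concatenated and passed through a one-hidden-layer ReLU network
    # that outputs one scalar score. The result has the same
    # (n_q, n_k) shape as the dot-product score matrix.
    n_q, d = q.shape
    n_k = k.shape[0]
    q_exp = np.broadcast_to(q[:, None, :], (n_q, n_k, d))
    k_exp = np.broadcast_to(k[None, :, :], (n_q, n_k, d))
    pairs = np.concatenate([q_exp, k_exp], axis=-1)   # (n_q, n_k, 2d)
    hidden = np.maximum(pairs @ w1 + b1, 0.0)          # (n_q, n_k, h)
    return (hidden @ w2 + b2).squeeze(-1)              # (n_q, n_k)

rng = np.random.default_rng(0)
n, d, h = 4, 8, 16
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
w1 = rng.normal(size=(2 * d, h))
b1 = np.zeros(h)
w2 = rng.normal(size=(h, 1))
b2 = 0.0

# Either score matrix can be fed to the same softmax.
attn_dot = softmax(dot_product_scores(q, k))
attn_neural = softmax(neural_scores(q, k, w1, b1, w2, b2))
```

Because the score matrix keeps its shape, the rest of the attention computation is unchanged. The price is the `(n_q, n_k, 2d)` intermediate tensor, which is one way to see the higher computational demands the abstract mentions.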
Evaluating Self-Generated Documents for Enhancing Retrieval-Augmented Generation with Large Language Models
Li, Jiatao, Hu, Xinyu, Yin, Xunjian, Wan, Xiaojun
The integration of documents generated by LLMs themselves (Self-Docs) alongside retrieved documents has emerged as a promising strategy for retrieval-augmented generation systems. However, previous research primarily focuses on optimizing the use of Self-Docs, with their inherent properties remaining underexplored. To bridge this gap, we first investigate the overall effectiveness of Self-Docs, identifying key factors that shape their contribution to RAG performance (RQ1). Building on these insights, we develop a taxonomy grounded in Systemic Functional Linguistics to compare the influence of various Self-Docs categories (RQ2) and explore strategies for combining them with external sources (RQ3). Our findings reveal which types of Self-Docs are most beneficial and offer practical guidelines for leveraging them to achieve significant improvements in knowledge-intensive question answering tasks.
xLSTMTime : Long-term Time Series Forecasting With xLSTM
Alharthi, Musleh, Mahmood, Ausif
In recent years, transformer-based models have gained prominence in multivariate long-term time series forecasting (LTSF), demonstrating significant advancements despite facing challenges such as high computational demands, difficulty in capturing temporal dynamics, and managing long-term dependencies. The emergence of LTSF-Linear, with its straightforward linear architecture, has notably outperformed transformer-based counterparts, prompting a reevaluation of the transformer's utility in time series forecasting. In response, this paper presents an adaptation of a recent architecture termed extended LSTM (xLSTM) for LTSF. xLSTM incorporates exponential gating and a revised, higher-capacity memory structure, both of which show good potential for LTSF. Our adapted architecture for LTSF, termed xLSTMTime, surpasses current approaches. We compare xLSTMTime's performance against various state-of-the-art models across multiple real-world datasets, demonstrating superior forecasting capabilities. Our findings suggest that refined recurrent architectures can offer competitive alternatives to transformer-based models in LTSF tasks, potentially redefining the landscape of time series forecasting.
TopicGPT: A Prompt-based Topic Modeling Framework
Pham, Chau Minh, Hoyle, Alexander, Sun, Simeng, Iyyer, Mohit
Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal semantic control over topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics within a provided text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods: for example, it achieves a harmonic mean purity of 0.74 against human-annotated Wikipedia topics compared to 0.64 for the strongest baseline. Its topics are also more interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions. Moreover, the framework is highly adaptable, allowing users to specify constraints and modify topics without the need for model retraining. TopicGPT can be further extended to hierarchical topical modeling, enabling users to explore topics at various levels of granularity. By streamlining access to high-quality and interpretable topics, TopicGPT represents a compelling, human-centered approach to topic modeling.
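The abstract reports a "harmonic mean purity" without defining it. A minimal sketch follows, assuming the common clustering definition: purity credits each predicted topic with its most frequent human label, inverse purity does the same with the roles swapped, and the reported figure is the harmonic mean of the two. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter

def purity(pred, gold):
    # For each predicted cluster, count its most frequent gold label,
    # then divide the total by the number of documents.
    clusters = {}
    for p, g in zip(pred, gold):
        clusters.setdefault(p, Counter())[g] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(pred)

def harmonic_mean_purity(pred, gold):
    # Harmonic mean of purity and inverse purity (assumed definition).
    p = purity(pred, gold)
    inv_p = purity(gold, pred)  # inverse purity swaps the two roles
    return 2 * p * inv_p / (p + inv_p)

# Toy example: six documents, three predicted topics, two human labels.
pred = ["a", "a", "b", "b", "b", "c"]
gold = ["x", "x", "x", "y", "y", "y"]
score = harmonic_mean_purity(pred, gold)  # purity 5/6, inverse purity 2/3
```

Purity alone rewards over-splitting, since one topic per document scores 1.0. Inverse purity penalizes that, which is why the two are usually combined.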
Can ChatGPT Plan Your Vacation?
Powerful new artificial-intelligence software is already shaking up the travel industry, but it has a long way to go until it can plan a seamless trip. I want to hit a history museum and an amusement park -- and then I'd like 7 p.m. dinner reservations near the hotel at a restaurant with vegan options and a great wine list." But for now, travelers using ChatGPT -- the powerful new A.I. software that is already offering creative cocktail recipes and writing college papers -- may have to temper their expectations. Oded Battat, the general manager at Traveland, a travel agency in Bridgeport, Conn., asked ChatGPT for outings he might offer his clients going to Tuscany to see if it could help him with his work. He got a list of 14 activities, including winery tours and museum visits, with a stop for gelato in the town square of the medieval hill town San Gimignano.
How ChatGPT and Generative AI Could Change the Way We Travel - The New York Times
I want to hit a history museum and an amusement park -- and then I'd like 7 p.m. dinner reservations near the hotel at a restaurant with vegan options and a great wine list." But for now, travelers using ChatGPT -- the powerful new A.I. software that is already offering creative cocktail recipes and writing college papers -- may have to temper their expectations. Oded Battat, the general manager at Traveland, a travel agency in Bridgeport, Conn., asked ChatGPT for outings he might offer his clients going to Tuscany to see if it could help him with his work. He got a list of 14 activities, including winery tours and museum visits, with a stop for gelato in the town square of the medieval hill town San Gimignano. "I knew of all these things," Mr. Battat said, but, he added, ChatGPT saved him the hassle of collecting all the information and delivered it in a format he was able to email to one of the clients.
BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch?
Niklaus, Joel, Giofré, Daniele
Pretrained transformer models have achieved state-of-the-art results in many tasks and benchmarks recently. Many state-of-the-art Language Models (LMs), however, do not scale well above the threshold of 512 input tokens. In specialized domains though (such as legal, scientific or biomedical), models often need to process very long text (sometimes well above 10000 tokens). Even though many efficient transformers have been proposed (such as Longformer, BigBird or FNet), so far, only very few such efficient models are available for specialized domains. Additionally, since the pretraining process is extremely costly in general - but even more so as the sequence length increases - it is often only in reach of large research labs. One way of making pretraining cheaper is the Replaced Token Detection (RTD) task, by providing more signal during training, since the loss can be computed over all tokens. In this work, we train Longformer models with the efficient RTD task on legal data to showcase that pretraining efficient LMs is possible using much less compute. We evaluate the trained models on challenging summarization tasks requiring the model to summarize long texts to show to what extent the models can achieve good performance on downstream tasks. We find that both the small and base models outperform their baselines on the in-domain BillSum and out-of-domain PubMed tasks in their respective parameter range. We publish our code and models for research purposes.
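The claim that RTD provides more signal because "the loss can be computed over all tokens" can be made concrete with a small sketch. It assumes an ELECTRA-style setup in which a discriminator emits one logit per position (replaced or original), and compares it with a masked-language-modeling loss that is only defined at masked positions. The generator is omitted, and the function names and array shapes are illustrative assumptions, not the authors' training code.

```python
import numpy as np

def rtd_loss(logits, is_replaced):
    # Binary cross-entropy with logits, averaged over EVERY position.
    x = np.asarray(logits, dtype=float)
    y = np.asarray(is_replaced, dtype=float)
    per_token = np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return per_token.mean(), per_token.size

def mlm_loss(token_logits, targets, masked):
    # Softmax cross-entropy, averaged over MASKED positions only.
    z = np.asarray(token_logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    masked = np.asarray(masked, dtype=bool)
    picked = log_probs[np.arange(len(targets)), targets]
    return -picked[masked].mean(), int(masked.sum())

seq_len, vocab = 20, 50
rng = np.random.default_rng(0)
masked = np.zeros(seq_len, dtype=bool)
masked[:3] = True  # 15% of 20 tokens

_, rtd_positions = rtd_loss(rng.normal(size=seq_len), masked)
_, mlm_positions = mlm_loss(rng.normal(size=(seq_len, vocab)),
                            rng.integers(0, vocab, size=seq_len), masked)
# rtd_positions is 20 (all tokens); mlm_positions is 3 (masked tokens only)
```

With a typical 15% masking rate, each sequence supervises roughly six to seven times more positions under RTD than under MLM, which is the efficiency argument the abstract relies on.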
Lawyers of the world: Robots aren't replacing you--yet
Artificial intelligence (AI) may soon render many jobs obsolete. Remember how popular one-hour photo shops were in the 1980s and into the mid-1990s? That's just the tip of the tech iceberg, as AI now seems to be gunning to take over the legal world. The UK-based Law Society noted in a study earlier this year: "Over the longer term, the number of jobs in the legal services sector will be increasingly affected by automation of legal services functions. This could mean that by 2038 total employment in the sector could be 20% less than it would otherwise have been, with a loss of 78,000 jobs -- equal to 67,000 full-time equivalent jobs -- compared to if productivity growth continued at its current rate."
California Inc.: Eclipse day is here, but be careful of some safety glasses
Welcome to California Inc., the weekly newsletter of the L.A. Times Business Section. Stocks took a pounding last week as the political turbulence in Washington and terror attacks in Spain caught up with the market. But closer to home, employers statewide increased their payrolls by 82,600 jobs in July. Sectors that saw the most employment gains include government, which added 18,800 jobs; educational and health services, which saw an increase of 18,600; and leisure and hospitality, which was up 15,200 jobs. Dark day: The long-awaited solar eclipse sweeps across America on Monday.