Large Language Model
Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering
As the field of automated machine learning (AutoML) advances, it becomes increasingly important to incorporate domain knowledge into these systems. We present an approach for doing so by harnessing the power of large language models (LLMs). Specifically, we introduce Context-Aware Automated Feature Engineering (CAAFE), a feature engineering method for tabular datasets that uses an LLM to iteratively generate additional semantically meaningful features based on the description of the dataset. The method produces both Python code for creating new features and explanations for the utility of the generated features. Despite being methodologically simple, CAAFE improves performance on 11 out of 14 datasets, boosting mean ROC AUC from 0.798 to 0.822 across all datasets, similar to the improvement achieved by using a random forest instead of logistic regression on our datasets. Furthermore, CAAFE is interpretable, providing a textual explanation for each generated feature. CAAFE paves the way for more extensive semi-automation in data science tasks and emphasizes the significance of context-aware solutions that can extend the scope of AutoML systems to semantic AutoML. We release our code, a simple demo and a Python package.
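The iterative loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not CAAFE's actual implementation: the candidate features are hard-coded callables standing in for LLM-generated Python code, the accept-only-if-the-score-improves rule is assumed, and `evaluate` is a toy stand-in (best single-feature ROC AUC) for a real cross-validated model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
# (feature name, textual explanation, code that computes the feature).
# In a real system the code and explanation would come from the LLM.
Candidate = Tuple[str, str, Callable[[Row], float]]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based ROC AUC: chance that a positive outscores a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(rows: List[Row], labels: Sequence[int]) -> float:
    """Toy evaluator (assumption): best single-feature separability."""
    best = 0.0
    for name in rows[0]:
        auc = roc_auc([r[name] for r in rows], labels)
        best = max(best, auc, 1.0 - auc)  # direction of the feature is free
    return best


def select_features(
    rows: List[Row], labels: Sequence[int], candidates: List[Candidate]
) -> Tuple[List[str], float]:
    """Try candidates one by one; keep a feature only if the score improves."""
    kept: List[str] = []
    score = evaluate(rows, labels)
    for name, _explanation, fn in candidates:
        trial = [{**r, name: fn(r)} for r in rows]  # add the feature to a copy
        trial_score = evaluate(trial, labels)
        if trial_score > score:
            rows, score = trial, trial_score
            kept.append(name)
    return kept, score


# Hypothetical toy dataset: the label depends on body-mass index.
ROWS = [
    {"weight_kg": 60.0, "height_m": 1.50},
    {"weight_kg": 100.0, "height_m": 1.90},
    {"weight_kg": 80.0, "height_m": 1.70},
    {"weight_kg": 50.0, "height_m": 1.60},
    {"weight_kg": 80.0, "height_m": 1.90},
    {"weight_kg": 65.0, "height_m": 1.75},
]
LABELS = [1, 1, 1, 0, 0, 0]

CANDIDATES: List[Candidate] = [
    ("height_cm", "Unit change only; adds no information.",
     lambda r: r["height_m"] * 100.0),
    ("bmi", "Weight relative to height captures body composition.",
     lambda r: r["weight_kg"] / r["height_m"] ** 2),
]

KEPT, SCORE = select_features(ROWS, LABELS, CANDIDATES)
print(KEPT, SCORE)
```

On this toy data the unit conversion is rejected because it adds no separability, while the ratio feature is kept. Pairing each candidate with an explanation mirrors the abstract's point that every generated feature comes with a textual rationale.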
Sam Altman and Elon Musk Sure Dislike Each Other
The trial between the CEOs makes the AI boom seem sordid and small. Elon Musk and Sam Altman are two of the most influential people in Silicon Valley, if not the world. Between the two of them, Musk and Altman run technology companies worth many trillions of dollars that promise to reshape civilization. But this morning, both sat under fluorescent lights in a courthouse in downtown Oakland, suffering through all manner of technical glitches as their respective attorneys kicked off the long-awaited trial in . As Steven Molo, a lawyer for Musk, began his opening argument, confused looks swept the courtroom.
Self-Retrieval: End-to-End Information Retrieval with One Large Language Model
The rise of large language models (LLMs) has significantly transformed both the construction and application of information retrieval (IR) systems. However, current interactions between IR systems and LLMs remain limited, with LLMs merely serving as components within IR systems, and IR systems being constructed independently of LLMs. This separated architecture restricts knowledge sharing and deep collaboration between them. In this paper, we introduce Self-Retrieval, a novel end-to-end LLM-driven information retrieval architecture. Self-Retrieval unifies all essential IR functions within a single LLM, leveraging the inherent capabilities of LLMs throughout the IR process. Specifically, Self-Retrieval internalizes the retrieval corpus through self-supervised learning, transforms the retrieval process into sequential passage generation, and performs relevance assessment for reranking. Experimental results demonstrate that Self-Retrieval not only outperforms existing retrieval approaches by a significant margin, but also substantially enhances the performance of LLM-driven downstream applications like retrieval-augmented generation.
Musk testifies at OpenAI trial it's not OK to 'loot a charity'
Elon Musk has taken the stand at a high-stakes trial over the future of OpenAI, casting his lawsuit against the ChatGPT maker as a defence of charitable giving. The world's richest person is suing OpenAI, its cofounder and chief executive officer, Sam Altman, and its president, Greg Brockman, and said on the stand on Tuesday that they betrayed him and the public by abandoning OpenAI's mission to be a benevolent steward of AI for humanity and transforming the nonprofit into a profit-seeking juggernaut. Musk, who founded carmaker Tesla and rocket company SpaceX, also said he is committed to serving the public by working 80- to 100-hour weeks and generally not taking vacations. "I like working and solving problems that make people's lives better," he said. Before Musk began testifying, Bill Savitt, a lawyer for OpenAI and Altman, told jurors during his opening statement it was Musk who saw dollar signs as he helped finance OpenAI's early growth and pushed it to become a for-profit business, one he might eventually lead as CEO.
HyenaDNA: Long Range Sequence Modeling at Single Nucleotide Resolution
Similar to natural language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled genome data that can then be fine-tuned for downstream tasks such as identifying regulatory elements. Due to the quadratic scaling of attention, previous Transformer-based genomic models have used 512 to 4k tokens as context (<0.001% of the human genome), significantly limiting the modeling of long-range interactions in DNA. In addition, these methods rely on tokenizers or fixed k-mers to aggregate meaningful DNA units, losing single nucleotide resolution (i.e. DNA "characters") where subtle genetic variations can completely alter protein function via single nucleotide polymorphisms (SNPs). Recently, Hyena, a large language model based on implicit convolutions, was shown to match attention in quality while allowing longer context lengths and lower time complexity.
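To make the resolution argument concrete, the sketch below contrasts single-nucleotide tokens with fixed k-mer tokens on a toy sequence carrying one SNP, and reproduces the rough context-coverage arithmetic. The sequences, the non-overlapping k = 6 scheme, and the approximate genome length of 3.1 billion bases are assumptions chosen for illustration; they are not taken from any particular model's tokenizer.

```python
from typing import List

HUMAN_GENOME_BASES = 3.1e9  # approximate haploid genome length (assumption)


def char_tokens(seq: str) -> List[str]:
    """Single-nucleotide tokenization: one token per base."""
    return list(seq)


def kmer_tokens(seq: str, k: int) -> List[str]:
    """Fixed, non-overlapping k-mers; a trailing partial k-mer is dropped."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def differing_positions(a: List[str], b: List[str]) -> List[int]:
    """Indices at which two equal-length token lists disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def coverage_percent(num_tokens: int, bases_per_token: int) -> float:
    """Share of the genome visible in one context window, in percent."""
    return 100.0 * num_tokens * bases_per_token / HUMAN_GENOME_BASES


REFERENCE = "ACGTACGTACGT"
VARIANT = "ACGTAGGTACGT"  # toy SNP: C -> G at 0-based position 5

# Character level: exactly the mutated base changes.
CHAR_DIFF = differing_positions(char_tokens(REFERENCE), char_tokens(VARIANT))

# k-mer level: the whole 6-base token containing the SNP becomes a
# different vocabulary entry, so the change is no longer localized.
KMER_DIFF = differing_positions(kmer_tokens(REFERENCE, 6), kmer_tokens(VARIANT, 6))

print(CHAR_DIFF, KMER_DIFF)
print(f"{coverage_percent(4000, 6):.5f}% of the genome in a 4k-token window")
```

With one base per token the SNP shows up as a single changed token out of twelve, whereas with 6-mers it replaces one of only two tokens. Even at six bases per token, a 4k-token window covers well under 0.001% of the genome, in line with the figure quoted above.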
Elon Musk Testifies That He Started OpenAI to Prevent a 'Terminator Outcome'
The judge also warned Musk and Sam Altman to curb their "propensity to use social media to make things worse outside the courtroom" after both sides traded attacks online. Elon Musk and Sam Altman appeared in a federal courtroom together for the first time on Tuesday as they fight over OpenAI's decade-long evolution and what it means for the company's future. The trial in Musk's lawsuit against Altman could result in financial damages and, more significantly, governance changes at OpenAI that may complicate its plans for an initial public offering as soon as this year. As the first witness on the stand, Musk immediately sought to frame his case as more than just about OpenAI. Siding with Altman "will give license to looting every charity in America" and shake the "entire foundation of charitable giving," Musk told a panel of nine jurors advising US District Judge Yvonne Gonzalez Rogers on how to rule.
Are Language Models Actually Useful for Time Series Forecasting?
Large language models (LLMs) are being applied to time series forecasting. But are language models actually useful for time series? In a series of ablation studies on three recent and popular LLM-based time series forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade forecasting performance; in most cases, the results even improve! We also find that despite their significant computational cost, pretrained LLMs do no better than models trained from scratch, do not represent the sequential dependencies in time series, and do not assist in few-shot settings. Additionally, we explore time series encoders and find that patching and attention structures perform similarly to LLM-based forecasters.
Musk says basis of charitable giving at stake in OpenAI lawsuit
A trial pitting two founders of OpenAI - Sam Altman and Elon Musk - against each other has opened in California, with the sides presenting duelling narratives about the company's history and obligations to consumers. Musk, wearing a dark suit and tie, was asked by one of his lawyers what the lawsuit was about when he took the stand. "It's actually very simple," he said. "It's not okay to steal a charity... If it's okay to loot a charity, the entire foundation of charitable giving will be destroyed."