apollo 11
- North America > United States > California (0.14)
- Asia > Bhutan (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- (3 more...)
- Banking & Finance > Economy (1.00)
- Energy (0.93)
- Government > Regional Government > North America Government > United States Government (0.47)
- Education > Educational Setting > Higher Education (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Communications > Social Media (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
A Practical Guide for Evaluating LLMs and LLM-Reliant Systems
Rudd, Ethan M., Andrews, Christopher, Tully, Philip
Recent advances in generative AI have led to remarkable interest in using systems that rely on large language models (LLMs) for practical applications. However, meaningful evaluation of these systems in real-world scenarios comes with a distinct set of challenges that are not well addressed by the synthetic benchmarks and de facto metrics common in the literature. We present a practical evaluation framework that outlines how to proactively curate representative datasets, select meaningful evaluation metrics, and employ evaluation methodologies that integrate well with the practical development and deployment of LLM-reliant systems, which must adhere to real-world requirements and meet user-facing needs.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Information Technology > Security & Privacy (1.00)
- Government (0.71)
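The abstract's recipe — curate a representative dataset, pick task-appropriate metrics, then score the deployed system — can be sketched as a small harness. This is a hedged illustration only: `exact_match`, `evaluate`, and the lookup-table "system" below are placeholders of mine, not components of the authors' framework.

```python
# Minimal sketch of an evaluation loop over a curated dataset.
# The metric and the toy system are illustrative stand-ins.

def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 when the normalized prediction equals the reference."""
    return float(prediction.strip().lower() == reference.strip().lower())

def evaluate(system, dataset, metric=exact_match) -> float:
    """Average a per-example metric over (input, reference) pairs."""
    scores = [metric(system(x), ref) for x, ref in dataset]
    return sum(scores) / len(scores) if scores else 0.0

# Usage with a trivial lookup-table "system":
dataset = [("2+2?", "4"), ("capital of France?", "Paris")]
lookup_system = {"2+2?": "4", "capital of France?": "paris"}.get
print(evaluate(lookup_system, dataset))  # 1.0
```

Swapping in a different `metric` (e.g. a rubric-based or model-graded score) leaves the loop unchanged, which is the point of separating dataset, metric, and system.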
DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation
Wanner, Miriam, Van Durme, Benjamin, Dredze, Mark
The decompose-then-verify strategy for verification of Large Language Model (LLM) generations decomposes claims that are then independently verified. Decontextualization augments text (claims) to ensure it can be verified outside of the original context, enabling reliable verification. While decomposition and decontextualization have been explored independently, their interactions in a complete system have not been investigated. Their conflicting purposes can create tensions: decomposition isolates atomic facts, while decontextualization inserts relevant information. Furthermore, a decontextualized subclaim presents a challenge to the verification step: what part of the augmented text should be verified, now that it contains multiple atomic facts? We conduct an evaluation of different decomposition, decontextualization, and verification strategies and find that the choice of strategy matters in the resulting factuality scores. Additionally, we introduce DnDScore, a decontextualization-aware verification method which validates subclaims in the context of contextual information.
- North America > United States > Alabama (0.05)
- Asia > Singapore (0.04)
- Europe > United Kingdom > Northern Ireland (0.04)
- (7 more...)
- Personal (0.68)
- Research Report (0.64)
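The decompose-then-verify pipeline with decontextualization that the abstract describes can be sketched roughly as follows. The naive sentence splitter, string-based context augmentation, and set-membership verifier are toy stand-ins of mine, not the paper's components.

```python
# Toy decompose -> decontextualize -> verify pipeline for factuality scoring.

def decompose(generation: str) -> list[str]:
    """Naively split a generation into candidate atomic claims."""
    return [s.strip() for s in generation.split(".") if s.strip()]

def decontextualize(claim: str, context: str) -> str:
    """Augment a claim so it can be checked outside the original text."""
    return f"{claim} (context: {context})"

def verify(subclaim: str, knowledge: set[str]) -> bool:
    """Toy verifier: supported if the core claim is in the knowledge set."""
    core = subclaim.split(" (context:")[0]
    return core in knowledge

def factuality_score(generation: str, context: str, knowledge: set[str]) -> float:
    """Fraction of decomposed, decontextualized subclaims that verify."""
    subclaims = [decontextualize(c, context) for c in decompose(generation)]
    if not subclaims:
        return 0.0
    return sum(verify(s, knowledge) for s in subclaims) / len(subclaims)

knowledge = {"Paris is in France", "The Moon orbits Earth"}
print(factuality_score("Paris is in France. The Moon is made of cheese",
                       "an astronomy quiz", knowledge))  # 0.5
```

The tension the abstract names shows up even here: once `decontextualize` has appended context, the verifier must decide which part of the augmented string to check — the sketch simply strips the context back off, which is exactly the design question DnDScore addresses.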
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
Lupidi, Alisia, Gemmell, Carlos, Cancedda, Nicola, Dwivedi-Yu, Jane, Weston, Jason, Foerster, Jakob, Raileanu, Roberta, Lomeli, Maria
Large Language Models still struggle in challenging scenarios that leverage structured data, complex reasoning, or tool usage. In this paper, we propose Source2Synth: a new method that can be used for teaching LLMs new skills without relying on costly human annotations. Source2Synth takes as input a custom data source and produces synthetic data points with intermediate reasoning steps grounded in real-world sources. Source2Synth improves the dataset quality by discarding low-quality generations based on their answerability. We demonstrate the generality of this approach by applying it to two challenging domains: we test reasoning abilities in multi-hop question answering (MHQA), and tool usage in tabular question answering (TQA). Our method improves performance by 25.51% for TQA on WikiSQL and 22.57% for MHQA on HotPotQA compared to the fine-tuned baselines.
- North America > United States (0.47)
- North America > Mexico (0.04)
- North America > Canada (0.04)
- (5 more...)
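Source2Synth's generate-then-curate idea, as the abstract describes it, can be sketched in a few lines: produce synthetic (question, reasoning, answer) examples grounded in a data source, then discard generations that fail an answerability check. The generator and the grounding check below are illustrative stand-ins, not the paper's actual components.

```python
# Sketch of synthetic-data generation plus answerability-based curation.

def generate_examples(source: dict) -> list[dict]:
    """Produce toy QA examples, each with an intermediate reasoning step."""
    return [
        {"question": f"What is the {key}?",
         "reasoning": f"Look up '{key}' in the source.",
         "answer": value}
        for key, value in source.items()
    ]

def answerable(example: dict, source: dict) -> bool:
    """Keep an example only if its answer is grounded in the source."""
    return example["answer"] in source.values()

def curate(candidates: list[dict], source: dict) -> list[dict]:
    """Discard low-quality generations, mirroring the answerability filter."""
    return [ex for ex in candidates if answerable(ex, source)]

source = {"capital": "Paris"}
candidates = generate_examples(source) + [
    {"question": "What is the population?", "reasoning": "", "answer": "unknown"}
]
print(len(curate(candidates, source)))  # 1
```

In the paper the generator and checker are LLM-based and the sources are real-world datasets like WikiSQL and HotPotQA; the structure — generate grounded examples, then filter on answerability — is what the sketch preserves.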
Compositional Generalization for Data-to-Text Generation
Xu, Xinnuo, Titov, Ivan, Lapata, Mirella
Data-to-text generation involves transforming structured data, often represented as predicate-argument tuples, into coherent textual descriptions. Despite recent advances, systems still struggle when confronted with unseen combinations of predicates, producing unfaithful descriptions (e.g., hallucinations or omissions). We refer to this issue as compositional generalization, and it motivated us to create a benchmark for assessing the performance of different approaches on this specific problem. Furthermore, we propose a novel model that addresses compositional generalization by clustering predicates into groups. Our model generates text in a sentence-by-sentence manner, relying on one cluster of predicates at a time. This approach significantly outperforms T5 baselines across all evaluation metrics. Notably, it achieved a 31% improvement over T5 on a metric focused on maintaining faithfulness to the input.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Middle East > Republic of Türkiye > İzmir Province > İzmir (0.05)
- (23 more...)
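The core idea in the abstract — cluster the predicate-argument tuples, then realize the text one sentence per cluster — can be sketched as follows. The grouping key (predicate name) and the string-concatenation "realizer" are illustrative choices of mine; the paper's model learns both.

```python
# Cluster predicate-argument tuples, then generate sentence by sentence.
from itertools import groupby

def cluster_predicates(tuples, key=lambda t: t[0]):
    """Group (predicate, subject, object) tuples by a clustering key."""
    ordered = sorted(tuples, key=key)  # groupby needs sorted input
    return [list(group) for _, group in groupby(ordered, key=key)]

def realize(cluster) -> str:
    """Toy surface realizer: one sentence per cluster of predicates."""
    return "; ".join(f"{subj} {pred} {obj}" for pred, subj, obj in cluster) + "."

def data_to_text(tuples) -> str:
    """Generate text sentence by sentence, one cluster at a time."""
    return " ".join(realize(cluster) for cluster in cluster_predicates(tuples))

triples = [("born_in", "Ada", "London"),
           ("field", "Ada", "mathematics"),
           ("born_in", "Alan", "London")]
print(data_to_text(triples))
# Ada born_in London; Alan born_in London. Ada field mathematics.
```

Because each sentence depends only on one cluster, an unseen *combination* of predicates reduces to combinations of clusters the realizer has already handled — which is the intuition behind the compositional-generalization gain.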
LIMA: Less Is More for Alignment
Zhou, Chunting, Liu, Pengfei, Xu, Puxin, Iyer, Srini, Sun, Jiao, Mao, Yuning, Ma, Xuezhe, Efrat, Avia, Yu, Ping, Yu, Lili, Zhang, Susan, Ghosh, Gargi, Lewis, Mike, Zettlemoyer, Luke, Levy, Omer
Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.
- North America > United States > California (0.14)
- Asia > Bhutan (0.04)
- Africa > Sudan (0.04)
- Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
- Government (1.00)
- Banking & Finance > Economy (1.00)
- Energy (0.68)
- Education (0.68)
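The "standard supervised loss" stage the abstract contrasts with reinforcement learning is commonly implemented by computing next-token loss only on response tokens, masking out the prompt. A minimal sketch of that masking, with made-up token IDs and probabilities (the masking convention is a common practice, not something the abstract specifies):

```python
# Sketch of prompt-masked supervised loss for instruction tuning.
import math

def loss_mask(prompt_tokens, response_tokens):
    """Mask so only response positions contribute to the supervised loss."""
    return [0] * len(prompt_tokens) + [1] * len(response_tokens)

def masked_nll(token_probs, mask) -> float:
    """Average negative log-likelihood over masked-in (response) positions."""
    terms = [-math.log(p) for p, m in zip(token_probs, mask) if m]
    return sum(terms) / len(terms)

mask = loss_mask(prompt_tokens=[11, 12, 13], response_tokens=[21, 22])
print(mask)                                          # [0, 0, 0, 1, 1]
print(masked_nll([0.9, 0.2, 0.5, 1.0, 1.0], mask))   # 0.0
```

LIMA's finding is that this loss over just 1,000 curated pairs suffices — the heavy lifting is already done in pretraining.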
Meet the American who wrote the moon-landing software: Margaret Hamilton, computer whiz and mom
Computer prodigy Hamilton was just 32 years old when Apollo 11 put men on the moon, guided by her innovative software that saved the mission from being aborted minutes before landing on the lunar surface. The Apollo 11 moon landing was one giant leap for womankind. Credit Margaret Hamilton, a 32-year-old mother and computer whiz at the Massachusetts Institute of Technology, who wrote the software that placed Neil Armstrong and Buzz Aldrin on the moon on July 20, 1969. She also worked on the five moon-landing missions that followed. The director of software engineering at MIT's Instrumentation Laboratory, Hamilton was a pioneer of computer science in a transformative era, and on a transformative mission, in human history.
- North America > United States > Massachusetts (0.24)
- North America > United States > Michigan (0.05)
- North America > United States > District of Columbia > Washington (0.05)
- North America > United States > Indiana > Wayne County > Richmond (0.04)
- Government > Space Agency (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Cybereum Newsletter Vol-4
The energy consumption from crypto mining has been increasing exponentially with the growing adoption of crypto. This is becoming a concern, as it should be. Large parts of the world suffer from energy deprivation due to unaffordability and inadequate energy generation. At the same time, climate-change goals will require the world to reduce net emissions, much of which come from electricity generation. Supporting the world's growth and energy needs while reducing emissions, when large populations already suffer from energy deficiency, is a very difficult problem that will require trillions in capital over the coming two decades.
- Government > Regional Government > North America Government > United States Government (1.00)
- Energy (1.00)
- Banking & Finance > Trading (0.76)
- Government > Space Agency (0.75)
MIT deepfake shows Nixon sadly saying the Moon astronauts died
Because the mission succeeded, Nixon never delivered the speech, but MIT engineers used deepfake technology to create a news broadcast in which a digitally-reconstructed Nixon delivers the bad news, WBUR News reports. The deepfake, which will be presented at a film festival Friday, illustrates just how easy it is to make virtual puppets deliver convincing speeches, even if they're totally removed from history. Francesca Panetta, co-director of the larger film in which the deepfake appears, told WBUR that she had someone actually read the script while impersonating Nixon's intonation and then used software to make the recording sound even more like Nixon's voice. It's not the most advanced way to create deepfakes out there, but it still gets the job done. "I had one person say, 'Oh, so you got an impersonator to impersonate Nixon,'" she told WBUR.