Law
It's LIT! Reliability-Optimized LLMs with Inspectable Tools
Zhang, Ruixin, Donnelly, Jon, Guo, Zhicheng, Khalighinejad, Ghazal, Huang, Haiyang, Barnett, Alina Jade, Rudin, Cynthia
Large language models (LLMs) have exhibited remarkable capabilities across various domains. The ability to call external tools further expands their capability to handle real-world tasks. However, LLMs often follow an opaque reasoning process, which limits their usefulness in high-stakes domains where solutions need to be trustworthy to end users. LLMs can choose solutions that are unreliable and difficult to troubleshoot, even if better options are available. We address this issue by forcing LLMs to use external -- more reliable -- tools to solve problems when possible. We present a framework built on the tool-calling capabilities of existing LLMs to enable them to select the most reliable and easy-to-troubleshoot solution path, which may involve multiple sequential tool calls. We refer to this framework as LIT (LLMs with Inspectable Tools). In order to support LIT, we introduce a new and challenging benchmark dataset of 1,300 questions and a customizable set of reliability cost functions associated with a collection of specialized tools. These cost functions summarize how reliable each tool is and how easy it is to troubleshoot. For instance, a calculator is reliable across domains, whereas a linear prediction model is not reliable if there is distribution shift, but it is easy to troubleshoot. A tool that constructs a random forest is neither reliable nor easy to troubleshoot. These tools interact with the Harvard USPTO Patent Dataset and a new dataset of NeurIPS 2023 papers to solve mathematical, coding, and modeling problems of varying difficulty levels. We demonstrate that LLMs can achieve more reliable and informed problem-solving while maintaining task performance using our framework.
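The idea of choosing a solution path by reliability cost can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not give the actual cost functions, so the `Tool` class, the numeric costs, the `weight` parameter, and the additive path cost are all hypothetical. Only the ordering follows the abstract's description (calculator reliable, linear model easy to troubleshoot but less reliable, random forest neither).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    reliability_cost: float    # hypothetical scale: lower means more reliable
    troubleshoot_cost: float   # hypothetical scale: lower means easier to debug


# Made-up costs that only mirror the qualitative ordering in the abstract.
TOOLS = {
    "calculator": Tool("calculator", 0.0, 0.0),
    "linear_model": Tool("linear_model", 2.0, 1.0),
    "random_forest": Tool("random_forest", 3.0, 3.0),
}


def path_cost(path, weight=1.0):
    """Sum per-tool costs along a sequence of tool calls (assumed additive)."""
    return sum(
        TOOLS[name].reliability_cost + weight * TOOLS[name].troubleshoot_cost
        for name in path
    )


def select_path(candidates, weight=1.0):
    """Return the candidate tool-call sequence with the lowest total cost."""
    return min(candidates, key=lambda path: path_cost(path, weight))


candidates = [
    ["random_forest"],
    ["linear_model", "calculator"],
]
best = select_path(candidates)
print(best, path_cost(best))
```

With these illustrative numbers, the two-step path through the linear model and the calculator is preferred over a single random forest call, because its summed cost is lower.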
FinTRec: Transformer Based Unified Contextual Ads Targeting and Personalization for Financial Applications
Katariya, Dwipam, Varma, Snehita, Shreemali, Akshat, Wu, Benjamin, Mishra, Kalanand, Mohanty, Pranab
Transformer-based architectures are widely adopted in sequential recommendation systems, yet their application in Financial Services (FS) presents distinct practical and modeling challenges for real-time recommendation. These include: a) long-range user interactions (implicit and explicit) spanning both digital and physical channels, which generate temporally heterogeneous context, and b) the presence of multiple interrelated products, which requires coordinated models to support varied ad placements and personalized feeds while balancing competing business goals. We propose FinTRec, a transformer-based framework that addresses these challenges and its operational objectives in FS. While tree-based models have traditionally been preferred in FS due to their explainability and alignment with regulatory requirements, our study demonstrates that FinTRec offers a viable and effective shift toward transformer-based architectures. Through historic simulation and live A/B test correlations, we show that FinTRec consistently outperforms the production-grade tree-based baseline. The unified architecture, when fine-tuned for product adaptation, enables cross-product signal sharing and reduces training cost and technical debt, while improving offline performance across all products. To our knowledge, this is the first comprehensive study of unified sequential recommendation modeling in FS that addresses both technical and business considerations.
MelodySim: Measuring Melody-aware Music Similarity for Plagiarism Detection
Lu, Tongyu, Geist, Charlotta-Marlena, Melechovsky, Jan, Roy, Abhinaba, Herremans, Dorien
We propose MelodySim, a melody-aware music similarity model and dataset for plagiarism detection. First, we introduce a novel method to construct a dataset focused on melodic similarity. By augmenting Slakh2100, an existing MIDI dataset, we generate variations of each piece while preserving the melody through modifications such as note splitting, arpeggiation, minor track dropout, and re-instrumentation. A user study confirms that positive pairs indeed contain similar melodies, while other musical tracks are significantly changed. Second, we develop a segment-wise melodic-similarity detection model that uses a MERT encoder and applies a triplet neural network to capture melodic similarity. The resulting decision matrix highlights where plagiarism might occur. The experiments show that our model is able to outperform baseline models in detecting similar melodic fragments on the MelodySim test set.
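The triplet objective mentioned above can be sketched in a few lines. This is the standard margin-based triplet loss, not necessarily the exact formulation used in MelodySim: the Euclidean distance, the margin of 1.0, and the two-dimensional vectors (standing in for segment embeddings from an encoder such as MERT) are assumptions for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: push the negative at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy embeddings (hypothetical): the positive shares the anchor's melody,
# the negative does not.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

loss_easy = triplet_loss(anchor, positive, negative)  # negative already far away
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped: violates margin
print(loss_easy, loss_hard)
```

The loss is zero when the negative is already farther from the anchor than the positive by at least the margin, and grows when a melodically different segment sits closer than a similar one.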
Long-form factuality in large language models
Wei, Jerry, Yang, Chengrun, Song, Xinying, Lu, Yifeng
To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE).
The Biggest AI Companies Met to Find a Better Path for Chatbot Companions
In a closed-door workshop led by Anthropic and Stanford, leading AI startups and researchers discussed guidelines for chatbot companions, especially for younger users. At Stanford for eight hours on Monday, representatives from Anthropic, Apple, Google, OpenAI, Meta, and Microsoft met in a closed-door workshop to discuss the use of chatbots as companions or in roleplay scenarios. Interactions with AI tools are often mundane, but they can also lead to dire outcomes. Users sometimes experience mental breakdowns during lengthy conversations with chatbots or confide in them about their suicidal ideations. "We need to have really big conversations across society about what role we want AI to play in our future as humans who are interacting with each other," says Ryn Linthicum, head of user well-being policy at Anthropic.
Ex Treasury boss Summers resigns from OpenAI after named in Epstein files
Former United States Treasury Secretary Larry Summers has resigned from the OpenAI board, days after US President Donald Trump ordered the Justice Department to investigate his and other prominent Democrats' ties to convicted sex offender Jeffrey Epstein. The outlet Axios first reported the resignation on Wednesday. "We appreciate his many contributions and the perspective he brought to the Board," OpenAI's board of directors said in a statement. The move comes one day after the Republican-controlled US Congress voted almost unanimously to force the release of Department of Justice files on Epstein, an outcome Trump had fought for months before ending his opposition. Summers has served on the OpenAI board since late 2023, following the brief removal of the ChatGPT maker's CEO, Sam Altman. Other prominent companies with ties to Summers include edu-tech firm Skillsoft, where he has been a board member since 2021, and Santander, where he chairs the bank's international advisory board. He is also a former president of Harvard University. The resignation comes after Summers announced that he would step back from all other public commitments to "rebuild trust and repair relationships with the people closest to me". "Everyone in Washington has known who Larry Summers is for decades.
In Alex Karp's World, Palantir Is the Underdog
My parents didn't go to college, but his father was a pediatrician, Jewish American. His mother was an artist that still is an artist, and she's African American. So he is Black and Jewish parentage. He is dyslexic, and that's a big part of his identity. And when we talked about going to Central High School, which is kind of a magnet school, it's all academic, and it draws from all over the city.
Londoners are baffled as a huge AI-generated Christmas mural appears over Côte Brasserie in Kingston - so, can you see what's wrong with it?
Larry Summers resigns from OpenAI board after Epstein emails made public
Former US treasury secretary Larry Summers is stepping down from the board at OpenAI, a week after a tranche of emails between him and late convicted sex offender Jeffrey Epstein was released. Summers said in a statement to the BBC that he was grateful for the opportunity to have served, excited about the potential of the company, and looked forward to following their progress. Summers, who was also once the president of Harvard University, said on Monday that he would be stepping back from public commitments over his ties to Epstein. The recently released emails showed Summers communicated with Epstein until the day before Epstein's 2019 arrest for the alleged sex trafficking of minors. In a statement, the artificial intelligence company said it respected Summers' decision to resign.