decision factor
Evaluating LLM Understanding via Structured Tabular Decision Simulations
Li, Sichao, Xu, Xinyue, Li, Xiaomeng
Large language models (LLMs) often achieve impressive predictive accuracy, yet correctness alone does not imply genuine understanding. True LLM understanding, analogous to human expertise, requires making consistent, well-founded decisions across multiple instances and diverse domains, relying on relevant and domain-grounded decision factors. We introduce Structured Tabular Decision Simulations (STaDS), a suite of expert-like decision settings that evaluate LLMs as if they were professionals undertaking structured decision "exams". In this context, understanding is defined as the ability to identify and rely on the correct decision factors, features that determine outcomes within a domain. STaDS jointly assesses understanding through: (i) question and instruction comprehension, (ii) knowledge-based prediction, and (iii) reliance on relevant decision factors. By analyzing 9 frontier LLMs across 15 diverse decision settings, we find that (a) most models struggle to achieve consistently strong accuracy across diverse domains; (b) models can be accurate yet globally unfaithful, and there are frequent mismatches between stated rationales and factors driving predictions. Our findings highlight the need for global-level understanding evaluation protocols and advocate for novel frameworks that go beyond accuracy to enhance LLMs' understanding ability.
- Asia (0.28)
- North America (0.27)
- Education (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.93)
- Banking & Finance (0.93)
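STaDS asks whether a model relies on the right decision factors across many instances, not only whether one answer is correct. One simple way to probe that kind of global reliance, sketched here under our own assumptions and not as the STaDS protocol, is a permutation-style check: shuffle one feature across instances, count how often predictions flip, and compare the most influential features with the factors the model states in its rationale. The names `flip_rates` and `rationale_overlap`, the Jaccard overlap score, and the toy `predict` function (a stand-in for an LLM call) are all hypothetical.

```python
import random
from typing import Callable, Dict, Iterable, Sequence

Row = Dict[str, float]


def flip_rates(predict: Callable[[Row], int], rows: Sequence[Row],
               features: Sequence[str], seed: int = 0) -> Dict[str, float]:
    """Permutation-style reliance check: shuffle one feature's column across
    instances and record the fraction of predictions that change."""
    rng = random.Random(seed)
    base = [predict(r) for r in rows]
    rates: Dict[str, float] = {}
    for f in features:
        shuffled = [r[f] for r in rows]
        rng.shuffle(shuffled)
        flips = sum(predict({**r, f: v}) != b
                    for r, v, b in zip(rows, shuffled, base))
        rates[f] = flips / len(rows)
    return rates


def rationale_overlap(stated: Iterable[str], rates: Dict[str, float], k: int) -> float:
    """Jaccard overlap between the factors a model states and the k factors
    with the highest flip rates (ties keep dictionary order)."""
    driving = set(sorted(rates, key=rates.get, reverse=True)[:k])
    stated_set = set(stated)
    union = stated_set | driving
    return len(stated_set & driving) / len(union) if union else 1.0


if __name__ == "__main__":
    rows = [{"income": 10, "age": 30}, {"income": 90, "age": 40},
            {"income": 20, "age": 50}, {"income": 80, "age": 60}]
    # Toy predictor standing in for an LLM: it only ever reads "income".
    predict = lambda r: int(r["income"] > 50)
    rates = flip_rates(predict, rows, ["income", "age"])
    print(rates)  # "age" is never read, so its flip rate is 0.0
    # The model "states" both factors, but only one is in the top-1 set.
    print(rationale_overlap(["age", "income"], rates, k=1))  # 0.5
```

A model that cites "age" as a reason while its predictions never change when age is shuffled is one concrete form of the stated-versus-actual mismatch the abstract reports.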
Expertise Is What We Want
Ashworth, Alan, Al-Dajani, Munir, Duchicela, Keegan, Kafadarov, Kiril, Kurian, Allison, Laraki, Othman, Lazrak, Amina, Mandair, Divneet, McKennon, Wendy, Miksad, Rebecca, Sanghvi, Jayodita, Zack, Travis
Clinical decision-making depends on expert reasoning, which is guided by standardized, evidence-based guidelines. However, translating these guidelines into automated clinical decision support systems risks inaccuracy and, importantly, loss of nuance. We share an application architecture, the Large Language Expert (LLE), that combines the flexibility and power of Large Language Models (LLMs) with the interpretability, explainability, and reliability of Expert Systems. LLMs help address key challenges of Expert Systems, such as integrating and codifying knowledge and normalizing data. Conversely, an Expert System-like approach helps overcome challenges with LLMs, including hallucinations, the difficulty of making atomic and inexpensive updates, and limited testability. To highlight the power of the LLE approach, we built an LLE to assist with the workup of patients newly diagnosed with cancer. Timely initiation of cancer treatment is critical for optimal patient outcomes. However, the increasing complexity of diagnostic recommendations has made it difficult for primary care physicians to ensure that their patients have completed the necessary workup before their first visit with an oncologist. As with many real-world clinical tasks, these workups require the analysis of unstructured health records and the application of nuanced clinical decision logic. In this study, we describe the design and evaluation of an LLE system built to rapidly identify and suggest the correct diagnostic workup. The system demonstrated high, clinical-level accuracy (>95%) and effectively addressed gaps identified in real-world data from breast and colon cancer patients at a large academic center.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Workflow (1.00)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.66)
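The LLE abstract splits the work between an LLM that reads and normalizes unstructured records and Expert System-like logic that makes the decision. A minimal sketch of that split, under our own assumptions: `extract_facts` is a hypothetical stand-in for the LLM step (a keyword check here), and the rule names, item names, and "diagnosis A" label are placeholders, not clinical guidance and not the authors' rule base. Each rule is a small frozen object that can be added, replaced, or unit-tested on its own, which is one way to obtain the atomic updates and testability the abstract mentions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Facts = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """One atomic piece of decision logic that can be changed or tested alone."""
    name: str
    applies: Callable[[Facts], bool]
    required_item: str


def extract_facts(record_text: str) -> Facts:
    """Hypothetical stand-in for the LLM step that turns an unstructured
    record into normalized facts; a keyword check is used here instead."""
    text = record_text.lower()
    return {
        "diagnosis_a": "diagnosis a" in text,
        "completed": {item for item in ("test_x", "test_y")
                      if f"{item} done" in text},
    }


def missing_workup(facts: Facts, rules: List[Rule]) -> Dict[str, str]:
    """Return each required-but-incomplete item with the rule that requires it,
    so every suggestion can be traced back to explicit logic."""
    completed = facts.get("completed", set())
    gaps: Dict[str, str] = {}
    for rule in rules:
        if rule.applies(facts) and rule.required_item not in completed:
            gaps.setdefault(rule.required_item, rule.name)
    return gaps


# Placeholder rules for illustration only, not clinical guidance.
RULES = [
    Rule("R1: diagnosis A requires test_x", lambda f: bool(f.get("diagnosis_a")), "test_x"),
    Rule("R2: diagnosis A requires test_y", lambda f: bool(f.get("diagnosis_a")), "test_y"),
]

if __name__ == "__main__":
    facts = extract_facts("Diagnosis A confirmed; test_x done.")
    print(missing_workup(facts, RULES))  # {'test_y': 'R2: diagnosis A requires test_y'}
```

Because the LLM only produces facts and never the final recommendation, a wrong suggestion can be traced either to an extraction error or to a specific rule.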
Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing
Zhou, Hao, Li, Shaoming, Jiang, Guibin, Zheng, Jiaqi, Wang, Dong
Marketing is an important mechanism to increase user engagement and improve platform revenue, and heterogeneous causal learning can help develop more effective strategies. Most decision-making problems in marketing can be formulated as resource allocation problems and have been studied for decades. Existing works usually divide the solution procedure into two fully decoupled stages, i.e., machine learning (ML) and operations research (OR): the first stage predicts the model parameters, which are then fed to the optimization in the second stage. However, the errors in the parameters predicted by ML are not accounted for, and the series of complex mathematical operations in OR leads to increased accumulated errors. Essentially, improved precision in the predicted parameters may not translate into a better final solution because of this side effect of the decoupled design. In this paper, we propose a novel approach for solving resource allocation problems that mitigates these side effects. Our key idea is to introduce a decision factor that bridges ML and OR, so that the solution can be obtained directly in OR by performing only sorting or comparison operations on the decision factor. Furthermore, we design a customized loss function that conducts direct heterogeneous causal learning on the decision factor, and an unbiased estimate of the decision factor is guaranteed when the loss converges. As a case study, we apply our approach to two crucial problems in marketing: the binary treatment assignment problem and the budget allocation problem with multiple treatments. Both large-scale simulations and online A/B tests demonstrate that our approach achieves significant improvement over state-of-the-art methods.
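The abstract states that, once a decision factor is available, the allocation reduces to sorting or comparison, but it does not define the factor here. The sketch below assumes the factor has already been learned and shows only the OR side. For the binary case it takes one number per user (for example, a hypothetical predicted uplift per unit of cost) and treats users in descending order. For multiple treatments it swaps in a standard Lagrangian relaxation, with each user choosing the arm that maximizes value minus `lam` times cost; that formulation is our assumption, not necessarily the paper's. The function names are hypothetical, and the paper's custom loss is not reproduced.

```python
from typing import List, Sequence


def assign_binary(factor: Sequence[float], cost: Sequence[float],
                  budget: float) -> List[int]:
    """Binary treatment assignment: visit users in descending order of their
    decision factor and treat each one whose cost still fits the budget."""
    order = sorted(range(len(factor)), key=lambda i: factor[i], reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        if factor[i] <= 0:
            break  # every remaining user has a non-positive factor
        if spent + cost[i] <= budget:
            chosen.append(i)
            spent += cost[i]
    return sorted(chosen)


def pick_treatments(values: Sequence[Sequence[float]],
                    costs: Sequence[Sequence[float]], lam: float) -> List[int]:
    """For a fixed multiplier lam, each user takes the treatment j maximizing
    values[u][j] - lam * costs[u][j]. Index 0 is the control arm (value 0,
    cost 0); ties go to the lower index."""
    return [max(range(len(v)), key=lambda j: v[j] - lam * c[j])
            for v, c in zip(values, costs)]


def allocate_budget(values, costs, budget: float, iters: int = 60) -> List[int]:
    """Bisect on lam until total spend fits the budget; spend never
    increases as lam grows."""
    def spend(lam: float) -> float:
        picks = pick_treatments(values, costs, lam)
        return sum(c[j] for c, j in zip(costs, picks))

    lo, hi = 0.0, 1.0
    for _ in range(100):  # grow hi until the allocation is feasible
        if spend(hi) <= budget:
            break
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if spend(mid) > budget:
            lo = mid
        else:
            hi = mid
    return pick_treatments(values, costs, hi)
```

In both functions the per-user decision is a sort or an argmax, matching the abstract's claim that OR needs only sorting or comparison; the accuracy of the result then depends entirely on how well the factor (or per-arm values) was learned.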
Artificial Intelligence can Now Optimize Vibrations of Complex Systems
The dynamic behavior of a machine tool (MT) plays an essential role in meeting the main machining requirements, such as high-speed operation, precise axis positioning, and the ability to rapidly remove a large amount of workpiece material. These capabilities are directly related to the materials used in MT construction. Consequently, materials for MT foundations and moving parts should be chosen for good dynamic properties and the ability to damp mechanical vibrations. Technical systems are becoming increasingly complex and, at the same time, physically lighter. Given these challenges, the vibration optimization of lightweight structures can become so complex that it can no longer be handled by traditional techniques.
Putting the Art in Smart … and the IoT in Idiot #03 : Connect the Dots
As we've talked about "Going Broad" and "Embracing Fuzziness," we've mentioned cause-and-effect relationships, understanding upstream and downstream impacts, and correlating "fuzzy" inputs. So by this point, you understand that linking data is more important than just collecting data. So maybe this will be a very short chapter. There it is, 4 sentences and we're all done. Just within the last few weeks, you've heard someone say "We'll collect the data, and then we'll figure out what to do with it."
That Space Cadet Glow No.29
One of the biggest risks of Artificial Intelligence is that no-one really understands how it works. By that I mean that when a complex algorithm works something out, it is extremely difficult to reverse engineer the process that was followed to get to that decision (a bit like recreating an egg from an omelette). This opaqueness of the inner workings of the algorithms is a worry for people who rely on the kinds of decisions they make, such as credit approvals, medical diagnoses and financial trading. This allows it to test for discrimination, either in the underlying models used or in the way the AI has been trained: any adaptive system will inherently reflect the biases of the information (and people) used to train it. So, if a system to filter job applications is built, it could easily perform that role with, say, a gender bias if the training data and any subsequent reinforcement included that bias.
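Even when the inner workings stay opaque, a job-application filter can still be checked for the kind of gender bias described above by looking only at its outcomes. The sketch below is a minimal black-box audit under our own assumptions, not the specific tool the post refers to: it compares acceptance rates per group and reports the ratio of the lowest rate to the highest. A common rule of thumb in US employment practice, the four-fifths rule, treats a ratio below 0.8 as a warning sign. The function names and the decision data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def selection_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of applicants accepted within each group."""
    totals: Dict[str, int] = defaultdict(int)
    accepted: Dict[str, int] = defaultdict(int)
    for group, was_accepted in decisions:
        totals[group] += 1
        accepted[group] += int(was_accepted)
    return {g: accepted[g] / totals[g] for g in totals}


def impact_ratio(rates: Dict[str, float]) -> float:
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


if __name__ == "__main__":
    # Hypothetical screening decisions: (group, accepted?)
    decisions = [("F", True), ("F", False), ("F", False), ("F", False),
                 ("M", True), ("M", True), ("M", False), ("M", False)]
    rates = selection_rates(decisions)
    print(rates)                # {'F': 0.25, 'M': 0.5}
    print(impact_ratio(rates))  # 0.5, well below the 0.8 rule of thumb
```

An outcome check like this says that a disparity exists, not why; tracing it back to the training data or the reinforcement signal still requires looking inside the pipeline.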
Predicting Suicide Attacks: A Fuzzy Soft Set Approach
This paper models a decision support system to predict the occurrence of suicide attacks in a given collection of cities. The system comprises two parts. The first part analyzes and identifies the factors that affect the prediction. Because incomplete information and the use of linguistic terms by experts are two characteristic features of this particular prediction problem, we exploit the theory of fuzzy soft sets. Hence, Part 2 of the model is an algorithm, namely FSP, which takes the assessment of factors given in Part 1 as its input and produces a possibility profile of cities likely to experience an attack. The algorithm has O(2^n) complexity. It is illustrated by an example solved in detail. Simulation results for the algorithm are presented, giving insight into the strengths and weaknesses of FSP. Three different decision-making measures are simulated and compared in our discussion.
- Asia > Pakistan > Punjab > Lahore Division > Lahore (0.05)
- North America > United States (0.04)
- Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.04)
- Asia > Pakistan > Sindh > Karachi Division > Karachi (0.04)
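In a fuzzy soft set, each parameter (here, a factor from Part 1) maps to a fuzzy subset of the universe (here, the cities), i.e., a membership degree in [0, 1] for every city. The abstract does not give the FSP algorithm itself, so the sketch below is only a generic weighted "choice value" scoring on a fuzzy soft set that turns factor assessments into a ranked possibility profile. It runs in time linear in factors times cities, so it is not the O(2^n) FSP procedure. It assumes experts' linguistic ratings have already been mapped to membership degrees; the factor names, cities, weights, and memberships are hypothetical.

```python
from typing import Dict, Mapping, Optional

# A fuzzy soft set: each parameter (factor) maps to a fuzzy subset of cities,
# given as a membership degree in [0, 1] per city.
FuzzySoftSet = Mapping[str, Mapping[str, float]]


def possibility_profile(fss: FuzzySoftSet,
                        weights: Optional[Mapping[str, float]] = None) -> Dict[str, float]:
    """Weighted average membership of each city across all factors, sorted
    from most to least likely. Missing memberships count as 0."""
    w = dict(weights) if weights else {e: 1.0 for e in fss}
    total = sum(w.get(e, 0.0) for e in fss)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    cities = {u for fuzzy in fss.values() for u in fuzzy}
    profile = {u: sum(w.get(e, 0.0) * fss[e].get(u, 0.0) for e in fss) / total
               for u in cities}
    return dict(sorted(profile.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical expert assessments for three factors and two cities.
    fss = {
        "factor_1": {"City A": 0.8, "City B": 0.2},
        "factor_2": {"City A": 0.4, "City B": 0.6},
        "factor_3": {"City A": 0.6, "City B": 0.4},
    }
    print(possibility_profile(fss))
```

Changing `weights` is one simple way to encode that some factors matter more than others; how FSP itself combines the Part 1 assessments is not specified in this summary.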