 La Crosse County


LLM-Match: An Open-Sourced Patient Matching Model Based on Large Language Models and Retrieval-Augmented Generation

arXiv.org Artificial Intelligence

Patient matching is the process of linking patients to appropriate clinical trials by accurately identifying and matching their medical records with trial eligibility criteria. We propose LLM-Match, a novel framework for patient matching leveraging fine-tuned open-source large language models. Our approach consists of four key components. First, a retrieval-augmented generation (RAG) module extracts relevant patient context from a vast pool of electronic health records (EHRs). Second, a prompt generation module constructs input prompts by integrating trial eligibility criteria (both inclusion and exclusion criteria), patient context, and system instructions. Third, a fine-tuning module with a classification head optimizes the model parameters using structured prompts and ground-truth labels. Fourth, an evaluation module assesses the fine-tuned model's performance on the testing datasets. We evaluated LLM-Match on four open datasets - n2c2, SIGIR, TREC 2021, and TREC 2022 - using open-source models, comparing it against TrialGPT, Zero-Shot, and GPT-4-based closed models. LLM-Match outperformed all baselines.
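The prompt generation step described above can be illustrated with a minimal sketch. It only shows how inclusion criteria, exclusion criteria, retrieved patient context, and a system instruction might be combined into one structured prompt. The `Trial` class, the `build_prompt` function, the instruction wording, and the prompt layout are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """Hypothetical container for a trial's eligibility criteria."""
    trial_id: str
    inclusion: list[str] = field(default_factory=list)
    exclusion: list[str] = field(default_factory=list)


# Assumed system instruction; the abstract does not give the actual wording.
SYSTEM_INSTRUCTION = (
    "Decide whether the patient is eligible for the clinical trial. "
    "Answer with a single label."
)


def build_prompt(trial: Trial, patient_context: list[str]) -> str:
    """Combine system instruction, criteria, and retrieved patient context."""
    def bullets(items: list[str]) -> str:
        # Render a list as bullet lines, with a placeholder when it is empty.
        return "\n".join(f"- {item}" for item in items) or "- (none)"

    return "\n\n".join([
        f"System: {SYSTEM_INSTRUCTION}",
        f"Trial: {trial.trial_id}",
        "Inclusion criteria:\n" + bullets(trial.inclusion),
        "Exclusion criteria:\n" + bullets(trial.exclusion),
        "Patient context:\n" + bullets(patient_context),
    ])
```

In a pipeline like the one described, the `patient_context` list would come from the retrieval module, and the resulting string would be paired with a ground-truth label for fine-tuning.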


Bi-Filtration and Stability of TDA Mapper for Point Cloud Data

arXiv.org Machine Learning

Carlsson, Singh and Memoli's TDA mapper takes a point cloud dataset and outputs a graph that depends on several parameter choices. Dey, Memoli, and Wang developed Multiscale Mapper for abstract topological spaces so that parameter choices can be analyzed via persistent homology. However, when applied to actual data, one does not always obtain filtrations of mapper graphs. DBSCAN, one of the most common clustering algorithms used in the TDA mapper software, has two parameters, $\epsilon$ and MinPts. If MinPts = 1 then DBSCAN is equivalent to single linkage clustering with cutting height $\epsilon$. We show that if DBSCAN clustering is used with MinPts > 2, a filtration of mapper graphs may not exist except in the absence of free-border points; but such filtrations exist if DBSCAN clustering is used with MinPts = 1 or 2 as the cover size increases, $\epsilon$ increases, and/or MinPts decreases. However, the 1-dimensional filtration is unstable. If one adds noise to a data set so that each data point has been perturbed by a distance at most $\delta$, the persistent homology of the mapper graph of the perturbed data set can be significantly different from that of the original data set. We show that we can obtain stability by increasing both the cover size and $\epsilon$ at the same time. In particular, we show that the bi-filtrations of the homology groups with respect to cover size and $\epsilon$ between these two datasets are $2\delta$-interleaved.
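The equivalence between DBSCAN with MinPts = 1 and single linkage clustering can be checked on small examples. The sketch below is a plain-Python illustration, not the TDA mapper software's implementation. It assumes Euclidean distance, counts a point as its own neighbor, and treats two points as linked when their distance is at most `eps` (conventions differ on strict versus non-strict inequality). The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def single_linkage(points, eps):
    """Single-linkage clusters cut at height eps, computed with union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge every pair of points whose distance is at most eps.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), set()).add(i)
    return {frozenset(c) for c in clusters.values()}


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns clusters as index sets (noise is omitted)."""
    n = len(points)
    # Each neighborhood includes the point itself (distance 0 <= eps).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(neighbors[i]) >= min_pts for i in range(n)]
    label = [None] * n
    clusters = []
    for i in range(n):
        if label[i] is not None or not core[i]:
            continue
        cid = len(clusters)
        members, stack = set(), [i]
        while stack:
            p = stack.pop()
            if label[p] is not None:
                continue
            label[p] = cid
            members.add(p)
            if core[p]:  # only core points expand the cluster
                stack.extend(q for q in neighbors[p] if label[q] is None)
        clusters.append(frozenset(members))
    return set(clusters)
```

With `min_pts=1` every point is a core point, so DBSCAN clusters are exactly the connected components of the graph joining points within `eps`, which is what single linkage produces at that cutting height. Comparing `dbscan(points, eps, 1)` with `single_linkage(points, eps)` on a small point set should give the same partition.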


Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC) -- an end-to-end model for characterizing severity and diagnosis

arXiv.org Artificial Intelligence

Automated classification of cancer pathology reports can extract information from unstructured reports and categorize each report into structured diagnosis and severity categories. Such a system can reduce the burden of populating tumor registries, help with clinical trial registration, and support building large datasets for deep learning model development using true pathologic ground truth. However, the content of breast pathology reports can be difficult to categorize due to the high linguistic variability in content and the wide variety of potential diagnoses (more than 50). Existing NLP models primarily focus on developing classifiers for primary breast cancer types (e.g. IDC, DCIS, ILC) and tumor characteristics, and ignore the rare diagnoses of cancer subtypes. We developed a hierarchical hybrid transformer-based pipeline (59 labels), the Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC), which utilizes the potential of the transformer context-preserving NLP technique, and compared our model to several state-of-the-art ML and DL models. We trained the model on the EUH data and evaluated our model's performance on two external datasets, MGH and Mayo Clinic. We publicly release the code and a live application in a Hugging Face Spaces repository.


Sketching Robot Programs On the Fly

arXiv.org Artificial Intelligence

Service robots for personal use in the home and the workplace require end-user development solutions for swiftly scripting robot tasks as the need arises. Many existing solutions preserve ease, efficiency, and convenience through simple programming interfaces or by restricting task complexity. Others facilitate meticulous task design but often do so at the expense of simplicity and efficiency. There is a need for robot programming solutions that reconcile the complexity of robotics with the on-the-fly goals of end-user development. In response to this need, we present a novel, multimodal, and on-the-fly development system, Tabula. Inspired by a formative design study with a prototype, Tabula leverages a combination of spoken language for specifying the core of a robot task and sketching for contextualizing the core. The result is that developers can script partial, sloppy versions of robot programs to be completed and refined by a program synthesizer. Lastly, we demonstrate our anticipated use cases of Tabula via a set of application scenarios.


AI reduces miss rate of precancerous polyps in colorectal cancer screening

#artificialintelligence

Most colon polyps are harmless, but some over time develop into colon or rectal cancer, which can be fatal if found in its later stages. Colorectal cancer is the second most deadly cancer in the world, with an estimated 1.9 million cases and 916,000 deaths worldwide in 2020, according to the World Health Organization. A colonoscopy is an exam used to detect changes or abnormalities in the large intestine (colon) and rectum. Between February 2020 and May 2021, 230 study participants each underwent two back-to-back colonoscopies on the same day at eight hospitals and community clinics in the U.S., U.K. and Italy. One colonoscopy used AI; the other, a standard colonoscopy, did not. The rate at which precancerous colorectal polyps are missed has been estimated to be 25%.


String Shuffling over a Gap between Parsing and Plan Recognition

AAAI Conferences

We propose a new probabilistic plan recognition algorithm YR based on an extension of Tomita's Generalized LR (GLR) parser for grammars enriched with the shuffle operator. YR significantly outperforms previous approaches based on top-down parsers, shows more consistent run times among similar libraries, and degrades more gracefully as plan library complexity increases. YR also lifts the restrictions on left-recursion imposed by approaches based on top-down parsing algorithms. We further propose that context-free shuffle grammars, more than traditional context-free grammars, should be seen as the appropriate analogue of HTN plan libraries in the correspondence of plan recognition and parsing.
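For readers unfamiliar with the shuffle operator, the sketch below enumerates the shuffle of two strings in its standard formal-language sense: every interleaving that preserves the internal order of each string. This is offered as an illustration of the operator only, under the assumption that the abstract uses the standard definition; it is not the YR algorithm, and the function name is hypothetical.

```python
def shuffle(u: str, v: str) -> set[str]:
    """Return every interleaving of u and v that keeps each string's order."""
    if not u:
        return {v}
    if not v:
        return {u}
    # The next symbol comes either from the front of u or from the front of v.
    return ({u[0] + w for w in shuffle(u[1:], v)}
            | {v[0] + w for w in shuffle(u, v[1:])})
```

Reading each string as the action sequence of one plan, the shuffle describes the observation sequences that can arise when two plans are executed in an interleaved fashion, which is the kind of input a plan recognizer has to handle.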


Machine Bias: There's Software Used Across the Country to Predict Future Criminals. And it's Biased Against Blacks.

#artificialintelligence

On a spring afternoon in 2014, Brisha Borden was running late to pick up her god-sister from school when she spotted an unlocked kid's blue Huffy bicycle and a silver Razor scooter. Borden and a friend grabbed the bike and scooter and tried to ride them down the street in the Fort Lauderdale suburb of Coral Springs. Just as the 18-year-old girls were realizing they were too big for the tiny conveyances -- which belonged to a 6-year-old boy -- a woman came running after them saying, "That's my kid's stuff." Borden and her friend immediately dropped the bike and scooter and walked away. But it was too late -- a neighbor who witnessed the heist had already called the police. Borden and her friend were arrested and charged with burglary and petty theft for the items, which were valued at a total of $80. Compare their crime with a similar one: The previous summer, 41-year-old Vernon Prater was picked up for shoplifting $86.35 worth of tools from a nearby Home Depot store. Prater was the more seasoned criminal. He had already been convicted of armed robbery and attempted armed robbery, for which he served five years in prison, in addition to another armed robbery charge. Borden had a record, too, but it was for misdemeanors committed when she was a juvenile.


The Legal System Uses an Algorithm to Predict If People Might Be Future Criminals. It's Biased Against Blacks.

Mother Jones

On a spring afternoon in 2014, Brisha Borden was running late to pick up her god-sister from school when she spotted an unlocked kid's blue Huffy bicycle and a silver Razor scooter. Borden and a friend grabbed the bike and scooter and tried to ride them down the street in the Fort Lauderdale suburb of Coral Springs. Just as the 18-year-old girls were realizing they were too big for the tiny conveyances--which belonged to a 6-year-old boy--a woman came running after them saying, "That's my kid's stuff." Borden and her friend immediately dropped the bike and scooter and walked away. But it was too late--a neighbor who witnessed the heist had already called the police. Borden and her friend were arrested and charged with burglary and petty theft for the items, which were valued at a total of $80. Compare their crime with a similar one: The previous summer, 41-year-old Vernon Prater was picked up for shoplifting $86.35 worth of tools from a nearby Home Depot store. Prater was the more seasoned criminal. He had already been convicted of armed robbery and attempted armed robbery, for which he served five years in prison, in addition to another armed robbery charge. Borden had a record, too, but it was for misdemeanors committed when she was a juvenile.


Complexity of Self-Preserving, Team-Based Competition in Partially Observable Stochastic Games

AAAI Conferences

Partially observable stochastic games (POSGs) are a robust and precise model for decentralized decision making under conditions of imperfect information, and extend popular Markov decision problem models. Complexity results for a wide range of such problems are known when agents work cooperatively to pursue common interests. When agents compete, things are less well understood. We show that under one understanding of rational competition, such problems are complete for the class NEXP^NP. This result holds for any such problem composed of two competing teams of agents, where teams may be of any size whatsoever.