Administrative Data


Recent Advances, Applications and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2024 Symposium

Adibi, Amin, Cao, Xu, Ji, Zongliang, Kaur, Jivat Neet, Chen, Winston, Healey, Elizabeth, Nuwagira, Brighton, Ye, Wenqian, Woollard, Geoffrey, Xu, Maxwell A, Cui, Hejie, Xi, Johnny, Chang, Trenton, Bikia, Vasiliki, Zhang, Nicole, Noori, Ayush, Xia, Yuan, Hossain, Md. Belal, Frank, Hanna A., Peluso, Alina, Pu, Yuan, Shen, Shannon Zejiang, Wu, John, Fallahpour, Adibvafa, Mahbub, Sazan, Duncan, Ross, Zhang, Yuwei, Cao, Yurui, Xu, Zuheng, Craig, Michael, Krishnan, Rahul G., Beheshti, Rahmatollah, Rehg, James M., Karim, Mohammad Ehsanul, Coffee, Megan, Celi, Leo Anthony, Fries, Jason Alan, Sadatsafavi, Mohsen, Shung, Dennis, McWeeney, Shannon, Dafflon, Jessica, Jabbour, Sarah

arXiv.org Artificial Intelligence

The fourth Machine Learning for Health (ML4H) symposium was held in person on December 15th and 16th, 2024, in the traditional, ancestral, and unceded territories of the Musqueam, Squamish, and Tsleil-Waututh Nations in Vancouver, British Columbia, Canada. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. The organization of the research roundtables at the conference involved 13 senior and 27 junior chairs across 13 tables. Each roundtable session included an invited senior chair (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with an interest in the session's topic.


Temporal and Between-Group Variability in College Dropout Prediction

Glandorf, Dominik, Lee, Hye Rin, Orona, Gabe Avakian, Pumptow, Marina, Yu, Renzhe, Fischer, Christian

arXiv.org Artificial Intelligence

Large-scale administrative data is a common input in early warning systems for college dropout in higher education. Still, the terminology and methodology vary significantly across existing studies, and the implications of different modeling decisions are not fully understood. This study provides a systematic evaluation of contributing factors and predictive performance of machine learning models over time and across different student groups. Drawing on twelve years of administrative data at a large public university in the US, we find that dropout prediction at the end of the second year has a 20% higher AUC than at the time of enrollment in a Random Forest model. Also, most predictive factors at the time of enrollment, including demographics and high school performance, are quickly superseded in predictive importance by college performance and in later stages by enrollment behavior. Regarding variability across student groups, college GPA has more predictive value for students from traditionally disadvantaged backgrounds than their peers. These results can help researchers and administrators understand the comparative value of different data sources when building early warning systems and optimizing decisions under specific policy goals.
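The abstract's core comparison, dropout prediction quality at enrollment versus after two years of college data, can be illustrated with a small Random Forest experiment. Everything below is a hypothetical sketch on synthetic data: the feature names, coefficients, and sample sizes are illustrative assumptions, not the study's actual variables or results.

```python
# Hypothetical sketch: does adding second-year college data to
# enrollment-time features raise dropout-prediction AUC? All data and
# coefficients here are synthetic and illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
hs_gpa = rng.normal(3.0, 0.5, n)        # known at enrollment
college_gpa = rng.normal(2.8, 0.6, n)   # known only after enrollment
credits = rng.poisson(12, n)            # enrollment behavior

# Simulated dropout risk depends mostly on college performance.
logit = 9.5 - hs_gpa - 2.0 * college_gpa - 0.1 * credits
dropout = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

feature_sets = {
    "enrollment": np.column_stack([hs_gpa]),
    "year 2": np.column_stack([hs_gpa, college_gpa, credits]),
}

aucs = {}
for name, X in feature_sets.items():
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, dropout, test_size=0.25, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

In this toy setup the "year 2" model dominates because the simulated outcome is driven by college performance, mirroring the study's finding that enrollment-time predictors are quickly superseded.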


Harnessing Administrative Data Inventories to Create a Reliable Transnational Reference Database for Crop Type Monitoring

Schneider, Maja, Körner, Marco

arXiv.org Artificial Intelligence

Leaps in machine learning techniques and their application to Earth observation challenges have unlocked unprecedented performance across the domain. While the further development of these methods was previously limited by the availability and volume of sensor data and computing resources, the lack of adequate reference data now constitutes a new bottleneck. Since creating such ground-truth information is an expensive and error-prone task, new ways must be devised to source reliable, high-quality reference data at large scales. As an example, we showcase EuroCrops, a reference dataset for crop type classification that aggregates and harmonizes administrative data surveyed in different countries with the goal of transnational interoperability.


Big Data is not the New Oil: Common Misconceptions about Population Data

Christen, Peter, Schnell, Rainer

arXiv.org Artificial Intelligence

Databases covering all individuals of a population are increasingly used for research and decision-making. The massive size of such databases is often mistaken as a guarantee for valid inferences. However, population data have characteristics that make them challenging to use. Various assumptions on population coverage and data quality are commonly made, including how such data were captured and what types of processing have been applied to them. Furthermore, the full potential of population data can often only be unlocked when such data are linked to other databases. Record linkage often implies subtle technical problems, which are easily missed. We discuss a diverse range of misconceptions relevant for anybody capturing, processing, linking, or analysing population data. Remarkably many of these misconceptions are due to the social nature of data collections and are therefore missed by purely technical accounts of data processing. Many of these misconceptions are also not well documented in scientific publications. We conclude with a set of recommendations for using population data.
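One of the subtle record-linkage problems the abstract alludes to is that exact matching on identifying fields silently drops records with typos or name variants. The toy example below, entirely my own illustration and not from the paper, shows a common mitigation: combining an exact match on a stable field with a fuzzy string-similarity threshold on names.

```python
# Illustrative only (not from the paper): exact name matching would miss
# the "Jon"/"John" typo, so we link on exact date of birth plus a fuzzy
# name-similarity threshold.
from difflib import SequenceMatcher

db_a = [{"id": 1, "name": "Jon Smith", "dob": "1980-04-02"}]
db_b = [{"id": 9, "name": "John Smith", "dob": "1980-04-02"},
        {"id": 10, "name": "Jane Smith", "dob": "1991-07-15"}]

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

links = []
for ra in db_a:
    for rb in db_b:
        # Require the same date of birth plus a close name match.
        if ra["dob"] == rb["dob"] and similarity(ra["name"], rb["name"]) > 0.8:
            links.append((ra["id"], rb["id"]))

print(links)  # the typo'd pair still links; exact matching would not
```

Even this tiny sketch hints at the paper's point: the threshold, the choice of blocking field, and the handling of shared dates of birth all embed assumptions that can bias downstream analyses.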


How AI Can Unlock the Full Potential of Clinical, Administrative Data

#artificialintelligence

Consensus Cloud Solutions, Inc. (NASDAQ: CCSI) is a global leader of digital technology for secure information transport. The company leverages its technology heritage to provide secure solutions that transform simple digital documents into actionable information, including advanced healthcare standards HL7 and FHIR for secure data exchange. Consensus offers eFax Corporate, a leading global cloud faxing solution; Consensus Signal for automatic real-time healthcare communications; Consensus Clarity, a Natural Language Processing and Artificial Intelligence solution; Consensus Unite and Consensus Harmony interoperability solutions; and jSign for secure digital signatures built on blockchain.


Machine learning and health care mean $6M for Predilytics

AITopics Original Links

Sometimes, two things just go together, such as peanut butter and jelly or, in the case of Boston-based startup Predilytics, machine learning and health care. The company announced on Tuesday afternoon that it has closed a $6 million Series A round with investment from Flybridge Capital Partners, Highland Capital Partners and Google Ventures. It's not the first application of big data to health care, and it certainly won't be the last, but its application of machine learning to health care providers' administrative data might be unique. As we've reported before, health care is a major focus for big data companies and data scientists because there's so much data involved and the problems are so weighty. The right analytic tools could end up saving lives or saving billions of dollars in an industry where just about everyone agrees that costs are out of control.


Market Segmentation with Novel Machine Learning

#artificialintelligence

While pharmaceutical marketers have long used attitudinal and behavioral segmentation approaches to identify potential customers and tailor marketing activities, traditional methods lack utility for healthcare-level commercial operations and tactics. Behavioral segmentation uses administrative data to segment physicians. So, for example, you might learn which physicians are early adopters, who prescribes what treatment options most, or whether a physician is based in a hospital or clinic setting, but segmentation factors are ultimately limited to the data constructs available in administrative data. These data sources provide an accurate representation of certain specific behaviors, but cannot provide insights into motivations or triggers of that behavior, meaning the "why" and the underlying thought processes remain indeterminable. In addition, results are not always data-driven, because researchers often inject their own personal biases when defining healthcare provider characteristics and segments. Attitudinal segmentation uses surveys tailored to the exact business need, measuring healthcare provider characteristics such as peer influence, industry friendliness, perception of safety signals, mechanics of decision making and therapy choice, and receptiveness to channels of communication, to understand what messages or information are most likely to resonate with a physician.


Unlocking Data to Improve Public Policy

Communications of the ACM

There is a growing consensus among policymakers that bringing high-quality evidence to bear on public policy decisions is essential to supporting the effective and efficient government their constituencies want and need. At the U.S. federal level, this view is reflected in a recent Congressional report by the Commission on Evidence-Based Policymaking, which recommends creating a data infrastructure that enables "a future in which rigorous evidence is created efficiently, as a routine part of government operations, and used to construct effective public policy."[4] This article describes a new approach to data infrastructure for fact-based policy, developed through a partnership between our interdisciplinary organization Research Improving People's Lives and the State of Rhode Island.[13] Together, we constructed RI 360, an anonymized database that integrates administrative records from siloed databases across nearly every Rhode Island state agency. The comprehensive scope of RI 360 has enabled new insights across a wide range of policy areas, and supports ongoing research into improving policies to alleviate poverty and increase economic opportunity for all Rhode Island residents (see the sidebar "Policy Areas in which RI 360 Has Contributed Insights").


Predicting 72-hour and 9-day return to the emergency department using machine learning

#artificialintelligence

To predict 72-h and 9-day emergency department (ED) return by using gradient boosting on an expansive set of clinical variables from the electronic health record. This retrospective study included all adult discharges from a level 1 trauma center ED and a community hospital ED covering the period of March 2013 to July 2017. A total of 1500 variables were extracted for each visit, and samples were split randomly into training, validation, and test sets (80%, 10%, and 10%). Gradient boosting models were fit on 3 selections of the data: administrative data (demographics, prior hospital usage, and comorbidity categories), data available at triage, and the full set of data available at discharge. A logistic regression (LR) model built on administrative data was used for baseline comparison. Finally, the top 20 most informative variables identified from the full gradient boosting models were used to build a reduced model for each outcome.
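The modeling recipe the abstract describes (a gradient boosting model, a logistic regression baseline, and a reduced model rebuilt from the most informative features) can be sketched compactly. The code below is a hedged illustration on synthetic data: the variable counts, split, and model settings are stand-ins, not the study's actual pipeline.

```python
# Sketch of the abstract's setup on synthetic data: gradient boosting vs.
# a logistic-regression baseline, then a reduced model on the top-20
# most informative variables. The study's 80/10/10 split is simplified
# here to 80/20 (no separate validation set).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, p = 3000, 50                      # 50 stand-in "clinical variables"
X = rng.normal(size=(n, p))
# Outcome depends on the first 5 variables plus noise.
y = (X[:, :5].sum(axis=1) + rng.normal(scale=2.0, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
gb = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)

auc_lr = roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1])
auc_gb = roc_auc_score(y_te, gb.predict_proba(X_te)[:, 1])

# Reduced model: refit on the 20 most informative variables.
top20 = np.argsort(gb.feature_importances_)[::-1][:20]
gb_small = GradientBoostingClassifier(random_state=1).fit(X_tr[:, top20], y_tr)
auc_small = roc_auc_score(y_te, gb_small.predict_proba(X_te[:, top20])[:, 1])

print(f"LR: {auc_lr:.3f}  GB: {auc_gb:.3f}  GB(top20): {auc_small:.3f}")
```

The reduced model trades a little discrimination for interpretability and cheaper deployment, which is the usual motivation for the top-k refit step the abstract mentions.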


Recent Breakthrough Research Papers In AI Ethics

#artificialintelligence

Researchers, practitioners, and ethicists have sounded the alarm for years over potential malicious applications of AI technologies as well as unintended consequences from flawed or biased systems. The increased attention from these warnings have led to a proliferation of new and promising research highlighting the issues in existing AI approaches and ideating solutions to address them. We summarized 10 research papers covering different aspects of AI ethics in order to give you a preliminary overview of important work done in the space last year. There are many more papers, tools, and contributions in ethical AI which we didn't cover in this article, but are also worth your time to study and learn from. This article is simply a useful starting point.