

COVID-19 testing: One size does not fit all

Science

Tests for detecting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) were developed within days of the release of the virus genome (1). Multiple countries have successfully controlled SARS-CoV-2 transmission by investing in large-scale testing capacity (2). Most testing has focused on quantitative polymerase chain reaction (qPCR) assays, which can detect minute amounts of viral RNA. Although powerful, these molecular tools cannot be scaled to meet the demands of more extensive public health testing. To combat COVID-19, the “one-size-fits-all” approach that has dominated, and confused, decision-making about testing and the evaluation of tests is unsuitable: diagnostics, screening, and surveillance serve different purposes, demand distinct strategies, and require separate approval mechanisms. By supporting the innovation, approval, manufacturing, and distribution of simpler and cheaper screening and surveillance tools, it will be possible to limit the spread of COVID-19 more effectively and to respond to future pandemics.

Many types of tests are available for COVID-19 for clinical and public health use (see the figure). Testing can be performed in a central laboratory, at the point of care (POC), or in the community: at the workplace, school, or home. COVID-19 testing begins with specimen collection. For medical use, nasopharyngeal swabs collected by health care professionals have been used to detect viral infections. Demands on COVID-19 testing throughput, however, have driven new collection approaches, including saliva and less invasive nasal swabs. COVID-19 tests include molecular tests such as qPCR, isothermal amplification, and CRISPR-based assays, as well as antigen tests that detect SARS-CoV-2 proteins directly.
Although rapid antigen tests have lower analytical sensitivity than qPCR-based tests (i.e., they require greater amounts of viral material to turn positive), they detect infectious individuals with culturable virus as reliably as qPCR does (3). The specificity of antigen tests (i.e., their ability to correctly identify those not infected with SARS-CoV-2) is comparable to that of molecular tests (4).

Diagnostic testing for COVID-19 focuses on accurately identifying patients who are infected with SARS-CoV-2 to establish the presence or absence of disease, and it is performed on symptomatic patients or on asymptomatic individuals who are at high risk of infection. This type of testing requires assays that are highly sensitive, so as not to miss COVID-19 patients (false negatives), and highly specific, so as not to wrongly diagnose SARS-CoV-2–negative individuals as having COVID-19 (false positives). These tests are typically performed in centralized high-complexity laboratories with specialized equipment using qPCR assays, and results can be reported within 12 to 48 hours. Major testing bottlenecks, however, have led to turnaround times exceeding 5 to 10 days in some regions, making such tests useless for preventing transmission. POC diagnostic testing at medical facilities can use qPCR, isothermal amplification, or antigen detection (4). These POC tests often require instruments that run a limited number of tests and can return results in under an hour. The need for an instrument limits both the number of tests that can be performed and where they can be used. However, newer antigen tests that require neither instruments nor skilled operators are becoming available, potentially allowing much more widely distributed POC testing.

Surveillance testing of populations can be used both as a tool for understanding historical exposures and as a measure of ongoing community transmission.
For the former, serological testing for SARS-CoV-2–specific antibodies is used to identify individuals who were previously infected. For the latter, surveillance testing can be an effective way to monitor SARS-CoV-2 spread in communities in real time. One promising method is wastewater surveillance, which has been used to assess community transmission of poliovirus (5) and has shown potential for COVID-19 (6). Wastewater is tested by qPCR for SARS-CoV-2, and changes over time in the amount of viral genetic material track COVID-19 infections in a community. Surveillance can also use swab or saliva samples taken directly from individuals, and, in populations with low COVID-19 prevalence, samples can be pooled to increase capacity and lower cost. For surveillance testing, the goal is not to identify every case but rather to collect data from representative samples that accurately measure prevalence and inform public health policy and resource allocation. Because the focus is on extrapolation to the population rather than the individual, tests with known deviations from 100% sensitivity and specificity remain appropriate when those deviations can be statistically corrected (7). To be most effective, results should include the qPCR cycle threshold, which serves as an estimate of viral load (7), so that epidemic trajectories can be modeled and mitigation programs can be evaluated in real time (8), including after vaccination programs have begun.

Screening testing of asymptomatic individuals to detect people who are likely infectious has been critically underused, yet it is one of the most promising tools to combat the COVID-19 pandemic (9). Infection with SARS-CoV-2 does not lead to symptoms in ∼20 to 40% of cases, and symptomatic disease is preceded by a presymptomatic incubation period (10).
These asymptomatic and presymptomatic cases are key contributors to virus spread, complicating our ability to break transmission chains (10). Entry screening to detect infectious individuals before they access facilities (e.g., nursing homes, restaurants, and airports), along with symptom screening and temperature checks, can be beneficial, particularly in high-risk facilities such as skilled nursing facilities. When used strategically, entry-screening measures can be effective at suppressing transmission. To be most effective, entry screening requires tests that provide rapid results, ideally within 15 min. As with all tests, the required sensitivity and specificity of entry-screening tests depend on context. Entry-screening tests for a nursing home, for example, must be highly sensitive because the consequences of bringing SARS-CoV-2 into a nursing home can be devastating. Such tests must also be highly specific because the consequences of grouping a false-positive person with COVID-19–positive individuals could be deadly. By contrast, because children have substantially lower mortality from COVID-19, entry screening into schools might call for a greater compromise, balancing the resources and sensitivity needed to test as many individuals as possible against the need to minimize disruptive false positives. Key to the use of tests for entry screening is that a negative test alone should not be considered sufficient for entry; entry should also depend on satisfying other requirements, including masks and physical distancing. Conversely, a positive test should be sufficient to bar entry in most settings.

Public health screening is potentially the most powerful form of COVID-19 testing, aimed at suppressing outbreaks by maximizing the detection of infectious individuals.
This type of screening entails frequent serial testing of large fractions of the population, either through self-administered at-home rapid tests or in the community at high-contact settings, such as schools and workplaces (9). Public health screening can achieve herd effects by stopping onward spread through detection of asymptomatic or presymptomatic cases (fig. S1). Notably, not every transmission chain needs to be severed to achieve herd effects. Mathematical models that incorporate relevant variation in viral loads and test accuracy suggest that, with frequent testing of a large fraction of a population, enough cases could be detected to create herd effects (11). For example, Slovakia undertook public health screening to address COVID-19 (12): During a 2-week period, ∼80% of the population was screened using rapid antigen tests. The screening identified 50,000 cases and, combined with other public health measures, reduced incidence by 82% within 2 weeks (12). An important feature of large-scale public health screening is that centrally controlled reporting and contact tracing programs, although essential for surveillance testing, are not required to induce herd effects. In a robust public health screening program, enough people routinely test themselves that contact tracing is subsumed by the screening program (11).

Similar to home pregnancy tests, screening tests should be easy to obtain and administer, fast, and cheap. Like diagnostic tests, they must have very low false-positive rates. If a screening test does not achieve high enough specificity (e.g., >99.9%), screening programs can pair it with secondary confirmatory testing. Unlike diagnostic tests, however, the sensitivity of screening tests should be judged not by their ability to diagnose patients but by their ability to accurately identify people who are most at risk of transmitting SARS-CoV-2.
Such individuals tend to have higher viral loads (13), which make the virus easier to detect (14). A focus on identifying infectious people means that the frequency and abundance of testing should take priority over high analytical sensitivity (11). Indeed, within reason, a loss in the sensitivity of individual tests can be offset by more frequent testing and wider distribution of tests (9). In addition, public health messaging should set appropriate expectations for screening, particularly around sensitivity and specificity, so that false negatives and false positives do not erode public trust.

Figure: COVID-19 testing strategies. Testing for SARS-CoV-2 can be for personal or population health. Collection can be from symptomatic or asymptomatic individuals, as well as from wastewater and swabs of surfaces. The tests may be performed in central laboratories, at the POC, or using rapid tests. Attributes of tests differ according to application. GRAPHIC: KELLIE HOLOSKI/SCIENCE

Tests for public health screening require rapid, decentralized solutions that can be scaled for frequent screening of large numbers of asymptomatic individuals. Lateral-flow antigen tests and upcoming paper-based synthetic biology and CRISPR-based assays fit these needs and could be scaled to tens of millions of daily tests (9). These tests are simple and cheap, can be self-administered, and do not require machines to run and return results. The Abbott BinaxNOW rapid antigen test, which recently received an Emergency Use Authorization (EUA) in the United States as a diagnostic device, also comes with a smartphone app that allows self-reporting of COVID-19 status, which could be used instead of centralized reporting by public health agencies.
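Two quantitative points made above can be illustrated with short calculations: why low-prevalence screening needs very high specificity or a confirmatory test, and how testing frequency can offset lower per-test sensitivity. This is a sketch under simplifying assumptions. The population, prevalence, sensitivities, specificities, and testing schedules are hypothetical (not data from the cited studies), and repeated or confirmatory tests are assumed to err independently, which real assays do not strictly do because detectability tracks viral load.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected true positives, false positives, and positive predictive value (PPV)."""
    infected = population * prevalence
    true_pos = infected * sensitivity
    false_pos = (population - infected) * (1.0 - specificity)
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


def with_confirmation(se1, sp1, se2, sp2):
    """Accuracy when only screen-positives get a second test (assumes independent errors)."""
    return se1 * se2, 1.0 - (1.0 - sp1) * (1.0 - sp2)


def detection_probability(per_test_sensitivity, tests_per_infectious_period):
    """P(at least one positive) over an infectious period, assuming independent tests."""
    return 1.0 - (1.0 - per_test_sensitivity) ** tests_per_infectious_period


# Hypothetical single screening round: 1,000,000 people, 0.5% prevalence, 80% sensitivity.
N, PREV, SE = 1_000_000, 0.005, 0.80
for sp in (0.98, 0.999):
    tp, fp, ppv = screening_outcomes(N, PREV, SE, sp)
    print(f"specificity {sp}: {tp:,.0f} true vs {fp:,.0f} false positives, PPV {ppv:.2f}")

# Pair the 98%-specific screen with a hypothetical confirmatory test (Se 0.95, Sp 0.99).
se_c, sp_c = with_confirmation(SE, 0.98, 0.95, 0.99)
print(f"with confirmation: PPV {screening_outcomes(N, PREV, se_c, sp_c)[2]:.2f}")

# Frequency vs sensitivity over a hypothetical ~7-day infectious period.
print(f"one 95%-sensitive test:    {detection_probability(0.95, 1):.3f}")
print(f"three 70%-sensitive tests: {detection_probability(0.70, 3):.3f}")
```

Under these assumptions, raising specificity from 98% to 99.9% lifts the PPV from about 0.17 to about 0.80, and a confirmatory test reaches about 0.95 at some cost in sensitivity. Three tests at 70% sensitivity give about a 97% chance of catching an infectious person, versus 95% for a single more sensitive test, before even accounting for the turnaround delays of centralized testing.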
Critically, despite being shown to be highly effective at detecting infectious individuals (14), very few of these rapid tests are currently approved for screening asymptomatic individuals, which substantially limits their utility. If such tests were made available directly to consumers (priced to allow equitable access) or produced and provided free of charge by governments, individuals could learn their COVID-19 status whenever they chose and without complex medical decisions.

Testing is a central pillar of the clinical and public health response to global health emergencies, including the COVID-19 pandemic. Nearly all testing modalities have a role, and the one-size-fits-all approach to testing taken by many Western countries has failed. Many lower- and middle-income countries, including Senegal, Vietnam, and Ghana, have fared far better in their COVID-19 response, often using strong testing programs. The focus on diagnostic tests and the use of preexisting authorization pathways designed for qPCR-based clinical diagnostics not only slow the development and deployment of new surveillance and screening tests but also confuse the picture of what metrics effective public health tools should achieve. Testing to diagnose a patient with COVID-19 is fundamentally different from testing a person to prevent onward transmission. Regulatory pathways should be modified to incorporate these differences so that public health and screening tests are appropriately evaluated. It is necessary to innovate and to produce, distribute, and continuously improve the tests that exist in order to save lives and gain control of the COVID-19 pandemic.

Supplementary materials: science.sciencemag.org/content/371/6525/126/suppl/DC1

1. V. M. Corman et al., Euro. Surveill. 25, 2000045 (2020).
2. M. G. Baker et al., N. Engl. J. Med. 383, e56 (2020).
3. A. Pekosz et al., medRxiv 10.1101/2020.10.02.20205708 (2020).
4. R. Weissleder et al., Sci. Transl. Med. 12, abc1931 (2020).
5. H. Asghar et al., J. Infect. Dis. 210, S294 (2014).
6. A. Nemudryi et al., Cell Rep. Med. 1, 100098 (2020).
7. R. Kahn et al., medRxiv 10.1101/2020.05.02.20088765 (2020).
8. J. A. Hay et al., medRxiv 10.1101/2020.10.08.20204222 (2020).
9. M. J. Mina et al., N. Engl. J. Med. 383, e120 (2020).
10. X. He et al., Nat. Med. 26, 672 (2020).
11. D. B. Larremore et al., Sci. Adv. 10.1126/sciadv.abd5393 (2020).
12. M. Pavelka et al., “The effectiveness of population-wide, rapid antigen test based screening in reducing SARS-CoV-2 infection prevalence in Slovakia,” CMMID Repository, 11 November 2020.
13. E. A. Meyerowitz et al., Ann. Intern. Med. 10.7326/M20-5008 (2020).
14. V. M. Corman et al., medRxiv 10.1101/2020.11.12.20230292 (2020).


SHARKS: Smart Hacking Approaches for RisK Scanning in Internet-of-Things and Cyber-Physical Systems based on Machine Learning

arXiv.org Artificial Intelligence

Cyber-physical systems (CPS) and Internet-of-Things (IoT) devices are increasingly deployed across a wide range of functions, from healthcare devices and wearables to critical infrastructure, e.g., nuclear power plants, autonomous vehicles, smart cities, and smart homes. These devices are inherently insecure across their software, hardware, and network stacks, presenting a large attack surface that hackers can exploit. In this article, we present an innovative technique for detecting unknown system vulnerabilities, managing these vulnerabilities, and improving incident response when such vulnerabilities are exploited. The novelty of this approach lies in extracting intelligence from known real-world CPS/IoT attacks, representing them as regular expressions, and applying machine learning (ML) techniques to this ensemble of regular expressions to generate new attack vectors and security vulnerabilities. Our results show that 10 new attack vectors and 122 new vulnerability exploits with the potential to compromise a CPS or IoT ecosystem can be successfully generated. The ML methodology achieves an accuracy of 97.4% and predicts these attacks efficiently, with an 87.2% reduction in the search space. We demonstrate the application of our method to hacking the in-vehicle network of a connected car. To defend against the known attacks and possible novel exploits, we discuss a defense-in-depth mechanism for various classes of attacks and the classification of data targeted by such attacks. This defense mechanism optimizes the cost of security measures based on the sensitivity of the protected resource, thus incentivizing its adoption by cybersecurity practitioners in real-world CPS/IoT.
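The idea of encoding attacks as regular expressions and using the encoding to prune a candidate search space can be illustrated with a toy example. Everything in this sketch is hypothetical: the three-symbol alphabet, the "known attacks", and the pattern are invented for illustration, and a hand-written regular expression stands in for the paper's ML step, which is not reproduced here. The reduction it prints describes only this toy, not the 87.2% reported above.

```python
import itertools
import re

# Hypothetical alphabet: each attack is a 3-symbol string (entry point, weakness, goal).
ENTRY = "WBC"     # W = Wi-Fi, B = Bluetooth, C = CAN bus (illustrative labels)
WEAKNESS = "oia"  # o = buffer overflow, i = injection, a = weak authentication
GOAL = "DXS"      # D = denial of service, X = data exfiltration, S = spoofing

# Made-up "known attacks" encoded over that alphabet.
known_attacks = {"WoD", "WiX", "CaS", "CoD"}

# Hand-written pattern generalizing them: keep each observed entry point, allow any
# weakness, and allow only the goals already seen with that entry point.
pattern = re.compile(r"W[oia][DX]|C[oia][SD]")

candidates = ["".join(t) for t in itertools.product(ENTRY, WEAKNESS, GOAL)]
matching = [c for c in candidates if pattern.fullmatch(c)]
novel = [c for c in matching if c not in known_attacks]
reduction = 1.0 - len(matching) / len(candidates)

print(f"{len(candidates)} candidates, {len(matching)} match the pattern, {len(novel)} are new")
print(f"search-space reduction: {reduction:.1%}")
print(novel)
```

Here 12 of 27 candidate strings survive the filter, 8 of which are not among the known attacks; in a real system the pattern would be learned rather than written by hand.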


Bridging In- and Out-of-distribution Samples for Their Better Discriminability

arXiv.org Artificial Intelligence

This paper proposes a method for out-of-distribution (OOD) detection. Questioning the premise of previous studies that in-distribution (ID) and OOD samples are distinctly separated, we consider samples lying between the two and use them to train a network. We generate such samples using multiple image transformations that corrupt inputs in various ways and at different severity levels. Using a network trained on clean ID samples, we estimate where the samples generated by each image transformation lie between ID and OOD. Specifically, we have the network classify the generated samples, compute their mean classification accuracy, and use it to create a soft target label for them. We then train the same network from scratch on the original ID samples and the generated samples with their soft labels. We detect OOD samples by thresholding the entropy of the predicted softmax probabilities. Experimental results show that our method outperforms the previous state of the art on standard benchmark tests. We also analyze how the number and particular combinations of image-corrupting transformations affect performance.
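Two pieces of this pipeline are easy to sketch with NumPy: turning a transformation's mean accuracy into a soft target, and flagging OOD inputs by thresholding the entropy of the softmax output. The soft-label rule below (keep a fraction equal to the mean accuracy on the true class and spread the rest uniformly) is an assumption, since the abstract does not give the exact formula; the helper names and the threshold value are likewise hypothetical.

```python
import numpy as np

def soft_target(label, num_classes, mean_accuracy):
    """Assumed rule: put `mean_accuracy` of the mass on the true class, spread the rest uniformly."""
    one_hot = np.eye(num_classes)[label]
    uniform = np.full(num_classes, 1.0 / num_classes)
    return mean_accuracy * one_hot + (1.0 - mean_accuracy) * uniform

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predictive_entropy(logits):
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def is_ood(logits, threshold):
    """Flag inputs whose predictive entropy exceeds a (hypothetical) threshold."""
    return predictive_entropy(logits) > threshold

# A severe corruption classified correctly 40% of the time gets a much softer target
# than clean data (mean accuracy 1.0 gives back the one-hot label).
print(soft_target(label=2, num_classes=4, mean_accuracy=0.4))

logits = np.array([[8.0, 0.5, 0.2, 0.1],   # confident prediction, low entropy
                   [1.0, 1.1, 0.9, 1.0]])  # near-uniform prediction, high entropy
print(is_ood(logits, threshold=0.5 * np.log(4)))
```

The threshold here is half the maximum possible entropy for four classes; in practice it would be chosen on held-out data.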


Multimodal Gait Recognition for Neurodegenerative Diseases

arXiv.org Artificial Intelligence

In recent years, single-modality gait recognition has been extensively explored in the analysis of medical images and other sensor data, and it is recognised that each established approach has different strengths and weaknesses. As an important motor symptom, gait disturbance is commonly used in the diagnosis and evaluation of diseases; moreover, multi-modality analysis of a patient's walking pattern compensates for the one-sidedness of single-modality gait recognition methods, which learn gait changes in only one measurement dimension. Fusing multiple measurement sources has shown promising performance in identifying gait patterns associated with individual diseases. In this paper, we propose a novel hybrid model that fuses and aggregates data from multiple sensors to learn the gait differences among three neurodegenerative diseases, between patients with different severity levels of Parkinson's disease, and between healthy individuals and patients. A spatial feature extractor (SFE) is applied to generate representative features of images or signals. To capture temporal information from the two data modalities, we design a new correlative memory neural network (CorrMNN) architecture for extracting temporal features. Afterwards, we embed a multi-switch discriminator to associate the observations with individual state estimations. Compared with several state-of-the-art techniques, our proposed framework yields more accurate classification results.


Associated Spatio-Temporal Capsule Network for Gait Recognition

arXiv.org Artificial Intelligence

Identifying a person from their gait patterns is a challenging task. State-of-the-art approaches rely on analyzing the temporal or spatial characteristics of gait, and gait recognition is usually performed on single-modality data (such as images, skeleton joint coordinates, or force signals). Evidence has shown that multi-modality data is more conducive to gait research. We therefore establish an automated learning system, an associated spatio-temporal capsule network (ASTCapsNet) trained on multi-sensor datasets, to analyze multimodal information for gait recognition. Specifically, we first design a low-level feature extractor and a high-level feature extractor for spatio-temporal gait feature extraction, using a novel recurrent memory unit and a relationship layer. A Bayesian model is then employed to decide the class labels. Extensive experiments on several public datasets (normal and abnormal gait) validate the effectiveness of the proposed ASTCapsNet compared with several state-of-the-art methods.


DICE: Deep Significance Clustering for Outcome-Aware Stratification

arXiv.org Artificial Intelligence

We present deep significance clustering (DICE), a framework for jointly performing representation learning and clustering for "outcome-aware" stratification. DICE is intended to generate cluster memberships that can be used to categorize a population by individual risk level for a targeted outcome. Following the representation learning and clustering steps, we augment the DICE objective function with a constraint that requires a statistically significant association between the outcome and the cluster membership of the learned representations. DICE further includes a neural architecture search step to maximize both the likelihood of representation learning and the accuracy of outcome classification with cluster membership as the predictor. To demonstrate its utility for patient risk stratification in medicine, DICE was evaluated on two datasets with different outcome ratios extracted from real-world electronic health records. The outcomes are acute kidney injury (30.4%) in a cohort of COVID-19 patients and discharge disposition (36.8%) in a cohort of heart failure patients. Extensive results demonstrate that, compared with several baseline approaches, DICE performs better as measured by the difference in outcome distribution across clusters, the Silhouette score, Calinski-Harabasz index, and Davies-Bouldin index for clustering, and the area under the ROC curve (AUC) for outcome classification.
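The requirement that cluster membership be significantly associated with the outcome can be illustrated with a Pearson chi-square test of independence on a clusters × outcome contingency table. This is only a sketch of that kind of significance check, not DICE's actual objective or training procedure. The counts are hypothetical, and 3.841 is the standard chi-square critical value for one degree of freedom at α = 0.05.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2-cluster solution: rows are clusters, columns are (outcome, no outcome).
table = [[120, 80],   # cluster A: 60% experienced the outcome
         [60, 340]]   # cluster B: 15% experienced the outcome

CRITICAL_DF1_ALPHA05 = 3.841  # df = (2 - 1) * (2 - 1) = 1, alpha = 0.05
stat = chi_square_statistic(table)
print(f"chi-square = {stat:.1f}, significant: {stat > CRITICAL_DF1_ALPHA05}")
```

With more than two clusters the degrees of freedom grow to (k − 1) for a binary outcome, so the critical value must change accordingly.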


Analyzing movies to predict their commercial viability for producers

arXiv.org Artificial Intelligence

When a film premieres, a major subject of speculation is its relative success. Success is judged in particular relative to the film's original budget, since big-budget blockbusters have met with abject failure as often as with exceptional success. So how can one predict the success of an upcoming film? In this paper, we explored a vast array of film data in an attempt to develop a model that predicts the expected return of an upcoming film. Our approach was as follows. First, we began with the MovieLens dataset, which contains common movie attributes along with genome tags for each film. Genome tags indicate which characteristics of a film are most salient. We then added features on film content, cast/crew, audience perception, budget, and earnings from the TMDB, IMDB, and Metacritic websites. Next, we performed exploratory data analysis and engineered a wide range of new features capturing historical information for the available features. Thereafter, we used singular value decomposition (SVD) to reduce the dimensionality of high-dimensional features (e.g., genome tags). Finally, we built a Random Forest classifier and tuned its hyperparameters to optimize model accuracy. In the film industry, our model could allow production companies to better predict the expected return of their projects from their planned production outline, letting them revise their plans to achieve optimal returns.
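The SVD step can be sketched with NumPy: keep the top k singular directions of a films × genome-tags matrix and use the resulting coordinates as compact features. The random matrix and k = 2 are placeholders (no MovieLens data is loaded), the helper name is hypothetical, and the downstream Random Forest is omitted.

```python
import numpy as np

def truncated_svd_features(matrix, k):
    """Rank-k SVD coordinates of each row, plus the share of squared norm they retain."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    reduced = u[:, :k] * s[:k]  # identical to matrix @ vt[:k].T
    retained = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return reduced, retained

# Placeholder for a films x genome-tags relevance matrix (scores in [0, 1]).
rng = np.random.default_rng(0)
tag_scores = rng.random((6, 10))

features, retained = truncated_svd_features(tag_scores, k=2)
print(features.shape)  # (6, 2)
print(f"share of squared norm retained: {retained:.1%}")
```

In a real pipeline the decomposition would be fit on training films only, and the same right singular vectors (`vt[:k]`) reused to project held-out films, so that no information leaks from the evaluation set.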


Theory-based Habit Modeling for Enhancing Behavior Prediction

arXiv.org Artificial Intelligence

Psychological theories of habit posit that when a strong habit is formed through behavioral repetition, it can trigger behavior automatically in the same environment. Given the reciprocal relationship between habit and behavior, changing lifestyle behaviors (e.g., toothbrushing) is largely a task of breaking old habits and creating new, healthy ones. Thus, representing users' habit strengths can be very useful for behavior change support systems (BCSS), for example, to predict behavior or to decide when an intervention has reached its intended effect. However, habit strength is not directly observable, and existing self-report measures are taxing for users. In this paper, building on recent computational models of habit formation, we propose a method that enables intelligent systems to compute habit strength from observable behavior. The hypothesized advantage of using computed habit strength for behavior prediction was tested using data from two intervention studies in which we trained participants to brush their teeth twice a day for three weeks and monitored their behavior using accelerometers. Through hierarchical cross-validation, we found that for predicting future brushing behavior, computed habit strength clearly outperformed self-reported habit strength (in both studies) and was also superior to models based on past behavior frequency (in the larger second study). Our findings provide initial support for our theory-based approach to modeling user habits and encourage the use of habit computation to deliver personalized and adaptive interventions.
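Computing habit strength from observed behavior can be illustrated with one simple form used in some computational habit models: a delta-rule update, H ← H + α(b − H), where b is 1 if the behavior occurred at an opportunity and 0 otherwise. This is an assumption for illustration, not necessarily the model the authors build on, and the learning rate α and the behavior log are hypothetical.

```python
def habit_strength(behavior_log, alpha=0.1, initial=0.0):
    """Habit strength after each opportunity; b = 1 if the behavior occurred, else 0."""
    h = initial
    history = []
    for performed in behavior_log:
        b = 1.0 if performed else 0.0
        h += alpha * (b - h)  # move a fraction alpha toward the latest observation
        history.append(h)
    return history

# Hypothetical 3-week log of twice-daily brushing opportunities (42 slots):
# consistent for two weeks, then evening brushing is skipped in week three.
log = [True] * 28 + [True, False] * 7
h = habit_strength(log)
print(f"after 2 weeks: {h[27]:.2f}, after 3 weeks: {h[-1]:.2f}")
```

Because strength rises with repetition and decays with omissions, the computed value falls during the third week even though the behavior is still performed half the time, which is the kind of signal a behavior change support system could use to time an intervention.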


Explainable AI for Robot Failures: Generating Explanations that Improve User Assistance in Fault Recovery

arXiv.org Artificial Intelligence

With the growing capabilities of intelligent systems, robots are increasingly integrated into our everyday lives. However, when robotic systems interact in such complex human environments, occasional failures are inevitable. The field of explainable AI has sought to make complex decision-making systems more interpretable, but most existing techniques target domain experts. In contrast, in many failure cases robots will require recovery assistance from non-expert users. In this work, we introduce a new type of explanation that explains the cause of an unexpected failure during an agent's plan execution to non-experts. To make error explanations meaningful, we investigate which types of information within a set of hand-scripted explanations are most helpful to non-experts for identifying failures and solutions. Additionally, we investigate how such explanations can be generated autonomously, by extending an existing encoder-decoder model, and generalized across environments. We study these questions in the context of a robot performing a pick-and-place manipulation task in a home environment. Our results show that explanations capturing the context of a failure and the history of past actions are the most effective for failure and solution identification among non-experts. Furthermore, through a second user evaluation, we verify that our model-generated explanations generalize to an unseen office environment and are just as effective as the hand-scripted explanations.


Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data

arXiv.org Machine Learning

Machine learning has proven effective in various application areas, such as object and speech recognition on mobile systems. Because a critical key to machine learning success is the availability of large training datasets, many datasets are being disclosed and published online. From the point of view of a data consumer or manager, measuring data quality is an important first step in the learning process: we need to determine which datasets to use, update, and maintain. However, few practical ways to measure data quality are available today, especially for large-scale high-dimensional data such as images and videos. This paper proposes two data quality measures that compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms, based on random projections and bootstrapping, that compute our quality measures with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.
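The combination of random projections and bootstrapping can be illustrated with a simple class-separability proxy: project each bootstrap resample onto a fresh low-dimensional Gaussian random subspace, compare the distance between class centroids with the average within-class distance to centroid, and report the mean and spread over resamples. This centroid ratio is an assumption standing in for the paper's measures, which the abstract does not define; the data, projection dimension, number of resamples, and function names are hypothetical.

```python
import numpy as np

def separability_ratio(x, y):
    """Between-class centroid distance divided by mean within-class distance to centroid."""
    a, b = x[y == 0], x[y == 1]
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    between = np.linalg.norm(ca - cb)
    within = 0.5 * (np.linalg.norm(a - ca, axis=1).mean() + np.linalg.norm(b - cb, axis=1).mean())
    return between / within

def projected_bootstrap_separability(x, y, dim=8, n_boot=50, seed=0):
    """Mean and std of the ratio over bootstrap resamples, each on a fresh random projection."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))                # resample rows with replacement
        proj = rng.normal(size=(x.shape[1], dim)) / np.sqrt(dim)  # Gaussian random projection
        scores.append(separability_ratio(x[idx] @ proj, y[idx]))
    return float(np.mean(scores)), float(np.std(scores))

# Hypothetical data: 400 samples x 1,000 features; class 1 is shifted by 0.5 in every feature.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 200)
x_shifted = rng.normal(size=(400, 1000)) + 0.5 * y[:, None]
x_overlapping = rng.normal(size=(400, 1000))

print(projected_bootstrap_separability(x_shifted, y))
print(projected_bootstrap_separability(x_overlapping, y))
```

Each resample works in 8 dimensions instead of 1,000, which is where the computational saving comes from, while the bootstrap spread gives a rough sense of how stable the estimate is.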