Goto

Collaborating Authors

 ontario


AI for pRedicting Exacerbations in KIDs with aSthma (AIRE-KIDS)

Ooi, Hui-Lee, Mitsakakis, Nicholas, Dastarac, Margerie Huet, Zemek, Roger, Plint, Amy C., Gilchrist, Jeff, Emam, Khaled El, Radhakrishnan, Dhenuka

arXiv.org Artificial Intelligence

Recurrent exacerbations remain a common yet preventable outcome for many children with asthma. Machine learning (ML) algorithms using electronic medical records (EMR) could allow accurate identification of children at risk for exacerbations and facilitate referral for preventative comprehensive care to avoid this morbidity. We developed ML algorithms to predict repeat severe exacerbations (i.e. asthma-related emergency department (ED) visits or future hospital admissions) for children with a prior asthma ED visit at a tertiary care children's hospital. Retrospective pre-COVID19 (Feb 2017 - Feb 2019, N=2716) Epic EMR data from the Children's Hospital of Eastern Ontario (CHEO) linked with environmental pollutant exposure and neighbourhood marginalization information was used to train various ML models. We used boosted trees (LGBM, XGB) and 3 open-source large language model (LLM) approaches (DistilGPT2, Llama 3.2 1B and Llama-8b-UltraMedical). Models were tuned and calibrated then validated in a second retrospective post-COVID19 dataset (Jul 2022 - Apr 2023, N=1237) from CHEO. Models were compared using the area under the curve (AUC) and F1 scores, with SHAP values used to determine the most predictive features. The LGBM ML model performed best with the most predictive features in the final AIRE-KIDS_ED model including prior asthma ED visit, the Canadian triage acuity scale, medical complexity, food allergy, prior ED visits for non-asthma respiratory diagnoses, and age for an AUC of 0.712, and F1 score of 0.51. This is a nontrivial improvement over the current decision rule which has F1=0.334. While the most predictive features in the AIRE-KIDS_HOSP model included medical complexity, prior asthma ED visit, average wait time in the ED, the pediatric respiratory assessment measure score at triage and food allergy.


ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation

Lee, Zhicheng, Cao, Shulin, Liu, Jinxin, Zhang, Jiajie, Liu, Weichuan, Che, Xiaoyin, Hou, Lei, Li, Juanzi

arXiv.org Artificial Intelligence

Large Reasoning Models (LRMs) exhibit remarkable reasoning abilities but rely primarily on parametric knowledge, limiting factual accuracy. While recent works equip reinforcement learning (RL)-based LRMs with retrieval capabilities, they suffer from overthinking and lack robustness in reasoning, reducing their effectiveness in question answering (QA) tasks. To address this, we propose ReaRAG, a factuality-enhanced reasoning model that explores diverse queries without excessive iterations. Our solution includes a novel data construction framework with an upper bound on the reasoning chain length. Specifically, we first leverage an LRM to generate deliberate thinking, then select an action from a predefined action space (Search and Finish). For Search action, a query is executed against the RAG engine, where the result is returned as observation to guide reasoning steps later. This process iterates until a Finish action is chosen. Benefiting from ReaRAG's strong reasoning capabilities, our approach outperforms existing baselines on multi-hop QA. Further analysis highlights its strong reflective ability to recognize errors and refine its reasoning trajectory. Our study enhances LRMs' factuality while effectively integrating robust reasoning for Retrieval-Augmented Generation (RAG).


Lattice Protein Folding with Variational Annealing

Khandoker, Shoummo Ahsan, Inack, Estelle M., Hibat-Allah, Mohamed

arXiv.org Artificial Intelligence

Understanding the principles of protein folding is a cornerstone of computational biology, with implications for drug design, bioengineering, and the understanding of fundamental biological processes. Lattice protein folding models offer a simplified yet powerful framework for studying the complexities of protein folding, enabling the exploration of energetically optimal folds under constrained conditions. However, finding these optimal folds is a computationally challenging combinatorial optimization problem. In this work, we introduce a novel upper-bound training scheme that employs masking to identify the lowest-energy folds in two-dimensional Hydrophobic-Polar (HP) lattice protein folding. By leveraging Dilated Recurrent Neural Networks (RNNs) integrated with an annealing process driven by temperature-like fluctuations, our method accurately predicts optimal folds for benchmark systems of up to 60 beads. Our approach also effectively masks invalid folds from being sampled without compromising the autoregressive sampling properties of RNNs. This scheme is generalizable to three spatial dimensions and can be extended to lattice protein models with larger alphabets. Our findings emphasize the potential of advanced machine learning techniques in tackling complex protein folding problems and a broader class of constrained combinatorial optimization challenges.


The Morning After: Ontario cancels then un-cancels its Starlink contract over tariff trade war

Engadget

After President Trump announced a 25 percent tariff on nearly all Canadian imported goods (and Canada announced its own 25 percent tariff on American imported goods), Doug Ford, the premier of Ontario -- and a former supporter of President Trump -- announced the Canadian territory would be "ripping up" a 100 million contract with Elon Musk's Starlink. The contract was signed in November last year. Musk, boss of Starlink and the richest man in the world, is a close confidant of Trump and has control over the so-called Department of Government Efficiency, or DOGE (urgh), tasked with cost-cutting and deregulation in government. Ford believed this was enough to link Musk (and his businesses) to Trump's tariffs. He said Ontario "won't do business with people hellbent on destroying our economy" and that Musk wants to "take food off the table" of hard-working Canadians.


Website visits can predict angler presence using machine learning

Schmid, Julia S., Simmons, Sean, Lewis, Mark A., Poesch, Mark S., Ramazi, Pouria

arXiv.org Artificial Intelligence

Understanding and predicting recreational fishing activity is important for sustainable fisheries management. However, traditional methods of measuring fishing pressure, such as surveys, can be costly and limited in both time and spatial extent. Predictive models that relate fishing activity to environmental or economic factors typically rely on historical data, which often restricts their spatial applicability due to data scarcity. In this study, high-resolution angler-generated data from an online platform and easily accessible auxiliary data were tested to predict daily boat presence and aerial counts of boats at almost 200 lakes over five years in Ontario, Canada. Lake-information website visits alone enabled predicting daily angler boat presence with 78% accuracy. While incorporating additional environmental, socio-ecological, weather and angler-generated features into machine learning models did not remarkably improve prediction performance of boat presence, they were substantial for the prediction of boat counts. Models achieved an R2 of up to 0.77 at known lakes included in the model training, but they performed poorly for unknown lakes (R2 = 0.21). The results demonstrate the value of integrating angler-generated data from online platforms into predictive models and highlight the potential of machine learning models to enhance fisheries management.


OtterROS: Picking and Programming an Uncrewed Surface Vessel for Experimental Field Robotics Research with ROS 2

Sears, Thomas M. C., Cooper, M. Riley, Button, Sabrina R., Marshall, Joshua A.

arXiv.org Artificial Intelligence

There exist a wide range of options for field robotics research using ground and aerial mobile robots, but there are comparatively few robust and research-ready uncrewed surface vessels (USVs). This workshop paper starts with a snapshot of USVs currently available to the research community and then describes "OtterROS", an open source ROS 2 solution for the Otter USV. Field experiments using OtterROS are described, which highlight the utility of the Otter USV and the benefits of using ROS 2 in aquatic robotics research. For those interested in USV research, the paper details recommended hardware to run OtterROS and includes an example ROS 2 package using OtterROS, removing unnecessary non-recurring engineering from field robotics research activities.


Automatic Construction of a Large-Scale Corpus for Geoparsing Using Wikipedia Hyperlinks

Ohno, Keyaki, Kameko, Hirotaka, Shirai, Keisuke, Nishimura, Taichi, Mori, Shinsuke

arXiv.org Artificial Intelligence

Geoparsing is the task of estimating the latitude and longitude (coordinates) of location expressions in texts. Geoparsing must deal with the ambiguity of the expressions that indicate multiple locations with the same notation. For evaluating geoparsing systems, several corpora have been proposed in previous work. However, these corpora are small-scale and suffer from the coverage of location expressions on general domains. In this paper, we propose Wikipedia Hyperlink-based Location Linking (WHLL), a novel method to construct a large-scale corpus for geoparsing from Wikipedia articles. WHLL leverages hyperlinks in Wikipedia to annotate multiple location expressions with coordinates. With this method, we constructed the WHLL corpus, a new large-scale corpus for geoparsing. The WHLL corpus consists of 1.3M articles, each containing about 7.8 unique location expressions. 45.6% of location expressions are ambiguous and refer to more than one location with the same notation. In each article, location expressions of the article title and those hyperlinks to other articles are assigned with coordinates. By utilizing hyperlinks, we can accurately assign location expressions with coordinates even with ambiguous location expressions in the texts. Experimental results show that there remains room for improvement by disambiguating location expressions.


"This is not a data problem": Algorithms and Power in Public Higher Education in Canada

McConvey, Kelly, Guha, Shion

arXiv.org Artificial Intelligence

Algorithmic decision-making is increasingly being adopted across public higher education. The expansion of data-driven practices by post-secondary institutions has occurred in parallel with the adoption of New Public Management approaches by neoliberal administrations. In this study, we conduct a qualitative analysis of an in-depth ethnographic case study of data and algorithms in use at a public college in Ontario, Canada. We identify the data, algorithms, and outcomes in use at the college. We assess how the college's processes and relationships support those outcomes and the different stakeholders' perceptions of the college's data-driven systems. In addition, we find that the growing reliance on algorithmic decisions leads to increased student surveillance, exacerbation of existing inequities, and the automation of the faculty-student relationship. Finally, we identify a cycle of increased institutional power perpetuated by algorithmic decision-making, and driven by a push towards financial sustainability.


Pyclipse, a library for deidentification of free-text clinical notes

Moore, Callandra, Ranisau, Jonathan, Nelson, Walter, Petch, Jeremy, Johnson, Alistair

arXiv.org Artificial Intelligence

Automated deidentification of clinical text data is crucial due to the high cost of manual deidentification, which has been a barrier to sharing clinical text and the advancement of clinical natural language processing. However, creating effective automated deidentification tools faces several challenges, including issues in reproducibility due to differences in text processing, evaluation methods, and a lack of consistency across clinical domains and institutions. To address these challenges, we propose the pyclipse framework, a unified and configurable evaluation procedure to streamline the comparison of deidentification algorithms. Pyclipse serves as a single interface for running open-source deidentification algorithms on local clinical data, allowing for context-specific evaluation. To demonstrate the utility of pyclipse, we compare six deidentification algorithms across four public and two private clinical text datasets. We find that algorithm performance consistently falls short of the results reported in the original papers, even when evaluated on the same benchmark dataset. These discrepancies highlight the complexity of accurately assessing and comparing deidentification algorithms, emphasizing the need for a reproducible, adjustable, and extensible framework like pyclipse. Our framework lays the foundation for a unified approach to evaluate and improve deidentification tools, ultimately enhancing patient protection in clinical natural language processing.


Causal Models Applied to the Patterns of Human Migration due to Climate Change

Lai, Kenneth, Yanushkevich, Svetlana

arXiv.org Artificial Intelligence

The impacts of mass migration, such as crisis induced by climate change, extend beyond environmental concerns and can greatly affect social infrastructure and public services, such as education, healthcare, and security. These crises exacerbate certain elements like cultural barriers, and discrimination by amplifying the challenges faced by these affected communities. This paper proposes an innovative approach to address migration crises in the context of crisis management through a combination of modeling and imbalance assessment tools. By employing deep learning for forecasting and integrating causal reasoning via Bayesian networks, this methodology enables the evaluation of imbalances and risks in the socio-technological landscape, providing crucial insights for informed decision-making. Through this framework, critical systems can be analyzed to understand how fluctuations in migration levels may impact them, facilitating effective crisis governance strategies.