 current practice


Perspective Chapter: MOOCs in India: Evolution, Innovation, Impact, and Roadmap

arXiv.org Artificial Intelligence

With the largest population in the world and one of the highest enrolments in higher education, India needs efficient and effective means to educate its learners. India started focusing on open and digital education in the 1980s, and its efforts were escalated in 2009 through the NMEICT program of the Government of India. A study by the Government and FICCI in 2014 noted that India cannot meet its educational needs just by capacity building in brick-and-mortar institutions. It was decided that ongoing MOOCs projects under the umbrella of NMEICT would be further strengthened over its second (2017-21) and third (2021-26) phases. NMEICT now steers NPTEL or SWAYAM (India's MOOCs) and several digital learning projects including Virtual Labs, e-Yantra, Spoken Tutorial, FOSSEE, and the National Digital Library of India, the largest digital education library in the world. Further, India embraced its new National Education Policy in 2020 to strongly foster online education. In this chapter, we take a deep look into the evolution of MOOCs in India, their innovations, their current status and impact, and the roadmap for the next decade to address their challenges and grow. AI-powered MOOCs are an emerging opportunity for India to lead MOOCs worldwide.


Investigating Public Fine-Tuning Datasets: A Complex Review of Current Practices from a Construction Perspective

arXiv.org Artificial Intelligence

With the rapid development of the large model domain, research related to fine-tuning has concurrently seen significant advancement, given that fine-tuning is a constituent part of the training process for large-scale models. Data engineering plays a fundamental role in the training process of models, which includes data infrastructure, data processing, etc. Data during fine-tuning likewise forms the base for large models. In order to embrace the power and explore new possibilities of fine-tuning datasets, this paper reviews current public fine-tuning datasets from the perspective of data construction. An overview of public fine-tuning datasets from two sides: evolution and taxonomy, is provided in this review, aiming to chart the development trajectory. Construction techniques and methods for public fine-tuning datasets of Large Language Models (LLMs), including data generation and data augmentation among others, are detailed. This elaboration follows the aforementioned taxonomy, specifically across demonstration, comparison, and generalist categories. Additionally, a category tree of data generation techniques has been abstracted in our review to assist researchers in gaining a deeper understanding of fine-tuning datasets from the construction dimension. Our review also summarizes the construction features in different data preparation phases of current practices in this field, aiming to provide a comprehensive overview and inform future research. Fine-tuning dataset practices, encompassing various data modalities, are also discussed from a construction perspective in our review. Towards the end of the article, we offer insights and considerations regarding the future construction and developments of fine-tuning datasets.


Supervised machine learning for microbiomics: bridging the gap between current and best practices

arXiv.org Artificial Intelligence

Machine learning (ML) is set to accelerate innovations in clinical microbiomics, such as in disease diagnostics and prognostics. This will require high-quality, reproducible, interpretable workflows whose predictive capabilities meet or exceed the high thresholds set for clinical tools by regulatory agencies. Here, we capture a snapshot of current practices in the application of supervised ML to microbiomics data, through an in-depth analysis of 100 peer-reviewed journal articles published in 2021-2022. We apply a data-driven approach to steer discussion of the merits of varied approaches to experimental design, including key considerations such as how to mitigate the effects of small dataset size while avoiding data leakage. We further provide guidance on how to avoid common experimental design pitfalls that can hurt model performance, trustworthiness, and reproducibility. Discussion is accompanied by an interactive online tutorial that demonstrates foundational principles of ML experimental design, tailored to the microbiomics community. Formalizing community best practices for supervised ML in microbiomics is an important step towards improving the success and efficiency of clinical research, to the benefit of patients and other stakeholders.


Too Good To Be True: performance overestimation in (re)current practices for Human Activity Recognition

arXiv.org Artificial Intelligence

Today, there are standard and well-established procedures within the Human Activity Recognition (HAR) pipeline. However, some of these conventional approaches lead to accuracy overestimation. In particular, sliding windows for data segmentation followed by standard random k-fold cross-validation produce biased results. An analysis of previous literature and present-day studies surprisingly shows that these are common approaches in state-of-the-art studies on HAR. It is important to raise awareness in the scientific community about this problem, whose negative effects are being overlooked. Otherwise, the publication of biased results makes papers that report lower accuracies, obtained with correct unbiased methods, harder to publish. Several experiments with different types of datasets and different types of classification models allow us to exhibit the problem and show that it persists independently of the method or dataset.
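One way to see why the combination named in the abstract above can inflate accuracy is to count how many test windows share raw samples with a training window. The sketch below is an illustration only and is not taken from the paper: the function names, the window width and step, and the use of a recording ID as the grouping key are all assumptions made for the example.

```python
import random

def sliding_windows(n_samples, width, step):
    """Return half-open (start, end) index pairs of windows over one recording."""
    return [(s, s + width) for s in range(0, n_samples - width + 1, step)]

def random_kfold(items, k, seed=0):
    """Shuffle the items and deal them into k folds (standard random k-fold)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def grouped_kfold(items, k, group_of):
    """Assign whole groups (here: recordings) to folds, never splitting a group."""
    groups = sorted({group_of(it) for it in items})
    folds = [[] for _ in range(k)]
    for it in items:
        folds[groups.index(group_of(it)) % k].append(it)
    return folds

def leaked_fraction(folds):
    """Fraction of test windows that overlap a training window of the same recording."""
    leaked = total = 0
    for i, test in enumerate(folds):
        train = [w for j, fold in enumerate(folds) if j != i for w in fold]
        for rec, (a, b) in test:
            total += 1
            # Two half-open intervals overlap when each starts before the other ends.
            if any(r == rec and a < d and c < b for r, (c, d) in train):
                leaked += 1
    return leaked / total

# Hypothetical data: 6 recordings of 200 samples, windows of 50 samples with 50% overlap.
windows = [(rec, w) for rec in range(6) for w in sliding_windows(200, 50, 25)]

random_leak = leaked_fraction(random_kfold(windows, 3))
grouped_leak = leaked_fraction(grouped_kfold(windows, 3, group_of=lambda it: it[0]))
print(f"random k-fold leaked fraction:  {random_leak:.2f}")
print(f"grouped k-fold leaked fraction: {grouped_leak:.2f}")
```

With overlapping windows, the random split places near-duplicate segments on both sides of the train/test boundary, whereas the grouped split keeps every recording in a single fold. Grouping by recording is only one possible choice; grouping by subject is a stricter alternative.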


Council Post: AI And The Disruption Of Healthcare

#artificialintelligence

Jacob Kupietzky is President of HealthCare Transformation, a company dedicated to providing hospitals with experienced interim executives. Not long ago, if you wanted to explore the intersection of healthcare and artificial intelligence (AI), you'd be confined to the pages of science fiction. Not anymore: In recent years, AI has evolved from what's possible to what's practical, and consumers and practitioners alike have been increasingly drawn to the possibilities of how AI can revolutionize healthcare. While the promise of AI in this field is just starting to be realized, we are already seeing it have a real impact on patients' lives right now. Here are three ways AI is disrupting the practice of healthcare today.


SQAPlanner: Generating Data-Informed Software Quality Improvement Plans

arXiv.org Artificial Intelligence

Software Quality Assurance (SQA) planning aims to define proactive plans, such as defining maximum file size, to prevent the occurrence of software defects in future releases. To aid this, defect prediction models have been proposed to generate insights into the most important factors that are associated with software quality. Such insights that are derived from traditional defect models are far from actionable, i.e., practitioners still do not know what they should do or avoid to decrease the risk of having defects, and what the risk threshold is for each metric. A lack of actionable guidance and risk thresholds can lead to inefficient and ineffective SQA planning processes. In this paper, we investigate the practitioners' perceptions of current SQA planning activities and the current challenges of such SQA planning activities, and propose four types of guidance to support SQA planning. We then propose and evaluate our AI-Driven SQAPlanner approach, a novel approach for generating four types of guidance and their associated risk thresholds in the form of rule-based explanations for the predictions of defect prediction models. Finally, we develop and evaluate an information visualization for our SQAPlanner approach. Through the use of a qualitative survey and empirical evaluation, our results lead us to conclude that SQAPlanner is needed, effective, stable, and practically applicable. We also find that 80% of our survey respondents perceived that our visualization is more actionable. Thus, our SQAPlanner paves a way for novel research in actionable software analytics, i.e., generating actionable guidance on what practitioners should do and not do to decrease the risk of having defects to support SQA planning.


Dynamic Redeployment to Counter Congestion or Starvation in Vehicle Sharing Systems

AAAI Conferences

Vehicle sharing (e.g., bike sharing, car sharing) systems, an attractive alternative to private transportation, are widely adopted in major cities around the world. In vehicle-sharing systems, base stations (e.g., docking stations for bikes) are strategically placed throughout a city and each of the base stations contains a pre-determined number of vehicles at the beginning of each day. Due to the stochastic and individualistic movement of customers, there is typically either congestion (more than required) or starvation (fewer than required) of vehicles at certain base stations, which causes a significant loss in demand. We propose to dynamically redeploy idle vehicles using carriers so as to minimize lost demand or alternatively maximize revenue for the vehicle sharing company. To that end, we contribute an optimization formulation to jointly address the redeployment (of vehicles) and routing (of carriers) problems and provide two approaches that rely on decomposability and abstraction of problem domains to reduce the computation time significantly.


Risk Based Optimization for Improving Emergency Medical Systems

AAAI Conferences

In emergency medical systems, arriving at the incident location a few seconds early can save a human life. Thus, this paper is motivated by the need to reduce the response time (the time taken to arrive at the incident location after receiving the emergency call) of Emergency Response Vehicles, ERVs (e.g., ambulances, fire rescue vehicles), for as many requests as possible. We expect to achieve this primarily by positioning the "right" number of ERVs at the "right" places and at the "right" times. Given the exponentially large action space (with respect to the number of ERVs and their placement) and the stochasticity in location and timing of emergency incidents, this problem is computationally challenging. To that end, our contributions, building on existing data-driven approaches, are threefold:
1. Based on real-world evaluation metrics, we provide a risk-based optimization criterion to learn from past incident data. Instead of minimizing expected response time, we minimize the largest value of response time such that the risk of finding requests that have a higher value is bounded (e.g., only 10% of requests should have a response time greater than 8 minutes).
2. We develop a mixed integer linear optimization formulation to learn and compute an allocation from a set of input requests while considering the risk criterion.
3. To allow for "live" reallocation of ambulances, we provide a decomposition method based on Lagrangian Relaxation to significantly reduce the run-time of the optimization formulation.
Finally, we provide an exhaustive evaluation on real-world datasets from two Asian cities that demonstrates the improvement provided by our approach over current practice and the best known approach from literature.
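The risk criterion in the first contribution above can be read as a quantile of the observed response times: the smallest threshold that at most a fraction alpha of requests exceed. The sketch below only evaluates that criterion for two fixed sets of response times. It is not the paper's mixed integer formulation, and the function name, the variable names, and the sample data are hypothetical.

```python
import math
from statistics import mean

def risk_bounded_response_time(times, alpha):
    """Smallest observed value t such that at most a fraction alpha of times exceed t."""
    if not times:
        raise ValueError("times must be non-empty")
    if not 0 <= alpha < 1:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(times)
    n = len(ordered)
    # Number of requests allowed to exceed the threshold (rounded down, so the
    # bound is never violated even when alpha * n is not an integer).
    allowed = math.floor(alpha * n)
    return ordered[n - allowed - 1]

# Hypothetical response times (minutes) under two candidate ERV allocations.
allocation_a = [2, 2, 2, 2, 2, 2, 2, 2, 12, 12]  # lower mean, heavy tail
allocation_b = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]    # higher mean, no tail

for name, times in (("A", allocation_a), ("B", allocation_b)):
    print(name, "mean:", mean(times),
          "risk-bounded (alpha=0.1):", risk_bounded_response_time(times, 0.1))
```

In this made-up example, allocation A has the lower expected response time, but 20% of its requests take 12 minutes, so it fails a "10% above 8 minutes" target that allocation B meets. This is the kind of difference between the expected-value and risk-based criteria that the abstract describes.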


Dynamic Redeployment to Counter Congestion or Starvation in Vehicle Sharing Systems

AAAI Conferences

Vehicle-sharing (e.g., bike sharing, car sharing) is widely adopted in many cities of the world due to concerns associated with extensive private vehicle usage, which has led to increased carbon emissions, traffic congestion and usage of non-renewable resources. In vehicle-sharing systems, base stations are strategically placed throughout a city and each of the base stations contains a pre-determined number of vehicles at the beginning of each day. Due to the stochastic and individualistic movement of customers, typically, there is either congestion (more than required) or starvation (fewer than required) of vehicles at certain base stations. As demonstrated in our experimental results, this happens often and can cause a significant loss in demand. We propose to dynamically redeploy idle vehicles using carriers so as to minimize lost demand or alternatively maximize revenue of the vehicle sharing company. To that end, we contribute an optimization formulation to jointly address the redeployment (of vehicles) and routing (of carriers) problems and provide two approaches that rely on decomposability and abstraction of problem domains to reduce the computation time significantly. Finally, we demonstrate the utility of our approaches on two real world data sets of bike-sharing companies.