Goto

Collaborating Authors

 Oceania


Text-to-Image Generation for Vocabulary Learning Using the Keyword Method

arXiv.org Artificial Intelligence

The 'keyword method' is an effective technique for learning vocabulary of a foreign language. It involves creating a memorable visual link between what a word means and what its pronunciation in a foreign language sounds like in the learner's native language. However, these memorable visual links remain implicit in the people's mind and are not easy to remember for a large set of words. To enhance the memorisation and recall of the vocabulary, we developed an application that combines the keyword method with text-to-image generators to externalise the memorable visual links into visuals. These visuals represent additional stimuli during the memorisation process. To explore the effectiveness of this approach we first run a pilot study to investigate how difficult it is to externalise the descriptions of mental visualisations of memorable links, by asking participants to write them down. We used these descriptions as prompts for text-to-image generator (DALL-E2) to convert them into images and asked participants to select their favourites. Next, we compared different text-to-image generators (DALL-E2, Midjourney, Stable and Latent Diffusion) to evaluate the perceived quality of the generated images by each. Despite heterogeneous results, participants mostly preferred images generated by DALL-E2, which was used also for the final study. In this study, we investigated whether providing such images enhances the retention of vocabulary being learned, compared to the keyword method only. Our results indicate that people did not encounter difficulties describing their visualisations of memorable links and that providing corresponding images significantly improves memory retention.


Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding

arXiv.org Artificial Intelligence

Applying Multimodal Large Language Models (MLLMs) to video understanding presents significant challenges due to the need to model temporal relations across frames. Existing approaches adopt either implicit temporal modeling, relying solely on the LLM decoder, or explicit temporal modeling, employing auxiliary temporal encoders. To investigate this debate between the two paradigms, we propose the Stackable Temporal Encoder (STE). STE enables flexible explicit temporal modeling with adjustable temporal receptive fields and token compression ratios. Using STE, we systematically compare implicit and explicit temporal modeling across dimensions such as overall performance, token compression effectiveness, and temporal-specific understanding. We also explore STE's design considerations and broader impacts as a plug-in module and in image modalities. Our findings emphasize the critical role of explicit temporal modeling, providing actionable insights to advance video MLLMs.


Online-BLS: An Accurate and Efficient Online Broad Learning System for Data Stream Classification

arXiv.org Artificial Intelligence

The state-of-the-art online learning models generally conduct a single online gradient descent when a new sample arrives and thus suffer from suboptimal model weights. To this end, we introduce an online broad learning system framework with closed-form solutions for each online update. Different from employing existing incremental broad learning algorithms for online learning tasks, which tend to incur degraded accuracy and expensive online update overhead, we design an effective weight estimation algorithm and an efficient online updating strategy to remedy the above two deficiencies, respectively. Specifically, an effective weight estimation algorithm is first developed by replacing notorious matrix inverse operations with Cholesky decomposition and forward-backward substitution to improve model accuracy. Second, we devise an efficient online updating strategy that dramatically reduces online update time. Theoretical analysis exhibits the splendid error bound and low time complexity of our model. The most popular test-then-training evaluation experiments on various real-world datasets prove its superiority and efficiency. Furthermore, our framework is naturally extended to data stream scenarios with concept drift and exceeds state-of-the-art baselines.


It pays to be pretty! Attractive people earn up to 11% MORE than their ugly colleagues, study finds

Daily Mail - Science & tech

Whether it's taking on more responsibilities or staying late in the office, many employees will go above and beyond to try to get a pay rise. But now a study suggests that if you're not good looking, your efforts may be futile. Researchers from the Institute for Operations Research and the Management Sciences in Baltimore have uncovered a'striking' link between physical attractiveness and career success. In their study, the team analysed the careers of more than 40,000 graduates who had completed MBAs. They found attractive respondents earned up to 11 per cent more than their colleagues who were seen as less good looking.


Why We're in Love with Apocalypse

The New Yorker

It's a mite soon to start grieving, but scientists now project that life on Earth will probably end in about a billion years. A Monday in February, 1,000,002,025, would be my guess. On that inhospitable day, give or take a few million years, the sun will become so hot that the oceans will boil, Earth's oxygen will disappear, and photosynthesis will cease, as will all living things. We should be so lucky. There's a pretty fair chance that life could be wiped out well before then--say, in early June, 2034, or on a cloudy Sunday in November, 3633. Plenty of people do, as it turns out, and, if you want to know who they are, Dorian Lynskey's "Everything Must Go: The Stories We Tell About the End of the World" (Pantheon) is a good place to start. Lynskey, a British journalist and podcaster, has assembled biological, geological, archeological, literary, and cinematic permutations of existential finales, leaving no stone unturned, be it meteor, comet, or asteroid. If a book, a song, a story, a film, a headline, a title, or a study has "world" and "end" in it, Lynskey has unearthed it.


STAR: Stepwise Task Augmentation and Relation Learning for Aspect Sentiment Quad Prediction

arXiv.org Artificial Intelligence

Aspect-based sentiment analysis (ABSA) aims to identify four sentiment elements, including aspect term, aspect category, opinion term, and sentiment polarity. These elements construct the complete picture of sentiments. The most challenging task, aspect sentiment quad prediction (ASQP), predicts these elements simultaneously, hindered by difficulties in accurately coupling different sentiment elements. A key challenge is insufficient annotated data that limits the capability of models in semantic understanding and reasoning about quad prediction. To address this, we propose stepwise task augmentation and relation learning (STAR), a strategy inspired by human reasoning. STAR constructs auxiliary data to learn quadruple relationships incrementally by augmenting with pairwise and overall relation tasks derived from training data. By encouraging the model to infer causal relationships among sentiment elements without requiring additional annotations, STAR effectively enhances quad prediction. Extensive experiments demonstrate the proposed STAR exhibits superior performance on four benchmark datasets.


Regulatory Science Innovation for Generative AI and Large Language Models in Health and Medicine: A Global Call for Action

arXiv.org Artificial Intelligence

The integration of generative AI (GenAI) and large language models (LLMs) in healthcare presents both unprecedented opportunities and challenges, necessitating innovative regulatory approaches. GenAI and LLMs offer broad applications, from automating clinical workflows to personalizing diagnostics. However, the non-deterministic outputs, broad functionalities and complex integration of GenAI and LLMs challenge existing medical device regulatory frameworks, including the total product life cycle (TPLC) approach. Here we discuss the constraints of the TPLC approach to GenAI and LLM-based medical device regulation, and advocate for global collaboration in regulatory science research. This serves as the foundation for developing innovative approaches including adaptive policies and regulatory sandboxes, to test and refine governance in real-world settings. International harmonization, as seen with the International Medical Device Regulators Forum, is essential to manage implications of LLM on global health, including risks of widening health inequities driven by inherent model biases. By engaging multidisciplinary expertise, prioritizing iterative, data-driven approaches, and focusing on the needs of diverse populations, global regulatory science research enables the responsible and equitable advancement of LLM innovations in healthcare.


Indiana Jones: There Are Always Some Useful Ancient Relics

arXiv.org Artificial Intelligence

This paper introduces Indiana Jones, an innovative approach to jailbreaking Large Language Models (LLMs) by leveraging inter-model dialogues and keyword-driven prompts. Through orchestrating interactions among three specialised LLMs, the method achieves near-perfect success rates in bypassing content safeguards in both white-box and black-box LLMs. The research exposes systemic vulnerabilities within contemporary models, particularly their susceptibility to producing harmful or unethical outputs when guided by ostensibly innocuous prompts framed in historical or contextual contexts. Experimental evaluations highlight the efficacy and adaptability of Indiana Jones, demonstrating its superiority over existing jailbreak methods. These findings emphasise the urgent need for enhanced ethical safeguards and robust security measures in the development of LLMs. Moreover, this work provides a critical foundation for future studies aimed at fortifying LLMs against adversarial exploitation while preserving their utility and flexibility.


Copyright and Competition: Estimating Supply and Demand with Unstructured Data

arXiv.org Machine Learning

Copyright policies play a pivotal role in protecting the intellectual property of creators and companies in creative industries. The advent of cost-reducing technologies, such as generative AI, in these industries calls for renewed attention to the role of these policies. This paper studies product positioning and competition in a market of creatively differentiated products and the competitive and welfare effects of copyright protection. A common feature of products with creative elements is that their key attributes (e.g., images and text) are unstructured and thus high-dimensional. We focus on a stylized design product, fonts, and use data from the world's largest online marketplace for fonts. We use neural network embeddings to quantify unstructured attributes and measure the visual similarity. We show that this measure closely aligns with actual human perception. Based on this measure, we empirically find that competitions occur locally in the visual characteristics space. We then develop a structural model for supply and demand that integrate the embeddings. Through counterfactual analyses, we find that local copyright protection can enhance consumer welfare when products are relocated, and the interplay between copyright and cost-reducing technologies is essential in determining an optimal policy for social welfare. We believe that the embedding analysis and empirical models introduced in this paper can be applicable to a range of industries where unstructured data captures essential features of products and markets.


Determining Mosaic Resilience in Sugarcane Plants using Hyperspectral Images

arXiv.org Artificial Intelligence

Sugarcane mosaic disease poses a serious threat to the Australian sugarcane industry, leading to yield losses of up to 30% in susceptible varieties. Existing manual inspection methods for detecting mosaic resilience are inefficient and impractical for large-scale application. This study introduces a novel approach using hyperspectral imaging and machine learning to detect mosaic resilience by leveraging global feature representation from local spectral patches. Hyperspectral data were collected from eight sugarcane varieties under controlled and field conditions. Local spectral patches were analyzed to capture spatial and spectral variations, which were then aggregated into global feature representations using a ResNet18 deep learning architecture. While classical methods like Support Vector Machines struggled to utilize spatial-spectral relationships effectively, the deep learning model achieved high classification accuracy, demonstrating its capacity to identify mosaic resilience from fine-grained hyperspectral data. This approach enhances early detection capabilities, enabling more efficient management of susceptible strains and contributing to sustainable sugarcane production.