We thank all reviewers for their positive reception of our paper and for their constructive feedback

Neural Information Processing Systems

We thank all reviewers for their positive reception of our paper and for their constructive feedback. On dual norms and prior work: thank you for pointing us to the relevant prior work of Demontis et al. and Xu et al., which we had missed. We will discuss the connections between our work and theirs in the revised paper. Nevertheless, as MNIST is the only vision dataset for which we've been able to train models to high levels of robustness, MNIST is clearly not solved from an adversarial robustness perspective. We think this is an interesting open problem for the community to consider.


Are nuclear masks all you need for improved out-of-domain generalisation? A closer look at cancer classification in histopathology

Tomar, Dhananjay, Binder, Alexander, Kleppe, Andreas

arXiv.org Artificial Intelligence

Domain generalisation in computational histopathology is challenging because the images are substantially affected by differences among hospitals, arising from factors such as tissue fixation, staining, and imaging equipment. We hypothesise that focusing on nuclei can improve out-of-domain (OOD) generalisation in cancer detection, since nuclear morphology and organisation are domain-invariant features critical to the task. Our approach integrates original images with nuclear segmentation masks during training, encouraging the model to prioritise nuclei and their spatial arrangement. Going beyond mere data augmentation, we introduce a regularisation technique that aligns the representations of masks and original images. We show, using multiple datasets, that our method improves OOD generalisation and also increases robustness to image corruptions and adversarial attacks. The source code is available at https://github.com/undercutspiky/SFL/
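The alignment regulariser described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a shared encoder has already produced representation vectors `z_img` and `z_mask` for an image and its nuclear mask, and the names (`alignment_loss`, `lam`) are hypothetical; the exact formulation is in the linked repository.

```python
import numpy as np

def alignment_loss(z_img, z_mask):
    """Mean squared distance between the image representation and the
    nuclear-mask representation (a simple stand-in for the alignment term)."""
    return float(np.mean((np.asarray(z_img) - np.asarray(z_mask)) ** 2))

def total_loss(ce_img, ce_mask, z_img, z_mask, lam=0.1):
    """Classification loss on both views plus the weighted alignment term.

    ce_img / ce_mask: cross-entropy losses computed elsewhere for the
    image and mask inputs; lam: regularisation weight (hypothetical value).
    """
    return ce_img + ce_mask + lam * alignment_loss(z_img, z_mask)
```

Pulling the mask representation toward the image representation (and vice versa) is what encourages the encoder to rely on nuclei rather than on hospital-specific staining cues.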


Investigating Semi-Supervised Learning Algorithms in Text Datasets

Kesgin, Himmet Toprak, Amasyali, Mehmet Fatih

arXiv.org Artificial Intelligence

Using large training datasets enhances the generalization capabilities of neural networks. Semi-supervised learning (SSL) is useful when there are few labeled data and a lot of unlabeled data. The most successful SSL methods for image datasets rely on data augmentation. In contrast, text does not have augmentation methods as consistent as those for images, so augmentation-based methods are not as effective on text data as they are on image data. In this study, we compared SSL algorithms that do not require augmentation: self-training, co-training, tri-training, and tri-training with disagreement. In our experiments, we used four text datasets covering different tasks. We examined the algorithms from a variety of perspectives by posing experimental questions, and we suggest several improvements. Among the algorithms, tri-training with disagreement came closest to the Oracle's performance; however, the remaining performance gap shows that new semi-supervised algorithms, or improvements to existing methods, are needed.
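Of the compared algorithms, self-training is the simplest to sketch: train on the labeled data, pseudo-label the unlabeled data, and absorb only the confident predictions. The following is a minimal illustration, not the authors' implementation; it uses a toy nearest-centroid classifier, and all names and the threshold value are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    # Toy "model": one mean vector (centroid) per class.
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_with_conf(X, classes, centroids):
    # Predict the nearest centroid; confidence via softmax over -distance.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    p = np.exp(-d)
    p /= p.sum(axis=1, keepdims=True)
    return classes[d.argmin(axis=1)], p.max(axis=1)

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, rounds=5):
    for _ in range(rounds):
        classes, cents = fit_centroids(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        preds, conf = predict_with_conf(X_unlab, classes, cents)
        keep = conf >= threshold
        if not keep.any():
            break
        # Move confidently pseudo-labeled points into the labeled set.
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, preds[keep]])
        X_unlab = X_unlab[~keep]
    return fit_centroids(X_lab, y_lab)
```

Co-training and tri-training follow the same pseudo-labeling idea but use multiple models (or views) to decide which unlabeled points to trust.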


The Guardian blocks ChatGPT owner OpenAI from trawling its content

The Guardian

The Guardian has blocked OpenAI from using its content to power artificial intelligence products such as ChatGPT. Concerns that OpenAI is using unlicensed content to create its AI tools have led writers to bring lawsuits against the company and creative industries to call for safeguards protecting their intellectual property. The Guardian has confirmed that it has prevented OpenAI from deploying software that harvests its content. Generative AI technology – the term for products that generate convincing text, images and audio from simple human prompts – has dazzled the public since OpenAI launched a breakthrough version of its ChatGPT chatbot last year. However, fears have arisen about the potential mass production of disinformation and the way in which such tools are built.


Mini-Batch Learning Strategies for modeling long term temporal dependencies: A study in environmental applications

Xu, Shaoming, Khandelwal, Ankush, Li, Xiang, Jia, Xiaowei, Liu, Licheng, Willard, Jared, Ghosh, Rahul, Cutler, Kelly, Steinbach, Michael, Duffy, Christopher, Nieber, John, Kumar, Vipin

arXiv.org Artificial Intelligence

In many environmental applications, recurrent neural networks (RNNs) are used to model physical variables with long temporal dependencies. However, due to mini-batch training, temporal relationships between training segments within a batch (intra-batch) as well as between batches (inter-batch) are not considered, which can limit performance. Stateful RNNs aim to address this issue by passing hidden states between batches. Since Stateful RNNs ignore intra-batch temporal dependency, there is a trade-off between training stability and capturing temporal dependency. In this paper, we provide a quantitative comparison of different Stateful RNN modeling strategies and propose two strategies to enforce both intra- and inter-batch temporal dependency. First, we extend Stateful RNNs by defining a batch as a temporally ordered set of training segments, which enables intra-batch sharing of temporal information. While this approach significantly improves performance, it leads to much longer training times because training becomes highly sequential. To address this issue, we further propose a new strategy that augments each training segment with the value of the target variable from the timestep just before the segment starts. In other words, we provide an initial value of the target variable as an additional input so that the network can focus on learning changes relative to that initial value. With this strategy, samples can be passed in any order (mini-batch training), which significantly reduces training time while maintaining performance. Demonstrating our approach on hydrological modeling, we observe that the largest gains in predictive accuracy occur for state variables whose values change slowly, such as soil water and snowpack, rather than for continuously varying flux variables such as streamflow.
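The initial-value augmentation strategy can be sketched as follows. This is a minimal illustration with our own (not the authors') function and variable names: each fixed-length segment's inputs are extended with the target value from the timestep just before the segment begins, after which segments can be shuffled freely into mini-batches.

```python
import numpy as np

def make_segments(x, y, seg_len):
    """Split a long series into fixed-length training segments, augmenting
    each segment's inputs with the target value at the timestep just before
    the segment starts (so segments become order-independent).

    x: (T, F) array of input features; y: (T,) array of the target variable.
    """
    segments = []
    # Start at t=1 so every segment has a preceding target value y[start-1].
    for start in range(1, len(x) - seg_len + 1, seg_len):
        y0 = y[start - 1]                    # initial value of the target
        feats = x[start:start + seg_len]
        y0_col = np.full((seg_len, 1), y0)   # repeat y0 as an extra input column
        segments.append((np.hstack([feats, y0_col]), y[start:start + seg_len]))
    return segments
```

Because each segment carries its own initial condition as an input, the network learns changes relative to that value, and no hidden state needs to be threaded between batches.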


Understanding the Keras Library

#artificialintelligence

Keras is a high-level neural networks API that provides an easy-to-use interface for building and training deep learning models. It is built on top of other popular deep learning frameworks, such as TensorFlow, Theano, and the Microsoft Cognitive Toolkit (CNTK), and abstracts away their low-level details. The main problem Keras solves is the complexity and verbosity of building and training deep learning models directly in those low-level frameworks. Its consistent and simple API makes it easy to define, compile, and train models, which makes it a great choice for beginners who want to get started with deep learning quickly.
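As a minimal sketch of that simplicity, here is a small classifier defined and compiled with the Keras `Sequential` API; the layer sizes and the MNIST-style input shape are illustrative choices, not requirements.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Define a small feed-forward classifier in a few lines.
model = keras.Sequential([
    layers.Input(shape=(784,)),             # e.g. flattened 28x28 images
    layers.Dense(64, activation="relu"),    # hidden layer
    layers.Dense(10, activation="softmax"), # 10-class output
])

# Compiling attaches the optimizer, loss, and metrics in one call.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training is a single call once data is available:
# model.fit(x_train, y_train, epochs=5)
```

The same model written against a low-level framework would require manually wiring weights, the forward pass, the loss, and the update step, which is exactly the verbosity Keras hides.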


We could run out of data to train AI language programs

#artificialintelligence

The trouble is, the types of data typically used for training language models may be used up in the near future, possibly as early as 2026, according to a paper, yet to be peer reviewed, by researchers from Epoch, an AI research and forecasting organization. The issue stems from the fact that, as researchers build more powerful models with greater capabilities, they have to find ever more text to train them on. Large language model researchers are increasingly concerned that they are going to run out of this sort of data, says Teven Le Scao, a researcher at AI company Hugging Face, who was not involved in Epoch's work. The issue stems partly from the fact that language AI researchers filter the data they use to train models into two categories: high quality and low quality. The line between the two can be fuzzy, says Pablo Villalobos, a staff researcher at Epoch and the paper's lead author, but text in the high-quality category is viewed as better written and is often produced by professional writers.