AITopics

2410.23123

Country: North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.92)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (0.45)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.45)
Energy > Oil & Gas > Midstream (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

arXiv.org Artificial IntelligenceOct-27-2024

Historical Test-time Prompt Tuning for Vision Foundation Models

Zhang, Jingyi, Huang, Jiaxing, Zhang, Xiaoqin, Shao, Ling, Lu, Shijian

Test-time prompt tuning, which learns prompts online with unlabelled test samples during the inference stage, has demonstrated great potential by learning effective prompts on-the-fly without requiring any task-specific annotations. However, its performance often degrades clearly along the tuning process when the prompts are continuously updated with the test data flow, and the degradation becomes more severe when the domain of test samples changes continuously. We propose HisTPT, a Historical Test-time Prompt Tuning technique that memorizes the useful knowledge of the learnt test samples and enables robust test-time prompt tuning with the memorized knowledge. HisTPT introduces three types of knowledge banks, namely, local knowledge bank, hard-sample knowledge bank, and global knowledge bank, each of which works with different mechanisms for effective knowledge memorization and test-time prompt optimization. In addition, HisTPT features an adaptive knowledge retrieval mechanism that regularizes the prediction of each test sample by adaptively retrieving the memorized knowledge. Extensive experiments show that HisTPT achieves superior prompt tuning performance consistently while handling different visual recognition tasks (e.g., image classification, semantic segmentation, and object detection) and test samples from continuously changing domains.

artificial intelligence, machine learning, natural language, (16 more...)

2410.20346

Country:

Asia > China (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.36)

arXiv.org Artificial IntelligenceOct-25-2024

Measuring memorization through probabilistic discoverable extraction

Hayes, Jamie, Swanberg, Marika, Chaudhari, Harsh, Yona, Itay, Shumailov, Ilia

Large language models (LLMs) are susceptible to memorizing training data, raising concerns due to the potential extraction of sensitive information. Current methods to measure memorization rates of LLMs, primarily discoverable extraction (Carlini et al., 2022), rely on single-sequence greedy sampling, potentially underestimating the true extent of memorization. This paper introduces a probabilistic relaxation of discoverable extraction that quantifies the probability of extracting a target sequence within a set of generated samples, considering various sampling schemes and multiple attempts. This approach addresses the limitations of reporting memorization rates through discoverable extraction by accounting for the probabilistic nature of LLMs and user interaction patterns. Our experiments demonstrate that this probabilistic measure can reveal cases of higher memorization rates compared to rates found through discoverable extraction. We further investigate the impact of different sampling schemes on extractability, providing a more comprehensive and realistic assessment of LLM memorization and its associated risks. Our contributions include a new probabilistic memorization definition, empirical evidence of its effectiveness, and a thorough evaluation across different models, sizes, sampling schemes, and training data repetitions.

large language model, machine learning, natural language, (15 more...)

2410.19482

Country: South America > Brazil > Paraná > Curitiba (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Banatt, Eryk, Cheng, Jonathan, Vaidyanath, Skanda, Hwu, Tiffany

WILT: A Multi-Turn, Memorization-Robust Inductive Logic Benchmark for LLMs

arXiv.org Artificial IntelligenceOct-14-2024

While large language models (LLMs) have shown impressive capabilities across a wide range of domains, they still encounter significant challenges in reasoning tasks that require gathering evidence over multiple turns and drawing logical conclusions from this evidence. These challenges present significant obstacles for LLM chat user interfaces, which rely on multi-turn interactions to facilitate effective collaboration. This limitation leads to real-world issues; for example, service chatbots must gather necessary information from customers over multiple turns to diagnose and resolve problems effectively. Despite the multi-turn nature of many real-world LLM use cases, most existing benchmarks rely on carefully curated single-turn tests, which often blur the line between memorization and genuine reasoning. To address this, we introduce the Wason Inductive Logic Test (WILT), a simple yet challenging multi-turn reasoning benchmark designed to resist memorization. WILT is inspired by the Wason 2-4-6 task (Wason, 1960), where participants must infer a basic boolean function involving three variables (e.g., x < y < z) by proposing test cases (such as (2, 4, 6)). In WILT, each test starts from a clean slate, with only the initial instructions provided, preventing models from relying on pre-learned responses. Over several turns, models must interact with the environment by suggesting test cases to narrow the possible hypotheses and ultimately infer the hidden function based on the outcomes. Our findings reveal that LLMs struggle with this task, exhibiting distinct strengths and weaknesses: some are better at narrowing down the hypothesis space by proposing valuable test cases, while others are more adept at deducing the hidden function from observed cases. Despite these variations, the best-performing model achieves only 28% accuracy, highlighting a significant gap in LLM performance on complex multi-turn reasoning tasks. Large language models (LLMs) powered by the transformer architecture (Vaswani, 2017) have enabled a new computing paradigm driven by natural language. These models are increasingly integrated into day-to-day life beyond the machine learning research space, where they help many people with common tasks. These models interact with users through multi-turn conversations, a capability of next-token-prediction models bolstered via instruction-tuning (Mishra et al., 2021) and alignment post-training phases (Ouyang et al., 2022).

large language model, machine learning, natural language, (17 more...)

2410.10998

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.80)

Neural Information Processing SystemsOct-11-2024, 15:45:52 GMT

Measures of Information Reflect Memorization Patterns

Neural networks are known to exploit spurious artifacts (or shortcuts) that co-occur with a target label, exhibiting heuristic memorization. On the other hand, networks have been shown to memorize training examples, resulting in example-level memorization. These kinds of memorization impede generalization of networks beyond their training distributions. Detecting such memorization could be challenging, often requiring researchers to curate tailored test sets. In this work, we hypothesize--and subsequently show--that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.

information reflect memorization pattern, memorization, neural activation, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Neural Information Processing SystemsOct-11-2024, 02:59:14 GMT

The Privacy Onion Effect: Memorization is Relative

Machine learning models trained on private datasets have been shown to leak their private data. Recent work has found that the average data point is rarely leaked---it is often the outlier samples that are subject to memorization and, consequently, leakage. We demonstrate and analyze an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously-safe points to the same attack. We perform several experiments that are consistent with this hypothesis. For example, we show that for membership inference attacks, when the layer of easiest-to-attack examples is removed, another layer below becomes easy-to-attack.

memorization, privacy onion effect

Industry: Information Technology > Security & Privacy (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.96)

Neural Information Processing SystemsOct-11-2024, 02:10:34 GMT

Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

We study finite sample expressivity, i.e., memorization power of ReLU networks. Recent results require N hidden nodes to memorize/interpolate arbitrary N data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with \Omega(\sqrt{N}) hidden nodes can perfectly memorize most datasets with N points. We also prove that width \Theta(\sqrt{N}) is necessary and sufficient for memorizing N data points, proving tight bounds on memorization capacity. The sufficiency result can be extended to deeper networks; we show that an L -layer network with W parameters in the hidden layers can memorize N data points if W \Omega(N) .

memorization capacity, relu network, small relu network, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.94)

arXiv.org Machine LearningOct-11-2024

Losing dimensions: Geometric memorization in generative diffusion

Achilli, Beatrice, Ventura, Enrico, Silvestri, Gianluigi, Pham, Bao, Raya, Gabriel, Krotov, Dmitry, Lucibello, Carlo, Ambrogioni, Luca

Generative diffusion processes are state-of-the-art machine learning models deeply connected with fundamental concepts in statistical physics. Depending on the dataset size and the capacity of the network, their behavior is known to transition from an associative memory regime to a generalization phase in a phenomenon that has been described as a glassy phase transition. Here, using statistical physics techniques, we extend the theory of memorization in generative diffusion to manifold-supported data. Our theoretical and experimental findings indicate that different tangent subspaces are lost due to memorization effects at different critical times and dataset sizes, which depend on the local variance of the data along their directions. Perhaps counterintuitively, we find that, under some conditions, subspaces of higher variance are lost first due to memorization effects. This leads to a selective loss of dimensionality where some prominent features of the data are memorized without a full collapse on any individual training point. We validate our theory with a comprehensive set of experiments on networks trained both in image datasets and on linear manifolds, which result in a remarkable qualitative agreement with the theoretical predictions.

diffusion model, singular value, singular value normalized singular value, (14 more...)

arXiv.org Machine Learning

2410.08727

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Neural Information Processing SystemsOct-10-2024, 23:14:25 GMT

ACIL: Analytic Class-Incremental Learning with Absolute Memorization and Privacy Protection

Class-incremental learning (CIL) learns a classification model with training data of different classes arising progressively. Existing CIL either suffers from serious accuracy loss due to catastrophic forgetting, or invades data privacy by revisiting used exemplars. Inspired by learning of linear problems, we propose an analytic class-incremental learning (ACIL) with absolute memorization of past knowledge while avoiding breaching of data privacy (i.e., without storing historical data). The absolute memorization is demonstrated in the sense that the CIL using ACIL given present data would give identical results to that from its joint-learning counterpart that consumes both present and historical samples. This equality is theoretically validated.

absolute memorization and privacy protection, acil, analytic class-incremental learning, (2 more...)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.90)

Neural Information Processing SystemsOct-10-2024, 22:51:43 GMT

An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks

It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin(2020) settled a long standing question by Baum(1988), proving that deep threshold networks can memorize n points in d dimensions using \widetilde{\mathcal{O}}(e {1/\delta 2} \sqrt{n}) neurons and \widetilde{\mathcal{O}}(e {1/\delta 2}(d \sqrt{n}) n) weights, where \delta is the minimum distance between the points. Our construction uses Gaussian random weights only in the first layer, while all the subsequent layers use binary or integer weights. We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating n points on a sphere using hyperplanes.

deep threshold network, delta, exponential improvement, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.66)