 information entropy


Theorem. Suppose S: X → R is a continuous set function w.r.t. the Hausdorff distance d_H(·,·). Then for any ε > 0, any function f, and any invertible map P: X → R^n, there exist functions h and g such that for any X ∈ X: |S(X) − g(P_{x∈X} h(x))| < ε.
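The statement above says a continuous set function can be approximated by a readout over pooled per-element features. A minimal sketch of the key property, permutation invariance, where h and g are arbitrary illustrative stand-ins rather than the paper's learned functions:

```python
import numpy as np

def h(x):
    # hypothetical per-element feature map (stand-in for a learned embedding)
    return np.stack([x, x**2, np.sin(x)], axis=-1)

def g(z):
    # hypothetical readout applied to the pooled representation
    return float(np.tanh(z).sum())

def set_function(X):
    # g(sum of h(x) over the set) is invariant to the order of elements
    return g(h(np.asarray(X)).sum(axis=0))

X = [0.3, -1.2, 2.5]
# reordering the set leaves the value unchanged
assert np.isclose(set_function(X), set_function(list(reversed(X))))
```

Because summation is commutative, any permutation of the instances yields the same pooled vector, which is what makes this decomposition suitable for bags of instances.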

Neural Information Processing Systems

Theorem 2. Let the instances in the bag be represented by random variables Θ1, Θ2, ..., Θn. The information entropy of the bag under the correlation assumption can be expressed as H(Θ1, Θ2, ..., Θn), and the information entropy of the bag under the i.i.d. assumption as the sum H(Θ1) + H(Θ2) + ... + H(Θn). By the subadditivity of entropy, H(Θ1, Θ2, ..., Θn) ≤ H(Θ1) + ... + H(Θn), so the information source under the correlation assumption has smaller information entropy. In other words, the correlation assumption reduces the uncertainty and brings more useful information. Given a set of bags {X1, X2, ..., Xb}, each bag Xi contains multiple instances {xi,1, xi,2, ..., xi,n} and a corresponding label Yi. The key to Transformer-based MIL is how to design the mapping from X to T. However, there are many difficulties in directly applying a Transformer to WSI classification, including the large number of instances in each bag and the large variation in the number of instances across bags (e.g., ranging from hundreds to thousands).
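The entropy comparison this theorem rests on (joint entropy of correlated instances versus the i.i.d. sum of marginal entropies) can be checked numerically. A minimal sketch with two correlated binary instances and a hand-picked joint distribution:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Joint distribution of two correlated binary instances (Theta1, Theta2):
# rows index Theta1, columns index Theta2. Mass concentrates on the
# diagonal, i.e. the instances tend to agree.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

H_joint = entropy(joint.ravel())                                  # H(Theta1, Theta2)
H_sum = entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0))   # H(Theta1) + H(Theta2)

# Subadditivity: the correlated (joint) entropy never exceeds the i.i.d. bound.
assert H_joint <= H_sum
```

Here the marginals are both uniform, so the i.i.d. bound is 2 bits, while the correlated joint entropy is about 1.72 bits; the gap is exactly the mutual information the correlation assumption exploits.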


Are we living in a simulation? This experiment could tell us

New Scientist

Are we living in a simulation? The idea that we might be living in a simulated reality has worried us for centuries. Thomas Anderson - otherwise known as Neo - is walking up a flight of stairs when he sees a black cat shake itself and walk past a doorway. Then the moment seems to replay before his eyes. Just a touch of déjà vu, he thinks.


3D Motion Perception of Binocular Vision Target with PID-CNN

Shi, Jiazhao, Pan, Pan, Shi, Haotian

arXiv.org Artificial Intelligence

This article trains a network for perceiving the three-dimensional motion of a binocular vision target; it provides real-time three-dimensional coordinates, velocity, and acceleration, giving it a basic spatiotemporal perception capability. We interpret the ability of neural networks to fit nonlinear problems from the perspective of PID control: a single-layer neural network can be viewed as a second-order difference equation plus a nonlinearity describing a local problem, and multilayer networks gradually transform the raw representation into the desired representation through multiple such combinations. We analyse some guiding principles for designing neural networks and design a relatively small PID convolutional neural network, with a total of 17 layers and 413 thousand parameters, implementing a simple but practical feature-reuse method via concatenation and pooling. The network was trained and tested on simulated datasets of randomly moving balls, and the experimental results show that the prediction accuracy is close to the upper limit that the input image resolution can represent. We analyse the experimental results and errors, as well as the remaining shortcomings and possible directions for improvement. Finally, we discuss the advantages of high-dimensional convolution for improving computational efficiency and feature-space utilization, as well as the potential advantages of using PID information to implement memory and attention mechanisms.
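The PID analogy the abstract draws relates position, velocity, and acceleration to the proportional, integral, and derivative views of a discrete signal. A minimal sketch of extracting those three views from a sampled sequence (the function name and interface here are illustrative, not the paper's implementation):

```python
import numpy as np

def pid_features(signal, dt=1.0):
    """Proportional, integral, and derivative views of a discrete signal,
    the three quantities the PID analogy refers to."""
    s = np.asarray(signal, dtype=float)
    p = s                      # proportional: the value itself
    i = np.cumsum(s) * dt      # integral: running sum (rectangle rule)
    d = np.gradient(s, dt)     # derivative: finite differences
    return np.stack([p, i, d], axis=0)

# x = t^2 sampled at t = 0, 1, 2, 3; the derivative row approximates 2t
feats = pid_features([0.0, 1.0, 4.0, 9.0])
```

Stacking these channels gives a network direct access to rate-of-change and accumulated-history information, which is the intuition behind using PID-style features for memory and attention.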



Machine Learning Classification and Portfolio Allocation: with Implications from Machine Uncertainty

Bai, Yang, Pukthuanthong, Kuntara

arXiv.org Artificial Intelligence

We use multi-class machine learning classifiers to identify stocks that outperform or underperform their peers. The resulting long-short portfolios achieve annual Sharpe ratios of 1.67 (value-weighted) and 3.35 (equal-weighted), with annual alphas ranging from 29% to 48%. These results persist after controlling for machine learning regressions and remain robust among large-cap stocks. Machine uncertainty, as measured by predicted probabilities, impairs prediction performance. Stocks with higher machine uncertainty earn lower returns, particularly when human proxies of information uncertainty align with machine uncertainty. Consistent with the literature, this effect is driven by past underperformers.
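The portfolio construction described above can be sketched as sorting stocks by the classifier's predicted outperformance probability and going long the top bin, short the bottom bin. A minimal equal-weighted sketch, with the function and fraction parameter as illustrative assumptions rather than the paper's exact procedure:

```python
import numpy as np

def long_short_weights(prob_outperform, top_frac=0.1):
    """Equal-weighted long-short weights from classifier probabilities:
    long the stocks the model is most confident will outperform,
    short those it is most confident will underperform."""
    p = np.asarray(prob_outperform, dtype=float)
    k = max(1, int(len(p) * top_frac))
    order = np.argsort(p)          # ascending by predicted probability
    w = np.zeros_like(p)
    w[order[-k:]] = 1.0 / k        # long leg
    w[order[:k]] = -1.0 / k        # short leg
    return w                       # weights sum to zero (dollar-neutral)

w = long_short_weights([0.9, 0.2, 0.6, 0.05, 0.75, 0.4, 0.55, 0.1, 0.8, 0.3])
```

The zero net weight makes the portfolio dollar-neutral, so its return isolates the cross-sectional signal rather than market exposure.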


DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression

Zhao, Yi, Li, Zuchao, Zhao, Hai, Qi, Baoyuan, Liu, Guoming

arXiv.org Artificial Intelligence

Task-agnostic prompt compression leverages the redundancy in natural language to reduce computational overhead and enhance information density within prompts, especially in long-context scenarios. Existing methods predominantly rely on information entropy as the metric to compress lexical units, aiming to achieve minimal information loss. However, these approaches overlook two critical aspects: (i) the importance of attention-critical tokens at the algorithmic level, and (ii) shifts in information entropy during the compression process. Motivated by these challenges, we propose a dynamic attention-aware approach for task-agnostic prompt compression (DAC). This approach effectively integrates entropy and attention information, dynamically sensing entropy shifts during compression to achieve fine-grained prompt compression. Extensive experiments across various domains, including LongBench, GSM8K, and BBH, show that DAC consistently yields robust and substantial improvements across a diverse range of tasks and LLMs, offering compelling evidence of its efficacy.
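The entropy-based compression that DAC builds on can be illustrated with a toy surprisal score: rank tokens by how informative they are under a simple language model and keep only the most informative fraction. The unigram model and function below are a simplified stand-in for the entropy metric that DAC refines with attention signals, not the DAC algorithm itself:

```python
import math
from collections import Counter

def compress_by_entropy(tokens, keep_frac=0.6):
    """Toy entropy-style compression: score each token by its surprisal
    (-log p under a unigram model fit on the prompt itself) and keep the
    most informative fraction, preserving original order."""
    counts = Counter(tokens)
    total = len(tokens)
    surprisal = {t: -math.log(c / total) for t, c in counts.items()}
    k = max(1, int(len(tokens) * keep_frac))
    # positions ranked by surprisal, highest first (stable for ties)
    ranked = sorted(range(len(tokens)),
                    key=lambda i: surprisal[tokens[i]], reverse=True)
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep]

# frequent filler ("the") carries low surprisal and is dropped first
out = compress_by_entropy("the model the answer the rare token".split(), keep_frac=0.5)
```

A static score like this is exactly what the abstract criticizes: it ignores attention-critical tokens and the way entropies shift as tokens are removed, which is the gap DAC's dynamic, attention-aware scoring addresses.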


Universal language model with the intervention of quantum theory

Qin, D. -F.

arXiv.org Artificial Intelligence

We assume that natural language has a property analogous to quantum mechanics. The rationale for this analogy comes not only from noticing that the meanings of natural language can be described in the form of a superposition of states, but also from the fact that the representation of natural language symbols exhibits a duality in which symbols and meanings correspond to each other. At the same time, natural language processing (NLP) methods based on statistical implementations have been popular for decades and continue to evolve, which makes us wonder whether we can go further than statistical theory and introduce the relevant theories of quantum mechanics into the modeling of natural language. Meanwhile, over the last decade or so, NLP has generally adopted the approach of converting natural language symbols into a mathematical representation for processing; this approach, known as word embedding, has achieved surprisingly good performance on universal language processing tasks. We therefore see that the technical idea of completely converting the symbols that make up natural language into a numerical representation, in order to build a universal language model (ULM), is highly feasible. The experimental progress fed back from these two real-world applications motivates us to explore building natural language models based on quantum theory. Within the past century, quantum theory has become ever more complete as mankind has studied the physical world in depth.
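The superposition analogy can be made concrete: an ambiguous word's meaning is modeled as a normalized linear combination of sense vectors, with squared amplitudes playing the role of reading probabilities. The sense vectors and amplitudes below are invented for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical orthonormal sense vectors for the ambiguous word "bank"
sense_river = np.array([1.0, 0.0])
sense_finance = np.array([0.0, 1.0])

# Meaning as a normalized superposition of senses; the squared amplitudes
# act like probabilities of each reading (a Born-rule-style analogy).
amps = np.array([0.6, 0.8])          # 0.6**2 + 0.8**2 = 1
state = amps[0] * sense_river + amps[1] * sense_finance

probs = amps**2                      # probability of each reading
```

Context would then "collapse" this state toward one sense, which is the kind of mechanism a quantum-inspired language model tries to formalize.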


LLM×MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources

Wang, Haoyu, Fu, Yujia, Zhang, Zhu, Wang, Shuo, Ren, Zirui, Wang, Xiaorong, Li, Zhili, He, Chaoqun, An, Bo, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial Intelligence

Long-form generation is crucial for a wide range of practical applications, typically categorized into short-to-long and long-to-long generation. While short-to-long generation has received considerable attention, generating long texts from extremely long resources remains relatively underexplored. The primary challenge in long-to-long generation lies in effectively integrating and analyzing relevant information from extensive inputs, which remains difficult for current large language models (LLMs). In this paper, we propose LLM×MapReduce-V2, a novel test-time scaling strategy designed to enhance the ability of LLMs to process extremely long inputs. Drawing inspiration from convolutional neural networks, which iteratively integrate local features into higher-level global representations, LLM×MapReduce-V2 utilizes stacked convolutional scaling layers to progressively expand the understanding of input materials. Both quantitative and qualitative experimental results demonstrate that our approach substantially enhances the ability of LLMs to process long inputs and generate coherent, informative long-form articles, outperforming several representative baselines. Both LLM×MapReduce-V2 and SurveyEval are publicly available at https://github.com/thunlp/LLMxMapReduce.
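The convolution-inspired structure described above amounts to repeatedly merging adjacent chunks until one global representation remains. A minimal sketch of that hierarchy of local merges, where `summarize` stands in for an LLM call and the window size is an illustrative assumption:

```python
def convolutional_reduce(chunks, summarize, window=2):
    """Repeatedly merge adjacent groups of chunks until one summary remains,
    a minimal sketch of stacked 'convolutional scaling layers'. `summarize`
    is a stand-in for an LLM call over a local window of material."""
    level = list(chunks)
    while len(level) > 1:
        level = [summarize(level[i:i + window])
                 for i in range(0, len(level), window)]
    return level[0]

# Toy summarizer that just joins text; a real system would prompt an LLM here.
result = convolutional_reduce(["a", "b", "c", "d", "e"], lambda xs: "+".join(xs))
```

Each pass halves the number of chunks, so material from opposite ends of an extremely long input meets after only logarithmically many layers; that locality-then-aggregation pattern is the CNN analogy.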


A Novel Double Pruning method for Imbalanced Data using Information Entropy and Roulette Wheel Selection for Breast Cancer Diagnosis

Bacha, Soufiane, Ning, Huansheng, Mostefa, Belarbi, Sarwatt, Doreen Sebastian, Dhelim, Sahraoui

arXiv.org Artificial Intelligence

Accurate illness diagnosis is vital for effective treatment and patient safety. Machine learning models are widely used for cancer diagnosis based on historical medical data. However, data imbalance remains a major challenge, hindering classifier performance and reliability. The SMOTEBoost method addresses this issue by generating synthetic data to balance the dataset, but it may overlook crucial overlapping regions near the decision boundary and can produce noisy samples. This paper proposes RE-SMOTEBoost, an enhanced version of SMOTEBoost designed to overcome these limitations. First, RE-SMOTEBoost focuses on generating synthetic samples in overlapping regions, using roulette wheel selection to better capture the decision boundary. Second, it incorporates a filtering mechanism based on information entropy to reduce noise and borderline cases and to improve the quality of the generated data. Third, we introduce a double regularization penalty to control the synthetic samples' proximity to the decision boundary and avoid class overlap. These enhancements enable higher-quality oversampling of the minority class, resulting in a more balanced and effective training dataset. The proposed method outperforms existing state-of-the-art techniques when evaluated on imbalanced datasets. Compared to the top-performing sampling algorithms, RE-SMOTEBoost demonstrates a notable improvement of 3.22% in accuracy and a variance reduction of 88.8%. These results indicate that the proposed model offers a solid solution for medical settings, effectively overcoming data scarcity and severe imbalance caused by limited samples, data-collection difficulties, and privacy constraints.
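The two core ingredients named in the abstract, roulette wheel selection biased toward the decision boundary and SMOTE-style interpolation, can be sketched together. The weighting scheme and neighbour choice below are simplified illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def roulette_smote(minority, boundary_dist, n_new=5):
    """Pick minority seeds by roulette-wheel selection weighted toward the
    decision boundary, then interpolate toward a random minority neighbour
    (SMOTE-style) to create a synthetic sample."""
    X = np.asarray(minority, dtype=float)
    d = np.asarray(boundary_dist, dtype=float)
    w = 1.0 / (d + 1e-9)             # closer to the boundary -> bigger slice
    p = w / w.sum()                   # the roulette wheel
    out = []
    for _ in range(n_new):
        i = rng.choice(len(X), p=p)               # spin the wheel for a seed
        j = rng.choice(len(X))                    # random minority neighbour
        lam = rng.random()
        out.append(X[i] + lam * (X[j] - X[i]))    # interpolate between them
    return np.array(out)

samples = roulette_smote([[0, 0], [1, 1], [2, 0]], boundary_dist=[0.1, 1.0, 2.0])
```

Because synthetic points are convex combinations of existing minority points, they stay inside the minority region while the roulette weights concentrate them near the boundary, the region plain SMOTEBoost tends to under-sample.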


Research on visual simultaneous localization and mapping technology based on near infrared light

Ma, Rui, Liu, Mengfang, Li, Boliang, Li, Xinghui

arXiv.org Artificial Intelligence

SLAM originated from the probabilistic SLAM problem posed at the 1986 IEEE Robotics and Automation Conference in San Francisco [2], and has gone through three stages: initial theoretical exploration (1986-2004), algorithmic framework development (2004-2015), and system robustness improvement (2015-now) [3]. Classified by sensor, SLAM technology can be divided into laser SLAM, visual SLAM, and multi-sensor fusion SLAM. Laser SLAM scans the environment with lidar; it is suitable for indoor environments but yields inaccurate positioning in repetitive, self-similar environments [4-6]. Visual SLAM captures images through a camera and recovers positions and maps from image pixels and features, making it suitable for texture-rich scenes. In addition, visual SLAM has the advantages of low cost and small size, and can provide intuitive visual input [7-9].