Goto

Collaborating Authors

 pre


On the Convergence of Encoder-only Shallow Transformers

Neural Information Processing Systems

Besides, neural tangent kernel (NTK) based analysis is also given, which facilitates a comprehensive comparison. Our theory demonstrates the separation on the importance of different scaling schemes and initialization.




A Concept uniqueness and granularity

Neural Information Processing Systems

Here, we report statistics about the uniqueness of neuron concepts, as we increase the maximum formula length of our explanations. Figure S1: Number of repeated concepts across probed vision and NLI models, by maximum formula length. Table S1: For probed Image Classification and NLI models, average number of occurrences of each detected concept and percentage of detected concepts that are unique (i.e. A.1 Image Classification Figure S1 (left) plots the number of times each unique concept appears across the 512 units of ResNet-18 as the maximum formula length increases. Table S1 displays the mean number of occurrences per concept, and percentage of concepts occurring that are unique (i.e.



492114f6915a69aa3dd005aa4233ef51-Supplemental.pdf

Neural Information Processing Systems

A deterministic path uses a self-attention and cross-attention to summarize contexts. B.1 1DRegression Architectures For models without attention (CNP, NP, BNP), we set`pre = 4,`post = 2,`dec = 3,dh = 128. For NP we set dz = 128. For Student-t noise, we addedε γ T(2.1) to the curves generated from GP with RBF kernel, whereT(2.1) is a Student'st distribution with degree of freedom2.1 and γ Unif(0,0.15). After realizing them, the prior functions are used to optimize via Bayesian optimization.




Fine Tuning a Simulation-Driven Estimator

Lakshminarayanan, Braghadeesh, Guerrero, Margarita A., Rojas, Cristian R.

arXiv.org Machine Learning

Many industries now deploy high-fidelity simulators (digital twins) to represent physical systems, yet their parameters must be calibrated to match the true system. This motivated the construction of simulation-driven parameter estimators, built by generating synthetic observations for sampled parameter values and learning a supervised mapping from observations to parameters. However, when the true parameters lie outside the sampled range, predictions suffer from an out-of-distribution (OOD) error. This paper introduces a fine-tuning approach for the Two-Stage estimator that mitigates OOD effects and improves accuracy. The effectiveness of the proposed method is verified through numerical simulations.


Detection of Cyberbullying in GIF using AI

Dave, Pal, Yuan, Xiaohong, Siddula, Madhuri, Roy, Kaushik

arXiv.org Artificial Intelligence

Cyberbullying is a well-known social issue, and it is escalating day by day. Due to the vigorous development of the internet, social media provide many different ways for the user to express their opinions and exchange information. Cyberbullying occurs on social media using text messages, comments, sharing images and GIFs or stickers, and audio and video. Much research has been done to detect cyberbullying on textual data; some are available for images. Very few studies are available to detect cyberbullying on GIFs/stickers. We collect a GIF dataset from Twitter and Applied a deep learning model to detect cyberbullying from the dataset. Firstly, we extracted hashtags related to cyberbullying using Twitter. We used these hashtags to download GIF file using publicly available API GIPHY. We collected over 4100 GIFs including cyberbullying and non cyberbullying. we applied deep learning pre-trained model VGG16 for the detection of the cyberbullying. The deep learning model achieved the accuracy of 97%. Our work provides the GIF dataset for researchers working in this area.