repetition
A proposal for PU classification under Non-SCAR using clustering and logistic model
Konrad Furmanczyk, Kacper Paczutkowski
This study investigates a cluster cleaning algorithm that is computationally simple and can solve the positive-unlabeled (PU) classification problem when the Selected Completely At Random (SCAR) condition is not satisfied. A secondary objective is to determine how robust the LassoJoint method is to perturbations of the SCAR condition. In the first step of our algorithm, we obtain cleaning labels from 2-means clustering. We then fit a logistic regression on the cleaned data, assigning the positive label to the observations marked positive by the cleaning algorithm together with the true positive observations; the remaining observations are assigned the negative label. The proposed algorithm is evaluated on 11 real data sets from machine learning repositories and on a synthetic set. The results demonstrate the efficacy of the clustering algorithm when the SCAR condition is violated and indicate that the LassoJoint algorithm is moderately robust in this setting.
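The two steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the positive cluster is identified, so the rule used here (the cluster with the larger share of labeled positives) is an assumption, as are the deterministic initialization, the plain gradient-descent fit, and all function names.

```python
import numpy as np


def _nearest(X, centers):
    """Index (0 or 1) of the closest center for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def two_means(X, n_iter=50):
    """Lloyd's algorithm with k=2; returns a 0/1 cluster index for each row."""
    # Assumption: deterministic start from the first row and the row farthest from it.
    far = np.linalg.norm(X - X[0], axis=1).argmax()
    centers = np.stack([X[0], X[far]]).astype(float)
    for _ in range(n_iter):
        assign = _nearest(X, centers)
        for k in (0, 1):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return _nearest(X, centers)


def clean_labels(X, s):
    """Step 1: cleaned labels from 2-means plus the known positives (s == 1)."""
    assign = two_means(X)
    # Assumption: the positive cluster is the one with the larger share of labeled positives.
    rates = [s[assign == k].mean() if np.any(assign == k) else 0.0 for k in (0, 1)]
    pos_cluster = int(np.argmax(rates))
    # Positive if in the positive cluster or a known positive; negative otherwise.
    return ((assign == pos_cluster) | (s == 1)).astype(int)


def _sigmoid(z):
    # Clip to avoid overflow in exp for extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Step 2: logistic regression on the cleaned labels via gradient descent."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        w -= lr * Xb.T @ (_sigmoid(Xb @ w) - y) / len(y)
    return w


def predict_proba(X, w):
    """Estimated probability of the positive class for each row of X."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return _sigmoid(Xb @ w)
```

Given a feature matrix `X` and a 0/1 indicator `s` of labeled positives, `clean_labels(X, s)` produces the training labels and `fit_logistic(X, cleaned)` fits the classifier. In practice a library implementation of k-means and logistic regression would likely replace these hand-written routines.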
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Poland > Masovia Province > Warsaw (0.05)
- North America > United States > California > Orange County > Irvine (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)
- Europe > United Kingdom > Scotland (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Asia > Middle East > Israel (0.04)
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
Huayang Li, Tian Lan, Zihao Fu, Deng Cai, Lemao Liu, Nigel Collier
In this work, we aim to advance our understanding of neural text degeneration by presenting a straightforward and fundamental explanation from the data perspective. Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data. Subsequent experiments also demonstrate that selectively dropping out the attention to repetitive words in training data significantly reduces degeneration.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
Chatting Makes Perfect: Chat-based Image Retrieval Supplementary Material
In Appendix A, we start by showing more qualitative results of chats and their retrieval results, and BLIP2 chats compared to those with a human answerer. Next, in Appendix B, we present the few-shot instructional prompts that were used by different LLMs for generating follow-up questions. Another example in Figure 2 describes two trains, searched by the text "A train that is parked next to another train". Figure 3 demonstrates a case where the description "a small and dirty kitchen with pots and food everywhere" is ambiguous, subjective to the viewer, and may match many images in the corpus. In Figure 4 we show an example of a dialog between ChatIR and a human.
- North America > United States (0.04)
- Asia > Middle East > Israel (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Finland (0.04)
- North America > United States (0.15)
- Europe > Germany > Lower Saxony (0.04)
- Asia > Taiwan (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Overview (0.67)
- Health & Medicine (0.67)
- Education (0.46)
- Information Technology (0.45)
- Health & Medicine > Therapeutic Area (0.98)
- Health & Medicine > Diagnostic Medicine > Imaging (0.70)