duplicate
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > Austria > Styria > Graz (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (0.67)
- North America > United States > Massachusetts (0.04)
- North America > United States > Florida > Martin County > Stuart (0.04)
- Law (1.00)
- Information Technology (1.00)
- Banking & Finance > Real Estate (0.93)
Random Walk Learning and the Pac-Man Attack
Chen, Xingran, Parag, Parimal, Bhagat, Rohit, Liu, Zonghong, Rouayheb, Salim El
Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, their reliance on local interactions makes them inherently vulnerable to malicious behavior. In this work, we investigate an adversarial threat that we term the "Pac-Man" attack, in which a malicious node probabilistically terminates any RW that visits it. This stealthy behavior gradually eliminates active RWs from the network, effectively halting the learning process without triggering failure alarms. To counter this threat, we propose the Average Crossing (AC) algorithm, a fully decentralized mechanism for duplicating RWs to prevent RW extinction in the presence of Pac-Man. Our theoretical analysis establishes that (i) the RW population remains almost surely bounded under AC and (ii) RW-based stochastic gradient descent remains convergent under AC, even in the presence of Pac-Man, with a quantifiable deviation from the true optimum. Our extensive empirical results on both synthetic and real-world datasets corroborate our theoretical findings. Furthermore, they uncover a phase transition in the extinction probability as a function of the duplication threshold. We offer theoretical insights by analyzing a simplified variant of AC, which sheds light on the observed phase transition.
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Information Technology > Security & Privacy (0.88)
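To make the threat described in the abstract above concrete, the following is a minimal simulation sketch of the attack alone: several random walks move on a graph, and one malicious node terminates each visiting walk with a fixed probability. It does not implement the Average Crossing algorithm, whose details are not given here. The ring topology, the function names (`simulate_pacman`, `ring`), and all parameter values are assumptions chosen for illustration.

```python
import random


def ring(n):
    """Hypothetical topology: a ring of n nodes, each linked to its two neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def simulate_pacman(adjacency, pacman, kill_prob, num_walks, steps, seed=0):
    """Return the number of surviving random walks after each step.

    Assumption: a walk that moves onto the `pacman` node is terminated
    with probability `kill_prob`; no duplication mechanism is modeled.
    """
    rng = random.Random(seed)
    benign = [node for node in adjacency if node != pacman]
    # Start every walk on a uniformly chosen benign node.
    positions = [rng.choice(benign) for _ in range(num_walks)]
    history = [len(positions)]
    for _ in range(steps):
        survivors = []
        for node in positions:
            nxt = rng.choice(adjacency[node])
            if nxt == pacman and rng.random() < kill_prob:
                continue  # the malicious node silently drops this walk
            survivors.append(nxt)
        positions = survivors
        history.append(len(positions))
    return history


history = simulate_pacman(ring(10), pacman=0, kill_prob=0.5, num_walks=20, steps=2000)
print("walks at start:", history[0], "walks at end:", history[-1])
```

Without any duplication, the population count in `history` can only shrink over time, which is the extinction behavior that a duplication mechanism such as AC is meant to counter.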
Intrinsic Self-Supervision for Data Quality Audits
The seed vaults that could save humanity
These genetic libraries plan for worst-case scenarios. An employee at the Leibniz Institute of Plant Genetics and Crop Plant Research in Germany shows off a specimen of frozen plant seeds from the institute's genebank. Amid the 872-day siege of Leningrad in the early 1940s, nine people died protecting a library. This library was not for books, but for seeds collected from around the globe.
- Europe > Germany (0.25)
- Asia > Middle East > Syria (0.06)
- Africa > Middle East > Morocco (0.06)
- Health & Medicine (0.96)
- Food & Agriculture > Agriculture (0.50)
Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects
Our world is full of identical objects (e.g., cans of Coke, cars of the same model). These duplicates, when seen together, provide additional and strong cues for us to effectively reason about 3D. Inspired by this observation, we introduce Structure from Duplicates (SfD), a novel inverse graphics framework that reconstructs geometry, material, and illumination from a single image containing multiple identical objects. SfD begins by identifying multiple instances of an object within an image, and then jointly estimates the 6-DoF pose for all instances. An inverse graphics pipeline is subsequently employed to jointly reason about the shape and material of the object and the environment light, while adhering to the shared geometry and material constraint across instances. Our primary contributions involve utilizing object duplicates as a robust prior for single-image inverse graphics and proposing an in-plane rotation-robust Structure from Motion (SfM) formulation for joint 6-DoF object pose estimation. By leveraging multi-view cues from a single image, SfD generates more realistic and detailed 3D reconstructions, significantly outperforming existing single-image reconstruction models and multi-view reconstruction approaches with a similar or greater number of observations.
Lost in the Pipeline: How Well Do Large Language Models Handle Data Preparation?
Spreafico, Matteo, Tassini, Ludovica, Sancricca, Camilla, Cappiello, Cinzia
Large language models have recently demonstrated exceptional capabilities in supporting and automating various tasks. Among the tasks worth exploring for testing large language model capabilities, we considered data preparation, a critical yet often labor-intensive step in data-driven processes. This paper investigates whether large language models can effectively support users in selecting and automating data preparation tasks. To this end, we considered both general-purpose and fine-tuned tabular large language models. We prompted these models with poor-quality datasets and measured their ability to perform tasks such as data profiling and cleaning. We also compared the support provided by large language models with that offered by traditional data preparation tools. To evaluate the capabilities of large language models, we developed a custom-designed quality model that has been validated through a user study to gain insights into practitioners' expectations.
- North America > United States (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
A Supplementary Material
On HuggingFace, we find information about the annotation creators (e.g., crowdsource, experts, ml-generated) or specific task categories (e.g., image-classification, image-to-text, text-to-image). Kaggle automatically computes a usability score, which is associated with the tag "well-documented". Kaggle's usability score is based on three criteria. Completeness: subtitle, tag, description, cover image. Credibility: provenance, public notebook, update frequency. Compatibility: license, file format, file description, column description. The usability score is based on only 4 out of 7 aspects from Datasheets [40].
- Information Technology > Artificial Intelligence > Vision (0.37)
- Information Technology > Communications > Social Media > Crowdsourcing (0.35)
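As an illustration of how a checklist-style score over the usability criteria listed above might be computed, here is a minimal sketch. The actual formula and weighting Kaggle uses are not stated here, so the equal weighting, the `CHECKLIST` field names, and the `usability_score` function are all hypothetical.

```python
# Hypothetical field names derived from the three criteria groups in the text.
CHECKLIST = {
    "completeness": ["subtitle", "tag", "description", "cover_image"],
    "credibility": ["provenance", "public_notebook", "update_frequency"],
    "compatibility": ["license", "file_format", "file_description", "column_description"],
}


def usability_score(metadata):
    """Fraction of checklist items present in `metadata` (assumed equal weights)."""
    items = [item for group in CHECKLIST.values() for item in group]
    present = sum(1 for item in items if metadata.get(item))
    return present / len(items)


# Example: a dataset that documents only the completeness items.
example = {"subtitle": "x", "tag": "y", "description": "z", "cover_image": "img.png"}
print(round(usability_score(example), 2))
```

Under this assumed equal weighting, a dataset filling in all eleven items would score 1.0, and one filling in none would score 0.0.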