 Data Science


Hundreds of Chrome extensions create a web-scraping botnet

PCWorld

Browser extensions can be just as dangerous as regular apps, and their integration with a tool everyone uses constantly can make them seem deceptively innocuous. Case in point: a collection of more than 200 extensions for Chrome and other major browsers is being used to "scrape" website content. This essentially turns browser users into a free data center, with capacity sold off for profit. The Secure Annex report (spotted by Ars Technica) is an interesting one, documenting the MellowTel system. Here's how it works: Step one, a developer of a legitimate extension is offered a tool that integrates a software library into the extension.


1,000-year-old medieval sword emerges from Dutch river after chance discovery: 'Barely corroded'

FOX News

SOLVA Archaeology Service in Belgium announced the recent discovery of ancient Roman artifacts and remains, including a well-preserved dog, in Velzeke. A remarkable medieval sword with rare symbols was recently put on display in a Dutch museum, over a year after construction workers unexpectedly found it. The discovery of the sword was announced by the Netherlands' National Museum of Antiquities (RMO) in Leiden on June 24. The artifact, named the Linschoten Sword, was found in March 2024 during "maintenance dredging activities," the museum said in a press release. Construction workers were struck by a "long piece of iron" while cleaning a small river known as the Korte Linschoten, the statement noted.


4 ways your organization can adapt and thrive in the age of AI

ZDNet

The evidence suggests almost all business leaders are piloting or investing in AI initiatives, and biopharmaceutical giant Boehringer Ingelheim is committed to investing in emerging technology that could have life-altering consequences. The company's 55,000 employees focus on developing innovative therapies that can improve lives in areas of high unmet medical need, with AI and data playing an increasingly crucial role in their work. Global CIO Markus Schümmelfeder told ZDNET that emerging technology can open all kinds of possibilities when its adoption is accompanied by organizational change: "AI together with big data availability and access to the right capability is the real game-changer." So, how can business leaders drive successful organizational change in an age of AI? Schümmelfeder and his colleague Oliver Sluke, head of IT research, development, and medicine at Boehringer, told ZDNET their four best-practice tips for AI-enabled business transformation. Most digital leaders agree: before you start tinkering with technology, you must ensure your data is managed, sorted, and accessible.


70 of the best Harvard University courses you can take online for free

Mashable

The catch with these free courses is that they don't include a certificate of completion or graded assignments and exams. But you can still enroll at any time and start learning at your own pace. Find the best free online courses from Harvard University with edX.


Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction

Neural Information Processing Systems

Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition but faces challenges due to individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, lacking interpretability in visual reconstruction tasks.


Community Detection on Evolving Graphs

Neural Information Processing Systems

Clustering is a fundamental step in many information-retrieval and data-mining applications. Detecting clusters in graphs is also a key tool for finding the community structure in social and behavioral networks. In many of these applications, the input graph evolves over time in a continual and decentralized manner, and, to maintain a good clustering, the clustering algorithm needs to repeatedly probe the graph. Furthermore, there are often limitations on the frequency of such probes, either imposed explicitly by the online platform (e.g., in the case of crawling proprietary social networks like Twitter) or implicitly because of resource limitations (e.g., in the case of crawling the web). In this paper, we study a model of clustering on evolving graphs that captures this aspect of the problem.


FairJob: A Real-World Dataset for Fairness in Online Systems

Neural Information Processing Systems

We introduce a fairness-aware dataset for job recommendation in advertising, designed to foster research in algorithmic fairness within real-world scenarios. It was collected and prepared to comply with privacy standards and business confidentiality. An additional challenge is the lack of access to protected user attributes such as gender, for which we propose a solution to obtain a proxy estimate. Despite being anonymized and including a proxy for a sensitive attribute, our dataset preserves predictive power and maintains a realistic and challenging benchmark. This dataset addresses a significant gap in the availability of fairness-focused resources for high-impact domains like advertising, where the actual impact is access or lack of access to precious employment opportunities, and where balancing fairness and utility is a common industrial challenge. We also explore various stages in the advertising process where unfairness can occur and introduce a method to compute a fair utility metric for job recommendations in online systems from a biased dataset. Experimental evaluations of bias mitigation techniques on the released dataset demonstrate potential improvements in fairness and the associated trade-offs with utility.


Optimized Pre-Processing for Discrimination Prevention

Neural Information Processing Systems

Non-discrimination is a recognized objective in algorithmic decision making. In this paper, we introduce a novel probabilistic formulation of data pre-processing for reducing discrimination. We propose a convex optimization for learning a data transformation with three goals: controlling discrimination, limiting distortion in individual data samples, and preserving utility. We characterize the impact of limited sample size in accomplishing this objective. Two instances of the proposed optimization are applied to datasets, including one on real-world criminal recidivism.


Data curation via joint example selection further accelerates multimodal learning

Olivier J. Hénaff

Neural Information Processing Systems

Data curation is an essential component of large-scale pretraining. In this work, we demonstrate that jointly prioritizing batches of data is more effective for learning than selecting examples independently. Multimodal contrastive objectives expose the dependencies between data and thus naturally yield criteria for measuring the joint learnability of a batch. We derive a simple and tractable algorithm for selecting such batches, which significantly accelerate training beyond individually-prioritized data points. As performance improves by selecting from large superbatches, we also leverage recent advances in model approximation to reduce the computational overhead of scoring.


Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions

Neural Information Processing Systems

Recent work on deriving O(\log T) anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by the Copeland set, which do always exist. In this work, we consider a broad notion of winners defined by tournament solutions in social choice theory, which include the Copeland set as a special case but also include several other notions of winners such as the top cycle, uncovered set, and Banks set, and which, like the Copeland set, always exist. We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions, and show O(\log T) anytime regret bounds for them. Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest.
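As a rough illustration of why Condorcet winners "do not always exist" while the Copeland set always does, the sketch below computes both from a pairwise preference matrix. This is not the paper's algorithm (it involves no bandit feedback or UCB estimates); the function names and the convention that `P[i][j]` is the probability that arm `i` beats arm `j` are assumptions made for this example.

```python
from typing import List, Optional, Set

Matrix = List[List[float]]


def copeland_scores(P: Matrix) -> List[int]:
    """Number of other arms each arm beats (assumed: P[i][j] > 0.5 means i beats j)."""
    n = len(P)
    return [sum(1 for j in range(n) if j != i and P[i][j] > 0.5) for i in range(n)]


def copeland_set(P: Matrix) -> Set[int]:
    """Arms with the maximal Copeland score; non-empty for any non-empty matrix."""
    scores = copeland_scores(P)
    best = max(scores)
    return {i for i, s in enumerate(scores) if s == best}


def condorcet_winner(P: Matrix) -> Optional[int]:
    """The arm that beats every other arm, or None if no such arm exists."""
    n = len(P)
    for i, s in enumerate(copeland_scores(P)):
        if s == n - 1:
            return i
    return None


# Hypothetical preference matrices, chosen only for illustration.
# CYCLE: arm 0 beats 1, arm 1 beats 2, arm 2 beats 0 (no Condorcet winner).
CYCLE = [
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.6],
    [0.6, 0.4, 0.5],
]
# LINEAR: arm 0 beats everyone, arm 1 beats arm 2.
LINEAR = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]

if __name__ == "__main__":
    print(condorcet_winner(CYCLE), copeland_set(CYCLE))
    print(condorcet_winner(LINEAR), copeland_set(LINEAR))
```

In the cyclic matrix every arm beats exactly one other arm, so there is no Condorcet winner and the Copeland set contains all three arms; in the linear matrix the two notions coincide on arm 0.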