Africa
e2cfb719f58585f779d0a4f9f07bd618-Supplemental-Datasets_and_Benchmarks.pdf
A.1 Creation of the Multimodal Web Document Dataset. A.1.1 Collecting a Large Number of HTML Files. Our data collection process begins by considering the 25 most recent Common Crawl dumps available at the time of dataset creation. These dumps contain webpages spanning from February 2020 to January/February 2023. We use a modified version of readability-lxml to extract the main text from the pages, discarding any pages that contain text of excessively high perplexity. This process yields a total of 41.2 billion documents. Selection of English content: To identify non-English content, we apply the FastText classifier (Joulin et al., 2017) to the extracted text, effectively filtering out 63.6% of the documents. Early text deduplication: Often, a set of URLs is crawled repeatedly across different Common Crawl snapshots. However, the content of these websites may vary as web administrators make changes over time. Hence, at this stage, we refrain from deduplicating documents based on their URLs. Instead, we perform MinHash (Broder, 1997) deduplication with 16 hashes calculated over 5-grams. To further refine the data, we eliminate documents containing substantial proportions of repeated paragraphs and n-grams, employing the methodology described in MassiveText (Rae et al., 2022).
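The MinHash step described above can be illustrated with a minimal sketch. Only the two parameters (16 hashes, 5-grams) come from the text; everything else is an assumption made for illustration: word-level 5-grams, a salted BLAKE2b hash family, a similarity threshold of 0.8, and a simple pairwise comparison. The function names are hypothetical, and a pipeline at the scale of billions of documents would likely use a banded locality-sensitive index rather than this quadratic loop.

```python
import hashlib

NUM_HASHES = 16  # from the text
NGRAM_SIZE = 5   # from the text; word-level n-grams are an assumption


def word_ngrams(text, n=NGRAM_SIZE):
    """Return the set of word n-grams (short texts yield one shorter gram)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash_signature(text, num_hashes=NUM_HASHES):
    """One minimum per salted hash function; an empty text gives an empty tuple."""
    grams = word_ngrams(text)
    if not grams:
        return tuple()
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(2, "big")  # distinct salt = distinct hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for g in grams
        ))
    return tuple(signature)


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    if not sig_a or not sig_b:
        return 0.0
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def deduplicate(docs, threshold=0.8):
    """Return indices of documents kept; the threshold is a hypothetical choice."""
    kept, kept_signatures = [], []
    for index, doc in enumerate(docs):
        sig = minhash_signature(doc)
        if any(estimated_jaccard(sig, other) >= threshold for other in kept_signatures):
            continue  # near-duplicate of a document already kept
        kept.append(index)
        kept_signatures.append(sig)
    return kept
```

With only 16 hash functions the similarity estimate is coarse (it moves in steps of 1/16), which is a trade-off between signature size and precision.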
Hierarchical Spatio-Channel Clustering for Efficient Model Compression in Medical Image Analysis
Hamlomo, Sisipho, Atemkeng, Marcellin, Likassa, Habte Tadesse, Ravelo, Blaise, Bouwmans, Thierry, Lalléchère, Sébastien, Vacavant, Antoine, Chen, Ding-Geng
Convolutional neural networks (CNNs) have become increasingly difficult to deploy in resource-constrained environments due to their large memory and computational requirements. Although low-rank compression methods can reduce this burden, most existing approaches compress spatial and channel redundancy independently and therefore do not fully exploit the localised structure within convolutional feature maps. This paper proposes a hierarchical spatio-channel low-rank compression framework for CNNs that exploits redundancy across spatial regions and channel activations. Unlike conventional methods, which apply a uniform decomposition across an entire layer, the proposed approach first partitions feature maps into spatial regions, then groups channels according to their co-activation patterns within each region, and finally applies rank-adaptive SVD to each resulting spatio-channel cluster. The method is evaluated on an AlexNet-based brain tumour MRI classification model and compared with Global SVD and Tucker decomposition under \(3\times\) and \(6\times\) compression budgets. Our method outperforms both baselines, reducing FLOPs from \(8.21\,\mathrm{G}\) to \(1.55\,\mathrm{G}\) (\(81.1\%\) reduction), achieving a \(1.38\times\) inference speed-up, and increasing classification accuracy from \(87.76\%\) to \(89.80\%\). The method also improves the macro \(F_1\)-score and performance on challenging classes such as meningioma. A hyper-parameter trade-off analysis demonstrates that the framework provides Pareto-optimal configurations, enabling control over the balance between compression and predictive performance. Moderate clustering with adaptive rank selection yields strong results. Bootstrap standard errors are reported for all classification metrics.
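The final stage of the pipeline described above, rank-adaptive SVD applied per cluster, can be sketched as follows. This is not the authors' implementation: the abstract does not state the rank-selection rule, so the retained-energy criterion, the default of 0.95, the function names, and the representation of clusters as row-index groups of a flattened weight matrix are all assumptions for illustration. The spatial partitioning and co-activation clustering steps are not reproduced here.

```python
import numpy as np


def adaptive_rank_svd(block, energy=0.95):
    """Truncate the SVD at the smallest rank retaining `energy` of the
    squared singular values (hypothetical selection rule)."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    total = float(np.sum(s ** 2))
    if total == 0.0:
        rank = 1  # all-zero block: keep a single (zero) component
    else:
        cumulative = np.cumsum(s ** 2) / total
        rank = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    left = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    right = vt[:rank]
    return left, right, rank


def compress_clusters(weight, groups, energy=0.95):
    """Factorise each cluster of rows independently.

    `weight` is a 2-D matrix (e.g. a flattened layer) and `groups` is a list
    of row-index arrays, one per cluster; both are illustrative assumptions.
    """
    factors = []
    for rows in groups:
        left, right, rank = adaptive_rank_svd(weight[rows], energy)
        factors.append((rows, left, right, rank))
    return factors


def reconstruct(shape, factors):
    """Rebuild an approximation of the original matrix from the factors."""
    approx = np.zeros(shape)
    for rows, left, right, _ in factors:
        approx[rows] = left @ right
    return approx
```

Because each cluster chooses its own rank, clusters with highly redundant rows are stored with fewer components than clusters with more varied rows, which is the intuition behind applying the decomposition per cluster rather than uniformly across a layer.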
Learning to Think from Multiple Thinkers
Joshi, Nirmit, Magen, Roey, Srebro, Nathan, Tsilivis, Nikolaos, Vardi, Gal
We study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a small amount of CoT data per thinker that is completely independent of the target accuracy $\varepsilon$, a moderate number of thinkers that scales as $\log \frac{1}{\varepsilon}\log \log \frac{1}{\varepsilon}$, and sufficient passive end-result data that scales as $\frac{1}{\varepsilon}\cdot \mathrm{poly}\log\frac{1}{\varepsilon}$.
Sabastian Sawe breaks London marathon record with first run under two hours
Kenya's Sabastian Sawe has become the first man to run a marathon in under two hours, winning the London Marathon in 1:59:30. The 31-year-old, who has never lost a marathon, smashed the world record by 65 seconds. Yomif Kejelcha of Ethiopia stayed on Sawe's heels for most of the 42.195km course before fading down the final stretch to take second in his marathon debut with 1:59:41, while Jacob Kiplimo of Uganda won bronze in 2:02:28. All three finished under Kiptum's previous record time. In the women's race, Ethiopia's Tigst Assefa defended her London Marathon crown on Sunday, breaking her own world record.
Towards a Standardised Performance Evaluation Protocol for Cooperative MARL
Multi-agent reinforcement learning (MARL) has emerged as a useful approach to solving decentralised decision-making problems at scale. Research in the field has been growing steadily with many breakthrough algorithms proposed in recent years. In this work, we take a closer look at this rapid development with a focus on evaluation methodologies employed across a large body of research in cooperative MARL. By conducting a detailed meta-analysis of prior work, spanning 75 papers accepted for publication from 2016 to 2022, we bring to light worrying trends that put into question the true rate of progress. We further consider these trends in a wider context and take inspiration from single-agent RL literature on similar issues with recommendations that remain applicable to MARL. Combining these recommendations with novel insights from our analysis, we propose a standardised performance evaluation protocol for cooperative MARL. We argue that such a standard protocol, if widely adopted, would greatly improve the validity and credibility of future research, make replication and reproducibility easier, as well as improve the ability of the field to accurately gauge the rate of progress over time by being able to make sound comparisons across different works. Finally, we release our meta-analysis data publicly on our project website for future research on evaluation, accompanied by our open-source evaluation tools repository.
China car giant BYD says it can thrive without US
The recent surge in fuel prices due to the war in Iran has spurred demand for electric vehicles around the world, and Chinese car makers are making the most of the opportunity. China is the world's top producer of EVs, and while its manufacturers remain largely shut out of the major car market of the United States, they are benefiting from an uptick in interest and orders via dealerships across Asia and elsewhere. BYD, which overtook Tesla as the world's largest seller of electric vehicles last year and is expanding aggressively overseas, is at the centre of this shift in focus. "We survive and are successful without the US market today," BYD executive vice president Stella Li told the BBC at the Beijing Auto Show. Instead of aiming for US customers, the company says its challenge is meeting increased demand in other regions, including Brazil, the UK and Europe.
Who's in control of AI?
Owner of US tech giant reveals breach of one of world's most powerful AI models. Reports of unauthorised access to one of the most powerful Artificial Intelligence models yet developed have emerged. Nothing malicious, say the owners - but it has intensified focus on such technology falling into the wrong hands. So, how is AI being controlled globally?
Archaeologists discover 7-foot-tall statue of legendary Egyptian pharaoh
The over 3,000-year-old statement piece belonged to Ramses the Great. Ramses is considered the greatest pharaoh of ancient Egypt's New Kingdom. Ramses II (1303-1213 BCE), aka Ramses the Great, is easily one of ancient Egyptian history's most recognizable rulers.
'Chemical-spraying' drones reportedly stolen from New Jersey facility sparks fears of 'nightmare scenario'
An alarm has erupted after 15 powerful agricultural spray drones were stolen in a suspected coordinated heist in New Jersey last month. A report from The High Side claimed the FBI is investigating the theft amid fears the machines could be used to disperse dangerous materials.