Machine Learning & AI


Thanks to better instruments, including technologies developed at the Lab, we can see things at a microscopic and atomic scale. We can measure vibrations imperceptible to the human eye and capture high-resolution images of objects millions of light years away. But those instruments produce vastly larger datasets than ever before. The Large Synoptic Survey Telescope (LSST) will produce 20 terabytes of data every night, about 60 petabytes over its lifetime. The Large Hadron Collider will generate even more, with 50 petabytes in 2018 alone and 500 petabytes by 2024 (not including the 900 petabytes from past experiments).

Why Are We So Afraid Of Petabytes?

Forbes - Tech

One of the things that strikes me the most about the data science community outside of Silicon Valley is how afraid people seem to be of large datasets. Indeed, not a day goes by that I don't hear from users of my own open datasets, including those at universities with access to substantial computing clusters and HPC centers, complaining that hundreds of gigabytes, let alone terabytes, are so far beyond their ability to analyze as to be utterly inaccessible to them. How have we reached a world in which five years ago Google engineers were casually running sorting benchmarks on 50PB, three years ago Facebook was generating 4 petabytes of new data per day, and today real-world companies are storing hundred-petabyte archives in Google's BigQuery platform, yet outside Silicon Valley I hear data scientists talk about analyzing terabytes as pushing the boundaries of the possible? Having spent nearly a decade in the NSF-funded supercomputing world, working my way up from high school intern to undergraduate student, graduate student and then staff affiliate of the supercomputer center that brought Mosaic (the first graphical web browser) to the masses, I saw firsthand the academic world's fixation on processing power over storage capability. At least in the United States, academic supercomputers were historically designed to run scientific simulations and thus emphasized processing power over memory and storage capability.

Norway's petabyte plan: Store everything ever published in a 1,000-year archive


In the far north of Norway, near the Arctic Circle, experts at the National Library of Norway's (NLN) secure storage facility are in the process of implementing an astonishing plan. They aim to digitize everything ever published in Norway: books, newspapers, manuscripts, posters, photos, movies, broadcasts, and maps, as well as all websites on the Norwegian .no domain. Their work has been going on for the past 12 years and will take 30 years to complete by current estimates. At the moment, the library has more than 540,000 books and over 2,000,000 newspapers in its archive. These have been mass-scanned and OCR-processed before being stored, so all the content in the library is free-text searchable.

The LSST and big data science


A depiction of what the completed LSST observatory will look like atop El Peñon summit, Chile.

Broadband switching in Britain surged by 30 per cent in March amid lockdown

Daily Mail - Science & tech

Britons are using the coronavirus lockdown to upgrade their internet, with the number of people switching broadband supplier jumping by 30 per cent from the end of February to the end of March. Millions of adults and children are stuck inside all day during the nationwide lockdown and high-speed internet has become a necessity. Children are e-schooling, parents are working from home, and streaming TV programmes is a key hobby in the evenings. As a result, internet consumption almost doubled in the UK in March and many are looking to boost their internet speed.