database
Disneyland Now Uses Face Recognition on Visitors
Plus: The NSA tests Anthropic's Mythos Preview to find vulnerabilities, a Finnish teen is charged over the Scattered Spider hacking spree, and more. A gunman attempted to enter the White House Correspondents' Dinner in Washington, DC, last weekend, while President Donald Trump, Vice President JD Vance, and other administration officials were in attendance. Media reports and Trump himself quickly identified the suspected shooter as 31-year-old engineer and computer scientist Cole Tomas Allen. The California resident was arrested at the scene on Saturday and appeared Monday in the US District Court for the District of Columbia to face three federal charges: attempting to assassinate the president, transportation of a firearm in interstate commerce, and discharge of a firearm during a crime of violence. The authentication standards body known as the FIDO Alliance announced working groups this week along with Google and Mastercard to develop technical guardrails for validating and protecting transactions initiated by an AI agent .
Retrieval-Augmented Diffusion Models
Novel architectures have recently improved generative image synthesis leading to excellent visual quality in various tasks. Much of this success is due to the scalability of these architectures and hence caused by a dramatic increase in model complexity and in the computational resources invested in training these models.
materials
A.1 Access instructions OpenProteinSet is hosted by the Registry of Open Data on AWS (RODA) and can be accessed at the following link: registry.opendata.aws/openfold/. A.2 Documentation and intended uses We include a datasheet [1] in Section B. Detailed documentation on the precise structure and content of the dataset is provided on the dataset's landing page. A.3 Data format All OpenProteinSet files are in standard plaintext formats (A3M for MSAs, HHSearch format for template hits, and PDB for structure files) that can be read by a wide variety of bioinformatics software. A.5 License OpenProteinSet is made available under the CCBY 4.0 license. A copy of the license is provided with the dataset.
OpenProteinSet: Training data for structural biology at scale
Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.
ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling
Large-scale driving datasets such as Waymo Open Dataset and nuScenes substantially accelerate autonomous driving research, especially for perception tasks such as 3D detection and trajectory forecasting. Since the driving logs in these datasets contain HD maps and detailed object annotations that accurately reflect the realworld complexity of traffic behaviors, we can harvest a massive number of complex traffic scenarios and recreate their digital twins in simulation. Compared to the handcrafted scenarios often used in existing simulators, data-driven scenarios collected from the real world can facilitate many research opportunities in machine learning and autonomous driving.