Do DeepSeek's A.I. Advances Mean US Tech Controls Have Failed?
DeepSeek has said that its most recent model was trained on Nvidia H800s, an A.I. chip that Nvidia developed specifically for the Chinese market after export controls were first imposed, and one that caused a fair amount of drama in Washington. When the United States put restrictions on Nvidia's most advanced chips in 2022, Nvidia quickly adapted by creating slightly downgraded chips that fell just under the threshold the government had set. These chips were technically legal for Chinese companies to use, yet allowed them to achieve practically the same results. This angered Biden officials, who moved to restrict the new chips as well. But the government moved slowly, and it took about a year to ban the H800 and other downgraded chips.
- Asia > China (0.45)
- North America > United States (0.42)
Design choices made by LLM-based test generators prevent them from finding bugs
Mathews, Noble Saji, Nagappan, Meiyappan
There is an increasing amount of research and commercial tools for automated test case generation using Large Language Models (LLMs). This paper critically examines whether recent LLM-based test generation tools, such as Codium CoverAgent and CoverUp, can effectively find bugs or unintentionally validate faulty code. Considering bugs are only exposed by failing test cases, we explore the question: can these tools truly achieve the intended objectives of software testing when their test oracles are designed to pass? Using real human-written buggy code as input, we evaluate these tools, showing how LLM-generated tests can fail to detect bugs and, more alarmingly, how their design can worsen the situation by validating bugs in the generated test suite and rejecting bug-revealing tests. These findings raise important questions about the validity of the design behind LLM-based test generation tools and their impact on software quality and test suite reliability.
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- Asia > Singapore (0.04)
ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models
Han, Hojae, Kim, Jaejin, Yoo, Jaeseok, Lee, Youngwon, Hwang, Seung-won
This paper aims to extend the code generation capability of large language models (LLMs) to automatically manage comprehensive software requirements from given textual descriptions. Such requirements include both functional requirements (i.e., achieving the expected behavior for inputs) and non-functional requirements (e.g., time/space performance, robustness, maintainability). However, textual descriptions can express requirements verbosely or may even omit some of them. We introduce ARCHCODE, a novel framework that leverages in-context learning to organize the requirements observed in descriptions and to extrapolate unexpressed requirements from them. ARCHCODE generates requirements from a given description, then conditions on them to produce code snippets and test cases. Each test case is tailored to one of the requirements, allowing code snippets to be ranked by how well their execution results comply with the requirements. On public benchmarks, ARCHCODE improves the satisfaction of functional requirements, significantly raising Pass@k scores. Furthermore, we introduce HumanEval-NFR, the first benchmark for evaluating LLMs on non-functional requirements in code generation, demonstrating ARCHCODE's superiority over baseline methods. The implementation of ARCHCODE and the HumanEval-NFR benchmark are both publicly accessible.
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- (4 more...)
Saturn: Sample-efficient Generative Molecular Design using Memory Manipulation
Guo, Jeff, Schwaller, Philippe
Generative molecular design for drug discovery has very recently achieved a wave of experimental validation, with language-based backbones being the most common architectures employed. The most important factor for downstream success is whether an in silico oracle is well correlated with the desired end-point. To this end, current methods use cheaper proxy oracles with higher throughput before evaluating the most promising subset with high-fidelity oracles. The ability to directly optimize high-fidelity oracles would greatly enhance generative design and be expected to improve hit rates. However, current models are not efficient enough to consider such a prospect, exemplifying the sample efficiency problem. In this work, we introduce Saturn, which leverages the Augmented Memory algorithm and demonstrates the first application of the Mamba architecture for generative molecular design. We elucidate how experience replay with data augmentation improves sample efficiency and how Mamba synergistically exploits this mechanism. Saturn outperforms 22 models on multi-parameter optimization tasks relevant to drug discovery and may possess sufficient sample efficiency to consider the prospect of directly optimizing high-fidelity oracles.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Asia > Middle East > Israel (0.04)
23andMe Failed to Detect Account Intrusions for Months
Police took a digital rendering of a suspect's face, generated using DNA evidence, and ran it through a facial recognition system in a troubling incident reported for the first time by WIRED this week. The tactic came to light in a trove of hacked police records published by the transparency collective Distributed Denial of Secrets. Meanwhile, information about United States intelligence agencies purchasing Americans' phone location data and internet metadata without a warrant was revealed this week only after US senator Ron Wyden blocked the appointment of a new NSA director until the information was made public. And a California teen who allegedly used the handle Torswats to carry out hundreds of swatting attacks across the US is being extradited to Florida to face felony charges. The infamous spyware developer NSO Group, creator of the Pegasus spyware, has been quietly planning a comeback, which involves investing millions of dollars lobbying in Washington while exploiting the Israel-Hamas war to stoke global security fears and position its products as a necessity.
- Asia > South Korea (0.31)
- North America > United States > California (0.26)
- Asia > Middle East > Israel (0.26)
- (3 more...)
Difference of Probability and Information Entropy for Skills Classification and Prediction in Student Learning
Ehimwenma, Kennedy Efosa, Sharji, Safiya Al, Raheem, Maruf
The probability of an event lies in the range [0, 1]. In a sample space S, the value of a probability determines whether an outcome is true or false. An event A that will never occur has probability Pr(A) = 0, and an event B that will certainly occur has probability Pr(B) = 1; the outcomes of both A and B are therefore known with certainty. Furthermore, the probabilities Pr(E1) + Pr(E2) + ... + Pr(En) of a finite set of mutually exclusive, exhaustive events in a sample space S sum to 1. Conversely, the difference between the probabilities of two certain events is 0. This paper first discusses Bayes' theorem, then the complement of a probability and the difference of probabilities for occurrences of learning events, before applying these to the prediction of learning objects in student learning. Given that probabilities sum to 1, this paper submits that the difference between argmax Pr(S) and the probability of student performance quantifies the weight of learning objects for students when making learning recommendations. Using a skill-set dataset, the computational procedure demonstrates: i) the probability of skill-set events that have occurred and would lead to higher-level learning; ii) the probability of events that have not occurred and require subject-matter relearning; iii) the accuracy of a decision tree in predicting student performance as class labels; and iv) the information entropy of the skill-set data and its implications for student cognitive performance and the recommendation of learning [1].
- Oceania > New Zealand > North Island > Waikato (0.04)
- North America > United States > California (0.04)
- Asia > Middle East > Oman > Muscat Governorate > Muscat (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Education (1.00)
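The complement and entropy quantities named in the abstract above can be illustrated with a minimal sketch. The skill names and values below are hypothetical placeholders, not taken from the paper's dataset, and the entropy here is the standard Shannon entropy of a binary outcome:

```python
import math

# Hypothetical skill-set outcomes for one student: 1 = skill demonstrated, 0 = not.
# (Illustrative data only; the paper's actual dataset is not reproduced here.)
skills = {"algebra": 1, "recursion": 0, "sql": 1, "proofs": 0}

# Probability that a randomly chosen skill event has occurred,
# and its complement: the events that require subject-matter relearning.
p_occurred = sum(skills.values()) / len(skills)
p_not_occurred = 1 - p_occurred

def entropy(p):
    """Shannon entropy (in bits) of a binary distribution with Pr = p."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

print(p_occurred)           # 0.5
print(p_not_occurred)       # 0.5
print(entropy(p_occurred))  # 1.0 bit: maximal uncertainty about performance
```

With half the skills demonstrated, the entropy is at its maximum of 1 bit, i.e., the skill-set data is least informative about the student's performance; as p approaches 0 or 1 the entropy falls toward 0 and a recommendation becomes more clear-cut.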
The US Has Failed to Pass AI Regulation. New York City Is Stepping Up
As the US federal government struggles to meaningfully regulate AI--or even function--New York City is stepping into the governance gap. The city introduced an AI Action Plan this week that mayor Eric Adams calls a first of its kind in the nation. The set of roughly 40 policy initiatives is designed to protect residents against harms like bias or discrimination from AI. It includes the development of standards for AI purchased by city agencies and new mechanisms to gauge the risk of AI used by city departments. New York's AI regulation could soon expand further.
- North America > United States > New York (0.94)
- North America > United States > District of Columbia > Washington (0.06)
- Law > Statutes (0.91)
- Government > Regional Government > North America Government > United States Government (0.33)
I Asked AI Chatbots to Help Me Shop. They All Failed
Like people in many fields, we here on the WIRED Gear desk are mildly concerned that ChatGPT is coming for our jobs. But we feel relatively safe because it's our job to test things, and AI can't really do that. A large language model can't pedal an ebike. A chatbot can't see the curves of a Dynamic Island. A cloud service can't tell you whether a grill cooked a burger evenly.
AI-Powered Hiring Tools Have Failed to Reduce Bias, New Study Claims
In recent years, there has been an increase in the use of AI tools advertised as a solution to the lack of diversity in the workforce. These tools range from chatbots to CV scrapers that aid companies in hiring employees. Users of such tools claim they eliminate gender and ethnic biases in hiring by using algorithms that analyze job applicants through their speech patterns, expressions, and other traits. However, researchers from Cambridge's Centre for Gender Studies contend in a recent report published in Philosophy and Technology that AI recruiting tools are superficial and amount to "automated pseudoscience." They call them a risky instance of "technosolutionism": using technology to address complex issues like discrimination without making the necessary investments or changes to organizational culture.
Three Edge Cases That AI Has Failed
Even if you haven't realized it, AI has changed every aspect of our lives. A maps app predicts the traffic and offers you the fastest route while you're trying to get to a meeting. The dress you want to buy appears in the ad box of a random website. Netflix recommends a show from your favourite genre. These are just a few of the endless examples of how AI is making our daily lives easier.