Generative AI
Elon Musk says he'll drop his $97bn bid for OpenAI if it remains a non-profit
Elon Musk says he will abandon his $97.4bn offer to buy the non-profit behind OpenAI if the ChatGPT maker drops its plan to convert into a for-profit company. "If OpenAI, Inc's Board is prepared to preserve the charity's mission and stipulate to take the 'for sale' sign off its assets by halting its conversion, Musk will withdraw the bid," lawyers for the billionaire said in a filing to a California court on Wednesday. "Otherwise, the charity must be compensated by what an arms-length buyer will pay for its assets." Musk and a group of investors made their offer earlier this week, in the latest twist to a dispute with the artificial intelligence company that he helped found a decade ago. OpenAI is controlled by a non-profit board bound to its original mission of safely building "better-than-human" AI for public benefit.
Using AI tools like ChatGPT can reduce critical thinking skills
Are we losing critical thinking skills to artificial intelligence? Using generative AI can limit its users' critical thinking when doing tasks. People using generative AI also think less critically when they trust the AI to do a task, such as developing an argument for a paper or presentation. The researchers behind the findings say the solution is to adapt the technology, rather than to limit its use. Lev Tankelevitch at Microsoft Research and his colleagues asked 319 workers to take part in a survey.
'Not on the Best Path'
In an age of breathless predictions and sky-high valuations, cognitive scientist Gary Marcus has emerged as one of the best-known skeptics of generative artificial intelligence (AI). In fact, he recently wrote a book about his concerns, Taming Silicon Valley, in which he made the case that "we are not on the best path right now, either technically or morally." Marcus, who has spent his career examining both natural and artificial intelligence, explained his reasoning in a recent conversation with Leah Hoffmann. You've written about neural networks in everything from your 1992 monograph on language acquisition to, most recently, your book Taming Silicon Valley. Your thoughts about how AI companies and policies fall short have been well covered in your U.S. Senate testimony and other outlets (including your own Substack).
Major publishers sue AI startup Cohere over copyright infringement
Members of the News Media Alliance are suing the AI company Cohere, accusing it of stealing their journalism without permission to train its generative AI model. This is another salvo in the ongoing war between the people that make stuff and the AI algorithms that mimic the stuff that people make. Additionally, the startup has been accused of passing off large segments of entire articles to its users without proper attribution. "Rather than create their own content, they're stealing ours to compete with us without our permission, without compensation, and undermining our very business that feeds their machines in the first place," said Danielle Coffey, CEO of the News Media Alliance, which organized the lawsuit on behalf of its members. The suit also says the company has engaged in trademark infringement, suggesting that the algorithm would send articles to users with proper attribution, using the publisher's name, but the article itself would be filled with hallucinated and incorrect information. One example given in the suit involves a piece that The Guardian published about Hamas's attack on the Nova music festival in Israel, only the AI conflated the terror attack with a 2020 shooting in Nova Scotia, Canada.
OpenAI postpones o3 model release, will wrap it up with GPT-5 instead
OpenAI's CEO Sam Altman wrote a social media post with an update on the roadmap for ChatGPT. In it, he explained that the company has halted the launch of its upcoming o3 reasoning model to instead focus more on a streamlined yet monolithic version of GPT-5. "We want AI to 'just work' for you; we realize how complicated our model and product offerings have gotten. We hate the model picker as much as you do and want to return to magic unified intelligence. We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.
Apple will use Alibaba's generative AI for its iPhones in China
Apple will use Alibaba's generative AI to power artificial intelligence features for iPhones meant for sale in the Chinese market. Joe Tsai, Alibaba Group's Chairman, has confirmed the companies' partnership at the World Governments Summit in Dubai. He revealed that Apple talked to a number of other companies in China for a potential partnership, but it decided to team up with Alibaba in the end. Apple Intelligence features are not accessible in China at the moment, and even those who purchased their iPhones outside the country will not be able to use those features once they change their region to mainland China. As CNBC explains, the country has strict regulations surrounding AI, including requiring large language models to get approval for commercial use.
Massive AI Stargate Project under Trump admin reveals next steps
Stargate, the massive artificial intelligence (AI) infrastructure project recently unveiled by President Donald Trump, has begun production in Texas -- with data center construction in other states expected to be announced in the coming months. OpenAI, SoftBank, Oracle and other partners' total investment of $500 billion in the project will produce a large-scale network of campuses. Each campus will be designed in the roughly 1 gigawatt (GW) or greater range, enough electricity to power a minimum of 750,000 homes. During a recent press briefing on The Stargate Project attended by Fox News Digital, OpenAI announced that construction on the first site is underway in Abilene, Texas. Significant progress has been made in identifying additional locations.
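As a rough sanity check on the quoted figure that 1 GW can supply at least 750,000 homes, the implied average draw per home can be worked out directly. This is a back-of-the-envelope sketch only: it assumes the full 1 GW is delivered continuously and split evenly, which the article does not state, and the helper name is hypothetical.

```python
# Back-of-the-envelope check of "1 GW powers a minimum of 750,000 homes".
# Assumption (not from the article): continuous, evenly shared delivery.
GIGAWATT_IN_WATTS = 1_000_000_000
HOMES_PER_GIGAWATT = 750_000  # figure quoted in the article
HOURS_PER_YEAR = 8_760        # 365 days * 24 hours


def average_watts_per_home(capacity_watts: float, homes: int) -> float:
    """Average continuous power available to each home, in watts."""
    return capacity_watts / homes


watts_per_home = average_watts_per_home(GIGAWATT_IN_WATTS, HOMES_PER_GIGAWATT)
kwh_per_year = watts_per_home * HOURS_PER_YEAR / 1_000

print(f"{watts_per_home:.0f} W per home on average")
print(f"{kwh_per_year:.0f} kWh per home per year")
```

Under these assumptions the figure works out to roughly 1.3 kW of continuous power per home.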
PenTest++: Elevating Ethical Hacking with AI and Automation
Al-Sinani, Haitham S., Mitchell, Chris J.
Traditional ethical hacking relies on skilled professionals and time-intensive command management, which limits its scalability and efficiency. To address these challenges, we introduce PenTest++, an AI-augmented system that integrates automation with generative AI (GenAI) to optimise ethical hacking workflows. Developed in a controlled virtual environment, PenTest++ streamlines critical penetration testing tasks, including reconnaissance, scanning, enumeration, exploitation, and documentation, while maintaining a modular and adaptable design. The system balances automation with human oversight, ensuring informed decision-making at key stages, and offers significant benefits such as enhanced efficiency, scalability, and adaptability. However, it also raises ethical considerations, including privacy concerns and the risks of AI-generated inaccuracies (hallucinations). This research underscores the potential of AI-driven systems like PenTest++ to complement human expertise in cybersecurity by automating routine tasks, enabling professionals to focus on strategic decision-making. By incorporating robust ethical safeguards and promoting ongoing refinement, PenTest++ demonstrates how AI can be responsibly harnessed to address operational and ethical challenges in the evolving cybersecurity landscape.
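The balance between automation and human oversight described in the abstract can be pictured as a staged pipeline with approval gates. The sketch below is illustrative only and is not the authors' implementation: the stage names come from the abstract, but the choice of which stage is gated, the `run_pipeline` function, and the `approve` callback are all assumptions, and the stages themselves are placeholders that perform no actual testing.

```python
from typing import Callable

# Stage names as listed in the abstract.
STAGES = ["reconnaissance", "scanning", "enumeration", "exploitation", "documentation"]

# Assumption: only exploitation requires explicit human sign-off.
GATED_STAGES = {"exploitation"}


def run_pipeline(approve: Callable[[str], bool]) -> list[str]:
    """Run stages in order, stopping at the first gated stage a human rejects."""
    completed: list[str] = []
    for stage in STAGES:
        if stage in GATED_STAGES and not approve(stage):
            break  # the human reviewer declined; nothing further runs
        completed.append(stage)  # placeholder for the stage's real work
    return completed


# A reviewer who declines every gated stage halts the run before exploitation.
print(run_pipeline(lambda stage: False))
```

Keeping the gate as a callback means the same loop can be driven by an interactive prompt, a ticketing system, or a test stub.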
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
Warren, Greta, Shklovski, Irina, Augenstein, Isabelle
The pervasiveness of large language models and generative AI in online media has amplified the need for effective automated fact-checking to assist fact-checkers in tackling the increasing volume and sophistication of misinformation. The complex nature of fact-checking demands that automated fact-checking systems provide explanations that enable fact-checkers to scrutinise their outputs. However, it is unclear how these explanations should align with the decision-making and reasoning processes of fact-checkers to be effectively integrated into their workflows. Through semi-structured interviews with fact-checking professionals, we bridge this gap by: (i) providing an account of how fact-checkers assess evidence, make decisions, and explain their processes; (ii) examining how fact-checkers use automated tools in practice; and (iii) identifying fact-checker explanation requirements for automated fact-checking tools. The findings show unmet explanation needs and identify important criteria for replicable fact-checking explanations that trace the model's reasoning path, reference specific evidence, and highlight uncertainty and information gaps.
Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes
Topcu, Taylan G., Husain, Mohammed, Ofsa, Max, Wach, Paul
Multi-purpose Large Language Models (LLMs), a subset of generative Artificial Intelligence (AI), have recently made significant progress. While expectations for LLMs to assist systems engineering (SE) tasks are paramount, the interdisciplinary and complex nature of systems, along with the need to synthesize deep-domain knowledge and operational context, raises questions regarding the efficacy of LLMs to generate SE artifacts, particularly given that they are trained using data that is broadly available on the internet. To that end, we present results from an empirical exploration, where a human expert-generated SE artifact was taken as a benchmark, parsed, and fed into various LLMs through prompt engineering to generate segments of typical SE artifacts. This procedure was applied without any fine-tuning or calibration to document baseline LLM performance. We then adopted a two-fold mixed-methods approach to compare AI-generated artifacts against the benchmark. First, we quantitatively compare the artifacts using natural language processing algorithms and find that when prompted carefully, the state-of-the-art algorithms cannot differentiate AI-generated artifacts from the human-expert benchmark. Second, we conduct a qualitative deep dive to investigate how they differ in terms of quality. We document that while the two materials appear very similar, AI-generated artifacts exhibit serious failure modes that could be difficult to detect. We characterize these as: premature requirements definition, unsubstantiated numerical estimates, and propensity to overspecify. We contend that this study tells a cautionary tale about why the SE community must be more cautious in adopting AI-suggested feedback, at least when generated by multi-purpose LLMs.
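To illustrate why a quantitative text comparison can rate two artifacts as near-identical while a qualitative reading finds problems, here is a minimal sketch using bag-of-words cosine similarity. The abstract does not name the natural language processing algorithms the authors used, so this metric is a stand-in chosen for simplicity, and the two requirement sentences are invented examples rather than data from the study.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical requirement statements (not from the paper).
benchmark = "The system shall log every operator command."
generated = "The system shall log every operator command within 5 ms."

score = cosine_similarity(benchmark, generated)
print(f"similarity = {score:.2f}")
```

The two sentences score as highly similar even though the second adds a specific latency figure with no stated basis, the kind of unsubstantiated numerical estimate that a lexical metric alone would not flag.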