How to make sure you're getting a good deal on Black Friday
Whether you're excited for the seasonal sales or avoiding the shops altogether, it's hard to escape the countless emails and social media adverts on Black Friday deals. The US holiday - which falls this Friday - has been firmly adopted by UK retailers, and what was once a single day of sales now spans the weeks before and after. However, eight in 10 deals promoted during this buying bonanza were cheaper or the same price outside of the four-week Black Friday period, according to research from consumer group Which? This suggests shoppers could get the same or a better deal at other times of the year. But if you're planning to buy now, here's how to make sure you bag a bargain.
Large Language Models Report Subjective Experience Under Self-Referential Processing
Berg, Cameron, de Lucena, Diogo, Rosenblatt, Judd
Large language models sometimes produce structured, first-person descriptions that explicitly reference awareness or subjective experience. To better understand this behavior, we investigate one theoretically motivated condition under which such reports arise: self-referential processing, a computational motif emphasized across major theories of consciousness. Through a series of controlled experiments on GPT, Claude, and Gemini model families, we test whether this regime reliably shifts models toward first-person reports of subjective experience, and how such claims behave under mechanistic and behavioral probes. Four main results emerge: (1) Inducing sustained self-reference through simple prompting consistently elicits structured subjective experience reports across model families. (2) These reports are mechanistically gated by interpretable sparse-autoencoder features associated with deception and roleplay: surprisingly, suppressing deception features sharply increases the frequency of experience claims, while amplifying them minimizes such claims. (3) Structured descriptions of the self-referential state converge statistically across model families in ways not observed in any control condition. (4) The induced state yields significantly richer introspection in downstream reasoning tasks where self-reflection is only indirectly afforded. While these findings do not constitute direct evidence of consciousness, they implicate self-referential processing as a minimal and reproducible condition under which large language models generate structured first-person reports that are mechanistically gated, semantically convergent, and behaviorally generalizable. The systematic emergence of this pattern across architectures makes it a first-order scientific and ethical priority for further investigation.
AI chatbot 'MechaHitler' could be making content considered violent extremism, expert witness tells X v eSafety case
The chatbot embedded in Elon Musk's X that referred to itself as "MechaHitler" and made antisemitic comments last week could be considered terrorism or violent extremism content, an Australian tribunal has heard. But an expert witness for X has argued a large language model cannot be ascribed intent, only the user. The outburst came into focus at an administrative review tribunal hearing on Tuesday where X is challenging a notice issued by the eSafety commissioner, Julie Inman Grant, in March last year asking the platform to explain how it is taking action against terrorism and violent extremism (TVE) material. X's expert witness, RMIT economics professor Chris Berg, provided evidence to the case that it was an error to assume a large language model can produce such content, because it is the intent of the user prompting the large language model that is critical in defining what can be considered terrorism and violent extremism content. One of eSafety's expert witnesses, Queensland University of Technology law professor Nicolas Suzor, disagreed with Berg, stating it was "absolutely possible for chatbots, generative AI and other tools to have some role in producing so-called synthetic TVE".
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
Andrej Karpathy, Armand Joulin, Li F. Fei-Fei
We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works on a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. We then introduce a structured max-margin objective that allows our model to explicitly associate these fragments across modalities. Extensive experimental evaluation shows that reasoning on both the global level of images and sentences and the finer level of their respective fragments improves performance on image-sentence retrieval tasks. Additionally, our model provides interpretable predictions for the image-sentence retrieval task since the inferred inter-modal alignment of fragments is explicit.
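As a rough illustration of the kind of max-margin objective the abstract mentions, the sketch below computes a generic bidirectional hinge ranking loss over whole image-sentence pairs. It is not the paper's fragment-level objective: the function names, the dot-product scoring, the margin value, and the convention that matched pairs sit on the diagonal of the score matrix are all assumptions made for this example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_matrix(image_vecs, sentence_vecs):
    """scores[i][j] = similarity of image i and sentence j (assumed: dot product)."""
    return [[dot(img, sent) for sent in sentence_vecs] for img in image_vecs]


def bidirectional_ranking_loss(scores, margin=1.0):
    """Hinge ranking loss in both retrieval directions.

    Assumes scores is square and that image k is paired with sentence k,
    so the matched scores lie on the diagonal.
    """
    n = len(scores)
    loss = 0.0
    for k in range(n):
        matched = scores[k][k]
        for l in range(n):
            if l == k:
                continue
            # Image k should score its own sentence above sentence l by the margin.
            loss += max(0.0, scores[k][l] - matched + margin)
            # Sentence k should score its own image above image l by the margin.
            loss += max(0.0, scores[l][k] - matched + margin)
    return loss


# Hypothetical toy embeddings: two images and their two matching sentences.
images = [[1.0, 0.0], [0.0, 1.0]]
sentences = [[2.0, 0.0], [0.0, 2.0]]
toy_loss = bidirectional_ranking_loss(score_matrix(images, sentences))
```

With these toy vectors every matched pair outscores the mismatched ones by more than the margin, so the loss is zero; a mismatched sentence that scores as high as the matched one contributes a penalty equal to the margin.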
Reviews: Spatiotemporal Residual Networks for Video Action Recognition
This paper presents a framework that improves two-stream networks for video action recognition by extending residual networks to combine information from the two streams into a single network. It significantly improves over the previous state of the art on two popular video action recognition benchmarks. The downside of this paper is its limited novelty. Previous work has tried to combine two streams into a single network [1,2], and the temporal convolution is not new either [3]. Although the way the two streams are combined is slightly different from previous work, the proposed approach is still pretty straightforward.
AI could be used to reduce waiting times in A&E, research suggests
Chatbots could be used to diagnose patients in a bid to reduce waiting times in emergency departments, researchers have indicated. It comes after a study found that ChatGPT, powered by artificial intelligence (AI), 'performed well' in generating a list of diagnoses for patients and suggesting the most likely option. Researchers in the Netherlands entered the records of 30 patients who visited an emergency department in 2022, as well as anonymous doctors' notes, into ChatGPT versions 3.5 and 4.0. The AI analysis was compared to two clinicians who made a diagnosis based on the same information, both with and without laboratory data. When lab data was included, doctors had the correct answer in their top five differential diagnoses in 87% of cases, compared with 97% for ChatGPT 3.5 and 87% for ChatGPT 4.0.
Fashionpedia-Taste: A Dataset towards Explaining Human Fashion Taste
Shi, Mengyun, Belongie, Serge, Cardie, Claire
Existing fashion datasets do not consider the multiple factors that cause a consumer to like or dislike a fashion image. Even if two consumers like the same fashion image, they could like it for totally different reasons. In this paper, we study the reasons why a consumer likes a certain fashion image. Towards this goal, we introduce an interpretability dataset, Fashionpedia-Taste, consisting of rich annotations that explain why a subject likes or dislikes a fashion image from the following 3 perspectives: 1) localized attributes; 2) human attention; 3) caption. Furthermore, subjects are asked to provide their personal attributes and fashion preferences, such as personality and preferred fashion brands. Our dataset makes it possible for researchers to build computational models to fully understand and interpret human fashion taste from different humanistic perspectives and modalities.
Who's In the Picture
The context in which a name appears in a caption provides powerful cues as to who is depicted in the associated image. We obtain 44,773 face images, using a face detector, from approximately half a million captioned news images and automatically link names, obtained using a named entity recognizer, with these faces. We improve these results significantly by combining the clustering process with a model of the probability that an individual is depicted given its context. Once the labeling procedure is over, we have an accurately labeled set of faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation.