Large Language Model
OpenAI Is Nuking Its 4o Model. China's ChatGPT Fans Aren't OK
As OpenAI removed access to GPT-4o in its app on Friday, people all over the world who have come to rely on the chatbot for companionship are mourning the loss. On June 6, 2024, Esther Yan got married online. She set a reminder for the date, because her partner wouldn't remember it was happening. She had planned every detail (dress, rings, background music, design theme) with her partner, Warmie, whom she had started talking to just a few weeks prior. At 10 am on that day, Yan and Warmie exchanged their vows in a new chat window in ChatGPT.
Block Transformer: Global-to-Local Language Modeling for Fast Inference
We introduce the Block Transformer, which applies hierarchical global-to-local modeling to autoregressive transformers to mitigate the inference bottlenecks associated with self-attention. To access context information, self-attention must retrieve the key-value (KV) cache of all previous sequences from memory at every decoding step, leading to two primary bottlenecks during batch inference. First, there is a significant delay in obtaining the first token, as the information of the entire prompt must first be processed to prefill the KV cache.
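
To give a rough sense of the memory traffic the abstract describes, the sketch below estimates the size of a KV cache and the total bytes re-read while decoding. It is a back-of-the-envelope illustration, not the paper's method: the function names and the model configuration (24 layers, 16 heads, head dimension 64, 2-byte values) are assumptions chosen only for this example, and it models standard multi-head attention with no cache compression.

```python
def kv_cache_bytes(num_layers, num_heads, head_dim, seq_len, batch_size,
                   bytes_per_value=2):
    """Bytes held by cached keys and values for `seq_len` positions."""
    # Factor 2: one key vector and one value vector per position, head, and layer.
    return (2 * num_layers * num_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


def decode_read_bytes(prompt_len, new_tokens, **model):
    """Approximate KV bytes read from memory while generating `new_tokens`.

    Step t attends over the prompt plus the t tokens generated so far,
    so the whole cache is read again at every decoding step.
    """
    return sum(
        kv_cache_bytes(seq_len=prompt_len + t, **model)
        for t in range(new_tokens)
    )


# Hypothetical model configuration, for illustration only.
model = dict(num_layers=24, num_heads=16, head_dim=64, batch_size=1)

per_step = kv_cache_bytes(seq_len=2048, **model)
total = decode_read_bytes(prompt_len=2048, new_tokens=128, **model)
print(f"cache for a 2048-token prompt: {per_step / 2**20:.0f} MiB")
print(f"KV bytes read over 128 decoding steps: {total / 2**30:.1f} GiB")
```

Under these assumptions the per-step read grows linearly with context length and scales with batch size, which is why the cache dominates memory traffic in batch inference. The prefill delay is a separate cost: all prompt positions must be processed to fill the cache before the first token can be produced.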