Can We Verify Step by Step for Incorrect Answer Detection?

arXiv.org Artificial Intelligence

Chain-of-Thought (CoT) prompting has marked a significant advancement in enhancing the reasoning capabilities of large language models (LLMs). Previous studies have developed various extensions of CoT, which focus primarily on enhancing end-task performance. In addition, there has been research on assessing the quality of reasoning chains in CoT. This raises an intriguing question: Is it possible to predict the accuracy of LLM outputs by scrutinizing the reasoning chains they generate? To answer this research question, we introduce a benchmark, R2PE, designed specifically to explore the relationship between reasoning chains and performance in various reasoning tasks spanning five different domains. This benchmark aims to measure the falsehood of the final output of LLMs based on the reasoning steps. To make full use of information in multiple reasoning chains, we propose the process discernibility score (PDS) framework that beats the answer-checking baseline by a large margin. Concretely, this resulted in an average of 5.1% increase in the F1 score across all 45 subsets within R2PE. We further demonstrate our PDS's efficacy in advancing open-domain QA accuracy. Data and code are available at https://github.com/XinXU-USTC/R2PE.
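As a rough illustration of what an answer-checking baseline and its F1 evaluation might look like, the sketch below flags an output as likely incorrect when the final answers of several sampled reasoning chains disagree. This is an assumption-laden sketch, not the R2PE code or the PDS method: the function names, the agreement measure, and the 0.6 threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def answer_agreement(answers):
    """Fraction of sampled final answers that match the most common answer."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / len(answers)


def flag_incorrect(answers, threshold=0.6):
    """Predict 'incorrect' when agreement falls below a (hypothetical) threshold."""
    return answer_agreement(answers) < threshold


def f1_score(predicted, actual):
    """F1 for the positive class, where True means 'output is incorrect'."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Three made-up questions, each with the final answers of sampled chains.
    samples = [["4", "4", "5", "4", "7"], ["1", "2", "3"], ["9", "9", "9"]]
    flags = [flag_incorrect(s) for s in samples]
    print(flags)
```

A process-based score would look at the reasoning steps themselves rather than only the final answers; the sketch covers only the simpler answer-level check.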


Missile strike on Belgorod, Russia, kills 6, injures 18

FOX News

Seven people, including three children, were killed in a Russian drone attack on a gas station in the Ukrainian city of Kharkiv on Saturday. A missile strike on the Russian city of Belgorod near the Ukraine border on Thursday killed six people, including a child, and injured 18 others, a Russian official said. It was the latest in exchanges of long-range missile and rocket fire in Russia's war on Ukraine. Hours earlier, Russia fired two dozen cruise and ballistic missiles at a broad area of Ukraine, hitting multiple regions after a midnight strike in Ukraine's northeast killed five people in an apartment building, authorities said. Five of the 18 people injured in Belgorod, a city of around 340,000 people, were children, regional Gov. Vyacheslav Gladkov said on Telegram.


Kyiv aims to use more Ukrainian drones; Trump, Biden clash on NATO

Al Jazeera

Ukraine changed its military leadership and announced a change of tactics in the past week, as a vote in the US Senate brought renewed hope of US aid for the embattled country. Ukrainian President Volodymyr Zelenskyy appointed ground forces commander Oleksandr Syrskii as commander-in-chief of the armed forces on February 8. Zelenskyy reportedly asked the outgoing Valery Zaluzhny to "continue to be part of the team", without specifying what that meant. "We stood against a vile and powerful enemy. Endured together," wrote Zaluzhny, an immensely popular general who stopped Russia's invasion in February 2022 and ordered a counterattack in August that year, which claimed more than 1,500sq km (580sq miles). Since then, Ukrainian forces have become bogged down in positional warfare. A counteroffensive last summer failed to achieve its goal of cutting the Russian front in two.


Russia-Ukraine war: List of key events, day 723

Al Jazeera

Ukraine said it critically damaged the Caesar Kunikov, a Russian landing warship, off occupied Crimea, in a drone attack, the latest blow to the Russian navy's Black Sea Fleet. Ukraine said the ship, one of Russia's newest vessels, had a crew of 87 and had taken part in wars in Georgia and Syria as well as Ukraine. There was no official comment from Russia on the attack. Newly appointed Ukrainian armed forces chief Oleksandr Syrskii visited troops fighting around the key flashpoint of Avdiivka on the eastern front line, and described the situation as "extremely complex and stressful". Syrskii, who was accompanied by Defence Minister Rustem Umerov, said Russian forces had "a numerical advantage in personnel".


Generative Representational Instruction Tuning

arXiv.org Artificial Intelligence

All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on only generative or embedding data, thus we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augmented Generation (RAG) by > 60% for long documents, by no longer requiring separate retrieval and generation models. Models, code, etc. are freely available at https://github.com/ContextualAI/gritlm.


Russian landing ship Caesar Kunikov sunk off Crimea, says Ukraine

BBC News

There was no confirmation from Russia's navy that the Caesar Kunikov had been sunk in the Black Sea, merely that six Ukrainian drones had been destroyed. Video appearing to show the aftermath of the Ukrainian attack was uploaded only recently, BBC Verify confirmed.


Long-form evaluation of model editing

arXiv.org Artificial Intelligence

Evaluations of model editing currently only use the 'next few token' completions after a prompt. As a result, the impact of these methods on longer natural language generation is largely unknown. We introduce long-form evaluation of model editing (LEME), a novel evaluation protocol that measures the efficacy and impact of model editing in long-form generative settings. Our protocol consists of a machine-rated survey and a classifier which correlates well with human ratings. Importantly, we find that our protocol has very little relationship with previous short-form metrics (despite being designed to extend efficacy, generalization, locality, and portability into a long-form setting), indicating that our method introduces a novel set of dimensions for understanding model editing methods. Using this protocol, we benchmark a number of model editing techniques and present several findings including that, while some methods (ROME and MEMIT) perform well in making consistent edits within a limited scope, they suffer much more from factual drift than other methods. Finally, we present a qualitative analysis that illustrates common failure modes in long-form generative settings including internal consistency, lexical cohesion, and locality issues.


Hybrid Machine Learning techniques in the management of harmful algal blooms impact

arXiv.org Artificial Intelligence

Harmful algal blooms (HABs) are episodes of high concentrations of algae that are potentially toxic for human consumption. Mollusc farming can be affected by HABs because, as filter feeders, molluscs can accumulate high concentrations of marine biotoxins in their tissues. To avoid the risk to human consumption, harvesting is prohibited when toxicity is detected. At present, the closure of production areas is based on expert knowledge, and a predictive model would help when conditions are complex and sampling is not possible. Although the concentration of toxin in meat is the measure most commonly used by experts in the control of shellfish production areas, it is rarely used as a target by automatic prediction models. This is largely because the established sampling programs produce irregular data. As an alternative, the activity status of production areas has been proposed as a target variable based on whether mollusc meat has a toxicity level below or above the legal limit. This new option is the most similar to the actual functioning of the control of shellfish production areas. For this purpose, we have compared hybrid machine learning models like Neural-Network-Adding Bootstrap (BAGNET) and Discriminative Nearest Neighbor Classification (SVM-KNN) when estimating the state of production areas. The study has been carried out in several estuaries with different levels of complexity in the episodes of algal blooms to demonstrate the generalization capacity of the models in bloom detection. As a result, we could observe that, with an average recall value of 93.41% and without dropping below 90% in any of the estuaries, BAGNET outperforms the other models both in terms of results and robustness.


Machine Learning in management of precautionary closures caused by lipophilic biotoxins

arXiv.org Artificial Intelligence

Mussel farming is one of the most important aquaculture industries. The main risk to mussel farming is harmful algal blooms (HABs), which pose a risk to human consumption. In Galicia, Spain's main producer of cultivated mussels, the opening and closing of the production areas is controlled by a monitoring program. In addition to the closures resulting from toxicity exceeding the legal threshold, precautionary closures may be applied when confirmatory sampling is absent and risk factors are present. These decisions are made by experts without the support or formalisation of the experience on which they are based. Therefore, this work proposes a predictive model capable of supporting the application of precautionary closures. Achieving sensitivity, accuracy and kappa index values of 97.34%, 91.83% and 0.75 respectively, the kNN algorithm has provided the best results. This allows the creation of a system capable of helping in complex situations where forecast errors are more common.
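For readers unfamiliar with the three reported metrics, the sketch below shows how sensitivity, accuracy and Cohen's kappa are conventionally computed from a binary confusion matrix. It uses the standard textbook definitions; the function name and the counts in the usage example are made up for illustration and are not the paper's data.

```python
def closure_metrics(tp, fn, fp, tn):
    """Return (sensitivity, accuracy, kappa) for a binary confusion matrix.

    tp: closures correctly predicted, fn: closures missed,
    fp: false alarms, tn: open states correctly predicted.
    """
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Sensitivity (recall): share of actual closures that were predicted.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all cases predicted correctly.
    accuracy = (tp + tn) / total

    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fn) / total) * ((tp + fp) / total)
    p_no = ((fp + tn) / total) * ((fn + tn) / total)
    p_expected = p_yes + p_no

    # Cohen's kappa: observed agreement corrected for chance agreement.
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0
    return sensitivity, accuracy, kappa


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    sens, acc, kappa = closure_metrics(tp=40, fn=10, fp=5, tn=45)
    print(f"sensitivity={sens:.2f} accuracy={acc:.2f} kappa={kappa:.2f}")
```

Kappa is useful alongside accuracy because it discounts the agreement a classifier would reach by chance when one class dominates.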


For Ukraine's defence industry ambitions, the sky's the limit

Al Jazeera

As Ukraine approaches the second anniversary of Russia's full-scale invasion, it plans to produce more of its own ammunition and key weapons systems. The goal of greater self-sufficiency comes as Ukraine's Western allies meet increasing political resistance to military aid and Russia ramps up weapons production. Last month, Ukraine's prime minister, Denys Shmyhal, said the country plans to increase its domestic weapons production sixfold this year. Ukraine's defence industry has already begun to expand. Strategic industries minister Oleksandr Kamyshin said Ukraine last year doubled its ammunition production for NATO-calibre artillery systems.