condor
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
Cao, Maosong, Zhang, Taolin, Li, Mo, Zhang, Chuyu, Liu, Yunxin, Duan, Haodong, Zhang, Songyang, Chen, Kai
The quality of Supervised Fine-Tuning (SFT) data plays a critical role in enhancing the conversational capabilities of Large Language Models (LLMs). However, as LLMs become more advanced, the availability of high-quality human-annotated SFT data has become a significant bottleneck, necessitating greater reliance on synthetic training data. In this work, we introduce Condor, a novel two-stage synthetic data generation framework that incorporates a World Knowledge Tree and Self-Reflection Refinement to produce high-quality SFT data at scale. Our experimental results demonstrate that a base model fine-tuned on only 20K Condor-generated samples outperforms its counterparts. The additional refinement stage in Condor further enables iterative self-improvement for LLMs at various scales (up to 72B), validating the effectiveness of our approach. Furthermore, our investigation into scaling synthetic data in post-training reveals substantial unexplored potential for performance improvements, opening promising avenues for future research.
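The two stages can be pictured with a minimal sketch. It assumes that the World Knowledge Tree is a nested mapping of domain tags whose root-to-leaf paths seed question generation, and that Self-Reflection Refinement is a critique-then-revise loop. The tree contents, the prompt template, and the names `leaf_paths`, `seed_prompt`, and `refine` are hypothetical. The model calls are replaced by caller-supplied functions, so this is an illustration of the idea, not Condor's actual implementation.

```python
# Hypothetical World Knowledge Tree: domain -> subdomain -> list of leaf topics.
# These tags are illustrative only; they are not taken from the Condor paper.
KNOWLEDGE_TREE = {
    "Science": {
        "Physics": ["thermodynamics", "optics"],
        "Biology": ["genetics"],
    },
    "Daily life": {
        "Cooking": ["baking bread"],
    },
}


def leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf path of the tree as a tuple of tags."""
    if isinstance(tree, dict):
        for key, child in tree.items():
            yield from leaf_paths(child, prefix + (key,))
    else:  # a list of leaf topics
        for topic in tree:
            yield prefix + (topic,)


def seed_prompt(path):
    """Turn one knowledge path into a question-synthesis prompt (hypothetical template)."""
    context = " > ".join(path[:-1])
    return f"Write a realistic user question about '{path[-1]}' in the context of {context}."


def refine(question, answer, critique_fn, revise_fn, rounds=1):
    """Self-reflection loop: critique the current answer, then revise it.

    critique_fn and revise_fn stand in for LLM calls supplied by the caller.
    """
    for _ in range(rounds):
        critique = critique_fn(question, answer)
        answer = revise_fn(question, answer, critique)
    return answer


paths = list(leaf_paths(KNOWLEDGE_TREE))
prompts = [seed_prompt(p) for p in paths]

# Toy stand-ins for the model: the critic asks for an example, the reviser appends one.
improved = refine(
    "What is entropy?",
    "Entropy measures disorder.",
    critique_fn=lambda q, a: "add an example",
    revise_fn=lambda q, a, c: a + " For example, ice melting increases entropy.",
)
```

Walking every leaf path, rather than sampling topics freely, is one simple way to spread synthetic questions across the whole tree; the refinement loop can be run for several rounds to mimic iterative self-improvement.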
Stable Motion Primitives via Imitation and Contrastive Learning
Pérez-Dattari, Rodrigo, Kober, Jens
Learning from humans allows non-experts to program robots with ease, lowering the resources required to build complex robotic solutions. Nevertheless, such data-driven approaches often cannot provide guarantees regarding their learned behaviors, which is critical for avoiding failures and/or accidents. In this work, we focus on reaching/point-to-point motions, where robots must always reach their goal, regardless of their initial state. This can be achieved by modeling motions as dynamical systems and ensuring that they are globally asymptotically stable. Hence, we introduce a novel Contrastive Learning loss for training Deep Neural Networks (DNNs) that, when used together with an Imitation Learning loss, enforces the aforementioned stability in the learned motions. Unlike previous work, our method does not restrict the structure of its function approximator, enabling its use with arbitrary DNNs and allowing it to learn complex motions with high accuracy. We validate it using datasets and a real robot. In the former case, the motions are two- and four-dimensional and are modeled as first- and second-order dynamical systems. In the latter, the motions are three-, four-, and six-dimensional, of first and second order, and are used to control a 7-DoF robot manipulator in its end-effector space and joint space. More details on the real-world experiments are available at https://youtu.be/OM-2edHBRfc.
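To make "globally asymptotically stable" concrete, here is a minimal sketch of a first-order dynamical system whose trajectories reach the goal from any initial state. It deliberately swaps in a hand-designed linear system, x_dot = -k (x - x*) with a scalar gain k > 0, in place of the DNN trained with imitation and contrastive losses that the paper proposes. The names `velocity`, `rollout`, and `lyapunov`, the gain, the goal, and the step size are illustrative assumptions.

```python
import math

GOAL = (0.0, 0.0)


def velocity(x, goal=GOAL, gain=2.0):
    """First-order dynamical system x_dot = -gain * (x - goal).

    With gain > 0 this linear system is globally asymptotically stable at `goal`.
    """
    return tuple(-gain * (xi - gi) for xi, gi in zip(x, goal))


def rollout(x0, dt=0.01, steps=1000, goal=GOAL):
    """Integrate the system with explicit Euler steps and return the trajectory."""
    traj = [tuple(x0)]
    for _ in range(steps):
        x = traj[-1]
        v = velocity(x, goal)
        traj.append(tuple(xi + dt * vi for xi, vi in zip(x, v)))
    return traj


def lyapunov(x, goal=GOAL):
    """Candidate Lyapunov function V(x) = ||x - goal||^2."""
    return sum((xi - gi) ** 2 for xi, gi in zip(x, goal))


# Different initial states all converge to the same goal.
starts = [(5.0, -3.0), (-10.0, 10.0), (0.5, 0.0)]
trajectories = [rollout(s) for s in starts]
final_errors = [math.dist(t[-1], GOAL) for t in trajectories]
```

Each Euler step shrinks the distance to the goal by the factor 1 - dt * gain = 0.98, so V decreases monotonically along every trajectory. The paper's contribution is to retain this kind of convergence guarantee while letting an arbitrary DNN represent far more complex motions than a linear system can.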
Programmers, beware: ChatGPT has ruined your magic trick
John Naughton
Benedict Evans, a tech analyst whose newsletter is required reading for those who follow the industry, made an interesting point this week. He had, he said, been talking to generalist journalists who "were still under the impression that ChatGPT was a trivial parlour trick and the whole thing was about as interesting as a new iPhone app". On the other hand, he continued, "most people in tech are walking around slowly, holding on to the top of their head with both hands to stop it flying off. But within that, I think we can see a range of attitudes." We certainly can: they span a spectrum ranging from the view that this "generative AI" is going to be the biggest bonanza since the invention of the wheel, to fears that it augurs an existential risk to humanity, and numerous opinions in between.
GraphBreak: Tool for Network Community based Regulatory Medicine, Gene co-expression, Linkage Disequilibrium analysis, functional annotation and more
Graph network science is becoming increasingly popular, notably from a big-data perspective, where understanding the functional role of each individual entity is complex and time-consuming. When a set of genes is regulated by a set of genetic variants, it is likely that the gene set is recruited for a common or related functional purpose. Grouping and extracting communities from a network of associations therefore becomes critical for understanding system complexity and for prioritizing genes for disease and functional associations, and it reduces the workload compared with studying entities one at a time. For this purpose, we present GraphBreak, a suite of tools for community detection applications such as gene co-expression, protein interaction, and regulatory networks. GraphBreak was developed for the use case of eQTL regulatory genomic network community studies, and we show results from our analysis of sample eQTL data. However, GraphBreak can be deployed for other studies if the input data are provided in the required format, including but not limited to gene co-expression networks, protein-protein interaction networks, signaling pathways, and metabolic networks. GraphBreak showed critical use-case value in its downstream analysis of disease association for the detected communities. If the independent steps of community detection and analysis are treated as sub-parts of a single step-by-step procedure, GraphBreak can be considered a new algorithm for community-based functional characterization. Its novelty lies in combining various algorithmic implementation modules into a single script for this purpose. Compared with other similar tools, GraphBreak can better detect communities whose member genes are over-represented for statistical association with diseases, and can therefore identify target genes that can be prioritized for drug positioning or drug repositioning, as the case may be.
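The disease-association step can be illustrated with a minimal sketch, assuming that over-representation is scored with a one-sided hypergeometric test, a standard choice for enrichment analysis. The abstract does not say which test GraphBreak uses, and the function name `enrichment_p_value` and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_disease, community_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    n_background:   genes in the whole network (N)
    n_disease:      background genes annotated to the disease (K)
    community_size: genes in the detected community (n)
    overlap:        community genes annotated to the disease (k)
    """
    upper = min(community_size, n_disease)
    # Count the ways to draw at least `overlap` disease genes into the community.
    hits = sum(
        comb(n_disease, i) * comb(n_background - n_disease, community_size - i)
        for i in range(overlap, upper + 1)
    )
    # Integer arithmetic until the final division keeps the sum exact.
    return hits / comb(n_background, community_size)
```

For example, a 5-gene community that contains all 5 disease-annotated genes of a 20-gene network gets p = 1/C(20, 5) = 1/15504, while an overlap of 0 gives p = 1. When many communities are tested, the resulting p-values would also need a multiple-testing correction before genes are prioritized.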
U.S. To Equip MQ-9 Reaper Drones With Artificial Intelligence
The Pentagon's Joint Artificial Intelligence Center has awarded a $93.3 million contract to General Atomics Aeronautical Systems Inc (GA-ASI), makers of the MQ-9 Reaper, to equip the drone with new AI technology. The aim is for the Reaper to be able to carry out autonomous flight, decide where to direct its battery of sensors, and to recognize objects on the ground. The contract, announced at the end of last month, builds on a successful test earlier this year. In some ways this is not a major development, more of an incremental step using existing technology. What makes it significant is the drone that is being equipped, and what it will be able to do afterwards.
Finding Stable Groups of Cross-Correlated Features in Multi-View data
Dewaskar, Miheer, Palowitch, John, He, Mark, Love, Michael I., Nobel, Andrew
Multi-view data, in which data of different types are obtained from a common set of samples, is now common in many scientific problems. An important problem in the analysis of multi-view data is identifying interactions between groups of features from different data types. A bimodule is a pair $(A,B)$ of feature sets from two different data types such that the aggregate cross-correlation between the features in $A$ and those in $B$ is large. A bimodule $(A,B)$ is stable if $A$ coincides with the set of features having significant aggregate correlation with the features in $B$, and vice-versa. At the population level, stable bimodules correspond to connected components of the cross-correlation network, which is the bipartite graph whose edges are pairs of features with non-zero cross-correlations. We develop an iterative, testing-based procedure, called BSP, to identify stable bimodules in two moderate- to high-dimensional data sets. BSP relies on permutation-based p-values for sums of squared cross-correlations. We efficiently approximate the p-values using tail probabilities of gamma distributions that are fit using analytical estimates of the permutation moments of the test statistic. Our moment estimates depend on the eigenvalues of the intra-correlation matrices of $A$ and $B$ and as a result, the significance of observed cross-correlations accounts for the correlations within each data type. We carry out a thorough simulation study to assess the performance of BSP, and present an extended application of BSP to the problem of expression quantitative trait loci (eQTL) analysis using recent data from the GTEx project. In addition, we apply BSP to climatology data in order to identify regions in North America where annual temperature variation affects precipitation.
Drone Delivery Canada All Set To Make Cargo Drone A Reality
According to a March press release, Drone Delivery Canada (DDC) stated that the 'Condor' (the cargo drone) would be able to travel 150 km, carry a payload of 400 pounds and handle pallet-sized shipments, making it ideal for transporting bulk cargo. DDC is currently working on this model. "Our engineering team is focused on building out our fleet to provide drones capable of addressing a wide range of client requirements in different geographies," said Tony Di Benedetto, CEO of Drone Delivery Canada. The company stated that the Condor is its first delivery drone to offer customers a platform with greater bulk-shipment capabilities, and that the Condor would be fully integrated with the company's proprietary FLYTE™ management system.
About DEEP AERO: DEEP AERO is a global leader in drone technology innovation.