 brd


Basic Reading Distillation

Zhou, Zhi, Miao, Sirui, Duan, Xiangyu, Yang, Hao, Zhang, Min

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated remarkable abilities in various natural language processing areas, but they demand high computation resources, which limits their deployment in the real world. Distillation is one technique to solve this problem, through either knowledge distillation or task distillation. Both distillation approaches train small models to imitate specific features of LLMs, but they neglect basic reading education for small models on generic texts that are unrelated to downstream tasks. In this paper, we propose basic reading distillation (BRD), which educates a small model to imitate LLMs' basic reading behaviors, such as named entity recognition and question raising and answering, on each sentence. After such basic education, we apply the small model to various tasks, including language inference benchmarks and BIG-bench tasks. Results show that the small model can outperform or perform comparably to LLMs that are over 20x bigger. Analysis reveals that BRD effectively influences the probability distribution of the small model and is orthogonal to both knowledge distillation and task distillation.


Optimized projection-free algorithms for online learning: construction and worst-case analysis

Weibel, Julien, Gaillard, Pierre, Koolen, Wouter M., Taylor, Adrien

arXiv.org Machine Learning

This work studies and develops projection-free algorithms for online learning that handle the constraint set through linear optimization oracles (a.k.a. Frank-Wolfe). More precisely, this work (i) provides an improved (optimized) variant of an online Frank-Wolfe algorithm along with a conceptually simple potential-based proof, and (ii) shows how to leverage semidefinite programming to jointly design and analyze online Frank-Wolfe-type algorithms numerically in a variety of settings, including the design of the variant in (i). Based on the semidefinite technique, we conclude with strong numerical evidence suggesting that no pure online Frank-Wolfe algorithm within our model class can have a regret guarantee better than O(T^{3/4}) (where T is the time horizon) without additional assumptions, that the current algorithms do not have optimal constants, that the algorithm enjoys similar anytime guarantees of O(t^{3/4}) without needing to know T in advance, and that multiple linear optimization rounds do not generally help to obtain better regret bounds.
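
To make the idea of a projection-free update concrete, here is a minimal sketch of a generic online Frank-Wolfe loop on the probability simplex. It is not the optimized variant from the abstract above: the surrogate objective, the regularization weight `eta`, the step size `sigma`, and the names `lmo_simplex` and `online_frank_wolfe` are all illustrative assumptions. The only point is that each round calls a linear optimization oracle instead of a projection.

```python
import numpy as np


def lmo_simplex(gradient):
    """Linear optimization oracle for the probability simplex.

    Returns the vertex v minimizing <gradient, v>, i.e. the coordinate
    basis vector at the smallest gradient entry.
    """
    vertex = np.zeros(len(gradient), dtype=float)
    vertex[int(np.argmin(gradient))] = 1.0
    return vertex


def online_frank_wolfe(loss_gradients, dim, eta=None):
    """Play one point per round against linear losses <g_t, x>.

    loss_gradients: sequence of gradient vectors g_1, ..., g_T.
    eta: weight on the cumulative loss in the surrogate objective
         (assumed default T^(-3/4); an illustrative choice only).
    Returns an array of shape (T, dim) holding the points played.
    """
    horizon = len(loss_gradients)
    if eta is None:
        eta = max(horizon, 1) ** -0.75

    x_start = np.full(dim, 1.0 / dim)  # uniform starting point
    x = x_start.copy()
    gradient_sum = np.zeros(dim)
    played = []

    for t, g in enumerate(loss_gradients, start=1):
        played.append(x.copy())  # commit to x_t before seeing g_t
        gradient_sum += np.asarray(g, dtype=float)

        # Gradient of the surrogate eta * <sum of g_s, x> + ||x - x_start||^2.
        surrogate_gradient = eta * gradient_sum + 2.0 * (x - x_start)

        # One oracle call, then a convex combination: no projection needed,
        # because both x and the vertex already lie in the simplex.
        vertex = lmo_simplex(surrogate_gradient)
        sigma = min(1.0, 2.0 / np.sqrt(t))  # assumed step-size schedule
        x = (1.0 - sigma) * x + sigma * vertex

    return np.array(played)
```

Calling `online_frank_wolfe(gradients, dim=4)` with a list of length-4 gradient vectors returns one feasible point per round. Swapping `lmo_simplex` for another oracle would adapt the sketch to a different constraint set.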


Autoassociative Learning of Structural Representations for Modeling and Classification in Medical Imaging

Buchnajzer, Zuzanna, Dobek, Kacper, Hapke, Stanisław, Jankowski, Daniel, Krawiec, Krzysztof

arXiv.org Artificial Intelligence

Annotation of medical imaging is notoriously time-consuming, prone to human biases, and hard to reconcile with the insatiable demands of contemporary machine learning. Deep Learning (DL) models trained on annotated data are often narrow in focusing on features that are specific to a given context (anomaly, pathology, etc.) rather than discovering and capturing general characteristics of observed structures and processes, which may make them susceptible to deceptive image features and lead to inferior generalization. We posit that one of the primary causes of this challenge is the unstructured character of DL architectures. Contemporary DL models are essentially intertwined compositions of dot products and nonlinearities, conglomerates of often billions of unsophisticated units that process data in a highly distributed and continuous, non-symbolic fashion. Their training requires large volumes of data, which are often hard to come by, and involves exorbitant amounts of compute and energy. If the task is posed within the supervised learning paradigm, those data need to be not only curated, but also annotated (labeled), which limits their availability even further. Last but not least, as each processing unit takes care only of a minuscule fraction of inference, it is very hard to explain the model and its decisions to a human in a transparent and succinct fashion. In this study, we argue for stronger involvement of unlabeled data in the construction of analytic and diagnostic ML models and propose ASR, a neurosymbolic architecture trained to form Auto-associative Structural Representations, in which a generative decoder synthesizes physically plausible structural models that explain the observed image.


A Game-Theoretic Approach for Hierarchical Epidemic Control

Jia, Feiran, Mate, Aditya, Li, Zun, Jabbari, Shahin, Chakraborty, Mithun, Tambe, Milind, Wellman, Michael, Vorobeychik, Yevgeniy

arXiv.org Artificial Intelligence

Democratic governments and institutions typically have a hierarchical structure. For example, policies in the U.S., Canada, and many European democracies emerge from complex interactions among the federal and state governments, as well as county boards, city councils and mayors. Such interactions are characterized by inherent asymmetries across different levels of the hierarchy. On the one hand, the specifics of policy formulation and enforcement (e.g., training and deployment of personnel and updating of infrastructure) are generally in the hands of administrative bodies at lower levels of the hierarchy -- often the lowest level -- for practical reasons; actions these entities take are the ones that truly matter in the sense that they directly impact costs and benefits realized at all levels. On the other hand, entities at higher levels may have the power to impose constraints in some form or another on the policy-makers within their immediate jurisdiction (e.g., the U.S. federal government can constrain state policies); violations of these constraints, in turn, entail a noncompliance cost to the violator, such as legal costs, penalties, or reputation loss. Examples of such hierarchical policy structure arise in the spheres of education (e.g., topics to be included in primary education), healthcare (e.g., vaccination) and immigration. A preeminent recent example of such hierarchical policy-making is the response to the ongoing COVID-19 pandemic in countries with decentralized administration. Policies concerning social distancing, masking and vaccination have involved recommendations at the federal level, guidelines and restrictions at the state/province/district level, and measures adopted by specific counties, cities or even individual businesses and schools. In general, policies are contentious.


Artificial Intelligence In Agriculture – Another Place Where Medical Techniques Can Help

#artificialintelligence

As weird as it is to me to know that ranching is part of agriculture, I do find it interesting that the price points on medical technology mean that new techniques can move into the industry. Artificial intelligence in medicine has been a big part of its growth. Now AI is also moving to help ranchers better manage their herds. While much of the focus on AI in the field has been on vision and back-end analysis, the real world has another sense that matters a great deal: sound. In human medicine, one of the first and easiest tools to use is the x-ray.


Global Big Data Conference

#artificialintelligence

One of the challenges in scaling up meat production is disease among the animals. Take bovine respiratory disease (BRD), for example. This contagious infection is responsible for nearly half of all feedlot deaths for cattle every year in North America. The industry's costs for managing the disease come close to $1 billion annually. Preventative measures could significantly decrease these costs, and a small team comprising a data scientist, a college student and two entrepreneurs spent the past weekend at the Forbes Under 30 Agtech Hackathon figuring out a concept for better managing the disease.