MultiPDENet: PDE-embedded Learning with Multi-time-stepping for Accelerated Flow Simulation
Wang, Qi, Mi, Yuan, Wang, Haoyun, Zhang, Yi, Chengze, Ruizhi, Liu, Hongsheng, Wen, Ji-Rong, Sun, Hao
Solving partial differential equations (PDEs) with numerical methods is computationally expensive, since accurate solutions require fine grids and small time steps. Machine learning can accelerate this process, but learned models often suffer from weak generalizability, poor interpretability, and heavy data dependency, and their errors accumulate in long-term prediction. To this end, we propose a PDE-embedded network with multiscale time stepping (MultiPDENet), which fuses numerical schemes and machine learning for accelerated simulation of flows. In particular, we design a convolutional filter, based on the structure of finite-difference stencils and with a small number of trainable parameters, that estimates the equivalent form of the spatial derivative on a coarse grid so as to minimize the equation's residual. A Physics Block with a 4th-order Runge-Kutta integrator at the fine time scale embeds the structure of the PDEs to guide the prediction. To alleviate the accumulation of temporal error in long-term prediction, we introduce a multiscale time integration approach in which a neural network corrects the prediction error at a coarse time scale. Experiments across various PDE systems, including the Navier-Stokes equations, demonstrate that MultiPDENet can accurately predict long-term spatiotemporal dynamics even given small and incomplete training data, e.g., spatiotemporally downsampled datasets. MultiPDENet achieves state-of-the-art performance compared with neural baseline models, with a clear speedup over classical numerical methods.
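The two numerical ingredients the abstract names, a finite-difference stencil realized as a convolutional filter and a 4th-order Runge-Kutta integrator, can be illustrated with a minimal sketch. This is not the authors' implementation: the stencil below is the fixed classical 5-point Laplacian applied to a toy heat equation with periodic boundaries, whereas in MultiPDENet the stencil entries would be trainable parameters fitted so the coarse-grid derivative matches the fine-grid truth.

```python
import numpy as np

def stencil_conv(u, stencil):
    """Apply a 3x3 finite-difference stencil to field u (periodic boundaries)."""
    out = np.zeros_like(u)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out += stencil[di + 1, dj + 1] * np.roll(np.roll(u, di, axis=0), dj, axis=1)
    return out

# Classical 5-point Laplacian stencil; in MultiPDENet these entries would be
# learned to give an "equivalent" derivative on the coarse grid.
dx = 1.0
laplacian = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]]) / dx**2

def rhs(u, nu=0.1):
    # PDE right-hand side; here the heat equation du/dt = nu * Lap(u)
    return nu * stencil_conv(u, laplacian)

def rk4_step(u, dt):
    # Classical 4th-order Runge-Kutta step, as in the Physics Block
    k1 = rhs(u)
    k2 = rhs(u + 0.5 * dt * k1)
    k3 = rhs(u + 0.5 * dt * k2)
    k4 = rhs(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

u0 = np.random.rand(32, 32)
u = u0.copy()
for _ in range(10):
    u = rk4_step(u, dt=0.01)
# Periodic diffusion conserves the spatial mean of u.
```

With periodic boundaries the Laplacian stencil sums to zero, so the mean of the field is conserved, a quick sanity check for any such coarse-grid integrator.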
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
Cui, Zichen Jeff, Pan, Hengkai, Iyer, Aadhithya, Haldar, Siddhant, Pinto, Lerrel
Imitation learning has proven to be a powerful tool for training complex visuomotor policies. However, current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations. A key reason for this poor data efficiency is that visual representations are predominantly either pretrained on out-of-domain data or trained directly through a behavior cloning objective. In this work, we present DynaMo, a new in-domain, self-supervised method for learning visual representations. Given a set of expert demonstrations, we jointly learn a latent inverse dynamics model and a forward dynamics model over a sequence of image embeddings, predicting the next frame in latent space, without augmentations, contrastive sampling, or access to ground-truth actions. Importantly, DynaMo does not require any out-of-domain data such as Internet datasets or cross-embodied datasets. On a suite of six simulated and real environments, we show that representations learned with DynaMo significantly improve downstream imitation learning performance over prior self-supervised learning objectives and pretrained representations. Gains from using DynaMo hold across policy classes such as Behavior Transformer, Diffusion Policy, MLP, and nearest neighbors. Finally, we ablate the key components of DynaMo and measure their impact on downstream policy performance. Robot videos are best viewed at https://dynamo-ssl.github.io
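The joint objective described above, an inverse dynamics model that infers a latent action from consecutive embeddings and a forward model that predicts the next embedding from the current one plus that action, can be sketched as follows. All sizes, the linear stand-ins for the two models, and the random "encoder" output are illustrative assumptions, not DynaMo's actual architecture or training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
D, A, T = 16, 4, 8   # embedding dim, latent action dim, sequence length (toy sizes)

# Frame embeddings as they might come from a (hypothetical) visual encoder
z = rng.standard_normal((T, D))

# Linear stand-ins for the two dynamics models
W_inv = rng.standard_normal((2 * D, A)) * 0.1   # (z_t, z_{t+1}) -> latent action
W_fwd = rng.standard_normal((D + A, D)) * 0.1   # (z_t, a_t)     -> z_{t+1}

def dynamics_loss(z):
    """One-step latent prediction loss, the shape of DynaMo's joint objective:
    no augmentations, no contrastive negatives, no ground-truth actions."""
    loss = 0.0
    for t in range(len(z) - 1):
        a_t = np.concatenate([z[t], z[t + 1]]) @ W_inv     # inferred latent action
        z_next_pred = np.concatenate([z[t], a_t]) @ W_fwd  # forward prediction
        loss += np.sum((z_next_pred - z[t + 1]) ** 2)
    return loss / (len(z) - 1)

loss = dynamics_loss(z)
```

In the paper this loss would be backpropagated through the encoder itself, which is what makes the learned representation dynamics-aware; here the encoder is frozen noise purely for illustration.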
Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
Ahn, Donghoon, Cho, Hyoungwon, Min, Jaewon, Jang, Wooseok, Kim, Jungwoo, Kim, SeonHwa, Park, Hyun Hee, Jin, Kyong Hwan, Kim, Seungryong
Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. Exploiting the self-attention mechanism's ability to capture structural information, it generates intermediate samples with degraded structure by substituting selected self-attention maps in the diffusion U-Net with an identity matrix, and guides the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.
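The two mechanisms in the abstract, replacing a self-attention map with the identity matrix and extrapolating away from the degraded prediction, reduce to a few lines. This is a toy sketch on random tokens, not the U-Net: `self_attention` is a plain scaled dot-product block, and `pag_guided_eps` is the usual guidance extrapolation with an assumed scale `s`.

```python
import numpy as np

def self_attention(q, k, v, perturb=False):
    """Scaled dot-product self-attention. With perturb=True the attention
    map is replaced by the identity matrix, so each token attends only to
    itself -- this is the structural degradation PAG exploits."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    if perturb:
        attn = np.eye(q.shape[0])
    return attn @ v

def pag_guided_eps(eps_normal, eps_perturbed, s=3.0):
    # Guide denoising away from the structurally degraded prediction
    return eps_normal + s * (eps_normal - eps_perturbed)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8))            # 4 tokens, 8-dim (toy sizes)
out_normal = self_attention(tokens, tokens, tokens)
out_degraded = self_attention(tokens, tokens, tokens, perturb=True)
eps_guided = pag_guided_eps(out_normal, out_degraded, s=3.0)
```

Note that with the identity map the attention output is exactly the value tokens, which is what "degraded structure" means here: no information is mixed across spatial positions.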
Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How
Arango, Sebastian Pineda, Ferreira, Fabio, Kadra, Arlind, Hutter, Frank, Grabocka, Josif
With the ever-increasing number of pretrained models, machine learning practitioners are continuously faced with the decision of which pretrained model to use and how to finetune it for a new dataset. In this paper, we propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it. Our method transfers knowledge about the performance of many pretrained models with multiple hyperparameter configurations on a series of datasets. To this end, we evaluated over 20k hyperparameter configurations for finetuning 24 pretrained image classification models on 87 datasets to generate a large-scale meta-dataset. We meta-learn a multi-fidelity performance predictor on the learning curves of this meta-dataset and use it for fast hyperparameter optimization on new datasets. We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset together with its optimal hyperparameters.
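The core loop, score partially observed learning curves with a multi-fidelity predictor and pick the best (model, configuration) pair, can be sketched with a deliberately crude predictor. Everything here is assumed for illustration: the toy curves, the model/config names, and the slope-decay extrapolation, which stands in for the meta-learned predictor Quick-Tune trains across its 87-dataset meta-dataset.

```python
import numpy as np

# Toy meta-dataset: validation accuracy per epoch for a few
# (pretrained model, hyperparameter config) candidates -- illustrative only.
curves = {
    ("resnet50",  "lr=1e-3"): [0.60, 0.70, 0.75, 0.78],
    ("resnet50",  "lr=1e-2"): [0.50, 0.55, 0.57, 0.58],
    ("vit_small", "lr=1e-3"): [0.55, 0.68, 0.76, 0.81],
}

def predict_final(curve, horizon=10):
    """Crude multi-fidelity stand-in: extrapolate the recent slope with
    geometrically diminishing returns. Quick-Tune instead meta-learns this
    predictor from learning curves on many datasets."""
    slope = curve[-1] - curve[-2]
    remaining = sum(slope * 0.5 ** i for i in range(1, horizon - len(curve) + 1))
    return min(1.0, curve[-1] + remaining)

# Select the candidate with the best predicted final performance
best = max(curves, key=lambda c: predict_final(curves[c]))
```

The point of the multi-fidelity view is visible even in this toy: the ViT curve is still climbing steeply at the evaluation budget, so extrapolation ranks it above the flatter ResNet curve despite similar current accuracy.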
vehicle-detection by JunshengFu
Anaconda is used to manage my dependencies. You can download the weights from here and save them to the weights folder. The code for this step is contained in the function named extract_features and in lines 464 to 552 of svm_pipeline.py. If a trained SVM classifier exists, it is loaded directly. Otherwise, I start by reading in all the vehicle and non-vehicle images, around 8000 images in each category.
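The load-if-cached-else-train pattern the README describes can be sketched as below. This is not the repository's svm_pipeline.py: the cache path, the intensity-statistics `extract_features` (standing in for the real HOG and color features), and the nearest-centroid classifier (standing in for the SVM) are all placeholder assumptions chosen to keep the sketch self-contained.

```python
import os
import pickle
import numpy as np

MODEL_PATH = "classifier_cache.pkl"   # hypothetical cache path

def extract_features(img):
    """Stand-in for the README's extract_features (HOG + color features in
    svm_pipeline.py); here just coarse intensity statistics."""
    return np.array([img.mean(), img.std(), np.abs(np.diff(img, axis=0)).mean()])

class NearestCentroid:
    """Trivial classifier standing in for the SVM in the original pipeline."""
    def fit(self, X, y):
        self.centroids = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
        return self
    def predict(self, X):
        labels = list(self.centroids)
        d = np.stack([np.linalg.norm(X - self.centroids[c], axis=1) for c in labels])
        return np.array(labels)[d.argmin(axis=0)]

def load_or_train(X, y, path=MODEL_PATH):
    # If the classifier exists on disk, load it directly; otherwise train and cache it.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    clf = NearestCentroid().fit(X, y)
    with open(path, "wb") as f:
        pickle.dump(clf, f)
    return clf

# Toy "vehicle" vs "non-vehicle" patches standing in for the ~8000-per-class images
rng = np.random.default_rng(0)
vehicles     = rng.normal(0.7, 0.1, (20, 8, 8))
non_vehicles = rng.normal(0.3, 0.1, (20, 8, 8))
X = np.array([extract_features(im) for im in np.concatenate([vehicles, non_vehicles])])
y = np.array([1] * 20 + [0] * 20)
clf = load_or_train(X, y)
```

The caching step matters in practice because feature extraction and SVM training over ~16,000 images is the slow part of the pipeline; subsequent runs skip straight to detection.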
Robot Planning
McDermott, Drew
Research on planning for robots is in such a state of flux that there is disagreement about what planning is and whether it is necessary. We can take planning to be the optimization and debugging of a robot's program by reasoning about possible courses of execution. It is necessary to the extent that fragments of robot programs are combined at run time. There are several strands of research in the field; I survey six: (1) attempts to avoid planning; (2) the design of flexible plan notations; (3) theories of time-constrained planning; (4) planning by projecting and repairing faulty plans; (5) motion planning; and (6) the learning of optimal behaviors from reinforcements. More research is needed on formal semantics for robot plans.
Universal Planning: An (Almost) Universally Bad Idea
To present a sharp criticism of the approach known as universal planning, I begin by giving a precise definition of it. The key idea in this work is that an agent is working to achieve some goal and that to determine what to do next in the pursuit of this goal, the agent finds its current situation in a large table that prescribes the correct action to take. Of course, the action suggested by the table might simply be, "Think about your current situation and decide what to do next." This method is, in many ways, representative of the conventional approach to planning; however, what distinguishes universal plans from conventional plans is that the action suggested by a universal plan is always a primitive one that the agent can execute immediately (Agre and Chapman 1987; Drummond 1988; Kaelbling 1988; Nilsson 1989; Rosenschein and Kaelbling 1986; Schoppers 1987). Several authors have recently suggested that a possible approach to planning in uncertain domains is to analyze all possible situations beforehand and then store information about what to do in each.
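The definition above, a table mapping every possible situation directly to an immediately executable primitive action, is concrete enough to sketch. The 1-D track domain, the goal position, and the action names are hypothetical; the point is only the shape of a universal plan: all deliberation happens when the table is built, and execution is pure lookup.

```python
# A universal plan as a lookup table: every situation maps directly to a
# primitive action the agent can execute immediately, with no run-time search.
# Toy domain (assumed for illustration): a robot on a 1-D track of 6 cells
# must reach position 3.
GOAL = 3
universal_plan = {pos: ("right" if pos < GOAL else "left" if pos > GOAL else "stop")
                  for pos in range(6)}

def execute(pos):
    """React per the table until the goal is reached."""
    steps = 0
    while universal_plan[pos] != "stop":
        pos += 1 if universal_plan[pos] == "right" else -1
        steps += 1
    return pos, steps

final, steps = execute(0)
```

The criticism in the article follows directly from this shape: the table must enumerate every reachable situation in advance, so its size grows with the state space rather than with the difficulty of any particular problem instance.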
An important task in postal automation technology is determining the position and orientation of the destination address block in the image of a mail piece such as a letter, magazine, or parcel. The corresponding subimage is then presented to a human operator or a machine reader (optical character reader) that can read the zip code and, if necessary, other address information and direct the mail piece to the appropriate sorting bin. Analysis of the physical characteristics of mail pieces indicates that in order to automate the address-finding task, several different image analysis operations are necessary. Some examples are locating a rectangular white address label on a multicolor background, progressively grouping characters into text lines and text lines into text blocks, eliminating candidate regions by specialized detectors (for example, detecting regions such as postage stamps), and identifying handwritten regions. A typical mail piece has several regions or blocks that are meaningful to mail processing, for example, address blocks (destination and return) and postage (meter mark or stamp), as well as extraneous blocks. The heuristics listed in the previous section suggest that the design of ABLS consist of several specialized tools that are appropriately deployed. Rule R2 suggests the need for a tool to detect postage fluorescence, rule R3 a tool for isolating blocks of a certain color, rule R4 one for discriminating between handwriting and print, and so on.
Spar: A Planner That Satisfies Operational and Geometric Goals in Uncertain Environments
A prerequisite for intelligent behavior is the ability to reason about actions and their effects. This ability is the essence of the classical AI planning problem, in which plans are constructed by reasoning about how available actions can be applied to achieve various goals. For this reasoning process to occur, the planner must be aware of its available actions, the situations in which they are applicable, and the changes effected in the world by their execution. Classical AI planners typically use a high-level, symbolic representation of actions (for example, well-formed formulas from predicate calculus). Although this type of representational scheme is attractive from a computational standpoint, it cannot adequately represent the intricacies of a domain that includes complex actions, such as robotic assembly (consider, for example, that any geometric configuration of the robotic manipulator is a rather complex function of six joint angles).