
GRAFT: GRaPH and Table Reasoning for Textual Alignment -- A Benchmark for Structured Instruction Following and Visual Reasoning

Verma, Abhigya, Puttagunta, Sriram, Subramanian, Seganrasan, Ramachandran, Sravan

arXiv.org Artificial Intelligence

GRAFT is a structured multimodal benchmark designed to probe how well LLMs handle instruction following, visual reasoning, and tasks requiring tight visual-textual alignment. The dataset is built around programmatically generated charts and synthetically rendered tables, each paired with a carefully constructed, multi-step analytical question that depends solely on what can be inferred from the image itself. Responses are formatted in structured outputs such as JSON or YAML, enabling consistent and fine-grained evaluation of both reasoning processes and adherence to output specifications. The benchmark further introduces a taxonomy of reasoning operations, ranging from comparison and trend identification to ranking, aggregation, proportional estimation, and anomaly detection, to support a comprehensive assessment of model capabilities. Taken together, GRAFT provides a unified and scalable framework for evaluating multimodal LLMs on visually grounded, structured reasoning tasks, offering a more rigorous standard for future benchmarking efforts.


GRAFT: Gradient-Aware Fast MaxVol Technique for Dynamic Data Sampling

Jha, Ashish, Phan, Anh huy, Dibo, Razan, Leplat, Valentin

arXiv.org Artificial Intelligence

Training modern neural networks on large datasets is computationally and environmentally costly. We introduce GRAFT, a scalable in-training subset selection method that (i) extracts a low-rank feature representation for each batch, (ii) applies a Fast MaxVol sampler to select a small, diverse subset that spans the batch's dominant subspace, and (iii) dynamically adjusts the subset size using a gradient-approximation criterion. By operating in low-rank subspaces and training on carefully chosen examples instead of full batches, GRAFT preserves the training trajectory while reducing wall-clock time, energy consumption, and $\mathrm{CO}_2$ emissions. Across multiple benchmarks, GRAFT matches or exceeds recent selection baselines in both accuracy and efficiency, providing a favorable trade-off between accuracy, efficiency, and emissions.
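The selection step can be illustrated with a simplified sketch: a greedy, pivoted-projection selector that approximates the MaxVol idea by repeatedly picking the row with the largest residual norm and projecting that direction out of the remaining rows. The greedy simplification and function names are ours, not the paper's; the actual method also adapts the subset size with a gradient-approximation criterion, which is omitted here.

```python
import numpy as np

def greedy_maxvol(features, k):
    """Greedily pick up to k rows of `features` that approximately span
    the batch's dominant subspace (a simple stand-in for a MaxVol-style
    sampler): take the row with the largest residual norm, then remove
    its direction from all rows, and repeat."""
    X = np.asarray(features, dtype=float).copy()
    n = X.shape[0]
    chosen = []
    for _ in range(min(k, n)):
        norms = np.linalg.norm(X, axis=1)
        norms[chosen] = -1.0            # never re-pick a selected row
        i = int(np.argmax(norms))
        if norms[i] <= 1e-12:           # remaining rows are (near) dependent
            break
        chosen.append(i)
        v = X[i] / norms[i]
        X -= np.outer(X @ v, v)         # project out the chosen direction
    return chosen
```

On a batch whose rows contain duplicates, the selector picks one representative per direction and stops early once the residual subspace is exhausted.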


GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation

Dutta, Himanshu, Manchanda, Sunny, Bapat, Prakhar, Gurjar, Meva Ram, Bhattacharyya, Pushpak

arXiv.org Artificial Intelligence

Document-level Machine Translation (DocMT) approaches often struggle to capture discourse-level phenomena. Existing approaches either rely on heuristic rules to segment documents into discourse units, which rarely align with the true discourse structure required for accurate translation, or fail to maintain consistency throughout the document during translation. To address these challenges, we propose the Graph Augmented Agentic Framework for Document Level Translation (GRAFT), a novel graph-based DocMT system that leverages Large Language Model (LLM) agents for document translation. Our approach integrates segmentation, directed acyclic graph (DAG) based dependency modelling, and discourse-aware translation into a cohesive framework. Experiments conducted across eight translation directions and six diverse domains demonstrate that GRAFT achieves significant performance gains over state-of-the-art DocMT systems. Specifically, GRAFT delivers an average improvement of 2.8 d-BLEU on the TED test sets from IWSLT2017 over strong baselines and 2.3 d-BLEU for domain-specific translation from English to Chinese. Moreover, our analyses highlight the consistent ability of GRAFT to address discourse-level phenomena, yielding coherent and contextually accurate translations.
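The flow described here, translating discourse units in dependency order so that each unit sees the translations of the units it depends on, can be sketched with a toy driver. The `translate` callable stands in for the paper's LLM agents, and all names below are illustrative assumptions, not the authors' API.

```python
from graphlib import TopologicalSorter

def translate_document(units, deps, translate):
    """Translate discourse units in an order that respects a dependency
    DAG, passing each unit the already-translated text it depends on.
    `units` maps unit id -> source text; `deps` maps unit id -> list of
    prerequisite unit ids; `translate(text, context)` is a stand-in for
    an LLM translation agent."""
    order = TopologicalSorter(deps).static_order()  # prerequisites first
    done, out = {}, {}
    for uid in order:
        context = [done[d] for d in deps.get(uid, ())]
        out[uid] = translate(units[uid], context)
        done[uid] = out[uid]
    return out
```

Because the order is topological, a unit is never translated before the units it depends on, which is how DAG-based dependency modelling enforces document-level consistency.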


Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment

Mall, Utkarsh, Phoo, Cheng Perng, Liu, Meilin Kelsey, Vondrick, Carl, Hariharan, Bharath, Bala, Kavita

arXiv.org Artificial Intelligence

We introduce a method to train vision-language models for remote-sensing images without using any textual annotations. Our key insight is to use co-located internet imagery taken on the ground as an intermediary for connecting remote-sensing images and language. Specifically, we train an image encoder for remote-sensing images to align with the image encoder of CLIP using a large amount of paired internet and satellite images. Our unsupervised approach enables the training of a first-of-its-kind large-scale vision-language model (VLM) for remote-sensing images at two different resolutions. We show that these VLMs enable zero-shot, open-vocabulary image classification, retrieval, segmentation, and visual question answering for satellite images. On each of these tasks, our VLM trained without textual annotations outperforms existing VLMs trained with supervision, with gains of up to 20% for classification and 80% for segmentation. Our planet is constantly captured by an extensive array of remote sensors such as satellites or drones. These earth observation images enable the monitoring of various events on the earth such as deforestation, forest fires, and droughts so that rapid actions can be taken to protect our environment. While these images can shed light on various insights about our planet, the scale of such data is huge. This has prompted the development of automatic analysis models that could extract relevant information from a large amount of remotely sensed images. While useful, these models are often specialized and can only recognize a pre-defined set of concepts. Moreover, they can be complex, limiting their accessibility to experts outside the domain of artificial intelligence. Researchers developing automatic analysis methods for internet imagery encountered a similar problem a few years ago.
One promising solution is to leverage large-scale vision-language models (VLMs) that are trained on millions or even billions of text-image pairs collected on the internet (Radford et al., 2021; Li et al., 2023). These models have demonstrated remarkable abilities to perform open-vocabulary recognition (Gu et al., 2022; Kuo et al., 2023) and enhance accessibility to non-AI experts (Alayrac et al., 2022; Surís et al., 2023). It would be incredibly valuable for a range of applications to replicate the success of open-vocabulary recognition for satellite images as well, allowing an analyst to simply query, say, "Where are all the farmlands in the state of Massachusetts?" without requiring any new training or annotation for farms.
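Once the satellite encoder shares CLIP's embedding space, such open-vocabulary queries reduce to cosine similarity between an image embedding and the text embeddings of candidate labels. A minimal sketch, with stand-in vectors in place of real encoder outputs:

```python
import numpy as np

def zero_shot_classify(image_emb, label_embs):
    """CLIP-style zero-shot query: score one image embedding against a
    text embedding per candidate label via cosine similarity and return
    the best-matching label. The embeddings here are stand-ins for the
    outputs of the aligned satellite encoder and CLIP's text encoder."""
    img = np.asarray(image_emb, dtype=float)
    img = img / np.linalg.norm(img)
    best, best_score = None, -np.inf
    for label, emb in label_embs.items():
        txt = np.asarray(emb, dtype=float)
        score = float(img @ (txt / np.linalg.norm(txt)))
        if score > best_score:
            best, best_score = label, score
    return best
```

Because the label set is just a dictionary of text embeddings, adding a new concept like "farmland" requires no retraining, only a new text embedding.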


Learning-based and unrolled motion-compensated reconstruction for cardiac MR CINE imaging

Pan, Jiazhen, Rueckert, Daniel, Küstner, Thomas, Hammernik, Kerstin

arXiv.org Artificial Intelligence

Motion-compensated MR reconstruction (MCMR) is a powerful concept with considerable potential, consisting of two coupled sub-problems: motion estimation, assuming a known image, and image reconstruction, assuming known motion. In this work, we propose a learning-based self-supervised framework for MCMR, to efficiently deal with non-rigid motion corruption in cardiac MR imaging. Contrary to conventional MCMR methods in which the motion is estimated prior to reconstruction and remains unchanged during the iterative optimization process, we introduce a dynamic motion estimation process and embed it into the unrolled optimization. We establish a cardiac motion estimation network that leverages temporal information via a group-wise registration approach, and carry out a joint optimization between the motion estimation and reconstruction. Experiments on 40 acquired 2D cardiac MR CINE datasets demonstrate that the proposed unrolled MCMR framework can reconstruct high-quality MR images at high acceleration rates where other state-of-the-art methods fail. We also show that the joint optimization mechanism is mutually beneficial for both sub-tasks, i.e., motion estimation and image reconstruction, especially when the MR image is highly undersampled.
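The alternation the abstract describes, estimating motion given the current image and then reconstructing given the motion, can be illustrated on a deliberately tiny 1D toy where "motion" is a circular shift and "reconstruction" is averaging. The paper uses learned group-wise registration and unrolled optimization, not this closed-form stand-in; everything below is illustrative.

```python
import numpy as np

def mcmr_toy(frames, n_iters=3):
    """Alternate (a) motion estimation via circular cross-correlation
    against the current reconstruction and (b) reconstruction by
    averaging the motion-corrected frames. Frames are assumed to be
    circular shifts of one underlying 1D signal."""
    recon = frames[0].astype(float).copy()       # initial reference
    shifts = [0] * len(frames)
    for _ in range(n_iters):
        shifts = []
        for f in frames:                         # (a) motion estimation
            corr = np.fft.ifft(np.fft.fft(f).conj() * np.fft.fft(recon)).real
            shifts.append(int(np.argmax(corr)))
        aligned = [np.roll(f, s) for f, s in zip(frames, shifts)]
        recon = np.mean(aligned, axis=0)         # (b) reconstruction
    return recon, shifts
```

The toy already shows why the sub-problems are coupled: a sharper reconstruction yields better shift estimates, and better shifts yield a sharper reconstruction.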


A robotic shoulder could make it easier to grow usable human tissue

MIT Technology Review

But growing usable human tendon cells--which need to stretch and twist--has proved trickier. Over the past two decades, scientists have encouraged engineered tendon cells and tissue to grow and mature by repeatedly stretching them in one direction. However, this approach has so far failed to produce fully functional tissue grafts that could be used clinically in human bodies. A new study, published in Nature Communications Engineering today, shows how humanoid robots could be used to make engineered tendon tissue that is more like the real thing. "The clinical need is clearly there," says Pierre-Alexis Mouthuy from the University of Oxford, who led the team.


Slack's former head of machine learning wants to put AI in reach of every company – TechCrunch

#artificialintelligence

Adam Oliner, co-founder and CEO of Graft, used to run machine learning at Slack, where he helped build the company's internal artificial intelligence infrastructure. Slack lacked the resources of a company like Meta or Google, but it still had tons of data to sift through, and it was his job to build something on a smaller scale to help put AI to work on the dataset. With a small team, he could only build what he called a "miniature" solution in comparison to the web-scale counterparts. After he and his team built it, however, he realized that it was broadly applicable and could help other smaller organizations tap into AI and machine learning without huge resources. "We built a sort of mini Graft at Slack for driving semantic search and recommendations throughout the product. And it was hugely effective … And that was when we said, this is so useful, and so powerful if we can get this into the hands of most organizations, we think we could really change the way people interact with their data and interact with AI," Oliner told me.


Scalable Hierarchical Clustering with Tree Grafting

Monath, Nicholas, Kobren, Ari, Krishnamurthy, Akshay, Glass, Michael, McCallum, Andrew

arXiv.org Machine Learning

We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines that efficiently reconfigure the hierarchy as new points arrive, supporting discovery of clusters with complex structure. Grinch is motivated by a new notion of separability for clustering with linkage functions: we prove that when the model is consistent with a ground-truth clustering, Grinch is guaranteed to produce a cluster tree containing the ground-truth, independent of data arrival order. Our empirical results on benchmark and author coreference datasets (with standard and learned linkage functions) show that Grinch is more accurate than other scalable methods, and orders of magnitude faster than hierarchical agglomerative clustering.
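For intuition, here is the greedy incremental step that algorithms like Grinch start from: each arriving point is attached as a sibling of its nearest leaf. The rotate and graft subroutines that repair this greedy step's mistakes are the paper's contribution and are omitted here; all names below are ours, not the authors' code.

```python
class Node:
    """Cluster-tree node: a leaf holds a point; an internal node holds children."""
    def __init__(self, point=None, children=()):
        self.point, self.children = point, list(children)

def nearest_leaf(root, x, dist):
    """Linear scan for the leaf whose point is closest to x."""
    best, best_d, stack = None, float("inf"), [root]
    while stack:
        node = stack.pop()
        if node.point is not None and dist(node.point, x) < best_d:
            best, best_d = node, dist(node.point, x)
        stack.extend(node.children)
    return best

def insert(root, x, dist):
    """Greedy insertion: turn the nearest leaf into an internal node
    whose two children are the old leaf's point and the new point."""
    leaf = nearest_leaf(root, x, dist)
    leaf.children = [Node(point=leaf.point), Node(point=x)]
    leaf.point = None
    return root
```

Because this step is purely local and order-dependent, early mistakes persist; Grinch's rotates and grafts exist precisely to move whole subtrees after the fact so the final tree does not depend on arrival order.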


Multi-Task Survival Analysis of Liver Transplantation Using Deep Learning

Farzindar, Atefeh (Anna) (University of Southern California) | Kashi, Anirudh (University of Southern California)

AAAI Conferences

In this paper, we present the application of deep learning techniques to develop a modern model for the prediction of graft failure and survival analysis in liver transplant patients. We trained our model using the United Network for Organ Sharing (UNOS) dataset consisting of 59,115 patients from year 2002 to 2016 with around 150 features each. We also compare our model against another dataset – Scientific Registry of Transplant Recipients (SRTR) including 87,334 patients from year 2002 to 2018 – after selecting features by mapping them from UNOS data. Some of the most important features common to both datasets are Model for End-stage Liver Disease (MELD) score, patient body mass index (BMI), donor and patient age, cold ischemia time, and levels of various chemicals within the patient. To provide an additional tool to clinical practitioners in the allocation of a scarce resource, we developed a multi-task model to learn the survival function of a donor-recipient pair and hence predict the exact time of failure, which outperforms traditional Cox proportional hazards models. The multi-task model produces very promising C-index results of 0.82 and 0.57 on the SRTR and UNOS datasets respectively.
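For reference, the C-index quoted above can be computed with a short routine: Harrell's concordance index over comparable patient pairs. This is a generic sketch of the metric, not the paper's evaluation code.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: over all comparable pairs (where the earlier
    time is an observed event, not a censoring), the fraction in which
    the patient with the higher predicted risk failed first. Ties in
    predicted risk count as half-concordant."""
    num, den = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # censored: cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:       # i observably failed before j
                den += 1
                if risks[i] > risks[j]:
                    num += 1.0
                elif risks[i] == risks[j]:
                    num += 0.5
    return num / den
```

A C-index of 1.0 means predicted risks perfectly rank failure times, 0.5 is random, and 0.0 is perfectly anti-concordant, which is why 0.82 on SRTR is strong while 0.57 on UNOS is only slightly better than chance.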


Robots with artificial skin?

FOX News

Intelligent algorithms are starting to perceive sights and sounds like human beings. Androids are taking more anthropomorphic forms, powered by actuators wrapped in silicone and latex skins. Even these skins are becoming increasingly lifelike. Earlier this year, researchers created an artificial material that's twice as sensitive as human skin. And this month, a team of Oxford professors proposed a provocative idea -- grow human tissue on humanoid robots.