 best match


DeepWriter: A Fact-Grounded Multimodal Writing Assistant Based On Offline Knowledge Base

Mao, Song, Cheng, Lejun, Cai, Pinlong, Yan, Guohang, Wang, Ding, Shi, Botian

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities in various applications. However, their use as writing assistants in specialized domains like finance, medicine, and law is often hampered by a lack of deep domain-specific knowledge and a tendency to hallucinate. Existing solutions, such as Retrieval-Augmented Generation (RAG), can suffer from inconsistency across multiple retrieval steps, while online search-based methods often degrade quality due to unreliable web content. To address these challenges, we introduce DeepWriter, a customizable, multimodal, long-form writing assistant that operates on a curated, offline knowledge base. DeepWriter leverages a novel pipeline that involves task decomposition, outline generation, multimodal retrieval, and section-by-section composition with reflection. By deeply mining information from a structured corpus and incorporating both textual and visual elements, DeepWriter generates coherent, factually grounded, and professional-grade documents. We also propose a hierarchical knowledge representation to enhance retrieval efficiency and accuracy. Our experiments on financial report generation demonstrate that DeepWriter produces high-quality, verifiable articles that surpass existing baselines in factual accuracy and generated content quality.


UnPaSt: unsupervised patient stratification by differentially expressed biclusters in omics data

Hartung, Michael, Maier, Andreas, Delgado-Chaves, Fernando, Burankova, Yuliya, Isaeva, Olga I., Patroni, Fábio Malta de Sá, He, Daniel, Shannon, Casey, Kaufmann, Katharina, Lohmann, Jens, Savchik, Alexey, Hartebrodt, Anne, Chervontseva, Zoe, Firoozbakht, Farzaneh, Probul, Niklas, Zotova, Evgenia, Tsoy, Olga, Blumenthal, David B., Ester, Martin, Laske, Tanja, Baumbach, Jan, Zolotareva, Olga

arXiv.org Artificial Intelligence

Most complex diseases, including cancer and non-malignant diseases like asthma, have distinct molecular subtypes that require distinct clinical approaches. However, existing computational patient stratification methods have been benchmarked almost exclusively on cancer omics data and only perform well when mutually exclusive subtypes can be characterized by many biomarkers. Here, we contribute a large-scale evaluation, quantitatively exploring the power of 22 unsupervised patient stratification methods using both simulated and real transcriptome data. From this experience, we developed UnPaSt (https://apps.cosy.bio/unpast/), which optimizes unsupervised patient stratification and works even with only a limited number of subtype-predictive biomarkers. We evaluated all 23 methods on real-world breast cancer and asthma transcriptomics data. Although many methods reliably detected major breast cancer subtypes, only a few identified Th2-high asthma, and UnPaSt significantly outperformed its closest competitors in both test datasets. Essentially, we showed that UnPaSt can detect many biologically insightful and reproducible patterns in omics datasets.


Improvement in Semantic Address Matching using Natural Language Processing

Gupta, Vansh, Gupta, Mohit, Garg, Jai, Garg, Nitesh

arXiv.org Artificial Intelligence

Address matching is an important task for many businesses, especially delivery and takeout companies, which need to retrieve a particular address from their data warehouse. Existing solutions use string similarity and edit-distance algorithms to find similar addresses in the address database, but these algorithms do not work effectively with redundant, unstructured, or incomplete address data. This paper discusses a semantic address matching technique by which a particular address can be found in a list of possible addresses. We also review existing practices and their shortcomings. Semantic address matching is essentially an NLP task in the field of deep learning. This technique can overcome the drawbacks of existing methods, such as problems with redundant or abbreviated data. The solution applies OCR to invoices to extract addresses and build a pool of address data. This data is then fed to the BM25 algorithm to score the best-matching entries. Finally, the top candidates are passed through BERT to select the best possible result from among the similar queries. Our investigation shows that our methodology greatly improves both accuracy and recall over existing state-of-the-art techniques.
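The BM25 scoring step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard Okapi BM25 formula with commonly used defaults (`k1=1.5`, `b=0.75`) and a simple tokenizer; the function names, parameters, and sample addresses are hypothetical and are not taken from the paper. The OCR and BERT re-ranking stages are not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Smoothed IDF variant that stays non-negative for common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization penalizes longer-than-average documents.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical address pool, for illustration only.
addresses = [
    "12 Baker Street, London",
    "221B Baker St, London NW1",
    "45 Park Avenue, New York",
]
scores = bm25_scores("221B Baker St London", addresses)
best = max(range(len(addresses)), key=scores.__getitem__)
print(addresses[best])
```

Because BM25 matches exact tokens, a query that spells out "Street" would not match the abbreviated "St" in the pool; this is the kind of gap a semantic re-ranking stage is meant to close.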


Feature-based Image Matching for Identifying Individual Kākā

O'Sullivan, Fintan, Escott, Kirita-Rose, Shaw, Rachael C., Lensen, Andrew

arXiv.org Artificial Intelligence

This report investigates an unsupervised, feature-based image matching pipeline for the novel application of identifying individual kākā. Applied with a similarity network for clustering, this addresses a weakness of current supervised approaches to identifying individual birds which struggle to handle the introduction of new individuals to the population. Our approach uses object localisation to locate kākā within images and then extracts local features that are invariant to rotation and scale. These features are matched between images with nearest neighbour matching techniques and mismatch removal to produce a similarity score for image match comparison. The results show that matches obtained via the image matching pipeline achieve high accuracy of true matches. We conclude that feature-based image matching could be used with a similarity network to provide a viable alternative to existing supervised approaches.
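The nearest-neighbour matching and mismatch removal described in this abstract might look roughly like the sketch below. It assumes a ratio test (keep a match only when the best neighbour is clearly closer than the second best) as the mismatch filter and uses the fraction of surviving matches as the similarity score; both choices, the function name, and the toy 2-D descriptors are assumptions for illustration, not the report's method. Real local descriptors are high-dimensional.

```python
import math


def match_score(desc_a, desc_b, ratio=0.75):
    """Fraction of descriptors in desc_a with a distinctive nearest neighbour in desc_b."""
    kept = 0
    for d in desc_a:
        dists = sorted(math.dist(d, e) for e in desc_b)
        # Ratio test: discard ambiguous matches where the two closest
        # candidates are at nearly the same distance.
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            kept += 1
    return kept / len(desc_a)


# Toy 2-D descriptors (hypothetical values).
image_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
image_b = [(0.1, 0.0), (1.0, 0.1), (0.0, 0.9)]   # slightly perturbed copy of image_a
image_c = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]   # unrelated features

same_score = match_score(image_a, image_b)
other_score = match_score(image_a, image_c)
print(same_score, other_score)
```

Pairwise scores of this kind could then serve as edge weights in a similarity network for clustering images by individual.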


How to create a chatbot in Python

#artificialintelligence

Natural language processing (NLP) is one of the most promising fields of artificial intelligence that uses natural languages to enable human interactions with machines. There are two main approaches to NLP: rule-based methods and statistical methods, i.e., methods related to machine learning. There are several exciting Python libraries for NLP, such as Natural Language Toolkit (NLTK), spaCy, TextBlob, etc. A chatbot is computer software able to interact with humans using a natural language. Chatbots usually rely on machine learning, especially on NLP. Apple's Siri, Amazon's Alexa, Google Assistant, and Microsoft's Cortana are some well-known examples of software able to process natural languages.
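As a rough illustration of the rule-based approach mentioned above, the sketch below matches user messages against a small list of regular-expression rules using only the Python standard library. The rules, replies, and function name are hypothetical; a production chatbot would typically rely on an NLP library or a trained model instead.

```python
import re

# Hypothetical rules as (pattern, reply template) pairs. Earlier rules win.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\bmy name is (\w+)", re.IGNORECASE), "Nice to meet you, {0}."),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye!"),
]
FALLBACK = "Sorry, I did not understand that."


def reply(message):
    """Return the reply of the first rule whose pattern matches the message."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            # Captured groups fill the placeholders in the template, if any.
            return template.format(*match.groups())
    return FALLBACK


print(reply("Hello there"))
print(reply("my name is Ada"))
print(reply("What is the weather?"))
```

Rule order matters in this design: a message that matches several patterns gets the reply of the first matching rule.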


How to create a chatbot in Python

#artificialintelligence

Today we will talk about how to create a chatbot with Python. Natural language processing (NLP) is one of the most promising fields of artificial intelligence that uses natural languages to enable human interactions with machines. There are two main approaches to NLP: rule-based methods and statistical methods, i.e., methods related to machine learning. There are several exciting Python libraries for NLP, such as Natural Language Toolkit (NLTK), spaCy, TextBlob, etc. A chatbot is computer software able to interact with humans using a natural language. Chatbots usually rely on machine learning, especially on NLP.


Disrupting Healthcare with Artificial Intelligence

#artificialintelligence

The healthcare industry is evolving with the exponential increase in the exploration of artificial intelligence (AI). These implications go far beyond technology, points out the Everest Group, with the majority of AI decisions impacting everything from customer experience to cost to business processes. While there are certainly huge cost impacts (think: reduced need for customer care executives and reduced cost of population health management) as well as significant business impacts (think: increased healthcare savings and enhanced patient experience), the operational impact is perhaps the most vital because it personalizes patient care. To that end, physicians can make more accurate diagnoses and more efficiently engage with patients on a daily basis. This is where today's blog will focus: preventing physician burnout in the healthcare industry with the help of AI.


Unaligned Sequence Similarity Search Using Deep Learning

Senter, James K., Royalty, Taylor M., Steen, Andrew D., Sadovnik, Amir

arXiv.org Machine Learning

Gene annotation has traditionally required direct comparison of DNA sequences between an unknown gene and a database of known ones using string comparison methods. However, these methods do not provide useful information when a gene does not have a close match in the database. In addition, each comparison can be costly when the database is large since it requires alignments and a series of string comparisons. In this work, we propose a novel approach: using recurrent neural networks to embed DNA or amino-acid sequences in a low-dimensional space in which distances correlate with functional similarity. This embedding space overcomes both shortcomings of the method of aligning sequences and comparing homology. First, it allows us to obtain information about genes which do not have exact matches by measuring their similarity to other ones in the database. If our database is labeled, this can provide labels for a query gene as is done in traditional methods. However, even if the database is unlabeled, it allows us to find clusters and infer some characteristics of the gene population. In addition, each comparison is much faster than traditional methods since the distance metric is reduced to the Euclidean distance, and thus efficient approximate nearest neighbor algorithms can be used to find the best match. More specifically, we show how our embedding can be useful for both classification tasks when our labels are known, and clustering tasks where our sequences belong to classes which have not been seen before. The central dogma of biology states that all organisms contain DNA, which is transcribed into RNA and then translated into proteins, which catalyze the chemical reactions that define life.
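The lookup step described in this abstract, finding the best match by Euclidean distance in the embedding space, can be sketched as follows. The embeddings and labels are made-up placeholders (in the paper they would come from a trained recurrent neural network), and the search is a brute-force scan rather than the approximate nearest neighbor algorithms the abstract refers to.

```python
import math

# Hypothetical 3-dimensional embeddings with placeholder labels.
DATABASE = [
    ((0.1, 0.9, 0.2), "label_A"),
    ((0.2, 0.8, 0.1), "label_A"),
    ((0.9, 0.1, 0.7), "label_B"),
]


def nearest_label(query, database):
    """Return (label, distance) of the closest database entry by Euclidean distance."""
    embedding, label = min(database, key=lambda entry: math.dist(query, entry[0]))
    return label, math.dist(query, embedding)


label, distance = nearest_label((0.8, 0.2, 0.6), DATABASE)
print(label, round(distance, 3))
```

With a labeled database the nearest entry supplies a label for the query; with an unlabeled one, the same distances could feed a clustering step instead.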


Hinge's newest feature claims to use machine learning to find your best match

#artificialintelligence

Most Compatible attempts to use all your cumulative data to find the perfect match for you. The company's been testing this feature, which occasionally recommends a possible match to users, for at least a month now. Those recommendations were only offered once a week during testing but will now come every day. Justin McLeod, Hinge's CEO, tells me the company spent the testing time honing its backend algorithm and getting Most Compatible to a point where the company feels confident putting it fully out there. Most Compatible, he says, uses machine learning to figure out each user's taste.


On Hyperparameter Search in Cluster Ensembles

Helfmann, Luzie, von Lindheim, Johannes, Mollenhauer, Mattes, Banisch, Ralf

arXiv.org Machine Learning

Quality assessment of models in unsupervised learning, and clustering verification in particular, has been a long-standing problem in machine learning research. The lack of robust and universally applicable cluster validity scores often makes algorithm selection and hyperparameter evaluation a tough guess. In this paper, we show that cluster ensemble aggregation techniques such as consensus clustering may be used to evaluate clusterings and their hyperparameter configurations. We use normalized mutual information to compare individual objects of a clustering ensemble to the constructed consensus of the whole ensemble and show that the resulting score can serve as an overall quality measure for clustering problems. This method is capable of highlighting the standout clustering and hyperparameter configuration in the ensemble even in the case of a distorted consensus. We apply this very general framework to various data sets and give possible directions for future research.
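The idea of scoring each ensemble member by its normalized mutual information (NMI) with a consensus clustering can be sketched in plain Python. The consensus here is a deliberately simple stand-in (link two points if they share a cluster in more than half of the ensemble, then take connected components), and the NMI uses arithmetic-mean normalization; both are assumptions made for illustration and may differ from the aggregation technique and normalization used in the paper.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information with arithmetic-mean normalization."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if entropy_a + entropy_b == 0:
        return 1.0  # both clusterings place every point in a single cluster
    return 2 * mutual / (entropy_a + entropy_b)


def consensus(ensemble):
    """Majority-vote co-association consensus via connected components."""
    n = len(ensemble[0])
    labels = [None] * n
    next_label = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] is None:
                    # Count ensemble members that put i and j in the same cluster.
                    votes = sum(c[i] == c[j] for c in ensemble)
                    if 2 * votes > len(ensemble):
                        labels[j] = next_label
                        stack.append(j)
        next_label += 1
    return labels


# Hypothetical ensemble: three clusterings of six points.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, different label names
    [0, 0, 1, 1, 2, 2],  # a different hyperparameter configuration
]
reference = consensus(ensemble)
scores = [nmi(c, reference) for c in ensemble]
print(reference, [round(s, 3) for s in scores])
```

NMI is invariant to label names, so the first two clusterings score identically against the consensus, while the third, which disagrees with the majority, scores lower.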