 technical content


BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation

arXiv.org Artificial Intelligence

Large language models work well for technical problem solving in English but perform poorly when the same questions are asked in Bangla. A simple solution would be to translate Bangla questions into English first and then use these models. However, existing Bangla-English translation systems struggle with technical terms. They often mistranslate specialized vocabulary, which changes the meaning of the problem and leads to wrong answers. We present BanglaSTEM, a dataset of 5,000 carefully selected Bangla-English sentence pairs from STEM fields including computer science, mathematics, physics, chemistry, and biology. We generated over 12,000 translations using language models and then used human evaluators to select the highest quality pairs that preserve technical terminology correctly. We train a T5-based translation model on BanglaSTEM and test it on two tasks: generating code and solving math problems. Our results show significant improvements in translation accuracy for technical content, making it easier for Bangla speakers to use English-focused language models effectively. Both the BanglaSTEM dataset and the trained translation model are publicly released at https://huggingface.co/reyazul/BanglaSTEM-T5.


Reviews: Dynamic Ensemble Modeling Approach to Nonstationary Neural Decoding in Brain-Computer Interfaces

Neural Information Processing Systems

Originality: The paper references prior work, and the authors note that their approach differs from previous work that assumes fixed decoding models. The authors should include a brief summary of how motor imagery BCIs operate (note that there are a variety of BCI control signals, and motor imagery is only one of them; see line 17). The authors propose enhancements to the model by adapting model parameters based on tracking functional changes in neural signals and noisy neurons. Quality: Overall, the technical content appears mostly correct, but some information is missing. Sections 2.2 and 2.3 discuss previously developed algorithms/methods.


A Study on Effect of Reference Knowledge Choice in Generating Technical Content Relevant to SAPPhIRE Model Using Large Language Model

arXiv.org Artificial Intelligence

Representation of systems using the SAPPhIRE model of causality can be an inspirational stimulus in design. However, creating a SAPPhIRE model of a technical or a natural system requires sourcing technical knowledge from multiple technical documents regarding how the system works. This research investigates how to accurately generate technical content relevant to the SAPPhIRE model of causality using a Large Language Model (LLM). This paper, which is the first part of a two-part study, presents a method for hallucination suppression using Retrieval-Augmented Generation with an LLM to generate technical content supported by the scientific information relevant to a SAPPhIRE construct. The results show that the selection of reference knowledge used to provide context to the LLM for generating the technical content is very important. The outcome of this research is used to build a software support tool to generate the SAPPhIRE model of a given technical system.
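The retrieval step described in this abstract can be illustrated with a toy sketch. This is not the authors' method: the bag-of-words cosine scoring, the function names (`retrieve`, `build_prompt`), and the prompt wording are all assumptions chosen for illustration, and no LLM is called. The sketch only shows how the choice of reference passages determines what context the model would see.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, top_k=2):
    """Return up to top_k passages with non-zero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]


def build_prompt(query, passages, top_k=2):
    """Assemble a prompt that restricts the model to the retrieved passages.

    The instruction text is a hypothetical example, not taken from the paper.
    """
    context = retrieve(query, passages, top_k)
    lines = ["Answer using only the reference passages below."]
    lines += [f"[{i + 1}] {p}" for i, p in enumerate(context)]
    lines.append(f"Question: {query}")
    return "\n".join(lines)
```

Passing a different passage collection to `build_prompt` changes the context the model receives, which is the variable the abstract reports as being very important.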


Can AI write your technical content?

#artificialintelligence

AI, or more correctly machine learning, can be found in many different applications in your home, car, and workplace. Our relationship with machine learning applications started when we began to use voice-operated smartphone assistants like Apple's Siri or Google's Voice, but voice recognition is just one example of a machine learning application; others include vision processing and object recognition. With the wide range of industrial and electronics sector clients Publitek has, machine learning is a frequently covered topic. Over the past ten or so years, we've seen machine learning (ML) neural networks, once restricted to compute-intensive data centers, emerge to bring intelligent control to the edge of industrial applications. It isn't always apparent, but many of the voice recognition applications we interact with rely on constant cloud connectivity to process our speech.


Terminology-based Text Embedding for Computing Document Similarities on Technical Content

arXiv.org Machine Learning

In this paper, we propose a new hybrid document embedding approach to address the problem of computing document similarities with respect to technical content. To do so, we employ state-of-the-art graph techniques to first extract the keyphrases (composite keywords) of documents and then use them to score the sentences. Using the ranked sentences, we propose two approaches to embed documents and compare their performance against two baselines. With domain expert annotations, we illustrate that the proposed methods can find more relevant documents and outperform the baselines by up to 27% in terms of NDCG.
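For readers unfamiliar with the metric, NDCG (normalized discounted cumulative gain) can be sketched as follows. The abstract does not say which NDCG variant or cutoff the authors used, so this sketch assumes the common linear-gain form with a log2 rank discount; the function names and the relevance grades in the usage note are illustrative only.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumes linear gain: the item at 0-based position `rank`
    contributes rel / log2(rank + 2).
    """
    rels = relevances[:k] if k is not None else relevances
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))


def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents at all
    return dcg(relevances, k) / ideal_dcg
```

A ranking that already lists documents in descending order of expert-assigned relevance, such as `ndcg([3, 2, 1])`, scores 1.0, and placing a less relevant document above a more relevant one lowers the score.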