smart solution
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
Mahabadi, Rabeeh Karimi, Satheesh, Sanjeev, Prabhumoye, Shrimai, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan
Pretraining large language models (LLMs) on high-quality, structured data such as mathematics and code substantially enhances reasoning capabilities. However, existing math-focused datasets built from Common Crawl suffer from degraded quality due to brittle extraction heuristics, lossy HTML-to-text conversion, and the failure to reliably preserve mathematical structure. In this work, we introduce Nemotron-CC-Math, a large-scale, high-quality mathematical corpus constructed from Common Crawl using a novel, domain-agnostic pipeline specifically designed for robust scientific text extraction. Unlike previous efforts, our pipeline recovers math across various formats (e.g., MathJax, KaTeX, MathML) by leveraging layout-aware rendering with lynx and a targeted LLM-based cleaning stage. This approach preserves the structural integrity of equations and code blocks while removing boilerplate, standardizing notation into LaTeX representation, and correcting inconsistencies. We collected a large, high-quality math corpus, namely Nemotron-CC-Math-3+ (133B tokens) and Nemotron-CC-Math-4+ (52B tokens). Notably, Nemotron-CC-Math-4+ not only surpasses all prior open math datasets, including MegaMath, FineMath, and OpenWebMath, but also contains 5.5 times more tokens than FineMath-4+, which was previously the highest-quality math pretraining dataset. When used to pretrain a Nemotron-T 8B model, our corpus yields +4.8 to +12.6 gains on MATH and +4.6 to +14.3 gains on MBPP+ over strong baselines, while also improving general-domain performance on MMLU and MMLU-Stem. We present the first pipeline to reliably extract scientific content, including math, from noisy web-scale data, yielding measurable gains in math, code, and general reasoning, and setting a new state of the art among open math pretraining corpora. To support open-source efforts, we release our code and datasets.
Using Machine Learning in Testing and Maintenance
With machine learning, we can reduce maintenance efforts and improve the quality of products. It can be used in various stages of the software testing life-cycle, including bug management, which is an important part of the chain. We can analyze large amounts of data for classifying, triaging, and prioritizing bugs in a more efficient way by means of machine learning algorithms. Mesut Durukal, a test automation engineer at Rapyuta Robotics, spoke at Aginext 2021 about using machine learning in testing. Durukal uses machine learning to classify and cluster bugs.
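As a rough illustration of what classifying bug reports with machine learning can look like, here is a minimal sketch of a naive Bayes text classifier written with only the Python standard library. It is not the approach Durukal described; the class name, the sample bug reports, and the labels ("crash", "ui", "performance") are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a bug report and split it on whitespace."""
    return text.lower().split()


class NaiveBayesBugClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of reports
        self.vocab = set()                       # all tokens seen in training

    def fit(self, reports, labels):
        for text, label in zip(reports, labels):
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # Log prior plus the log likelihood of each known token.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: (bug report text, category).
TRAINING = [
    ("app crashes on login with null pointer", "crash"),
    ("crash when saving file segmentation fault", "crash"),
    ("button misaligned on settings page", "ui"),
    ("wrong font color on login page", "ui"),
    ("page load takes ten seconds", "performance"),
    ("slow response when saving large file", "performance"),
]

model = NaiveBayesBugClassifier().fit(
    [text for text, _ in TRAINING],
    [label for _, label in TRAINING],
)

if __name__ == "__main__":
    print(model.predict("segmentation fault crash on startup"))
```

In practice a triage model would be trained on thousands of labeled reports from a real bug tracker, and the predicted category could then feed prioritization or routing to the responsible team.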
10 Applications of Deep Learning in Business
Deep learning is a subset of machine learning, which is itself a field of artificial intelligence. Deep learning uses a multi-layered artificial neural network to carry out a range of tasks, from fraud detection to speech recognition or language translation. Deep learning differs from traditional machine learning systems in that it is capable of self-learning and improving as it analyses large data sets. A highly flexible system, it has a number of applications in business. In this article, we explain exactly what deep learning is and explore the ways that it is already transforming businesses. Deep learning is a function of artificial intelligence. It is designed to replicate the way that the human brain processes data. It also re-creates the patterns found in the brain's decision-making process. Sometimes called deep neural networking or neural learning, it is part of the wider field of machine learning. It is powered by networks that can carry out unsupervised learning. This process uses algorithms to analyse raw data, extracting information and presenting it in a structured, useful model. Often it is also used to process unstructured or unlabeled data.
Artificial Intelligence (AI)-powered smart solutions changing the face of ecommerce
Ecommerce has emerged as the most popular online activity around the globe. Revenues have almost doubled in the past five years, and total worldwide retail ecommerce sales are expected to rise from $2.3 trillion in 2017 to $4.88 trillion in 2021. As a result, revenues at traditional brick-and-mortar stores have declined rapidly. To keep pace with global consumer demands, ecommerce sites are incorporating new emerging technologies into their portals every day, including artificial intelligence (AI). Personalization based on registered shopping habits, tailored advertisements, and customer engagement are just the tip of the iceberg when it comes to boosting sales.
How eCommerce companies are using AI to drive Higher Sales & User Experience
Artificial intelligence is one of the hottest topics in this ever-changing world. AI is a way forward, offering the industry innovative and smart business solutions. It is adding valuable elements to e-commerce platforms that help them stay in the market. Advancement has reached such an extent that speakers are talking, vendors are selling, and e-commerce is growing. User experience and brand performance are reaching an equilibrium as artificial intelligence technology changes tedious working patterns.
Smart solutions for Insurance Companies with Artificial Intelligence
Digitalization has provided organizations with huge sets of data, opening new avenues to deploy and benefit from big data and analytics. Every year insurance companies report billions of dollars in fraudulent cases, and the most common form of fraud is identity theft. There is an urgent need for data to be secured and analyzed properly to mitigate such fraudulent claims. AI, combined with machine learning, big data, and analytics, can enhance the efficiency of data processing. AI assistants or chatbots are being used to handle customer complaints and process simple transactions.
From Browser to Buyer: How Smart Solutions Are Transforming E-Commerce
Shopping: It's not what it once was. Forget the long drive to the mall and countless hours of browsing. Customers now simply log in from the comfort of their own home to have the world at their fingertips. The massive uptake of e-commerce is an unprecedented shift in consumer behavior, expectations and purchases. The evidence is in the numbers: Internet shopping is the most popular online activity.
Smart Solutions for Smart Machines
Powered by smart machines, the new industrial revolution is changing how manufacturers operate today and plan for the future, influencing a significant transformation in manufacturing, engineering and factory-floor industries. Adding to this, manufacturers are under pressure to meet the demand for faster delivery of new products, coupled with shorter production lifecycles. Organizations are adopting agile, flexible production plant systems and processes to adapt and evolve, so as to remain competitive and profitable. Going forward, plants and machines will have to be smarter, better connected, more efficient, flexible, and safe. Over the past several years, innovation frameworks have emerged in industry organizations worldwide, such as Industry 4.0 (Europe), the Industrial Internet Consortium (America), and the Made-in-China initiative, to name a few.
Know the six technologies crucial for building smart cities
Technological literacy is key to turning a city into a smart city: one that is well connected, sustainable, and resilient, where information is not just available but also findable. It is nothing new that a smart city is all about providing smart services to its citizens, saving them time and easing their lives. It is also about connecting citizens to governance, so that they can give the government feedback on how they want their city to be. This aim cannot become reality without technology. Let's have a look at six technologies without which a city's smartness cannot be enhanced.
E-Gov: Smart governments, smart solutions; how Karnataka, Madhya Pradesh are looking to enhance productivity
The government of Karnataka has decided to partner with US tech giant Microsoft to use artificial intelligence (AI) for digital agriculture. The collaboration intends to empower smallholder farmers with technology-oriented solutions that will help them increase income using ground-breaking, cloud-based technologies, machine learning and advanced analytics. The collaboration will work with the Karnataka Agricultural Price Commission (KAPC), Department of Agriculture, to help improve price forecasting practices to benefit farmers. Microsoft, with guidance from KAPC, is attempting to develop a multivariate agricultural commodity price forecasting model that considers the following datasets: historical sowing area, production, yield, weather, and other related datasets as relevant. For this season, the Tur crop has been identified for this prediction model.