AITopics | console output

Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving

Evstafev, Evgenii

arXiv.org Artificial Intelligence, Jan-28-2025

Large language models (LLMs) excel in many natural language tasks, yet they struggle with complex mathematical problem-solving, particularly in symbolic reasoning and maintaining consistent output. This study evaluates 10 LLMs with 7 to 8 billion parameters using 945 competition-level problems from the MATH dataset. The focus is on their ability to generate executable Python code as a step in their reasoning process, involving over 9,450 code executions. The research introduces an evaluation framework using mistral-large-2411 to rate answers on a 5-point scale, which helps address inconsistencies in mathematical notation. It also examines the impact of regenerating output token-by-token on refining results. The findings reveal a significant 34.5% performance gap between the top commercial model (gpt-4o-mini, scoring 83.7%) and the least effective open-source model (open-codestral-mamba:v0.1, scoring 49.2%). This disparity is especially noticeable in complex areas like Number Theory. While token-by-token regeneration slightly improved accuracy (+0.8%) for the model llama3.1:8b, it also reduced code execution time by 36.7%, highlighting a trade-off between efficiency and precision. The study also noted a consistent trend where harder problems correlated with lower accuracy across all models. Despite using controlled execution environments, less than 1% of the generated code was unsafe, and 3.17% of problems remained unsolved after 10 attempts, suggesting that hybrid reasoning methods may be beneficial.
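The abstract describes running model-generated Python code and allowing up to 10 attempts per problem. A minimal sketch of such a retry loop follows; it is not the paper's harness. The names `run_generated_code`, `solve_with_retries`, and `fake_generate` are hypothetical, and `fake_generate` merely stands in for a call to an LLM. Note that a separate process with a timeout is not a sandbox, so a real setup would need proper isolation.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_generated_code(code: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run code in a separate Python process; return stdout, or None on failure."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None  # treat a hang as a failed attempt
    if result.returncode != 0:
        return None  # the generated code raised or exited with an error
    return result.stdout.strip()


def solve_with_retries(
    generate: Callable[[int], str], max_attempts: int = 10
) -> Tuple[Optional[str], int]:
    """Ask for new code until one attempt runs and prints something."""
    for attempt in range(1, max_attempts + 1):
        output = run_generated_code(generate(attempt))
        if output:
            return output, attempt
    return None, max_attempts  # counted as unsolved


def fake_generate(attempt: int) -> str:
    """Hypothetical stand-in for an LLM: the first attempt is broken code."""
    return "raise ValueError('bad code')" if attempt < 2 else "print(6 * 7)"


answer, attempts_used = solve_with_retries(fake_generate)
```

With 945 problems and up to 10 attempts each, a loop of this shape accounts for the "over 9,450 code executions" order of magnitude mentioned in the abstract.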

accuracy, large language model, machine learning, (14 more...)

2501.17084

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Stop Using 0.5 as the Threshold for Your Binary Classifier

#artificialintelligence, Nov-29-2022, 19:56:11 GMT

To produce a binary response, classifiers output a real-valued score that is then thresholded. For example, logistic regression outputs a probability (a value between 0.0 and 1.0), and observations with a score equal to or higher than 0.5 produce a positive binary output (many other models also use 0.5 as the default threshold). However, the default 0.5 threshold is often suboptimal. In this blog post, I'll show you how to choose the best threshold for your binary classifier. We'll be using Ploomber to execute our experiments in parallel and sklearn-evaluation to generate the plots.
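As a rough illustration of threshold selection, the sketch below scans a grid of candidate thresholds and keeps the one with the highest F1 score. It assumes F1 is the metric that defines "best", which the excerpt does not state, and it uses plain Python rather than Ploomber or sklearn-evaluation. The function names and the toy labels and scores are made up for the example.

```python
from typing import Sequence, Tuple


def f1_at_threshold(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> float:
    """F1 score when scores >= threshold are predicted positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(
    y_true: Sequence[int], scores: Sequence[float], candidates: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, f1) for the candidate with the highest F1."""
    return max(
        ((t, f1_at_threshold(y_true, scores, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy data (illustrative only): the positives score below 0.5 twice.
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.35, 0.4, 0.45, 0.8, 0.2]
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

default_f1 = f1_at_threshold(y_true, scores, 0.5)
threshold, f1 = best_threshold(y_true, scores, candidates)
```

On this toy data the default threshold of 0.5 misses two of the three positives, while a threshold of 0.4 separates the classes perfectly. In practice the threshold should be chosen on a validation set, not on the data used for the final evaluation.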

classifier, console output, threshold, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.55)
Information Technology > Communications > Social Media (0.50)

Who needs MLflow when you have SQLite?

#artificialintelligence, Nov-19-2022, 07:25:28 GMT

I spent about six years working as a data scientist and tried several times to use MLflow (and other tools as well) to track my experiments; however, every time I tried it, I abandoned it a few days later. There were a few things I didn't like: having to start a web server just to look at my experiments seemed like too much, and I found the query feature extremely limiting (if my experiments are stored in a SQL table, why not let me query them with SQL?). I also found the experiment comparison limited. I rarely have a project where a single metric (or a couple of metrics) is enough to evaluate a model. It's mostly a combination of metrics and evaluation plots that I need to look at to assess a model.
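To make the idea of querying experiments with SQL concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the author's tool: the `experiments` table, its columns, and the rows are hypothetical, and the metric values are invented purely for illustration.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; a file path would persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE experiments (
        id INTEGER PRIMARY KEY,
        model TEXT,
        learning_rate REAL,
        accuracy REAL,
        f1 REAL
    )
    """
)

# Made-up experiment records (illustrative values, not real results).
rows = [
    ("logreg", 0.1, 0.81, 0.74),
    ("forest", 0.05, 0.86, 0.79),
    ("boosting", 0.01, 0.84, 0.82),
]
conn.executemany(
    "INSERT INTO experiments (model, learning_rate, accuracy, f1) VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()

# Compare experiments on more than one metric with ordinary SQL.
best = conn.execute(
    """
    SELECT model, accuracy, f1
    FROM experiments
    WHERE accuracy > 0.82
    ORDER BY f1 DESC
    """
).fetchall()

for model, accuracy, f1 in best:
    print(f"{model}: accuracy={accuracy}, f1={f1}")
```

No web server is involved: the experiments live in a single database that any SQL client can open, and filtering or ranking on several metrics is a plain `SELECT`.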

console output, experiment, experiment tracker, (13 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.38)

vpj/lab

#artificialintelligence, Jun-18-2019, 16:53:46 GMT

This library lets you organize machine learning experiments. It maintains logs, summaries, and checkpoints of all the experiments in a folder structure, without you having to manage them explicitly. It keeps a reference to the git commit at which the experiment was run, along with other information such as the date, the Python file executed, and the experiment description. Optionally, the library can update the Python file by automatically inserting experiment results as a comment. You can use monitored code segments to measure time and to get status updates on the console.
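The idea of a monitored code segment can be sketched with a context manager that prints a status line and records the elapsed time. This is not the vpj/lab API; `monitored_section` and `timings` are hypothetical names used only to illustrate the concept.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

# Elapsed seconds per named section (hypothetical helper state).
timings: Dict[str, float] = {}


@contextmanager
def monitored_section(name: str) -> Iterator[None]:
    """Print a status line for a code segment and record how long it took."""
    print(f"{name}...", end="", flush=True)
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs whether the segment succeeded or raised.
        elapsed = time.perf_counter() - start
        timings[name] = elapsed
        print(f" finished in {elapsed * 1000:.1f} ms")


with monitored_section("load data"):
    data = [i * i for i in range(1000)]

with monitored_section("train"):
    total = sum(data)
```

Wrapping each stage this way gives a running status on the console and a per-stage timing record that can be logged alongside the experiment.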

artificial intelligence, experiment, machine learning, (16 more...)

Industry: Leisure & Entertainment (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
