current issue
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Eriksson, Maria, Purificato, Erasmo, Noroozian, Arman, Vinagre, Joao, Chaslot, Guillaume, Gomez, Emilia, Fernandez-Llorca, David
Quantitative Artificial Intelligence (AI) Benchmarks have emerged as fundamental tools for evaluating the performance, capability, and safety of AI models and systems. Currently, they shape the direction of AI development and are playing an increasingly prominent role in regulatory frameworks. As their influence grows, however, so too do concerns about how and with what effects they evaluate highly sensitive topics such as capabilities, including high-impact capabilities, safety and systemic risks. This paper presents an interdisciplinary meta-review of about 100 studies that discuss shortcomings in quantitative benchmarking practices, published in the last 10 years. It brings together many fine-grained issues in the design and application of benchmarks (such as biases in dataset creation, inadequate documentation, data contamination, and failures to distinguish signal from noise) with broader sociotechnical issues (such as an over-focus on evaluating text-based AI models according to one-time testing logic that fails to account for how AI models are increasingly multimodal and interact with humans and other technical systems). Our review also highlights a series of systemic flaws in current benchmarking practices, such as misaligned incentives, construct validity issues, unknown unknowns, and problems with the gaming of benchmark results. Furthermore, it underscores how benchmark practices are fundamentally shaped by cultural, commercial and competitive dynamics that often prioritise state-of-the-art performance at the expense of broader societal concerns. By providing an overview of risks associated with existing benchmarking procedures, we problematise disproportionate trust placed in benchmarks and contribute to ongoing efforts to improve the accountability and relevance of quantitative AI benchmarks within the complexities of real-world scenarios.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Spain > Andalusia > Seville Province > Seville (0.05)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- Overview (1.00)
- Research Report > New Finding (0.67)
- Law (1.00)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.67)
Ethics in Ai -- Current issues, existing precautions, and probable solutions
Introduction: Most Artificial Intelligence (AI) systems are developed as black boxes, especially machine learning and deep learning-based systems. These systems now make decisions that affect our daily lives, so they should be explainable to end users rather than taken for granted. The implications of such systems are rarely explored with respect to their efficiency in public use (e.g., in agriculture, air combat, military training, education, finance, health care, human resources, customer service, autonomous vehicles, social media, and several others [1]-[9]). Beyond these, the future may also rely on AI-based systems that do our laundry, mow our lawns, and fight wars [9]. Thus, there is considerable room to improve the transparency of these systems, along with their fairness and accountability. Some works have already stated the need for guidelines and governance of AI-based systems, but more attention is required in each area of application.
- North America > United States > New York (0.05)
- Europe > France (0.05)
- Information Technology > Security & Privacy (0.98)
- Government > Military (0.88)
Questions on artificial intelligence in UPSC CSE Main, more focus on analytical skills
The Union Public Service Commission (UPSC) Civil Services Main exam started Friday and will continue till September 29. Based on the feedback of students and experts, the recruitment exam is believed to be easier than last year. Several are of the opinion that the 2017 pattern has made a comeback after undergoing slight changes in 2018. "In the essay paper, UPSC is back to questions on current affairs in one section and philosophical topics in the other. So, it is a good mix that tests your skills for writing both kinds of essays. This is almost the template that UPSC followed until 2017, and it is back to it in 2019. The candidates would have felt confident seeing the topics, particularly after the extremely difficult topics in 2018," said Pulkit Sachdeva, co-founder, SleepyClasses, a UPSC preparation platform.
IEEE Xplore: IEEE Transactions on Computational Intelligence and AI in Games - (Current Issue)
The IEEE Transactions on Computational Intelligence and AI in Games (T-CIAIG) publishes archival journal quality original papers in computational intelligence and related areas in artificial intelligence applied to games, including but not limited to videogames, mathematical games, human–computer interactions in games, and games involving physical objects. Emphasis is placed on the use of these methods to improve performance in and understanding of the dynamics of games, as well as gaining insight into the properties of the methods as applied to games. It also includes using games as a platform for building intelligent embedded agents for the real world. Papers connecting games to all areas of computational intelligence and traditional AI are considered.
Coming to the Classroom: Artificial Intelligence The Amplifier - Georgia Tech Experts on Current Issues
Artificial intelligence (AI) is already in the classroom: as digital textbooks that include question-and-answer simulations; as intelligent nano-tutors to help students work through complex problems; and as intelligent systems to grade student assignments. Ashok Goel teaches Knowledge-Based AI as part of the Institute's Online Master of Science in Computer Science (OMS CS) program. He says he and his peers are on the verge of ushering AI into higher education in bold, new ways. What's next are virtual teaching assistants (VTAs). This modern form of AI will become omnipresent and available on demand for students.
Current Issues in Natural Language Generation: An Overview of the AAAI Workshop on Text Planning and Realization
Hovy, Eduard H., McDonald, David D., Young, Sheryl R.
Traditionally, systems that automatically generate natural language have been conceived as consisting of two principal components: a text planner and a realization grammar. Recent advances in the art, especially in the ... and realization--was widely deemed more convenient than accurate: The components of a generator should be able to communicate at any level where their information is applicable. ... Largely from this experience, we came to understand the sorts of tasks that a text planner has to perform: determining which elements to say, coherently structuring the input elements, building ...
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts (0.05)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)