ai.txt: A Domain-Specific Language for Guiding AI Interactions with the Internet

Li, Yuekang, Song, Wei, Zhu, Bangshuo, Gong, Dong, Liu, Yi, Deng, Gelei, Chen, Chunyang, Ma, Lei, Sun, Jun, Walsh, Toby, Xue, Jingling

arXiv.org Artificial Intelligence

We introduce ai.txt, a novel domain-specific language (DSL) designed to explicitly regulate interactions between AI models, agents, and web content, addressing critical limitations of the widely adopted robots.txt standard. As AI increasingly engages with online materials for tasks such as training, summarization, and content modification, existing regulatory methods lack the necessary granularity and semantic expressiveness to ensure ethical and legal compliance. ai.txt extends traditional URL-based access controls by enabling precise element-level regulations and incorporating natural language instructions interpretable by AI systems. To facilitate practical deployment, we provide an integrated development environment with code autocompletion and automatic XML generation. Furthermore, we propose two compliance mechanisms: XML-based programmatic enforcement and natural language prompt integration, and demonstrate their effectiveness through preliminary experiments and case studies. Our approach aims to aid the governance of AI-Internet interactions, promoting responsible AI use in digital ecosystems.
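The abstract's "XML-based programmatic enforcement" mechanism can be pictured with a small sketch. The schema below is purely hypothetical: the element names, attributes, and deny-by-default policy are illustrative assumptions, not the actual ai.txt format.

```python
# Hypothetical sketch of XML-based element-level enforcement.
# <element> names and attributes are invented for illustration only.
import xml.etree.ElementTree as ET

POLICY = """\
<aiPolicy>
  <element selector="article" training="deny" summarization="allow"/>
  <element selector="comments" training="deny" summarization="deny"/>
</aiPolicy>
"""

def allowed(policy_xml, selector, use):
    """Return True if the given use (e.g. 'training') is allowed for the
    page element matched by `selector`; deny by default."""
    root = ET.fromstring(policy_xml)
    for el in root.iter("element"):
        if el.get("selector") == selector:
            return el.get(use) == "allow"
    return False

print(allowed(POLICY, "article", "summarization"))  # True
print(allowed(POLICY, "article", "training"))       # False
```

The point of the sketch is the granularity: unlike robots.txt, rules attach to page elements and to specific uses rather than to whole URLs.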


Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models

Zhang, Andy K., Perry, Neil, Dulepet, Riya, Jones, Eliot, Lin, Justin W., Ji, Joey, Menders, Celeste, Hussein, Gashon, Liu, Samantha, Jasper, Donovan, Peetathawatchai, Pura, Glenn, Ari, Sivashankar, Vikram, Zamoshchin, Daniel, Glikbarg, Leo, Askaryar, Derek, Yang, Mike, Zhang, Teddy, Alluri, Rishi, Tran, Nathan, Sangpisit, Rinnara, Yiorkadjis, Polycarpos, Osele, Kenny, Raghupathi, Gautham, Boneh, Dan, Ho, Daniel E., Liang, Percy

arXiv.org Artificial Intelligence

Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have the potential to cause real-world impact. Policymakers, model providers, and other researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such agents to help mitigate cyberrisk and investigate opportunities for penetration testing. Toward that end, we introduce Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those tasks. We include 40 professional-level Capture the Flag (CTF) tasks from 4 distinct CTF competitions, chosen to be recent, meaningful, and spanning a wide range of difficulties. Each task includes its own description and starter files, and is initialized in an environment where an agent can execute bash commands and observe outputs. Since many tasks are beyond the capabilities of existing LM agents, we introduce subtasks, which break down a task into intermediary steps for more gradated evaluation; we add subtasks for 17 of the 40 tasks. To evaluate agent capabilities, we construct a cybersecurity agent and evaluate 7 models: GPT-4o, Claude 3 Opus, Claude 3.5 Sonnet, Mixtral 8x22b Instruct, Gemini 1.5 Pro, Llama 3 70B Chat, and Llama 3.1 405B Instruct. Without guidance, we find that agents are able to solve only the easiest complete tasks, which took human teams up to 11 minutes to solve, with Claude 3.5 Sonnet and GPT-4o having the highest success rates. Finally, subtasks provide more signal for measuring performance compared to unguided runs, with models achieving a 3.2% higher success rate on complete tasks with subtask-guidance than without. All code and data are publicly available at https://cybench.github.io.


Amazon investigating Perplexity AI after accusations it scrapes websites without consent

Engadget

Amazon Web Services has started an investigation to determine whether Perplexity AI is breaking its rules, according to Wired. To be precise, the company's cloud division is looking into allegations that the service is using a crawler, hosted on its servers, that ignores the Robots Exclusion Protocol. This protocol is a web standard wherein developers place a robots.txt file on their websites containing instructions for automated crawlers. Complying with those instructions is voluntary, but crawlers from reputable companies have generally respected them since web developers started implementing the standard in the '90s. In an earlier piece, Wired reported that it discovered a virtual machine that was bypassing its website's robots.txt instructions.
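For context, the Robots Exclusion Protocol is just a plain-text file served at the site root (/robots.txt) listing per-crawler rules; the paths below are illustrative:

```
User-agent: *
Disallow: /private/
```

Each `User-agent` line names a crawler (or `*` for all), and the `Disallow` lines below it list URL path prefixes that crawler is asked not to fetch. Nothing in the protocol technically prevents a crawler from fetching them anyway, which is what makes the allegations above possible.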


AI companies are reportedly still scraping websites despite protocols meant to block them

Engadget

Perplexity, a company that describes its product as "a free AI search engine," has been under fire over the past few days. Shortly after Forbes accused it of stealing its story and republishing it across multiple platforms, Wired reported that Perplexity has been ignoring the Robots Exclusion Protocol, or robots.txt. Technology website The Shortcut also accused the company of scraping its articles. Now, Reuters has reported that Perplexity isn't the only AI company bypassing robots.txt. Reuters said it saw a letter addressed to publishers from TollBit, a startup that pairs them up with AI firms so they can reach licensing deals, warning them that "AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol."


Testing Language Model Agents Safely in the Wild

Naihin, Silen, Atkinson, David, Green, Marc, Hamadi, Merwane, Swift, Craig, Schonholtz, Douglas, Kalai, Adam Tauman, Bau, David

arXiv.org Artificial Intelligence

A prerequisite for safe autonomy-in-the-wild is safe testing-in-the-wild. Yet real-world autonomous tests face several unique safety challenges, both due to the possibility of causing harm during a test, as well as the risk of encountering new unsafe agent behavior through interactions with real-world and potentially malicious actors. We propose a framework for conducting safe autonomous agent tests on the open internet: agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and logged to be examined by humans. We design a basic safety monitor (AgentMonitor) that is flexible enough to monitor existing LLM agents, and, using an adversarial simulated agent, we measure its ability to identify and stop unsafe situations. Then we apply the AgentMonitor on a battery of real-world tests of AutoGPT, and we identify several limitations and challenges that will face the creation of safe in-the-wild tests as autonomous agents grow more capable.


New York Times, CNN and ABC block OpenAI's GPTBot web crawler from accessing content

The Guardian

News outlets including the New York Times, CNN, Reuters and the Australian Broadcasting Corporation (ABC) have blocked a tool from OpenAI, limiting the company's ability to continue accessing their content. OpenAI is behind one of the best known artificial intelligence chatbots, ChatGPT. Its web crawler – known as GPTBot – may scan webpages to help improve its AI models. The Verge was first to report the New York Times had blocked GPTBot on its website. The Guardian subsequently found that other major news websites, including CNN, Reuters, the Chicago Tribune, the ABC and Australian Community Media (ACM) brands such as the Canberra Times and the Newcastle Herald, appear to have also disallowed the web crawler.
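Blocking a crawler this way amounts to adding a `User-agent` group naming it to the site's robots.txt. Python's standard-library `urllib.robotparser` can check the effect; the rules and URLs below are illustrative, not taken from any of the sites named above:

```python
from urllib import robotparser

# Illustrative robots.txt: block GPTBot entirely, restrict others
# only from /private/.
ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS)

print(rp.can_fetch("GPTBot", "https://example.com/article"))     # False
print(rp.can_fetch("SomeBot", "https://example.com/article"))    # True
print(rp.can_fetch("SomeBot", "https://example.com/private/x"))  # False
```

The first group applies only to the crawler identifying itself as GPTBot and disallows every path; other crawlers fall through to the `*` group.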


Week#5 Tracking of Students' Facial Expressions

#artificialintelligence

What Have We Done Last Week? Last week we researched which object detection algorithms we could use to detect students' facial expressions. This research showed that deep learning-based object detection methods are widely used. These methods use convolutional neural networks to learn features of the data and perform object detection. Among them, we examined the R-CNN, Faster R-CNN and YOLO methods.


FOON Creation and Traversal for Recipe Generation

Patel, Raj

arXiv.org Artificial Intelligence

Task completion by robots is still far from being completely dependable and usable. One way a robot may decipher information given to it and accomplish tasks is by utilizing a FOON, which stands for functional object-oriented network. The network first needs to be created by having a human create action nodes as well as input and output nodes in a .txt file. After the network is sizeable, it can be traversed in a variety of ways, such as choosing steps via iterative deepening search using the first valid option seen. Another mechanism is heuristics, such as choosing steps based on the highest success rate or the lowest number of input ingredients. Via any of these methods, a program can traverse the network given an output product and derive the series of steps that need to be taken to produce that output.
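The heuristic traversal described above can be sketched in a few lines. The network structure, item names, and success rates below are invented for illustration and do not reflect the paper's actual FOON file format:

```python
# Hypothetical network: each producible item maps to candidate
# "functional units", each listing the inputs it consumes, an assumed
# success rate, and the action performed.
NETWORK = {
    "salad": [
        {"inputs": ["lettuce", "tomato"], "success": 0.9, "action": "mix"},
    ],
    "tomato": [
        {"inputs": ["whole tomato"], "success": 0.8, "action": "slice"},
    ],
}
BASE_ITEMS = {"lettuce", "whole tomato"}  # assumed already available

def derive_steps(goal):
    """Greedy traversal: pick the candidate unit with the highest success
    rate, recursively satisfy its inputs, then emit its action."""
    if goal in BASE_ITEMS:
        return []
    options = NETWORK.get(goal)
    if not options:
        raise ValueError(f"no way to produce {goal!r}")
    best = max(options, key=lambda u: u["success"])
    steps = []
    for item in best["inputs"]:
        steps += derive_steps(item)
    steps.append(f"{best['action']} -> {goal}")
    return steps

print(derive_steps("salad"))  # ['slice -> tomato', 'mix -> salad']
```

Swapping the `max(..., key=...)` criterion for lowest input count, or replacing the greedy recursion with iterative deepening, gives the other traversal strategies the abstract mentions.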


Build a Named Entity Recognition App with Streamlit

#artificialintelligence

In my previous article, we fine-tuned a Named Entity Recognition (NER) model trained on the wnut_17 [1] dataset. In this article, we show step by step how to integrate this model with Streamlit and deploy it using Hugging Face Spaces. The goal of this app is to tag input sentences per user request in real time. Also, keep in mind that, unlike trivial ML models, deploying a large language model on Streamlit is tricky. We address those challenges as well.


So, I made an AI to attend my online classes for me.

#artificialintelligence

We all know how boring it gets after a while to attend online classes. So, I made an AI to attend them for me. Let's see how the AI will get the data from the class. From the above image, I hope you get the basic gist of how the data collection will work. Basically, the above code takes the screenshotted image, makes it negative so that the black turns white and the white turns black, and then detects the text and draws rectangles around it on the original image.
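The "make it negative" step is just per-pixel inversion: each 8-bit grayscale value v becomes 255 - v, so dark text on a light slide becomes light text on a dark background. A minimal pure-Python sketch (real code would use an image library on actual screenshot data):

```python
def invert(pixels):
    """Invert 8-bit grayscale pixel values: v -> 255 - v.
    `pixels` is a list of rows, each a list of ints in 0..255."""
    return [[255 - v for v in row] for row in pixels]

# Tiny stand-in for a screenshot: 0 is black, 255 is white.
screenshot = [
    [0, 128, 255],
    [30, 200, 10],
]
print(invert(screenshot))  # [[255, 127, 0], [225, 55, 245]]
```

After inversion, the text-detection step runs on the negative image while the rectangles are drawn back onto the original, as described above.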