SWAN: A Generic Framework for Auditing Textual Conversational Systems
arXiv.org Artificial Intelligence
We argue that such frameworks should satisfy at least the following requirements.
Alertness: They should detect potential problems with extremely high recall (i.e., near-zero misses), while appropriately crediting the benefits of the conversational systems. Moreover, when aiming for high recall, everyone involved should be taken into account (i.e., not just users, but also, for example, workers who label data for training the system); in particular, an evaluation framework that ignores negative impacts on marginalised people does not satisfy the alertness requirement.
Specificity: By this we mean that the evaluation framework should be specific when locating the problem(s) within conversations. For example, an evaluation result that says "There is a problem somewhere inside this conversation session" is less useful than one that says "There is a problem in this particular system turn," which in turn is less useful than one that says "There is a problem in this particular claim within this system turn."
May-14-2023
- Country:
- Oceania > Australia
- Victoria > Melbourne (0.04)
- Queensland (0.04)
- North America
- Canada (0.04)
- United States > Texas
- Travis County > Austin (0.04)
- Europe
- Slovenia (0.04)
- Czechia > Prague (0.04)
- United Kingdom > Scotland
- City of Glasgow > Glasgow (0.04)
- Spain > Galicia
- Madrid (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.05)
- Asia
- Taiwan > Taiwan Province
- Taipei (0.04)
- Singapore > Central Region
- Singapore (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- China > Tianjin Province
- Tianjin (0.04)
- Genre:
- Research Report (0.41)
- Technology: