  • Publication
    Assessing the Robustness of Conversational Agents using Paraphrases
    Assessing a conversational agent’s understanding capabilities is critical, as poor user interactions early in its lifecycle can seal the agent’s fate, with users abandoning the system. In this paper we explore the use of paraphrases as a testing tool for conversational agents. Paraphrases, which are different ways of expressing the same intent, are generated from known working input by performing lexical substitutions. As the expected outcome for this newly generated data is known, we can use it to assess the agent’s robustness to language variation and detect potential understanding weaknesses. A case study yields encouraging results: the approach appears to help anticipate potential understanding shortcomings, and these shortcomings can then be addressed using the generated paraphrases.
      Scopus© Citations: 12
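    The entry above describes generating paraphrases by lexical substitution and checking them against a known expected intent. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the WordNet-based substitution strategy and the toy_agent classifier are assumptions made here for demonstration.

    ```python
    # Minimal sketch (assumptions, not the paper's implementation): generate paraphrases
    # of a known-working utterance by substituting words with WordNet synonyms, then
    # check whether the agent still predicts the expected intent.
    #
    # Requires: pip install nltk, plus nltk.download("wordnet") on first use.
    from nltk.corpus import wordnet as wn

    def lexical_paraphrases(utterance, max_per_word=2):
        """Yield paraphrases produced by swapping one word for a WordNet synonym."""
        tokens = utterance.split()
        for i, token in enumerate(tokens):
            synonyms = set()
            for synset in wn.synsets(token):
                for lemma in synset.lemmas():
                    name = lemma.name().replace("_", " ")
                    if name.lower() != token.lower():
                        synonyms.add(name)
            for synonym in sorted(synonyms)[:max_per_word]:
                yield " ".join(tokens[:i] + [synonym] + tokens[i + 1:])

    def assess_robustness(agent_predict, utterance, expected_intent):
        """Return the share of paraphrases for which the agent keeps the expected intent."""
        paraphrases = list(lexical_paraphrases(utterance))
        if not paraphrases:
            return 1.0, []
        failures = [p for p in paraphrases if agent_predict(p) != expected_intent]
        return 1.0 - len(failures) / len(paraphrases), failures

    if __name__ == "__main__":
        # Hypothetical stand-in for a real agent's intent classifier.
        def toy_agent(text):
            return "book_flight" if "flight" in text or "book" in text else "unknown"

        score, failed = assess_robustness(toy_agent, "book a flight to Dublin", "book_flight")
        print(f"robustness: {score:.2f}")
        for utterance in failed:
            print("missed:", utterance)
    ```

    Paraphrases that the agent misclassifies point to understanding weaknesses and, as the abstract notes, can be fed back (for example as additional training utterances) to address them.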
  • Publication
    BoTest: a Framework to Test the Quality of Conversational Agents Using Divergent Input Examples
    The quality of conversational agents is important, as users have high expectations; consequently, poor interactions may lead to users abandoning the system. In this paper, we propose a framework to test the quality of conversational agents. Our solution transforms working input that the conversational agent accurately recognises to generate divergent input examples that introduce complexity and stress the agent. As the divergent inputs are based on known utterances for which we have the 'normal' outputs, we can assess how robust the conversational agent is to variations in the input. To demonstrate our framework, we built ChitChatBot, a simple conversational agent capable of making casual conversation.
      Scopus© Citations: 18
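    The entry above describes deriving "divergent" inputs from utterances the agent already handles and comparing the agent's answers against the known outputs. Below is a rough sketch of that testing loop under assumptions made here (typo injection and word dropping as the divergence operators, and a toy_agent placeholder); it is not the BoTest framework itself.

    ```python
    # Minimal sketch (assumptions, not the BoTest implementation): derive divergent
    # inputs from a working utterance, e.g. by injecting adjacent-key typos or
    # dropping a word, then check the agent's response against the known output.
    import random

    KEYBOARD_NEIGHBOURS = {"a": "qs", "e": "wr", "i": "uo", "o": "ip", "s": "ad", "t": "ry"}

    def typo_divergence(utterance, rng):
        """Replace one character with an adjacent key to simulate a typing slip."""
        candidates = [i for i, c in enumerate(utterance.lower()) if c in KEYBOARD_NEIGHBOURS]
        if not candidates:
            return utterance
        i = rng.choice(candidates)
        replacement = rng.choice(KEYBOARD_NEIGHBOURS[utterance[i].lower()])
        return utterance[:i] + replacement + utterance[i + 1:]

    def word_drop_divergence(utterance, rng):
        """Drop one word to simulate terse or truncated input."""
        words = utterance.split()
        if len(words) < 2:
            return utterance
        i = rng.randrange(len(words))
        return " ".join(words[:i] + words[i + 1:])

    def stress_test(agent_respond, utterance, expected_response, n=20, seed=0):
        """Return the fraction of divergent inputs that still yield the expected response."""
        rng = random.Random(seed)
        operators = [typo_divergence, word_drop_divergence]
        divergent = [rng.choice(operators)(utterance, rng) for _ in range(n)]
        hits = sum(agent_respond(d) == expected_response for d in divergent)
        return hits / n, divergent

    if __name__ == "__main__":
        # Hypothetical stand-in for a conversational agent.
        def toy_agent(text):
            return "greeting" if "hello" in text.lower() else "fallback"

        score, inputs = stress_test(toy_agent, "hello there friend", "greeting")
        print(f"robustness under divergent input: {score:.2f}")
    ```

    Because every divergent input is traced back to an utterance with a known 'normal' output, a drop in this score directly measures how sensitive the agent is to the kind of variation introduced.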
  • Publication
    Training a Chatbot with Microsoft LUIS: Effect of Intent Imbalance on Prediction Accuracy
    Microsoft LUIS is a natural language understanding service used to train chatbots. Imbalance in the utterance training set may cause the LUIS model to predict the wrong intent for a user's query. We discuss this problem and the training recommendations from Microsoft to improve prediction accuracy with LUIS. We perform batch testing on three training sets created from two existing datasets to explore the effectiveness of these recommendations.
      Scopus© Citations: 2
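    The entry above studies how imbalance in the training utterances relates to intent-prediction accuracy observed in batch testing. The sketch below is a generic illustration of that kind of analysis, not Microsoft's tooling or the paper's scripts: it assumes batch-test results are available as (utterance, true intent, predicted intent) triples and simply lines up per-intent accuracy against per-intent training counts.

    ```python
    # Minimal sketch (an assumption, not the LUIS batch-testing API): given batch-test
    # results from a trained model, report per-intent accuracy next to the number of
    # training utterances per intent, to see how imbalance lines up with mispredictions.
    from collections import Counter, defaultdict

    def per_intent_report(training_utterances, batch_results):
        """
        training_utterances: list of (utterance, intent) pairs used to train the model.
        batch_results: list of (utterance, true_intent, predicted_intent) triples.
        Returns {intent: (training_count, accuracy)}.
        """
        training_counts = Counter(intent for _, intent in training_utterances)
        totals = defaultdict(int)
        correct = defaultdict(int)
        for _, true_intent, predicted_intent in batch_results:
            totals[true_intent] += 1
            if predicted_intent == true_intent:
                correct[true_intent] += 1
        return {
            intent: (training_counts.get(intent, 0), correct[intent] / totals[intent])
            for intent in totals
        }

    if __name__ == "__main__":
        # Hypothetical data: "book_flight" has far more training utterances than "cancel".
        training = [("book a flight", "book_flight")] * 50 + [("cancel my booking", "cancel")] * 5
        results = [
            ("fly me to Paris", "book_flight", "book_flight"),
            ("I need a plane ticket", "book_flight", "book_flight"),
            ("please cancel the trip", "cancel", "book_flight"),   # imbalance pulls the prediction
            ("call off my reservation", "cancel", "cancel"),
        ]
        for intent, (n_train, acc) in per_intent_report(training, results).items():
            print(f"{intent}: {n_train} training utterances, batch accuracy {acc:.2f}")
    ```

    A report of this shape makes it easy to spot under-represented intents whose queries are being absorbed by heavily trained ones, which is the effect the paper investigates.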