Knowledge extraction from large corpora of human-human conversation data from web chat services


The goal of the DATCHA project is to perform knowledge extraction from very large databases of web chat conversations between operators and clients in customer contact centers. Extracting knowledge from chat corpora is a challenging research issue. Simply applying traditional text mining tools is clearly sub-optimal, as it takes into account neither the interaction dimension nor the particular nature of this language, which shares properties of both spoken and written language. The DATCHA project will address scientific issues including intra-conversation analysis through a deep semantic analysis (syntactic, semantic, discursive and structural analysis) and inter-conversation analysis (definition of semantic and discursive similarity between conversations). It will propose innovative solutions for various use cases, including analytics report generation, conversation success prediction on the basis of criteria defined by operational units, and online conversation solving.


  • DATCHA at "Conférence Olivier Legrain Sciences et Sociétés" on AI and cognition, July 5-6, 2018, at ENS
  • Demo of conversation annotations
  • Project meeting at Orange Labs
  • Papers accepted at EACL'17 and TALN'17
  • Project meeting at IRIT
  • Papers accepted at SIGDIAL'16 and LREC'16


The consortium running the project involves three partners:

This project is funded by the Agence Nationale de la Recherche (ANR) under contract ANR-15-CE23-0003.


Latest publications from or related to the project:


Project coordinator:

  • Frederic Bechet (frederic.bechet at lif.univ-mrs.fr)
  • Aix-Marseille Université, LIF/CNRS, Parc Scientifique et Technologique de Luminy, 163 avenue de Luminy - Case 901, F-13288 Marseille Cedex 9, France.

Last updated on 2018-08-07