Contextual Discourse Vectors (CDV)

CDV is a distributed document representation for efficient answer retrieval from long documents.

This demonstration shows how the CDV vector space model can be used to retrieve information from a large healthcare dataset. The model was trained on Wikipedia data. See our WWW 2020 paper and GitHub repository for implementation details.

How to use:

  • Search: Enter the name of a disease and, optionally, a specific aspect in the query field. The system retrieves the top 25 passages from the dataset that answer your query.
  • Highlight: Browse through the dataset and analyze how relevant each sentence in a document is to your query. The shade of blue indicates the sentence's relevance score.
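
The two modes above can be illustrated with a minimal sketch. It is not the CDV implementation: it assumes the query and each sentence have already been encoded as plain vectors, and it swaps in cosine similarity as the relevance score. The function names (cosine, top_k, to_shade) and the toy vectors are hypothetical and exist only for this illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, sentence_vecs, k=25):
    """Search mode: return the k best (score, index) pairs, highest score first."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(sentence_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def to_shade(score):
    """Highlight mode: clamp a score to [0, 1] so it can serve as a colour opacity."""
    return max(0.0, min(1.0, score))


# Toy two-dimensional vectors; a real system would use learned embeddings (assumption).
query = [1.0, 0.0]
sentences = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

results = top_k(query, sentences, k=2)
for score, idx in results:
    print(f"sentence {idx}: score={score:.3f}, shade={to_shade(score):.3f}")
```

With k=25, as in the demo, the same ranking step would return the 25 highest-scoring passages; the clamped score is one simple way to drive a per-sentence highlight.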

Datasets used in this demo:

  • Wikipedia (Encyclopedia articles about diseases)
  • CORD-19 (COVID-19 Open Research Dataset)