Introduction

A knowledge-based question-answering (KB-QA) system answers natural language questions with information stored in a large-scale knowledge base (KB). Existing KB-QA systems are powered either by curated KBs, in which factual knowledge is encoded in entities and relations with well-structured schemas, or by open KBs, which contain assertions represented as triples (e.g., <subject; relation phrase; argument>). We show that both approaches fall short in answering questions with complex prepositional or adverbial constraints. We propose using n-tuple assertions, which are assertions with an arbitrary number of arguments, and an n-tuple open KB (nOKB), which is an open knowledge base of n-tuple assertions. We present TAQA (n-Tuple Assertion-based Question Answering), a novel KB-QA system based on an nOKB, and illustrate via experiments how TAQA can effectively answer complex questions with rich semantic constraints. Our work also results in a new open KB containing 120M n-tuple assertions and a collection of 300 labeled complex questions, both of which are publicly available for further research.

[Paper] | [Slides]

Open Knowledge Base

Our n-tuple open knowledge base (nOKB) contains natural language assertions in n-tuple form. Each assertion has one subject (sbj), one relation phrase (rel), and multiple arguments (arg), in the form <sbj; rel; arg_1, ..., arg_k>. An example assertion is:

<Barack Obama; graduated; from Harvard Law School, in 1991>
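One way to picture such an assertion in code is a small record with a subject, a relation phrase, and a variable-length list of arguments. The Python sketch below is illustrative only: the class name `NTupleAssertion` and its `render` method are hypothetical and are not part of the released KB or of TAQA.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NTupleAssertion:
    """Hypothetical in-memory form of one n-tuple assertion."""
    sbj: str               # subject
    rel: str               # relation phrase
    args: Tuple[str, ...]  # arbitrary number of arguments

    def render(self) -> str:
        # Reproduce the <sbj; rel; arg_1, ..., arg_k> notation used above.
        return "<{}; {}; {}>".format(self.sbj, self.rel, ", ".join(self.args))


example = NTupleAssertion(
    sbj="Barack Obama",
    rel="graduated",
    args=("from Harvard Law School", "in 1991"),
)
print(example.render())
```

Keeping the arguments as a sequence, rather than forcing a single argument slot as in a triple, is what lets both the place ("from Harvard Law School") and the time ("in 1991") stay attached to the same assertion.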

The n-tuple open KB used in our experiments contains 120 million assertions. The KB is split into 10 TSV files (5.4GB compressed, 20GB decompressed) and can be downloaded [here]. Each line contains 10 tab-separated columns, in the following order:

assertion-id, sbj, rel, args (separated by semi-colons), sbj (normalized), rel (normalized), arg (normalized), extraction confidence, extraction frequency, namespace (openie) 
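A line in this format could be read with a few lines of Python. The sketch below assumes the ten columns appear in exactly the order listed above, that the raw args column uses semicolons as separators, and that confidence and frequency parse as a float and an integer respectively. The field names and the `parse_line` helper are made up for illustration, and the sample line is synthetic rather than copied from the KB. How the normalized argument column separates multiple arguments is not stated here, so it is kept as a raw string.

```python
from typing import Any, Dict

# Hypothetical labels for the ten columns, in the order listed above.
COLUMNS = [
    "assertion_id", "sbj", "rel", "args",
    "sbj_norm", "rel_norm", "arg_norm",
    "confidence", "frequency", "namespace",
]


def parse_line(line: str) -> Dict[str, Any]:
    """Split one tab-separated line into a dict keyed by COLUMNS."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(
            "expected {} columns, got {}".format(len(COLUMNS), len(fields))
        )
    record: Dict[str, Any] = dict(zip(COLUMNS, fields))
    # Raw arguments are semicolon-separated within a single column.
    record["args"] = [a.strip() for a in record["args"].split(";") if a.strip()]
    # Assumed numeric types; adjust if the files use a different encoding.
    record["confidence"] = float(record["confidence"])
    record["frequency"] = int(record["frequency"])
    return record


# Synthetic sample line, for illustration only.
sample = "\t".join([
    "a1", "Barack Obama", "graduated", "from Harvard Law School; in 1991",
    "barack obama", "graduate", "from harvard law school; in 1991",
    "0.9", "3", "openie",
])
rec = parse_line(sample)
print(rec["sbj"], rec["rel"], rec["args"])
```

Given the 20GB decompressed size, iterating over each file line by line and calling a helper like this per line is likely more practical than loading a whole file into memory.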

Additional triple assertions used to build nOKB+ in the experiments are available from the OQA project homepage at the University of Washington.

ComplexQuestions Dataset

The ComplexQuestions dataset contains 300 open-domain complex questions with prepositional or adverbial constraints. Of these, 80 come from the WebQuestions dataset released by Berant et al. We manually labeled each question with at least one gold-standard answer found in our open KB.

download

Reference

  • Pengcheng Yin, Nan Duan, Ben Kao, Junwei Bao, and Ming Zhou. 2015. Answering Questions with Complex Semantic Constraints on Open Knowledge Bases. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM '15). ACM, New York, NY, USA, 1301-1310.