Open-domain question answering is emerging as a benchmark for measuring computational systems’ abilities to read, represent, and retrieve knowledge expressed in all of the documents on the web.
In this competition, contestants will develop a question answering system that contains all of the knowledge required to answer open-domain questions. There are no constraints on how the knowledge is stored: it could be in documents, databases, the parameters of a neural network, or any other form. However, three competition tracks encourage systems that store and access this knowledge using the smallest number of bytes, including code, corpora, and model parameters. There will also be an unrestricted track, in which the goal is to achieve the best possible question answering performance regardless of system size. The best performing systems from each of the tracks will be put to the test in a live competition against trivia experts during the NeurIPS 2020 competition track.
We have provided tutorials on a number of different-sized baseline systems. To be notified when the leaderboard launches in July 2020, and for up-to-date information on the competition and workshop, please sign up for our mailing list.
This competition will be evaluated using the open-domain variant of the Natural Questions question answering task. The questions in Natural Questions are real Google search queries, and each is paired with up to five reference answers. The challenge is to build a question answering system that can generate a correct answer given just a question as input.
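For intuition, here is a minimal sketch of exact-match scoring against multiple reference answers. The normalization steps (lowercasing, stripping punctuation and articles) follow common open-domain QA practice and are an assumption here, not the competition’s official evaluation code.

```python
# Minimal sketch of exact-match scoring against multiple references.
import re
import string


def normalize(text: str) -> str:
    """Lowercase, remove punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """A prediction counts as correct if it matches any reference after normalization."""
    return any(normalize(prediction) == normalize(ref) for ref in references)


# Example: either phrasing of the answer counts as correct.
print(exact_match("The Beatles", ["Beatles", "the Beatles"]))  # True
```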
This competition has four separate tracks. In the unrestricted track contestants are allowed to use arbitrary technology to answer questions, and submissions will be ranked according to the accuracy of their predictions alone.
There are also three restricted tracks in which contestants will have to upload their systems to our servers, where they will be run in a sandboxed environment without access to any external resources. In these three tracks, the goal is to build the most accurate system possible within a fixed budget on the total size of the submission, including code, corpora, and model parameters.
We will award prizes to the teams that create the top performing submissions in each restricted track.
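As a rough illustration of what counts toward a size budget, the sketch below sums the bytes of every file in a submission directory, so code, corpora, and model parameters are all covered. The directory name and byte limit are placeholders, not the competition’s official tooling.

```python
# Rough illustration of checking a submission against a byte budget.
import pathlib


def submission_size_bytes(root: str) -> int:
    """Total on-disk size, in bytes, of all files under `root`."""
    return sum(p.stat().st_size for p in pathlib.Path(root).rglob("*") if p.is_file())


# Hypothetical usage: compare against a track's byte budget (illustrative value only).
# budget = 6 * 1024 ** 3
# assert submission_size_bytes("my_submission/") <= budget
```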
More information on the task definition, data, and evaluation can be found here.
In practice, five reference answers are sometimes not enough: there are many ways an answer can be phrased, and sometimes there are multiple valid answers. At the end of the competition’s submission period, predictions from the best performing systems will be checked by humans, and the final ranking will be based on this human evaluation.
We have provided a tutorial for getting started with several baseline systems that either generate answers directly from a neural network or extract them from a corpus of text. You can find the tutorial here.
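As a hedged illustration of those two styles, the sketch below uses the Hugging Face transformers library with generic, publicly available models. These models are stand-ins rather than the competition baselines, and the generative approach would need a model fine-tuned for closed-book QA to answer reliably.

```python
# Two baseline styles: generate an answer directly, or extract it from text.
from transformers import pipeline

question = "Who wrote the novel Dracula?"

# 1. Generative: a sequence-to-sequence model produces the answer from the
#    question alone, with no external text. (t5-small is a placeholder; it is
#    not fine-tuned for closed-book QA.)
generator = pipeline("text2text-generation", model="t5-small")
print(generator(f"question: {question}")[0]["generated_text"])

# 2. Extractive: a reading-comprehension model selects an answer span from a
#    passage (the retrieval step is omitted here for brevity).
passage = "Dracula is an 1897 Gothic horror novel by Irish author Bram Stoker."
extractor = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print(extractor(question=question, context=passage)["answer"])
```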
| Date | Milestone |
| --- | --- |
| July 2020 | Leaderboard launched. |
| October 14, 2020 | Leaderboard frozen. |
| November 14, 2020 | Human evaluation completed and winners announced. |
| December 11-12, 2020 | NeurIPS workshop and human-computer competition (held virtually). |
Please email email@example.com with any questions.