The EfficientQA competition has now concluded.
Update August 8, 2021: Our paper has been published in PMLR (Competitions and Demonstrations Track @NeurIPS2020). Please find the published version of the paper here.
Update July 23, 2021: Based on multiple requests, we have released the EfficientQA test data, with human ratings used in the report. You can find it here.
Update January 1, 2021: Our technical report is available here.
The competition organizers will give an overview of the EfficientQA competition, the results for each of the four tracks, and a post-hoc human analysis of each top-performing system’s predictions.
The top-performing teams from each track will present a brief description of their submissions.
Five top teams of trivia experts took on the competition’s baseline systems for the chance to face the best computer systems in each of the competition’s divisions. Each human team will compete against the computers on thirty questions from the test set. We will present highlights from the preliminary competition as well as the final showdown between the computer systems and the human teams.
Because our workshop slot is limited, you can watch (most of) the human vs. computer showdown videos before the live event!
- Microsoft Research & Dynamics 365
- Studio Ousia & Tohoku University
- Brno University of Technology
Open-domain question answering is emerging as a benchmark for measuring computational systems’ ability to read, represent, and retrieve knowledge expressed in the documents on the web.
In this competition, contestants will develop a question answering system that contains all of the knowledge required to answer open-domain questions. There are no constraints on how the knowledge is stored: it could be in documents, databases, the parameters of a neural network, or any other form. However, three competition tracks encourage systems that store and access this knowledge using the smallest number of bytes, including code, corpora, and model parameters. There will also be an unrestricted track, in which the goal is to achieve the best possible question answering performance with no constraints. The best performing systems from each of the tracks will be put to the test in a live competition against trivia experts during the NeurIPS 2020 competition track.
All submissions to the memory constrained tracks must be made to the EfficientQA leaderboard before 2020/11/14. Once the leaderboard is closed, the test set inputs will be released, and participants will have until the end of 2020/11/17 to submit predictions to the unrestricted track. To receive notifications regarding this process, please sign up to our mailing list.
This competition will be evaluated using the open domain variant of the Natural Questions question answering task. The questions in Natural Questions are real Google search queries, and each is paired with up to five reference answers. The challenge is to build a question answering system that can generate a correct answer given just a question as input.
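To make the scoring concrete, here is a minimal sketch of the exact-match evaluation style standard for open-domain Natural Questions: a prediction counts as correct if it equals any reference answer after light normalization. The normalization steps shown (lower-casing, stripping punctuation and English articles, collapsing whitespace) follow common open-domain QA practice; the competition’s official scoring script may differ in its details.

```python
import re
import string

PUNCTUATION = set(string.punctuation)

def normalize(text: str) -> str:
    """Lower-case, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, references) -> bool:
    """True if the prediction matches any of the (up to five) reference answers."""
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in references)

# Different surface forms of the same answer still count as a match.
print(exact_match("The Beatles!", ["Beatles", "the beatles"]))  # True
```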
This competition has four separate tracks. In the unrestricted track contestants are allowed to use arbitrary technology to answer questions, and submissions will be ranked according to the accuracy of their predictions alone.
There are also three restricted tracks in which contestants will have to upload their systems to our servers, where they will be run in a sandboxed environment, without access to any external resources. In these three tracks, the goal is to build:
- the best-performing system under 6GiB;
- the best-performing system under 500MiB;
- the smallest system that achieves at least 25% accuracy.
More information on the task definition, data, and evaluation can be found here.
Each of the restricted tracks comes with a prize of approximately $3000 in Google Cloud credits. Each submitter can win at most one track. Please go to the leaderboard site for full terms and conditions.
In practice, five reference answers are sometimes not enough: there are many ways in which an answer can be phrased, and sometimes there are multiple valid answers. At the end of this competition’s submission period, predictions from the best performing systems will be checked by humans. This human evaluation will be used to calculate an alternate ranking, which will be presented alongside the automatic leaderboard ranking at NeurIPS 2020.
We have provided a tutorial for getting started with several baseline systems that either generate answers directly from a neural network or extract them from a corpus of text. You can find the tutorial here.
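As one illustration of the first kind of baseline, the sketch below generates answers directly from a neural network’s parameters, with no retrieval step, using the Hugging Face transformers library. The checkpoint name (google/t5-small-ssm-nq, a T5 model fine-tuned for closed-book Natural Questions) is an assumption for illustration and is not necessarily the model used in the tutorial.

```python
# A minimal sketch of a closed-book baseline: the answer is generated directly
# from the model's parameters, with no document retrieval. The checkpoint below
# is an illustrative assumption, not necessarily the tutorial's baseline.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

CHECKPOINT = "google/t5-small-ssm-nq"  # T5 fine-tuned for closed-book NQ

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)

def answer(question: str) -> str:
    """Generate a short answer string for a single question."""
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=16)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(answer("who wrote the opera carmen?"))
```

An extractive baseline would instead pair a retriever over a text corpus with a reading-comprehension model; either style is admissible, as long as the packaged system fits within a track’s size budget.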
|September 14, 2020|Leaderboard opens for submissions.|
|November 14, 2020|Leaderboard closes to memory constrained track submissions.|
|November 30, 2020|Human evaluation completed and results announced.|
|December 11-12, 2020|NeurIPS workshop and human-computer competition (held virtually).|
Please email firstname.lastname@example.org with any questions.