The official website for the open domain question answering challenge at NeurIPS 2020.
EfficientQA has multiple tracks. There are three restricted tracks that judge submissions on the basis of their size as well as the accuracy of their predictions.
All submissions to the restricted tracks must be made through the EfficientQA leaderboard. You can also submit to the unrestricted track using the EfficientQA leaderboard. However, if your system will not run on the leaderboard hardware, we will also release the test set input on 2020/11/15, after the leaderboard is frozen, and you will have until the end of 2020/11/17 to send us predictions to be evaluated for the unrestricted track only.
This page contains general submission instructions, as well as end-to-end walk-throughs of submitting a T5 baseline and an ORQA-based model.
EfficientQA leaderboard submissions are Docker images, uploaded to the Google Container Registry. Submissions must contain an executable script ~/submission.sh that will be run with the following command:
./submission.sh <input_file> <output_file>
where <input_file> contains one JSON example per line, and each example contains only a question field:
{"question": "who won the women's australian open 2018"}
{"question": "kuchipudi is a dance form of which state"}
{"question": "who did stephen amell play in private practice"}
{"question": "who created the chamber of secrets in harry potter"}
And /submission.sh should write predictions to <output_file>, one per line, as follows:
{"question": "who won the women's australian open 2018", "prediction": "Caroline Wozniacki"}
{"question": "kuchipudi is a dance form of which state", "prediction": "Tamil Nadu"}
{"question": "who did stephen amell play in private practice", "prediction": "a pedestrian"}
{"question": "who created the chamber of secrets in harry potter", "prediction": "the Heir of Salazar Slytherin"}.
The Docker image containing /submission.sh must be fully self-contained. It will not be allowed to pull in libraries or any other external resources.
Before submitting your image, please test it locally using the following commands:
INPUT_DIR=/tmp/efficientqa_input
OUTPUT_DIR=/tmp/efficientqa_output
EVAL_DIR=/tmp/efficientqa_eval
mkdir ${INPUT_DIR}
mkdir ${OUTPUT_DIR}
mkdir ${EVAL_DIR}
wget https://raw.githubusercontent.com/google-research-datasets/natural-questions/master/nq_open/NQ-open.efficientqa.dev.no-annotations.jsonl -P "${INPUT_DIR}"
wget https://raw.githubusercontent.com/google-research-datasets/natural-questions/master/nq_open/NQ-open.efficientqa.dev.jsonl -P "${EVAL_DIR}"
docker pull gcr.io/<your_project_id>/<your_image_name>:<your_image_tag>
docker run -v ${INPUT_DIR}:/input -v ${OUTPUT_DIR}:/output \
--network="none" \
gcr.io/<your_project_id>/<your_image_name>:<your_image_tag> \
./submission.sh \
/input/NQ-open.efficientqa.dev.no-annotations.jsonl \
/output/predictions.jsonl
cd ${EVAL_DIR}
git clone https://github.com/google-research/language.git
pip3 install tensorflow
python3 -m language.orqa.evaluation.evaluate_predictions \
--references_path=${EVAL_DIR}/NQ-open.efficientqa.dev.jsonl \
--predictions_path=${OUTPUT_DIR}/predictions.jsonl
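If you want a quick programmatic sanity check before running the official scorer, the sketch below approximates the exact-match metric by comparing normalized strings. It assumes each line of the reference file contains a question plus a list-valued answer field; it is only an approximation of what language.orqa.evaluation.evaluate_predictions computes, so use the official script for reported numbers.
# approx_eval.py - rough exact-match check; an approximation only, the
# official evaluation script above is authoritative.
import json
import re
import string

def normalize(text):
  # Lowercase, drop punctuation and articles, and collapse whitespace.
  text = text.lower()
  text = ''.join(ch for ch in text if ch not in set(string.punctuation))
  text = re.sub(r'\b(a|an|the)\b', ' ', text)
  return ' '.join(text.split())

def approx_accuracy(references_path, predictions_path):
  answers = {}
  with open(references_path) as f:
    for line in f:
      example = json.loads(line)
      answers[example['question']] = example['answer']
  correct = total = 0
  with open(predictions_path) as f:
    for line in f:
      example = json.loads(line)
      references = answers[example['question']]
      correct += any(
          normalize(example['prediction']) == normalize(ref)
          for ref in references)
      total += 1
  return correct / total

print(approx_accuracy('/tmp/efficientqa_eval/NQ-open.efficientqa.dev.jsonl',
                      '/tmp/efficientqa_output/predictions.jsonl'))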
You should also ensure that you have set the permissions correctly, as detailed below.
Test submissions are run using Google Cloud, and you will need to create a Google Cloud Platform account. Once you have a Google Cloud account, you will need to create a project for your submissions in your console, and you will need to give us permission to access this project, in particular the Container Registry storage bucket artifacts.<project-name>.appspot.com. Now you can either upload your submission directly, or you can use the Cloud SDK to build it as follows.
cd "${SUBMISSION_DIR}"
MODEL_TAG=latest
gcloud auth login
gcloud config set project <your_project_id>
gcloud services enable cloudbuild.googleapis.com
gcloud builds submit --tag gcr.io/<your_project_id>/${MODEL}:${MODEL_TAG} .
And you can then submit this image by following the instructions on the leaderboard submission page. We suggest that you first run your submission with the test option, to ensure that it runs on a 100-example dev-set sample, before submitting an official attempt.
This walk-through uses the T5 baseline. We are using the smallest model, t5.1.1.small_ssm_nq, for efficiency, but you could replace this with t5.1.1.xl_ssm_nq to get the 3B parameter model.
First, download the EfficientQA development set input examples. We will be using these to test our submission locally. This file is not the same as the standard development set file, because it does not contain answers. If your submission expects an answer field in the input examples, it will crash. Please make sure you test your submission with these examples as input.
INPUT_DIR=~/efficientqa_input
mkdir ${INPUT_DIR}
wget https://raw.githubusercontent.com/google-research-datasets/natural-questions/master/nq_open/NQ-open.efficientqa.dev.no-annotations.jsonl -P ${INPUT_DIR}
Create a submission directory and follow the instructions to download and export a T5 model.
# Create the submission directory.
SUBMISSION_DIR=~/t5_efficientqa_submission
MODEL_DIR="${SUBMISSION_DIR}/models"
SRC_DIR="${SUBMISSION_DIR}/src"
mkdir -p "${MODEL_DIR}"
mkdir -p "${SRC_DIR}"
# Install t5
pip install -qU t5
# Select one of the models below by un-commenting it.
MODEL=t5.1.1.small_ssm_nq
#MODEL=t5.1.1.xl_ssm_nq
# Export the model.
t5_mesh_transformer \
--model_dir="gs://t5-data/pretrained_models/cbqa/${MODEL}" \
--use_model_api \
--mode="export" \
--export_dir="${MODEL_DIR}/${MODEL}"
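Before wiring this up to TF Serving, you can optionally sanity-check the export from Python. This is just a sketch; it assumes TensorFlow is installed locally and that the exporter wrote the SavedModel under a numeric version subdirectory of the export directory.
# check_export.py - optional sanity check of the exported SavedModel.
import glob
import os
import tensorflow as tf

model_dir = os.path.expanduser(
    '~/t5_efficientqa_submission/models/t5.1.1.small_ssm_nq')
# The exporter writes a numeric version subdirectory; take the latest one.
version_dir = sorted(glob.glob(os.path.join(model_dir, '*')))[-1]
loaded = tf.saved_model.load(version_dir)
print('Signatures:', list(loaded.signatures.keys()))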
Our example makes use of TensorFlow Serving to serve our model, so all we need to do is create an inference script that calls the model server for each input example and outputs predictions in the required format. Create a file predict.py in your ${SRC_DIR} that contains the code below.
# Prediction script for T5 running with TF Serving.
from absl import app
from absl import flags
import json
import requests

flags.DEFINE_string('server_host', 'http://localhost:8501', '')
flags.DEFINE_string('model_path', '/v1/models/t5.1.1.small_ssm_nq',
                    'Path to model, TF-serving adds the `v1` prefix.')
flags.DEFINE_string('input_path', '', 'Path to input examples.')
flags.DEFINE_string('output_path', '', 'Where to output predictions.')
flags.DEFINE_bool('verbose', True, 'Whether to log all predictions.')

FLAGS = flags.FLAGS


def main(_):
  server_url = FLAGS.server_host + FLAGS.model_path + ':predict'
  with open(FLAGS.output_path, 'w') as fout:
    with open(FLAGS.input_path) as fin:
      for l in fin:
        example = json.loads(l)
        predict_request = '{{"inputs": ["nq question: {0}?"]}}'.format(
            example['question']).encode('utf-8')
        response = requests.post(server_url, data=predict_request)
        response.raise_for_status()
        predicted_answer = response.json()['outputs']['outputs'][0]
        if FLAGS.verbose:
          print('{0} -> {1}'.format(example['question'], predicted_answer))
        fout.write(
            json.dumps(
                dict(question=example['question'], prediction=predicted_answer))
            + '\n')


if __name__ == '__main__':
  app.run(main)
We can test this locally using the tensorflow-serving Docker image.
docker pull tensorflow/serving:nightly
docker run -t --rm -p 8501:8501 \
-v ${MODEL_DIR}:/models -e MODEL_NAME=${MODEL} tensorflow/serving:nightly &
python3 "${SRC_DIR}/predict.py" \
--input_path="${INPUT_DIR}/NQ-open.efficientqa.dev.no-annotations.jsonl" \
--output_path="/tmp/predictions.jsonl"
Now we just need to create our executable submission.sh that will call predict.py. This uses the tensorflow-serving binary, which will be packaged in our Docker submission (instructions below). As with predict.py, create this in your ${SRC_DIR}.
#!/bin/bash
# Path to T5 saved model.
MODEL_BASE_PATH='/models'
MODEL_NAME='t5.1.1.small_ssm_nq'
MODEL_PATH="${MODEL_BASE_PATH}/${MODEL_NAME}"
# Get predictions for all questions in the input.
INPUT_PATH=$1
OUTPUT_PATH=$2
# Start the model server and wait, to give it time to come up.
tensorflow_model_server --port=8500 --rest_api_port=8501 \
  --model_name=${MODEL_NAME} --model_base_path=${MODEL_PATH} &
sleep 20
# Now run predictions on input file.
echo 'Running predictions.'
python predict.py --model_path="/v1/models/${MODEL_NAME}" \
--verbose=false \
--input_path=$INPUT_PATH --output_path=$OUTPUT_PATH
echo 'Done predicting.'
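The fixed sleep 20 is a simple way to give the model server time to start. If you find it flaky, one alternative is to poll until the model reports itself as available before running predict.py. The sketch below assumes TF Serving's REST model-status endpoint at /v1/models/<model_name>; adjust the model name and timeout to your setup.
# wait_for_server.py - optional replacement for the fixed sleep: poll the
# TF Serving model status endpoint until the model is AVAILABLE.
import sys
import time
import requests

URL = 'http://localhost:8501/v1/models/t5.1.1.small_ssm_nq'
DEADLINE = time.time() + 120  # give up after two minutes

while time.time() < DEADLINE:
  try:
    status = requests.get(URL).json()
    states = [v.get('state') for v in status.get('model_version_status', [])]
    if 'AVAILABLE' in states:
      sys.exit(0)
  except requests.exceptions.ConnectionError:
    pass  # server not up yet
  time.sleep(2)

sys.exit(1)  # timed out waiting for the model server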
Make sure that submission.sh is executable, and then create the following Dockerfile in ${SUBMISSION_DIR}/Dockerfile. This defines a Docker image that contains all of our code, libraries, and data.
ARG TF_SERVING_VERSION=2.3.0
ARG TF_SERVING_BUILD_IMAGE=tensorflow/serving:${TF_SERVING_VERSION}-devel
FROM ${TF_SERVING_BUILD_IMAGE} as build_image
FROM python:3-slim-buster
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
&& \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Install TF Serving pkg.
COPY --from=build_image /usr/local/bin/tensorflow_model_server /usr/bin/tensorflow_model_server
# Install python packages.
RUN pip install absl-py
RUN pip install requests
ADD src .
# The tensorflow serving Docker image expects a model directory at `/models` and
# this will be mounted at `/v1/models`.
ADD models models/
Build and then test this Docker image.
docker build --tag "$MODEL" "${SUBMISSION_DIR}/."
docker run -v "${INPUT_DIR}:/input" -v "/tmp:/output" "${MODEL}" bash \
"submission.sh" \
"input/NQ-open.efficientqa.dev.no-annotations.jsonl" \
"output/predictions.jsonl"
And you can find the size of the image as follows.
docker run "${MODEL}" du -h /
Please don’t override the du command. We will also use other methods of checking the size of your submission, and will remove any submissions that have modified the standard definition of du.
Evaluate your predictions using the instructions above, to ensure that they are in the correct format and that the accuracy is as expected. If everything looks good locally, you are ready to upload your image to the submission system, following the instructions above.
This is an example of uploading an ORQA-based model to EfficientQA. Unlike the T5 tutorial above, it is much less optimized: it keeps many unnecessary dependencies and doesn’t use the provided GPUs.
Your working directory should look like this by the time you build the Docker image:
Dockerfile
src/
  language/
    common/
    ...
    orqa/
    ...
model/
  params.json
  blocks.tfr
  bert/
  ...
  embedder/
  ...
  export/
    best_default/
      checkpoint/
      ...
submission.sh
compile_custom_ops.sh
Download the language repository, which contains the ORQA code, and remove everything at the top level that isn’t either common or orqa:
git clone git@github.com:google-research/language.git
rm -r -v !("common"|"orqa")
Download the model corresponding to the model_dir flag in the ORQA codebase (see the README at https://github.com/google-research/language/tree/master/language/orqa for details). The ORQA model fine-tuned from REALM pre-training can be found on Google Cloud Storage: gs://realm-data/orqa_nq_model_from_realm.
params.json in the model directory contains paths to important files that are typically on GCS. Since EfficientQA does not permit downloading from the internet, we need to download those files and rewrite those paths to point to local directories. This includes the files block_records_path (which contains the Wikipedia text), reader_module_path, and retriever_module_path. For example, since the original params.json contains
"block_records_path": "gs://orqa-data/enwiki-20181220/blocks.tfr",
We would need to run
gsutil cp gs://orqa-data/enwiki-20181220/blocks.tfr .
and rewrite that line as:
"block_records_path": "blocks.tfr",
submission.sh should contain the following command:
python3.7 -m language.orqa.predict.orqa_predict \
--dataset_path=$1 \
--predictions_path=$2 \
--print_prediction_samples=false \
--model_dir=/
ORQA uses a few custom ops written in C++ that should be compiled in the Docker environment. compile_custom_ops.sh should contain:
#!/bin/bash
TF_CFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))') )
TF_LFLAGS=( $(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))') )
g++ -std=c++11 -shared language/orqa/ops/orqa_ops.cc -o language/orqa/ops/orqa_ops.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2
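After the image builds, you can quickly confirm from inside the container that the compiled library loads. This is just a sketch; tf.load_op_library is standard TensorFlow, but the exact ops exported depend on orqa_ops.cc.
# check_ops.py - confirm the compiled custom ops library can be loaded.
import tensorflow as tf

orqa_ops = tf.load_op_library('language/orqa/ops/orqa_ops.so')
print('Loaded custom ops module:', orqa_ops)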
Finally, we can put everything together in the Dockerfile.
FROM tensorflow/tensorflow:2.1.1-gpu
COPY src /
COPY model /
COPY compile_custom_ops.sh /
COPY submission.sh /
RUN add-apt-repository ppa:ubuntu-toolchain-r/test && \
apt-get update && \
apt-get upgrade -y libstdc++6
RUN apt-get update && \
apt-get install -y --no-install-recommends ca-certificates && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN add-apt-repository ppa:deadsnakes/ppa && apt-get update && apt-get install -y python3.7
RUN python3.7 -m pip install --upgrade pip
RUN pip install tensorflow-text~=2.1.0
RUN pip install tf-models-official==2.1.0.dev2
RUN pip install bert-tensorflow==1.0.4
RUN pip install tf-hub-nightly
RUN pip install sentencepiece==0.1.91
RUN pip install https://storage.googleapis.com/scann/releases/1.0.0/scann-1.0.0-cp37-cp37m-linux_x86_64.whl
RUN ./compile_custom_ops.sh
Unfortunately, in order to use ScaNN (Google’s MIPS library), we need to upgrade libstdc++6, and that somehow interferes with the ability to use GPUs.
Once everything is in place, you should follow the instructions above to build your image and submit it to the EfficientQA leaderboard.