PeKo: A Large Scale Precondition Knowledge Dataset

Overview

PeKo (Precondition Knowledge) is a large-scale, crowdsourced event precondition knowledge dataset introduced in our paper “Modeling Preconditions in Text with a Crowd-sourced Dataset” at Findings of EMNLP 2020.

The preprint is available here.

Crowdsourcing Precondition Knowledge

Data Preparation

We extract events and their temporal relations from news articles using CAEVO (Chambers et al., 2014), a temporal relation extraction system. We ran CAEVO on a random sample of 6,837 articles from the New York Times Annotated Corpus (Sandhaus, 2008). On average, CAEVO extracted around 63 events per article, yielding roughly 63 × 62 = 3,906 ordered event-pair candidates per document. We filtered these to retain only pairs of events with a BEFORE or AFTER temporal relation between them. We call the temporally preceding event the candidate precondition and the temporally subsequent event the target event.
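The pair-construction step above can be sketched as follows. This is a minimal illustration assuming a simple tuple representation of extracted relations; CAEVO's actual output is richer (TimeML-style annotations), so the function and field names here are hypothetical.

```python
def build_candidate_pairs(relations):
    """Keep only BEFORE/AFTER relations and orient each pair so the
    temporally earlier event becomes the candidate precondition.
    (Hypothetical representation; CAEVO's real output format differs.)"""
    pairs = []
    for e1, e2, rel in relations:
        if rel == "BEFORE":          # e1 precedes e2
            pairs.append({"candidate": e1, "target": e2})
        elif rel == "AFTER":         # e2 precedes e1
            pairs.append({"candidate": e2, "target": e1})
        # all other relation types (INCLUDES, SIMULTANEOUS, ...) are dropped
    return pairs

relations = [
    ("signed", "announced", "BEFORE"),
    ("resigned", "appointed", "AFTER"),
    ("met", "spoke", "INCLUDES"),    # dropped: not BEFORE/AFTER
]
print(build_candidate_pairs(relations))
```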

Crowdsourcing Task

The annotators were presented with a text snippet in which two event mentions were highlighted, as shown below. To prune out event extraction errors from CAEVO, the annotators were first asked whether the highlighted text denoted valid events. If both triggers were deemed valid, the annotators then judged whether the candidate precondition event was an actual precondition for the target event, i.e., whether the candidate event is necessary for the target event to happen.

HIT example

As a result of crowdsourcing, the dataset contains 10,806 precondition instances out of 28,948 annotated event pairs in total.

Tasks

We propose two tasks that test the ability to recognize and generate preconditions in textual contexts. Below we describe evaluations that benchmark current models on these tasks and help characterize the challenges involved.

PeKo Task 1: Precondition Identification

Given a text snippet with a target and candidate event pair, the task is to classify whether the candidate event is a precondition for the target in the context described by the snippet. This is a standard sentence-level classification task.
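One common way to feed such an instance to a sentence classifier is to mark the two event triggers in the text with special tokens. The sketch below illustrates the idea; the marker tokens and token-index interface are assumptions for illustration, not necessarily the encoding used in the paper.

```python
def mark_events(tokens, cand_idx, tgt_idx):
    """Wrap the candidate and target trigger tokens in marker tokens
    so a classifier can locate them. (Hypothetical markers; the
    paper's exact input encoding may differ.)"""
    out = []
    for i, tok in enumerate(tokens):
        if i == cand_idx:
            out.extend(["<cand>", tok, "</cand>"])
        elif i == tgt_idx:
            out.extend(["<tgt>", tok, "</tgt>"])
        else:
            out.append(tok)
    return " ".join(out)

print(mark_events(["He", "signed", "the", "deal", "and", "announced", "it"], 1, 5))
```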

Result Table

PeKo Task 2: Precondition Generation

Here we introduce precondition generation as a more general challenge that a dataset like PeKo enables. Given a target event t, generate an event p that is a precondition for t. We benchmark performance on evaluation instances drawn from both PeKo and an out-of-domain dataset, ATOMIC.
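For a sequence-to-sequence model, each generation instance reduces to a (source, target) text pair: the model reads the target event and must produce a precondition for it. The prompt template below is a hypothetical sketch; the released `*_gen_*.txt` files may use a different layout.

```python
def format_generation_pair(target_event, precondition):
    """Build a (source, target) training pair for precondition
    generation. (Hypothetical prompt template, for illustration.)"""
    source = f"What is a precondition of: {target_event}"
    return source, precondition

src, tgt = format_generation_pair(
    "the company announced the merger",
    "the boards approved the deal",
)
print(src)
print(tgt)
```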

Generation Result Table

Download

The dataset can be downloaded here.

Citation

Please use the following BibTeX entry:

@article{kwon2020modeling,
  title={Modeling Preconditions in Text with a Crowd-sourced Dataset},
  author={Kwon, Heeyoung and Koupaee, Mahnaz and Singh, Pratyush and Sawhney, Gargi and Shukla, Anmol and Kallur, Keerthi Kumar and Chambers, Nathanael and Balasubramanian, Niranjan},
  journal={arXiv preprint arXiv:2010.02429},
  year={2020}
}

Dataset Information

data
 ├── peko_all.jsonl             # PeKo dataset
 ├── peko_gen_train.txt         # PeKo generation instances
 ├── peko_gen_dev.txt
 ├── peko_gen_test.txt
 ├── temp_gen_train.txt         # Generation instances for temporal model
 ├── temp_gen_dev.txt
 ├── LM_gen_train.txt           # Generation instances for plain language model
 ├── LM_gen_dev.txt
 └── atomic_samples.txt         # ATOMIC samples for generation task
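The main file, `peko_all.jsonl`, is in JSON Lines format (one JSON object per line), which can be read with the standard library alone. The field names in this sketch are hypothetical placeholders; consult the actual file for the real schema.

```python
import io
import json

def load_peko(fp):
    """Read one JSON object per line (JSON Lines format)."""
    return [json.loads(line) for line in fp if line.strip()]

# Stand-in for open("data/peko_all.jsonl"); field names are
# hypothetical and may differ from the released schema.
sample = io.StringIO(
    '{"context": "He signed the deal and announced it.", '
    '"candidate": "signed", "target": "announced", "label": 1}\n'
)

records = load_peko(sample)
print(len(records), records[0]["label"])
```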

Contributors