
BioNLI

Biomedical Natural Language Inference Dataset

What is BioNLI?

BioNLI is a biomedical NLI dataset built using controllable text generation.

This is the official page for the paper:

BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples
accepted at Findings of EMNLP 2022.

BioNLI is the first dataset in biomedical natural language inference. It contains abstracts from the biomedical literature and mechanistic hypotheses generated with nine different strategies.

Example

The following is an example of an entry in the BioNLI dataset; some supporting text was removed to save space. The premise is a set of sentences about two biomedical entities. The consistent hypothesis is the original conclusion sentence from the abstract, and the inconsistent hypothesis is a sentence generated with one of the nine strategies.
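As a rough illustration of the entry structure described above, the sketch below models one entry and expands it into two labeled premise/hypothesis pairs. The class name `BioNLIEntry`, its field names, and the label strings are hypothetical choices for this example; the released files may use different keys and label names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BioNLIEntry:
    # Hypothetical field names; the released data may use different keys.
    premise: str                  # supporting sentences about two entities
    consistent_hypothesis: str    # original conclusion sentence of the abstract
    inconsistent_hypothesis: str  # sentence generated by one of the nine strategies
    strategy: str                 # name of the strategy that produced it


def to_nli_pairs(entry: BioNLIEntry) -> List[Tuple[str, str, str]]:
    """Expand one entry into (premise, hypothesis, label) triples."""
    return [
        (entry.premise, entry.consistent_hypothesis, "consistent"),
        (entry.premise, entry.inconsistent_hypothesis, "inconsistent"),
    ]


# Toy, made-up text used only to show the shape of the output.
entry = BioNLIEntry(
    premise="Sentences describing how entity A affects entity B.",
    consistent_hypothesis="A activates B.",
    inconsistent_hypothesis="A inhibits B.",
    strategy="example_strategy",
)
for premise, hypothesis, label in to_nli_pairs(entry):
    print(label, "|", hypothesis)
```

Each entry therefore yields one positive and one adversarial pair that share the same premise.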

Coming Soon

Dataset Statistics

There are two versions of this dataset: the full distribution, which contains all possible perturbations, and the balanced distribution. Both share the same test set. For the full distribution, we generate as many perturbations as possible for the dev and test sets, but each training instance is perturbed only once.
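The per-split rule for the full distribution can be sketched as follows. This is an illustration of the rule as stated, not the authors' pipeline: the function `build_split`, the input mapping from a hypothetical instance id to its generated perturbations, and the random choice of the single training perturbation are all assumptions (the page does not say how that one perturbation is selected).

```python
import random
from typing import Dict, List, Tuple


def build_split(
    perturbations: Dict[str, List[str]],
    split: str,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return (instance_id, perturbed_hypothesis) pairs for one split.

    `perturbations` maps a hypothetical instance id to every perturbed
    hypothesis generated for it. Train keeps one perturbation per
    instance; dev and test keep all of them.
    """
    if split not in {"train", "dev", "test"}:
        raise ValueError(f"unknown split: {split!r}")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs: List[Tuple[str, str]] = []
    for instance_id in sorted(perturbations):
        candidates = perturbations[instance_id]
        if not candidates:
            continue  # nothing was generated for this instance
        if split == "train":
            # Assumption: the single training perturbation is picked at random.
            pairs.append((instance_id, rng.choice(candidates)))
        else:
            pairs.extend((instance_id, c) for c in candidates)
    return pairs
```

Under this rule the training split grows linearly with the number of instances, while dev and test grow with the total number of perturbations generated.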

Full Distribution:

[Image: statistics of the full distribution]

Balanced Distribution:

[Image: statistics of the balanced distribution]
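The statistics above are only available as images. To compute per-split label counts locally after downloading the data, a sketch like the following may help. It assumes each record has been loaded as a mapping with hypothetical `"split"` and `"label"` keys; the actual file format and column names may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def label_stats(records: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count how many records carry each label, grouped by split.

    Each record is assumed to have hypothetical "split" and "label" keys.
    """
    stats: Dict[str, Counter] = {}
    for rec in records:
        # Create a counter for a split the first time it is seen.
        stats.setdefault(rec["split"], Counter())[rec["label"]] += 1
    return {split: dict(counts) for split, counts in stats.items()}


# Toy records, not real BioNLI data.
records = [
    {"split": "train", "label": "consistent"},
    {"split": "train", "label": "inconsistent"},
    {"split": "dev", "label": "inconsistent"},
    {"split": "dev", "label": "inconsistent"},
]
print(label_stats(records))
```

Comparing the consistent/inconsistent counts per split is a quick way to check which distribution a local copy corresponds to.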

Download the data

The dataset is available in two versions:

The full set can be downloaded from here.

The balanced set can be downloaded from here.

To access the test set, please contact me.

License

BioNLI is distributed under the CC BY 4.0 license.

Like our work? Cite us!

Please use the following BibTeX entry:

@inproceedings{bastan-etal-2022-bionli,
    title = "{B}io{NLI}: Generating a Biomedical {NLI} Dataset Using Lexico-semantic Constraints for Adversarial Examples",
    author = "Bastan, Mohaddeseh  and
      Surdeanu, Mihai  and
      Balasubramanian, Niranjan",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.374",
    pages = "5093--5104",
}