MedicationQA

Bases: DatasetBuilder

Medication Question Answering created using real consumer questions.

Paper: Bridging the Gap between Consumers’ Medication Questions and Trusted Answers (2019). Asma Ben Abacha, Yassine Mrabet, Mark Sharp, Travis Goodwin, Sonya E. Shooshan, and Dina Demner-Fushman. http://ebooks.iospress.nl/publication/51941

The dataset is not split: only the "train" split is available.

Dataset version used: https://huggingface.co/datasets/truehealth/medicationqa/viewer/default/train

Source code in medplexity/benchmarks/medicationqa/medicationqa_dataset_builder.py
class MedicationQADatasetBuilder(DatasetBuilder):
    """Medication Question Answering created using real consumer questions.

    Paper: Bridging the Gap between Consumers’ Medication Questions and Trusted Answers.
    2019 * Asma Ben Abacha, Yassine Mrabet, Mark Sharp, Travis Goodwin, Sonya E. Shooshan and Dina Demner-Fushman
    <http://ebooks.iospress.nl/publication/51941>

    No dataset splitting (only "train" split).

    Dataset version used: <https://huggingface.co/datasets/truehealth/medicationqa/viewer/default/train>
    """

    def build_dataset(
        self,
        split_type: str = "train",
        config=None,
    ) -> Dataset[MedicationQADataPoint]:
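        # Load the requested split of the truehealth/medicationqa dataset.
        dataset = self.loader.load("truehealth/medicationqa", split=split_type)

        # Parse each raw row into a MedicationQAEntry.
        questions = [MedicationQAEntry(**row) for row in dataset]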

        data_points = [
            MedicationQADataPoint(
                id=f"{split_type}-{i}",
                input=question.question,
                expected_output=None,  # the reference answer is carried in metadata.answer
                metadata=MedicationQAMetaData(
                    answer=question.answer,
                    focus=question.focus,
                    question_type=question.question_type,
                    section_title=question.section_title,
                    url=question.url,
                ),
            )
            for i, question in enumerate(questions)
        ]

        return Dataset[MedicationQADataPoint](
            data_points=data_points, description=self.__doc__
        )
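
The row-to-data-point mapping performed by build_dataset can be tried in isolation with the sketch below. It is an illustration under stated assumptions, not the library's implementation: the two dataclasses are hypothetical stand-ins for MedicationQADataPoint and MedicationQAMetaData, the helper build_data_points is invented for this example, the row keys are assumed to match the attribute names used in the source above, and the sample row is made up rather than taken from the dataset.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the medplexity types; the real classes may differ.
@dataclass
class MedicationQAMetaData:
    answer: str
    focus: str
    question_type: str
    section_title: str
    url: str


@dataclass
class MedicationQADataPoint:
    id: str
    input: str
    expected_output: Optional[str]
    metadata: MedicationQAMetaData


def build_data_points(rows, split_type="train"):
    """Mirror the mapping in build_dataset: one data point per row."""
    return [
        MedicationQADataPoint(
            id=f"{split_type}-{i}",  # ids are "<split>-<row index>"
            input=row["question"],
            expected_output=None,  # the answer is stored in metadata instead
            metadata=MedicationQAMetaData(
                answer=row["answer"],
                focus=row["focus"],
                question_type=row["question_type"],
                section_title=row["section_title"],
                url=row["url"],
            ),
        )
        for i, row in enumerate(rows)
    ]


# Made-up row for illustration only; key names are an assumption.
sample_rows = [
    {
        "question": "example question?",
        "answer": "example answer",
        "focus": "example drug",
        "question_type": "example type",
        "section_title": "example section",
        "url": "https://example.org",
    }
]

points = build_data_points(sample_rows)
print(points[0].id)
print(points[0].metadata.answer)
```

Because expected_output is always None here, any evaluation that needs the reference answer would have to read it from metadata.answer.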