Bases: DatasetBuilder
VQA-RAD (Visual Question Answering in Radiology) is a dataset comprising 3,515 question-answer pairs on 315 radiology images, designed for visual question answering tasks.
Train/test splits available.
Paper: "A dataset of clinically generated visual questions and answers about radiology images" · 2018 · Jason J. Lau, Soumya Gayen, Asma Ben Abacha, Dina Demner-Fushman
https://www.nature.com/articles/sdata2018251
Dataset version used: https://huggingface.co/datasets/flaviagiammarino/vqa-rad
Source code in medplexity/benchmarks/vqarad/vqarad_dataset_builder.py
class VQARadDatasetBuilder(DatasetBuilder):
    """
    VQA-RAD (Visual Question Answering in Radiology) is a dataset comprising 3,515 question-answer pairs on 315 radiology images, designed for visual question answering tasks.
    Train/test splits available.
    Paper: "A dataset of clinically generated visual questions and answers about radiology images"
    · 2018 · Jason J. Lau, Soumya Gayen, Asma Ben Abacha, Dina Demner-Fushman
    <https://www.nature.com/articles/sdata2018251>
    Dataset version used: <https://huggingface.co/datasets/flaviagiammarino/vqa-rad>
    """

    def build_dataset(
        self,
        split_type: VQARadSplitTypes = "test",
        config=None,
    ) -> Dataset[VQARadDataPoint]:
        vqa_raw_data = self.loader.load("flaviagiammarino/vqa-rad", split=split_type)
        entries = [VQARadEntry(**row) for row in vqa_raw_data]
        data_points = [
            VQARadDataPoint(
                id=f"{split_type}-{i}",
                input=VQARadInput(
                    image=entry.image,
                    question=entry.question,
                ),
                expected_output=None,
                metadata=VQARadMetadata(
                    expected_answer=entry.answer,
                ),
            )
            for i, entry in enumerate(entries)
        ]
        return Dataset[VQARadDataPoint](
            data_points=data_points, description=self.__doc__
        )
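The mapping performed by `build_dataset` can be illustrated with a minimal, self-contained sketch. It does not import medplexity: `SketchInput`, `SketchMetadata`, `SketchDataPoint`, `FakeLoader`, and `build_data_points` are hypothetical stand-ins introduced only for illustration, and the two rows are invented examples, not records from VQA-RAD. The sketch assumes each raw row exposes `image`, `question`, and `answer` fields, as the listing above suggests.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SketchInput:
    image: Any
    question: str


@dataclass
class SketchMetadata:
    expected_answer: str


@dataclass
class SketchDataPoint:
    id: str
    input: SketchInput
    expected_output: Optional[str]
    metadata: SketchMetadata


class FakeLoader:
    """Hypothetical stand-in for the builder's loader; returns in-memory rows."""

    def load(self, name: str, split: str) -> List[dict]:
        # Invented rows for illustration only, not taken from the dataset.
        return [
            {"image": None, "question": "Is this a chest x-ray?", "answer": "yes"},
            {"image": None, "question": "Which organ is shown?", "answer": "lung"},
        ]


def build_data_points(loader, split_type: str = "test") -> List[SketchDataPoint]:
    rows = loader.load("flaviagiammarino/vqa-rad", split=split_type)
    return [
        SketchDataPoint(
            # Ids combine the split name with the row's position in that split.
            id=f"{split_type}-{i}",
            input=SketchInput(image=row["image"], question=row["question"]),
            # Mirrors the listing: expected_output stays None and the
            # reference answer is carried in the metadata instead.
            expected_output=None,
            metadata=SketchMetadata(expected_answer=row["answer"]),
        )
        for i, row in enumerate(rows)
    ]


points = build_data_points(FakeLoader(), split_type="test")
```

Two details are worth noting when consuming the resulting data points: ids are only unique within a split (`test-0`, `train-0`, and so on), and the reference answer lives in `metadata.expected_answer` rather than in `expected_output`.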