An Open Chest X-ray Dataset with Benchmarks for Automatic Radiology Report Generation in French

Hichem Metmer1, Xiaoshan Yang2
1,2 State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS),
Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, 100190, China,
School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), Beijing, 101408, China.

Abstract

Medical report generation (MRG), which aims to automatically generate a textual description of a specific medical image (e.g., a chest X-ray), has recently received increasing research interest. Building on the success of image captioning, MRG has become achievable. However, generating language-specific radiology reports remains challenging for data-driven models because they rely on paired image-report chest X-ray datasets, which are labor-intensive, time-consuming, and costly to produce. In this paper, we introduce CASIA-CXR, a chest X-ray benchmark dataset consisting of high-resolution chest radiographs accompanied by narrative reports originally written in French. To our knowledge, it is the first public chest radiograph dataset with medical reports in this language. We also propose a simple yet effective multimodal, contextually guided encoder-decoder framework for French medical report generation. We validate the framework through intra-language and cross-language contextual analysis, supplemented by expert evaluation performed by radiologists. The dataset is freely available at: https://www.casia-cxr.net.


Abstract in Chinese (中文摘要)
开放式胸部 X 射线数据集,具有自动生成法语放射学报告的基准

医疗报告生成(MRG)旨在自动生成特定医学图像(例如胸部X光片)的文本描述,最近受到了越来越多的研究兴趣。在图像字幕的成功基础上,MRG 已经成为可能。然而,生成特定语言的放射学报告对数据驱动模型提出了挑战,因为它们依赖于配对图像报告胸部 X 射线数据集,这些数据集是劳动密集型、耗时且昂贵的。在本文中,我们介绍了一个胸部 X 射线基准数据集,即 CASIA-CXR,由高分辨率胸部 X 光照片以及最初用法语编写的叙述报告组成。据我们所知,这是第一个包含这种特定语言的医疗报告的公共胸部放射线照片数据集。重要的是,我们提出了一个简单而有效的多模态编码器-解码器上下文引导框架,用于法语医疗报告的生成。我们通过语言内和跨语言上下文分析验证了我们的框架,并辅以放射科医生进行的专家评估。该数据集可在以下位置免费获取: https://www.casia-cxr.net

Citation

If you use our dataset in your research, please cite our paper:

@article{METMER2024128478,
  title = {An open chest X-ray dataset with benchmarks for automatic radiology report generation in French},
  journal = {Neurocomputing},
  volume = {609},
  pages = {128478},
  year = {2024},
  issn = {0925-2312},
  doi = {https://doi.org/10.1016/j.neucom.2024.128478},
  url = {https://www.sciencedirect.com/science/article/pii/S0925231224012499},
  author = {Hichem Metmer and Xiaoshan Yang},
  keywords = {Chest X-ray dataset, Medical report generation in French},
  abstract = {Medical report generation (MRG), which aims to automatically generate a textual description of a specific medical image (e.g., a chest X-ray), has recently received increasing research interest. Building on the success of image captioning, MRG has become achievable. However, generating language-specific radiology reports poses a challenge for data-driven models due to their reliance on paired image-report chest X-ray datasets, which are labor-intensive, time-consuming, and costly. In this paper, we introduce a chest X-ray benchmark dataset, namely CASIA-CXR, consisting of high-resolution chest radiographs accompanied by narrative reports originally written in French. To the best of our knowledge, this is the first public chest radiograph dataset with medical reports in this particular language. Importantly, we propose a simple yet effective multimodal encoder–decoder contextually-guided framework for medical report generation in French. We validated our framework through intra-language and cross-language contextual analysis, supplemented by expert evaluation performed by radiologists. The dataset is freely available at: https://www.casia-cxr.net/.}
}