Harnessing AI–human synergy for deep learning research analysis in ophthalmology with large language models assisting humans

Source journal: Eye Science | January 2024, Volume 1, Issue 1: 7-25 | Published: 2024-03-28 | Received: 2024/4/7 15:05:50

Authors: MingJie Luo, Weixing Zhang, Zheming Zhang, Jianyu Pang, Zhenzhe Lin, Lanqin Zhao, Duoru Lin, Haotian Lin

Keywords: large language model; AI–human collaboration; research trends; ophthalmology; model performance

DOI: 10.12419/es24030101

Received date: 2024-02-15
Revised date: 2024-03-20
Accepted date: 2024-03-28
Published online: 2024-03-28

Background: Research innovations in ocular disease screening, diagnosis, and management have been boosted by deep learning (DL) in the last decade. To assess historical research trends and current advances, we conducted an artificial intelligence (AI)–human hybrid analysis of publications on DL in ophthalmology.

Methods: All DL-related articles in ophthalmology published between 2012 and 2022 and indexed in Web of Science were included. A set of 500 high-impact articles annotated with key research information was used to fine-tune a large language model (LLM) for reviewing medical literature and extracting information. After verifying the LLM's accuracy in extracting diseases and imaging modalities, we analyzed trends of DL in ophthalmology across 2,535 articles.

Results: Researchers using the LLM for literature analysis were 70% (p = 0.0001) faster than those who did not, while achieving comparable accuracy (97% versus 98%, p = 0.7681). The field of DL in ophthalmology has grown 116% annually, paralleling trends in the broader DL domain. Publications focused mainly on diabetic retinopathy (p = 0.0003), glaucoma (p = 0.0011), and age-related macular diseases (p = 0.0001), using retinal fundus photographs (FP, p = 0.0015) and optical coherence tomography (OCT, p = 0.0001). DL studies using multimodal images have been growing, with FP combined with OCT being the most frequent pairing. Among the 500 high-impact articles, laboratory studies constituted the majority at 65.3%. Notably, model accuracy declined when studies were grouped by study design, although the decline was not statistically significant. Furthermore, 43 publicly available ocular image datasets were summarized.

Conclusion: This study has characterized the landscape of publications on DL in ophthalmology by identifying the trends and breakthroughs among research topics and the fast-growing areas. It provides an efficient framework for combined AI–human analysis to comprehensively assess the current status and future trends in the field.

INTRODUCTION

There has been significant advancement in the field of biomedical artificial intelligence (AI) since 2012, particularly in the medical applications of deep learning (DL) algorithms. Medical AI plays an increasingly important role in the development of medicine and in improving the health and longevity of the population. This period witnessed the transition from the maturity of DL algorithms to their incorporation into various medical domains, marking a complete phase for biomedical AI development. Ophthalmology has experienced significant growth and development in the field of DL, with both early-stage and well-established DL applications being utilized in real-world scenarios.^[1] With the technical breakthroughs of DL algorithms in the last decade, the field has gradually achieved intelligent diagnosis of different ocular diseases such as diabetic retinopathy (DR),^[2-3] cataract^[4] and glaucoma,^[5] and has expanded its coverage of image modalities from retinal photographs to optical coherence tomography (OCT) and other modalities. Therefore, further clarification and refinement are needed to fully comprehend the developmental patterns and trends in the field of DL in ophthalmology.

Recent publications on DL in ophthalmology have shown explosive growth. These publications provide a wealth of valuable information about the changing trends in DL-assisted diagnostic systems for ocular diseases, along with a comparative analysis of the development status in different countries worldwide. This field acts as an excellent demonstration of the evolution of medical AI, showcasing changes in various aspects and providing insights into future trends. Nonetheless, published literature reviews and expert consensuses have largely focused on specific research areas or highly cited articles,^[6–9] limiting their scope and providing only a narrow perspective on the overall changes in the field of DL in ophthalmology. Consequently, comprehensive insights into the broader dimensions of ophthalmic AI have been difficult to obtain, such as research trends across ocular diseases, changes in data modalities, the quantity and quality of research data, and the factors affecting DL model performance.^[10] Additionally, manually reading and analyzing such a large volume of literature is time-consuming and challenging, and might lead to a biased representation of the overall perspective.^[11] A comprehensive and practical overview of this fast-evolving field is urgently needed.

Recently, large language models (LLMs) have shown significant success in following instructions and producing human-like responses.^[12-13] Using an LLM for natural language processing and applying state-of-the-art bibliometric analysis jointly with human experts, this study presents a comprehensive overview of the evolution of this rapidly changing research field. Based on this, we have identified trends and challenges among common ophthalmic DL research and further provided prospects for future applications. Additionally, this study offers a practical approach to comprehensively investigate the current status and future trends in the field, making it a valuable reference for other researchers.

MATERIALS AND METHODS

Search Strategy and Data Collection

All English-language publications related to DL in ophthalmology in the Web of Science (WOS) database were queried. The search included terms related to eye diseases, eye structures and common imaging examinations in ophthalmology, as well as DL-related terms such as convolutional neural networks and transfer learning. The detailed search terms are provided in the Supplementary text. The literature search was carried out on October 15, 2022, and studies published from January 1, 2012, to September 30, 2022 were included. A total of 6,345 articles were initially identified, of which 3,810 were excluded for the following reasons: duplicated records, articles categorized as reviews and comments, and articles tangential to the core focus of ophthalmology. The remaining 2,535 articles met the inclusion criteria after screening (Figure 1A). Among the 2,535 articles, 1,260 were indexed in the PubMed database. To utilize the rich information in PubMed, we obtained additional metadata such as the MeSH terms for these articles. The metadata were downloaded and extracted using PubMed’s E-Utilities API tools (www.pubmed.ncbi.nlm.nih.gov/help). The article title, abstract, MeSH terms and bibliometrics, including WOS citation numbers, countries, regions and institutes, were used for downstream publication analysis.
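The MeSH retrieval step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it builds a request URL for the NCBI E-utilities EFetch endpoint and parses MeSH descriptor names from PubMed XML. The helper names (`build_efetch_url`, `parse_mesh_terms`), the placeholder PMID and the sample record are hypothetical, and the sketch performs no network request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# NCBI E-utilities EFetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Return an EFetch URL requesting the given PubMed records as XML."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "retmode": "xml",
    }
    # Keep commas literal so the ID list stays readable in the URL.
    return f"{EFETCH_BASE}?{urlencode(params, safe=',')}"


def parse_mesh_terms(xml_text):
    """Map each PMID in a PubMed XML document to its MeSH descriptor names."""
    root = ET.fromstring(xml_text)
    mesh_by_pmid = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        path = "MedlineCitation/MeshHeadingList/MeshHeading/DescriptorName"
        mesh_by_pmid[pmid] = [d.text for d in article.iterfind(path)]
    return mesh_by_pmid


# Hypothetical, heavily trimmed record with a placeholder PMID.
SAMPLE_XML = (
    "<PubmedArticleSet><PubmedArticle><MedlineCitation>"
    "<PMID>00000000</PMID>"
    "<MeshHeadingList>"
    "<MeshHeading><DescriptorName>Diabetic Retinopathy</DescriptorName></MeshHeading>"
    "<MeshHeading><DescriptorName>Deep Learning</DescriptorName></MeshHeading>"
    "</MeshHeadingList>"
    "</MedlineCitation></PubmedArticle></PubmedArticleSet>"
)

print(build_efetch_url([111, 222]))
print(parse_mesh_terms(SAMPLE_XML))
```

In practice the XML would come from the EFetch response, and articles without a `MeshHeadingList` element would simply map to an empty list.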

AI-human Hybrid Publication Analysis

These articles were then analyzed using the LLM-based text analysis method (Figure 1B). Out of the 2,535 articles, the 500 most impactful articles were selected for manual in-depth analysis based on their WOS citation counts on October 30, 2022. Three researchers independently annotated key information in the articles, including disease, study design, data modality, data quality and quantity, and public datasets. The annotated data were then used to fine-tune the BioBERT model,^[14] and the tuned LLM was used to automatically recognize disease and data modality information in the remaining articles. Additionally, BERN2,^[15] a validated disease recognition model, was used to verify the disease recognition outputs. In the LLM-assisted comparison experiment, 20 articles were randomly selected from the 2,535 articles. Three researchers were then assigned to read the titles and abstracts, extract the information described above, and record the time spent and accuracy achieved.

Using the LLM-based text analysis method, we extracted the studied eye diseases from 2,300 of 2,535 (90.7%) articles in the field of DL in ophthalmology and categorized the disease types based on PubMed's disease MeSH terms. The data modalities used were extracted and summarized from 1,611 of 2,535 (63.5%) articles. Assessing the quality and quantity of study data was relatively challenging and required manual analysis, given the limitations of LLMs in mathematical processing. To ensure accuracy, the data quality and quantity analysis relied on results from the manual review of the 500 highest-impact articles. In addition, 115 DR-related DL studies were further selected to analyze the datasets used, including whether the studies had external validation datasets, used public datasets, included images of healthy controls, and provided pixel-level annotations (identifying the boundaries of disease lesions).

Research Type Classification

The articles were classified into three categories: laboratory, preclinical and clinical research. Laboratory studies involve constructing and validating algorithms and models using public data. Clinical studies utilize prevalidated AI algorithms and models in real clinical scenarios to assist in disease diagnosis and intervention. Preclinical studies, a type between laboratory and clinical studies, focus on algorithm or model validation through public or limited private datasets to assess their future suitability for large-scale clinical use. Preclinical and clinical research studies were classified into three categories according to their design: retrospective, cross-sectional, and prospective. Study types were initially classified by two junior researchers (more than 3 years of research experience). In cases of disagreement, adjudication was performed by a senior researcher (more than 8 years of research experience) to finalize the study type.
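The adjudication rule described above can be sketched in a few lines. This is an illustrative sketch only; the function name `final_study_type` and the label strings are hypothetical and are not taken from the study's materials.

```python
def final_study_type(junior_a, junior_b, senior=None):
    """Return the agreed study type, or the senior label on disagreement.

    junior_a, junior_b: labels assigned independently by the two junior researchers.
    senior: label assigned by the senior researcher, consulted only on disagreement.
    """
    if junior_a == junior_b:
        return junior_a
    if senior is None:
        raise ValueError("disagreement requires senior adjudication")
    return senior


# Agreement: the shared label is final.
print(final_study_type("laboratory", "laboratory"))
# Disagreement: the senior researcher's label is final.
print(final_study_type("preclinical", "clinical", senior="preclinical"))
```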

Public Ophthalmic Image Datasets

The original papers corresponding to the public ophthalmic image datasets used in the 500 highest-impact papers were retrieved. We then compiled and summarized the following information: database name, article DOI/URL, year of publication, type of disease, number of images, number of healthy controls, image disease annotations, image quality assessment, and pixel-level annotations.

Statistical Analysis

All statistical analyses in this study were conducted with R 4.1.1 (R Core Team, 2021). Research trends were analyzed with Sen’s slope analysis. The normality of the distribution of data quantity was tested with the Shapiro-Wilk test. Linear regression was employed to analyze the correlations between research publication date and data quantity. The model performances were compared with the Mann-Whitney U test. A two-tailed p-value less than 0.05 was considered statistically significant for all analyses.
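For readers unfamiliar with Sen's slope, the estimator is the median of the slopes between all pairs of observations, which makes it robust to outlying years. The study ran its analyses in R; the following is only a minimal Python sketch of the estimator itself (without the accompanying significance test), and the function name `sens_slope` and the toy data are assumptions for illustration.

```python
from itertools import combinations
from statistics import median


def sens_slope(x, y):
    """Return Sen's slope: the median of all pairwise slopes (y_j - y_i) / (x_j - x_i)."""
    points = list(zip(x, y))
    slopes = [
        (yj - yi) / (xj - xi)
        for (xi, yi), (xj, yj) in combinations(points, 2)
        if xj != xi  # skip pairs with identical x to avoid division by zero
    ]
    return median(slopes)


# Toy data: a steady increase of 2 per step with one outlying final value.
years = [0, 1, 2, 3, 4]
counts = [0, 2, 4, 6, 100]
print(sens_slope(years, counts))  # 2.0
```

The outlier barely affects the estimate, whereas an ordinary least-squares slope on the same toy data would be pulled strongly upward.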

RESULTS

Efficiency Improvement of the LLM-assisted Approach

To evaluate the efficiency of using an LLM as an auxiliary tool in literature reading and information extraction, two groups of researchers, each consisting of three individuals, were tasked with extracting information from a set of 20 research articles (Supplementary Table S1). The group using the LLM completed the task in an average of 39 (range 35–45) minutes, while the group without the LLM took an average of 128 (range 119–137) minutes, corresponding to a 70% reduction in time spent (p = 0.0001). The accuracies of the two groups were comparable: 97% (95%–100%) versus 98% (95%–100%), p = 0.7681.
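The 70% figure follows from the two reported mean completion times if it is read as the relative reduction in time spent, which is an interpretation on our part. A short check, with the hypothetical helper `relative_time_reduction`:

```python
def relative_time_reduction(baseline_minutes, assisted_minutes):
    """Return the fraction of time saved relative to the unassisted baseline."""
    return (baseline_minutes - assisted_minutes) / baseline_minutes


# Mean completion times reported for the two groups (minutes).
reduction = relative_time_reduction(128, 39)
print(f"{reduction:.1%}")  # 69.5%
```

The result, about 69.5%, rounds to the reported 70%.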

Overview of DL Applications in Ophthalmology

A total of 2,535 articles were included in the LLM-assisted analysis. Over the past decade, both papers describing DL applications in ophthalmology and those describing DL in general have seen a dramatic surge in publication volume. Ophthalmic DL articles experienced explosive growth between 2012 and 2019, with yearly growth rates ranging from 111% to 192%. Growth moderated after 2020, with 659 articles published in 2021; moreover, this trend of DL applications in ophthalmology parallels the changes in the overall DL field (Figure 2A). Several technology breakthroughs were related to the growth of the ophthalmic DL field (Figure 2B). The U.S., China, and the U.K. published the most articles, with the former two countries leading by a considerable margin (Figure 2C). Additionally, the top ten institutions by publication number mainly comprised comprehensive universities and eye hospitals (Figure 2D).

Diseases and Data Modalities in Ophthalmic DL Research

The 10 most commonly studied diseases are shown in Figure 3A, including DR, glaucoma, macular degeneration, cataract and fundus diseases. Research on various ocular diseases began to increase gradually in 2016 and increased markedly between 2017 and 2020. The most studied ocular diseases are DR, glaucoma and macular degeneration, which have also shown an increasing trend over the last 10 years, with Sen’s slopes of 20.8 (p = 0.0003), 14.9 (p = 0.0011) and 8.2 (p = 0.0001), respectively (Figure 3B).

With similar methods, the data modalities used were extracted and summarized. The top 10 data modalities in terms of growth rate were ranked by Sen’s slope (Figure 3C). DL studies using fundus photographs (FP) and OCT began to rapidly increase in number and gain prominence in 2018, with Sen’s slopes of 31.1 (p = 0.0015) and 30.8 (p = 0.0001), respectively (Figure 3D). Many other data modalities, including fluorescein angiography (FA), optical coherence tomography angiography (OCTA), visual acuity (VA), visual fields (VF) and corneal topography (CT), have been increasingly applied since 2020. The data modalities used in studies of different ocular diseases varied considerably (Figure 3E). Specifically, studies on DR mainly used OCT and FP; studies on glaucoma relied mainly on OCT, FP, and VF; and macular degeneration studies relied on OCT and FP. Additionally, several DL studies used multimodal ophthalmic data (Table 1). Of the multimodal studies, 27 (33%) involved both FP and OCT images, 17 (21%) used OCT and VF, and 7 (9%) used FP together with another modality such as OCTA, slit lamp (SL), or VA.

Research Type Analysis of the Top 500 Highest-impact Articles

Among the 500 most impactful articles, all three types of research (laboratory, preclinical and clinical) showed a rise in the number of studies (Figure 4A). The number of laboratory studies increased noticeably from 7 in 2012 to 108 in 2020. Furthermore, preclinical studies rose to over 50 per year, whereas the number of clinical studies remained stagnant, with fewer than 10 conducted annually. Among preclinical and clinical studies, retrospective studies accounted for 77.4% (123), with only 21 (12.6%) cross-sectional studies and 15 (10.1%) prospective studies (Figure 4B).

Changes in Fundus Photograph Data and DL Model Performance for DR

Given the extensive research on DR and its early prominence, 115 DR-related articles among the 500 were selected to evaluate changes in the quantity and quality of ophthalmic data and in DL model performance. The dataset characteristics of these studies are summarized in Table 2. All criteria showed a general increase since 2015, despite occasional volatility. Despite the early dearth of publicly available datasets and of healthy control images, the number of studies fulfilling these criteria has increased to 14 (33.3%) and 25 (59.5%), respectively. Additionally, an increasing number of studies provided pixel-level lesion annotations rather than image-level annotations, showing an increase in data granularity.

The dataset sizes and final model performances reported in the 115 DR studies were compiled to determine whether a large data size is necessary to train a model with adequate performance (Figure 4C). The interquartile range (IQR) of the data size for DR classification models is 462–74,198 images, and the average data size has grown from 70,817 images in 2016 to 122,810 in 2021, an average annual increase of 10,398 (14.7%) images. When the reported performances were grouped by study design, the average accuracy of the models declined consistently (Figure 4D).
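The reported annual increase can be reproduced from the two averages above. The sketch below assumes that the percentage is expressed relative to the 2016 average, which the text does not state explicitly; the helper name `mean_annual_increase` is hypothetical.

```python
def mean_annual_increase(start_value, end_value, start_year, end_year):
    """Return the average absolute increase per year between two time points."""
    return (end_value - start_value) / (end_year - start_year)


# Average DR dataset sizes reported for 2016 and 2021 (images).
increase = mean_annual_increase(70_817, 122_810, 2016, 2021)
share_of_baseline = increase / 70_817  # assumption: relative to the 2016 average
print(f"{increase:.1f} images per year ({share_of_baseline:.1%} of the 2016 average)")
```

This gives about 10,398.6 images per year, or 14.7% of the 2016 average, in line with the reported figures.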

To provide a reference of publicly available datasets, the ocular disease image databases used in these 500 highest-impact articles were summarized with the pertinent information (Table 3 and Supplementary Table S2).

Table 1 Summary of multimodal deep learning studies in ophthalmology

Table 2 Data characteristics from the 115 most impactful deep learning studies of diabetic retinopathy

Table 3 Summary of the open access ophthalmological datasets used in deep learning studies

DISCUSSION

DL in ophthalmology is evolving rapidly, and the status of the field is difficult to summarize comprehensively using conventional methods.^[7–9] This study presents a comprehensive overview of existing research, using LLM-based methods combined with in-depth manual analysis to uncover trends in DL in ophthalmology from a large amount of publication data. The development of DL in ophthalmology is in sync with the overall progress of the DL domain and relies heavily on the foundational support provided by the latter. In this study, we fine-tuned a BERT-based language model with manually annotated paper data to build an LLM capable of reviewing medical literature and extracting key information. Its accuracy was verified in extracting disease and image modality information, suggesting that incorporating an LLM into the literature analysis workflow could enhance the efficiency and productivity of researchers in the field. The number of articles and development trends of various eye diseases were categorized and summarized using the LLM-assisted approach.

Advances in the DL field have accelerated DL research in ophthalmology. From 2012 to 2015, there were revolutionary developments in the DL field, including the construction of benchmark image datasets such as ImageNet^[16] and COCO,^[17] and the development of convolutional neural network architectures such as AlexNet^[18] and ResNet.^[19] The subsequent rapid growth of DL research in ophthalmology followed these breakthroughs, which advanced the state of the art and laid the foundation for further research and development of DL in ophthalmology. Many of the leading ophthalmic deep learning studies have come from technologically advanced countries including the United States, China, the United Kingdom, and India. With major research contributions from these countries, deep learning-based diagnostic systems for ocular diseases have gained approval from regulatory bodies. In 2018, the first U.S. license for such a system was issued,^[20] followed by the Chinese license in 2020.^[21] These regulatory approvals reflect the progress in deep learning research and the validation of its ability to accurately detect and diagnose eye diseases, signaling a shift from laboratory studies to clinical applications.

Ophthalmic DL research has gradually transitioned from initial feasibility studies to real-world clinical applications. Early feasibility DL studies focused on DR based on FP, as the large patient population and extensive labeled image datasets enabled algorithm development and validation with minimal technical complexity.^[22] By establishing capabilities and clinical value on a common ocular disease (e.g. DR) with abundant training data, researchers laid the groundwork to then investigate expanding DL to other ocular diseases.^[1] The initial successes have led to further research broadening real-world deployment of DL models across diverse clinical applications, such as telemedicine screening and smartphone-based diagnostic tools at point-of-care.^[23-24]

Beyond reliance on single data modalities such as FP and OCT, multimodal approaches that integrate information from diverse clinical exams and tests are increasingly used to enable more comprehensive and accurate diagnosis. This transformation is driven by complex ocular conditions like glaucoma, which require assimilating data from basic ophthalmic exams, visual field testing, cup-to-disc ratios from fundus imaging, and optic nerve layer thickness from OCT to facilitate robust clinical judgments.^[25] Moreover, combining other data sources such as genomic tests with imaging data has proven effective for predictive modeling in diseases like age-related macular degeneration.^[26] As algorithms become more sophisticated, they can synergistically combine disparate inputs from various ophthalmic subspecialties, testing modalities, and data types. By amalgamating these diverse datasets, DL models can mimic multifaceted clinical decision making and enable more precise disease diagnosis and prognosis across ophthalmology.

Data quantity and quality are crucial in DL applications in ophthalmology. However, the necessary sample size is contingent upon the complexity of the disease and detection task, as well as the intricacy of the model. In an early experiment, the impact of dataset size on DL algorithm performance in detecting DR was analyzed.^[3] The results revealed that peak performance was reached at approximately 60,000 images, suggesting that increasing the dataset size beyond this point did not improve algorithm performance. However, with advancements in DL algorithms, it is now unclear what amount of data is required for optimal performance. By investigating high-impact DL articles on DR detection, we found that DL model performance decreased markedly from laboratory to clinical studies. This observation implies that the self-reported AUC and other evaluation criteria employed in these studies may not adequately represent the real-world performance of the DL models because of reproducibility issues.^[27] Alternatively, it could be that the data analyzed, extracted from previously published DR-related studies, lack broad applicability to other ocular diseases, which necessitates more rigorous experimental investigations in future studies. Additionally, we found that the proportion of studies using external validation datasets and public datasets was not high (20–35%), which might be due to the difficulty in obtaining resources for such datasets. The sample size required for deep learning is a question without a standard answer: determining the optimal sample size for DL studies is challenging because of disease complexity, the specific medical task, and the complexity of DL models.^[28-29] Given that the results derived from different deep learning algorithms on various datasets cannot be directly compared, it is imperative to establish a unified and objective standard evaluation method. This will ensure greater consistency in the assessment of model performance, thereby enhancing the reliability of the outcomes.

This study has several limitations. It is based on previously published articles and therefore may not capture emerging trends in research that has not yet been published. Additionally, because the high-impact articles were selected by citation count, they were mainly published before 2021, so fewer articles from 2022 onward were included in the refined analysis. To obtain more concrete conclusions, a more robust study with a larger sample size is warranted. Although we analyzed ophthalmology research trends, data modality, data volume, and types of studies, some other aspects of deep learning in ophthalmology research were not addressed in this paper. These aspects include data privacy and security, interpretability and transparency of AI models, and regulation and standardization of AI in practical applications. Finally, some articles are not indexed in PubMed and lack MeSH terms; we mitigated this by thoroughly analyzing titles and abstracts, which minimizes the impact of this limitation on our study's findings.

In conclusion, we showed that an LLM combined with in-depth manual analysis was capable of reviewing medical literature and extracting information. Using the LLM-assisted approach, we have identified trends and challenges among common ophthalmic DL research and further provided prospects for future applications. These include the necessity of validating AI models in real-world clinical settings and of creating standardized, publicly accessible datasets to enhance collaboration and benchmarking for DL applications. Additionally, this study offers a practical approach to comprehensively investigate the current status and future trends in the field, making it a valuable reference for other researchers.

Correction notice

None

Acknowledgement

None

Author Contributions

(I) Conception and design: HTL, DRL, MJL, ZZL
(II) Administrative support: HTL, DRL
(III) Provision of study materials or patients: HTL, DRL
(IV) Collection and assembly of data: MJL, WXZ, ZMZ, JYP
(V) Data analysis and interpretation: MJL, JYP, ZZL, LQZ
(VI) Manuscript writing: MJL, WXZ
(VII) Final approval of manuscript: All authors

Funding

This study was supported by the National Natural Science Foundation of China (82000946), Guangdong Natural Science Funds for Distinguished Young Scholar (2023B1515020100), the Natural Science Foundation of Guangdong Province (2021A1515012238), and the Science and Technology Program of Guangzhou (202201020522 and 202201020337).

The funding organizations had no role in the following aspects: design and conduct of the study; the collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and the decision to submit the manuscript for publication.

Conflict of Interests

None of the authors has any conflicts of interest to disclose. All authors have completed the ICMJE uniform disclosure form.

Patient consent for publication

None

Ethical Statement

This study does not contain any studies with human or animal subjects performed by any of the authors.

Provenance and Peer Review

This article was a standard submission to our journal. The article has undergone peer review with our anonymous review system.

Data Sharing Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Open Access Statement

This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

1. Ting DSW, Peng L, Varadarajan AV, et al. Deep learning in ophthalmology: The technical and clinical considerations. Progress in Retinal and Eye Research. 2019;72:100759.

2. Ting DSW, Cheung CY-L, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318:2211–2223.

3. Gulshan V, Peng L, Coram M, et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA. 2016;316:2402–2410.

4. Long E, Lin H, Liu Z, et al. An artificial intelligence platform for the multihospital collaborative management of congenital cataracts. Nature Biomedical Engineering. 2017;1:0024.

5. Dong L, He W, Zhang R, et al. Artificial Intelligence for Screening of Multiple Retinal and Optic Nerve Diseases. JAMA Network Open. 2022;5:e229960.

6. Sebastian A, Elharrouss O, Al-Maadeed S, Almaadeed N. A Survey on Deep-Learning-Based Diabetic Retinopathy Classification. Diagnostics. 2023;13:345.

7. Yang J, Wu S, Dai R, et al. Publication trends of artificial intelligence in retina in 10 years: Where do we stand? Frontiers in Medicine. 2022;9.

8. Lim WX, Chen Z, Ahmed A. The adoption of deep learning interpretability techniques on diabetic retinopathy analysis: a review. Medical & Biological Engineering & Computing. 2022;60:633–642.

9. Jin K, Ye J. Artificial intelligence and deep learning in ophthalmology: Current status and future perspectives. Advances in Ophthalmology Practice and Research. 2022;2:100078.

10. Münchmeyer J, Woollam J, Rietbrock A, et al. Which picker fits my data? A quantitative evaluation of deep learning based seismic pickers. Journal of Geophysical Research: Solid Earth. 2022;127:e2021JB023499.

11. Lee CJ, Sugimoto CR, Zhang G, Cronin B. Bias in peer review. Journal of the American Society for Information Science and Technology. 2013;64:2–17.

12. Stokel-Walker C, Van Noorden R. What ChatGPT and generative AI mean for science. Nature. 2023;614:214–216.

13. Sarraju A, Bruemmer D, Van Iterson E, et al. Appropriateness of Cardiovascular Disease Prevention Recommendations Obtained From a Popular Online Chat-Based Artificial Intelligence Model. JAMA. 2023.

14. Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36:1234–1240.

15. Sung M, Jeong M, Choi Y, et al. BERN2: an advanced neural biomedical named entity recognition and normalization tool. Bioinformatics. 2022;38:4837–4839.

16. Russakovsky O, Deng J, Su H, et al. Imagenet large scale visual recognition challenge. International journal of computer vision. 2015;115:211–252.

17. Lin T-Y, Maire M, Belongie S, et al. Microsoft coco: Common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer; 2014:740–755.

18. Krizhevsky A, Sutskever I, Hinton GE. ImageNet Classification with Deep Convolutional Neural Networks. In: Advances in Neural Information Processing Systems. Vol 25. Curran Associates Inc; 2012.

19. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016:770–778.

20. U.S. Food and Drug Administration (FDA). FDA permits marketing of artificial intelligence-based device to detect certain diabetes-related eye problems. https://www.fda.gov/news-events/press-announcements/fda-permits-marketing-artificial-intelligence-based-device-detect-certain-diabetes-related-eye; 2018. Accessed June 13, 2023.

21. National Medical Products Administration (NMPA). Diabetic retinopathy fundus image assisted diagnosis software product approved for marketing. https://www.nmpa.gov.cn/yaowen/ypjgyw/20200810093435157.html; 2020. Accessed June 13, 2023.

22. Abràmoff MD, Lavin PT, Birch M, et al. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ digital medicine. 2018;1:39.

23. Natarajan S, Jain A, Krishnan R, et al. Diagnostic Accuracy of Community-Based Diabetic Retinopathy Screening With an Offline Artificial Intelligence System on a Smartphone. JAMA Ophthalmology. 2019;137:1182–1188.

24. Lin D, Xiong J, Liu C, et al. Application of Comprehensive Artificial intelligence Retinal Expert (CARE) system: a national real-world evidence study. The Lancet Digital Health. 2021;3:e486–e495.

25. Li F, Su Y, Lin F, et al. A deep-learning system predicts glaucoma incidence and progression using retinal photographs. The Journal of Clinical Investigation. 2022;132:e157968.

26. Yan Q, Weeks DE, Xin H, et al. Deep-learning-based Prediction of Late Age-Related Macular Degeneration Progression. Nature machine intelligence. 2020;2:141–150.

27. Chen B, Wen M, Shi Y, et al. Towards training reproducible deep learning models. In: Proceedings of the 44th International Conference on Software Engineering; 2022:2202–2214.

28. Rajput D, Wang W-J, Chen C-C. Evaluation of a decided sample size in machine learning applications. BMC Bioinformatics. 2023;24:48.

29. Figueroa RL, Zeng-Treitler Q, Kandula S, Ngo LH. Predicting sample size required for classification performance. BMC Medical Informatics and Decision Making. 2012;12:8.