HUBEI AGRICULTURAL SCIENCES ›› 2025, Vol. 64 ›› Issue (7): 203-206. doi: 10.14088/j.cnki.issn0439-8114.2025.07.035

• Biological Engineering •

Optimization of genotype imputation for low-depth sequencing data and performance analysis of regression models

XIANG Chong, CHEN Can   

  1. School of Data and Information, Changjiang Polytechnic, Wuhan 430070, China
  • Received: 2025-04-14 Online: 2025-07-25 Published: 2025-08-22

Abstract: A method for analyzing low-depth sequencing genomic data was established by optimizing the genotype imputation algorithm and screening for the best-performing regression models. Compared with the algorithm before optimization, the accuracy of the optimized genotype imputation algorithm increased from 95% to 98%. Parameter tuning and the selection of more efficient algorithms also reduced the time for a single imputation run from 24 hours to 12 hours, halving the processing time. For continuous phenotype analysis (e.g., quantitative traits in GWAS), the ridge regression and linear regression models performed well: at 1.0× sequencing depth, their mean squared errors (MSEs) were 0.07 and 0.08 and their accuracies were 0.82 and 0.80, respectively. For classification problems (e.g., genomic selection), the logistic regression model had an advantage because it models class probabilities directly, achieving an area under the ROC curve (AUC) of 0.90 compared with 0.85 for the linear regression model.
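
The abstract compares models by MSE for continuous phenotypes and by AUC for classification. As a reading aid, the following is a minimal sketch of how these two metrics are conventionally computed. It is not the authors' evaluation code; the function names `mean_squared_error` and `roc_auc` and the toy values are hypothetical and are not taken from the study.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values for illustration only (not data from the paper).
    observed = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 5.0]
    print(mean_squared_error(observed, predicted))

    case_labels = [0, 0, 1, 1]
    case_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(case_labels, case_scores))
```

A lower MSE indicates predictions closer to the observed phenotype values, while an AUC closer to 1 indicates that the model ranks positive cases above negative cases more consistently. The pairwise AUC shown here is quadratic in the number of samples, which is acceptable for a sketch but not for large cohorts.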

Key words: low-depth sequencing data, genotype imputation, ridge regression model, performance analysis, linear regression model, logistic regression model
