Population structure identification of Turkmen and Darehshori horses using PCA, DAPC, and SPC methods

Document Type: Research Paper

Authors

1 MSc Student, Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran.

2 Professor, Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran.

3 Assistant Professor, Department of Animal Science, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran.

4 Assistant Professor, Animal Sciences Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran.

5 Assistant Professor, Department of Animal Science, Faculty of Agriculture, University of Zanjan, Zanjan, Iran.

Abstract

Objective
Conserving the genetic diversity of indigenous animals is essential, and the sustainable use of genetic resources requires first characterizing the genetic structure of the populations concerned. This research aimed to identify the population structure of Turkmen and Darehshori horses using dense SNP markers and to compare the effectiveness of the PCA, DAPC, and SPC methods in clustering these populations.
Materials and methods
For this purpose, 67 Turkmen and 39 Darehshori horses were genotyped using the Illumina EquineSNP70 BeadChip. Quality control removed five Turkmen horses and one Darehshori horse, leaving 62 and 38 animals, respectively. Population structure was then identified using three methods: principal component analysis (PCA), discriminant analysis of principal components (DAPC), and superparamagnetic clustering (SPC). These methods do not rely on prior assumptions about the populations and can analyze very large genomic datasets without prior knowledge of individual ancestry. They are also computationally fast.
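As a rough illustration of how PCA can separate two groups from SNP genotypes alone, the following minimal sketch may help. It is not the authors' pipeline: the genotypes are simulated, the number of SNPs (300) and the allele-frequency shift (0.3) are arbitrary assumptions, and the function names `simulate_genotypes` and `first_pc_scores` are hypothetical. Only the sample sizes (67 - 5 and 39 - 1) follow the counts given above. Power iteration is used here in place of a full eigendecomposition so that the example needs only the Python standard library.

```python
import random


def simulate_genotypes(n_a, n_b, n_snps, shift, rng):
    """Simulate 0/1/2 genotypes for two hypothetical populations."""
    freqs_a = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    # Assumption: population B differs from A by a fixed frequency shift per SNP.
    freqs_b = [p + shift if p < 0.5 else p - shift for p in freqs_a]

    def draw(freqs):
        # A genotype is the count of one allele over two independent draws.
        return [(rng.random() < p) + (rng.random() < p) for p in freqs]

    return [draw(freqs_a) for _ in range(n_a)] + [draw(freqs_b) for _ in range(n_b)]


def first_pc_scores(rows, rng, n_iter=50):
    """Return each individual's score on the first principal component."""
    n, m = len(rows), len(rows[0])
    # Centre every SNP column on its mean genotype.
    col_means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - col_means[j] for j in range(m)] for r in rows]
    # Gram matrix (individuals x individuals) of the centred genotypes.
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            s = sum(a * b for a, b in zip(centered[i], centered[k]))
            gram[i][k] = gram[k][i] = s
    # Power iteration for the leading eigenvector of the Gram matrix.
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue (v has unit length).
    eigenvalue = sum(
        vi * sum(g * x for g, x in zip(row, v)) for vi, row in zip(v, gram)
    )
    # PC scores are the eigenvector scaled by the singular value.
    return [vi * eigenvalue ** 0.5 for vi in v]


rng = random.Random(42)
N_TURKMEN, N_DAREHSHORI = 67 - 5, 39 - 1  # animals remaining after quality control
genotypes = simulate_genotypes(N_TURKMEN, N_DAREHSHORI, 300, 0.3, rng)
scores = first_pc_scores(genotypes, rng)

group_a = scores[:N_TURKMEN]
group_b = scores[N_TURKMEN:]
# The groups are separated if their PC1 score ranges do not overlap.
separated = max(group_a) < min(group_b) or max(group_b) < min(group_a)
print("PC1 separates the two simulated groups:", separated)
```

With real data, the genotype matrix would come from the quality-controlled SNP chip output rather than simulation, and subpopulation structure would typically be examined on several leading components rather than the first one alone.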
Results
This study compared the efficiency of the three clustering methods in identifying population structure. All three methods separated the two breeds, grouping Turkmen and Darehshori horses into distinct genetic groups. However, DAPC separated only the two main populations, whereas PCA and SPC also identified several subpopulations within each breed. The results suggest that SPC can be more useful than the other methods for studying the population structure of indigenous breeds whose background information is unknown. This method can therefore support the design of suitable programs for conserving and using genetic resources.
Conclusions
The PCA, DAPC, and SPC methods all successfully identified the genetic structure of the Turkmen and Darehshori breeds. Overall, the information obtained from dense SNP markers can be a powerful tool for identifying the population structure of indigenous breeds.

Keywords

