New Record Species of Puntius (Pisces: Cyprinidae) from West Sumatra based on Cytochrome Oxidase 1 Gene

A biodiversity study of Puntius was conducted in West Sumatra using a molecular technique. Genetic analysis of CO1 gene sequences revealed: (1) a new species record of Puntius in Diatas Lake and the Batang Lembang, Batang Gumanti, and Muara Pingai rivers (located in the eastern part of the Bukit Barisan mountain range), namely Barbodes binotatus banksi or B. banksi; (2) a new subspecies record in Maninjau Lake and its tributary (located in the western part of the Bukit Barisan mountain range), namely Barbodes banksi maninjau; and (3) a new subspecies record in the Batang Kuranji, Batang Katik, Batang Tarok, and Lubuk Paraku rivers (located in the western part of the Bukit Barisan mountain range), namely Barbodes banksi kuranji. These results add to the evidence that the Bukit Barisan mountain range on Sumatra Island has contributed to the genetic diversity, evolutionary processes, and speciation of freshwater fish in Sumatra. This should be taken into account in the development of districts or areas in Sumatra.
Keywords: biodiversity; CO1 gene; Puntius; Barbodes


I. INTRODUCTION
The genus Puntius in Indonesian waters consists of 19 species [1]. The generic name Puntius was made familiar by Hamilton in 1822. Later, several synonymous names such as Barbonymus, Barbus, Barbodes, Systomus, Capoeta, and Hypsibarbus have been used by researchers as generic names, although these have not been accepted [2], [3], [4]. The genus Puntius contains some 120 valid species and is suspected to be polyphyletic [5]. Its members are commonly known as Silver Barb or Spotted Barb and are among the most important commercial fish for food and freshwater aquaria [6], [7]. P. binotatus is the most widely distributed and perhaps the most variable species of Puntius in Southeast Asia [8]. It is native to the waters of Sumatra, Java, and Kalimantan. P. binotatus is a benthopelagic species that lives in medium to large rivers at altitudes of 0-2000 meters above sea level [4]. In the IUCN Red List, this species, listed under the synonym Barbodes binotatus, is assessed as Least Concern because of its wide distribution, its ability to occupy a range of habitats, and the lack of any known major widespread threats [9], [10]. A morphological study of P. cf. binotatus from several locations in West Sumatra showed differences in morphological characters among samples from the highlands, middle elevations, and lowlands [10]. An analysis of P. cf. binotatus in West Sumatra using the cytochrome b gene suggested that cryptic species exist among them (Roesma, 2017, unpublished). To explore the taxonomy of P. cf. binotatus in West Sumatra and other locations, an analysis was performed using the cytochrome oxidase subunit 1 (CO1) gene.
Here, we used CO1 DNA sequences to associate field-collected specimens of P. cf. binotatus with sequences obtained by other authors. The barcode region of the CO1 gene was chosen because it is a conserved region of the gene and information on its use is already available [11]. This part of the DNA has been shown to be very effective for separating specimens at the species level, even when morphology is cryptic [12], [13], [14]. No less important, an international Consortium for the Barcode of Life (CBOL) already exists, and data collected by various collaborators are deposited in the Barcode of Life Database (BOLD) system [15]. The data serve as a useful reference library for identifying specimens to a particular species. It has been emphasized that DNA barcode sequences allow taxa to be diagnosed through phylogenetic analysis [16]. This study had two primary goals: first, to analyse the genetic relationships within Puntius in West Sumatra, and second, to collect barcoding data for cyprinids from Sumatra.

II. MATERIALS AND METHODS
A. Sample Sources and DNA Extraction
The samples used in the study were obtained from several populations: Maninjau Lake (4 individuals) and its tributary, the Asam River (2 individuals); Diatas Lake (2 individuals) and its outlet, the Batang Gumanti River (2 individuals); and the Batang Lembang River, which is an outlet of Dibawah Lake (4 individuals). Populations from other waters that have no connection to the lakes were also included: the Batang Kuranji River (3 individuals), Batang Tarok River (1 individual), Batang Katiak River (3 individuals), and Lubuk Paraku River (1 individual) (Fig. 1).

Fig. 1 Sampling areas
The samples were collected with cast nets and backpack electrofishing gear (12 V) following standard procedures [17]. Sampling at each location lasted approximately one hour. A piece of tissue for molecular analysis was stored in an Eppendorf tube containing 96% ethanol PA. Individual specimens were fixed in 10% formalin. All specimens were then preserved in 70% ethanol, after which they were taken to the laboratory at Andalas University in Padang. Identifications were based on the principal keys for freshwater fishes [18], [19], [1]. DNA was extracted from tissue following the standard protocol of the Invitrogen PureLink™ Genomic DNA Mini Kit. The quality and approximate yield of DNA were determined by electrophoresis in a 1% agarose gel containing ethidium bromide, run at 90 V for 30 minutes and visualized under UV light.

B. Polymerase Chain Reaction (PCR) and DNA Sequencing
The analysis targeted the 5' region of the CO1 gene. The sequence was amplified with the forward primer FishF1 (5'TCAACCAACCACAAAGACATTGGCAC3') and the reverse primer FishR1 (5'TAGACTTCTGGGTGGCCAAAGAATCA3') [20]. Double-stranded templates were amplified in a total reaction volume of 25 µl using a Research PCR Eppendorf™ thermocycler. The cycle parameters were: initial denaturation at 95 °C for 2.0 min; 35 cycles of denaturation at 94 °C for 0.5 min, annealing at 54 °C for 0.5 min, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min. PCR products were visualized on 2% agarose gels with ethidium bromide staining, and the products with the strongest bands were selected for sequencing at the MacroGen USA DNA Sequencing Laboratory.
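The thermocycler program and primers described above can be written down as data, which makes simple sanity checks easy. The following is a minimal sketch, not the authors' workflow: the class and function names are hypothetical, the first step is assumed to be an initial denaturation, and the time estimate ignores ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One programmed hold of the thermocycler (hypothetical helper type)."""
    name: str
    temp_c: float
    minutes: float


# Values copied from the cycle parameters given in the text.
INITIAL = Step("initial denaturation", 95.0, 2.0)
CYCLE = (
    Step("denaturation", 94.0, 0.5),
    Step("annealing", 54.0, 0.5),
    Step("extension", 72.0, 1.0),
)
N_CYCLES = 35
FINAL = Step("final extension", 72.0, 10.0)

# Primer sequences copied from the text (5' to 3').
PRIMERS = {
    "FishF1": "TCAACCAACCACAAAGACATTGGCAC",
    "FishR1": "TAGACTTCTGGGTGGCCAAAGAATCA",
}


def total_hold_minutes() -> float:
    """Sum of programmed hold times; ramp times are not included."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return INITIAL.minutes + N_CYCLES * per_cycle + FINAL.minutes


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    print(f"programmed hold time: {total_hold_minutes():.0f} min")
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_percent(seq):.1f}%")
```

Under these assumptions the program holds for 82 minutes in total, and both primers are 26 nt long with a GC content of about 46.2%.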

C. Data Analysis
Partial CO1 gene sequences of 52 samples were examined. Twenty-four of them were new sequences from lake and river populations in West Sumatra, and the rest, including the out-group species, were retrieved from GenBank (Table 1).
Forward and reverse DNA sequences were assembled and edited by visual examination of the electropherograms using the DNA STAR program [21]. DNA sequences of P. cf. binotatus from West Sumatra were compared for similarity with DNA sequences in GenBank using NCBI BLAST (http://blastncbi.nlm.nih.gov/Blast). All sequences were aligned with the CLUSTAL X program [22]. Nucleotide sequences of individuals and the sites at which they differed were compared and identified with the DNA Sequence Polymorphism (DnaSP) 5.10 program [23]. Genetic distances between pairs of populations were computed by applying the Kimura 2-parameter (K2P) model of sequence evolution in the MEGA 6.0 program [24].
Phylogenetic trees were reconstructed using the distance-based neighbour-joining (NJ) method [25], maximum parsimony (MP), and maximum likelihood (ML). The statistical support for branching orders was assessed by bootstrap resampling (1000 replicate data sets). All phylogenetic analyses were performed using MEGA software [26]. Phylogenetic trees were visualized using the TreeView program.
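For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P is the proportion of sites that differ by a transition and Q the proportion that differ by a transversion. The sketch below illustrates the formula for a single pair of sequences; it is not the MEGA implementation, the function name is hypothetical, and it assumes that sites with gaps or ambiguity codes are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    arg = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if arg <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(arg)


if __name__ == "__main__":
    # One transition in ten sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A distance of 0 means the two sequences are identical at every compared site; the correction grows faster than the raw proportion of differences as sequences diverge.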
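Bootstrap support is obtained by resampling alignment columns with replacement and repeating the analysis on each pseudo-replicate. The sketch below shows only the resampling step and uses a deliberately simple stand-in for tree building: it reports how often a chosen pair of sequences is the unique closest pair by uncorrected p-distance. All names are hypothetical, and the toy statistic is an assumption made for illustration; in the study the replicates were analysed as NJ, MP, and ML trees in MEGA.

```python
import random
from itertools import combinations


def resample_columns(alignment: dict, rng: random.Random) -> dict:
    """Return one bootstrap pseudo-replicate: columns drawn with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    picks = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in picks) for n in names}


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pair_support(alignment: dict, pair: tuple, replicates: int = 1000, seed: int = 1) -> float:
    """Percent of replicates in which `pair` is the unique closest pair.

    Toy stand-in for clade support; no tree is built here.
    """
    rng = random.Random(seed)
    target = frozenset(pair)
    hits = 0
    for _ in range(replicates):
        rep = resample_columns(alignment, rng)
        dists = {
            frozenset((x, y)): p_distance(rep[x], rep[y])
            for x, y in combinations(rep, 2)
        }
        best = min(dists.values())
        winners = [k for k, d in dists.items() if d == best]
        if winners == [target]:
            hits += 1
    return 100.0 * hits / replicates


if __name__ == "__main__":
    toy = {"a": "ACGTACGT", "b": "ACGTACGT", "c": "TGCATGCA"}
    print(pair_support(toy, ("a", "b"), replicates=100))
```

Fixing the random seed keeps the sketch reproducible; real analyses report the support value for each branch of the reconstructed tree.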

III. RESULTS AND DISCUSSION
Partial CO1 gene sequences of 52 samples were examined, 24 of which were new sequences. Of the 650-655 bp length required for taxonomic purposes, 564 bp were obtained for all of them. The sequence analysis revealed average nucleotide frequencies of A 27.3%, C 27.3%, T 29.4%, and G 16%. These base composition values were similar to those reported in other studies of CO1 in different cyprinid taxa [27]. The A+T content (56.7%) in this study was higher than the G+C content (43.3%). Gene base composition in all vertebrate classes is characterized by A and T being more frequent than G and C [28]. Base composition was calculated across all samples for the 1st, 2nd, and 3rd codon positions plus noncoding sites, and the evolutionary analyses were conducted in MEGA 6. In general, the composition of nucleotides T, C, G, and A is almost the same in each genus (data not shown). This shows the stability of base composition across the group of taxa, which is one of the characteristics of the CO1 gene. Of the 564 bp (characters) analysed, 350 bp (62.05%) were conserved sites and 214 bp (37.95%) were variable sites, of which 187 bp (87.38%) were parsimony-informative sites and 27 bp (12.62%) were singleton sites. The Kimura two-parameter method was used to calculate genetic distances from the CO1 sequences. Pairwise sequence divergence among all samples of P. cf. binotatus in West Sumatra ranged from 0.0% to 8.3%. Phylogenetic trees were constructed using the Maximum Likelihood (ML), Neighbour-Joining (NJ), Minimum Evolution (ME), and Maximum Parsimony (MP) methods to examine the relationship between Puntius species in West Sumatra and Puntius from other regions (Fig. 2). A total of 21 species (in-group and out-group) comprising 52 sequences were analysed. Each tree was constructed with 1000 bootstrap replicates using MEGA 6 [26].
Based on the trees constructed with the four methods from the 52 sequences analysed, two main clusters were obtained, supported by high bootstrap values of 96/97/96/96 for ME/NJ/ML/MP, respectively. Each consists of four subclusters. All clusters are rooted with the out-group (Rasbora daniconius, Danio choprai, and Botia rostrata).
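The site categories reported above follow standard definitions: a conserved site has one state in all sequences, a variable site has more than one, a parsimony-informative site has at least two states that each occur in at least two sequences, and a singleton site is variable but not parsimony-informative. A minimal sketch of that classification is given below; the function name is hypothetical, and the alignment is assumed to be gap-free and of equal length, which is a simplification of what MEGA or DnaSP do.

```python
from collections import Counter


def classify_sites(seqs: list) -> dict:
    """Count conserved, variable, parsimony-informative and singleton columns.

    Assumes a gap-free alignment in which all sequences have equal length.
    """
    counts = {
        "conserved": 0,
        "variable": 0,
        "parsimony_informative": 0,
        "singleton": 0,
    }
    for column in zip(*seqs):
        freq = Counter(column)
        if len(freq) == 1:
            counts["conserved"] += 1
            continue
        counts["variable"] += 1
        # Informative: at least two states, each seen in two or more sequences.
        shared_states = sum(1 for n in freq.values() if n >= 2)
        if shared_states >= 2:
            counts["parsimony_informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


if __name__ == "__main__":
    toy = ["AAAA", "AAGA", "ATGA", "ATGC"]
    print(classify_sites(toy))
```

By construction, conserved plus variable sites equals the alignment length, and parsimony-informative plus singleton sites equals the number of variable sites; the counts reported in the text satisfy both identities (350 + 214 = 564 and 187 + 27 = 214).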
Across the whole tree, cluster I is separated from cluster II (which consists of Barbus, Barbonymus, Puntius, and Systomus) by sequence divergences of 12.5%-23.6%. The results show four genera in cluster II, although the separation between P. serana and S. serana, as well as between P. orphoides and S. orphoides, remains unclear. Based on [20], [8], [29], and [30], sequence divergences of this magnitude indicate that the members of cluster I belong to a different genus, within the same family, from the members of cluster II. All P. cf. binotatus in subclusters 1, 2, and 3 of the first cluster show sequence divergences of 11.6%-13.5% from P. binotatus Malaysia. Therefore, with reference to [20], [8], [29], and [30], assigning all P. cf. binotatus in subclusters 1, 2, and 3 of the first cluster to P. binotatus is incongruent; they should belong to a different genus.
In the first cluster of the tree, P. binotatus Malaysia (accession number JN646096) is basal to the P. cf. binotatus group from West Sumatra and two other Puntius (P. banksi Malaysia, accession number JF781235, and P. binotatus Lampung, accession number JQ665834). In the NCBI taxonomic classification, P. binotatus from both Malaysia and Lampung (southern Sumatra) is recorded as Barbodes binotatus. Furthermore, P. banksi is assigned to B. banksi, with P. binotatus banksi (B. binotatus banksi) as a synonym [31]. Because the sequence divergence between the two main clusters was 12.5%-23.6%, the assignment of P. binotatus from Malaysia and Lampung and P. banksi from Malaysia to B. binotatus and B. banksi, respectively, is strongly supported. In the first subcluster, P. cf. binotatus from Diatas Lake and the Batang Lembang, Batang Gumanti, and Muara Pingai rivers show sequence divergences of 0.00%-0.04% among themselves and 0.9%-1.3% from P. banksi from Malaysia. The bootstrap value for these two groups is high: 99/99/99/99 for ME/NJ/ML/MP, respectively. The table of pairwise genetic distances (percentage sequence divergence) is not shown in this article. Based on the average p-distances reported for comparable fish groups [20], [8], [29], [30], the sequence divergences in our analysis indicate that all P. cf. binotatus in the first subcluster should be assigned to B. banksi, and this is the first record for Sumatra.
As stated earlier, the Batang Lembang River is an outlet of Dibawah Lake, while the Batang Gumanti River is an outlet of Diatas Lake. The two lakes and their outlet rivers are not connected at all. Diatas and Dibawah lakes are separated from each other by a distance of 1.08 miles. According to Katili [32], the Bukit Barisan mountain range, which stretches from the north to the south of Sumatra Island, is an active area that continually experiences volcanic and tectonic activity. Such geological dynamics may open specific gaps at certain times. It is possible that Diatas and Dibawah lakes were connected in the past. Thus, genetic mixing between populations in the two now separate water bodies may have occurred at some point, so that the genetic similarity between the two populations has been maintained. The Batang Lembang River (through the Batang Sumani River) is one of the inlets of Singkarak Lake, along with the Muara Pingai River. We therefore conclude that the populations of Puntius in the first subcluster (excluding P. banksi from Malaysia) are sympatric populations, although Diatas and Dibawah lakes are now separated. All of these waters flow to the east of the Bukit Barisan mountain range.
The second subcluster in the first cluster comprises the P. cf. binotatus populations of Maninjau Lake and its tributary. They show sequence divergences of 0.00%-0.004% among themselves, 7.2%-7.4% from P. banksi Malaysia, 6.7%-7.4% from P. cf. binotatus from Diatas and Dibawah lakes and the Batang Lembang, Batang Gumanti, and Muara Pingai rivers, and 6.6%-9.6% from the B. binotatus group. The bootstrap values between these groups were 93/92/93 for ME/NJ/ML, respectively. Based on the references [20], [8], [29], [30], these values indicate that the P. cf. binotatus populations of Maninjau Lake and its tributary constitute a subspecies; it is reasonable to propose them as Barbodes banksi maninjau, which is also the first report for Sumatra. Fishes of Maninjau Lake seem to show specific variations. A previous study [33] reported a Rasbora n. sp. in Maninjau Lake. That species has high morphological similarity to other Rasbora, but phylogenetic analysis using the cytochrome b gene showed significant genetic differences.
The third subcluster in the first cluster comprises the P. cf. binotatus populations from other rivers that have no connection to the Sumatran lakes. These rivers are located to the west of the Bukit Barisan mountain range, and all of them flow to the west coast. The populations show sequence divergences of 0.00%-0.002% among themselves, 6.5%-6.7% from the populations of Diatas and Dibawah lakes and the Batang Lembang, Batang Gumanti, and Muara Pingai rivers, and 7.8%-8.3% from the P. cf. binotatus populations of Maninjau Lake and its tributary.
These populations are separated from the two other subclusters with high bootstrap values (97/97/97 for ME/NJ/ML, respectively). Based on the references [20], [8], [29], and [30], these values indicate that the populations from other rivers that have no connection to the Sumatran lakes constitute a subspecies, and it is reasonable to propose them as Barbodes banksi kuranji.
In general, the taxonomy of Puntius remains confusing because of morphological similarity among its members, which needs to be resolved using genetic markers [2], [34]. From a previous report [31] and our study, we conclude that with the intensification of research on this genus, the naming and the number of species have changed.
The results of the tree analysis are supported by the number of haplotypes. From the analysis of 52 sequences, including 28 nucleotide sequences from GenBank, eight haplotypes of P. cf. binotatus from West Sumatra were obtained. Samples from Maninjau Lake (H 04) were taken from different locations around the lake. Maninjau Lake is a caldera lake with an area of 97.9 km², located in the western part of the Bukit Barisan mountain range. Its water comes from small rivers in the surrounding catchment area, and it has one outlet (the Antokan River), which flows to the west coast. The freshwater fauna of this lake is completely isolated because there is no connection with other waters. The lake often experiences upwelling, which frequently causes mass mortality of the fish living in it. The small rivers or tributaries that feed the lake serve as a refuge for small fish when upwelling occurs. Presumably, this is the case for P. cf. binotatus in the Asam River, which then survived with haplotypes (H 05 and H 06) suited to the new habitat in the tributaries. It has been reported [2] that in Asia the genus Puntius has the largest number of species among the Cyprinidae and occupies various types of freshwater habitat.
In contrast, the Batang Lembang and Batang Gumanti rivers, the outlets of Dibawah and Diatas lakes, respectively, are located in the eastern part and flow to the east coast. This geographical separation has driven the emergence of diversity from the genetic level up to the species level on the island of Sumatra. This is also supported by research reporting variation in the number of fish species found in 11 tributaries of the Batang Toru (North Sumatra), the main river [35]. Genetic variation within and among populations of Tor douronensis (Cyprinidae) from 21 rivers in West Sumatra has also been reported [36]. Based on our study of Sumatran fishes, we consider it essential to combine morphological and molecular data in grouping taxa so that taxa can be determined properly. This is closely related to the purposes of fish conservation and breeding, especially for food and ornamental fish, because the success of crossing depends on genetic similarity.
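A haplotype here is a distinct sequence variant, so counting haplotypes amounts to grouping samples whose aligned sequences are identical. The sketch below illustrates this grouping; the function name, the sample names, and the sequential labels are hypothetical and do not correspond to the H 04 to H 06 labels used in the text, and the sequences are assumed to be aligned, of equal length, and free of ambiguous bases.

```python
from collections import defaultdict


def collapse_haplotypes(sequences: dict) -> dict:
    """Group sample names by identical sequence.

    Returns a mapping from a sequential label (H01, H02, ...) to the list of
    samples sharing that sequence, in order of first appearance.
    """
    groups = defaultdict(list)
    for sample, seq in sequences.items():
        groups[seq.upper()].append(sample)
    return {
        f"H{i:02d}": samples
        for i, samples in enumerate(groups.values(), start=1)
    }


if __name__ == "__main__":
    # Hypothetical sample names and toy sequences, for illustration only.
    toy = {
        "lake_1": "ACGTAC",
        "lake_2": "ACGTAC",
        "tributary_1": "ACGTAT",
    }
    for label, samples in collapse_haplotypes(toy).items():
        print(label, samples)
```

Dedicated tools such as DnaSP additionally handle gaps and missing data when defining haplotypes, which this sketch does not attempt.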

IV. CONCLUSION
Our study concludes that (1) P. cf. binotatus from Diatas Lake and the Batang Lembang, Batang Gumanti, and Muara Pingai rivers can reasonably be assigned to B. binotatus banksi or B. banksi; (2) P. cf. binotatus from Maninjau Lake and its tributary is appropriately treated as the subspecies Barbodes banksi maninjau; (3) P. cf. binotatus from other rivers that have no connection to the lakes (Batang Kuranji, Batang Katik, Batang Tarok, and Lubuk Paraku rivers) is proposed as the subspecies Barbodes banksi kuranji; and (4) all of them are new records. We also recommend that the government authorities who decide on district development preserve the waters of Sumatra, since each region contributes to species and genetic diversity and to the evolutionary processes of speciation. All sequences from this study will be deposited in the International Barcode of Life Data (BOLD) System.