Having determined that Gm-AID exhibits WRC specificity similar to that of other AID orthologs (Figs. 3B and 9A), we reasoned that if the functional impairment of AID is a purposeful event in the evolution of Gadidae species, there ought to have been less evolutionary pressure to maintain enrichment of WRC motifs in the CDRs of their IgV genes. We annotated the IgH loci in the Atlantic cod genome and used this annotation to extract and annotate the IgVH of Gadidae species. We also included IgVH of Japanese puffer fish (Tr-IgVH), nurse shark (Gc-IgVH), human (Hs-IgVH), mouse (Mm-IgVH), chicken (Gg-IgVH), South African toad (Xl-IgVH), catfish (Ip-IgVH), salmon (Ss-IgVH), and zebrafish (Dr-IgVH). Motif enrichment was calculated as the ratio of the average normalized index (i.e., the number of WRC/GYW or WGCW motifs divided by the number of analyzed nucleotides) in CDRs vs. FRs. We excluded CDR3 because its sequence is formed by VDJ recombination.

All variable (V) regions for the Atlantic cod were downloaded from NCBI and aligned against known variable regions of other bony fish (Supplementary Table 7). Sequences with similarity to other variable regions were searched against the gadMor2 genome using BLAST with minimal restrictions. The BLAST output was sorted by scaffold or linkage group (LG). Then, within each scaffold/LG, hits were sorted by start position and IgV regions were extracted using bedtools software [136]. These sequences were aligned against reported cod variable regions in MEGA software [137]. Reciprocal BLAST was used to filter out any non-IgV sequences. Short sequences that did not cover the entire variable region, or that contained many insertions or deletions, were discarded. Cod IgV CDRs were mapped from T. rubripes Ig gene variable regions [138, 139]. The Atlantic cod IgVH sequences were used to retrieve the IgVH regions of other Gadidae species (European Nucleotide Archive (ENA) accession number PRJEB12469 and the Dryad repository).
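The sorting step described above can be sketched in Python. This is a minimal illustration, not the authors' script: it assumes the BLAST results are in the standard 12-column tabular format (`-outfmt 6`), and the function name `blast_hits_to_bed` is hypothetical. It converts each hit to a 0-based, half-open BED interval and orders the hits by scaffold/LG and then by start position.

```python
def blast_hits_to_bed(lines):
    """Convert BLAST tabular hits (outfmt 6) to sorted BED6 lines.

    Assumed column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, scaffold = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits are reported with sstart > send.
        strand = "+" if sstart <= send else "-"
        # BLAST coordinates are 1-based inclusive; BED is 0-based half-open.
        start, end = min(sstart, send) - 1, max(sstart, send)
        records.append((scaffold, start, end, query, fields[11], strand))
    # Sort by scaffold/LG first, then by start (and end) position.
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    return ["\t".join(str(x) for x in r) for r in records]
```

The resulting lines could be written to a BED file and passed to a tool such as `bedtools getfasta` to pull the candidate regions out of the genome; the bit score is placed in the score column only for inspection, and any filtering thresholds would be a separate choice.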

For WRC motif analysis, Japanese puffer fish IgVH (Tr-IgVH) and nurse shark IgVH (Gc-IgVH) sequences were obtained from NCBI (Supplementary Table 7). The nurse shark complementarity-determining regions (CDRs) were mapped from Tr-Ig gene variable regions [138, 139]. Hs-IgVH, mouse IgVH (Mm-IgVH), chicken IgVH (Gg-IgVH), South African toad IgVH (Xl-IgVH), Ip-IgVH, salmon IgVH (Ss-IgVH), Dr-IgVH, and Gm-IgVH sequences were obtained from the IMGT (the international ImMunoGeneTics information system) database [140,141,142,143,144,145,146,147,148]. For these sequences, the CDRs and framework regions (FRs) were identified using the IMGT database. In these analyses, the number of motifs in each region was counted using Python (version 3.8) [149]. For WRC/GYW motifs, TGC, TAC, AGC, AAC, GCA, GTA, GCT, and GTT were counted; for WGCW motifs, AGCA, AGCT, TGCA, and TGCT were counted. The sum of WRC/GYW or WGCW motifs for each region was then divided by the number of nucleotides analyzed for that region to normalize for variation in region length. The averages of these normalized WRC/GYW or WGCW indexes were calculated for CDRs and FRs. The enrichment of the motifs in CDRs was estimated by dividing the average index of CDR1 and CDR2 by the average index of FR1, FR2, and FR3. The GC content of the coding sequences was retrieved from the Codon and Codon-Pair Usage Tables (CoCoPUTs) server [150]. This database is available on
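The counting and normalization described above can be illustrated with a short Python sketch. It is not the authors' script: the function names are hypothetical, overlapping motif occurrences are counted (an assumption, since the text does not say how overlaps were handled), and the enrichment is computed for a single IgVH gene supplied as a dictionary of region sequences.

```python
# Motif sets exactly as listed in the text.
WRC_GYW = {"TGC", "TAC", "AGC", "AAC", "GCA", "GTA", "GCT", "GTT"}
WGCW = {"AGCA", "AGCT", "TGCA", "TGCT"}


def count_motifs(seq, motifs):
    """Count occurrences of any motif in seq, allowing overlaps (assumption)."""
    seq = seq.upper()
    k = len(next(iter(motifs)))  # all motifs in a set share one length
    return sum(seq[i:i + k] in motifs for i in range(len(seq) - k + 1))


def normalized_index(seq, motifs):
    """Motif count divided by the number of nucleotides in the region."""
    return count_motifs(seq, motifs) / len(seq)


def cdr_enrichment(regions, motifs):
    """Average index of CDR1 and CDR2 divided by average index of FR1-FR3.

    `regions` maps the labels 'FR1', 'CDR1', 'FR2', 'CDR2', 'FR3' to
    nucleotide sequences; CDR3 is deliberately left out.
    """
    cdr = [normalized_index(regions[r], motifs) for r in ("CDR1", "CDR2")]
    fr = [normalized_index(regions[r], motifs) for r in ("FR1", "FR2", "FR3")]
    return (sum(cdr) / len(cdr)) / (sum(fr) / len(fr))
```

For example, `count_motifs("AGCT", WRC_GYW)` counts both AGC and GCT, whereas `count_motifs("AGCT", WGCW)` counts the single tetramer. A value of `cdr_enrichment` above 1 indicates that motifs are denser in CDRs than in FRs; a gene whose FRs contain no motif at all would raise a division-by-zero error and would need separate handling.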

We used the RAxML package, version 8.2.9. First, the best substitution model was selected. The GTRCAT substitution model (i.e., the General Time Reversible model with the CAT model of rate heterogeneity) gave the highest maximum likelihood (ML) score in the model test runs. Then, the initial rearrangement setting (i.e., -i) and the number of rate categories (i.e., -c) were determined. The best ML tree and bootstrap values were estimated using -i 10 and -c 55. Accordingly, ancestral sequences were predicted using the GTRCAT substitution model, -i 10, -c 55, and either the best ML tree obtained in this study or the previously published species tree.
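As an illustration of how these settings map onto a RAxML invocation, the sketch below assembles a command line in Python. It is an assumption-laden example rather than the command the authors ran: the helper name `raxml_ancestral_cmd`, the binary name `raxmlHPC`, and all file names are hypothetical, and `-f A` is RAxML's option for computing marginal ancestral states on a rooted reference tree supplied with `-t`.

```python
def raxml_ancestral_cmd(alignment, rooted_tree, run_name,
                        binary="raxmlHPC",
                        initial_rearrangement=10, categories=55):
    """Build (but do not run) a RAxML ancestral-state command line.

    The binary name and file paths are placeholders; -m, -i, and -c
    carry the model and settings reported in the text.
    """
    return [
        binary,
        "-f", "A",                          # marginal ancestral states
        "-m", "GTRCAT",                     # substitution model
        "-i", str(initial_rearrangement),   # initial rearrangement setting
        "-c", str(categories),              # number of rate categories
        "-t", rooted_tree,                  # rooted reference tree
        "-s", alignment,                    # sequence alignment
        "-n", run_name,                     # suffix for output files
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where RAxML is installed. Keeping the command as a list of arguments avoids shell quoting issues with file paths.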

The datasets supporting the conclusions of this article are included within the article, its supplementary figures, and its supplementary data files. The raw data used for enzyme assay analysis (e.g., quantitated enzyme assay gels and Excel files) and reagents are available on request from the corresponding author. The accession numbers for the aicda sequences of the 67 analyzed teleost species, as well as for the teleost Ig sequences used in the WRC enrichment analyses, are listed in Supplementary Table 7. The Gm-AID cDNA sequence reported in this study is available in GenBank under accession number OP856785 [158].

Supplementary Figure 1. Comparison of the aicda genomic structure amongst vertebrates.
Supplementary Figure 2. Comparison of the aicda synteny amongst vertebrates.
Supplementary Figure 3. Atlantic cod AID purification and enzymatic characterization.
Supplementary Figure 4. Expression and testing of Gm-AID produced in HEK293T cells.
Supplementary Figure 5. Deciphering the basis of the absolute catalytic death of the polar cod AID.
Supplementary Figure 6. Amino acid alignment of extant AIDs used for ASR analyses and predicted ancestral sequences.
Supplementary Figure 7. Determination of the basic biochemical properties of resurrected ancestral AIDs to determine conditions for measurement of catalytic efficiency.
Supplementary Table 1. Comparison of DNA interaction with substrate binding grooves on the surface of AID orthologs.
Supplementary Table 2. Comparison of the Gm-AID H136 residue, in its interaction with the -1 position nucleotide upstream of the target dC and in total interactions with substrate, to its equivalent residue in other AID orthologs.
Supplementary Table 3. WRC/GYW enrichment in complementarity-determining regions (CDRs) vs. framework regions (FRs) of IgVH genes of various Gadidae and vertebrate species.
Supplementary Table 4. WGCW enrichment in complementarity-determining regions (CDRs) vs. framework regions (FRs) of IgVH genes of various Gadidae and vertebrate species.
Supplementary Table 5. AID hotspot abundance in the entire IgVH genes and GC content of annotated complete protein-coding sequences (CDSs) of various Gadidae and vertebrate species.
Supplementary Table 6. The sequences of primers used in this study.
Supplementary Table 7. GenBank accession numbers of the teleost aicda and Ig genes used in this study.

