
HIV-1 HIV-2 SIV Genomic Analysis

The goal is to uncover genetic differences that may explain variations in viral behavior, pathogenicity, and potential interactions with host genomes of HIV-1, HIV-2, and SIV.




Code by Michela Tjan Effendie and Qisong Zheng.

Research Question

Introduction

HIV-1 and HIV-2 are two types of the Human Immunodeficiency Virus (HIV) with differences in their geographical distribution, disease progression, and genetic composition (Shah et al., 2023). HIV-1 is the dominant strain worldwide and is associated with a faster progression to AIDS, while HIV-2 is mainly restricted to West Africa and progresses more slowly (Williams et al., 2023). Despite their differences, the two viruses share similar structural and functional proteins, making them ideal candidates for comparative analysis.

Simian Immunodeficiency Virus (SIV), the precursor of HIV, is a retrovirus found in over 40 species of African nonhuman primates. Cross-species transmission of SIV to humans is believed to have led to the emergence of HIV-1 and HIV-2, with chimpanzees and sooty mangabeys serving as the respective sources (Sharp & Hahn, 2011; Keele et al., 2006).

Data Sources

Protein and RNA sequences for HIV-1, HIV-2, and SIV were downloaded in FASTA and GenBank formats from the National Center for Biotechnology Information (NCBI) database. A total of 200 sequences were randomly selected for each virus. The sequences were validated for format integrity and redundancy using a custom Python script.
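The validation step described above could look roughly like the following. This is a minimal standard-library sketch, not the project's actual script: the function names, the accepted nucleotide alphabet, and the rule of keeping the first copy of each duplicate are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical alphabet for genome records (IUPAC ambiguity codes included).
NUCLEOTIDES = set("ACGTUNRYKMSWBDHV")


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, raising on malformed input."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate_and_deduplicate(records, alphabet=NUCLEOTIDES):
    """Drop empty, invalid, or duplicate sequences; keep the first copy of each."""
    seen = set()
    kept = []
    for header, seq in records:
        if not seq or not set(seq) <= alphabet:
            continue  # empty record or characters outside the alphabet
        if seq in seen:
            continue  # exact duplicate of an earlier sequence
        seen.add(seq)
        kept.append((header, seq))
    return kept
```

For protein files, the same functions would apply with an amino acid alphabet passed as `alphabet`.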

Methods

Sequence Alignment

Pairwise global alignment was conducted with Biopython’s pairwise2 module to identify conserved and divergent regions among HIV-1, HIV-2, and SIV proteins. Alignment scores and regions were analyzed to assess structural and accessory protein conservation. RNA genome sequences were aligned using Biopython’s Align.PairwiseAligner. Outputs included detailed alignment files and summary scores.
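To make the scoring behind a global alignment concrete, here is a small standard-library sketch of the Needleman-Wunsch recurrence. It is not the Biopython code used in the project, and the match, mismatch, and gap values are assumed defaults, since the scoring parameters are not stated here.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept in memory, which is enough when the score alone is needed; recovering the aligned regions would require the full table and a traceback.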

GC Content Analysis

GC content analysis was performed on RNA using a custom script that used Biopython’s gc_fraction utility (Benjamini & Speed, 2012). The script computed the percentage of guanine (G) and cytosine (C) nucleotides in the coding regions of each protein. Results were visualized with Matplotlib and Seaborn to highlight the differences between HIV-1, HIV-2, and SIV datasets. Statistical comparisons were conducted using SciPy.
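The quantity being computed is simple enough to write out directly. The sketch below is a standard-library stand-in for the Biopython utility; the function name and the choice to ignore ambiguous bases are assumptions, and a library routine may treat ambiguity codes differently.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of G and C among unambiguous nucleotides."""
    seq = sequence.upper()
    # Count only A, C, G, T, and U; ambiguity codes such as N are ignored.
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted
```

For example, `gc_percent("GGCCAAUU")` gives 50.0, because four of the eight bases are G or C.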

Amino Acid Composition Analysis

Amino acid composition analysis examines the proportion and frequency of amino acids within a protein sequence. Protein sequences were analyzed with Biopython’s ProteinAnalysis module to calculate amino acid composition. Percentages for each amino acid were aggregated and visualized as bar charts to compare HIV-1 and HIV-2 proteins. Text files summarizing amino acid frequencies per protein were also generated.
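As an illustration of what a composition calculation involves, the following standard-library sketch counts the 20 standard residues and converts the counts to percentages. It is an assumed equivalent written for clarity, not the ProteinAnalysis-based code from the project, and the function name is hypothetical.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_percentages(protein: str) -> dict:
    """Return {residue: percentage} over the 20 standard amino acids."""
    seq = protein.upper()
    # Non-standard symbols (X, *, gaps) are excluded from the total.
    counts = Counter(res for res in seq if res in STANDARD_AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in STANDARD_AMINO_ACIDS}
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AMINO_ACIDS}
```

Returning all 20 residues, including those with zero counts, keeps the per-protein results aligned when they are aggregated for plotting.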

Vpu and Vpx Gene Content

The Vpu (Viral protein U) and Vpx (Viral protein X) genes are accessory genes in certain retroviruses, including HIV-1, HIV-2, and SIV (Melim & Bieniasz, 2012). These genes encode proteins that play key roles in viral replication, immune evasion, and pathogenesis. GenBank files for HIV-1, HIV-2, and SIV were parsed to identify the presence of Vpu and Vpx genes. The locations and sequences of these genes were extracted and documented in text files.
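The gene search described above depends on the annotations in each GenBank record. The sketch below shows the idea with a deliberately simple standard-library scan of the feature table rather than a full GenBank parser: it reports CDS features whose /gene qualifier is vpu or vpx. The function name and the sample record are hypothetical, and locations that wrap across several lines are not handled.

```python
def find_accessory_genes(genbank_text, targets=("vpu", "vpx")):
    """Return (gene, location) pairs for CDS features annotated with a target gene."""
    hits = []
    in_features = False
    key, location = None, None
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features and line[:1] not in (" ", ""):
            break  # a new top-level section (e.g. ORIGIN) ends the feature table
        if not in_features:
            continue
        stripped = line.strip()
        if line.startswith("     ") and line[5:6].strip():
            # Feature lines are indented five spaces: "<key>   <location>".
            parts = stripped.split(None, 1)
            key = parts[0]
            location = parts[1] if len(parts) > 1 else ""
        elif key == "CDS" and stripped.startswith("/gene="):
            gene = stripped.split("=", 1)[1].strip('"').lower()
            if gene in targets:
                hits.append((gene, location))
    return hits


# Hypothetical, heavily shortened record used only to exercise the function.
SAMPLE_RECORD = "\n".join([
    "LOCUS       HYPOTHETICAL",
    "FEATURES             Location/Qualifiers",
    "     source          1..9000",
    '                     /organism="example"',
    "     CDS             5608..5856",
    '                     /gene="vpu"',
    "     CDS             100..500",
    '                     /gene="gag"',
    "ORIGIN",
    "        1 acgt",
    "//",
])
```

A record with no matching annotation returns an empty list, which on its own does not distinguish an absent gene from an unannotated one.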

Results

Sequence Alignment

Pairwise global alignment was performed to identify conserved and divergent regions between the protein sequences (Figs. 1, 2, 3, 4). This analysis highlights the evolutionary relationships among HIV-1, HIV-2, and SIV proteins by comparing structural and accessory protein regions. Structural and enzymatic proteins, such as reverse transcriptase and protease, exhibited a higher degree of conservation, consistent with their critical roles in viral replication and function.

For instance, alignments such as AYX72568.1 vs. ACO81762.1, which achieved a score of 52, revealed moderately conserved motifs essential for enzymatic activity. These conserved regions are likely necessary to maintain viral replication efficiency across diverse hosts.

Accessory proteins, on the other hand, showed significant divergence, which may reflect their adaptation to distinct host environments and immune responses. For example, the AYX72568.1 vs. CAB63460.1 alignment, with a lower score of 31, demonstrated sparse conservation. This variability is particularly pronounced in regions associated with host adaptation and may help the virus evade immune detection or optimize replication in specific hosts.

Alignments of RNA sequences, such as HIV-1 with SIV (Fig. 3) or HIV-2 with SIV (Fig. 4), revealed similar patterns: regions encoding structural proteins remained conserved, while accessory regions displayed evolutionary divergence. These findings suggest that structural proteins are under strong purifying selection that preserves their functions, whereas accessory proteins are more prone to positive selection that allows adaptation to distinct ecological and immunological pressures.

Figure 1. Sample HIV-1 and HIV-2 protein sequence alignment results

Figure 2. Sample RNA HIV-1 and HIV-2 alignment results

Figure 3. Sample RNA HIV-1 and SIV alignment results

Figure 4. Sample RNA HIV-2 and SIV alignment results

GC Content Analysis

The results of GC content analysis revealed distinct patterns of GC content distribution among the viral types, HIV-1, HIV-2, and SIV (Figs. 5, 6, 7, 8). The variation in GC content may hint at selective pressures from different host immune responses or cellular environments.

HIV-1 RNA exhibited moderate GC content with slightly higher variability across the samples. This variability suggests that HIV-1 may have adapted to a broader range of host environments, which would be consistent with its global spread and host versatility.

HIV-2 RNA demonstrated lower and more consistent GC content across all samples. This consistency aligns with the narrower host range of HIV-2 and may indicate evolutionary constraints that have stabilized its genomic composition. Similarly, SIV displayed a GC content pattern closely resembling that of HIV-2, further supporting its evolutionary and functional relationship with HIV-2. Based on the bar charts, GC content is more variable in HIV-1 and more uniform in HIV-2 and SIV.
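The comparison of variability read from the bar charts can also be expressed numerically as a mean and a standard deviation per group. The sketch below assumes GC percentages have already been computed for each sequence; the function name and the placeholder values are illustrative only and are not results from this project.

```python
from statistics import mean, stdev


def summarize_gc(gc_by_group):
    """Return {group: (mean, sample standard deviation)} for GC percentages."""
    return {
        group: (mean(values), stdev(values))  # stdev needs at least two values
        for group, values in gc_by_group.items()
    }


# Placeholder numbers chosen for easy arithmetic, not measured GC content.
example = {"virus_a": [10.0, 12.0, 14.0], "virus_b": [11.0, 11.0, 11.0]}
summary = summarize_gc(example)
```

A larger standard deviation in one group than in another is the numerical counterpart of the "higher variability" described above; a formal test of the difference would still be needed before drawing conclusions.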

Figure 5. GC content distribution sample 1

Figure 6. GC content distribution sample 2

Figure 7. GC content distribution sample 3

Figure 8. GC content distribution sample 4

Amino Acid Composition Analysis

The results reveal that HIV-1 and HIV-2 share similar median compositions for several amino acids, such as alanine (A), glycine (G), and serine (S). This finding suggests common structural or functional requirements. However, HIV-2 exhibits broader variability in amino acid composition, especially for residues like isoleucine (I), leucine (L), and proline (P), which may reflect evolutionary adaptations to specific host environments or differences in selective pressures.

Distinct differences between the two viral types were observed in residues such as lysine (K) and threonine (T), where HIV-1 displayed slightly higher composition percentages. These differences may influence protein charge or hydrophilicity, which could affect viral replication and immune evasion strategies. The analysis also revealed several outliers in both groups, particularly for hydrophobic residues like leucine (L) and polar residues like asparagine (N). These outliers could correspond to specific proteins or functional domains unique to each virus.

Figure 9. Amino acid composition comparison of HIV-1 and HIV-2

Vpu and Vpx Gene Content

The gene content analysis highlights significant genomic differences between HIV-1, HIV-2, and SIV, particularly in the presence and distribution of the Vpu and Vpx genes. The absence of Vpu in HIV-2 and the lack of Vpx in HIV-1 are consistent with known distinctions between these viruses.

In HIV-1, samples 1, 2, and 3 contained identifiable Vpu gene sequences located consistently within defined genomic regions (Figs. 10, 11, 12). These findings suggest that Vpu is functionally preserved across most HIV-1 samples. However, HIV-1 sample 4 did not yield any results for Vpu or Vpx, which could indicate the absence of these genes or significant genomic rearrangements in this sample.

For HIV-2, identifiable Vpx gene sequences were found in samples 2, 3, and 4 (Figs. 13, 14, 15). The sequences were localized to specific genomic positions, consistent with Vpx being a gene that distinguishes HIV-2 from HIV-1. HIV-2 sample 1 produced no detectable Vpu or Vpx sequences, which suggests variability or loss of these genes in certain isolates.

Among the SIV samples, samples 1, 3, and 4 contained identifiable Vpx gene sequences located within defined genomic regions, while sample 4 contained a Vpu gene sequence (Figs. 16, 17, 18, 19). This is consistent with the fact that Vpu is found in some SIV isolates while Vpx is found in most SIV isolates.

HIV-1 figures

Figure 10. HIV-1 Vpu and Vpx content sample 1

Figure 11. HIV-1 Vpu and Vpx content sample 2

Figure 12. HIV-1 Vpu and Vpx content sample 3

HIV-2 figures

Figure 13. HIV-2 Vpu and Vpx content sample 2

Figure 14. HIV-2 Vpu and Vpx content sample 3

Figure 15. HIV-2 Vpu and Vpx content sample 4

SIV figures

Figure 16. SIV Vpu and Vpx content sample 1

Figure 17. SIV Vpu and Vpx content sample 2

Figure 18. SIV Vpu and Vpx content sample 3

Figure 19. SIV Vpu and Vpx content sample 4

Synthesis

This study highlights the evolutionary divergence and functional adaptations of HIV-1, HIV-2, and SIV through sequence alignment, GC content analysis, amino acid composition, and accessory gene comparisons. Structural proteins showed high conservation across all viruses, which underlines their importance as therapeutic targets, while accessory proteins exhibited significant divergence, which may contribute to host-specific adaptations. GC content analysis revealed greater variability in HIV-1, possibly related to its broader host range, and greater uniformity in HIV-2 and SIV, which have narrower host ranges. Amino acid composition and gene content analyses emphasized distinct genomic features, such as the absence of Vpu in HIV-2 and of Vpx in HIV-1, which supports their unique evolutionary paths.

Future Directions

Future research should expand sequence datasets and include functional analyses to validate genomic findings. Comparative studies across additional SIV lineages could help analyze cross-species transmission and adaptation. Further investigation into accessory gene functions and host-specific interactions could expand our understanding of viral evolution and inform the development of targeted therapies.