Phylogenetic Assignment of Named Global Outbreak (PANGO) lineages are similar to formal scientific species names and are assigned according to a suite of detailed evolutionary and epidemiological criteria.
New PANGO lineages are evaluated by a committee based on criteria such as novel evolutionary features, transmissibility, pathogenicity, notable increase in frequency, new movement across regions, etc.
PANGO lineage labels are assigned chronologically with the following:
A separate system is used to indicate recombinant (hybrid) variants: their names begin with X followed by one or more letters (proceeding in sequence XA, XB, ... then XAA, XAB, ...) and carry no numerical suffix except for unambiguous descendants (e.g., XB, XB.1).
PANGO labels have a maximum depth of three layers of descendants, after which the label rolls over to a new prefix. Descendants of lineages that already carry three numeric suffixes are assigned to the next available prefix in alphabetical order (following the criteria above). This new prefix acts as an "alias" for the name of the parent lineage.
BQ.1.1 is an alias for B.1.1.529.5.3.1.1.1.1.1.1
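The alias rule can be sketched in a few lines of Python. This is a minimal illustration, not the official Pango tooling: the function names `expand` and `suffix_depth` are hypothetical, and the three-entry alias table is a hand-picked subset written out for this example (a real tool would load the full alias key published with the Pango designation data).

```python
# Illustrative alias table (assumption: a tiny hand-written subset,
# not the complete published alias key).
ALIASES = {
    "BA": "B.1.1.529",
    "BE": "B.1.1.529.5.3.1",
    "BQ": "B.1.1.529.5.3.1.1.1.1",
}


def expand(lineage, aliases=ALIASES):
    """Replace an alias prefix with the full lineage name it stands for."""
    prefix, _, suffix = lineage.partition(".")
    full_prefix = aliases.get(prefix, prefix)  # unknown prefixes pass through
    return f"{full_prefix}.{suffix}" if suffix else full_prefix


def suffix_depth(lineage):
    """Count the numeric layers that follow the letter prefix."""
    return len(lineage.split(".")) - 1


print(expand("BQ.1.1"))          # full, unaliased name
print(suffix_depth("BE.1.1.1"))  # 3: any child of this lineage needs a new prefix
```

Under these assumptions, `expand("BQ.1.1")` yields the long dotted name, and `suffix_depth` shows why BE.1.1.1 could not simply gain a fourth numeric layer.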
This naming structure can lead to confusion, especially when discussing different variants in non-technical settings or when trying to communicate information to the public.
What could be helpful would be for the WHO to designate BA.2 and BA.5 - the two primary Omicron parent lineages - as Pi and Rho. That way, at least it would be easy to quickly identify the ancestry of a lineage, even when the nomenclature is confusing.
This has become especially concerning for the descendants of the XBB recombinant. In February 2023 two new PANGO lineages appeared, EG and EK. What is not clear from these names is that they are in fact descendants of XBB: because PANGO has a three-layer limit, the XBB family now has descendants beyond those three layers, and they carry new prefixes.
However, XBB descendants are receiving new aliases at the same time as descendants of other variants that are NOT recombinant lineages:
See the problem? Can you easily distinguish between EG.1 and EF.1? Do you know which is recombinant and which is not?
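One way to answer that question programmatically is to expand each alias and look at the root of the full name. The sketch below is hypothetical and rests on two assumptions: the alias values are reproduced from memory of the public Pango alias key and should be checked against the current file, and a full name starting with "X" is treated as a recombinant lineage, following the naming convention described earlier.

```python
# Assumed alias values for illustration; verify against the published alias key.
ALIASES = {
    "EG": "XBB.1.9.2",               # rolls over from an XBB sublineage
    "EF": "BQ.1.1.13",               # rolls over from a BQ.1.1 sublineage
    "BQ": "B.1.1.529.5.3.1.1.1.1",
}


def expand(lineage):
    """Resolve aliases repeatedly, since one alias may point at another."""
    prefix, _, suffix = lineage.partition(".")
    while prefix in ALIASES:
        target = ALIASES[prefix]
        lineage = f"{target}.{suffix}" if suffix else target
        prefix, _, suffix = lineage.partition(".")
    return lineage


def descends_from_recombinant(lineage):
    """Heuristic: recombinant roots are the names that begin with 'X'."""
    return expand(lineage).startswith("X")


for name in ("EG.1", "EF.1"):
    print(name, "->", expand(name), "| recombinant:", descends_from_recombinant(name))
```

With this table, EG.1 resolves to a name rooted at XBB while EF.1 resolves to a name rooted at B, which is exactly the distinction the short aliases hide.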
It has been proposed that PANGO include a prefix or suffix to designate recombinant lineages.
Nextstrain.org provides real-time snapshots of evolving pathogen populations. It uses interactive visualizations to enable exploration of curated datasets and analyses, which are continually updated as new genomes become available. This offers a powerful pathogen surveillance tool to virologists, epidemiologists, public health officials, and community scientists.
Nextstrain introduced informal clade designations for SARS-CoV-2 on 4 March 2020, largely to aid internal discussions and to create URL links allowing ‘automatic zoom’ to an area of the tree that was of interest. These clade names were ad-hoc letter-number combinations (e.g. A2a) and were never intended to be a permanent naming system (and were never visible by default).
In June 2020 we put forth an initial Nextstrain clade naming strategy. This basic strategy of flat “year-letter” names was born out of work with seasonal influenza, where nested names such as 3c2.A1b can become unwieldy. In the “year-letter” scheme, years are there to make it easy to know what’s being discussed in ~5 years when, for example, clade 20A is referenced. Our June strategy called for naming a clade when it reached >20% global frequency for more than 2 months.
However, as the pandemic progressed, the lack of international travel meant that no clades beyond the initial set (through 20C) made it past 20% global frequency. Instead, we’ve seen “regional” clades that hit appreciable frequency in different continent-level regions of the world.
Consequently, we propose an updated strategy, where major (year-letter) clades are named when any of the following criteria are hit:
The rapid dominance and increased diversity of the Omicron variant and its constituent sub-lineages have triggered another update of the Nextstrain clade naming guidelines and labels. Once again, we propose a backwards-compatible update, this time to allow more flexible and faster designation of clades as new variants appear and spread.
However, with the dominance and diversification of the Omicron family of variants, we are seeing rising variants that are notable, of public and scientific interest, and genetically distinct, but that do not yet meet the previous criteria to be labeled. To allow easier reference to such variants at an earlier time, we propose an update to the current guidelines. This will allow designation of a clade if it shows consistent >0.05 per day growth in frequency where it is circulating, in addition to reaching >5% regional frequency.
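To make the growth threshold concrete, here is a rough Python sketch of how such a check might be computed. It is not Nextstrain's implementation: it assumes the "per day growth" is a logistic growth rate, estimated as the least-squares slope of logit(frequency) against time, and the function names and sample data are hypothetical. It also does not try to capture the "consistent" part of the rule, which would need the check to be repeated over successive time windows.

```python
import math


def logit(p):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def growth_rate_per_day(days, freqs):
    """Least-squares slope of logit(frequency) versus time in days."""
    ys = [logit(f) for f in freqs]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ys))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return sxy / sxx


def meets_growth_criterion(days, freqs, min_rate=0.05, min_freq=0.05):
    """True when growth exceeds min_rate per day and the latest regional
    frequency exceeds min_freq (thresholds taken from the text above)."""
    return growth_rate_per_day(days, freqs) > min_rate and freqs[-1] > min_freq


# Synthetic weekly regional frequencies for a made-up clade
days = [0, 7, 14, 21, 28]
freqs = [0.03, 0.05, 0.09, 0.15, 0.24]
print(growth_rate_per_day(days, freqs), meets_growth_criterion(days, freqs))
```

Working on the logit scale is a design choice: a variant growing logistically traces a straight line there, so a single slope summarizes its growth regardless of its current frequency.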
Our revised guidelines therefore designate a clade when any of the following criteria are met:
The GISAID Initiative promotes the rapid sharing of data from all influenza viruses and the coronavirus causing COVID-19. Established in 2008, it was created in response to the H5N1 influenza pandemic to provide open access to genomic data of influenza viruses. This includes genetic sequence and related clinical and epidemiological data associated with human viruses, and geographical as well as species-specific data associated with avian and other animal viruses, to help researchers understand how viruses evolve and spread during epidemics and pandemics.
GISAID does so by overcoming the disincentives, hurdles, and restrictions that discouraged or prevented the sharing of virological data prior to formal publication.
The Initiative ensures that open access to data in GISAID is provided free of charge to all individuals who agree to identify themselves and to uphold the GISAID sharing mechanism governed through its Database Access Agreement.
All bona fide users with GISAID access credentials agree to the basic premise of upholding a scientific etiquette: acknowledging the Originating laboratories that provide the specimens and the Submitting laboratories that generate the sequence and other metadata, and ensuring fair exploitation of results derived from the data. All users also agree that no restrictions shall be attached to data submitted to GISAID, to promote collaboration among researchers on the basis of open sharing of data and respect for all rights and interests.
After COVID-19 was identified as a newly emerging viral respiratory disease and the first hCoV-19 genomes were made available to the scientific community on 10 January 2020 on GISAID’s newly established EpiCoV™ platform, prestigious institutions around the globe came together, contributing experts to GISAID's team of curators to ensure that vast amounts of data could be reviewed, curated, and annotated in real time prior to release. Their remarkable contribution remains key to the unprecedented speed enabling real-time progress in the understanding of the new COVID-19 disease and in the research and development of candidate medical countermeasures.
GISAID maintains the world's largest repository of SARS-CoV-2 sequences, with over 14,500,000 sequences submitted.