Whole genome sequencing (WGS) has opened up many new possibilities for virus surveillance. However, many applications remain unexploited in the context of routine surveillance, which is the core business of national reference centres and public health institutes such as Sciensano. Virus surveillance aims to monitor circulating strains by collecting viral genomic data, characterising the viruses and, where possible, linking these data to patient data. It is particularly important to evaluate the pathogenicity of these circulating strains and their susceptibility to vaccines and antiviral drugs. WGS makes it possible to track viral outbreaks and to estimate how widely a virus is spreading in a given population, how fast it is mutating, and what impact genetic modifications have on human disease. This thesis explores the added value and challenges of this genomic approach for virus surveillance, and how these challenges can be overcome in order to derive the greatest benefit from it. We focused mainly on respiratory samples for influenza surveillance and on wastewater samples for SARS-CoV-2 surveillance.
For influenza genomic surveillance, we first evaluated the added value of using WGS to identify viruses carrying one or more mutations associated with antiviral drug resistance. Because neuraminidase inhibitors have been used extensively to treat influenza, the monitoring of influenza antiviral drug susceptibility has long focussed on the neuraminidase segment. We showed that, with the emergence of new antiviral drugs that target other segments, there is an increased need for information about the whole influenza genome, and we evaluated how WGS could be implemented to detect drug resistance mutations in clinical influenza virus isolates.
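To illustrate what screening a whole genome for resistance-associated mutations can look like in practice, the following is a minimal sketch and not the workflow used in this thesis. It assumes that translated protein sequences per segment are already available and aligned to the reference numbering. The function name `screen_markers` and the two-entry marker table are hypothetical; the two substitutions shown (H275Y in N1 neuraminidase and I38T in PA) are commonly cited examples, and a real screen would rely on a curated, regularly updated list.

```python
# Illustrative marker table only (an assumption of this sketch, not the thesis's list).
# Each entry: (reference amino acid, 1-based position, resistance-associated amino acid).
MARKERS = {
    "NA": [("H", 275, "Y")],  # N1 numbering; associated with reduced oseltamivir susceptibility
    "PA": [("I", 38, "T")],   # associated with reduced baloxavir susceptibility
}


def screen_markers(proteins, markers=MARKERS):
    """Return the marker substitutions found in the given protein sequences.

    `proteins` maps a segment name (e.g. "NA") to its amino acid sequence,
    assumed to be aligned to the reference numbering used in `markers`.
    """
    hits = []
    for segment, substitutions in markers.items():
        sequence = proteins.get(segment)
        if sequence is None:
            continue  # segment not sequenced: nothing to report
        for ref, position, alt in substitutions:
            # Positions are 1-based; skip positions beyond the sequence length.
            if position <= len(sequence) and sequence[position - 1] == alt:
                hits.append(f"{segment}:{ref}{position}{alt}")
    return hits
```

Because the table is keyed by segment, markers on segments other than neuraminidase can be added without changing the screening logic, which is the practical benefit of having the whole genome available.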
Secondly, we assessed how WGS of clinical samples can improve the routine surveillance of influenza strains circulating in humans. Classical influenza surveillance programmes have targeted mainly the hemagglutinin segment and, to a lesser extent, neuraminidase. Consequently, relatively little information is available about the other segments. WGS can provide a much-needed improvement of influenza genetic surveillance by making it easier to infer potential links between whole-genome data and disease and host characteristics. In this thesis, a new way of classifying influenza viruses based on the whole genome was proposed. This may improve vaccine strain selection and may also support future next-generation antiviral drugs and vaccines that will not focus solely on neuraminidase and hemagglutinin. Moreover, mutations across the whole genome and reassortments could be detected and linked to patient data, which yielded significant associations. Furthermore, because of the high diversity within influenza subtypes, a new approach was proposed that classifies influenza viruses based on their phylogenetic relatedness and reduces the viral genetic background before the mutations are analysed in relation to the patient data. Besides the advantage of obtaining the whole genome in one reaction, WGS also offers the opportunity to sequence patient-derived virus populations at sufficiently high depth to identify low-frequency variants present in a viral quasispecies. However, PCR and NGS introduce experimental errors, and distinguishing low-frequency variants from these errors is challenging in a routine setting, where samples are unlikely to be sequenced multiple times to make the errors easier to identify.
Therefore, we proposed a general approach to identify these low-frequency variants that ensures high-quality results and remains feasible for clinical samples in routine surveillance. Although the approaches were successfully developed in this thesis, the results are presented as a proof of concept because only a limited number of influenza samples were available in the Belgian influenza dataset. Most countries will probably face the same challenge of working with a medium-sized collection, because of the cost of WGS. This highlights the need for a public worldwide database in which patient data are linked in a harmonised way with genomic data. Such a database would allow results obtained at a local level to be analysed and compared at the global level.
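The difficulty of separating low-frequency variants from experimental errors without replicate sequencing can be illustrated with a simple statistical sketch. The code below is not the approach developed in this thesis; it is a generic one-sided binomial test that asks whether the number of reads supporting an alternative base is higher than a fixed per-base error rate would explain. The function names and the default values for `error_rate`, `alpha` and `min_depth` are hypothetical choices made for illustration only.

```python
import math


def log_binom_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def tail_probability(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


def is_candidate_variant(alt_reads, depth, error_rate=0.001, alpha=1e-6,
                         min_depth=1000):
    """Flag a position as a candidate low-frequency variant.

    All thresholds are illustrative assumptions: `error_rate` is the assumed
    combined PCR/NGS error rate per base, `alpha` the significance level and
    `min_depth` the minimal coverage required before any call is attempted.
    """
    if depth < min_depth:
        return False  # too little coverage to separate signal from noise
    return tail_probability(alt_reads, depth, error_rate) < alpha
```

For example, `is_candidate_variant(50, 5000)` tests whether 50 alternative reads at a depth of 5000 exceed what an error rate of 0.1% would produce. In a routine setting, the assumed error rate would have to be estimated for the specific library preparation and sequencing platform rather than fixed in advance.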
WGS can benefit not only influenza surveillance on clinical samples but also SARS-CoV-2 surveillance. We focused on monitoring SARS-CoV-2 through wastewater surveillance. First, we examined how the global effort to sequence SARS-CoV-2 whole genomes could improve the current surveillance based on polymerase chain reaction (PCR) techniques, through the design and in silico evaluation of primers and probes against a broad spectrum of variants. To this end, we proposed an approach that evaluates the in silico specificity of an assay using publicly available WGS data, combined with minimal experimental testing to evaluate the in vitro performance of the assay on respiratory and wastewater samples. Moreover, wastewater samples that contain human faeces have the advantage that they include multiple variants and reflect the strains circulating in a given population at a specific time. Therefore, an analytical sequencing strategy that identifies and measures all circulating variants in a sample can provide a good overall picture of the epidemiological spread and evolution of circulating strains without a priori knowledge. In this thesis, we evaluated whether high-throughput sequencing of a sample enriched by PCR targeting the whole SARS-CoV-2 genome would be able to identify and quantify low-frequency variants. As a first step towards answering this question, an in silico dataset was constructed by mixing wild-type sequencing data obtained by PCR enrichment and by introducing mutations of interest into raw wild-type sequencing data. The SARS-CoV-2 B.1.1.7 lineage was used as a case study.
Using such in silico datasets to mimic the diversity of SARS-CoV-2 variants in wastewater allowed us to develop a workflow and to set minimal quality criteria, in order to take full advantage of NGS to define the population of SARS-CoV-2 and its variants present in wastewater. This will allow the approach to be tested on real wastewater samples in the near future.
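The idea of building an in silico mixture with a known variant proportion can be sketched as follows. This is a simplified illustration under stated assumptions and not the procedure used in this thesis: reads are treated as opaque items in two in-memory lists, and the hypothetical function `mix_reads` draws them without replacement so that the expected variant fraction of the mixture is known in advance.

```python
import random


def mix_reads(wild_type, variant, variant_fraction, total, seed=0):
    """Build a mixture of `total` reads with a known proportion of variant reads.

    `wild_type` and `variant` are lists of reads (any objects); both pools
    must be large enough to draw from without replacement, otherwise
    random.Random.sample raises ValueError. A fixed seed keeps the
    mixture reproducible.
    """
    rng = random.Random(seed)
    n_variant = round(total * variant_fraction)
    n_wild_type = total - n_variant
    mixed = rng.sample(variant, n_variant) + rng.sample(wild_type, n_wild_type)
    rng.shuffle(mixed)  # avoid any ordering effect in downstream tools
    return mixed
```

Comparing the variant frequency reported by a workflow with the fraction requested here gives a simple check of whether low-frequency variants are recovered and quantified correctly at a given sequencing depth.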