CLC Genomics Workbench number of reads too low

3/17/2023

There are several factors to consider when planning a next generation sequencing (NGS) experiment. These include the depth of sequencing coverage, the length of the sequencing reads, whether to conduct single-end or paired-end sequencing, and multiplexing options. Because the cost of sequencing varies with these decisions, it's important to plan your experiment around the amount of sequencing data required to answer your experimental questions. This post will help you navigate these considerations, particularly with regard to immune repertoire sequencing projects. For additional background information, please see our intro to NGS post.

Sequence coverage (depth) describes the average number of reads that align to a known reference at a particular location within the target transcript or genome. Because sequencing is error prone, higher coverage is used to increase confidence in the bases called. This approach assumes that sequencing errors are random. Therefore, if each nucleotide is sequenced multiple times, the base call shared by the majority of reads (the consensus) will reflect the correct nucleotide. In NGS, reads need to cover each base many times so that individual read errors become statistically irrelevant once they are outnumbered by correct reads. Coverage varies within a sample, and typical coverage ranges from 30 reads or fewer for human genetic applications to more than 1,000 reads for cancer applications. It is necessary to determine the sequencing coverage needed for your application to minimize the probability of false results. For instance, if you want to capture rare RNA sequences within the transcriptome, deeper sequencing will be needed to detect low-abundance variants.

Immune repertoire sequencing is distinct from other sequencing applications because each cell has the potential to contain its own unique VDJ sequence. Thus, each cell represents a de novo rearrangement. In a targeted VDJ sequencing application, the VDJ rearrangement, and in particular the CDR3 region, is captured and sequenced directly, so no additional bioinformatic assembly of short read fragments is required (beyond potentially read stitching, discussed with read length below). In an immune repertoire sequencing application, the sequencing coverage is determined from the starting cell numbers of the sample and an estimate of sample diversity, if known. Sufficient reads need to be allotted for each sample to cover the potential diversity based on the input cell counts.

The average human white blood cell count ranges from 4,000 to 11,000 cells/µL. Cells are lost during every processing step along the way, so it is important to have an estimate of the final cell count of the sample prior to RNA extraction. We generally recommend allocating a minimum of 5-10 reads per cell in the sample. Therefore, for a sample containing 100,000 cells, a minimum of 500,000 reads should be allocated. This means that for samples with generally lower cell counts, there is an opportunity to pool more samples per lane (or flow cell), with the limit being the number of available molecular IDs or barcodes for the chain of interest.

If the sample, or the RNA source, is expected to have a restricted repertoire, it may be unnecessary to sequence very deeply. For instance, a Jurkat cell line has only one TCR-beta rearrangement. In this case, the cell count is irrelevant when determining the number of reads needed, because the sequence can be obtained from very few reads. Assigning too many reads to such a sample wastes expensive flow cell real estate. Over-sequencing can also lead to a build-up of low-frequency sequencing errors.

Read length describes the average length of the sequencing reads produced (i.e., the number of base pairs sequenced) and is sequencing-platform specific. If assembling the reads into the reconstructed DNA sequence is like doing a puzzle, long reads equate to larger puzzle pieces. For whole genome sequencing or species identification, longer reads are preferable. Longer read lengths are also essential for capturing insertions and deletions or for sequencing regions with a lot of redundancy, such as those that contain transposons. Short reads are effective for applications aimed at counting the abundance of specific sequences, identifying variants within otherwise well-conserved sequences, or profiling the expression of particular transcripts.
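The read-allocation guideline in this post (at least 5-10 reads per cell, with pooling limited by the number of available barcodes) can be sketched as a small calculation. This is only an illustration under stated assumptions: the function names are hypothetical, the lane capacity of 25 million reads is a made-up placeholder rather than a figure for any particular instrument, and all pooled samples are assumed to have the same cell count.

```python
def min_reads_for_sample(cell_count: int, reads_per_cell: int = 5) -> int:
    """Minimum reads to allocate, using the 5-10 reads-per-cell guideline.

    reads_per_cell defaults to 5 (the low end of the guideline); pass 10
    for the high end. Both names are illustrative, not from any library.
    """
    if cell_count <= 0 or reads_per_cell <= 0:
        raise ValueError("cell_count and reads_per_cell must be positive")
    return cell_count * reads_per_cell


def samples_per_lane(lane_reads: int, cells_per_sample: int,
                     available_barcodes: int, reads_per_cell: int = 5) -> int:
    """How many equally sized samples can share one lane.

    The answer is capped by both the lane's read capacity and the number
    of available barcodes (or molecular IDs) for the chain of interest.
    """
    reads_needed = min_reads_for_sample(cells_per_sample, reads_per_cell)
    return min(lane_reads // reads_needed, available_barcodes)


if __name__ == "__main__":
    # Example from the text: 100,000 cells need at least 500,000 reads.
    print(min_reads_for_sample(100_000))

    # Hypothetical lane capacity (assumption): 25 million reads, 96 barcodes.
    hypothetical_lane_reads = 25_000_000
    print(samples_per_lane(hypothetical_lane_reads, 100_000, 96))
    # Lower cell counts allow more pooling, until barcodes become the limit.
    print(samples_per_lane(hypothetical_lane_reads, 10_000, 96))
```

For a sample with a restricted repertoire, such as a cell line with a single rearrangement, the cell count is not the right input for this calculation, since the sequence can be recovered from very few reads.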


I'm James. This is my year of travel.
