RNA-seq library construction techniques | RNA sequence library construction

leezx 2020-07-04 18:38

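Why is it essential to understand library construction?

What categories of transcriptome sequencing are there?

What are the common RNA-seq library construction methods, and what are their characteristics?

Which RNAs are present in living eukaryotic cells, what does each do, and in what proportion?

How does PCR affect RNA-seq? What does "PCR-free" mean?

Why do some RNA-seq reads always fail to align to the transcriptome?

What are the routine RNA-seq analyses?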


 

Why is it essential to understand library construction?

We work with high-throughput sequencing. If we do not even know where the data we analyze come from, how can we call ourselves experts?

 

What categories of transcriptome sequencing are there?

Start with the transcriptome services offered by the sequencing service company with the largest market share in China:

Transcriptional regulation sequencing - Novogene (诺禾致源)

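  • Full-length transcriptome (third-generation sequencing)
  • Eukaryotic transcriptome sequencing with a reference genome
  • Eukaryotic transcriptome sequencing without a reference genome
  • Prokaryotic transcriptome sequencing
  • Metatranscriptome sequencing
  • Medical transcriptome sequencing
  • Single-cell transcriptome sequencing
  • Comparative transcriptome and pan-transcriptome sequencing
  • lncRNA sequencing
  • circRNA sequencing
  • small RNA sequencing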

Next, look at the sequencing services of our own CPOS:

Sample Requirement and Submission

 

What are the common RNA-seq library construction methods, and what are their characteristics?

Two commonly used RNA-seq library preparation protocols:

  • poly-A-tailed mRNA selection (PA)
  • ribo-depletion (RD)

PA showed better performance in the expression-based classification.

PA protocol better represented total RNA compared to the RD protocol.

RD protocol detected a higher number of non-coding RNA features and had better alignment efficiency.

RD protocol also recovered more known fusion-gene events, although variability was seen in fusion gene predictions.

 

RD protocol captures a wide repertoire of transcripts [15, 16] and works efficiently with degraded RNA [12, 15].

The high number of intron mapping reads in RD datasets may also be advantageous in understanding pre-mRNA dynamics and the post-transcriptional impact of microRNAs [17].

 

PA libraries contain fewer intronic reads than RD libraries [12], thereby offering a more cost-effective solution for gene expression studies [18].

The PA method also appears to outperform the RD protocol in detecting differentially expressed genes [15, 18].
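The cost argument above (fewer intronic reads in PA libraries) can be made concrete with a little arithmetic. This is a minimal sketch, not taken from the cited paper: the function name `total_reads_needed` and the exonic fractions used in the example are hypothetical placeholders, not measured values.

```python
import math


def total_reads_needed(target_exonic_reads, exonic_fraction):
    """Total reads to sequence so that, on average, `target_exonic_reads`
    of them fall in exons, given the fraction of reads that are exonic."""
    if not 0.0 < exonic_fraction <= 1.0:
        raise ValueError("exonic_fraction must be in (0, 1]")
    return math.ceil(target_exonic_reads / exonic_fraction)


if __name__ == "__main__":
    target = 10_000_000
    # Hypothetical exonic fractions, for illustration only.
    for protocol, fraction in {"PA": 0.8, "RD": 0.5}.items():
        print(protocol, total_reads_needed(target, fraction))
```

Under these assumed fractions, the library with more intronic reads needs proportionally more sequencing to reach the same exonic depth, which is the sense in which PA can be cheaper for expression studies.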

 

Which RNAs are present in living eukaryotic cells, what does each do, and in what proportion?

List of RNAs - it also describes the approximate function of each; there are a frightening number of them.

Abundance of the three main RNA types: Section 11.6, Processing of rRNA and tRNA - this Molecular Cell Biology textbook is worth reading.

Approximately 80 percent of the total RNA in rapidly growing mammalian cells (e.g., cultured HeLa cells) is rRNA, and 15 percent is tRNA; protein-coding mRNA thus constitutes only a small portion of the total RNA.
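The percentages quoted above imply an upper bound on the mRNA share, which is why poly-A selection or ribo-depletion is needed at all. The sketch below works through that arithmetic. It assumes, as a simplification, that reads from an unselected total-RNA library would be distributed in proportion to these percentages; the function names are illustrative.

```python
def remaining_pct(rrna_pct=80.0, trna_pct=15.0):
    """Percentage of total RNA left for everything that is not rRNA or tRNA
    (mRNA plus all other RNA species), using the textbook figures."""
    rest = 100.0 - rrna_pct - trna_pct
    if rest < 0:
        raise ValueError("percentages add up to more than 100")
    return rest


def expected_reads(total_reads, pct):
    """Expected number of reads from an RNA class, assuming reads are
    proportional to abundance (a simplifying assumption)."""
    return total_reads * pct / 100.0


if __name__ == "__main__":
    total = 20_000_000
    print("non-rRNA/tRNA share (%):", remaining_pct())
    print("reads spent on rRNA:", expected_reads(total, 80.0))
    print("reads left for mRNA and others:", expected_reads(total, remaining_pct()))
```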

 

How does PCR affect RNA-seq? What does "PCR-free" mean?

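During library preparation, random genomic fragments are amplified by standard PCR. Some of these templates have complex secondary structure and others have poor thermal stability, and both factors affect PCR amplification efficiency. As a result, not all genomic sequences are equally represented in a PCR-amplified library.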
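To see how a small difference in per-cycle efficiency skews representation, here is a minimal sketch using the standard idealized model in which a template with efficiency e grows by a factor of (1 + e) per cycle. The fragment names, efficiencies, and cycle number in the example are hypothetical.

```python
def fold_amplification(efficiency, cycles):
    """Idealized PCR yield: each cycle multiplies copies by (1 + efficiency),
    where efficiency is between 0 (no amplification) and 1 (perfect doubling)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return (1.0 + efficiency) ** cycles


def library_fractions(start_copies, efficiencies, cycles):
    """Fraction of the final library contributed by each fragment."""
    amplified = {
        name: start_copies[name] * fold_amplification(efficiencies[name], cycles)
        for name in start_copies
    }
    total = sum(amplified.values())
    return {name: copies / total for name, copies in amplified.items()}


if __name__ == "__main__":
    # Two fragments that start at equal abundance (hypothetical values).
    start = {"easy": 1000, "gc_rich": 1000}
    eff = {"easy": 0.95, "gc_rich": 0.80}
    print(library_fractions(start, eff, cycles=12))
```

Even though both fragments start at the same abundance, the less efficiently amplified one ends up under-represented, and the gap widens with every additional cycle.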

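In high-throughput sequencing, PCR is used to amplify trace amounts of DNA, increase library yield, strengthen the DNA fluorescence signal, and improve sequencing accuracy. Along with these benefits, PCR brings problems. The polymerase introduces errors at a certain rate, and these errors are carried through subsequent cycles and accumulate, producing false-positive variants that are hard to distinguish and identify. The polymerase also has amplification bias: some regions, especially those with high or low GC content or with secondary structure, amplify inefficiently and are hard to cover. PCR can also introduce many erroneous InDels, and it generates a large number of duplicates, which wastes sequencing data and increases sequencing cost.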
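The duplication problem mentioned above is usually quantified as a duplication rate. The following is a simplified sketch, not the algorithm of any particular tool: it assumes that reads sharing the same (chromosome, start position, strand) key are copies of one original fragment. In RNA-seq, highly expressed genes can also produce identical keys without PCR, so the result is only a rough upper estimate.

```python
from collections import Counter


def duplication_rate(read_keys):
    """Fraction of reads that are extra copies of an already-seen key.

    `read_keys` is an iterable of hashable keys, e.g. (chrom, start, strand).
    One read per distinct key counts as the original; the rest as duplicates.
    """
    counts = Counter(read_keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    duplicates = total - len(counts)
    return duplicates / total


if __name__ == "__main__":
    reads = [
        ("chr1", 100, "+"),
        ("chr1", 100, "+"),
        ("chr1", 100, "+"),
        ("chr2", 50, "-"),
    ]
    print(duplication_rate(reads))
```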

The high GC-content of the missing genes could have caused problems in the PCR amplification in next-generation sequencing library preparation, as GC-rich genes are extremely difficult to amplify [20].
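Since GC content is the property singled out here, a quick way to screen sequences is to compute it directly. This is a minimal sketch; the 0.65 cutoff for calling a sequence "GC-rich" is an arbitrary illustrative threshold, not a value from the cited study.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of `seq`.
    Ambiguous bases such as N are ignored; returns 0.0 if none are usable."""
    seq = seq.upper()
    usable = sum(seq.count(base) for base in "ACGT")
    if usable == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / usable


def is_gc_rich(seq, threshold=0.65):
    """Flag a sequence as GC-rich using a hypothetical threshold."""
    return gc_content(seq) >= threshold


if __name__ == "__main__":
    for s in ("ATATATGC", "GGCCGCGCAT"):
        print(s, round(gc_content(s), 2), is_gc_rich(s))
```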

 

Why do some RNA-seq reads always fail to align to the transcriptome?

Our analyses revealed that meaningful biological information can be found when further exploring unmapped reads. For instance, it is possible to discover sequences that are either absent or misassembled in the reference genome, and sequences that indicate infection or sample contamination. In this study we also propose strategies to aid the capture and interpretation of this information from unmapped reads.

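This is worth digging into. Human RNA-seq analysis usually does not take population background into account. Could we build a set of supplementary transcripts for each population? In essence this would supplement hg19 and hg38 and improve the annotation.

Population-specific transcripts would need an efficient algorithm to merge them by population. Are these transcripts associated with population-specific traits, such as skin color?

After adding these novel transcripts, how much does the mapping rate improve? Can the unmapped rate ultimately be brought below 5%?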
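The two questions above reduce to simple bookkeeping on read counts before and after extending the reference. The sketch below shows that bookkeeping. The function names and the read counts in the example are hypothetical; real counts would come from the aligner's summary statistics.

```python
def unmapped_rate(total_reads, mapped_reads):
    """Fraction of reads that did not align."""
    if total_reads <= 0 or not 0 <= mapped_reads <= total_reads:
        raise ValueError("need total_reads > 0 and 0 <= mapped_reads <= total_reads")
    return (total_reads - mapped_reads) / total_reads


def mapping_gain(total_reads, mapped_before, mapped_after):
    """Improvement in mapping rate, in percentage points."""
    before = 1.0 - unmapped_rate(total_reads, mapped_before)
    after = 1.0 - unmapped_rate(total_reads, mapped_after)
    return (after - before) * 100.0


def meets_target(total_reads, mapped_reads, max_unmapped=0.05):
    """True if the unmapped rate is below the 5% goal stated in the text."""
    return unmapped_rate(total_reads, mapped_reads) < max_unmapped


if __name__ == "__main__":
    total, before, after = 30_000_000, 27_000_000, 28_800_000  # made-up counts
    print("gain (percentage points):", mapping_gain(total, before, after))
    print("below 5% unmapped:", meets_target(total, after))
```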

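Find the population with the most third-generation sequencing data and assemble a reasonably complete set of transcripts.

Classify all assembled transcripts and assign each class a short name code.

How are these transcripts produced, and where are they transcribed from? Investigate the causes: where do hg19 and hg38 fall short, and can this reveal population-specific regions of the genome?

Are these transcripts differentially expressed in cancer, and are there any novel findings?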

 

Google search: unmapped reads human RNA

Comprehensive assembly of novel transcripts from unmapped human RNA-Seq data and their association with cancer

CAFU: a Galaxy framework for exploring unmapped RNA-Seq data

 

What are the routine RNA-seq analyses?

Routine analyses

i) expression of protein coding and non-coding RNAs,

ii) differential gene expression analysis,

iii) pathway analysis,

iv) fusion gene detection, and

v) expressed variant calling.
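Items i) and ii) in the list above can be illustrated with a toy calculation: normalize raw counts to counts per million (CPM), then take a log2 fold change between two samples. This is only a sketch of the idea, with made-up gene names and counts and an assumed pseudocount of 1. A real differential expression analysis needs replicates and a statistical model, as provided by dedicated tools such as DESeq2 or edgeR.

```python
import math


def cpm(counts):
    """Counts per million: scale each gene's raw count by library size."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2(B / A) per gene shared by both samples; the pseudocount
    (an assumption of this sketch) avoids division by zero."""
    shared = cpm_a.keys() & cpm_b.keys()
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in shared
    }


if __name__ == "__main__":
    control = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 0}   # hypothetical counts
    treated = {"GENE_A": 2000, "GENE_B": 1500, "GENE_C": 20}
    lfc = log2_fold_change(cpm(control), cpm(treated))
    for gene in sorted(lfc):
        print(gene, round(lfc[gene], 2))
```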

 

References

Google search: RNA-seq library construction

The impact of RNA sequence library construction protocols on transcriptomic profiling of leukemia

 

Introduction to RNA-Seq for Researchers - YouTube - covers the whole process fairly comprehensively

Get more from your core: RNA-Seq techniques in action | Illumina Video - 13 min

 

PCR-free sequencing library preparation

[智造微课] What exactly is "free" about PCR-Free?

 

Exploring the unmapped DNA and RNA reads in a songbird genome

From trash to treasure: detecting unexpected contamination in unmapped NGS data

 

Detailed library construction steps and principles for transcriptome sequencing (RNA-seq) - several of the figures in it are good

RNA sequencing: the teenage years

A survey of best practices for RNA-seq data analysis

Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud

 
