一行python画Manhattan plot与QQ plot

之前介绍过曼哈顿图与QQ图的画法,但自己画终究还是有点麻烦,有很多数据的时候就很头疼,于是自己写了一个非常简单的python package,一行代码画好对齐的Manhattan plot 与 QQ plot,并基于一个给定的滑动窗口自动检测top SNP并注释。

Package: gwaslab

安装方法: pip install gwaslab

使用方法:myplot = gl.mqqplot(insumstats,"CHR","POS","PVALUE")

当前版本:0.0.5

依赖的包:scipy,numpy,matplotlib,pandas,seaborn

(注意:非专业开发人员,完全用爱发电,自用的scripts改成的包,难免有bug,欢迎在github上指正,或提意见或建议, https://github.com/Cloufield/gwaslab )

使用示例:

实例数据下载: http://jenger.riken.jp/result

ID 1 Atrial Fibrillation

读入数据

输入文件是一般的gwas sumstats就可以,有chr,pos,pvalue三列就可以画:

import pandas as pd
import gwaslab as gl

#读取输入文件
insumstats = pd.read_csv("AF_1000G_rsq09_FINAL.txt.gz","\\s+")

本次使用的示例数据如下

使用mqqplot函数画图:

输入gwas sumstats的DataFrame,并指定染色体,碱基位置,p值的列名即可:

myplot = gl.mqqplot(insumstats,"CHR","POS","PVALUE")

#log
Basic settings:
  - Genome-wide significance level is set to 5e-08 ...
  - Raw input contains 5018048 variants...
Start conversion and QC:
  - P values are being converted to -log10(P)...
  - Sanity check: 0 variants with P value outside of (0,1] will be removed...
  - Sanity check: 0 na/inf/-inf variants will be removed...
  - Maximum -log10(P) values is 134.014573525917 .
Plotting 5018048 variants:
  - Found 14 significant variants with a sliding window size of 500 kb...
  - Skip annotating
  - Created Manhattan plot successfully!
  - Created QQ plot successfully!

lambda gc 自动计算并注释, qq plot与manhattan plot的y轴对应,,看着是舒服一点了,

可是突出的那个loci太高,其他的loci都快看不见了,有点不协调啊,显著的loci也没有注释,怎么办?

那么加几个option好了:

1.用cut 截断一下,cut以上的轴按cutfactor成比例缩小,这样就能在不丢失信息的情况下看清所有显著的loci

2.anno选项是用来自动注释lead snp的,默认基于一个500kb的滑动窗口,可以通过windowsize改变

gl.mqqplot(insumstats,"CHR","POS","PVALUE",cut=20,cutfactor=20,anno=True)

#log
Basic settings:
  - Genome-wide significance level is set to 5e-08 ...
  - Raw input contains 5018048 variants...
Start conversion and QC:
  - P values are being converted to -log10(P)...
  - Sanity check: 0 variants with P value outside of (0,1] will be removed...
  - Sanity check: 0 na/inf/-inf variants will be removed...
  - Maximum -log10(P) values is 134.014573525917 .
  - Minus log10(P) values above 20 will be shrunk with a shrinkage factor of 20...
Plotting 5018048 variants:
  - Found 14 significant variants with a sliding window size of 500 kb...
  - Annotating using column CHR:POS...
  - Created Manhattan plot successfully!
  - Created QQ plot successfully!

嗯嗯,看着舒服多了,

如果 只指定anno=True, 那么注释就只是chr: pos,当然也可以指定用于注释的列名,这里用SNP这一列注释:

gl.mqqplot(insumstats,"CHR","POS","PVALUE",cut=20,cutfactor=20,anno="SNP")

颜色不好看怎么办:

用colors选项,传个颜色的list进去,给图换个颜色,

qqplot的颜色,为了保持视觉统一,设定为曼哈顿图chr1的颜色,

例如传个彩虹进去:

myplot = gl.mqqplot(insumstats,"CHR","POS","PVALUE",
                    cut=20,cutfactor=20,anno=True,
                    colors=["#ff0000","#fc4444","#fc6404","#fcd444","#8cc43c","#029658","#1abc9c","#5bc0de","#6454ac","#fc8c84"])

其他可以选的 option

加title,调大小,什么的,目前还没有优化,后续会精简一下

mqqplot(insumstats,
          chrom,
          pos,
          p,
          scaled=False,
          cut=0,
          cutfactor=10,
          cut_line_color="#ebebeb",
          windowsizekb=500,
          anno=None,
          sig_level=5e-8,
          sig_line_color="grey",
          suggestive_sig_level=5e-6,
          title =None,
          mtitle=None,
          qtitle=None,
          figsize =(15,5),
          fontsize = 10,
          colors = ["#000042", "#7878BA"],
          layout="qqm",
          verbose=True,
          repel_force=0.03,
          gc=True,
          save=None,
          saveargs={"dpi":300,"facecolor":"white"}        
          )

保存的话 save=“./yourpath/myplot.png” ,或者save=True就行

分别画Manhattan plot 与 QQ plot

当然也可以分开画 mplot

mymplot = gl.mplot(insumstats,"CHR","POS","PVALUE",cut=20,cutfactor=10,anno=True)

qqplot

参考:

https://github.com/Cloufield/gwaslab

GWAS Gallery: gwaslab

发表评论

Fill in your details below or click an icon to log in:

WordPress.com 徽标

您正在使用您的 WordPress.com 账号评论。 注销 /  更改 )

Twitter picture

您正在使用您的 Twitter 账号评论。 注销 /  更改 )

Facebook photo

您正在使用您的 Facebook 账号评论。 注销 /  更改 )

Connecting to %s