VDJtools Data
An example of importing VDJtools-formatted input.
Import VDJtools-formatted input from a .tsv or .tsv.gz file.
```python
import os

import numpy as np
import pandas as pd
from tcrdist.paths import path_to_base
from tcrdist.vdjtools_funcs import import_vdjtools
from tcrdist.repertoire import TCRrep

# Reformat vdj_tools input format for tcrdist3
vdj_tools_file_beta = os.path.join(path_to_base, 'tcrdist', 'data', 'formats',
                                   'vdj.M_15_CD8_beta.clonotypes.TRB.txt.gz')

df_beta = import_vdjtools(vdj_tools_file=vdj_tools_file_beta,
                          chain='beta',
                          organism='human',
                          db_file='alphabeta_gammadelta_db.tsv',
                          validate=True)

# The result uses tcrdist3 beta-chain column names plus validation flags
assert np.all(df_beta.columns == ['count', 'freq', 'cdr3_b_aa', 'v_b_gene', 'j_b_gene',
                                  'cdr3_b_nucseq', 'valid_v', 'valid_j', 'valid_cdr3'])

# Can be directly imported into a TCRrep instance.
tr = TCRrep(cell_df=df_beta[['count', 'freq', 'cdr3_b_aa', 'v_b_gene', 'j_b_gene']],
            chains=['beta'],
            organism='human',
            compute_distances=False)
```
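With `validate = True`, the DataFrame returned by `import_vdjtools` includes the `valid_v`, `valid_j` and `valid_cdr3` columns. One possible next step is to drop rows that failed validation before building the `TCRrep`. The sketch below is a hedged illustration, not part of tcrdist3: the helper `keep_valid_rows` is hypothetical, it assumes the three flags are boolean columns (True meaning the entry passed), and it runs on a small toy DataFrame standing in for `df_beta` so that tcrdist3 is not needed.

```python
import pandas as pd


def keep_valid_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose V gene, J gene and CDR3 all passed validation.

    Assumes valid_v, valid_j and valid_cdr3 are boolean columns.
    """
    mask = df['valid_v'] & df['valid_j'] & df['valid_cdr3']
    # Reset the index so downstream code sees a contiguous 0..n-1 index
    return df.loc[mask].reset_index(drop=True)


# Toy stand-in for df_beta (illustrative values only; the third row is deliberately invalid)
df_toy = pd.DataFrame({
    'count': [5, 3, 2],
    'freq': [0.5, 0.3, 0.2],
    'cdr3_b_aa': ['CASSLGQAYEQYF', 'CASSPGTGELFF', 'CASSXXXF'],
    'v_b_gene': ['TRBV7-9*01', 'TRBV20-1*01', 'TRBV99*01'],
    'j_b_gene': ['TRBJ2-7*01', 'TRBJ2-2*01', 'TRBJ1-1*01'],
    'valid_v': [True, True, False],
    'valid_j': [True, True, True],
    'valid_cdr3': [True, True, False],
})

df_clean = keep_valid_rows(df_toy)
print(len(df_clean))  # 2
```

The filtered frame can then be subset to the five columns passed to `TCRrep` in the example above. Whether to drop, inspect, or repair invalid rows is a judgment call that depends on the dataset.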
The input file must have the columns ['count', 'freq', 'cdr3aa', 'v', 'j', 'cdr3nt'].
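Before calling `import_vdjtools`, it can help to confirm that a file's header contains these columns. The following is a minimal sketch using pandas, not a tcrdist3 function: `REQUIRED_COLUMNS` and `missing_vdjtools_columns` are hypothetical names, and it assumes a tab-separated file with a header row. Reading with `nrows=0` parses only the header, and for a path ending in `.gz` pandas infers gzip compression by default.

```python
import io

import pandas as pd

# Columns the VDJtools-formatted input must contain (from the text above)
REQUIRED_COLUMNS = ['count', 'freq', 'cdr3aa', 'v', 'j', 'cdr3nt']


def missing_vdjtools_columns(path_or_buffer) -> list:
    """Hypothetical helper: return the required columns absent from the header.

    Only the header row is read (nrows=0); column order does not matter and
    extra columns are ignored.
    """
    header = pd.read_csv(path_or_buffer, sep='\t', nrows=0).columns
    return [col for col in REQUIRED_COLUMNS if col not in header]


# In-memory, header-only stand-ins for .tsv files
complete = io.StringIO("count\tfreq\tcdr3nt\tcdr3aa\tv\tj\n")
incomplete = io.StringIO("count\tfreq\tcdr3aa\tv\n")

print(missing_vdjtools_columns(complete))    # []
print(missing_vdjtools_columns(incomplete))  # ['j', 'cdr3nt']
```

Passing a file path instead of a `StringIO` works the same way; an empty list means every required column is present.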
This format is a convenient starting point because the outputs of many popular tools can be converted to it. See the section 'Formats supported for conversion' in the VDJtools documentation for details on converting to this format from:
- MiTCR,
- MiGEC,
- IgBlast (MIGMAP),
- ImmunoSEQ,
- VDJdb,
- Vidjil, and
- MiXCR