VDJtools Data

An example of importing VDJtools-formatted input.

Import a VDJtools-formatted .tsv or .tsv.gz file.

import pandas as pd
import numpy as np
import os
from tcrdist.paths import path_to_base
from tcrdist.vdjtools_funcs import import_vdjtools
from tcrdist.repertoire import TCRrep

# Reformat VDJtools input for tcrdist3
vdj_tools_file_beta = os.path.join(path_to_base, 'tcrdist', 'data', 'formats', 'vdj.M_15_CD8_beta.clonotypes.TRB.txt.gz')
df_beta = import_vdjtools(vdj_tools_file=vdj_tools_file_beta,
                          chain='beta',
                          organism='human',
                          db_file='alphabeta_gammadelta_db.tsv',
                          validate=True)
# Returned columns: renamed VDJtools columns plus the validation flags valid_v, valid_j, valid_cdr3
assert np.all(df_beta.columns == ['count', 'freq', 'cdr3_b_aa', 'v_b_gene', 'j_b_gene', 'cdr3_b_nucseq', 'valid_v', 'valid_j', 'valid_cdr3'])

# Can be directly imported into a TCRrep instance.
tr = TCRrep(
    cell_df = df_beta[['count', 'freq', 'cdr3_b_aa', 'v_b_gene', 'j_b_gene']], 
    chains = ['beta'], 
    organism = 'human', 
    compute_distances = False)
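The DataFrame returned above carries the columns valid_v, valid_j, and valid_cdr3 alongside the clone data, while the TCRrep example passes every row through. One possible follow-up, sketched here as an assumption rather than a documented tcrdist3 step, is to keep only the clones that passed all three checks before building the TCRrep. The toy df_beta below is hypothetical data standing in for the output of import_vdjtools, and the flags are assumed to be booleans.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by import_vdjtools(..., validate=True)
df_beta = pd.DataFrame({
    'count': [10, 5, 2],
    'freq': [0.588, 0.294, 0.118],
    'cdr3_b_aa': ['CASSLGQGYEQYF', 'CASSPDRGGYTF', 'CASS*LQF'],
    'v_b_gene': ['TRBV7-2*01', 'TRBV5-1*01', 'TRBV99*01'],
    'j_b_gene': ['TRBJ2-7*01', 'TRBJ1-2*01', 'TRBJ2-1*01'],
    'cdr3_b_nucseq': ['TGTGCCAGC', 'TGTGCCAGC', 'TGTGCCAGC'],  # placeholder sequences
    'valid_v': [True, True, False],
    'valid_j': [True, True, True],
    'valid_cdr3': [True, True, False],
})

# Keep only clones whose V gene, J gene, and CDR3 all passed validation
valid_mask = df_beta[['valid_v', 'valid_j', 'valid_cdr3']].all(axis=1)
df_valid = df_beta.loc[valid_mask].reset_index(drop=True)

# Same column subset that the example passes to TCRrep as cell_df
cell_df = df_valid[['count', 'freq', 'cdr3_b_aa', 'v_b_gene', 'j_b_gene']]
print(len(cell_df))  # 2
```

Note that the freq column is not renormalized after filtering; if downstream steps rely on frequencies summing to one, recompute it from count.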

The input file must contain the columns ['count', 'freq', 'cdr3aa', 'v', 'j', 'cdr3nt'].
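Comparing these required columns with the beta-chain columns that import_vdjtools returns in the example suggests a one-to-one renaming: cdr3aa to cdr3_b_aa, v to v_b_gene, j to j_b_gene, and cdr3nt to cdr3_b_nucseq. The sketch below checks a VDJtools-style table for the required columns and applies that renaming with pandas. It is an illustration inferred from the two column lists, not the tcrdist3 implementation, which also adds validation columns and may standardize gene names. The helper check_and_rename and the file name toy.clonotypes.TRB.txt.gz are hypothetical.

```python
import os
import tempfile

import pandas as pd

REQUIRED = ['count', 'freq', 'cdr3aa', 'v', 'j', 'cdr3nt']

# Mapping inferred by comparing the VDJtools columns with the beta-chain
# columns returned by import_vdjtools (an assumption, not the library code)
BETA_RENAME = {'cdr3aa': 'cdr3_b_aa', 'v': 'v_b_gene',
               'j': 'j_b_gene', 'cdr3nt': 'cdr3_b_nucseq'}


def check_and_rename(path):
    """Hypothetical helper: read a VDJtools table, verify columns, rename for beta."""
    # pandas infers gzip compression from the .gz extension
    df = pd.read_csv(path, sep='\t')
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f'missing VDJtools columns: {missing}')
    # Select the required columns (extra ones such as 'd' are dropped), then rename
    return df[REQUIRED].rename(columns=BETA_RENAME)


# A tiny hypothetical VDJtools table, including an extra 'd' column
toy = pd.DataFrame({
    'count': [3, 1],
    'freq': [0.75, 0.25],
    'cdr3nt': ['TGTGCCAGCAGTTTC', 'TGTGCCAGCAGCTTA'],
    'cdr3aa': ['CASSF', 'CASSL'],
    'v': ['TRBV7-2', 'TRBV5-1'],
    'd': ['.', '.'],
    'j': ['TRBJ2-7', 'TRBJ1-2'],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy.clonotypes.TRB.txt.gz')
    toy.to_csv(path, sep='\t', index=False)  # gzip inferred from the .gz suffix
    df_beta = check_and_rename(path)

print(df_beta.columns.tolist())
# ['count', 'freq', 'cdr3_b_aa', 'v_b_gene', 'j_b_gene', 'cdr3_b_nucseq']
```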

This format is a useful starting point because the outputs of many popular tools can be converted to it. See the section ‘Formats supported for conversion’ in the VDJtools documentation for details on converting to this format from:

  • MiTCR,
  • MiGEC,
  • IgBlast (MIGMAP),
  • ImmunoSEQ,
  • VDJdb,
  • Vidjil, and
  • MiXCR