Adaptive Biotechnologies Data
Cleaning Adaptive Biotechnologies Files
import_adaptive_file
from tcrdist.adpt_funcs import import_adaptive_file

# Parse an Adaptive export (.tsv) into a DataFrame with tcrdist3 column names
df = import_adaptive_file(adaptive_filename='Adaptive2020.tsv')
Input
rearrangement | extended_rearrangement | bio_identity | amino_acid | templates | frame_type | rearrangement_type | productive_frequency | cdr1_start_index | cdr1_rearrangement_length | cdr2_start_index | cdr2_rearrangement_length | cdr3_start_index | cdr3_length | v_index | n1_index | d_index | n2_index | j_index | v_deletions | n2_insertions | d3_deletions | d5_deletions | n1_insertions | j_deletions | chosen_j_allele | chosen_j_family | chosen_j_gene | chosen_v_allele | chosen_v_family | chosen_v_gene | d_allele | d_allele_ties | d_family | d_family_ties | d_gene | d_gene_ties | d_resolved | j_allele | j_allele_ties | j_family | j_family_ties | j_gene | j_gene_ties | j_resolved | v_allele | v_allele_ties | v_family | v_family_ties | v_gene | v_gene_ties | v_resolved |
GATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGGGGAGGATCTAGACCTACGAGCAGTACTTCGGGCCG | unknown | X+TCRBV28-01+TCRBJ02-07 | na | 1135 | Out | VDJ | na | unknown | unknown | unknown | unknown | unknown | 38 | 43 | -1 | 51 | 57 | 64 | 9 | no data | 1 | 9 | 7 | 2 | no data | no data | no data | no data | no data | no data | 02 | no data | TCRBD02 | no data | TCRBD02-01 | no data | TCRBD02-01*02 | 01 | no data | TCRBJ02 | no data | TCRBJ02-07 | no data | TCRBJ02-07*01 | 01 | no data | TCRBV28 | no data | TCRBV28-01 | no data | TCRBV28-01*01 |
TTGGAGCTGGACGACTCGGCCCTGTATCTCTGTGCCAGCAGCTTGGGTATGGGGACAGCCGCTAACTATGGCTACACCTTCGGTTCG | ATGGGCCCTGGGCTCCTCTGCTGGGCGCTGCTTTGTCTCCTGGGAGCAGGCTCAGTGGAGACTGGAGTCACCCAAAGTCCCACACACCTGATCAAAACGAGAGGACAGCAAGTGACTCTGAGATGCTCTTCTCAGTCTGGGCACAACACTGTGTCCTGGTACCAACAGGCCCTGGGTCAGGGGCCCCAGTTTATCTTTCAGTATTATAGGGAGGAAGAGAATGGCAGAGGAAACTTCCCTCCTAGATTCTCAGGTCTCCAGTTCCCTAATTATAGCTCTGAGCTGAATGTGAACGCCTTGGAGCTGGACGACTCGGCCCTGTATCTCTGTGCCAGCAGCTTGGGTATGGGGACAGCCGCTAACTATGGCTACACCTTCGGTTCGGGGACCAGGTTAACCGTTGTAG | CASSLGMGTAANYGYTF+TCRBV05-04+TCRBJ01-02 | CASSLGMGTAANYGYTF | 1300 | In | VDJ | 0.0012208108813691113 | 135 | 15201 | 18 | 327 | 51 | 30 | 46 | 51 | 58 | 61 | no data | 5 | 5 | no data | 3 | no data | 01 | TCRBJ01 | 02 | 01 | TCRBV05 | 04 | 01 | no data | TCRBD01 | no data | TCRBD01-01 | no data | TCRBD01-01*01 | 01 | no data | TCRBJ01 | no data | TCRBJ01-02 | no data | TCRBJ01-02*01 | 01 | no data | TCRBV05 | no data | TCRBV05-04 | no data | TCRBV05-04*01 |
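The raw export uses several placeholder strings for missing fields (`na`, `unknown`, `no data`). You may want to inspect a raw file with plain pandas before passing it to `import_adaptive_file`. The minimal sketch below assumes that case and declares those placeholders as missing values when reading. The tab-separated string is a hypothetical excerpt: it keeps only a few of the columns shown above and mirrors the two input rows.

```python
import io

import pandas as pd

# Hypothetical excerpt of a raw Adaptive export (tab-separated, few columns only).
raw_tsv = (
    "amino_acid\ttemplates\tframe_type\tproductive_frequency\tv_gene\tj_gene\n"
    "na\t1135\tOut\tna\tTCRBV28-01\tTCRBJ02-07\n"
    "CASSLGMGTAANYGYTF\t1300\tIn\t0.0012208108813691113\tTCRBV05-04\tTCRBJ01-02\n"
)

# Treat Adaptive's placeholder strings as missing values, in addition to pandas' defaults.
raw = pd.read_csv(
    io.StringIO(raw_tsv),
    sep="\t",
    na_values=["na", "unknown", "no data"],
)

# Count missing values per column to see which fields the out-of-frame row lacks.
print(raw.isna().sum().to_dict())
```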
Output
subject | productive_frequency | templates | epitope | cdr3_b_aa | v_b_gene | j_b_gene | valid_cdr3 | cdr3_b_nucseq |
Adaptive2020.tsv | 0.0012208108813691113 | 1300 | X | CASSLGMGTAANYGYTF | TRBV5-4*01 | TRBJ1-2*01 | True | TTGGAGCTGGACGACTCGGCCCTGTATCTCTGTGCCAGCAGCTTGGGTATGGGGACAGCCGCTAACTATGGCTACACCTTCGGTTCG |
Adaptive2020.tsv | 0.0015044146399640895 | 1602 | X | CASSQPGRTLYEQYF | TRBV14*01 | TRBJ2-7*01 | True | CAGCCTGCAGAACTGGAGGATTCTGGAGTTTATTTCTGTGCCAGCAGCCAACCGGGACGGACCTTGTACGAGCAGTACTTCGGGCCG |
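The out-of-frame input row (`frame_type` `Out`, `amino_acid` `na`) does not appear in the output excerpt. The productive row has been reshaped into tcrdist3's beta-chain columns. The sketch below reproduces that reshaping with plain pandas for illustration only. It is not the implementation inside `import_adaptive_file`. Three parts are assumptions: the filter on `frame_type == "In"`, the `valid_cdr3` rule (a non-empty string of the 20 standard amino acids), and the two-entry `GENE_LOOKUP` dictionary.

```python
import pandas as pd

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical Adaptive-to-IMGT lookup covering only the genes kept below.
GENE_LOOKUP = {"TCRBV05-04": "TRBV5-4*01", "TCRBJ01-02": "TRBJ1-2*01"}


def is_valid_cdr3(cdr3) -> bool:
    """Assumed rule: valid if a non-empty string made only of standard amino acids."""
    return isinstance(cdr3, str) and len(cdr3) > 0 and set(cdr3) <= STANDARD_AA


def reshape_adaptive(raw: pd.DataFrame, subject: str) -> pd.DataFrame:
    """Keep in-frame rows and rename them to tcrdist3-style beta-chain columns."""
    productive = raw[raw["frame_type"] == "In"]
    out = pd.DataFrame({
        "subject": subject,  # scalar, broadcast to every kept row
        "productive_frequency": productive["productive_frequency"],
        "templates": productive["templates"],
        "epitope": "X",  # placeholder label, as in the output table
        "cdr3_b_aa": productive["amino_acid"],
        "v_b_gene": productive["v_gene"].map(GENE_LOOKUP),
        "j_b_gene": productive["j_gene"].map(GENE_LOOKUP),
        "valid_cdr3": productive["amino_acid"].map(is_valid_cdr3),
        "cdr3_b_nucseq": productive["rearrangement"],
    })
    return out.reset_index(drop=True)


# The two rows from the Input table, restricted to the columns used here.
raw = pd.DataFrame({
    "rearrangement": [
        "GATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGGGGAGGATCTAGACCTACGAGCAGTACTTCGGGCCG",
        "TTGGAGCTGGACGACTCGGCCCTGTATCTCTGTGCCAGCAGCTTGGGTATGGGGACAGCCGCTAACTATGGCTACACCTTCGGTTCG",
    ],
    "amino_acid": ["na", "CASSLGMGTAANYGYTF"],
    "templates": [1135, 1300],
    "frame_type": ["Out", "In"],
    "productive_frequency": [float("nan"), 0.0012208108813691113],
    "v_gene": ["TCRBV28-01", "TCRBV05-04"],
    "j_gene": ["TCRBJ02-07", "TCRBJ01-02"],
})

reshaped = reshape_adaptive(raw, subject="Adaptive2020.tsv")
print(reshaped.to_string(index=False))
```

The column order in the sketch matches the Output header, so the result can be compared with the table above row for row.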
Loading Adaptive Biotechnologies Files
from tcrdist.repertoire import TCRrep
from tcrdist.adpt_funcs import import_adaptive_file

df = import_adaptive_file(adaptive_filename='Adaptive2020.tsv')
# For larger datasets, make sure compute_distances is set to False,
# see: https://tcrdist3.readthedocs.io/en/latest/bulkdata.html
tr = TCRrep(cell_df=df,
            organism='human',
            chains=['beta'],
            db_file='alphabeta_gammadelta_db.tsv',
            compute_distances=False)
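The recommendation to keep `compute_distances` off for larger datasets follows from simple arithmetic. A dense pairwise matrix grows with the square of the number of clones. This back-of-the-envelope sketch assumes a dense n × n matrix and a per-entry size you supply. The 2-byte default corresponds to a 16-bit integer and is an assumption here, not a statement about how tcrdist3 stores its matrices.

```python
def dense_matrix_gib(n_clones: int, bytes_per_entry: int = 2) -> float:
    """Approximate memory in GiB for a dense n x n pairwise distance matrix."""
    return n_clones * n_clones * bytes_per_entry / 2**30


# Memory grows quadratically: 10x more clones needs roughly 100x more memory.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} clones -> {dense_matrix_gib(n):,.1f} GiB")
```

At 100,000 clones the estimate is already about 18.6 GiB, which is why the bulk-data page linked in the comment is the better route for large repertoires.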
Look Up Adaptive Conversion
from tcrdist.adpt_funcs import adaptive_to_imgt

# Look up the *01 IMGT allele corresponding to an Adaptive gene name
assert adaptive_to_imgt['human']['TCRBV30'] == 'TRBV30*01'
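The same lookup can be applied to a whole list of gene names. This minimal sketch assumes the lookup is keyed first by organism and then by Adaptive gene name, as in the assertion above. `ADAPTIVE_TO_IMGT_EXAMPLE` is a hypothetical stand-in that holds only the TCRBV30 entry from the assertion plus the two genes in the output table. `TCRBVXX-XX` is a made-up name used to show a miss. Unmapped names return `None`, so they can be reported instead of raising a `KeyError`.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for tcrdist.adpt_funcs.adaptive_to_imgt, limited to
# genes mentioned on this page; the real dictionary is much larger.
ADAPTIVE_TO_IMGT_EXAMPLE: Dict[str, Dict[str, str]] = {
    "human": {
        "TCRBV30": "TRBV30*01",
        "TCRBV05-04": "TRBV5-4*01",
        "TCRBJ01-02": "TRBJ1-2*01",
    }
}


def to_imgt(genes: List[str], organism: str = "human",
            lookup: Dict[str, Dict[str, str]] = ADAPTIVE_TO_IMGT_EXAMPLE) -> List[Optional[str]]:
    """Translate Adaptive gene names to IMGT *01 alleles; unknown names map to None."""
    table = lookup[organism]
    return [table.get(gene) for gene in genes]


genes = ["TCRBV30", "TCRBV05-04", "TCRBVXX-XX"]
converted = to_imgt(genes)
unmapped = [gene for gene, imgt in zip(genes, converted) if imgt is None]
print(converted)
print("unmapped:", unmapped)
```

If `adaptive_to_imgt` is a plain dict, passing `lookup=adaptive_to_imgt` should work the same way.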