Inputs¶
Data¶
The tcrdist3 standard input is a Pandas DataFrame.
The header and first line of a typical input for a beta-chain analysis would look like this:
subject | epitope | count | v_b_gene | j_b_gene | cdr3_b_aa | cdr3_b_nucseq |
---|---|---|---|---|---|---|
s1 | NP | 1 | TRBV1*01 | TRBJ1-1*01 | CACDSLGDKSSWDTRQMFF | TGTGCCTGTGACTCGCTGGGGGATAAGAGCTCCTGGGACACCCGACAGATGTTTTTC |
Column names reflect the chain under investigation. - a : alpha - b : beta - g : gamma - d : delta
- One or more of the following columns are required and are case-sensitive
- ‘v_a_gene’, ‘v_b_gene’, ‘v_g_gene’, or ‘v_d_gene’
- ‘j_a_gene’, ‘j_b_gene’, ‘j_g_gene’, or ‘j_d_gene’
- ‘cdr3_a_aa’, ‘cdr3_b_aa’, ‘cdr3_g_aa’, or ‘cdr3a_d_aa’
- ‘cdr3_a_nucseq’, ‘cdr3_b_nucseq, ‘cdr3_g_nucseq’, or ‘cdr3a_d_nucseq’
For v_x_gene, include the full IMGT gene name and allele (e.g., TRBV1*01). If you don’t know the allele, use *01. But an allele must be present to infer v_b_genes based on matching one of the id rows in this table.
Tip
Two of each can be supplied for paired analysis. tcrdistances can be calculated without nucleotide sequences, but some other features require them.
- The following is required.
- ‘count’
- The following are optional:
- ‘epitope`
- ‘subject’
- The following are usually inferred from germline reference v-gene but can be supplied by the user in some advanced use-cases only!
- ‘cdr1_a_aa’, ‘cdr1_b_aa’, ‘cdr1_g_aa’, or ‘cdr1_d_aa’
- ‘cdr2_a_aa’, ‘cdr2_b_aa’, ‘cdr2_g_aa’, or ‘cdr2_d_aa’
- ‘pmhc_a_aa’, ‘pmhc_a_aa’, ‘pmhc_a_aa’, or ‘pmhc_a_aa’ (pmhc = cdr 2.5)
Tip
CDR2.5, the pMHC-facing loop between CDR2 and CDR3, are referred to in tcrdist3 as pmhc_a and phmc_b, respectively.
Arguments¶
chain(s)¶
Most classes and functions in tcrdist3 require specification of the appropriate t cell receptor chains:
- [‘alpha’], [‘beta’], [‘gamma’], or [‘delta’] for single-chain analysis,
- [‘alpha’, ‘beta’] or [‘gamma’, ‘delta’] for paired-chain analyis
organism¶
Most classes and functions in tcrdist3 require specification of an appropriate host organism. Currently only ‘human’ or ‘mouse’ are supported. This is required because reference TCR genes are organism specific.
db_file¶
The db_file is used by tcrdist3 to supply updated information about reference TCR germline sequences.
Tip
Getting new database files: Reference json https://github.com/repseqio/library-imgt/releases Data coming from IMGT server may be used for academic research only, provided that it is referred to IMGT®, and cited as “IMGT®, the international ImMunoGeneTics information system® http://www.imgt.org (founder and director: Marie-Paule Lefranc, Montpellier, France).”