Trees

Interactive tree diagrams are easy to produce in tcrdist3. To automate the processes described in more detail on the CDR3 Motifs page, instantiate a TCRtree object as shown below. The result is an HTML page with an Interactive Hierdiff Tree. Hovering over a node reveals information about it, including a sequence logo.

I am happy to use the defaults

"""
An example showing how to create an interactive
tree from a sample of mouse TCRs 
"""
import os
import pandas as pd
from tcrdist.repertoire import TCRrep
from tcrdist.tree import TCRtree

df = pd.read_csv("dash.csv").sample(100, random_state=1).reset_index(drop = True)

tr = TCRrep(cell_df = df, 
            organism = 'mouse', 
            chains = ['beta'], 
            db_file = 'alphabeta_gammadelta_db.tsv')

tcrtree = TCRtree(tcrrep = tr, 
                  html_name = 'dash.mouse.b.tree.html')

tcrtree.build_tree()

assert os.path.isfile('dash.mouse.b.tree.html')

I’d like to tweak a default parameter

There are three core processes executed by TCRtree.build_tree():

  • hcluster_diff - hierarchically clusters TCRs and tallies how many TCRs in each node of the hierarchical tree have particular categorical label values (e.g., the number that are CD8+ vs. CD4+, or the number coming from a pre-vaccine vs. post-vaccine sample)
  • member_summ - summarizes cluster metadata, such as the percentage of TCRs with a given V gene
  • plot_hclust (part of the hierdiff package) - makes the interactive D3 tree
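To make the hcluster_diff step more concrete, here is a toy, pure-Python sketch of its two ideas. The first is complete-linkage hierarchical clustering, which matches the 'hclust_method': 'complete' value in the TCRtree defaults. The second is counting categorical label values within every node of the resulting tree. This is not the hierdiff implementation. The functions complete_linkage and tally_labels, the distance matrix, and the CD8/CD4 labels are all hypothetical illustrations.

```python
from collections import Counter
from itertools import combinations


def complete_linkage(dist):
    """Naive complete-linkage agglomerative clustering (toy sketch).

    dist is a symmetric list-of-lists distance matrix. Returns the merged
    nodes, each a sorted tuple of leaf indices, in the order they form;
    the last node contains every leaf.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # complete linkage: the distance between two clusters is the
        # largest distance between any pair of their members
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: max(dist[i][j] for i in pair[0] for j in pair[1]),
        )
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append(merged)
    return merges


def tally_labels(nodes, labels):
    """Count categorical label values among the members of each node."""
    return {node: Counter(labels[i] for i in node) for node in nodes}


# hypothetical pairwise distances between four TCRs, and their labels
dist = [
    [0, 2, 9, 10],
    [2, 0, 8, 9],
    [9, 8, 0, 3],
    [10, 9, 3, 0],
]
labels = ["CD8", "CD8", "CD4", "CD4"]

merges = complete_linkage(dist)
tallies = tally_labels(merges, labels)
for node in merges:
    print(node, dict(tallies[node]))
# (0, 1) {'CD8': 2}
# (2, 3) {'CD4': 2}
# (0, 1, 2, 3) {'CD8': 2, 'CD4': 2}
```

Each merge is an internal node of the tree, and its tally is the starting point for a per-node comparison such as CD8+ vs. CD4+. The exhaustive pair search is only suitable for tiny examples.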

When invoked via .build_tree(), each of these functions is controlled by a dictionary of keyword arguments (‘kwargs’) stored as a TCRtree default attribute. The default values are documented in the TCRtree docstring (e.g., run ?TCRtree in IPython or Jupyter). Here we change some of the most important: ‘x_cols’ passed to hcluster_diff, ‘addl_cols’ passed to member_summ, and ‘alpha_col’ and ‘alpha’ used by plot_hclust.

import os
import pandas as pd
from tcrdist.repertoire import TCRrep
from tcrdist.tree import TCRtree

# reproducible sample of 100 clones
df = pd.read_csv("dash.csv").sample(100, random_state=1).reset_index(drop = True)

tr = TCRrep(cell_df = df,
            organism = 'mouse',
            chains = ['beta'],
            db_file = 'alphabeta_gammadelta_db.tsv')

tcrtree = TCRtree(tcrrep = tr,
                  html_name = 'dash.mouse.b.tree.html')

# hcluster_diff: categorical column(s) tallied in each node of the tree
tcrtree.default_hcluster_diff_kwargs['x_cols'] = ['epitope']

# member_summ: additional metadata columns summarized for each node
tcrtree.default_member_summ_kwargs['addl_cols'] = ['subject', 'epitope']

# plot_hclust: settings that refer to the p-value column
tcrtree.default_plot_hclust_props['alpha_col'] = 'pvalue'
tcrtree.default_plot_hclust_props['alpha'] = 1.0

tcrtree.build_tree()

assert os.path.isfile('dash.mouse.b.tree.html')
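These one-line tweaks work because each default is an ordinary Python dictionary that is consulted when build_tree() runs. The hypothetical MiniTree class below is a minimal sketch of that pattern. It assumes, without checking the tcrdist3 source, that the dictionaries are unpacked as keyword arguments (**kwargs) at build time. The functions cluster and plot are stand-ins, not tcrdist3 or hierdiff APIs.

```python
def cluster(x_cols, hclust_method="complete"):
    # hypothetical stand-in for a clustering step
    return f"clustering on {x_cols} with {hclust_method} linkage"


def plot(alpha_col="pvalue", alpha=0.05):
    # hypothetical stand-in for a plotting step
    return f"plotting with alpha_col={alpha_col}, alpha={alpha}"


class MiniTree:
    """Hypothetical class illustrating per-step default-kwargs attributes."""

    def __init__(self):
        # each step gets its own dictionary of default keyword arguments
        self.default_cluster_kwargs = {"x_cols": ["cohort"], "hclust_method": "complete"}
        self.default_plot_props = {"alpha_col": "pvalue", "alpha": 0.05}

    def build(self):
        # the dictionaries are read only here, so edits made after
        # construction but before build() take effect
        return [cluster(**self.default_cluster_kwargs),
                plot(**self.default_plot_props)]


tree = MiniTree()
tree.default_cluster_kwargs["x_cols"] = ["epitope"]  # tweak a single value
tree.default_plot_props["alpha"] = 1.0
steps = tree.build()
print(steps)
```

Because the defaults live on each instance, a fresh MiniTree still uses its original values. Replacing a whole dictionary, as in the next section, works the same way.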

I want more control

If you want full control, you can replace any of the TCRtree default attribute dictionaries with a complete dictionary of your own before running build_tree(). For instance, remove entries from ‘tooltip_cols’ to simplify what is displayed when you hover over a tree node.

import os
import pandas as pd
from tcrdist.repertoire import TCRrep
from tcrdist.tree import TCRtree

df = pd.read_csv("dash.csv").sample(100, random_state=1).reset_index(drop = True)

tr = TCRrep(cell_df = df,
            organism = 'mouse',
            chains = ['beta'],
            db_file = 'alphabeta_gammadelta_db.tsv')

tcrtree = TCRtree(tcrrep = tr,
                  html_name = 'dash.mouse.b.tree.html')

# keyword arguments for hcluster_diff (clustering and per-node label tallies)
tcrtree.default_hcluster_diff_kwargs = \
        {'clone_df': None,
         'pwmat': None,
         'x_cols': ['epitope'],
         'Z': None,
         'count_col': 'count',
         'subset_ind': None,
         'hclust_method': 'complete',
         'optimal_ordering': True,
         'test_method': 'fishers'}

# keyword arguments for member_summ (per-node metadata summaries)
tcrtree.default_member_summ_kwargs = \
        {'key_col': 'neighbors_i',
         'count_col': 'count',
         'addl_cols': ['subject'],
         'addl_n': 1}

# properties for plot_hclust; 'tooltip_cols' lists what is shown on hover
tcrtree.default_plot_hclust_props = \
        {'title': '',
         'alpha_col': 'pvalue',
         'alpha': 0.05,
         'tooltip_cols': ['subject',
                          'mean_dist',
                          'pct_dist_75',
                          'pct_dist_50',
                          'pct_dist_25',
                          'fuzzy_simpson_diversity_75',
                          'fuzzy_simpson_diversity_50',
                          'fuzzy_simpson_diversity_25',
                          'cdr3_b_aa',
                          'v_b_gene',
                          'j_b_gene',
                          'svg_beta',
                          'svg_raw_beta',
                          'ref_size_beta',
                          'ref_unique_beta',
                          'percent_missing_beta']}

tcrtree.build_tree()

assert os.path.isfile('dash.mouse.b.tree.html')
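The hcluster_diff defaults above set 'test_method': 'fishers'. The sketch below illustrates what a Fisher's exact test asks of a single node. It computes a one-sided (enrichment) p-value for the 2x2 table formed by node membership and one label value, using the hypergeometric distribution. It does not say whether hierdiff uses a one- or two-sided test, or how it handles labels with more than two values. fisher_greater is a hypothetical helper. For a real analysis, scipy.stats.fisher_exact(table, alternative='greater') computes the same one-sided quantity.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test p-value for a 2x2 table.

                     has label   lacks label
        in node          a            b
        not in node      c            d

    Returns P(X >= a), where X is hypergeometric: draw a + b TCRs (the node)
    from a + b + c + d TCRs, of which a + c carry the label.
    """
    n_node = a + b
    n_label = a + c
    n_total = a + b + c + d
    # add up the probabilities of all tables at least as extreme as observed
    tail = sum(
        comb(n_label, k) * comb(n_total - n_label, n_node - k)
        for k in range(a, min(n_node, n_label) + 1)
    )
    return tail / comb(n_total, n_node)


# hypothetical node: all 4 members recognize epitope X, and none of the 6 outside it do
p_enriched = fisher_greater(4, 0, 0, 6)
# hypothetical node with no enrichment
p_balanced = fisher_greater(1, 1, 1, 1)
print(p_enriched, p_balanced)  # exactly 1/210 and 5/6
```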

I want a paired alpha and beta tree

Paired alpha-beta trees are supported, but you may need to decide which chain’s distance matrix to use for clustering.

import os
import pandas as pd
from tcrdist.repertoire import TCRrep
from tcrdist.tree import TCRtree

df = pd.read_csv("dash.csv").sample(100).reset_index(drop = True)

tr = TCRrep(cell_df = df, 
            organism = 'mouse', 
            chains = ['alpha','beta'], 
            db_file = 'alphabeta_gammadelta_db.tsv')

tcrtree = TCRtree(tcrrep = tr, html_name = 'dash.mouse.ab.tree.html')

tcrtree.build_tree()

assert os.path.isfile('dash.mouse.ab.tree.html')
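The choice of chain matters because the same TCRs can have different nearest neighbors depending on whether distances come from the alpha chain, the beta chain, or a combination. The toy sketch below uses made-up matrices for three TCRs (alpha_dist, beta_dist). It sums them element-wise as one possible paired distance. Summing per-chain distances is a common convention for paired TCRdist, but this sketch does not describe how TCRtree selects its matrix. add_matrices and nearest_neighbor are hypothetical helpers.

```python
def add_matrices(m1, m2):
    """Element-wise sum of two equally sized distance matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def nearest_neighbor(dist, i):
    """Index of the TCR closest to TCR i (excluding i itself)."""
    return min((j for j in range(len(dist)) if j != i), key=lambda j: dist[i][j])


# hypothetical distances between three paired TCRs, one matrix per chain
alpha_dist = [[0, 5, 40], [5, 0, 30], [40, 30, 0]]
beta_dist = [[0, 60, 10], [60, 0, 50], [10, 50, 0]]
paired_dist = add_matrices(alpha_dist, beta_dist)  # one possible paired distance

neighbors = {
    "alpha": nearest_neighbor(alpha_dist, 0),
    "beta": nearest_neighbor(beta_dist, 0),
    "alpha+beta": nearest_neighbor(paired_dist, 0),
}
print(neighbors)  # {'alpha': 1, 'beta': 2, 'alpha+beta': 2}
```

Clustering on alpha alone would group TCR 0 with TCR 1. The beta and summed matrices pair it with TCR 2 instead, so the shape of the tree depends on which matrix is used.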

Changing the background

Perhaps you have a larger dataset and you don’t need all the SVG logos.

I prefer to write my own tree from scratch