Hello tcrdist3

tcrdist3 can be run interactively. One way to get started is to try a few examples. Examples use the data file (dash.csv).

Hello tcrdist3

This is a hello world video for tcrdist3. It’s an easy test to make sure your installation is working. It also computes 3,686,400 pairwise paired-chain distances in a few seconds, which isn’t a bad place to start.
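As a quick sanity check on that figure, the count is what an all-against-all comparison of 1,920 clones gives. The value 1,920 is taken from the 1920x1920 matrix reported in the sparse output later on this page; the variable names are illustrative only.

```python
# Number of entries in an all-against-all distance matrix.
# 1920 is the matrix dimension reported in the sparse output on this page.
n_clones = 1920
n_pairs = n_clones * n_clones
print(f"{n_pairs:,}")  # prints 3,686,400
```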

Try it:

"""
If you just want TCR distances computed with the pre-set default settings.

    You can access distance matrices:
        tr.pw_alpha     - alpha chain pairwise distance matrix
        tr.pw_beta      - beta chain pairwise distance matrix
        tr.pw_cdr3_a_aa - cdr3 alpha chain distance matrix
        tr.pw_cdr3_b_aa - cdr3 beta chain distance matrix
"""
import pandas as pd
from tcrdist.repertoire import TCRrep

df = pd.read_csv("dash.csv")
tr = TCRrep(cell_df = df, 
            organism = 'mouse', 
            chains = ['alpha','beta'], 
            db_file = 'alphabeta_gammadelta_db.tsv')

tr.pw_alpha
tr.pw_beta
tr.pw_cdr3_a_aa
tr.pw_cdr3_b_aa

For details on the options for computing distances, see TCR Distances.

Sparse Representation

For large datasets, you may want to set compute_distances to False and then use a sparse implementation. First, set tr.cpus as appropriate for your system. When computing distances with the sparse implementation, the argument radius is the maximum distance to be stored. All distances greater than radius are converted to 0, which reduces the memory required in a sparse format. Because 0 then marks a distance that was not stored, true 0 distances are represented as -1.

The argument chunk_size tells tcrdist3 how many rows to compute at a time. For instance, if you have 100,000 clones (a 100,000 x 100,000 matrix), a chunk size of 100 computes a 100 x 100,000 block of distances on each node and stores each of the 1,000 intermediate results in a sparse format before recombining them into a single scipy.sparse.csr_matrix. Larger chunk sizes result in less overhead, but chunk size should be tuned to the available memory. The results are stored as the object attributes rw_beta and rw_alpha.
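The storage convention described above can be illustrated with a small, self-contained sketch. This is not tcrdist3 code. It assumes a toy mismatch count as a stand-in for the TCR distance, a plain dict as a stand-in for scipy.sparse.csr_matrix, and that distances equal to radius are kept. All function names are hypothetical.

```python
# Toy illustration of radius-limited, chunked sparse storage.
# NOT tcrdist3 code: a mismatch count stands in for the TCR distance,
# and a dict keyed by (row, col) stands in for scipy.sparse.csr_matrix.

def toy_distance(a, b):
    """Hypothetical stand-in metric: mismatched positions plus length gap."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def sparse_chunk(seqs, start, stop, radius):
    """Compute rows start..stop-1 against all columns, keeping d <= radius."""
    stored = {}
    for i in range(start, stop):
        for j, other in enumerate(seqs):
            d = toy_distance(seqs[i], other)
            if d <= radius:  # assumption: the radius boundary is inclusive
                # A true zero is stored as -1 so it cannot be confused with
                # the implicit 0 that means "greater than radius".
                stored[(i, j)] = -1 if d == 0 else d
    return stored

def sparse_distances(seqs, radius, chunk_size):
    """Process chunk_size rows at a time and merge the partial results."""
    merged = {}
    for start in range(0, len(seqs), chunk_size):
        stop = min(start + chunk_size, len(seqs))
        merged.update(sparse_chunk(seqs, start, stop, radius))
    return merged

def lookup(stored, i, j):
    """Decode one entry: -1 means 0; a missing entry means > radius."""
    value = stored.get((i, j))
    if value is None:
        return None  # distance exceeds radius; the exact value was not kept
    return 0 if value == -1 else value

seqs = ["CASSLG", "CASSLG", "CASSLA", "CAWWWW"]
stored = sparse_distances(seqs, radius=1, chunk_size=2)
print(lookup(stored, 0, 1))  # identical sequences, stored as -1, decoded to 0
print(lookup(stored, 0, 3))  # beyond the radius, so nothing was stored
```

Changing chunk_size changes only how many rows are held in memory at once, not the merged result, which is why it can be tuned freely to the available memory.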

Try it:

import pandas as pd
from tcrdist.repertoire import TCRrep

df = pd.read_csv("dash.csv")
tr = TCRrep(cell_df = df,
            organism = 'mouse',
            chains = ['alpha','beta'],
            db_file = 'alphabeta_gammadelta_db.tsv',
            compute_distances = False)

tr.cpus = 2
tr.compute_sparse_rect_distances(radius = 50, chunk_size = 100)
tr.rw_beta
"""<1920x1920 sparse matrix of type '<class 'numpy.int16'>'
	with 108846 stored elements in Compressed Sparse Row format>
"""
print(tr.rw_beta)
"""
  (0, 0)  -1
  (1, 1)  -1
  (1, 470)  24
  (1, 472)  24
  (2, 2)  -1
  : :
  (1919, 1911)  24