Hello tcrdist3¶
tcrdist3 can be run interactively. One way to get started is to try a few examples. The examples below use the data file dash.csv.
This is a hello world example for tcrdist3. It’s an easy test to make sure your installation is working. It also computes 3,686,400 pairwise paired-chain distances within a few seconds, which isn’t a bad place to start.
Try it:
```python
"""
If you just want 'tcrdistances' using the pre-set default settings,
you can access the distance matrices:
    tr.pw_alpha     - alpha chain pairwise distance matrix
    tr.pw_beta      - beta chain pairwise distance matrix
    tr.pw_cdr3_a_aa - cdr3 alpha chain distance matrix
    tr.pw_cdr3_b_aa - cdr3 beta chain distance matrix
"""
import pandas as pd
from tcrdist.repertoire import TCRrep

df = pd.read_csv("dash.csv")
# Distances are computed when the TCRrep object is created.
tr = TCRrep(cell_df = df,
            organism = 'mouse',
            chains = ['alpha','beta'],
            db_file = 'alphabeta_gammadelta_db.tsv')

tr.pw_alpha
tr.pw_beta
tr.pw_cdr3_a_aa
tr.pw_cdr3_b_aa
```
For details on the options for computing distances, see TCR Distances.
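As a rough illustration of what can be done with a pairwise matrix once it is available, the sketch below uses a small hand-made NumPy array as a stand-in for an attribute such as tr.pw_beta. It assumes these attributes behave like square, symmetric NumPy arrays; the toy values and the variable names (pw, masked, nearest) are hypothetical and not taken from tcrdist3. The last line checks the count quoted above: 1920 clones, consistent with the 1920x1920 matrix reported for dash.csv, give 1920 x 1920 = 3,686,400 entries.

```python
import numpy as np

# Toy stand-in for tr.pw_beta (assumption): symmetric distances among 4 clones.
pw = np.array([[ 0, 24, 96, 60],
               [24,  0, 90, 48],
               [96, 90,  0, 12],
               [60, 48, 12,  0]], dtype=np.int16)

# A pairwise matrix over n clones has n * n entries.
print(pw.shape, pw.size)

# Nearest *other* clone for each row: hide the diagonal, then take the argmin.
masked = pw.astype(float)
np.fill_diagonal(masked, np.inf)
nearest = masked.argmin(axis=1)
print(nearest.tolist())

# The same count for the 1920 clones in dash.csv.
print(f"{1920 * 1920:,}")
```

With a real TCRrep object, the same indexing would be applied to the matrix attribute directly instead of the toy array.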
Sparse Representation¶
For large datasets, you may want to set compute_distances to False and then use a sparse implementation.
First, set tr.cpus as appropriate for your system. When computing distances with the sparse implementation, the argument radius is the maximum distance to be stored. All distances greater than radius are converted to 0, which reduces the memory required in a sparse format. The argument chunk_size tells tcrdist3 how many rows to compute at a time. For instance, if you have 100,000 clones (a 100,000 x 100,000 distance matrix), a chunk size of 100 computes a 100 x 100,000 block of distances on each node and stores each of the 1,000 intermediate results in a sparse format before recombining them into a single scipy.sparse.csr_matrix. Larger chunk sizes result in less overhead, but chunk size should be tuned to the available memory. The results are stored as the object attributes rw_beta and rw_alpha. Because 0 is used for distances that were dropped, true 0 distances are represented as -1.
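The chunking arithmetic and the storage convention described above can be sketched in a few lines of plain Python. This is not tcrdist3's internal code; the helper names chunk_bounds and encode_distance are hypothetical and only restate the rules in the text: rows are processed in blocks of chunk_size, distances above radius become 0, and true zeros are stored as -1.

```python
def chunk_bounds(n_rows, chunk_size):
    """Yield (start, stop) row ranges that cover n_rows in blocks of chunk_size."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def encode_distance(d, radius):
    """Map a dense distance to the value kept in the sparse matrix."""
    if d == 0:
        return -1   # true zero, stored explicitly as -1
    if d > radius:
        return 0    # beyond radius: dropped, i.e. an implicit zero
    return d        # within radius: stored unchanged


# 100,000 rows in chunks of 100 rows gives 1,000 intermediate blocks.
n_chunks = sum(1 for _ in chunk_bounds(100_000, 100))
print(n_chunks)

# How a few distances would be stored with radius = 50.
encoded = [encode_distance(d, radius=50) for d in (0, 24, 50, 51)]
print(encoded)
```

The -1 convention matters because a sparse matrix cannot otherwise distinguish an identical pair (distance 0) from a pair that was simply not stored.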
Try it:
```python
import pandas as pd
from tcrdist.repertoire import TCRrep

df = pd.read_csv("dash.csv")
# Skip the dense pairwise computation; distances are computed sparsely below.
tr = TCRrep(cell_df = df,
            organism = 'mouse',
            chains = ['alpha','beta'],
            db_file = 'alphabeta_gammadelta_db.tsv',
            compute_distances = False)

tr.cpus = 2
tr.compute_sparse_rect_distances(radius = 50, chunk_size = 100)
tr.rw_beta
"""<1920x1920 sparse matrix of type '<class 'numpy.int16'>'
    with 108846 stored elements in Compressed Sparse Row format>
"""
print(tr.rw_beta)
"""
  (0, 0)        -1
  (1, 1)        -1
  (1, 470)      24
  (1, 472)      24
  (2, 2)        -1
    :   :
  (1919, 1911)  24
"""
```
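To read values back out of such a matrix, the -1 entries need to be mapped back to 0. The following is a minimal sketch that assumes tr.rw_beta is an ordinary scipy.sparse.csr_matrix following the convention described above; the toy matrix and the helper name neighbors_within are hypothetical and stand in for real tcrdist3 output.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy stand-in for tr.rw_beta (assumption): 3 clones, -1 marks a true 0 distance.
rows = np.array([0, 1, 1, 2, 2])
cols = np.array([0, 1, 2, 1, 2])
vals = np.array([-1, -1, 24, 24, -1], dtype=np.int16)
rw = csr_matrix((vals, (rows, cols)), shape=(3, 3))


def neighbors_within(matrix, i, radius):
    """Return {column: distance} for the stored entries of row i within radius."""
    start, stop = matrix.indptr[i], matrix.indptr[i + 1]
    out = {}
    for j, v in zip(matrix.indices[start:stop], matrix.data[start:stop]):
        d = 0 if v == -1 else int(v)   # decode the -1 placeholder
        if d <= radius:
            out[int(j)] = d
    return out


print(neighbors_within(rw, 1, 50))
```

Reading the CSR arrays directly (indptr, indices, data) touches only the stored entries, so the cost scales with the number of neighbors rather than with the full row length.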