Pygor Tutorial

Welcome to the pygor3 Tutorial.

Pygor3 is an open source project and Python package that allows to analyze infer, evaluate and generate V(D)J sequences, by using IGoR’s.

Pygor3 could help you to get simple calculations and visualizations of the statistics in VDJ recombination

Introduction

alt text

An IGoR model’s encapsulates the Bayesian network probabilistic parameters of a V(D)J recombination process. IGoR is shipped with a set of default models.

As an example lets load the recombination model for a human \(\beta\) T-cell receptor

Loading default IgorModel

[1]:
import pygor3 as p3
mdl_hb = p3.get_default_IgorModel("human", "tcr_beta")
Reading Parms filename from:  /home/olivares/.local/share/igor/models/human/tcr_beta/models/model_parms.txt
Reading Marginals filename from:  /home/olivares/.local/share/igor/models/human/tcr_beta/models/model_marginals.txt
[2]:
mdl_hb
[2]:
<pygor3.IgorIO.IgorModel at 0x7f8568eeacd0>

Conditional probabilites

[3]:
mdl_hb.export_plot_Pconditionals('hb_CP')
_images/Tutorial_8_0.png
_images/Tutorial_8_1.png
_images/Tutorial_8_2.png
_images/Tutorial_8_3.png
_images/Tutorial_8_4.png
_images/Tutorial_8_5.png
_images/Tutorial_8_6.png
_images/Tutorial_8_7.png
_images/Tutorial_8_8.png
_images/Tutorial_8_9.png
_images/Tutorial_8_10.png
_images/Tutorial_8_11.png
[4]:
P_J_g_V = mdl_hb['j_choice']
P_J_g_V
[4]:
<xarray.DataArray (v_choice: 89, j_choice: 15)>
array([[1.28586e-01, 1.04003e-01, 4.10916e-02, ..., 0.00000e+00,
        5.36364e-02, 3.73186e-02],
       [1.66126e-01, 7.89615e-02, 1.13701e-02, ..., 2.46074e-02,
        6.49049e-02, 7.46140e-02],
       [1.66156e-01, 7.90103e-02, 1.13744e-02, ..., 2.45354e-02,
        6.49334e-02, 7.46554e-02],
       ...,
       [1.24050e-01, 4.76394e-02, 0.00000e+00, ..., 3.52196e-02,
        5.51254e-02, 6.01472e-02],
       [3.64941e-06, 2.04721e-09, 0.00000e+00, ..., 3.41728e-01,
        6.84855e-02, 8.22442e-02],
       [1.06077e-01, 1.13773e-01, 4.03129e-02, ..., 2.20679e-02,
        7.16947e-02, 5.81105e-02]])
Coordinates:
  * v_choice       (v_choice) int64 0 1 2 3 4 5 6 7 ... 81 82 83 84 85 86 87 88
    lbl__v_choice  (v_choice) <U11 'TRBV1*01' 'TRBV10-1*01' ... 'TRBV9*01'
    seq__v_choice  (v_choice) object 'GATACTGGAATTACCCAGACACCAAAATACCTGGTCACA...
  * j_choice       (j_choice) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
    lbl__j_choice  (j_choice) <U10 'TRBJ1-1*01' 'TRBJ1-2*01' ... 'TRBJ2-7*02'
    seq__j_choice  (j_choice) object 'TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCA...
Attributes:
    nickname:    j_choice
    event_type:  GeneChoice
    seq_type:    J_gene
    seq_side:    Undefined_side
    priority:    7
    parents:     ['v_choice']
    childs:      ['j_5_del', 'd_gene']
[5]:
P_J_g_V[{'j_choice': 3, 'v_choice': 7}]
[5]:
<xarray.DataArray ()>
array(0.0052083)
Coordinates:
    v_choice       int64 7
    lbl__v_choice  <U11 'TRBV11-1*01'
    seq__v_choice  object 'GAAGCTGAAGTTGCCCAGTCCCCCAGATATAAGATTACAGAGAAAAGCCA...
    j_choice       int64 3
    lbl__j_choice  <U10 'TRBJ1-4*01'
    seq__j_choice  object 'CAACTAATGAAAAACTGTTTTTTGGCAGTGGAACCCAGCTCTCTGTCTTGG'
Attributes:
    nickname:    j_choice
    event_type:  GeneChoice
    seq_type:    J_gene
    seq_side:    Undefined_side
    priority:    7
    parents:     ['v_choice']
    childs:      ['j_5_del', 'd_gene']
[6]:
mdl_hb.plot_Event('j_choice')
[6]:
(<Figure size 1296x1080 with 2 Axes>,
 <AxesSubplot:title={'center':'$P($j_choice$|$v_choice$)$'}, xlabel='j_choice', ylabel='v_choice'>)
_images/Tutorial_11_1.png
[ ]:

Marginal probabilities

Examples of marginals probabilities

$ P(J) = \sum_V P(J|V) P(V) $

$ P(D) = \sum_V P(D|V, J) P(V,J) $

[7]:
mdl_hb.export_plot_Pmarginals('hb_MP')
_images/Tutorial_14_0.png
_images/Tutorial_14_1.png
_images/Tutorial_14_2.png
_images/Tutorial_14_3.png
_images/Tutorial_14_4.png
_images/Tutorial_14_5.png
_images/Tutorial_14_6.png
_images/Tutorial_14_7.png
_images/Tutorial_14_8.png
_images/Tutorial_14_9.png
_images/Tutorial_14_10.png
_images/Tutorial_14_11.png
[8]:
mdl_hb.Pmarginal['j_choice']
[8]:
<xarray.DataArray (j_choice: 15)>
array([0.1196221 , 0.105381  , 0.02016193, 0.05684022, 0.09945699,
       0.03726755, 0.03223616, 0.12542481, 0.05034467, 0.09853413,
       0.0194823 , 0.07500332, 0.01905345, 0.0709004 , 0.07029394])
Coordinates:
  * j_choice       (j_choice) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
    lbl__j_choice  (j_choice) object 'K02545|TRBJ1-1*01|Homo sapiens|F|J-REGI...
    seq__j_choice  (j_choice) object 'TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCA...
[9]:
mdl_hb.plot_Event_Marginal('j_choice')
[9]:
<AxesSubplot:xlabel='j choice', ylabel='P'>
_images/Tutorial_16_1.png

Joint Probabilities

[10]:
P_V_J = mdl_hb.get_P_joint(['v_choice', 'j_choice'])
P_V_J
[10]:
<xarray.DataArray (v_choice: 89, j_choice: 15)>
array([[6.28453463e-04, 5.08306114e-04, 2.00831906e-04, ...,
        0.00000000e+00, 2.62143710e-04, 1.82391677e-04],
       [1.54890932e-03, 7.36213918e-04, 1.06011576e-04, ...,
        2.29432410e-04, 6.05154843e-04, 6.95679479e-04],
       [1.54900602e-03, 7.36581450e-04, 1.06039046e-04, ...,
        2.28733827e-04, 6.05348578e-04, 6.95983268e-04],
       ...,
       [4.05407259e-04, 1.55690174e-04, 0.00000000e+00, ...,
        1.15101267e-04, 1.80155267e-04, 1.96566942e-04],
       [3.00565952e-09, 1.68608515e-12, 0.00000000e+00, ...,
        2.81447867e-04, 5.64048061e-05, 6.77364474e-05],
       [1.65903669e-03, 1.77940202e-03, 6.30491475e-04, ...,
        3.45140763e-04, 1.12130116e-03, 9.08844648e-04]])
Coordinates:
  * v_choice       (v_choice) int64 0 1 2 3 4 5 6 7 ... 81 82 83 84 85 86 87 88
    seq__v_choice  (v_choice) object 'GATACTGGAATTACCCAGACACCAAAATACCTGGTCACA...
  * j_choice       (j_choice) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
    seq__j_choice  (j_choice) object 'TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCA...
    lbl__v_choice  (v_choice) <U11 'TRBV1*01' 'TRBV10-1*01' ... 'TRBV9*01'
    lbl__j_choice  (j_choice) <U10 'TRBJ1-1*01' 'TRBJ1-2*01' ... 'TRBJ2-7*02'
[11]:
P_V_J.plot(cmap='gnuplot2_r')
[11]:
<matplotlib.collections.QuadMesh at 0x7f85630c9290>
_images/Tutorial_19_1.png

Entropy

\(H = -P(\vec{E})\sum_{\vec{E}} \log_2 P(\vec{E})\)

[12]:
mdl_hb.plot_recombination_entropy()
[12]:
<AxesSubplot:>
_images/Tutorial_21_1.png
[13]:
mdl_hb.get_df_entropy_decomposition()
[13]:
event_nickname event_type seq_type entropy
0 v_choice GeneChoice V_gene 5.252905
1 d_gene GeneChoice D_gene 1.141779
2 j_choice GeneChoice J_gene 3.609102
3 vd_ins Insertion VD_genes 14.894931
4 dj_ins Insertion DJ_gene 14.981991
5 v_3_del Deletion V_gene 3.147511
6 d_3_del Deletion D_gene 2.778230
7 d_5_del Deletion D_gene 3.634137
8 j_5_del Deletion J_gene 3.356340
[14]:
da_mi = mdl_hb.get_mutual_information()
mdl_hb.plot_mutual_information(da_mi)
[14]:
<AxesSubplot:>
_images/Tutorial_23_1.png

Evaluate Sequences

[15]:
str_seq = "TTGAAATGTGAACAACATCTGGGTCATAACGCTATGTATTGGTACAAGCAAAGTGCTAAGAAGCCACTGGAGCTCATGTTTGTCTACAGTCTTGAAGAACGGGTTGAAAACAACAGTGTGCCAAGTCGCTTCTCACCTGAATGCCCCAACAGCTCTCACTTATCCCTTCACCTACACACCCTGCAGCCAGAAGACTCGGCCCTGTATCTCTGCGCCAGCAGCCGTAGGCGAGCGCGGCGGGGAGCTGTTTTTTGGAGAAGGCTCTAGGCTGACCGTACTGG"
str_seq
[15]:
'TTGAAATGTGAACAACATCTGGGTCATAACGCTATGTATTGGTACAAGCAAAGTGCTAAGAAGCCACTGGAGCTCATGTTTGTCTACAGTCTTGAAGAACGGGTTGAAAACAACAGTGTGCCAAGTCGCTTCTCACCTGAATGCCCCAACAGCTCTCACTTATCCCTTCACCTACACACCCTGCAGCCAGAAGACTCGGCCCTGTATCTCTGCGCCAGCAGCCGTAGGCGAGCGCGGCGGGGAGCTGTTTTTTGGAGAAGGCTCTAGGCTGACCGTACTGG'
[16]:
df_scens = p3.evaluate(str_seq, mdl_hb, N_scenarios=10, igor_wd='tmp', batch_clean=False)
Writing model parms in file  tmp/dataIGoRrUn7rDfYiq_mdldata/models/model_parms.txt
Writing model marginals in file  tmp/dataIGoRrUn7rDfYiq_mdldata/models/model_marginals.txt
Writing gene anchor's in file  tmp/dataIGoRrUn7rDfYiq_mdldata/ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file  tmp/dataIGoRrUn7rDfYiq_mdldata/ref_genome/J_gene_CDR3_anchors.csv
Writing model parms in file  tmp/dataIGoRrUn7rDfYiq_mdldata//models/model_parms.txt
Writing model marginals in file  tmp/dataIGoRrUn7rDfYiq_mdldata//models/model_marginals.txt
Writing gene anchor's in file  tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file  tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/J_gene_CDR3_anchors.csv
/home/olivares/.local/bin/igor -set_wd tmp -batch dataIGoRrUn7rDfYiq -read_seqs tmp/dataIGoRrUn7rDfYiqinput_sequences.csv
/home/olivares/.local/bin/igor -set_wd tmp -batch dataIGoRrUn7rDfYiq -set_genomic  --V tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/genomicVs.fasta --D tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/genomicDs.fasta --J tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/genomicJs.fasta -set_CDR3_anchors  --V tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/V_gene_CDR3_anchors.csv --J tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/J_gene_CDR3_anchors.csv -align  --all
Batch name set to: dataIGoRrUn7rDfYiq_
Working directory set to: "tmp/"
Performing V alignments....
V_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing D alignments....
D_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing J alignments....
J_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing CDR3 sequence extraction ....
Writing model parms in file  tmp/dataIGoRrUn7rDfYiq_mdldata//models/model_parms.txt
Writing model marginals in file  tmp/dataIGoRrUn7rDfYiq_mdldata//models/model_marginals.txt
Writing gene anchor's in file  tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file  tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/J_gene_CDR3_anchors.csv
/home/olivares/.local/bin/igor -set_wd tmp -batch dataIGoRrUn7rDfYiq -set_custom_model tmp/dataIGoRrUn7rDfYiq_mdldata//models/model_parms.txt tmp/dataIGoRrUn7rDfYiq_mdldata//models/model_marginals.txt -set_CDR3_anchors  --V tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/V_gene_CDR3_anchors.csv --J tmp/dataIGoRrUn7rDfYiq_mdldata//ref_genome/J_gene_CDR3_anchors.csv -evaluate  -output  --scenarios 10  --Pgen
igor_fln_generated_realizations_werr:  tmp/dataIGoRrUn7rDfYiq_output/best_scenarios_counts.csv

The column scenario_proba_cond_seq gives us

$ P(\text{scenario}|:nbsphinx-math:sigma) $

and the event columns provide us the selected parameter for the scenarios

$ P(\text{scenario}) = P(\text{v_choice}{id}) :nbsphinx-math:`times `P(:nbsphinx-math:`text{j_choice}`\text{id}|:nbsphinx-math:text{v_choice}{id}) :nbsphinx-math:`times `P(:nbsphinx-math:`text{d_gene}`\text{id} | \text{j_choice}:nbsphinx-math:`text{id}`, :nbsphinx-math:`text{v_choice}`\text{id}) … $

[17]:
df_scens
[17]:
scenario_rank scenario_proba_cond_seq v_choice j_choice d_gene v_3_del d_5_del j_5_del d_3_del vd_ins vd_dinucl dj_ins dj_dinucl Mismatches Pgen_estimate
seq_index
0 1 0.178551 48 8 1 8 10 11 10 7 [2, 3, 0, 2, 2, 1, 2] 3 [2, 2, 1] [] 1.426210e-16
0 2 0.071263 48 8 0 8 9 11 8 2 [2, 3] 9 [2, 2, 1, 2, 1, 2, 0, 2, 1] [] 1.426210e-16
0 3 0.060515 48 8 0 8 13 11 4 3 [2, 3, 0] 8 [2, 2, 1, 2, 1, 2, 0, 2] [] 1.426210e-16
0 4 0.042415 48 8 1 8 11 11 9 10 [2, 3, 0, 2, 2, 1, 2, 0, 2, 1] 0 [] [] 1.426210e-16
0 5 0.038252 48 8 1 8 11 11 10 4 [2, 3, 0, 2] 7 [2, 2, 1, 2, 1, 2, 0] [] 1.426210e-16
0 6 0.036261 48 8 1 8 11 11 10 8 [2, 3, 0, 2, 2, 1, 2, 0] 3 [2, 2, 1] [] 1.426210e-16
0 7 0.030704 48 8 2 8 15 11 6 6 [2, 3, 0, 2, 2, 1] 5 [2, 2, 1, 2, 1] [] 1.426210e-16
0 8 0.028821 48 8 1 8 10 11 11 7 [2, 3, 0, 2, 2, 1, 2] 4 [2, 2, 1, 2] [] 1.426210e-16
0 9 0.019521 48 8 2 8 16 11 5 2 [2, 3] 9 [2, 2, 1, 2, 1, 2, 0, 2, 1] [] 1.426210e-16
0 10 0.016049 48 8 0 8 13 11 3 3 [2, 3, 0] 7 [2, 2, 1, 2, 1, 2, 0] [] 1.426210e-16

Visualize a scenario

[18]:
ps_scenario = df_scens.iloc[0]
ps_scenario
mdl_hb.plot_scenario(ps_scenario)
[18]:
<AxesSubplot:>
_images/Tutorial_30_1.png
[19]:
# ps_scenario = df_scens.iloc[0]
for index, ps_scenario in df_scens.iterrows():
    mdl_hb.plot_scenario(ps_scenario)
_images/Tutorial_31_0.png
_images/Tutorial_31_1.png
_images/Tutorial_31_2.png
_images/Tutorial_31_3.png
_images/Tutorial_31_4.png
_images/Tutorial_31_5.png
_images/Tutorial_31_6.png
_images/Tutorial_31_7.png
_images/Tutorial_31_8.png
_images/Tutorial_31_9.png
[20]:
mdl_hb.genomic_dataframe_dict['J']
[20]:
name value anchor_index
id
0 K02545|TRBJ1-1*01|Homo sapiens|F|J-REGION|749.... TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCACAGTTGTAG 17
1 K02545|TRBJ1-2*01|Homo sapiens|F|J-REGION|886.... CTAACTATGGCTACACCTTCGGTTCGGGGACCAGGTTAACCGTTGTAG 17
2 M14158|TRBJ1-3*01|Homo sapiens|F|J-REGION|1499... CTCTGGAAACACCATATATTTTGGAGAGGGAAGTTGGCTCACTGTT... 19
3 M14158|TRBJ1-4*01|Homo sapiens|F|J-REGION|2095... CAACTAATGAAAAACTGTTTTTTGGCAGTGGAACCCAGCTCTCTGT... 20
4 M14158|TRBJ1-5*01|Homo sapiens|F|J-REGION|2368... TAGCAATCAGCCCCAGCATTTTGGTGATGGGACTCGACTCTCCATC... 19
5 M14158|TRBJ1-6*01|Homo sapiens|F|J-REGION|2859... CTCCTATAATTCACCCCTCCACTTTGGGAATGGGACCAGGCTCACT... 22
6 L36092|TRBJ1-6*02|Homo sapiens|F|J-REGION|6430... CTCCTATAATTCACCCCTCCACTTTGGGAACGGGACCAGGCTCACT... 22
7 X02987|TRBJ2-1*01|Homo sapiens|F|J-REGION|800.... CTCCTACAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTG... 19
8 X02987|TRBJ2-2*01|Homo sapiens|F|J-REGION|995.... CGAACACCGGGGAGCTGTTTTTTGGAGAAGGCTCTAGGCTGACCGT... 20
9 X02987|TRBJ2-3*01|Homo sapiens|F|J-REGION|1282... AGCACAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCG 18
10 X02987|TRBJ2-4*01|Homo sapiens|F|J-REGION|1432... AGCCAAAAACATTCAGTACTTCGGCGCCGGGACCCGGCTCTCAGTG... 19
11 X02987|TRBJ2-5*01|Homo sapiens|F|J-REGION|1553... ACCAAGAGACCCAGTACTTCGGGCCAGGCACGCGGCTCCTGGTGCTCG 17
12 X02987|TRBJ2-6*01|Homo sapiens|F|J-REGION|1673... CTCTGGGGCCAACGTCCTGACTTTCGGGGCCGGCAGCAGGCTGACC... 22
13 M14159|TRBJ2-7*01|Homo sapiens|F|J-REGION|2316... CTCCTACGAGCAGTACTTCGGGCCGGGCACCAGGCTCACGGTCACAG 16
14 X02987|TRBJ2-7*02|Homo sapiens|ORF|J-REGION|18... CTCCTACGAGCAGTACGTCGGGCCGGGCACCAGGCTCACGGTCACAG 16

Generating random sequences from model

[21]:
df_gen_seqs = p3.generate(Nseqs=10, mdl=mdl_hb)
df_gen_seqs
Writing model parms in file  ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata/models/model_parms.txt
Writing model marginals in file  ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata/models/model_marginals.txt
Writing gene anchor's in file  ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata/ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file  ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata/ref_genome/J_gene_CDR3_anchors.csv
Writing model parms in file  ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata//models/model_parms.txt
Writing model marginals in file  ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata//models/model_marginals.txt
Writing gene anchor's in file  ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata//ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file  ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata//ref_genome/J_gene_CDR3_anchors.csv
/home/olivares/.local/bin/igor -set_wd ./igor_generating_u_gmyr0_ -batch dataIGoRdx2C1EIkND -set_custom_model ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata//models/model_parms.txt ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata//models/model_marginals.txt -set_CDR3_anchors  --V ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata//ref_genome/V_gene_CDR3_anchors.csv --J ./igor_generating_u_gmyr0_/dataIGoRdx2C1EIkND_mdldata//ref_genome/J_gene_CDR3_anchors.csv -generate 10
[21]:
nt_sequence
seq_index
0 CATGCCAAAGTCACACAGACTCCAGGACATTTGGTCAAAGGAAAAG...
1 GATGCTGGAGTTATCCAGTCACCCCGCCATGAGGTGACAGAGATGG...
2 GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG...
3 AATGCTGGTGTCACTCAGACCCCAAAATTCCAGGTCCTGAAGACAG...
4 AAGGCTGGAGTCACTCAAACTCCAAGATATCTGATCAAAACGAGAG...
5 AATGCCGGCGTCATGCAGAACCCAAGACACCTGGTCAGGAGGAGGG...
6 GATGCTGATGTTACCCAGACCCCAAGGAATAGGATCACAAAGACAG...
7 AATGCTGGTGTCACTCAGACCCCAAAATTCCGGGTCCTGAAGACAG...
8 GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG...
9 GACGCTGGAGTCACCCAAAGTCCCACACACCTGATCAAAACGAGAG...
[22]:
df_gen_seqs = p3.generate(Nseqs=10, mdl=mdl_hb, return_scenarios=True)
df_gen_seqs
Writing model parms in file  ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata/models/model_parms.txt
Writing model marginals in file  ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata/models/model_marginals.txt
Writing gene anchor's in file  ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata/ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file  ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata/ref_genome/J_gene_CDR3_anchors.csv
Writing model parms in file  ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata//models/model_parms.txt
Writing model marginals in file  ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata//models/model_marginals.txt
Writing gene anchor's in file  ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata//ref_genome/V_gene_CDR3_anchors.csv
Writing gene anchor's in file  ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata//ref_genome/J_gene_CDR3_anchors.csv
/home/olivares/.local/bin/igor -set_wd ./igor_generating_wwkgcy8c -batch dataIGoRWb17vmfBGo -set_custom_model ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata//models/model_parms.txt ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata//models/model_marginals.txt -set_CDR3_anchors  --V ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata//ref_genome/V_gene_CDR3_anchors.csv --J ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_mdldata//ref_genome/J_gene_CDR3_anchors.csv -generate 10
igor_fln_generated_realizations_werr:  ./igor_generating_wwkgcy8c/dataIGoRWb17vmfBGo_generated/generated_realizations_werr.csv
[22]:
nt_sequence v_choice j_choice d_gene v_3_del d_5_del j_5_del d_3_del vd_ins vd_dinucl dj_ins dj_dinucl Errors
seq_index
0 GATGGTGGAATCACTCAGTCCCCAAAGTACCTGTTCAGAAAGGAAG... 26 3 0 13 7 9 8 2 [2, 1] 4 [2, 2, 2, 1] []
1 ACTGAAGTTACCCAGACACCAAAACACCTGGTCATGGGAATGACAA... 45 9 0 14 6 6 11 11 [3, 2, 1, 0, 0, 1, 1, 1, 1, 2, 2] 7 [1, 3, 2, 3, 0, 2, 2] []
2 GGAGCTGGAGTCTCCCAGTCCCCCAGTAACAAGGTCACAGAGAAGG... 70 0 0 11 8 4 9 8 [0, 2, 2, 2, 1, 2, 0, 0] 5 [2, 2, 2, 0, 0] []
3 CATGCCAAAGTCACACAGACTCCAGGACATTTGGTCAAAGGAAAAG... 30 3 0 14 3 11 8 4 [0, 0, 0, 2] 9 [2, 0, 1, 3, 1, 0, 0, 2, 2] [265]
4 GGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGG... 73 1 0 9 9 11 7 2 [1, 1] 2 [3, 2] []
5 GAGGCTGGAGTCACCCAAAGTCCCACACACCTGATCAAAACGAGAG... 51 10 0 9 9 7 11 6 [2, 1, 1, 0, 2, 3] 0 [] []
6 GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... 29 4 0 7 4 12 6 3 [3, 1, 1] 11 [3, 3, 3, 0, 1, 1, 1, 1, 2, 3, 2] []
7 GAACCTGAAGTCACCCAGACTCCCAGCCATCAGGTCACACAGATGG... 28 9 0 12 13 8 7 7 [0, 1, 1, 3, 2, 1, 0] 0 [] []
8 GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... 29 7 1 7 4 13 6 3 [3, 1, 3] 4 [2, 3, 3, 2] []
9 GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG... 29 11 0 9 8 6 9 13 [1, 1, 1, 1, 1, 3, 1, 1, 3, 1, 1, 2, 2] 6 [2, 2, 0, 0, 3, 0] []
[23]:
df_gen_seqs['nt_sequence']
[23]:
seq_index
0    GATGGTGGAATCACTCAGTCCCCAAAGTACCTGTTCAGAAAGGAAG...
1    ACTGAAGTTACCCAGACACCAAAACACCTGGTCATGGGAATGACAA...
2    GGAGCTGGAGTCTCCCAGTCCCCCAGTAACAAGGTCACAGAGAAGG...
3    CATGCCAAAGTCACACAGACTCCAGGACATTTGGTCAAAGGAAAAG...
4    GGTGCTGGAGTCTCCCAGACCCCCAGTAACAAGGTCACAGAGAAGG...
5    GAGGCTGGAGTCACCCAAAGTCCCACACACCTGATCAAAACGAGAG...
6    GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG...
7    GAACCTGAAGTCACCCAGACTCCCAGCCATCAGGTCACACAGATGG...
8    GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG...
9    GGTGCTGTCGTCTCTCAACATCCGAGCTGGGTTATCTGTAAGAGTG...
Name: nt_sequence, dtype: object
[ ]:

Inferring a new model

[24]:
# FIXME: For this example we use some Emerson's data, add the reference
[25]:
import pandas as pd
df_input = pd.read_csv('HIP00110.tsv.gz', sep='\t')
df_input
[25]:
nucleotide aminoAcid count (templates/reads) frequencyCount (%) cdr3Length vMaxResolved vFamilyName vGeneName vGeneAllele vFamilyTies vGeneNameTies vGeneAlleleTies dMaxResolved dFamilyName dGeneName dGeneAllele dFamilyTies dGeneNameTies dGeneAlleleTies jMaxResolved jFamilyName jGeneName jGeneAllele jFamilyTies jGeneNameTies jGeneAlleleTies vDeletion n1Insertion d5Deletion d3Deletion n2Insertion jDeletion vIndex n1Index dIndex n2Index jIndex estimatedNumberGenomes sequenceStatus cloneResolved vOrphon dOrphon jOrphon vFunction dFunction jFunction fractionNucleated vAlignLength vAlignSubstitutionCount vAlignSubstitutionIndexes vAlignSubstitutionGeneThreePrimeIndexes vSeqWithMutations
0 GAATGTGAGCACCTTGGAGCTGGGGGACTCGGCCCTTTATCTTTGC... NaN 3088 0.019194 38 TCRBV05-01*01 TCRBV05 TCRBV05-01 1.0 NaN NaN NaN TCRBD01-01*01 TCRBD01 TCRBD01-01 1.0 NaN NaN NaN TCRBJ01-06*01 TCRBJ01 TCRBJ01-06 1.0 NaN NaN NaN 4 1 5 2 5 10 43 55 60 65 66 3088 Out VDJ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 GCTACCAGCTCCCAGACATCTGTGTACTTCTGTGCCACCACGGGTA... CATTGTSGGPSQSTQYF 1772 0.010946 51 TCRBV10-03*01 TCRBV10 TCRBV10-03 1.0 NaN NaN NaN TCRBD02-01*01 TCRBD02 TCRBD02-01 1.0 NaN NaN NaN TCRBJ02-03*01 TCRBJ02 TCRBJ02-03 1.0 NaN NaN NaN 10 12 3 2 8 8 30 37 45 56 68 1772 In VDJ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 ATCCAGCGCACAGAGCAGGGGGACTCGGCCATGTATCTCTGTGCCA... CASSLRVGGYGYTF 1763 0.010841 42 TCRBV07-09 TCRBV07 TCRBV07-09 NaN NaN NaN 01,03 TCRBD02-01*01 TCRBD02 TCRBD02-01 1.0 NaN NaN NaN TCRBJ01-02*01 TCRBJ01 TCRBJ01-02 1.0 NaN NaN NaN 2 0 8 2 5 4 39 54 59 -1 65 1763 In VDJ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 TGCAGCAAGAAGACTCAGCTGCGTATCTCTGCACCAGCAGCCAAGG... NaN 1241 0.007660 52 TCRBV01-01*01 TCRBV01 TCRBV01-01 1.0 NaN NaN NaN TCRBD02-01*01 TCRBD02 TCRBD02-01 1.0 NaN NaN NaN TCRBJ02-01*01 TCRBJ02 TCRBJ02-01 1.0 NaN NaN NaN 1 3 7 1 6 3 29 45 51 59 62 1241 Out VDJ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 GAGTCTGCCAGGCCCTCACATACCTCTCAGTACCTCTGTGCCAGCA... CASRRLGGLNTEAFF 985 0.006130 45 TCRBV25-01*01 TCRBV25 TCRBV25-01 1.0 NaN NaN NaN TCRBD02-01*02 TCRBD02 TCRBD02-01 2.0 NaN NaN NaN TCRBJ01-01*01 TCRBJ01 TCRBJ01-01 1.0 NaN NaN NaN 6 1 10 0 7 0 36 47 54 60 61 985 In VDJ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
130935 AAAAGAACCCGACAGCTTTCTATCTCTGTGCCAGTAACGCGGGTCT... NaN 1 0.000001 55 TCRBV19-01 TCRBV19 TCRBV19-01 NaN NaN NaN 01,02 TCRBD02-01 TCRBD02 TCRBD02-01 NaN NaN NaN 01,02 TCRBJ01-05*01 TCRBJ01 TCRBJ01-05 1.0 NaN NaN NaN 7 16 7 4 2 0 26 36 38 43 59 1 Out VDJ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
130936 AAAAGAACCCGACAGCTTTCTATCTCTGTGCCAGCAGTAACCCCCC... NaN 1 0.000001 55 TCRBV19-01 TCRBV19 TCRBV19-01 NaN NaN NaN 01,02 TCRBD02-01 TCRBD02 TCRBD02-01 NaN NaN NaN 01,02 TCRBJ02-04*01 TCRBJ02 TCRBJ02-04 1.0 NaN NaN NaN 4 11 8 5 6 0 26 39 45 48 59 1 Out VDJ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
130937 AAAACTCATCCTGTACCTTGGAGATCCAGTCCACGGAGTCAGGGGA... NaN 1 0.000010 19 TCRBV21-01*01 TCRBV21 TCRBV21-01 1.0 NaN NaN NaN NaN NaN unresolved NaN TCRBD01,TCRBD02 TCRBD01-01,TCRBD02-01 NaN TCRBJ01-03*01 TCRBJ01 TCRBJ01-03 1.0 NaN NaN NaN 20 0 1 8 7 10 62 59 66 -1 69 1 Out VDJ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
130938 AAAAAAACAGAATTTCCTGCTGGGGTTGGAGTCGGCTGCTCCCTCC... NaN 1 0.000002 17 TCRBV06 TCRBV06 unresolved NaN NaN TCRBV06-02,TCRBV06-03 NaN TCRBD01-01*01 TCRBD01 TCRBD01-01 1.0 NaN NaN NaN TCRBJ02-01*01 TCRBJ02 TCRBJ02-01 1.0 NaN NaN NaN 17 0 4 4 0 9 64 -1 64 -1 68 1 Out VDJ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
130939 AAAAAAACAGAATTTCCTGCTGGGGTTGGAGTCGGCTGCTCCCTCC... NaN 1 0.000006 17 TCRBV06 TCRBV06 unresolved NaN NaN TCRBV06-02,TCRBV06-03 NaN TCRBD01-01*01 TCRBD01 TCRBD01-01 1.0 NaN NaN NaN TCRBJ01-01*01 TCRBJ01 TCRBJ01-01 1.0 NaN NaN NaN 18 0 3 4 3 10 64 63 66 -1 71 1 Out VDJ NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

130940 rows × 52 columns

Get Genomic Germline templates from IMGT

[26]:
imgt_species_list = p3.imgt.get_species_list()
print( imgt_species_list )
['', 'Macaca+nemestrina', 'Rattus+norvegicus', 'Mus+saxicola', 'Gallus+gallus', 'Gorilla+gorilla+gorilla', 'Canis+lupus+familiaris', 'Felis+catus', 'Mus+minutoides', 'Salmo+salar', 'Ornithorhynchus+anatinus', 'Oryctolagus+cuniculus', 'Mus+pahari', 'Cercocebus+atys', 'Homo+sapiens', 'Tursiops+truncatus', 'Oncorhynchus+mykiss', 'Papio+anubis+anubis', 'Mus+cookii', 'Heterocephalus+glaber', 'Macaca+mulatta', 'Equus+caballus', 'Bos+taurus', 'Mus+musculus', 'Mustela+putorius+furo', 'Capra+hircus', 'Danio+rerio', 'Sus+scrofa', 'Camelus+dromedarius', 'Ovis+aries', 'Vicugna+pacos', 'Rattus+rattus', 'Macaca+fascicularis', 'Mus+spretus']
[27]:
imgt_species = 'Homo+sapiens'
imgt_chain = 'TRB'
hb_genomic_dict = p3.imgt.download_ref_genome(imgt_species, imgt_chain, dropna=True)
Homo+sapiens TRBV http://www.imgt.org/genedb/GENElect?query=7.2+TRBV&species=Homo+sapiens
http://www.imgt.org/genedb/GENElect?query=7.2+TRBV&species=Homo+sapiens
Homo+sapiens TRBD http://www.imgt.org/genedb/GENElect?query=7.2+TRBD&species=Homo+sapiens
http://www.imgt.org/genedb/GENElect?query=7.2+TRBD&species=Homo+sapiens
Homo+sapiens TRBJ http://www.imgt.org/genedb/GENElect?query=7.2+TRBJ&species=Homo+sapiens
http://www.imgt.org/genedb/GENElect?query=7.2+TRBJ&species=Homo+sapiens
http://www.imgt.org/genedb/GENElect?query=8.1+TRBV&species=Homo+sapiens&IMGTlabel=2nd-CYS
http://www.imgt.org/genedb/GENElect?query=8.1+TRBJ&species=Homo+sapiens&IMGTlabel=J-PHE
http://www.imgt.org/genedb/GENElect?query=8.1+TRBJ&species=Homo+sapiens&IMGTlabel=J-TRP
http://www.imgt.org/genedb/GENElect?query=8.1+TRBV&species=Homo+sapiens&IMGTlabel=2nd-CYS
http://www.imgt.org/genedb/GENElect?query=8.1+Homo+sapiensJ&species=Homo+sapiens&IMGTlabel=J-PHE
http://www.imgt.org/genedb/GENElect?query=8.1+Homo+sapiensJ&species=Homo+sapiens&IMGTlabel=J-TRP
[28]:
hb_genomic_dict
[28]:
{'V':                                                  value         name  \
 0    GATACTGGAATTACCCAGACACCAAAATACCTGGTCACAGCAATGG...     TRBV1*01
 1    GATGCTGAAATCACCCAGAGCCCAAGACACAAGATCACAGAGACAG...  TRBV10-1*01
 2    GATGCTGAAATCACCCAGAGCCCAAGACACAAGATCACAGAGACAG...  TRBV10-1*02
 3    GATGCTGGAATCACCCAGAGCCCAAGATACAAGATCACAGAGACAG...  TRBV10-2*01
 4    GATGCTGGAATCACCCAGAGCCCAAGATACAAGATCACAGAGACAG...  TRBV10-2*02
 ..                                                 ...          ...
 140  GAGGCAGGGATCAGCCAGATACCAAGATATCACAGACACACAGGGA...   TRBV8-1*02
 141  GATGCTGGGATCACCCAGATGCCAAGATATCACATTGTACAGAAGA...   TRBV8-2*02
 142  GATTCTGGAGTCACACAAACCCCAAAGCACCTGATCACAGCAACTG...     TRBV9*01
 143  GATTCTGGAGTCACACAAACCCCAAAGCACCTGATCACAGCAACTG...     TRBV9*02
 144  GATTCTGGAGTCACACAAACCCCAAAGCACCTGATCACAGCAACTG...     TRBV9*03

      anchor_index gfunction
 0             267         P
 1             270         F
 2             270         F
 3             270         F
 4             270         F
 ..            ...       ...
 140           265         P
 141           257         P
 142           270         F
 143           270         F
 144           270       (F)

 [145 rows x 4 columns],
 'D':         name             value
 id
 0   TRBD1*01      GGGACAGGGGGC
 1   TRBD2*01  GGGACTAGCGGGGGGG
 2   TRBD2*02  GGGACTAGCGGGAGGG,
 'J':                                                 value        name  \
 0    TGAACACTGAAGCTTTCTTTGGACAAGGCACCAGACTCACAGTTGTAG  TRBJ1-1*01
 1    CTAACTATGGCTACACCTTCGGTTCGGGGACCAGGTTAACCGTTGTAG  TRBJ1-2*01
 2   CTCTGGAAACACCATATATTTTGGAGAGGGAAGTTGGCTCACTGTT...  TRBJ1-3*01
 3   CAACTAATGAAAAACTGTTTTTTGGCAGTGGAACCCAGCTCTCTGT...  TRBJ1-4*01
 4   TAGCAATCAGCCCCAGCATTTTGGTGATGGGACTCGACTCTCCATC...  TRBJ1-5*01
 5   CTCCTATAATTCACCCCTCCACTTTGGGAATGGGACCAGGCTCACT...  TRBJ1-6*01
 6   CTCCTATAATTCACCCCTCCACTTTGGGAACGGGACCAGGCTCACT...  TRBJ1-6*02
 7   CTCCTACAATGAGCAGTTCTTCGGGCCAGGGACACGGCTCACCGTG...  TRBJ2-1*01
 8   CGAACACCGGGGAGCTGTTTTTTGGAGAAGGCTCTAGGCTGACCGT...  TRBJ2-2*01
 9   AGCACAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCG  TRBJ2-3*01
 10  AGCCAAAAACATTCAGTACTTCGGCGCCGGGACCCGGCTCTCAGTG...  TRBJ2-4*01
 11   ACCAAGAGACCCAGTACTTCGGGCCAGGCACGCGGCTCCTGGTGCTCG  TRBJ2-5*01
 12  CTCTGGGGCCAACGTCCTGACTTTCGGGGCCGGCAGCAGGCTGACC...  TRBJ2-6*01
 13    CTCCTACGAGCAGTACTTCGGGCCGGGCACCAGGCTCACGGTCACAG  TRBJ2-7*01

     anchor_index function
 0             17        F
 1             17        F
 2             19        F
 3             20        F
 4             19        F
 5             22        F
 6             22        F
 7             19        F
 8             20        F
 9             18        F
 10            19        F
 11            17        F
 12            22        F
 13            16        F  }

Create a new model

[29]:
hb_mdl_0 = p3.IgorModel.make_default_from_Dataframe_dict(hb_genomic_dict)
[30]:
df_input_test = df_input['nucleotide'].loc[:500]
df_input_test
[30]:
0      GAATGTGAGCACCTTGGAGCTGGGGGACTCGGCCCTTTATCTTTGC...
1      GCTACCAGCTCCCAGACATCTGTGTACTTCTGTGCCACCACGGGTA...
2      ATCCAGCGCACAGAGCAGGGGGACTCGGCCATGTATCTCTGTGCCA...
3      TGCAGCAAGAAGACTCAGCTGCGTATCTCTGCACCAGCAGCCAAGG...
4      GAGTCTGCCAGGCCCTCACATACCTCTCAGTACCTCTGTGCCAGCA...
                             ...
496    CTGGAGCTGGGGGACTCAGCTTTGTATTTCTGTGCCAGCAGCGTAA...
497    CTGCTGTCGGCTGCTCCCTCCCAGACATCTGTGTACTTCTGTGCCA...
498    CTGCTGTCGGCTGCTCCCTCCCAGACATCTGTGTACTTCTGTGCCA...
499    CTGAAGATCCGGTCCACAAAGCTGGAGGACTCAGCCATGTACTTCT...
500    CTCCCTCCCAGACATCTGTGTACTTCTGTGCCAGCAGTTGGGACTA...
Name: nucleotide, Length: 501, dtype: object
[31]:
df_functionality, df_CDR3 = p3.naive_align(df_input_test, hb_mdl_0)
Writing model parms in file  ./igor_aligning_3_y3t_de/dataIGoRfD02VCeWdL_mdldata/models/model_parms.txt
Writing model marginals in file  ./igor_aligning_3_y3t_de/dataIGoRfD02VCeWdL_mdldata/models/model_marginals.txt
/home/olivares/.local/bin/igor -set_wd ./igor_aligning_3_y3t_de -batch dataIGoRfD02VCeWdL -read_seqs ./igor_aligning_3_y3t_de/dataIGoRfD02VCeWdLinput_sequences.csv
/home/olivares/.local/bin/igor -set_wd ./igor_aligning_3_y3t_de -batch dataIGoRfD02VCeWdL -set_genomic  --V ./igor_aligning_3_y3t_de/dataIGoRfD02VCeWdL_mdldata/ref_genome/genomicVs.fasta --D ./igor_aligning_3_y3t_de/dataIGoRfD02VCeWdL_mdldata/ref_genome/genomicDs.fasta --J ./igor_aligning_3_y3t_de/dataIGoRfD02VCeWdL_mdldata/ref_genome/genomicJs.fasta -set_CDR3_anchors  --V ./igor_aligning_3_y3t_de/dataIGoRfD02VCeWdL_mdldata/ref_genome/V_gene_CDR3_anchors.csv --J ./igor_aligning_3_y3t_de/dataIGoRfD02VCeWdL_mdldata/ref_genome/J_gene_CDR3_anchors.csv -align  --all   ---best_align_only true
Batch name set to: dataIGoRfD02VCeWdL_
Working directory set to: "./igor_aligning_3_y3t_de/"
Performing V alignments....
V_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing D alignments....
D_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing J alignments....
J_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing CDR3 sequence extraction ....
[32]:
df_functionality
[32]:
functionality in_frame stop_codon
seq_index
0 False False True
1 True True False
2 True True False
3 False False True
4 True True False
... ... ... ...
496 True True False
497 True True False
498 True True False
499 True True False
500 False False True

501 rows × 3 columns

[33]:
df_CDR3
[33]:
v_anchor j_anchor CDR3nt CDR3aa v_call j_call
seq_index
0 43.0 81.0 TGCGCCAGCAGCATCAAAGGGGCTCACCCCTCCACTTT NaN TRBV5-1*01 TRBJ1-6*02
1 30.0 81.0 TGTGCCACCACGGGTACTAGCGGGGGCCCAAGCCAGAGTACGCAGT... CATTGTSGGPSQSTQYF TRBV10-3*01 TRBJ2-3*01
2 39.0 81.0 TGTGCCAGCAGCTTACGAGTCGGGGGCTATGGCTACACCTTC CASSLRVGGYGYTF TRBV7-9*01 TRBJ1-2*01
3 29.0 81.0 TGCACCAGCAGCCAAGGGGATCGCGGGGGGCCACTACAATGAGCAG... NaN TRBV1*01 TRBJ2-1*01
4 36.0 81.0 TGTGCCAGCAGACGCCTCGGAGGGTTGAACACTGAAGCTTTCTTT CASRRLGGLNTEAFF TRBV25-1*01 TRBJ1-1*01
... ... ... ... ... ... ...
496 30.0 81.0 TGTGCCAGCAGCGTAAGGGGGACTAGCGGGGGGTCCGCTACGCAGT... CASSVRGTSGGSATQYF TRBV9*01 TRBJ2-3*01
497 39.0 81.0 TGTGCCAGCAGTTATATCCCTGTCAGGGAAGAGCAGTACTTC CASSYIPVREEQYF TRBV6-5*01 TRBJ2-7*01
498 39.0 81.0 TGTGCCAGCAGGCAGGGGGCGCGAGGAAACACCATATATTTT CASRQGARGNTIYF TRBV6-5*01 TRBJ1-3*01
499 45.0 81.0 TGTGCCAGCAGTTTCCGCCCCTATGAGCAGTTCTTC CASSFRPYEQFF TRBV2*01 TRBJ2-1*01
500 26.0 81.0 TGTGCCAGCAGTTGGGACTAGCGGGGATTTGAGCTCCTACAATGAG... NaN TRBV6-5*01 TRBJ2-1*01

501 rows × 6 columns

[34]:
df_input_test_no_productive = df_input_test.loc[~df_functionality['functionality'] ]
df_input_test_no_productive
[34]:
0      GAATGTGAGCACCTTGGAGCTGGGGGACTCGGCCCTTTATCTTTGC...
3      TGCAGCAAGAAGACTCAGCTGCGTATCTCTGCACCAGCAGCCAAGG...
8      ATCCAGCGCACAGAGCAGGGGGACTCGGCCATGTATCTCTGTGCCA...
9      GGCTGCTGTCGGCTGCTCCCTCCCAGACATCTGTGTACTTCTGTGC...
12     ACCTACACGCCCTGCAGCCAGAAGACTCAGCCCTGTATCTCTGCGC...
                             ...
487    GAGCTCTCTGGAGCTGGGGGACTCAGCTTTGTATTTCTGTGCCAGA...
488    GAGCACCTTGGAGCTGGGGGACTCGGCCCTTTATCTTTGCGCCAGC...
489    GAGCAACATGAGCCCTGAAGACAGCAGCATATATCTCTGCAGCGTC...
490    CTTGGAGCTGGAGGACTCGGCCCTGTATCTCTGTGCCAGCATCCTG...
500    CTCCCTCCCAGACATCTGTGTACTTCTGTGCCAGCAGTTGGGACTA...
Name: nucleotide, Length: 105, dtype: object
[ ]:

[35]:
hb_mdl_new, df_likelihoods = p3.infer(df_input_test_no_productive, hb_mdl_0, N_iter=10, return_likelihoods=True)
Writing model parms in file  ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata/models/model_parms.txt
Writing model marginals in file  ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata/models/model_marginals.txt
/home/olivares/.local/bin/igor -set_wd ./igor_inferring_8stxm4z4 -batch dataIGoR1hwusR6KOG -read_seqs ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOGinput_sequences.csv
/home/olivares/.local/bin/igor -set_wd ./igor_inferring_8stxm4z4 -batch dataIGoR1hwusR6KOG -set_genomic  --V ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata/ref_genome/genomicVs.fasta --D ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata/ref_genome/genomicDs.fasta --J ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata/ref_genome/genomicJs.fasta -set_CDR3_anchors  --V ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata/ref_genome/V_gene_CDR3_anchors.csv --J ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata/ref_genome/J_gene_CDR3_anchors.csv -align  --all
Batch name set to: dataIGoR1hwusR6KOG_
Working directory set to: "./igor_inferring_8stxm4z4/"
Performing V alignments....
V_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing D alignments....
D_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing J alignments....
J_gene alignments [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing CDR3 sequence extraction ....
Batch name set to: dataIGoR1hwusR6KOG_
GeneChoice read
GeneChoice read
GeneChoice read
Deletion read
Deletion read
Deletion read
Deletion read
Insertion read
DinucMarkov read
Insertion read
DinucMarkov read
Working directory set to: "./igor_inferring_8stxm4z4/"
Performing Evaluate/Inference iteration 1
Initializing probability bounds...
== Alignment finished! ==
Writing model parms in file  ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata//models/model_parms.txt
Writing model marginals in file  ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata//models/model_marginals.txt
/home/olivares/.local/bin/igor -set_wd ./igor_inferring_8stxm4z4 -batch dataIGoR1hwusR6KOG -set_custom_model ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata//models/model_parms.txt ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata//models/model_marginals.txt -set_CDR3_anchors  --V ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata//ref_genome/V_gene_CDR3_anchors.csv --J ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_mdldata//ref_genome/J_gene_CDR3_anchors.csv -infer   --N_iter 10
Initialization of probability bounds over.
Iteration 1 [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing Evaluate/Inference iteration 2
Initializing probability bounds...
Initialization of probability bounds over.
Iteration 2 [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing Evaluate/Inference iteration 3
Initializing probability bounds...
Initialization of probability bounds over.
Iteration 3 [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing Evaluate/Inference iteration 4
Initializing probability bounds...
Initialization of probability bounds over.
Iteration 4 [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing Evaluate/Inference iteration 5
Initializing probability bounds...
Initialization of probability bounds over.
Iteration 5 [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing Evaluate/Inference iteration 6
Initializing probability bounds...
Initialization of probability bounds over.
Iteration 6 [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing Evaluate/Inference iteration 7
Initializing probability bounds...
Initialization of probability bounds over.
Iteration 7 [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing Evaluate/Inference iteration 8
Initializing probability bounds...
Initialization of probability bounds over.
Iteration 8 [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing Evaluate/Inference iteration 9
Initializing probability bounds...
Initialization of probability bounds over.
Iteration 9 [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Performing Evaluate/Inference iteration 10
Initializing probability bounds...
Initialization of probability bounds over.
Iteration 10 [||||||||||||||||||||||||||||||||||||||||||||||||||]  Done.
Reading Parms filename from:  ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_inference/final_parms.txt
Reading Marginals filename from:  ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_inference/final_marginals.txt
Reading Parms filename from:  ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_inference/final_parms.txt
Reading Marginals filename from:  ./igor_inferring_8stxm4z4/dataIGoR1hwusR6KOG_inference/final_marginals.txt
[36]:
df_likelihoods
[36]:
iteration mean_log_Likelihood n_seq
0 1 -16.0033 104
1 2 -11.6762 104
2 3 -11.3408 104
3 4 -11.1774 104
4 5 -11.0885 104
5 6 -11.0380 104
6 7 -11.0009 104
7 8 -10.9747 104
8 9 -10.9552 104
9 10 -10.9353 104
[37]:
df_likelihoods['mean_log_Likelihood'].plot()
[37]:
<AxesSubplot:>
_images/Tutorial_55_1.png
[38]:
hb_mdl_new.export_plot_Pconditionals('CP_hb_new')
_images/Tutorial_56_0.png
_images/Tutorial_56_1.png
_images/Tutorial_56_2.png
_images/Tutorial_56_3.png
_images/Tutorial_56_4.png
_images/Tutorial_56_5.png
_images/Tutorial_56_6.png
_images/Tutorial_56_7.png
_images/Tutorial_56_8.png
_images/Tutorial_56_9.png
_images/Tutorial_56_10.png
_images/Tutorial_56_11.png