Getting started with WFSim

Hello friend. Welcome to the basic tutorial on how to simulate waveforms with the latest wfsim version in strax. Here we’ll just demonstrate the basic functionality. For more in depth analysis stuff, checkout the straxen tutorials for more detailed thing.

[1]:
import numpy as np
import strax
import straxen
import wfsim

import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from multihist import Histdd, Hist1d
from scipy import stats

import json
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-28690857fe6c> in <module>
      1 import numpy as np
      2 import strax
----> 3 import straxen
      4 import wfsim
      5

~/Desktop/python/straxen/straxen/__init__.py in <module>
      1 __version__ = '0.15.8'
      2
----> 3 from utilix import uconfig
      4 from .common import *
      5 # contexts.py below

~/Desktop/python/utilix/utilix/__init__.py in <module>
      2
      3 from utilix.config import Config
----> 4 uconfig = Config()
      5
      6 from utilix.rundb import DB

~/Desktop/python/utilix/utilix/config.py in __init__(self, path)
     19     def __init__(self, path=None):
     20         if not Config.instance:
---> 21             Config.instance = Config.__Config()
     22
     23     def __getattr__(self, name):

~/Desktop/python/utilix/utilix/config.py in __init__(self)
     50
     51             else:
---> 52                 raise FileNotFoundError(f"Could not load a configuration file. "
     53                                         f"You can create one at {home_config}, or set a custom path using\n\n"
     54                                         f"export XENON_CONFIG=path/to/your/config\n")

FileNotFoundError: Could not load a configuration file. You can create one at /Users/petergaemers/.xenon_config, or set a custom path using

export XENON_CONFIG=path/to/your/config

Different simulators

Because we’d like to make simulations we have multiple different ways of making data. They are: * #### RawRecordsFromFaxNT(1T) * #### RawRecordsFromFaxEpix * #### RawRecordsFromFaxOptical * #### RawRecordsFromFaxNveto

The main difference is the way you specify instructions. The NT(1T) simulator depends on either a csv file or a defaul/user specified function to generate instructions. Epix takes input from G4 processed with epix. Optical takes optical input from G4 for photon channels and timings. And Nveto is a small modification to optical to provide nveto data

Setting everything up

First we need to define the right context. For this we made two contexts to help you out. One for 1T and one for nT. By default in your context RawRecordsFromFaxNT will be registered. If you want another one be sure to register it

For 1T do:

[ ]:
st = straxen.contexts.xenon1t_simulation()

Or nT do:

[2]:
st = straxen.contexts.xenonnt_simulation()

Now we need to define a run id. What you give it doesn’t really matter, since strax will look for data and make new if it doesn’t find anything. And this is what you want. Strax will use the run id to get the electron lifetime and pmt gains from a database, and returns placeholders if the run doesn’t exist.

[3]:
run_id = '1'

Defining instructions

For the instructions there are multiple different ways to do it. The simulator has this option called “fax_file”. If it has a value (None by default) the simulator will either read it as a csv or root file. If not it will use some predefined functions to make your events. The number of event you’ll simulate based on the product of the config values “nchunk”,”event_rate” and “chunk_size”. Which you can set as follows:

[4]:
st.set_config(dict(nchunk=1, event_rate=5, chunk_size=10,))

Strax groups data together in chunks based on time (for low level data). nchunk is the number of chunks you want to simulate event_rate is the number of events per seconds, so this effects the amount of spacing between events. Finally chunk_size is the length in seconds of your chunk For the DAQ this is 5 seconds. For simulations you can do whatever you want. It is important to note that Strax will write out data per chunk So when your chunks are small you’ll, among other things, call Strax’ IO functions a lot, giving a substantial overhead. On the other hand, to large chunks will hog all your memory and your kernel might crash. Based on my experimentation setting chunk_size to 500 gives best performance

These are ways you can give instructions * #### Random This is the default where simulator will generate some random instructions for you. * #### Custom For this you will need to overwrite the instruction generator function * #### CSV You can provide a csv file with the instruction (Like the output of nSort) * #### Geant4 For this you’ll need to use epix to do some clustering for you.

Random

I guess this is pretty self explanatory. The simulator has this function called “rand_instructions” which will make something up for you.

[6]:
wfsim.rand_instructions??

Custom

This is some more fun. To do this we’ll write a new function which returns a structured numpy array with the correct dtype.

[7]:
wfsim.instruction_dtype
[7]:
[('event_number', numpy.int32),
 ('type', numpy.int8),
 ('time', numpy.int64),
 ('x', numpy.float32),
 ('y', numpy.float32),
 ('z', numpy.float32),
 ('amp', numpy.int32),
 ('recoil', '<U2')]

Event number is just a lable which peaks are together. type is either 1(S1) or 2 (S2). In the truth higher numbers are also used to refer to different types of afterpulses. time,x,y,z are the time and positions of the signal. Amp is the number of photons or electrons generated, and recoil can be used for different types of recoil (but only Electronic recoil is supported). Recoil is a number according to the NEST convention which will be used to indicate which interaction. For the key look here: https://github.com/XENONnT/WFSim/blob/2c614b0f7b0d7c7adc516f6188e281857e8d7e22/wfsim/core.py#L22

Now, lets say we want some krypton peaks. For this we’ll need to change the default instruction function to include this double decay and use Nestpy to convert energy deposits into a number of photons and electrons.

In this case I’ll use 1 “event” per full decay, that where all the 4’s are coming from.

[8]:
def super_awesome_custom_instruction(c):
    import nestpy
    half_life = 156.94e-9 #Kr intermediate state half-life in ns
    decay_energies = [32.2,9.4] # Decay energies in kev

    n = c['nevents'] = c['event_rate'] * c['chunk_size'] * c['nchunk']
    c['total_time'] = c['chunk_size'] * c['nchunk']

    instructions = np.zeros(4 * n, dtype=wfsim.instruction_dtype)
    instructions['event_number'] = np.digitize(instructions['time'],
         1e9 * np.arange(c['nchunk']) * c['chunk_size']) - 1

    instructions['type'] = np.tile([1, 2], 2 * n)
    instructions['recoil'] = [7 for i in range(4 * n)]

    r = np.sqrt(np.random.uniform(0, 2500, n))
    t = np.random.uniform(-np.pi, np.pi, n)
    instructions['x'] = np.repeat(r * np.cos(t), 4)
    instructions['y'] = np.repeat(r * np.sin(t), 4)
    instructions['z'] = np.repeat(np.random.uniform(-100, 0, n), 4)

    #To get the correct times we'll need to include the 156.94 ns half life of the intermediate state.

    uniform_times = c['total_time'] * (np.arange(n) + 0.5) / n
    delayed_times = uniform_times + np.random.exponential(half_life/np.log(2),len(uniform_times))
    instructions['time'] = np.repeat(list(zip(uniform_times,delayed_times)),2) * 1e9


    # Here we'll define our XENON-like detector
    nc = nestpy.NESTcalc(nestpy.VDetector())
    A = 131.293
    Z = 54.
    density = 2.862  # g/cm^3   #SR1 Value
    drift_field = 82  # V/cm    #SR1 Value
    interaction = nestpy.INTERACTION_TYPE(7)

    energy = np.tile(decay_energies,n)
    quanta = []
    for en in energy:
        y = nc.GetYields(interaction,
                         en,
                         density,
                         drift_field,
                         A,
                         Z,
                         (1, 1))
        quanta.append(nc.GetQuanta(y, density).photons)
        quanta.append(nc.GetQuanta(y, density).electrons)

    instructions['amp'] = quanta

    return instructions

Now here comes the magic line:

[9]:
wfsim.strax_interface.rand_instruction = super_awesome_custom_instruction

This changes the default rand_instruction function in our own super awesome function. So when the simulator will call rand_instruction the code from super_awesome_custom_instruction will be executed

CSV

The format for csv files is the same as the instructions dtype. So on every line specify event_number,type,time ,x,y,z, amp and recoil in that order. Then tell the simulator it exists:

[10]:
st.set_config(dict(fax_file='instructions.csv'))

Ofcourse if you do not have this file it will not work

Geant4

For starters you’ll need to register the Epix plugin. This requires an epix configuration (it’s on private_nt_aux_files), and if you want a start and stop event.

[11]:
st.register(wfsim.strax_interface.RawRecordsFromFaxEpix)
st.set_config(dict(epix_config= path_to_config))

Configuration customization

The simulator using a larger large amount of configuration settings to do it’s magic. Some of them are best left along, like pmt_circuit_load_resistor. Others on the other hand are things you might want change a bit to see how the data will change. Unfortunately currently the full list is spread out over two different places. One is the fax config json which is on github. The other is the option list in strax. Besides those things like pattern maps are hardcoded in load_resource.py. These files are overrideable with the option “fax_config_override”

The strax config is viewable like this and can be changed by st.set_config(dict(option you want=value you want))

[12]:
st.show_config('raw_records')
/Users/petergaemers/Desktop/python/strax/strax/context.py:369: UserWarning: Option channel_map not taken by any registered plugin
  warnings.warn(f"Option {k} not taken by any registered plugin")
[12]:
option default current applies_to help
0 fax_file None /dali/lgrandi/pgaemers/fax_files/Xenon1T_Whole... (raw_records, truth) Directory with fax instructions
1 nSort_path None <OMITTED> (raw_records, truth) If nSort is needed to convert GEANT4 to instru...
2 fax_config_override None <OMITTED> (raw_records, truth) Dictionary with configuration option overrides
3 event_rate 5 5 (raw_records, truth) Average number of events per second
4 chunk_size 5 10 (raw_records, truth) Duration of each chunk in seconds
5 nchunk 4 1 (raw_records, truth) Number of chunks to simulate
6 fax_config https://raw.githubusercontent.com/XENONnT/stra... https://raw.githubusercontent.com/XENONnT/stra... (raw_records, truth)
7 to_pe_file https://raw.githubusercontent.com/XENONnT/stra... https://raw.githubusercontent.com/XENONnT/stra... (raw_records, truth)
8 gain_model (to_pe_per_run, https://raw.githubusercontent.... (to_pe_constant, 0.005) (raw_records, truth) PMT gain model. Specify as (model_type, model_...
9 right_raw_extension 50000 <OMITTED> (raw_records, truth)
10 zle_threshold 0 <OMITTED> (raw_records, truth)
11 detector XENON1T XENONnT (raw_records, truth)
12 timeout 1800 <OMITTED> (raw_records, truth) Terminate processing if any one mailbox receiv...

The config from github can be loaded as:

[13]:
straxen.get_resource('https://raw.githubusercontent.com/XENONnT/'
                               'strax_auxiliary_files/master/fax_files/fax_config_nt.json',fmt='json').keys()
[13]:
dict_keys(['pmt_pulse_time_rounding', 'gauss_noise_sigmas', 'real_noise_file', 'real_noise_sample_mode', 'maximum_recombination_time', 's1_detection_efficiency', 'singlet_lifetime_liquid', 'triplet_lifetime_liquid', 's1_ER_recombination_fraction', 's1_ER_primary_singlet_fraction', 's1_ER_secondary_singlet_fraction', 's1_NR_singlet_fraction', 's1_ER_alpha_singlet_fraction', 'electron_trapping_time', 'gas_drift_velocity_slope', 'lxe_dielectric_constant', 'singlet_lifetime_gas', 'triplet_lifetime_gas', 'singlet_fraction_gas', 'each_pmt_afterpulse_types', 'pmt_afterpulse_types', 'p_double_pe_emision', 's1_model_type', 's1_decay_time', 'pmt_transit_time_spread', 'real_noise_sample_size', 'pressure', 'temperature', 'anode_voltage', 'pmt_rise_time', 'pmt_fall_time', 'photon_area_distribution', 'pe_pulse_ts', 'pe_pulse_ys', 'samples_before_pulse_center', 'samples_after_pulse_center', 'pmt_transit_time_mean', 'led_pulse_length', 'drift_field', 'liquid_density', 'diffusion_constant_liquid', 'electron_extraction_yield', 'gate_to_anode_distance', 'elr_gas_gap_length', 's2_secondary_sc_gain', 'anode_field_domination_distance', 'anode_wire_radius', 's2_fitted_patterns_file', 's2_mean_area_fraction_top', 'rz_position_distortion_map', 'tpc_name', 'run_number', 'channels_top', 'channels_bottom', 'sample_duration', 'digitizer_voltage_range', 'nominal_gain', 'digitizer_bits', 'pmt_circuit_load_resistor', 'external_amplification', 'digitizer_reference_baseline', 's1_width_threshold', 's1_risetime_threshold', 'tight_coincidence_window_left', 'tight_coincidence_window_right', 'tight_coincidence_threshold', 'tpc_length', 'tpc_radius', 'electron_lifetime_liquid', 'drift_velocity_liquid', 'drift_time_gate', 'shrink_data_threshold', 'xy_posrec_preference', 'channels_in_detector', 'n_channels', 'gains', 'gain_sigmas', 'quantum_efficiencies', 'relative_qe_error', 'relative_gain_error', 'pmts', 'noise_file_index', 'noise_file_folder', 'max_intervals', 'samples_to_store_before', 'samples_to_store_after', 'zle_threshold', 'special_thresholds', 's1_ER_recombination_time', 'pmt_ap_modifier', 'photoionization_modifier', 'pmt_ap_t_modifier', 'trigger_window', 'photoelectric_p', 'photoelectric_modifier', 'photoelectric_t_center', 'photoelectric_t_spread'])

Changing things in this guy goes slightly different. In the strax option list there is the option called “fax_config_override”. This takes a dict which will be used to override any values in the json config. So changing the ‘s2_secondary_sc_gain’ is done as:

[14]:
st.set_config(dict(fax_config_override = dict(s2_secondary_sc_gain=23)))

What actually happens?

What happens behind the scenes is that the instructions are first grouped together in chunks. Then we loop over the instructions and the full chunk is returned before starting with the next one.

We use a S1 and S2 class to calculate the arrival times of the photons and the channels which have been hit. Then we’ll hand them over to the Pulse class to calculate the currents in the channels. Finally the currents go to a RawData class where we fake the digitizer response.

S1

For S1s we start with calculating the light yield based on the position of the interaction, and draw the number of photons seen from a Poisson distribution.

Second we calculate the arrival times of the photons. This is based on the scintillation of the xenon atoms. It is dependend on the recombination time, the singlet and triplet fractions.

Finally the channels are calculated. Based on the pattern map we use a interpolation map to get a probability distribution for channels to be hit for a S1 signal based on the position of the interaction, and then we draw from this distribution for every photon.

S2

S2s are slightly more complicated. First we need to drift the electrons up, and while doing so we’ll lose some of them. To get the photon timings, we first need to get the arrival times of the electrons at the gas interface based on a diffusion model. Then we can calculate the photon timings based on a luminescence model for every individual electron. And for the channels we do the same trick with the interpolating map.

Pulse

When we have our lists of channels and timing we can generate actual pulses. First we add a pmt transition time. Then we loop over all channels, calculate the double pe emission probabilities, and add a current in the pmt channel based on the arrival time. This is all stored in a big dictionary. Afterwards this is passed to our fake digitizers which then returns you with your very own pretty data

Getting down to bussiness

Now we have access to all the normal strax data types, and another one called ‘truth’ which holds the simulation instructions. Calling it follows the normal strax convention.

[15]:
st.set_config(dict(fax_file=None))
[17]:
# Remove any previously simulated data, if such exists
# !rm -r strax_data

records = st.get_array(run_id,'records')
# peaks = st.get_array(run_id, ['peaks','peak_basics'])
# data = st.get_df(run_id, 'event_info')

truth = st.get_df(run_id, 'truth')
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-17-98001a2a1c1a> in <module>
      2 # !rm -r strax_data
      3
----> 4 records = st.get_array(run_id,'records')
      5 # peaks = st.get_array(run_id, ['peaks','peak_basics'])
      6 # data = st.get_df(run_id, 'event_info')

~/Desktop/python/strax/strax/context.py in get_array(self, run_id, targets, save, max_workers, **kwargs)
   1106                 max_workers=max_workers,
   1107                 **kwargs)
-> 1108             results = [x.data for x in source]
   1109
   1110         results = np.concatenate(results)

~/Desktop/python/strax/strax/context.py in <listcomp>(.0)
   1106                 max_workers=max_workers,
   1107                 **kwargs)
-> 1108             results = [x.data for x in source]
   1109
   1110         results = np.concatenate(results)

~/Desktop/python/strax/strax/context.py in get_iter(self, run_id, targets, save, max_workers, time_range, seconds_range, time_within, time_selection, selection_str, keep_columns, _chunk_number, progress_bar, **kwargs)
    900                                          save=save,
    901                                          time_range=time_range,
--> 902                                          chunk_number=_chunk_number)
    903
    904         # Cleanup the temp plugins

~/Desktop/python/strax/strax/context.py in get_components(self, run_id, targets, save, time_range, chunk_number)
    768         for d in plugins.values():
    769             self._set_plugin_config(d, run_id, tolerant=False)
--> 770             d.setup()
    771         return strax.ProcessorComponents(
    772             plugins=plugins,

~/Desktop/python/wfsim/wfsim/strax_interface.py in setup(self)
    541
    542     def setup(self):
--> 543         self.set_config()
    544
    545         c=self.config

~/Desktop/python/wfsim/wfsim/strax_interface.py in set_config(self)
    564     def set_config(self,):
    565         c = self.config
--> 566         c.update(get_resource(c['fax_config'], fmt='json'))
    567         overrides = self.config['fax_config_override']
    568         if overrides is not None:

~/Desktop/python/straxen/straxen/common.py in get_resource(x, fmt)
    155     # 2. load from file
    156     elif os.path.exists(x):
--> 157         return open_resource(x, fmt=fmt)
    158     # 3. load from database
    159     elif straxen.uconfig is not None:

~/Desktop/python/straxen/straxen/common.py in open_resource(file_name, fmt)
    113     elif fmt == 'json':
    114         with open(file_name, mode='r') as f:
--> 115             result = json.load(f)
    116     elif fmt == 'binary':
    117         with open(file_name, mode='rb') as f:

~/anaconda3/envs/strax/lib/python3.7/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,
--> 296         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
    297
    298

~/anaconda3/envs/strax/lib/python3.7/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

~/anaconda3/envs/strax/lib/python3.7/json/decoder.py in decode(self, s, _w)
    335
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

~/anaconda3/envs/strax/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Now it is time to make pretty plots and see if what we makes actually makes any sense

[17]:
peak_basics = st.get_df(run_id,'peak_basics')
[18]:
peak_basics[:10]
[18]:
time endtime center_time area n_channels max_pmt max_pmt_area range_50p_area range_90p_area area_fraction_top length dt rise_time tight_coincidence type
0 99999510 100000190 100000098 1511.265015 438 425 11 45.498920 108.933258 0.090103 68 10 32.073212 278 1
1 100000190 100000460 100000224 160.055038 436 425 2 41.434231 93.053680 0.088126 27 10 27.741791 19 1
2 100000470 100002090 100001584 1.275000 1 477 1 19.543358 126.469147 0.000000 162 10 12.299422 1 0
3 100002730 100003030 100002842 2.070000 3 36 0 211.971954 235.813843 1.000000 30 10 17.707243 2 0
4 100164950 100166240 100165555 18.500000 17 128 3 202.121216 416.111206 0.173243 129 10 187.104172 3 2
5 100514580 100523780 100518825 11563.562500 491 206 1228 1771.331787 4564.572266 0.087043 184 50 1627.153198 59 2
6 100870810 100871920 100871312 18.850000 19 34 1 144.104645 346.100159 0.701591 111 10 124.979836 8 2
7 100938950 100940020 100939763 20.995001 19 97 5 183.766678 469.409851 0.568231 107 10 316.340454 4 2
8 299999510 300000190 300000098 1476.585693 458 332 9 44.220722 108.634956 0.156120 68 10 30.482187 313 1
9 300000190 300000770 300000223 158.209991 455 179 1 40.539444 93.234810 0.161873 58 10 26.819931 32 1

Matching

To do matching with the truth the easiest way is to write a new strax plugin where you loop over peaks and get the truth arrays where the mean arrival time of the photons are within the time window of the peak So that will look something like this:

[20]:
class MatchedPeaks(strax.LoopPlugin):
    depends_on = ('peak_basics','truth')
    provides = 'matched_peaks'
    __version__ = '0.0.2'
    dtype = [('time',np.int),
             ('endtime',np.int),
             ('area',np.int),
             ('n_photon',np.int)]

    def compute(self, peaks, truth):
        result = np.zeros(len(peaks), self.dtype)

        for ix, p in enumerate(peaks):
            t = truth[(p['time']<truth['t_mean_photon'])&
                      (p['endtime']>truth['t_mean_photon'])]
            r = result[ix]
            r['time'] = p['time']
            r['endtime'] = p['endtime']
            r['area'] = p['area']
            if len(t)==0:
                r['n_photon'] = 0
            elif len(t)>1:
                r['n_photon'] = np.sum(t['n_photon'])
            else:
                r['n_photon'] = t['n_photon']

        return result

Of course this doesn’t actually work. An electron afterpulse can be very spread out leading it to be interpreted as multiple peaks while coming from 1 instruction falling outside of the specified range. It would be very much appreciated if someone wants make a more sturdy selection criteria :)

For example, checkout out peak matching algorithm for WFSim: https://github.com/XENONnT/pema

[21]:
st.register(MatchedPeaks)
[21]:
__main__.MatchedPeaks
[22]:
st.get_array(run_id,'matched_peaks')
[ ]: