Understanding Waveform Simulation for XENONnT
Nov 30
[1]:
import strax, straxen, wfsim
import numpy as np
import pandas as pd
*** Detector definition message ***
You are currently using the default XENON10 template detector.
[2]:
straxen.print_versions(modules=('strax', 'straxen', 'cutax', 'wfsim', 'ntauxfiles'))
[2]:
module | version | path | git | |
---|---|---|---|---|
0 | python | 3.9.17 | /opt/XENONnT/anaconda/envs/XENONnT_2023.07.1/b... | None |
1 | strax | 1.5.2 | /opt/XENONnT/anaconda/envs/XENONnT_2023.07.1/l... | None |
2 | straxen | 2.1.1 | /opt/XENONnT/anaconda/envs/XENONnT_2023.07.1/l... | None |
3 | cutax | 1.15.1 | /home/shenyangshi/cutax/cutax | branch:bdt_ms | 277f174 |
4 | wfsim | 1.0.2 | /opt/XENONnT/anaconda/envs/XENONnT_2023.07.1/l... | None |
5 | ntauxfiles | 0.3.0 | /project2/lgrandi/shenyangshi/home/private_nt_... | branch:AmBe_ms_fit | 3555246 |
[4]:
config = straxen.get_resource('fax_config_nt_sr0_dev.json', fmt='json')
config.update({'detector':'XENONnT', 'right_raw_extension':50000})
Part 1, Simulation flow
Conceptual Overview / Between Interface and Core / Pulse Classes / Workflow Walkthrough
Conceptual Overview
What it’s for?
Imagine a monoenergetic source like Kr83m, producing photons and electrons uniformally in the detector, what would the events look like, can the processing software correctly reconstruct them? Or Would you like some fake events in the blinded regions?
The signals from XENON experiments have been extensively studied, so that we can mostly model them from the bottom up approach. The WFSim use those models do construct waveforms as realistic as possible while keeping track of the inputs.
Let’s break down the WFSim bit by bit.
How it works
The WFSim from the outside works like a python iterator, and is composed of four levels of iterators, where the deepest are Pulse
classes (those are not exactly iterator) taking instruction groups and return a list of pseudo pulses. The RawData
take the pseudo pulses and yield
digitized pulses, similar to what physical digitizers would do. The ChunkRawRecords
and RawRecordsFromFaxNT
takes care of chunking and interface with strax respectively.
Instruction groups
However, it is not exactly iterating over instructions, the instructions are just one of the arguments for __init__
. It is designed to turn instruction like the one below into the lowest input structure of the processing software, chunk
containing raw_records
(event
containing pulses
).
[ ]:
[5]:
instructions = inst_array = wfsim.strax_interface.rand_instructions(c={'event_rate':3, 'chunk_size':1, 'nchunk':1, **config})
inst_df = pd.DataFrame(inst_array)
inst_df
WARNING:wfsim.interface:rand_instructions is deprecated, please use wfsim.random_instructions
WARNING:wfsim.interface:g4id is not (fully) filled
WARNING:wfsim.interface:vol_id is not (fully) filled
[5]:
event_number | type | time | x | y | z | amp | recoil | e_dep | g4id | vol_id | local_field | n_excitons | x_pri | y_pri | z_pri | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 166666666 | 32.122543 | -22.621212 | -43.662350 | 818 | 7 | 13.101433 | -1 | -1 | 22.92 | 113 | 32.122543 | -22.621212 | -43.662350 |
1 | 0 | 2 | 166666666 | 32.122543 | -22.621212 | -43.662350 | 145 | 7 | 13.101433 | -1 | -1 | 22.92 | 0 | 32.122543 | -22.621212 | -43.662350 |
2 | 0 | 1 | 500000000 | 7.289836 | -28.672121 | -58.539639 | 3675 | 7 | 57.908840 | -1 | -1 | 22.92 | 670 | 7.289836 | -28.672121 | -58.539639 |
3 | 0 | 2 | 500000000 | 7.289836 | -28.672121 | -58.539639 | 609 | 7 | 57.908840 | -1 | -1 | 22.92 | 0 | 7.289836 | -28.672121 | -58.539639 |
4 | 0 | 1 | 833333333 | 36.476028 | -32.952290 | -136.419952 | 4234 | 7 | 65.917046 | -1 | -1 | 22.92 | 755 | 36.476028 | -32.952290 | -136.419952 |
5 | 0 | 2 | 833333333 | 36.476028 | -32.952290 | -136.419952 | 625 | 7 | 65.917046 | -1 | -1 | 22.92 | 0 | 36.476028 | -32.952290 | -136.419952 |
6 | 1 | 1 | 1166666666 | -13.274544 | -16.667040 | -7.988991 | 1040 | 7 | 16.401899 | -1 | -1 | 22.92 | 159 | -13.274544 | -16.667040 | -7.988991 |
7 | 1 | 2 | 1166666666 | -13.274544 | -16.667040 | -7.988991 | 164 | 7 | 16.401899 | -1 | -1 | 22.92 | 0 | -13.274544 | -16.667040 | -7.988991 |
8 | 1 | 1 | 1500000000 | -6.285732 | 65.843475 | -28.266441 | 2797 | 7 | 43.466827 | -1 | -1 | 22.92 | 486 | -6.285732 | 65.843475 | -28.266441 |
9 | 1 | 2 | 1500000000 | -6.285732 | 65.843475 | -28.266441 | 383 | 7 | 43.466827 | -1 | -1 | 22.92 | 0 | -6.285732 | 65.843475 | -28.266441 |
10 | 1 | 1 | 1833333333 | -45.993202 | -14.296285 | -122.466698 | 346 | 7 | 5.972553 | -1 | -1 | 22.92 | 24 | -45.993202 | -14.296285 | -122.466698 |
11 | 1 | 2 | 1833333333 | -45.993202 | -14.296285 | -122.466698 | 102 | 7 | 5.972553 | -1 | -1 | 22.92 | 0 | -45.993202 | -14.296285 | -122.466698 |
The event_number
are all 0, but don’t worry about it. It is used as event index when using with pax, while having no significant meaning when using with straxen.
The instruction is sorted by the physical time of the signal, that is S2-esque signal are delayed by drift time. And clustered into instruction groups, similar to gap size clustering that split when the gap is larger than right_raw_extension
.
[6]:
# Pre-load some constents from config
v = config['drift_velocity_liquid']
rext = config['right_raw_extension']
# Primary instructions must be sorted by signal time
# int(type) by design S1-esque being odd, S2-esque being even
# thus type%2-1 is 0:S1-esque; -1:S2-esque
# Make a list of clusters of instructions, with gap smaller then rext
inst_time = instructions['time'] + instructions['z'] / v * (instructions['type'] % 2 - 1)
inst_queue = np.argsort(inst_time)
inst_queue = np.split(inst_queue, np.where(np.diff(inst_time[inst_queue]) > rext)[0]+1)
Between Interface and Core
Let’s now examine what’s been passing between the interface and core, specifically between ChunkRawRecord
and ChunkRawRecord.rawdata (rawdata)
. The most important things passed into rawdata
are the instructions. But we also pass the truth buffer (empty array of truth dtype with 10000 slots) into rawdata
. In return, we have (channel, left, right, data)
valuse of digitized pulses, needed to build raw records.
At the same time three properties are interally used as chunking indicators rawdata.left, rawdata.right, radata.source_finished
. Whereas the left and right above returned by calling __next__
on rawdata
are of individual pulses, the rawdata.left, rawdata.right
are of all the pulses originated from an instruction group. So that when we stop and chunk, we can be sure we have finished an instruction group, and the next pulse will come at least right_raw_extension
away.
Sim data is the function where Pulse
class are called.
Convert to raw records
The pulses returned are in the form of (channel, left, right, data)
, not exactly the form of raw records. So one of the two main functions of ChunkRawRecord
is to covert them.
[7]:
# !!! Do not run this cell
pulse_length = right - left + 1
records_needed = int(np.ceil(pulse_length / samples_per_record))
# WARNING baseline and area fields are zeros before finish_results
s = slice(blevel, blevel + records_needed)
record_buffer[s]['channel'] = channel
record_buffer[s]['dt'] = dt
record_buffer[s]['time'] = dt * (left + samples_per_record * np.arange(records_needed))
record_buffer[s]['length'] = [min(pulse_length, samples_per_record * (i+1))
- samples_per_record * i for i in range(records_needed)]
record_buffer[s]['pulse_length'] = pulse_length
record_buffer[s]['record_i'] = np.arange(records_needed)
record_buffer[s]['data'] = np.pad(data,
(0, records_needed * samples_per_record - pulse_length), 'constant').reshape((-1, samples_per_record))
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[7], line 3
1 # !!! Do not run this cell
----> 3 pulse_length = right - left + 1
4 records_needed = int(np.ceil(pulse_length / samples_per_record))
6 # WARNING baseline and area fields are zeros before finish_results
NameError: name 'right' is not defined
Digitize pulse cache
The psuedo pulses are cached in _pulses_cache
, once an instruction group is over, we digitize all the psuedo pulses. This is done by summing up all the pulse current in each channel and convert unit from PE to acd count thus becoming adc_wave
. Three functions add_noise
, add_baseline
and digitizer_saturation
are done afterwards.
[8]:
# !!! Do not run this cell
current_2_adc = config['pmt_circuit_load_resistor'] \
* config['external_amplification'] \
/ (config['digitizer_voltage_range'] / 2 ** (config['digitizer_bits']))
left = np.min([p['left'] for p in _pulses_cache]) - config['trigger_window']
right = np.max([p['right'] for p in _pulses_cache]) + config['trigger_window']
assert right - left < 200000, "Pulse cache too long"
if left % 2 != 0: left -= 1 # Seems like a digizier effect
_raw_data = np.zeros((801,
right - left + 1), dtype=('<i8'))
for ix, _pulse in enumerate(_pulses_cache):
ch = _pulse['channel']
adc_wave = - np.trunc(_pulse['current'] * current_2_adc).astype(int)
_slice = slice(_pulse['left'] - left, _pulse['right'] - left + 1)
_raw_data[ch, _slice] += adc_wave
if config['detector'] == 'XENONnT':
adc_wave_he = adc_wave * int(config['high_energy_deamplification_factor'])
if ch <= config['channels_top'][-1]:
ch_he = config['channels_top_high_energy'][ch]
_raw_data[ch_he, _slice] += adc_wave_he
elif ch <= config['channels_bottom'][-1]:
sum_signal(adc_wave_he,
_pulse['left'] - left,
_pulse['right'] - left + 1,
_raw_data[config['channels_in_detector']['sum_signal']])
# Adding noise, baseline and digitizer saturation
add_noise(data=_raw_data,
channel_mask=_channel_mask,
noise_data=resource.noise_data,
noise_data_length=len(resource.noise_data))
add_baseline(_raw_data, _channel_mask,
config['digitizer_reference_baseline'],)
digitizer_saturation(_raw_data, _channel_mask)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[8], line 8
1 # !!! Do not run this cell
4 current_2_adc = config['pmt_circuit_load_resistor'] \
5 * config['external_amplification'] \
6 / (config['digitizer_voltage_range'] / 2 ** (config['digitizer_bits']))
----> 8 left = np.min([p['left'] for p in _pulses_cache]) - config['trigger_window']
9 right = np.max([p['right'] for p in _pulses_cache]) + config['trigger_window']
10 assert right - left < 200000, "Pulse cache too long"
NameError: name '_pulses_cache' is not defined
Z(ero) L(ength) E(ncoding)
Right after digitize pulse cache, we run ZLE, which uses find_intervals_below_threshold
. For each interval, this yields a pulse, similar to what physical digitizers are doing
Pulse Classes
Pulse classes are another monster we will go into with more details in other notebooks. But in general, there’s the parent class Pulse
while different types of signal are children of it. And S2-esque after pulsing all inherite from S2
.
Workflow Walkthrough
[ ]: