Understanding Waveform Simulation for XENONnT

Nov 30

[1]:
import strax, straxen, wfsim
import numpy as np
import pandas as pd
*** Detector definition message ***
You are currently using the default XENON10 template detector.

[2]:
straxen.print_versions(modules=('strax', 'straxen', 'cutax', 'wfsim', 'ntauxfiles'))
[2]:
module version path git
0 python 3.9.17 /opt/XENONnT/anaconda/envs/XENONnT_2023.07.1/b... None
1 strax 1.5.2 /opt/XENONnT/anaconda/envs/XENONnT_2023.07.1/l... None
2 straxen 2.1.1 /opt/XENONnT/anaconda/envs/XENONnT_2023.07.1/l... None
3 cutax 1.15.1 /home/shenyangshi/cutax/cutax branch:bdt_ms | 277f174
4 wfsim 1.0.2 /opt/XENONnT/anaconda/envs/XENONnT_2023.07.1/l... None
5 ntauxfiles 0.3.0 /project2/lgrandi/shenyangshi/home/private_nt_... branch:AmBe_ms_fit | 3555246
[4]:
config = straxen.get_resource('fax_config_nt_sr0_dev.json', fmt='json')
config.update({'detector':'XENONnT', 'right_raw_extension':50000})

Part 1, Simulation flow

Conceptual Overview / Between Interface and Core / Pulse Classes / Workflow Walkthrough

Conceptual Overview

What it’s for?

Imagine a monoenergetic source like Kr83m, producing photons and electrons uniformally in the detector, what would the events look like, can the processing software correctly reconstruct them? Or Would you like some fake events in the blinded regions?

The signals from XENON experiments have been extensively studied, so that we can mostly model them from the bottom up approach. The WFSim use those models do construct waveforms as realistic as possible while keeping track of the inputs.

Let’s break down the WFSim bit by bit.

How it works

The WFSim from the outside works like a python iterator, and is composed of four levels of iterators, where the deepest are Pulse classes (those are not exactly iterator) taking instruction groups and return a list of pseudo pulses. The RawData take the pseudo pulses and yield digitized pulses, similar to what physical digitizers would do. The ChunkRawRecords and RawRecordsFromFaxNT takes care of chunking and interface with strax respectively.

a443cec3599f489bb32e99374350e4da

Instruction groups

However, it is not exactly iterating over instructions, the instructions are just one of the arguments for __init__. It is designed to turn instruction like the one below into the lowest input structure of the processing software, chunk containing raw_records(event containing pulses).

[ ]:

[5]:
instructions = inst_array = wfsim.strax_interface.rand_instructions(c={'event_rate':3, 'chunk_size':1, 'nchunk':1, **config})
inst_df = pd.DataFrame(inst_array)
inst_df
WARNING:wfsim.interface:rand_instructions is deprecated, please use wfsim.random_instructions
WARNING:wfsim.interface:g4id is not (fully) filled
WARNING:wfsim.interface:vol_id is not (fully) filled
[5]:
event_number type time x y z amp recoil e_dep g4id vol_id local_field n_excitons x_pri y_pri z_pri
0 0 1 166666666 32.122543 -22.621212 -43.662350 818 7 13.101433 -1 -1 22.92 113 32.122543 -22.621212 -43.662350
1 0 2 166666666 32.122543 -22.621212 -43.662350 145 7 13.101433 -1 -1 22.92 0 32.122543 -22.621212 -43.662350
2 0 1 500000000 7.289836 -28.672121 -58.539639 3675 7 57.908840 -1 -1 22.92 670 7.289836 -28.672121 -58.539639
3 0 2 500000000 7.289836 -28.672121 -58.539639 609 7 57.908840 -1 -1 22.92 0 7.289836 -28.672121 -58.539639
4 0 1 833333333 36.476028 -32.952290 -136.419952 4234 7 65.917046 -1 -1 22.92 755 36.476028 -32.952290 -136.419952
5 0 2 833333333 36.476028 -32.952290 -136.419952 625 7 65.917046 -1 -1 22.92 0 36.476028 -32.952290 -136.419952
6 1 1 1166666666 -13.274544 -16.667040 -7.988991 1040 7 16.401899 -1 -1 22.92 159 -13.274544 -16.667040 -7.988991
7 1 2 1166666666 -13.274544 -16.667040 -7.988991 164 7 16.401899 -1 -1 22.92 0 -13.274544 -16.667040 -7.988991
8 1 1 1500000000 -6.285732 65.843475 -28.266441 2797 7 43.466827 -1 -1 22.92 486 -6.285732 65.843475 -28.266441
9 1 2 1500000000 -6.285732 65.843475 -28.266441 383 7 43.466827 -1 -1 22.92 0 -6.285732 65.843475 -28.266441
10 1 1 1833333333 -45.993202 -14.296285 -122.466698 346 7 5.972553 -1 -1 22.92 24 -45.993202 -14.296285 -122.466698
11 1 2 1833333333 -45.993202 -14.296285 -122.466698 102 7 5.972553 -1 -1 22.92 0 -45.993202 -14.296285 -122.466698

The event_number are all 0, but don’t worry about it. It is used as event index when using with pax, while having no significant meaning when using with straxen.

The instruction is sorted by the physical time of the signal, that is S2-esque signal are delayed by drift time. And clustered into instruction groups, similar to gap size clustering that split when the gap is larger than right_raw_extension.

[6]:
# Pre-load some constents from config
v = config['drift_velocity_liquid']
rext = config['right_raw_extension']

# Primary instructions must be sorted by signal time
# int(type) by design S1-esque being odd, S2-esque being even
# thus type%2-1 is 0:S1-esque;  -1:S2-esque
# Make a list of clusters of instructions, with gap smaller then rext
inst_time = instructions['time'] + instructions['z']  / v * (instructions['type'] % 2 - 1)
inst_queue = np.argsort(inst_time)
inst_queue = np.split(inst_queue, np.where(np.diff(inst_time[inst_queue]) > rext)[0]+1)

Between Interface and Core

Let’s now examine what’s been passing between the interface and core, specifically between ChunkRawRecord and ChunkRawRecord.rawdata (rawdata). The most important things passed into rawdata are the instructions. But we also pass the truth buffer (empty array of truth dtype with 10000 slots) into rawdata. In return, we have (channel, left, right, data) valuse of digitized pulses, needed to build raw records.

At the same time three properties are interally used as chunking indicators rawdata.left, rawdata.right, radata.source_finished. Whereas the left and right above returned by calling __next__ on rawdata are of individual pulses, the rawdata.left, rawdata.right are of all the pulses originated from an instruction group. So that when we stop and chunk, we can be sure we have finished an instruction group, and the next pulse will come at least right_raw_extension away.

9cf6e478a9a2462a8b4afc3900baf825

Sim data is the function where Pulse class are called.

Convert to raw records

The pulses returned are in the form of (channel, left, right, data), not exactly the form of raw records. So one of the two main functions of ChunkRawRecord is to covert them.

[7]:
# !!! Do not run this cell

pulse_length = right - left + 1
records_needed = int(np.ceil(pulse_length / samples_per_record))

# WARNING baseline and area fields are zeros before finish_results
s = slice(blevel, blevel + records_needed)
record_buffer[s]['channel'] = channel
record_buffer[s]['dt'] = dt
record_buffer[s]['time'] = dt * (left + samples_per_record * np.arange(records_needed))
record_buffer[s]['length'] = [min(pulse_length, samples_per_record * (i+1))
    - samples_per_record * i for i in range(records_needed)]
record_buffer[s]['pulse_length'] = pulse_length
record_buffer[s]['record_i'] = np.arange(records_needed)
record_buffer[s]['data'] = np.pad(data,
    (0, records_needed * samples_per_record - pulse_length), 'constant').reshape((-1, samples_per_record))

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[7], line 3
      1 # !!! Do not run this cell
----> 3 pulse_length = right - left + 1
      4 records_needed = int(np.ceil(pulse_length / samples_per_record))
      6 # WARNING baseline and area fields are zeros before finish_results

NameError: name 'right' is not defined

Digitize pulse cache

The psuedo pulses are cached in _pulses_cache, once an instruction group is over, we digitize all the psuedo pulses. This is done by summing up all the pulse current in each channel and convert unit from PE to acd count thus becoming adc_wave. Three functions add_noise, add_baseline and digitizer_saturation are done afterwards.

[8]:
# !!! Do not run this cell


current_2_adc = config['pmt_circuit_load_resistor'] \
    * config['external_amplification'] \
    / (config['digitizer_voltage_range'] / 2 ** (config['digitizer_bits']))

left = np.min([p['left'] for p in _pulses_cache]) - config['trigger_window']
right = np.max([p['right'] for p in _pulses_cache]) + config['trigger_window']
assert right - left < 200000, "Pulse cache too long"

if left % 2 != 0: left -= 1 # Seems like a digizier effect

_raw_data = np.zeros((801,
    right - left + 1), dtype=('<i8'))

for ix, _pulse in enumerate(_pulses_cache):
    ch = _pulse['channel']
    adc_wave = - np.trunc(_pulse['current'] * current_2_adc).astype(int)
    _slice = slice(_pulse['left'] - left, _pulse['right'] - left + 1)

    _raw_data[ch, _slice] += adc_wave

    if config['detector'] == 'XENONnT':
        adc_wave_he = adc_wave * int(config['high_energy_deamplification_factor'])
        if ch <= config['channels_top'][-1]:
            ch_he = config['channels_top_high_energy'][ch]
            _raw_data[ch_he, _slice] += adc_wave_he
        elif ch <= config['channels_bottom'][-1]:
            sum_signal(adc_wave_he,
                _pulse['left'] - left,
                _pulse['right'] - left + 1,
                _raw_data[config['channels_in_detector']['sum_signal']])

# Adding noise, baseline and digitizer saturation
add_noise(data=_raw_data,
               channel_mask=_channel_mask,
               noise_data=resource.noise_data,
               noise_data_length=len(resource.noise_data))
add_baseline(_raw_data, _channel_mask,
    config['digitizer_reference_baseline'],)
digitizer_saturation(_raw_data, _channel_mask)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 8
      1 # !!! Do not run this cell
      4 current_2_adc = config['pmt_circuit_load_resistor'] \
      5     * config['external_amplification'] \
      6     / (config['digitizer_voltage_range'] / 2 ** (config['digitizer_bits']))
----> 8 left = np.min([p['left'] for p in _pulses_cache]) - config['trigger_window']
      9 right = np.max([p['right'] for p in _pulses_cache]) + config['trigger_window']
     10 assert right - left < 200000, "Pulse cache too long"

NameError: name '_pulses_cache' is not defined

Z(ero) L(ength) E(ncoding)

Right after digitize pulse cache, we run ZLE, which uses find_intervals_below_threshold. For each interval, this yields a pulse, similar to what physical digitizers are doing

Pulse Classes

Pulse classes are another monster we will go into with more details in other notebooks. But in general, there’s the parent class Pulse while different types of signal are children of it. And S2-esque after pulsing all inherite from S2.

270d8ff80375411a82e56d0ec303f9c7

Workflow Walkthrough

[ ]: