Thomas Eckert

Responsible Data Fitting

17 Feb 2018 | Seattle, WA

The summer after I graduated from college, I worked as a research assistant in a nuclear fusion laboratory. One of the tasks I was given early on was to fit a curve to data that came from a detector in the fusion reactor. A photomultiplier tube (PMT) captured the light emitted as high-energy neutrons passed through a liquid scintillator. This gave us a plot of charge in the PMT with respect to time.

I got to work in MATLAB, imported the data, and naively began fussing around with the coefficients of a polynomial. Unfortunately, I no longer have that first fit; it would have been a great laugh today. I do remember that it had 15 parameters and looked something like this:$$f(x) = 0.182 x +9.3 x^2 +0.78 x^3 +0.004 x^4 +1.2 x^5 + \dots$$

[fig. 1] A decent fit for one peak using linear regression.

It fit the data, and in that sense it wasn't wrong, but it was definitely not the right way to go about things. My mistake was in not considering why my data looked the way it did in the first place. This meant that when I got the following data, I initially had no idea how to fit it.

[fig. 2] An early attempt at fitting two peaks of neutrons using a linear regression. I found this file labeled "unsureCalib.jpg". It seems I was unsure how to calibrate my fit.

I had to take a step back because my approach to the first fit did not extend to more complicated data such as this. However, both data sets were produced by the same underlying physical phenomenon, and that phenomenon determined their shape. With help from my advisor, I worked out how to use this to make a better fit.

I was measuring neutrons from a fusion reaction. These would have a Gaussian distribution in energy as they approached the liquid scintillator. They excited molecules in the scintillator. As these molecules relaxed, they released light. The excited molecules did not all relax at once: the rate of relaxation decayed exponentially over time.

This resulted in data shaped like the convolution of a Gaussian and an exponential distribution. This convolution is commonly referred to as an Exponentially Modified Gaussian:$$f(x) = \frac{\lambda}{2} e^{\frac{\lambda}{2}(2\mu +\lambda\sigma^2 -2x)} \text{erfc}\left(\frac{\mu +\lambda\sigma^2 -x}{\sqrt{2} \sigma}\right)$$where \(\mu\) and \(\sigma^2\) are the mean and variance of the Gaussian component, and \(\lambda\) is the exponential decay rate.
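
To make the formula concrete, here is a minimal Python sketch of it (my original work was in MATLAB, and this is not that code). The function name `emg` and the parameter values are illustrative assumptions, not numbers from the experiment. The sketch checks that the curve integrates to roughly 1 and compares it against SciPy's built-in `scipy.stats.exponnorm`, which parameterizes the same distribution with a shape `K = 1 / (sigma * lam)`.

```python
import numpy as np
from scipy.special import erfc
from scipy.stats import exponnorm


def emg(x, mu, sigma, lam):
    """Exponentially Modified Gaussian density, written as in the formula above.

    This direct form can overflow when lam * sigma**2 is large; it is only
    meant for moderate parameter values like the ones used below.
    """
    x = np.asarray(x, dtype=float)
    z = (mu + lam * sigma**2 - x) / (np.sqrt(2.0) * sigma)
    return 0.5 * lam * np.exp(0.5 * lam * (2.0 * mu + lam * sigma**2 - 2.0 * x)) * erfc(z)


# Illustrative parameters only (assumed, not measured values).
mu, sigma, lam = 10.0, 1.0, 0.5
x = np.linspace(0.0, 60.0, 6001)
y = emg(x, mu, sigma, lam)

# A density should integrate to about 1 (simple Riemann sum on a fine grid).
dx = x[1] - x[0]
area = np.sum(y) * dx

# SciPy's parameterization of the same distribution.
y_scipy = exponnorm.pdf(x, 1.0 / (sigma * lam), loc=mu, scale=sigma)
max_diff = np.max(np.abs(y - y_scipy))

print(f"area under curve: {area:.4f}")
print(f"largest difference from scipy.stats.exponnorm: {max_diff:.2e}")
```

For real data I would lean on `exponnorm` rather than the hand-written version, since the direct formula multiplies a very large exponential by a very small `erfc` value far from the peak.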

While this equation may look complicated, it made future fitting endeavors much simpler. Faced with multiple peaks, I was able to linearly combine Exponentially Modified Gaussians and shift the mean. The relaxation time was consistent across all peaks because it depended on the scintillator, not the energy of the neutrons.
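
The idea of combining peaks with a shared relaxation rate can be sketched in Python with `scipy.optimize.curve_fit`. This is an illustration on synthetic data, not my original MATLAB analysis: the names `emg` and `two_peaks`, the amplitudes, the noise level, the starting guess, and the bounds are all assumptions made up for the example. The key point is that each peak gets its own amplitude, mean, and width, while a single `lam` is shared by both.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erfc


def emg(x, mu, sigma, lam):
    """Exponentially Modified Gaussian, written as in the formula above."""
    z = (mu + lam * sigma**2 - x) / (np.sqrt(2.0) * sigma)
    return 0.5 * lam * np.exp(0.5 * lam * (2.0 * mu + lam * sigma**2 - 2.0 * x)) * erfc(z)


def two_peaks(x, a1, mu1, sigma1, a2, mu2, sigma2, lam):
    """Two scaled EMG peaks that share one decay rate lam."""
    return a1 * emg(x, mu1, sigma1, lam) + a2 * emg(x, mu2, sigma2, lam)


# Synthetic stand-in for detector data; every number here is invented.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 60.0, 600)
true_params = (100.0, 10.0, 1.0, 60.0, 25.0, 1.5, 0.4)
y = two_peaks(x, *true_params) + rng.normal(0.0, 0.2, x.size)

# Rough starting guess, plus bounds that keep the direct formula from overflowing.
p0 = (80.0, 9.0, 1.5, 50.0, 24.0, 2.0, 0.5)
lower = (0.0, 0.0, 0.2, 0.0, 0.0, 0.2, 0.05)
upper = (np.inf, 60.0, 5.0, np.inf, 60.0, 5.0, 2.0)
popt, pcov = curve_fit(two_peaks, x, y, p0=p0, bounds=(lower, upper))

names = ("a1", "mu1", "sigma1", "a2", "mu2", "sigma2", "lam")
for name, value, err in zip(names, popt, np.sqrt(np.diag(pcov))):
    print(f"{name:>6} = {value:8.3f} +/- {err:.3f}")
```

Seven parameters with physical meaning replace the fifteen arbitrary polynomial coefficients, and adding a third peak only adds three more.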

[fig. 3] Fitting and removing the background of two peaks of neutrons measured by a scintillator detector.

Later, I had to perform more complicated analyses of the data, such as looking at the moments or removing background. Solving these problems was more straightforward because I had the intuition of why the data had the shape it did.
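
As an example of how the model helps with moments: an Exponentially Modified Gaussian describes the sum of an independent Gaussian variable and an exponential variable, so their means and variances simply add. These are the standard results for the distribution, stated with the symbols defined earlier, not numbers from my analysis:$$\text{mean} = \mu + \frac{1}{\lambda}, \qquad \text{variance} = \sigma^2 + \frac{1}{\lambda^2}$$In other words, the exponential tail pushes the mean of each peak later than \(\mu\) by one relaxation time \(1/\lambda\).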

I learned through this that it is important to take time and think about what gave your data its structure before naively diving in with a linear regression.