Title: Machine Learning the Likelihood Data
Presenter: Rafał Masełek
Date: 19.01.2026
Participants:
Wojciech Krzemień (WK)
Konrad Klimaszewski (KK)
Rafał Masełek (RM)
Dima Melnychuk (DM)
I. Demchenko (ID)
Orest Hrycyna (OH)
Michał Obara (MO)
Lech Raczyński (LR)
Michał Mazurek (MM)
Varvara Bazoskaya (VB)
Oleksandr Fedoruk (OF)
Janiyu Zhang (JZ)
Discussions:
WK: I assume that systematic uncertainties are encoded as nuisance parameters. Do you know whether they are always treated as Gaussian? In experiments, there are cases where the distributions are non-Gaussian.
RM: In the input data, most nuisance parameters have Gaussian or Poisson distributions, but in principle, they could follow other distributions. I am not entirely sure.
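For illustration (an assumed two-bin example, not taken from the talk): in pyhf/HistFactory the constraint distribution follows from the modifier type, e.g. "normsys" carries a Gaussian-constrained nuisance parameter, "shapesys" Poisson-constrained per-bin ones, and "normfactor" (the signal strength mu) is unconstrained.

```python
import pyhf

# Illustrative two-bin specification (assumed, not from the talk).
spec = {
    "channels": [
        {
            "name": "signal_region",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 3.0],
                    "modifiers": [
                        # Unconstrained signal-strength parameter mu.
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [52.0, 38.0],
                    "modifiers": [
                        # Gaussian-constrained normalisation uncertainty.
                        {"name": "bkg_norm", "type": "normsys",
                         "data": {"hi": 1.05, "lo": 0.95}},
                        # Poisson-constrained per-bin uncertainties.
                        {"name": "bkg_stat", "type": "shapesys",
                         "data": [5.0, 4.0]},
                    ],
                },
            ],
        }
    ]
}

model = pyhf.Model(spec)
print(model.config.parameters)  # mu plus the constrained nuisance parameters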
WK: Is assigning H0 to BSM and H1 to the SM merely a convention?
RM: It depends on the goal: there is a difference between claiming a discovery and excluding a model. There are also different formulations of the test statistic that lead to different "branches" of interpretation.
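For reference (the formulas were not quoted verbatim in the talk), the two standard "branches" from Cowan et al., arXiv:1007.1727, are the discovery statistic, which tests the background-only hypothesis, and the exclusion statistic, which tests a signal hypothesis:

```latex
% Profile likelihood ratio (theta = nuisance parameters, mu = signal strength):
\lambda(\mu) = \frac{L\bigl(\mu,\hat{\hat{\theta}}(\mu)\bigr)}{L(\hat{\mu},\hat{\theta})}

% Discovery: test the background-only hypothesis (mu = 0).
q_0 = \begin{cases} -2\ln\lambda(0) & \hat{\mu} \ge 0 \\ 0 & \hat{\mu} < 0 \end{cases}

% Exclusion: test a signal hypothesis of strength mu.
q_\mu = \begin{cases} -2\ln\lambda(\mu) & \hat{\mu} \le \mu \\ 0 & \hat{\mu} > \mu \end{cases}
```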
WK: What does the expected data mean in the simplest scenario?
RM: "Expected data" can mean several things here that should not be confused. First, the JSON file published by ATLAS includes the mean yields in all bins under the SM-only hypothesis. This represents the expected (i.e. SM-only) channel data, n, in the likelihood formula. When preparing a data set for training, I vary the BSM signal assuming mu = 1, which corresponds to changing the nominal rates, v. In simpler words, I assume a hypothetical scenario in which a given BSM model is true but the experimental results match the SM prediction exactly. Then I perform profiling with pyhf and eliminate the dependence on nuisance parameters. In the end I get a number, the "expected" likelihood, depending only on the BSM signal size in each bin. In practice, I chose to use the total yields, i.e. SM + BSM, which is equivalent since the SM prediction is known and fixed. The yields and the corresponding expected likelihoods constitute the "expected data" set mentioned on slide 22.

To calculate the median (aka expected) significance, we rely on the Asimov construction described by Cowan et al. in arXiv:1007.1727 [physics.data-an]. For the same yield values that we used to create the "expected" data set, we construct "Asimov expected data" that replaces the SM-only yields from the JSON file. We calculate profiled likelihood values and add them to the table. An analogous procedure is applied to create the "observed" and "observed Asimov" training sets.

In the end, the training data consists of yields (inputs) and corresponding likelihoods (targets). For practical reasons, we use negative log-likelihood ratios instead.
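A minimal sketch of the profiling step described above, using documented pyhf calls (the workspace file name is hypothetical, and the exact bookkeeping of the training table was not shown):

```python
import json

import pyhf

# Load a published statistical model (the file name is hypothetical).
with open("atlas_workspace.json") as f:
    spec = json.load(f)

workspace = pyhf.Workspace(spec)
model = workspace.model()      # HistFactory model with nuisance parameters
data = workspace.data(model)   # channel data plus auxiliary (constraint) data

# Conditional fit: profile the nuisance parameters at fixed mu = 1.
_, twice_nll_mu = pyhf.infer.mle.fixed_poi_fit(
    1.0, data, model, return_fitted_val=True
)

# Unconditional fit: all parameters, including mu, are free.
_, twice_nll_hat = pyhf.infer.mle.fit(data, model, return_fitted_val=True)

# Negative log-likelihood ratio, -2 ln lambda(mu = 1): the kind of
# quantity stored as a regression target in the training table.
q_mu = float(twice_nll_mu - twice_nll_hat)

# Asimov data set at mu = 0 (SM-only), as in the median-significance
# construction of arXiv:1007.1727.
asimov_data = pyhf.infer.calculators.generate_asimov_data(
    0.0, data, model,
    model.config.suggested_init(),
    model.config.suggested_bounds(),
    model.config.suggested_fixed(),
)
```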
WK: On slide 26, what is the ground truth? Is it the full statistical model?
RM: Yes, the comparison is made to the training data.
WK: Regarding the training and inference procedure for your FNN, do you use the standard approach, e.g. dividing the data into training and testing sets?
RM: Yes.
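A standard split, sketched below with hypothetical array and file names (the actual pipeline was not shown in the talk):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical arrays: per-bin yields as inputs and profiled
# negative log-likelihood ratios as regression targets.
X = np.load("yields.npy")        # shape: (n_samples, n_bins)
y = np.load("nll_ratios.npy")    # shape: (n_samples,)

# Hold out 20% of the samples for testing; fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```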
WK: How many models are you planning to cover?
RM: We have already covered several analyses, and I am finalizing more. Later, we want to establish a kind of general method or framework and make it available to the public.
KK: When you define the input for your network, how are nuisance parameters used?
RM: Our method is an approximation to a full statistical model. The surrogate network doesn't perform profiling, but instead regresses already profiled likelihoods. Nuisance parameters do not enter directly.
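A minimal sketch of such a surrogate (architecture and sizes are assumptions, not the network from the talk): the inputs are per-bin yields only, with no nuisance-parameter inputs, and the output is the profiled negative log-likelihood ratio.

```python
import torch
import torch.nn as nn

N_BINS = 8  # assumed number of analysis bins

# Surrogate FNN: per-bin yields in, one profiled NLL-ratio value out.
# Nuisance parameters never appear as inputs; the profiling was done
# when the training targets were produced.
surrogate = nn.Sequential(
    nn.Linear(N_BINS, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(yields: torch.Tensor, target: torch.Tensor) -> float:
    """One gradient step on a batch of (yields, NLL-ratio) pairs."""
    optimizer.zero_grad()
    pred = surrogate(yields).squeeze(-1)
    loss = loss_fn(pred, target)
    loss.backward()
    optimizer.step()
    return loss.item()
```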
KK: Let’s say you have a trained model. What is the use-case scenario if I have two experiments measuring the same observables under the same model?
RM: Each model is based on a specific analysis and experimental setup, so you would need to repeat the training separately for each measurement.
KK: What is the main advantage of having these surrogate models?
RM: They are much faster than the full models.