[Image: The Sun at wavelength 171 Å on 23 April 2018]

SDOBenchmark is a machine learning image dataset for the prediction of solar flares.

Solar flares are intense bursts of radiation that can disrupt a continent's power grid, shut down GPS, or irradiate people exposed in space.
Developing systems for predicting solar flares would allow us to aim our observation instruments precisely at upcoming events, and eventually enable countermeasures against such worst-case scenarios.

Dataset

In the SDOBenchmark dataset, each sample consists of 4 time steps with 10 images per step:

[Image: an example sample shown as a 4x10 grid — four time steps (columns, spanning 6-7 March 2013) by ten image types (rows, from AIA wavelength 131 over the HMI magnetogram to AIA wavelength 171), annotated with its label of 1e-9 ("nothing will happen") and its 24h prediction period]

10 different image types
  • 256x256 px JPEG
  • coming from 2 detectors onboard the SDO satellite: 8 from AIA, 2 from HMI
  • missing images in many samples

4 time steps in succession: 12h, 5h, 1.5h and 10min before the prediction period

labeled with an emission
  • the peak emission ("flux") of the sun within the 24h prediction period
  • 1e-9 = "quiet sun"
  • 1e-3 = largest flare in a decade

The dataset comes with 8'336 training samples and 886 test samples.
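
As a rough illustration of the sample layout, here is a minimal loading sketch. The file naming pattern ("<timestamp>__<type>.jpg") and the channel identifiers below are assumptions for illustration; check the repository documentation for the actual layout.

```python
import glob
import os

import numpy as np
from PIL import Image

# 8 AIA wavelength channels + 2 HMI products. These identifiers are
# assumptions for illustration; check the repository docs for the real ones.
IMAGE_TYPES = ["94", "131", "171", "193", "211", "304", "335", "1700",
               "magnetogram", "continuum"]

def load_sample(sample_dir, image_types=IMAGE_TYPES, size=256):
    """Load one sample as a (4, 10, 256, 256) array of grayscale images.

    Missing images (frequent in this dataset) stay NaN, so downstream code
    must mask or impute them explicitly.
    """
    # Group files by time step, assuming names like "<timestamp>__<type>.jpg".
    steps = {}
    for path in sorted(glob.glob(os.path.join(sample_dir, "*.jpg"))):
        stamp, _, image_type = os.path.basename(path)[:-4].partition("__")
        steps.setdefault(stamp, {})[image_type] = path

    data = np.full((4, len(image_types), size, size), np.nan, dtype=np.float32)
    for t, stamp in enumerate(sorted(steps)[:4]):
        for c, image_type in enumerate(image_types):
            if image_type in steps[stamp]:
                image = Image.open(steps[stamp][image_type]).convert("L")
                data[t, c] = np.asarray(image, dtype=np.float32)
    return data
```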


Evaluation metric

We use the mean absolute error (MAE) between predicted and observed peak_flux as the performance metric for this dataset.
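
Concretely, the metric is just the mean absolute difference between predicted and true peak_flux values:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted peak_flux values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# A constant "quiet Sun" prediction for three hypothetical samples:
print(mae([1e-9, 1e-6, 1e-4], [1e-9, 1e-9, 1e-9]))  # ~3.37e-5
```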


Hints and tips about the data

The data has gaps, but it's reasonably stable

  • The label peak_flux is quite stable according to the manual labeling of experts (see HEK vs GOES)
  • The data went through plenty of verification in code (e.g. here or here)

The data is not stationary

Solar activity follows an 11-year cycle, so the frequency and strength of flares change considerably over the time span covered by the dataset.

The data is imbalanced

...but the mean absolute error takes care of that for you in this case (the FAQ explains why).
The data analysis gives a rough idea of the imbalance by counting the binned emissions.
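
For a quick look at the imbalance yourself, you can bin the peak_flux labels on a log scale. A minimal sketch; the labels below are made up, in practice read them from the dataset's metadata:

```python
import numpy as np

# Hypothetical labels; in practice read peak_flux from the dataset metadata.
peak_flux = np.array([1e-9, 2e-9, 5e-8, 1e-7, 3e-6, 1e-4])

# One bin per order of magnitude between the quiet sun and the largest flare.
bins = np.logspace(-9, -3, num=7)
counts, edges = np.histogram(peak_flux, bins=bins)
for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"[{low:.0e}, {high:.0e}): {n} samples")
```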

FAQ

What are SDO, AIA and HMI?

  • SDO is a satellite mission orbiting Earth
  • AIA and HMI are two instruments on board SDO that record the activity of the Sun

Which time range does the data cover?

While SDO has been observing the Sun since 2010, we're using data from 2012 onward.

What kind of machine learning problem is this?

  • Sample complexity: a single sample is a collection of images over four time steps
  • Not many samples
  • A regression problem: predict the peak flux

In September 2017, a flare erupted with a magnitude larger than any other in the last decade. In the dataset, this flare is represented with a peak flux of 0.001 (i.e. 1e-3). The smallest flare that is part of the data is listed with a peak flux of 1e-7 (while 1e-9 means "nothing happens at all").
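
Since the labels span six orders of magnitude, one common (optional) approach is to regress on log10 of the label and transform predictions back before computing the benchmark MAE on raw flux. A minimal sketch:

```python
import numpy as np

def to_log_target(peak_flux):
    """Map peak_flux (1e-9 .. 1e-3) onto a compact range (-9 .. -3)."""
    return np.log10(peak_flux)

def from_log_target(log_flux):
    """Invert the transform before computing the benchmark MAE on raw flux."""
    return 10.0 ** log_flux

assert np.isclose(from_log_target(to_log_target(1e-7)), 1e-7)
```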

Why are there so few training samples?

The easy answer is that we can't artificially produce or simulate more flares. But there is also a strategic choice behind it:
Imagine you have to train a network to recognize cats. We'd give you a training set of 10 million images, recorded by cameras around our neighbourhood. While 10 million images is a great size for deep learning, you'd soon realize that all our images show the same 100 cats. As a consequence, your model would overfit heavily and simply recognize those 100 cats by their individual features. That's why we chose to preselect only a few of the most distinct images per cat.

The same is true for our images of Active Regions: a model would learn to recognize these specific patches of the sun and then "look up" in its neurons whether that patch will flare or not.
(This is also why we have to make sure to have different cats / Active Regions in the training and test sets. Simply selecting images at random would not be sufficient.)
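
For your own validation splits you can enforce this grouping, e.g. with scikit-learn's GroupShuffleSplit. A sketch with made-up data; the benchmark's own train/test split already keeps Active Regions disjoint:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Made-up data: one row per sample, plus the Active Region each sample
# was cut from (e.g. its NOAA number).
X = np.arange(10).reshape(-1, 1)          # placeholder features
y = np.random.rand(10)                    # placeholder peak_flux labels
groups = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]   # Active Region per sample

# Every Active Region lands entirely on one side of the split, so a model
# cannot simply memorize regions it has already seen.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, val_idx = next(splitter.split(X, y, groups=groups))
```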

Why the mean absolute error?

Due to the logarithmic nature of the label peak_flux, the mean absolute error weighs errors on strong activity much more heavily than errors on calm or low activity. While this matches our intentions, the data imbalance helps to rebalance the otherwise predominant strong-flare predictions to some degree.
Further, while there are plenty of error metrics around, the mean absolute error is simple and exists out of the box in practically all machine learning frameworks.
Other standard metrics (e.g. the mean squared error) have less desirable characteristics for this prediction problem.

Why did you create this dataset?

Creating a solar flare prediction dataset requires a lot of domain knowledge and time. By providing a ready-made dataset, we hope to encourage machine learning practitioners to push the envelope of solar flare prediction. We therefore put a lot of effort into providing both great accessibility and high scientific quality.

How difficult is this dataset?

Currently, it is still unknown how well models can perform on this dataset. Our goal is to create a benchmark that is as simple as possible without sacrificing scientific value.
We will gladly provide a more difficult prediction problem at a later stage.

Can I use data augmentation?

Except for vertical flipping (upside down), we claim that all data augmentation will alter the underlying physics.
When you flip the images vertically, the solar rotation, the perspective distortions and the spherical projection remain the same. And there is no known difference between Active Regions in the upper and the lower hemispheres of the sun.
Horizontal flipping, in contrast, makes the Sun appear to rotate in the opposite direction. It can still work if you process the images individually.
As for random cropping, keep in mind that you might run into the same issue as mentioned in "Why are there so few training samples?".
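
A minimal sketch of the one augmentation endorsed above, flipping a whole sample upside down:

```python
import numpy as np

def augment_vertical_flip(sample, rng):
    """Flip a whole (time, channel, height, width) sample upside down
    with probability 0.5, keeping all time steps and channels consistent."""
    if rng.random() < 0.5:
        return sample[:, :, ::-1, :]  # reverse the height axis only
    return sample

rng = np.random.default_rng(42)
sample = np.zeros((4, 10, 256, 256), dtype=np.float32)
augmented = augment_vertical_flip(sample, rng)
```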

Are there example notebooks?

You can find existing Jupyter notebooks on the SDOBenchmark GitHub repository or in the Kaggle dataset entry.

Current state

Results so far for this dataset:

name                      author         MAE       TSS    HSS
Fixed point baseline      Roman Bolzern  1.53e-5   0.0    ?
First competitive model   Roman Bolzern  3.6e-5    0.45   ?

(MAE: mean absolute error, TSS: true skill statistic, HSS: Heidke skill score)
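
The fixed point baseline presumably predicts one constant value for every sample; the constant that minimizes MAE on the training labels is their median. A sketch under that assumption (the file name is hypothetical):

```python
import numpy as np

def fixed_point_baseline(train_peak_flux):
    """The constant that minimizes MAE on the training labels: their median."""
    return float(np.median(train_peak_flux))

# Hypothetical file holding one training label per line.
train_labels = np.loadtxt("training_labels.csv")
constant = fixed_point_baseline(train_labels)
predictions = np.full(886, constant)  # one value per test sample
```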
Feel free to also check out the SDOBenchmark Kaggle Dataset page.

Also worth mentioning are tools and helpers for solar flare forecasting:
  • Flarecast.eu, a flare forecasting framework
  • Flarenet, a framework that helps deep learning researchers create datasets similar to this one (paper here)