A New Automatic Tool for CME Detection and Tracking with Machine-learning Techniques

With the accumulation of coronal mass ejection (CME) observations by coronagraphs, automatic detection and tracking of CMEs has become crucial. The excellent performance of convolutional neural networks in image classification, object detection, and other computer vision tasks motivates us to apply them to CME detection and tracking as well. We developed a new tool for CME Automatic detection and tracking with MachinE Learning (CAMEL) techniques. The system is a three-module pipeline. The first module treats detection as a supervised image classification problem, which we solve by training a LeNet neural network with training labels obtained from an existing CME catalog; images containing CME structures are flagged as CME images. Next, to identify the CME region in each CME-flagged image, we use deep descriptor transforming to localize the common object in an image set, followed by the graph cut technique to finely tune the detected CME region. To track a CME in an image sequence, the binary images with detected CME pixels are converted from Cartesian to polar coordinates. A CME event is labeled if it moves outward in at least two frames and reaches the edge of the coronagraph field of view. For each event, a few fundamental parameters are derived. The results for four representative CMEs with various characteristics are presented and compared with those from four existing automatic and manual catalogs. We find that CAMEL can detect more complete and weaker structures and performs better at catching a CME as early as possible.

1. Introduction

Observations of coronal mass ejections (CMEs) by space missions date back to the 1970s. The coronagraphs aboard the Solar and Heliospheric Observatory (SOHO) have made tremendous contributions to CME observations. The Large Angle and Spectrometric Coronagraph Experiment (LASCO; Brueckner et al. 1995) can follow CMEs from 1.1 to about 30 solar radii. Since the launch of the Solar TErrestrial RElations Observatory (STEREO) mission, CMEs can be observed from two different perspectives with the coronagraphs COR 1 and COR 2 in the Sun Earth Connection Coronal and Heliospheric Investigation (SECCHI; Howard et al. 2008) instrument package. With the accumulation of coronagraph images, it has become increasingly important to be able to automatically detect and track different features, especially CMEs, and to build corresponding event catalogs. On one hand, such catalogs provide much easier access to data for statistical studies of key CME parameters. On the other hand, with automatic detection, the coronagraph images with CMEs flagged can be used immediately for near-real-time space weather predictions.

Different CME catalogs have been developed from the long-running coronagraph observations. They are classified as either manual or automated catalogs. The manual catalog that we mostly use is the CME catalog created for LASCO observations and maintained at the Coordinated Data Analysis Workshops (CDAW) data center (Yashiro et al. 2004). Event movies of observations by LASCO and other related instruments, together with key parameters of each CME, are provided. Although the CDAW catalog has been widely adopted, the CME detection and tracking are done by human perception and are thus subjective and time consuming. Depending on the experience of different operators, different detection results and physical parameters may be obtained. When the Sun approaches its activity maximum, the detection and tracking of CMEs require significant manpower.

These disadvantages of manual CME catalogs prompted the development of automatic catalogs. Several methods have been devised and deployed for the LASCO and/or SECCHI coronagraph images—for instance, the Solar Eruptive Event Detection System (SEEDS; Olmedo et al. 2008), Computer-Aided CME Tracking (CACTus; Robbrecht & Berghmans 2004; Robbrecht et al. 2009), CORonal IMage Process (CORIMP; Byrne et al. 2012), and Automatic Recognition of Transient Events and Marseilles Inventory from Synoptic maps (ARTEMIS; Boursier et al. 2009). Thanks to the observations of the two STEREO spacecraft, dual-viewpoint CME catalogs have also been developed, e.g., Vourlidas et al. (2017) for STEREO/COR 2 observations. For all the aforementioned catalogs, CMEs are detected automatically using different traditional segmentation methods.

Nowadays, machine-learning techniques have become widely used in many different research fields, bringing together a cross-disciplinary research community spanning computer science and solar and heliospheric physics. There have been quite a few applications of machine-learning techniques to different solar features and space weather purposes. For example, Dhuri et al. (2019) used machine learning to understand the underlying mechanisms governing flares. Huang et al. (2018) applied a deep-learning method to flare forecasting. Camporeale et al. (2017) and Delouille et al. (2018) used machine-learning techniques for the classification of the solar wind and of coronal holes, respectively. Very recently, Galvez et al. (2019) compiled a curated data set of the Solar Dynamics Observatory mission in a format suitable for the booming machine-learning research. A review of the challenges of machine learning in space weather nowcasting and forecasting can be found in Camporeale (2019).

In the field of computer vision, machine learning has shown excellent performance in image classification, feature detection, and tracking (Krizhevsky et al. 2012; He et al. 2017; Shelhamer et al. 2017). In view of its great success and our need for fast detection and tracking for CME prediction, we developed and validated our machine-learning technique, CME Automatic detection and tracking with MachinE Learning (CAMEL), for automatic CME detection and tracking based on LASCO C2 data. Section 2 describes the detailed mathematical methodology, including image classification, CME detection, and CME tracking. In Section 3, we compare our results with those derived from the existing SEEDS, CACTus, and CORIMP catalogs for four representative CMEs with different angular widths, velocities, and brightnesses. The method is developed and tested using observations around the solar maximum, during which the CME occurrence rate is much higher than around the solar minimum; the large number of CMEs around the solar maximum poses a challenge for CME detection and tracking. The last section is dedicated to conclusions and discussions.

2. Methodology

Our goal is to detect and track pixel-level CME regions in a set of white-light coronagraph images using machine-learning methods. To this end, we design a three-module algorithm pipeline. In the first module, we use a trained convolutional neural network (CNN) to classify whether a coronagraph image contains CME structures. The images with CME structures are flagged as CME images, and the remaining images are flagged as non-CME images. The second module detects pixel-level CME regions in CME-flagged images using an unsupervised common-object co-localization method; the detected CME regions are further refined using the graph cut method from computer vision. The final module serves to track a CME in running-difference images.

2.1. Preprocessing

Before going through the pipeline, all coronagraph data are processed in the following way: the downloaded level 0.5 LASCO C2 FITS files are read with lasco_readfits.pro from the Solar Software (SSW) and are then processed to level 1 data using reduce_level_1.pro from the SSW. The processing consists of calibrations for the dark current, flat field, stray light, distortion, vignetting, photometry, time, and position. After the processing, solar north is rotated to the image north. For CME detection and tracking, we use the running-difference images as inputs to the three-module algorithm pipeline.

As a preprocessing step, all input LASCO C2 images with a 1024×1024 resolution are first down-sampled to a 512×512 resolution and aligned according to the coordinates of the solar centers. Then, all down-sampled images are passed through a noise filter to suppress sharp noise features. In our method, we use a normalized box filter with a 3×3 sliding window. Normalized box filtering is a basic linear image filter that computes the average value of the surrounding pixels. The running-difference images are then computed simply as follows:

$$D_t = I_t - I_{t-1},$$

where $D_t$ is the running-difference image, which equals the current image, $I_t$, minus the previous image, $I_{t-1}$. For some of the LASCO images containing missing blocks, we create a missing-block mask from the previous image: if the value of a pixel in the previous image is zero, then the same pixel of the running-difference image is also set to zero. The final running-difference image is multiplied by the missing-block mask.
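As a concrete illustration, the following is a minimal Python/OpenCV sketch of this preprocessing chain. The function name and the assumption that the inputs are already aligned level-1 arrays are ours, not from the paper.

```python
import cv2
import numpy as np

def preprocess(current, previous):
    """Build one running-difference image from two consecutive,
    already-aligned level-1 LASCO C2 frames (a sketch of the steps
    described in the text; names are ours)."""
    # Down-sample from 1024x1024 to 512x512.
    cur = cv2.resize(current, (512, 512), interpolation=cv2.INTER_AREA)
    prev = cv2.resize(previous, (512, 512), interpolation=cv2.INTER_AREA)

    # Normalized box filter with a 3x3 sliding window to suppress sharp noise.
    cur = cv2.blur(cur, (3, 3))
    prev = cv2.blur(prev, (3, 3))

    # Running difference: current frame minus previous frame.
    diff = cur.astype(np.float32) - prev.astype(np.float32)

    # Missing-block mask: pixels that are zero in the previous frame
    # are zeroed in the running-difference image as well.
    mask = (prev != 0).astype(np.float32)
    return diff * mask
```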

For the first module of our algorithm pipeline, we need to train a CNN for image classification and rough localization. For computational efficiency, our CNN takes 112×112 resolution images as input. After rough localization, the down-sampled CME regions are refined by the graph cut method in the original 512×512 running-difference images.

2.2. Image Classification

Detecting and locating instances of a certain class in an image is a basic problem in computer vision. Each class has its own characteristic features, which can be extracted manually or automatically by a supervised machine-learning method. Recently, CNNs have shown excellent performance in image classification, object detection, and other computer vision tasks. The multi-scale, sliding-window approach helps CNNs learn robust features of a class without human effort or prior knowledge. Before detecting the CME events in each image, we first need a CNN model to tell us whether there are CME structures in each input LASCO C2 running-difference image. To train such a CNN in a supervised fashion, we first collect images and training labels. As a further preprocessing step, the input running-difference images are down-sampled to a 112×112 resolution. The training data are 10 months of LASCO C2 images, from 2011 January to October, around the solar maximum, whose category labels are known; both image categories, flagged with or without CMEs, are obtained from the CDAW catalog. The first step of CME detection can thus be treated as a supervised image classification problem: assigning a given white-light coronagraph image to either the CME-detected or the CME-not-detected category. In a second step, the middle-level features extracted from the well-trained CNN are used for detecting the CME regions in Section 2.3.

The CNN architecture we use is LeNet-5 (Lecun et al. 1998), which has two convolution layers, two down-sampling layers, and two fully connected layers. This classical architecture can be divided into two modules: a feature extractor and a classifier. The feature extractor consists of convolution layers, nonlinearity activation layers, and down-sampling layers; the last two fully connected layers form the classifier. A convolution layer can be seen as a locally connected network in which each hidden unit connects to only a small contiguous region of the input and obtains different feature activation values at each location. The convolution kernel slides from left to right and from top to bottom over the input feature map of the previous layer. At each position, the kernel is multiplied element-wise with the corresponding input feature map block and summed, and the result is passed through the activation function to produce one output pixel of the feature map of the current layer. The jth feature map of layer l is obtained as follows:

$$u_j^{l} = \sum_{i=1}^{N} x_i^{l-1} * k_{ij}^{l} + b_j^{l},$$

$$x_j^{l} = f\!\left(u_j^{l}\right),$$

where N denotes the number of feature maps of layer l−1, $k_{ij}^{l}$ represents the convolution kernels, and $b_j^{l}$ is a bias term. $f$ represents the nonlinearity activation function; we use rectified linear units (ReLUs), which make CNN training several times faster. Down-sampling layers help enlarge the receptive field and aggregate features at various locations. Such a layer treats each feature map separately: it computes the maximum value over a neighborhood in each feature map, with the neighborhoods stepped by a stride of two.

After the convolution and down-sampling layers, the feature map of each image is down-sampled to a 25×25 resolution. Then, high-level semantic knowledge is obtained via the fully connected layers, which output the final CME occurrence probability. The original LeNet architecture was designed for handwritten digit recognition, and its output layer has 10 units representing the probability of each class (0–9). We modified the output layer to output two units, which represent the probabilities of CME occurrence and non-occurrence. To obtain the probability, we use a two-way softmax function to produce a distribution over the two class labels:

$$p_{\mathrm{CME}} = \frac{\exp\left(x_{\mathrm{CME}}\right)}{\exp\left(x_{\mathrm{CME}}\right) + \exp\left(x_{\mathrm{non\text{-}CME}}\right)},$$

where xCME and xnon-CME are the two output units of the final output layer. An image with an output probability greater than 0.5 is treated as a CME-detected image. Figure 1 shows the LeNet architecture we use.

Figure 1. The LeNet architecture used in this work.
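For readers who want to reproduce the classifier, a minimal PyTorch sketch of this modified LeNet-5 is given below. The exact kernel sizes and channel widths are not specified in the text, so we assume the classic LeNet-5 values (5×5 kernels, 6 and 16 channels); with a 112×112 input they reproduce the 25×25 feature maps quoted above.

```python
import torch
import torch.nn as nn

class LeNetCME(nn.Module):
    """Two-class LeNet-5 variant for CME image classification.
    Kernel sizes and channel widths follow the classic LeNet-5 and
    are assumptions; the paper does not list them explicitly."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # 112 -> 108
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),         # 108 -> 54
            nn.Conv2d(6, 16, kernel_size=5),   # 54 -> 50
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),         # 50 -> 25 (matches the text)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 25 * 25, 120),      # first fully connected layer
            nn.ReLU(),
            nn.Linear(120, 2),                 # two-way output: CME / non-CME
        )

    def forward(self, x):
        # Returns raw two-way logits; apply softmax to obtain the
        # CME occurrence probability of the equation above.
        return self.classifier(self.features(x))
```

Applying `torch.softmax(model(batch), dim=1)` to the returned logits gives the two-way probability distribution of the equation above; an image is flagged as CME-detected when the CME probability exceeds 0.5.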

As a machine-learning approach, the CNN model needs to discover its weights and biases automatically from training data and labels. For a classification problem, the objective loss function can be defined as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[\hat{y}_i \ln y_i + \left(1-\hat{y}_i\right)\ln\left(1-y_i\right)\right],$$

where N denotes the number of training examples, $\hat{y}_i$ is the true label, which equals 0 or 1, and $y_i$ is the CNN output probability, which lies between 0 and 1. We see that L is nonnegative, so the aim of the CNN training process is to minimize L as a function of the weights and biases. We train our model using stochastic gradient descent with a batch size of 128 examples. The update rules for the weights and biases are as follows:

$$w_{i+1} = w_i - \eta\,\frac{\partial L}{\partial w}\bigg|_{w_i},$$

$$b_{i+1} = b_i - \eta\,\frac{\partial L}{\partial b}\bigg|_{b_i},$$

where i is the iteration index and η is the learning rate. The learning rate was initialized at 0.0001 and reduced three times prior to termination. Only a batch of training examples is used for updating the weights and biases in each iteration. The weights in each layer are initialized from a zero-mean Gaussian distribution with a standard deviation of 0.01, and the neuron biases are initialized to zero in each convolutional and fully connected layer. In the test phase, continuous running-difference images are classified in chronological order. A set of continuous CME-detected frames is treated as an image sequence of CME evolution, which is used for CME co-localization and tracking.
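A training-loop sketch consistent with the recipe above follows. The epoch count and the exact learning-rate milestones are not given in the text and are placeholders.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=20):
    """Training sketch: SGD, initial learning rate 1e-4, cross-entropy
    loss, Gaussian weight initialization, as described in the text.
    The DataLoader is assumed to batch 128 examples at a time."""
    # Zero-mean Gaussian init (std 0.01) for weights, zero biases.
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.normal_(m.weight, mean=0.0, std=0.01)
            nn.init.zeros_(m.bias)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
    # The text reduces the learning rate three times before termination;
    # the exact milestones are not given, so these are placeholders.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[5, 10, 15], gamma=0.1)
    criterion = nn.CrossEntropyLoss()  # the two-class cross-entropy loss L

    for _ in range(epochs):
        for images, labels in loader:   # labels: 0 = non-CME, 1 = CME
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```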

2.3. CME Detection

2.3.1. CME Region Co-localization

After the classification, the next step is to segment the CME regions in every CME-detected image. However, due to the lack of a set of images with labeled CME regions, we need to solve this problem in an unsupervised fashion. After training the LeNet network above, we can extract convolutional feature maps from its last convolution layer. Each feature map is a down-sampled version of the input and contains high-level semantic information. To mine this hidden information for segmenting the CME regions, we use Deep Descriptor Transforming (DDT; Wei et al. 2019), an unsupervised image co-localization method, which utilizes Principal Component Analysis (PCA; Pearson 1901) to analyze CNN feature maps and localize the category-consistent regions of each image in an image set. The extracted feature maps can be considered as grids of 25×25 cells, each cell containing one C-dimensional feature vector. PCA uses an orthogonal transformation to convert correlated variables into a set of linearly uncorrelated variables, called principal components, via the eigendecomposition of the covariance matrix. The covariance matrix of the input data is calculated by

$$\bar{x} = \frac{1}{K}\sum_{n=1}^{N}\sum_{i=1}^{h}\sum_{j=1}^{w} x_{(i,j)}^{n},$$

$$\mathrm{Cov}(x) = \frac{1}{K}\sum_{n=1}^{N}\sum_{i=1}^{h}\sum_{j=1}^{w}\left(x_{(i,j)}^{n}-\bar{x}\right)\left(x_{(i,j)}^{n}-\bar{x}\right)^{\top},$$

where K = h×w×N, N denotes the number of input feature maps with an h×w resolution, and $x_{(i,j)}^{n}$ represents the C-dimensional CNN feature vector of image n at pixel position (i, j). After the eigendecomposition, we obtain the eigenvectors $\xi_1, \ldots, \xi_C$ of the covariance matrix, corresponding to the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_C$ sorted in descending order. We take the first eigenvector, $\xi_1$, corresponding to the largest eigenvalue, as the main projection direction. For a particular position, (i, j), of the CNN feature vector of image n, its main principal feature is calculated as follows:

$$f(i,j) = \xi_1^{\top}\left(x_{(i,j)}^{n}-\bar{x}\right).$$

In this way, we reduce the feature dimension from C to 1, and the feature value after the transformation can be treated as the appearance probability of the common object at each pixel position. All values of f(i, j) form an indicator matrix P whose dimensions are h×w:

$$P = \big[f(i,j)\big]_{h\times w}.$$

The pipeline of image co-localization is shown in Figure 2. The image sequence of CME evolution obtained from the trained CNN model consists of a set of CME images, which are directly processed through the DDT algorithm for CME region co-localization. The final output of the image co-localization is a set of CME region masks with the same resolution as the input feature maps. For convenience, we resize the output masks to the working image resolution by nearest-neighbor interpolation.

Figure 2. The pipeline of CME region co-localization with DDT.
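The DDT step condenses into a few lines of NumPy. The sketch below assumes the feature maps are stored as an (N, h, w, C) array; thresholding the projected values at zero follows the usual DDT convention of treating positive projections as the common object.

```python
import numpy as np
import cv2

def ddt_masks(feature_maps):
    """Deep Descriptor Transforming sketch. `feature_maps` is an
    (N, h, w, C) array of last-conv-layer activations for the N
    CME-flagged images of one sequence; returns binary masks resized
    to 512x512. Variable names are ours."""
    n, h, w, c = feature_maps.shape
    x = feature_maps.reshape(-1, c)          # K = N*h*w descriptors

    # Mean descriptor and C x C covariance over all positions and images.
    mean = x.mean(axis=0)
    cov = np.cov((x - mean).T)

    # Main projection direction: eigenvector of the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    xi1 = eigvecs[:, -1]

    # Project every descriptor onto the main direction; positive values
    # indicate the common object (the CME) at that position.
    f = ((x - mean) @ xi1).reshape(n, h, w)
    masks = (f > 0).astype(np.uint8)

    # Resize each mask by nearest-neighbor interpolation, as in the text.
    return [cv2.resize(m, (512, 512), interpolation=cv2.INTER_NEAREST)
            for m in masks]
```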

2.3.2. CME Region Refinement

The outputs of the pipeline in Figure 2 are images with only roughly detected CME regions. To obtain finely tuned CME regions, we use the graph cut method (Boykov et al. 2001) from computer vision to smooth the segmented regions. Obviously, the indicator matrix can only roughly tell the probability that a pixel position belongs to the CME or non-CME class. However, class consistency should hold among neighboring pixels. To address this problem, an energy-minimization framework is naturally formulated. In this framework, one seeks the labeling l of image pixels that minimizes the energy:

$$E(l) = \lambda_s E_{\mathrm{smooth}}(l) + \lambda_d E_{\mathrm{data}}(l),$$

where λs and λd are nonnegative constants that balance the influence of each term. Esmooth(l) measures the class consistency of l among boundary pixels according to their neighborhood intensity difference, while Edata(l) measures the disagreement between l and the predicted data, which is based mainly on the probability calculated in Section 2.2. We set Esmooth(l) and Edata(l) as follows:

$$E_{\mathrm{smooth}}(l) = \sum_{\{p,q\}\in\mathcal{N}} \exp\!\left[-\left(I_p - I_q\right)^2\right]\delta\!\left(l_p \neq l_q\right),$$

$$E_{\mathrm{data}}(l) = -\sum_{p} \ln \mathrm{pr}(l_p),$$

where $\mathcal{N}$ denotes the set of neighboring pixel pairs, $\delta(\cdot)$ equals 1 when its argument holds and 0 otherwise, pr(lp) denotes the probability of pixel position p being assigned label lp, and Ip denotes the intensity at position p. Graph cut optimization can then be employed to efficiently solve the energy-minimization problem. The graph cut algorithm constructs a graph associated with the labeling problem, and the minimum cut of that graph yields the minimum of the energy function. More details on the graph cut algorithm can be found in Boykov et al. (2001). Here, we show an example comparing the results before and after optimization in Figure 3.

Figure 3. An example of the detected CME region before and after the graph cut optimization.
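As an illustration of the refinement step, here is a sketch using the PyMaxflow library (our choice; any min-cut solver works). The per-pixel smoothness weights are a cheap approximation of the pairwise intensity-difference term described above, and the source/sink convention follows the PyMaxflow examples.

```python
import numpy as np
import maxflow  # PyMaxflow; an assumed dependency, not named in the paper

def refine(prob_cme, intensity, lam_s=1.0, lam_d=1.0):
    """Graph-cut refinement sketch: minimize
    E = lam_s * E_smooth + lam_d * E_data.
    `prob_cme`: per-pixel CME probabilities from the indicator matrix;
    `intensity`: the running-difference image."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(prob_cme.shape)

    # Smoothness term (4-connected grid): pixels on strong gradients get
    # weak smoothing so cuts prefer CME boundaries. This per-pixel weight
    # approximates the pairwise exp(-(I_p - I_q)^2) term.
    gy, gx = np.gradient(intensity.astype(np.float64))
    grad2 = gx ** 2 + gy ** 2
    weights = lam_s * np.exp(-grad2 / (2.0 * grad2.mean() + 1e-12))
    g.add_grid_edges(nodes, weights=weights, symmetric=True)

    # Data term t-links: labeling a pixel CME (sink segment) costs
    # -ln pr(CME); labeling it non-CME (source segment) costs -ln pr(non-CME).
    eps = 1e-6
    g.add_grid_tedges(nodes,
                      lam_d * -np.log(prob_cme + eps),
                      lam_d * -np.log(1.0 - prob_cme + eps))
    g.maxflow()
    return g.get_grid_segments(nodes)  # True where the pixel is labeled CME
```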

2.4. CME Tracking

After CME region detection and refinement, we have only CME regions obtained in each frame independently. Furthermore, there could be more than one CME in the image sequence of CME evolution obtained in Sections 2.2 and 2.3. To track a CME in a series of running-difference images, we define rules to identify a CME, similar to Olmedo et al. (2008). First, a CME must be seen to move outward in at least two running-difference images. Second, the maximal height of a CME must reach beyond the C2 field of view (FOV). Each tracked candidate that does not satisfy these two rules is abandoned. Moreover, given a set of images with pixel-level CME regions annotated, we aim to recognize each CME in that image set and analyze its key parameters, e.g., the central position angle (CPA), the angular width, and the median and maximal velocities.

To better track the movement of a CME, all images with annotated CME regions are transformed to a polar coordinate system with a 360×360 resolution. The height range at each angular position spans from 2.2 to 6.2 solar radii. As a demonstration, we use the CME event that occurred on 2012 February 4. Figure 4(a) shows the input of our tracking module, an image sequence of CME evolution for a given time range, ordered by observation time. The original CME images are shown in gray, and the detected CME pixels are indicated in pink. The top panel of Figure 4(b) presents the result of the coordinate transform for the frame at 19:35 UT as an example. We apply the coordinate transformation to each image in the sequence and compute the maximal height of the CME region mask at each position angle over the given time range. All position angles with a maximal height less than half of the FOV are removed, and the remaining position angles are merged according to their connectivity. Again using the frame at 19:35 UT as an illustration, the bottom panel of Figure 4(b) shows the cleaned result, from which we determine the start position angle (start PA) and the end position angle (end PA) of each CME. The CPA is the average of the start PA and end PA, and the angular width is their difference, as sketched in the code below.
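The polar transform and the position-angle cleaning can be sketched as follows; the grid sizes, the solar-center convention, and the handling of wrap-around at PA = 0 are simplifications of ours.

```python
import numpy as np

def to_polar(mask, center, r_min, r_max, n_pa=360, n_r=360):
    """Map a binary CME mask (e.g., 512x512 Cartesian) onto an
    (angle x height) polar grid. `center` is the solar-center pixel
    (row, col); r_min/r_max bound the 2.2-6.2 solar-radii range in
    pixels. PA is measured counterclockwise from north (up) through
    east (left), the usual coronagraph convention."""
    pa = np.deg2rad(np.arange(n_pa))               # 1-degree angle bins
    r = np.linspace(r_min, r_max, n_r)             # height bins
    aa, rr = np.meshgrid(pa, r, indexing="ij")
    row = np.clip((center[0] - rr * np.cos(aa)).round().astype(int),
                  0, mask.shape[0] - 1)
    col = np.clip((center[1] - rr * np.sin(aa)).round().astype(int),
                  0, mask.shape[1] - 1)
    return mask[row, col]                          # (n_pa, n_r)

def angular_extent(polar_stack):
    """Derive (start PA, end PA, CPA, AW) from a (time, angle, height)
    stack of polar CME masks, following the cleaning rules in the text;
    wrap-around at PA = 0 is ignored in this sketch."""
    reached = polar_stack.any(axis=0)              # (angle, height)
    idx = np.arange(reached.shape[1])
    max_h = np.where(reached, idx, -1).max(axis=1)  # maximal height per PA

    keep = max_h >= reached.shape[1] / 2           # must pass half the FOV
    segments, cur = [], []
    for a in range(len(keep)):                     # merge by connectivity
        if keep[a]:
            cur.append(a)
        elif cur:
            segments.append(cur)
            cur = []
    if cur:
        segments.append(cur)
    if not segments:
        return None                                # nothing wide/high enough

    seg = max(segments, key=len)                   # widest merged segment
    start_pa, end_pa = seg[0], seg[-1]
    return start_pa, end_pa, (start_pa + end_pa) / 2.0, end_pa - start_pa
```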

Figure 4. Tracking demonstration for the CME on 2012 February 4: (a) the input image sequence with detected CME pixels marked in pink; (b) the polar coordinate transform of the frame at 19:35 UT, before (top) and after (bottom) cleaning; (c) the height evolution at the position angle with the maximal velocity.

The height–time diagram at each position angle between the derived start and end position angles can then be retrieved, and the corresponding velocity obtained. In Figure 4(c), as a representative case, we plot the CME height evolution at the position angle with the maximal velocity. To determine the start and end time of each CME, we find all time segments with an increasing CME height in the height–time diagram. Next, we check whether the CME in each time segment meets the two defined criteria: existing in at least two frames and reaching beyond the LASCO C2 FOV. Time segments that do not satisfy these criteria are discarded. For the case in Figure 4(c), the derived final time range of the tracked CME is indicated by the blue dashed lines. To derive representative CME velocities, we compute the median and maximal values of the CME velocity distribution over all derived position angles. The velocity at each position angle is calculated by a linear fit to the data points in the obtained time segment. As in the CACTus catalog (Robbrecht et al. 2009), we use the median velocity as the overall velocity of the detected CME. Meanwhile, to compare with the velocity in the CDAW catalog, we also calculate the maximal velocity. In summary, for a tracked CME, we provide the following five fundamental parameters: the first appearance time in the LASCO C2 FOV (Tstart), the CPA, the angular width (AW), and the median and maximal velocities (Vmed and Vmax). These fundamental parameters are used for the comparison among different detection and tracking techniques.
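The velocity extraction reduces to one linear fit per position angle; a sketch, with our variable names:

```python
import numpy as np

def cme_velocities(heights_km, times_s):
    """Median and maximal CME velocities from height-time profiles.
    `heights_km`: (T, n_pa) array of the CME front height (km) at each
    retained position angle, NaN where undetected; `times_s`: (T,)
    observation times in seconds. One linear fit per position angle
    over its tracked time segment, as described in the text."""
    v = []
    for pa in range(heights_km.shape[1]):
        h = heights_km[:, pa]
        good = np.isfinite(h)
        if good.sum() >= 2:                    # need at least two frames
            slope = np.polyfit(times_s[good], h[good], 1)[0]
            v.append(slope)
    v = np.asarray(v)
    return np.median(v), v.max()               # (V_med, V_max) in km/s
```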

3. Results and Comparisons

By applying the method described above, we detected and tracked CMEs in the LASCO C2 running-difference images from 2011 November to 2012 April. To evaluate the performance of our machine-learning technique, we compare our results with those from other existing automatic detection and tracking techniques, namely CACTus, CORIMP, and SEEDS. Table 1 presents the comparison of a few fundamental CME parameters, Tstart, CPA, AW, and velocity, among our CAMEL technique, CACTus, CORIMP, SEEDS, and CDAW. Note that in Table 1 we include the available median and maximal velocities derived with CAMEL, CACTus, and CORIMP, and only the maximal velocity derived with CDAW. The velocity derived with SEEDS is calculated from the CME height at the half-max-lead (Olmedo et al. 2008).

Table 1. Fundamental Parameters of Four Representative CME Events Observed by LASCO C2

Method   Parameter  2012 Jan 1  2011 Nov 22  2012 Jan 18  2012 Mar 21
-------  ---------  ----------  -----------  -----------  -----------
CAMEL    Tstart     01:25       21:17        12:22        07:35
         CPA        41          313          163          halo
         AW         16          100          182          360
         Vmed       584         440          256          633
         Vmax       680         524          327          1042
CACTus   Tstart     01:25       21:28        12:24        07:48
         CPA        42          291          180          336
         AW         18          168          98           118
         Vmed       719         512          299          868
         Vmax       893         702          363          1838
CORIMP   Tstart     ...         ...          14:00        08:00
         CPA        ...         ...          162          255
         AW         ...         ...          12           3
         Vmed       ...         ...          241          555
         Vmax       ...         ...          ...          921
SEEDS    Tstart     01:36       21:17        14:00        08:00
         CPA        61          324          177          321
         AW         53          98           79           74
         Vel        189         325          296          812
CDAW     Tstart     01:25       20:57        12:24        07:36
         CPA        40          291          172          halo
         AW         23          157          203          360
         Vmax       801         668          267          1178

Note. Tstart (UT): the time of first appearance in the C2 FOV; CPA (degree): central position angle, measured counterclockwise from the north of the Sun; AW (degree): angular width; CME velocity (km s−1), where SEEDS gives the half-max-lead velocity (Vel), CDAW gives the maximal velocity (Vmax), and the other methods give the median and maximal velocities (Vmed and Vmax).


According to the morphological characteristics of the CMEs resulting from the CAMEL detection, we have chosen four representative CME events with different AWs: a jetlike narrow CME on 2012 January 1 with an AW of 16°, a limb CME on 2011 November 22 with an AW of 100°, a partial halo CME on 2012 January 18 with an AW of 182°, and a full halo CME on 2012 March 21. The selected four CMEs also span a large range of velocities, from about 300 km s−1 to more than 1000 km s−1, and cover structures with different levels of brightness. Because CAMEL detects and tracks features that move outward, at the moment it does not automatically differentiate whether a structure belongs to a CME or a CME-driven shock. Therefore, for the full halo CME on 2012 March 21, we actually detect and track the CME-driven shock. Kwon et al. (2014) claimed that the halo CME they studied is primarily the projection of the bubble-shaped shock wave rather than the underlying CME flux rope. Nevertheless, this nicely demonstrates that our CAMEL method is able to detect and track not only bright but also weak signals, e.g., in this case a fainter shock wave, or in other cases weak CMEs. From the comparison of the fundamental CME parameters, we can see that the results of our CAMEL method for these four events are, in general, most similar to the CDAW manual measurements. Systematically, we find that the velocity from CACTus is always the largest and the velocity from SEEDS is always the smallest. This is because the CME detection with CACTus is based on the J-map, which has the capability to track weak signals at the CME leading edge, whereas the velocity derived with SEEDS is calculated from the CME height at the half-max-lead. As the CME velocity generally increases from the inside to the leading front (Feng et al. 2015; Ying et al. 2019), CACTus is expected to yield relatively higher velocities and SEEDS relatively lower ones.

In the following subsections, we individually present the observations of the four selected CME events and compare the detected CME regions from CAMEL, CACTus, CORIMP, and SEEDS. Note that the CME parameters in Table 1 and the corresponding frames in Figures 5–8 derived with CACTus, CORIMP, and SEEDS are adopted from their websites, as introduced in Section 1. Because the LASCO C2 data we use are processed to level 1 with the observation time corrected, there might be up to a two-minute time difference for a given frame between our results and the others. For the C2 observations on 2012 January 1 and 18, besides the CME we are interested in, we detect more than one CME in the same frame. In such cases, the other CMEs are not marked in pink, and each CME is grouped separately into its own image sequence.

Figure 5. Detection and tracking results for the jetlike narrow CME on 2012 January 1 with CAMEL, CACTus, CORIMP, and SEEDS (from top to bottom).

3.1. Jetlike Narrow CME on 2012 January 1

The CME under investigation is jetlike, with a very narrow AW, in the northeast quadrant of the C2 image. The detection and tracking results with CAMEL, CACTus, CORIMP, and SEEDS are displayed from top to bottom in Figure 5. CAMEL detects its first appearance in C2 at 01:25 UT, consistent with the time given by CACTus and CDAW. SEEDS detects the first appearance about 11 minutes later, in the next C2 frame. This CME is not included in the CORIMP catalog, although CORIMP recognizes it, with the identified leading edge much lower than the true position; apparently, it registers another CME instead, as marked by the blue lines in the third row of Figure 5. Concerning the CPA and AW, CAMEL, CACTus, and CDAW yield consistent results, with the CPA at about 40° and the AW at about 20°; SEEDS gives a somewhat higher CPA of 61° and a wider AW of 53°. For the velocity, as described above, CACTus and SEEDS have the highest and lowest values, respectively, while CAMEL and CDAW sit in between. When we inspect the detected CME regions obtained with CAMEL (pink), CORIMP (yellow), and SEEDS (red), we find that the detection with CAMEL is closest to our visual perception. CACTus does not provide the detected pixel-level positions of the CME front in its catalog; as shown in the second row, only the angular range of the CME is indicated by the white solid lines.

3.2. Limb CME on 2011 November 22

Limb CMEs are probably the most common category of CMEs, and different detection and tracking methods usually work better for this category. Figure 6 presents selected frames of the detection and tracking results with CAMEL, CACTus, CORIMP, and SEEDS from top to bottom. As demonstrated in Figure 6, all four automatic tools produce reasonable results. The issue with CORIMP is that, although the CME we are interested in is correctly identified in the C2 FOV and its boundary appears to be well defined, it is not registered in the catalog; instead, another, much weaker CME is included, as indicated by the blue lines in the third row of Figure 6. The first appearance time of the CME obtained by all the automatic methods is later than the time found by eye in the manual CDAW catalog. The CPA has a relatively small range, from 291° to 324° with an average of 305°, while the AW has a much larger range, from 98° to 168° with an average of 131°. This wide AW range may partially be due to the influence of the deflected streamers adjacent to the CME; precisely separating streamers from the CME itself seems to be a common issue for all automatic tools. As can be seen in Figure 6, the deflected streamers to the north of the CME are falsely included as part of the CME by all methods. For CACTus, the deflected streamers to the south are also included, making its AW the largest. Concerning the velocities, as expected, CACTus has the highest value, SEEDS the lowest, and CAMEL and CDAW sit in between.

Figure 6. Detection and tracking results for the limb CME on 2011 November 22 with CAMEL, CACTus, CORIMP, and SEEDS (from top to bottom).

3.3. Partial Halo CME on 2012 January 18

When it comes to halo CMEs, an important task is to identify the complete CME structure. Frames illustrating the detection and tracking results with CAMEL, CACTus, CORIMP, and SEEDS are presented in Figure 7 from top to bottom. They reveal that CAMEL performs better at detecting the partial halo structure as completely as possible, thanks to its capability of weak-signal detection. Both CACTus and SEEDS detect only the northern bright segment of the CME and fail to identify its weaker eastern part. CORIMP outputs only a very narrow region of the CME, with an AW of 12°. The AWs from the other four methods, CAMEL, CACTus, SEEDS, and CDAW, are 182°, 98°, 72°, and 203°, respectively. Consistently, all methods yield CPAs within a narrow range from 162° to 180°. Concerning the first appearance time in LASCO C2, the automatic CAMEL and CACTus identify the CME at the same time as the manual CDAW; CORIMP and SEEDS find the CME about 1.5 hr later. The CME under investigation is a rather slow one, with a velocity in the range of 241–363 km s−1.

Figure 7. Detection and tracking results for the partial halo CME on 2012 January 18 with CAMEL, CACTus, CORIMP, and SEEDS (from top to bottom).

3.4. Full Halo CME on 2012 March 21

Full halo CMEs, with 360° AW, belong to the category of CMEs that are mostly Earth-effective. If such a CME is Earth-directed and carries southward magnetic fields, it usually causes a geomagnetic storm; if it is also super-Alfvénic, the driven shock wave may enhance the level of the geomagnetic storm. Therefore, the automatic detection and tracking of full halo CMEs is of particular importance for space weather prediction. Note that the identified full halo CME might actually be a driven shock wave, as mentioned at the beginning of this section.

Figure 8 compares the detection and tracking results for the CME on 2012 March 21 obtained with CAMEL, CACTus, CORIMP, and SEEDS from top to bottom. Among the four techniques, this event is first detected by our CAMEL method at 07:35 UT, the same time as CDAW. CACTus, CORIMP, and SEEDS identify the CME at 07:48 UT, 08:00 UT, and 08:00 UT, respectively. Concerning the AW, only CAMEL is able to detect the whole CME (actually shock) structure; the other three methods fail to identify it completely, especially the fainter shock-wave segment on the left side of the deflected streamers. The corresponding AWs are 118°, 3°, and 74° for CACTus, CORIMP, and SEEDS, respectively. In fact, SEEDS has two different entries for this CME in its catalog; in Table 1, only the entry with the larger AW is included. The velocity increases from 812 km s−1 (SEEDS) to 1838 km s−1 (CACTus), and CAMEL gives a velocity of 1042 km s−1, closest to the 1178 km s−1 derived by CDAW.

Figure 8. Detection and tracking results for the full halo CME on 2012 March 21 with CAMEL, CACTus, CORIMP, and SEEDS (from top to bottom).

4. Conclusions and Discussions

We implemented a novel system that detects and tracks pixel-level CME regions in a set of white-light coronagraph images using machine-learning methods. The system consists of a three-module algorithm pipeline. The input is a sequence of running-difference images observed by LASCO C2 and processed to level 1. In the first module, we use a well-trained supervised CNN to classify whether a LASCO C2 image contains CME structures. The second module detects pixel-level CME regions in the CME-flagged images using an unsupervised PCA-based co-localization method and refines the detection with the graph cut method. The final module tracks each individual CME in the continuous CME image sequence with detected CME pixels produced by the first two modules. The pipeline runs on a simple personal computer (PC) with a six-core i7 8700K processor, an NVIDIA GTX 1060 graphics card with a 6 GB frame buffer, and 8 GB of memory. Training on the 10 months of LASCO C2 data for image classification takes about 10 hr of computation on this PC. Training is somewhat time consuming, but it needs to be run only once after a proper training algorithm has been designed. The detection and tracking after training are very fast: processing one day of LASCO C2 images, about 120 frames with a 512×512 resolution, takes about 5 to 10 minutes.

To evaluate the performance of our CAMEL technique, we selected a few representative CME events covering a large range of AWs, velocities, and brightnesses, and compared our results with those from existing CME catalogs: the manual CDAW catalog and the automatic CACTus, SEEDS, and CORIMP catalogs. We compiled a few fundamental CME parameters for comparison, i.e., the first appearance time in C2, the central position angle, the AW, and the velocities. Moreover, the detection and tracking results were also compared frame by frame. The advantages we have seen for CAMEL are:

  • 1.

    CAMEL is better able to detect and track weak signals and thus outputs more complete CME structures. Therefore, we can derive more precise CME morphological parameters, e.g., the CPA and AW.

  • 2.

    CAMEL performs better at catching the appearance time of CMEs in LASCO C2 as early as possible. The more precise information on CME morphology and timing can eventually be used to derive more accurate CME kinematics.

  • 3.

    CAMEL records the pixel-level positions of CMEs. Thus, we detect and track not only the CME leading front but also any other structure of interest within the detected CME regions, e.g., the CME core in Figure 6.

  • 4.

    CAMEL produces image classification as a by-product. Categorizing CME-flagged images is very useful for distributing CME data in a timely and effective manner to different space weather prediction centers for predicting CME arrivals.

  • 5.

    CAMEL is computationally cheap and fast. After proper training, the detection and tracking of the CME in a single LASCO C2 image takes only a few seconds on a normal PC.

This paper introduces the detailed mathematical method of our newly developed tool for the automatic detection and tracking of CMEs with a novel machine-learning technique and evaluates its performance. Based on the automatic detection and tracking results, a chain of automatic processes can be carried out in the future. We can automatically obtain the CME periphery from the pixel-level CME positions. Further three-dimensional reconstructions of a CME based on the identified CME from single- or multi-perspective observations can then be done (e.g., Feng et al. 2012, 2013; Lu et al. 2017). The 3D parameters derived from such reconstructions can be further used as inputs to MHD simulations, e.g., ENLIL simulations of CME propagation in interplanetary space for the prediction of CME arrivals (Odstrcil 2003).

The CDAW CME catalog is generated and maintained at the CDAW Data Center by NASA and The Catholic University of America in cooperation with the Naval Research Laboratory. SOHO is a project of international cooperation between ESA and NASA. This paper uses data from the CACTus CME catalog, generated and maintained by the SIDC at the Royal Observatory of Belgium. We also acknowledge the use of the SEEDS and CORIMP CME catalogs. This work is supported by NSFC grants 11522328, 11473070, 11427803, and U1731241, by the CAS Strategic Pioneer Program on Space Science (grant Nos. XDA15010600, XDA15052200, XDA15320103, and XDA15320301), and by the National Key Research and Development Program (2018YFA0404202).

