Physics 728
Radio Astronomy:  Lecture #7
Prof. Dale E. Gary
NJIT

Image Reconstruction

Image Restoration (Deconvolution)

Once we have the dirty image, ID, we need to do more work to obtain a high quality image.  There are two main approaches, and both are nonlinear, so are not easy to treat mathematically.  The first is called CLEAN, from the term used by its inventor Hogbom (1974, "Aperture Synthesis with a Non-Regular Distribution of Interferometer Baselines," Astronomy & Astrophysics Supplement Series 15, 417.).  The second is the Maximum Entropy Method (MEM).
Before detailing these methods, we should ask what range of spatial scales we can expect to recover in an image.  The smallest source (hence related to pixel size) is given by Dl = 1/2umax, Dm = 1/2vmax, while the largest source (hence related to the map size) is given by NlDl = 1/umin, NmDm = 1/vmin.  A rule of thumb for an appropriate map pixel size is Dl = 1/3umax, Dm = 1/3vmax in order to have three pixels per smallest fringe.  There may be reasons to specify a larger map, for example to map out to the edge of the primary beam, and there may be other limitations to the field of view, such as due to bandwidth limitations (as in your homework problem) or violation of the flat plane of the sky assumption.

After gridding, our resampled visibility array has some cells populated and other cells empty (those that do not correspond to a u,v point measured with the array).  The Fourier Transform inversion is not unique, because these unmeasured points could have any value without violating the data constraints.  One particular solution, where all of the missing measurements are set to zero, is called the principle solution, and of course this is the one that corresponds to the dirty image we have been discussing.  However, this is not really a reasonable solution -- one would expect a more continuous distribution of visibilities.  The goal of image restoration techniques is to find an algorithm that allows us to guess more reasonable values for the unmeasured points.  A priori information is the key to choosing "reasonable" values -- for example, we may exclude negative flux values and attempt to maximize some measure of smoothness in the image.

CLEAN Algorithm
  1. Find the strength and location of brightest point in the image (or some previously defined portion of the image)
  2. Subtract from the dirty image at this location the dirty beam B multiplied by peak strength and gain factor g< 1.  (Record the position and the subtracted flux.)
  3. Go to (1) unless any remaining peak is below some user-specified level.  What is left are the residuals.
  4. Convolve the accumulated point source model with an idealized CLEAN beam (i.e. add back components) usually an elliptical gaussian of the same size and shape as the inner part of the dirty beam.
  5. Add the residuals from step (3) to the CLEAN image.
A number of similar algorithms have been developed, mostly to speed up the operation for certain types of imaging problems, but primarily they all do the same function.

CLEAN boxes
Step (1) can be restricted to smaller areas of the map if there is some reason to believe that the true emission is confined there.  A practical example is shown with OVSA data.

Number of Iterations and Loop Gain
The algorithm subtracts some fraction of the source and sidelobes, and runs to some limit (or number of components Ncl).  These are "knobs" to tweak that can affect the resulting map (which shows the nonuniqueness of the solution).  Using too high a gain tends to make extended, weak emission undetectable and noisy.  Basically the brightness distribution is being decomposed into point sources, and the larger the gain the smaller the number of components.  Using a number of components that is too high (by specifying too small a gain) means cleaning into the "noise" and wasting computer time.  The values of gain and number of components must be chosen on a case by case basis, depending on the source and data quality.

The Special Problem of the Sun as a Source
When mapping the entire disk of the Sun, on which smaller size-scale sources are located, we have an extreme form of extended emission that would take a very long time to decompose into point sources.  Thus, the usual CLEAN algorithm is not appropriate and another approach is needed.  This is an example of a general approach in which models for the brightness distribution are used.

A good model for the solar disk is a uniform disk of appropriate size (the radius for example, is not the solar optical radius, but a slightly larger (and frequency dependent) radius of order 10" larger).  One scales the disk to the appropriate flux level (perhaps as measured with another telescope), then does a Fourier inversion of the model to obtain the expected u,v visibilities sampled with the array.  One can then do a straight vector subtraction of the model visibilities, leaving residuals appropriate to a diskless Sun.  One can then use normal cleaning techniques on this modified u,v database, and after making a clean map the disk model can be added back in.  The best map will result when an accurate disk model is used.

Subtracting in the u,v Plane
As an example, I was once studying a rapidly varying flare star in the presence of a nearby radio-bright galaxy whose flux density was much stronger than the star I was trying to study.  The flare star was so weak that it could not even be seen until the sidelobes of the galaxy were cleaned away.

In order to follow the rapid variations I needed a map once per minute or so, but over ~ 12 hours this is a lot of maps to be CLEANed deeply.  Worse, slight variations in cleaning could masquerade as stellar variations.  What I did was to restrict the cleaning to the area around the galaxy, then used the clean components as a model of the galaxy.  I then subtracted the model (the u,v visibilities of the inverse transformed model) from the u,v data, which created a u,v database containing only the star.  I could then restrict my CLEAN to a small region of interest and quickly make maps of the variation.  This is all possible because of the linear nature of the Fourier Transform.  You can add and subtract in either the map plane or the u,v plane.

MEM Algorithm
The term Maximum Entropy comes from the similarity of the functional form of the "objective" function (the quantity to be maximized) to the statistical entropy in statistical mechanics.  However, it is not really entropy and the analogy should not be taken too far.  The objective function often used involves
H = -Sk Ik ln Ik/Mke      Ik = intensity in kth cell of image,
                                      Mk = "default" image, or model
which has two main attributes -- it produces a positive image everywhere, and the image is "maximally uniform " in a sense.  It is said that a solution that maximizes this objective function is the solution to missing u,v points that minimizes the amount of added information.  The uniformity comes about by forcing a compression in pixel values, consistent with the data constraints.  It is the logarithmic term that does this amplitude compression, and sometimes the objective function used involves a shortened form, H = -Sk ln Ik/Mke, which also works.
  1. Start with a default map.  This could be a flat map (a delta function in u,v space), or it could be the dirty map.  This map becomes the model, M, by which the intensity in each pixel k is divided.
  2. Calculate the entropy by summing the objective function, given above, over the entire map.  (In practice, algorithms minimize the "neg-entropy" H' = -H = Sk Ik ln Ik/Mke).
  3. Calculate the c2 difference, over the entire map, between the model visibilities V(uk,vk), and the measured u,v data V(uk,vk),
    1. D = c2 = Sk |V(uk,vk) -V(uk,vk)|2 / s2
    where s2 is a measure of the measurement errors.
  4. Determine the objective function (the function to be maximized),
    1. ri = lHi- Di
    where l is a Langrange multiplier that affects the weighting of the entropy relative to the data, and i is the iteration number.  Increasing l makes the entropy more important and moves the solution away from the data more quickly, which may cause the algorithm to converge more quickly but can also lead to instability and divergence.  Making l smaller is safer, but can lead to a very large number of iterations before convergence.
  5. To minimize the negative of the objective function, we determine the functional form for the derivative, which we want to bring closer to zero.  This functional form provides an indication of whether we need to increase or decrease the model intensity Ik in each pixel in order to bring that pixel's contribution closer to the minimum.
  6. We make those adjustments, which forms one iteration, and then repeat the entire process.  We continue until the value of c2 reaches an acceptable limit.

The process then, is one of making slight changes to the brightness in each pixel of the model until the model both matches the data (the contribution of D in the objective function) and is sufficiently uniform (the contribution of the entropy term H).  By changing the intensities in the map, we are putting non-zero values in the unmeasured u,v points in the u,v plane, but we may also be making changes to the measured points.  If so, that will make c2 worse, but perhaps improve the entropy.  The process will stop when we have filled in the unmeasured u,v in a manner that is consistent with the measured ones.  The hope is that when we are done we will have a map that is fully consistent with the measured values, is everywhere as uniform as possible, and is everywhere positive.

To discuss image reconstruction further, we will use the NRAO Summer School lecture on imaging and deconvolution.

The Receiving System for Interferometry

Introduction

We have seen what the receivers and front end (feed, amplifier, mixer) must do to sensitively detect weak radio emission.  We have also looked at the way interferometry can be used to image the sky, and how pairs of antennas (baselines) measure Fourier components of the sky brightness distribution.  We now will go back and look at what special aspects of the receiving system are needed to do interferometry.
Phaselock
The first part of the receiver is the heterodyning to bring the RF (radio frequency) down to a manageable intermediate frequency (IF) that we can work with.  This requires a mixer and local oscillator (frequency reference), as shown in Figure 1, below:

Figure 1: Heterodyne receiver, which uses a local oscillator (LO)
operating at frequency wo, to tune to the desired radio frequency (RF)
and mix with RF at a wide band of frequencies, and strip off a lower
bandwidth section of intermediate frequency (IF) for further processing.

For interferometry, we must correlate the signals from two antennas, which requires a number of additional considerations.  The main one is to ensure that the receivers of the two antennas are operating at exactly the same frequency.  If one were on a frequency different by only 1 Hz, the resultant phase between the two would change by 360 degrees every second!  To control the frequency of the two, we must use a phase lock system, whose block diagram might look like Figure 2.


Figure 2: Adding a phase lock loop, which compares the LO output
with an external reference frequency and sends an error signal back
to the LO to keep it in perfect phase lock with the reference signal.
Each antenna receives the Ref signal from the same source, so all
receivers are locked to the same frequency.

Mixer 2 compares the LO to a frequency reference, which comes from the same frequency source for all antennas.  Any error in the phase results in an error signal that is fed back to the oscillator to adjust its frequency to maintain exact frequency tuning.

Correlation
The IF signal from each receiver looks like a noise signal.  Part of the waveform is really signal from the source, and part of it (perhaps the largest part) is noise.  If they both look the same, how do we tell the difference?  The source signal will be correlated between the two antennas, while the noise signal will not.  This is illustrated with the simulated waveforms from two antennas, below:

Figure 3: Two simulated voltage waveforms, with phase 30 degrees, with the
waveform for antenna 1 shifted by 800 time samples.  The noise level is 1/5 of
the signal level in this example.  The waveforms appear to have no relation to
one another, but when correlated they give the plot in the third panel (cosine
channel), which shows a good correlation (spike) at a time lag of 800 samples.
Shifting the antenna 1 waveform by 90 degrees and performing the correlation
again gives the result shown in the bottom panel (sine channel).  The combination
of the sine and cosine channels gives an amplitude of 0.268 and phase of
30.2 degrees.  The correct values are 0.25 and 30 degrees.


Figure 4: Two simulated voltage waveforms, with the same characteristics as for
Figure 3, but now the noise level 5 times higher and is now equal to the signal level.
Because the noise is uncorrelated, the correlated signal is hardly affected, and gives
and amplitude of 0.245 and phase of 30.83 degrees, compared to the correct
values of 0.25 and 30 degrees.

Given time varying voltages V1 and V2, the correlation is found by multiplying them, with one delayed by the geometrical delay tg= B . s/c, then averaging, i.e.

r = <V1(t)V2(t)>
where < > denotes the expectation value, found by averaging over some integration time.  Considering for the moment a monochromatic time-varying signal
V1(t) = v1 cos[2pn(t-tg)]
V2(t) = v2 cos[2pnt]
we have
r = <v1v2 cos[2pn(t-tg)] cos[2pnt]>
   = v1v2 <cos2(2pnt) cos(2pntg) + cos(2pnt) sin(2pnt) sin(2pntg)>
   = v1v2 cos(2pntg)
Since the geometrical delay tg changes due to the Earth's rotation, the relatively slowly varying cosine term causes the oscillations that represent the motion of the source through the interferometer fringe pattern.  In the old days of chart recorders, this fringe pattern was traced on paper and its amplitude and phase could be measured by hand.  The rate of fringe oscillations is called the natural fringe rate.  The fringe frequency is
nF = dw/dt = -We u cos d
where We is the Earth rotation rate, w = (Bl . s) is the spatial frequency corresponding to the projected baseline z-component given by the coordinate transformation from the previous lecture, u is the usual E, or y-component spatial frequency, and d is the declination of the phase center.  The geometry is as shown in Figure 5.

Figure 5: The geometry and block diagram leading to the measured cosine component
of the correlation.  Both the multiplier and the integrator are part of the device called the
correlator.  A refinement is shown in Figure 6.

A major refinement is to use a second correlator, and shift one of the signals by p/2, so that both sine and cosine components are measured simultaneously, as shown below.  The components in the dashed box in Figure 5 are indicated by each circled X in Figure 6.

Figure 6: Inserting a phase shift of p/2 in one of the antennas and doing a
second correlation allows both sine and cosine components to be measured
simultaneously.  These are recorded and become the complex visibility at
spatial frequency u,v corresponding to the projected baseline between the
antennas.

The quantities measured out of the correlator are the real and imaginary parts of the complex visibility measured with the baseline, whose normalization is obtained by the calibration procedure, which we have not yet discussed.  You may wonder how we accomplish the 90 degree phase shift over a finite IF bandwidth Dn.  This is done in the OVSA receivers by changing the phase of the reference signal used to phase lock the local oscillator.  Note that this is not equivalent to a time delay, which would shift the phase by different amounts for different frequencies, but rather it shifts the phase of each frequency separately.  Some systems use the digital correlator to do the phase shifting.  As it turns out, phase shifting, or synchronous detection, is also important for eliminating any DC offsets.  If we multiply two waveforms with a DC offset, the offsets will give a non-zero signal even when there is no correlation in the signals.  This is eliminated by periodically inverting the signal at the antenna, and then synchronously inverting the signal again at the correlator.  In this way, the signals stay correlated while any unwanted DC offset gets inverted periodically and averages to zero.

To discuss correlators further, we will use the NRAO Summer School lecture on cross correlators.