1) The problem: we have a noisy image and want to recover from it the best possible approximation of the clean image.
2) The clean image (the signal) is not available but must be guessed from the noisy image. We can make some assumptions about the clean image.
The first is that each pixel in the image is highly correlated with its neighbours. This means that the information is spread all over the image.
The second assumption we can make is that we can apply a "transform" to the image, that is, represent the image in an alternate way such that most of the information is "compacted" into a few coefficients, and the many remaining coefficients are not so important. One such transform is the DCT, another is the Wavelet Transform (WT). If you look at the DCT of an image or of a block of an image you'll see that most of the energy is compacted near the DC coefficient and that it rapidly vanishes as you move to the highest frequencies. In a WT most of the energy is in the lowest resolution subband, and each detail band has less energy as you move to the highest frequency subbands.
This is because most images have most of their energy concentrated in the lowest frequencies, and have very little high frequency information. This is, of course, a generalization, but it is what lossy image compression schemes like JPEG or MPEG rely on.
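To make the energy compaction idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions that are not in the text above: it uses the Haar wavelet (the simplest one), a 1-D smooth signal standing in for one row of an image, and 4 decomposition levels. The names haar_step, haar_decompose and energy are made up for this example.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_decompose(x, levels):
    """Return (coarsest approximation, list of detail bands, finest band first)."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(v * v for v in coeffs)

# A smooth, slowly varying "scanline" (hypothetical test signal, not real image data).
n = 256
signal = [100.0 + 50.0 * math.sin(2.0 * math.pi * i / n) for i in range(n)]

approx, details = haar_decompose(signal, levels=4)
total = energy(signal)
approx_share = energy(approx) / total
detail_shares = [energy(d) / total for d in details]

print(f"lowest resolution band: {approx_share:.4%} of the energy")
for level, share in enumerate(detail_shares, start=1):
    print(f"detail band, level {level} (1 = finest): {share:.6%}")
```

For a smooth signal like this one, nearly all the energy should end up in the lowest resolution band, and the finest detail band should hold the least. A real image with sharp edges and texture will put more energy into the detail bands, but the trend is the same.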
3) A practical denoising technique must also make some assumptions on the noise.
In so-called white noise, one pixel of noise is not correlated with any other pixel of noise. It is also assumed to have no DC level (zero mean). This means that if you take a DCT or WT there won't be any concentration of significant coefficients: white noise has equal energy at all frequencies. This doesn't hold if the noise is "coloured".
White noise is also assumed to be unrelated to the clean signal in any way. Some kinds of noise, like JPEG artifacts, are highly correlated with the clean signal.
The noise is considered additive, meaning that the noisy image is the sum of the clean image and some amount of pure noise.
To be able to make a good approximation of the clean image from the noisy image, the amount of noise should not be "too much". It is much harder to guess what is signal in the middle of an ocean of noise than to remove a little noise from an almost good image, and the more noise you have, the worse your results will be. As the noise level increases, the more subtle information is masked by the noise and you can only extract the stronger features of the clean signal.
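The white noise assumption above can be checked numerically. The following is a rough sketch, assuming Gaussian white noise with a standard deviation of 10 and an orthonormal Haar transform (both are choices made for this example, not something the text prescribes). It measures the average energy per coefficient in each subband, which should come out close to sigma squared everywhere.

```python
import math
import random

def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def mean_power(coeffs):
    """Average energy per coefficient."""
    return sum(v * v for v in coeffs) / len(coeffs)

rng = random.Random(0)   # fixed seed so the run is repeatable
sigma = 10.0             # assumed noise standard deviation
noise = [rng.gauss(0.0, sigma) for _ in range(4096)]
noise_mean = sum(noise) / len(noise)

approx = noise
band_power = []          # detail bands, finest first
for _ in range(3):
    approx, detail = haar_step(approx)
    band_power.append(mean_power(detail))
approx_power = mean_power(approx)

print(f"noise mean: {noise_mean:.3f} (expected near 0)")
for level, p in enumerate(band_power, start=1):
    print(f"detail level {level}: mean power {p:.1f} (sigma^2 = {sigma ** 2:.1f})")
print(f"approximation band: mean power {approx_power:.1f}")
```

No subband stands out, which is the opposite of what happens with a clean image. Coloured noise would not give this flat profile.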
4) Wavelet denoising.
Since the noise is supposed to be additive, and the WT is linear, the WT of the noisy image is equal to the WT of the clean signal added to the WT of the pure noise.
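A quick way to convince yourself of the linearity argument is to try it. This is a small sketch using one level of the Haar transform as a stand-in for "the WT"; the test signal and the noise level are arbitrary values picked for the example.

```python
import math
import random

def haar_step(x):
    """One level of the orthonormal Haar transform, returned as approx + detail."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx + detail

rng = random.Random(42)
clean = [float(i % 16) * 10.0 for i in range(64)]   # arbitrary "clean" signal
noise = [rng.gauss(0.0, 4.0) for _ in range(64)]    # additive white noise
noisy = [c + e for c, e in zip(clean, noise)]

wt_noisy = haar_step(noisy)
wt_sum = [a + b for a, b in zip(haar_step(clean), haar_step(noise))]

# The two should agree up to floating point rounding.
max_diff = max(abs(a - b) for a, b in zip(wt_noisy, wt_sum))
print(f"largest difference: {max_diff:.3e}")
```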
The WT of the clean image has most of its energy compacted in the low frequency subbands. The WT of the noise has its energy spread evenly over all the subbands. Its energy level is hopefully very much smaller than that of the lowest frequencies of the clean signal, and hopefully also smaller than that of the most important signal features in the highest frequency bands of the clean signal.
Since the energy level of the noise is hopefully very much smaller than the energy level of the lowest frequencies of the clean signal, you can leave the low frequencies alone, as the noise there is masked by the signal and won't bother you.
Traditional denoising kills the noise by attenuating the high frequencies (with some smoothing filter), but this also attenuates the high frequencies of the signal by the same amount.
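To see what smoothing does to a sharp detail, here is a tiny sketch with a 3-tap moving average applied to a noise-free step edge. The filter and the numbers are assumptions chosen only for illustration; any low-pass smoothing filter behaves in a similar way.

```python
def box_smooth(x, radius=1):
    """Moving average over 2*radius+1 samples, replicating the border samples."""
    n = len(x)
    out = []
    for i in range(n):
        window = [x[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def largest_jump(x):
    """Largest difference between neighbouring samples."""
    return max(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))

edge = [0.0] * 8 + [90.0] * 8      # a sharp edge, no noise at all
smoothed = box_smooth(edge, radius=1)

jump_before = largest_jump(edge)
jump_after = largest_jump(smoothed)
print(f"sharpest step before smoothing: {jump_before}")
print(f"sharpest step after smoothing:  {jump_after}")
```

The same averaging that would reduce the noise also smears the edge over several pixels, whether or not the edge was strong enough to be seen through the noise.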
The WT thresholding technique uses a "kill or keep" strategy (usually called hard thresholding). If the amplitude of a high frequency coefficient is below a certain threshold, it is either noise or some detail you can't see anyway because of the noise, so you kill it. If it is above the threshold it probably isn't noise but some detail you can see even with the noise, so you keep it. The result is that the noise is "killed" but the details that stand above the noise level are kept intact. The higher the noise level, the more details you have to sacrifice, just like in traditional denoising by "smoothing"; the difference is that even if you're sacrificing lots of details, the most prominent ones are still kept intact.
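Putting the pieces together, the "kill or keep" idea might be sketched as follows. This is an illustration, not a reference implementation. It assumes the Haar wavelet, a 1-D signal with one strong edge standing in for an image row, 4 decomposition levels, a known noise standard deviation, and a threshold of 3 times that value. None of these choices come from the text above; in particular, how to pick the threshold is a separate question. All function names are made up for this example.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def haar_forward(x, levels):
    """Multi-level orthonormal Haar transform: (approximation, detail bands, finest first)."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details

def haar_inverse(approx, details):
    """Inverse of haar_forward."""
    x = list(approx)
    for detail in reversed(details):
        out = []
        for a, d in zip(x, detail):
            out.append((a + d) / SQRT2)
            out.append((a - d) / SQRT2)
        x = out
    return x

def hard_threshold(band, t):
    """Kill coefficients with amplitude at or below t, keep the others unchanged."""
    return [c if abs(c) > t else 0.0 for c in band]

def denoise(noisy, levels, threshold):
    approx, details = haar_forward(noisy, levels)
    kept = [hard_threshold(d, threshold) for d in details]
    return haar_inverse(approx, kept)   # the low frequencies are left alone

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

rng = random.Random(1)
sigma = 5.0                                   # assumed known noise level
clean = [20.0] * 101 + [200.0] * 155          # one strong edge at sample 101
noisy = [c + rng.gauss(0.0, sigma) for c in clean]

denoised = denoise(noisy, levels=4, threshold=3.0 * sigma)

mse_noisy = mse(noisy, clean)
mse_denoised = mse(denoised, clean)
edge_height = denoised[101] - denoised[100]
print(f"MSE of noisy signal:    {mse_noisy:.2f}")
print(f"MSE of denoised signal: {mse_denoised:.2f}")
print(f"edge height after denoising: {edge_height:.1f} (clean edge: 180.0)")
```

The detail coefficients produced by the edge are far above the threshold, so they survive and the edge stays sharp, while most of the noise-only coefficients are zeroed. Because Haar is used here, the denoised signal will look blocky; smoother wavelets give nicer looking results, but the principle is the same.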