
(My) Hazy Numbers

After looking at yesterday’s data, I think I agree with NEA that you need at least a 3-hour average to get a reliable reading. It is the minimum number of samples you need for the data to be meaningful. The 1-hour estimates simply fluctuate too much (or my model is too simple; perhaps they can be modelled better from the 3-hour readings). The ‘lag’ problem still persists, though, so I suppose there is still room for one more variable that indicates the ‘current trend’ of the data and has some predictive value.

Update: Apparently the problem lies in the piecewise linear relationship between PM10 concentration and PSI value, as highlighted in this excellent post.


PSI reading from 17/06/13 5am to 21/06/13 10am. Most of the PSI 3-hour readings are given by NEA while the rest (e.g., those between 1AM-5AM) and the 1-hour estimates are back-calculated from the 3-hour readings. Note the very large fluctuation in the 1-hour estimates.

Hazy numbers

When it rains, it pours. As a PSI reading of 100 becomes the new normal in Singapore, people have also started to question whether they can trust NEA’s readings, since what they feel is sometimes worse than the given number. Well, the answer is yes, but with a caveat. The PSI number published by NEA is a 3-hour average, which means there will always be a lag in the number. Depending on the trend, the PSI reading can under-estimate the ‘current’ PSI value, but it can also over-estimate it. This is illustrated in the figure below. I tried to back-calculate the 1-hour readings using several assumptions, and you can see that the 3-hour average readings always lag the 1-hour estimates. So, for example, the peak on 19/06 was not when the 3-hour PSI reached 321 at 10 pm, but an hour before that, when the 3-hour PSI reading was 290 but the 1-hour estimate was around 450.

PSI reading from 17/06/13 5am to 20/06/13 12pm. PSI 3-hour reading is given by NEA and the 1-hour reading is by back-calculation with assumptions.

PSI reading from 17/06/13 5am to 20/06/13 1pm. Most of the PSI 3-hour readings are given by NEA while the rest (e.g., those between 1AM-5AM) and the 1-hour readings are estimated by back-calculation with assumptions.
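
For anyone curious how such a back-calculation could work, here is a minimal sketch. It assumes the published 3-hour figure is a plain arithmetic mean of three consecutive 1-hour values, which is a simplification (the actual PSI is derived from pollutant concentrations, not averaged directly), and it is not necessarily the set of assumptions behind the figure. Under that assumption, two consecutive 3-hour readings differ only by the hour entering the window and the hour leaving it. The function name `back_calculate` and the idea of seeding it with three guessed 1-hour values are hypothetical.

```python
def back_calculate(three_hour, seed):
    """Estimate 1-hour values from consecutive trailing 3-hour means.

    three_hour: consecutive 3-hour readings A[0], A[1], ...
    seed: guessed 1-hour values for the three hours covered by A[0].
          Everything that follows depends on this guess.
    """
    if len(seed) != 3:
        raise ValueError("seed must hold exactly three 1-hour values")
    hourly = list(seed)
    for prev, curr in zip(three_hour, three_hour[1:]):
        # 3*curr - 3*prev = (hour entering the window) - (hour leaving it)
        leaving = hourly[-3]
        hourly.append(leaving + 3 * (curr - prev))
    return hourly


# Seed with three 1-hour estimates whose mean is the first 3-hour reading.
estimates = back_calculate([371, 402], [256, 454, 403])
```

Note that any error in the seed is carried forward unchanged into every third estimate, so the result is only as good as the starting guess.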

A 3-hour average is fine when there is not much movement in the data, and it is actually good statistical practice (more samples), since it reduces the noise in the data. The problem, of course, arises when there is a large change in the data. In that case the 1-hour reading might give you more accurate data, although it will be noisier. In the figure above, you will find more fluctuation in the 1-hour estimates than in the 3-hour readings. Note also that both numbers here are averages. Although the 1-hour estimate might be more ‘current’, it also comes with a higher standard deviation and hence less confidence and reliability.

Back-calculating the 1-hour estimates is also useful for predicting future readings, since part of the current reading will still be used for up to 2 more hours. So, for example, the 20/06 3-hour reading at 1 pm is 371, which is the average of the 1-hour estimates of 256 at 11 am, 454 at 12 pm, and 403 at 1 pm. The 3-hour reading at 2 pm will be the average of 454 at 12 pm, 403 at 1 pm, and the 1-hour estimate at 2 pm. Assuming that the 1-hour estimate at 2 pm goes down further (let’s say to 350), the 3-hour PSI reading at 2 pm will be about 402. That number might sound even scarier, but if we look at the 1-hour estimates, we have actually already gone through the worst hour (the 1-hour estimate of 454 at 12 pm). To put it differently, for the 3-hour PSI number to go down from 371, we need the 1-hour reading at 2 pm to be below about 256 (the 11 am value that drops out of the average), which is pretty drastic since the current 1-hour estimate is 403. But such a drastic change is not unprecedented. The 1-hour estimate at 11 am today was 256 while it was 454 at 12 pm, a change of almost 200.
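
The arithmetic above can be checked with a few lines of Python. This is only a sketch that treats the 3-hour reading as a plain mean of three 1-hour estimates (an assumption, as before); the function names are made up for illustration, and the numbers are the 1-hour estimates quoted above.

```python
def three_hour_mean(hourly):
    """Plain mean of three consecutive 1-hour values (assumed averaging rule)."""
    if len(hourly) != 3:
        raise ValueError("expected exactly three 1-hour values")
    return sum(hourly) / 3


def project_next(hourly, guess):
    """3-hour mean one hour ahead, given a guess for the next 1-hour value."""
    # The oldest hour drops out of the window; the guessed hour enters.
    return three_hour_mean([hourly[1], hourly[2], guess])


def break_even(hourly):
    """Next 1-hour value at which the 3-hour mean stays unchanged.

    The new mean is lower than the current one exactly when the incoming
    hour is lower than the hour leaving the window.
    """
    return hourly[0]


window = [256, 454, 403]                # 1-hour estimates at 11 am, 12 pm, 1 pm
current = three_hour_mean(window)       # 371.0
projected = project_next(window, 350)   # about 402.3
threshold = break_even(window)          # 256
```

The break-even value is simply the hour that leaves the window, which is why the 3-hour number can keep rising even after the worst hour has passed.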

The point of all this is not to have some mathematical fun with the data, but to recognize the limitation that the 3-hour readings inherently have. I still think that in an extraordinary situation like this we also need the 1-hour estimates, as the 3-hour readings, although more reliable, are not fast enough to capture the movement of the data.

(Update to this post: Here.)