Post-Facto Synchronization

To save energy in a sensor network, it is desirable to keep nodes in a low-power state, if not turned off completely, for as long as possible. Sensor network hardware is often designed with this goal in mind: processors have various ``sleep'' modes or are capable of powering down high-energy peripherals when not in use.

This type of design is exemplified by the WINS platform [ADL$^+$98], which has an extremely low-power ``pre-processor'' that is capable of rudimentary signal processing. Normally, the entire node is powered down except for the pre-processor. When the pre-processor detects a potentially interesting signal, it powers on the general purpose processor for further analysis. The CPU, in turn, can power on the node's radio if it determines that an event has occurred that needs to be reported.

Such designs allow the components that consume the most energy to be powered for the least time, but also pose significant problems if we wish to keep synchronized time. Traditional methods try to keep the clock disciplined at all times so that an accurate timestamp is always available. What happens if the radio--our external source of time and frequency standards--is continuously off for hours at a time? Or, in the case of a platform like WINS, what if the general-purpose processor that knows how to discipline the clock is also off?

Our solution to this problem is post-facto synchronization. In our scheme, nodes' clocks are normally unsynchronized. When a stimulus arrives, each node records the time of the stimulus with respect to its own local clock. Immediately afterwards, a ``third party'' node--acting as a beacon--broadcasts a synchronization pulse to all nodes in the area using its radio. Nodes that receive this pulse use it as an instantaneous time reference and can normalize their stimulus timestamps with respect to that reference.
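As a rough illustration of the normalization step, the following Python sketch subtracts each node's local reading of the sync pulse from its local reading of the stimulus. The `Observation` type, the node names, and the sample readings are hypothetical, and timestamps are assumed to be integer microseconds on each node's own unsynchronized clock.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One node's readings, both taken from its own local clock (assumed microseconds)."""
    node: str
    stimulus_local_us: int  # local clock value when the stimulus was detected
    pulse_local_us: int     # local clock value when the sync pulse arrived


def normalize(observations):
    """Express each stimulus time relative to the shared sync pulse.

    The pulse is treated as an instantaneous common reference, so the
    difference (stimulus - pulse) is comparable across nodes even though
    the local clocks themselves are unrelated.
    """
    return {o.node: o.stimulus_local_us - o.pulse_local_us for o in observations}


# Hypothetical readings: the two local clocks have completely different offsets.
observations = [
    Observation("a", stimulus_local_us=1_000_000_000, pulse_local_us=1_000_010_000),
    Observation("b", stimulus_local_us=52_347_000, pulse_local_us=52_356_000),
]
relative = normalize(observations)

# Relative arrival time of the stimulus at node a versus node b.
a_minus_b_us = relative["a"] - relative["b"]
```

Because only differences within a single node's clock are used, the arbitrary offsets between the nodes' clocks cancel, and the values in `relative` can be compared directly.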

This kind of synchronization is not applicable in all situations, of course: it is limited in scope to the transmit range of the beacon and creates only an ``instant'' of synchronized time. This makes it inappropriate for an application that needs to communicate a timestamp over long distances or times. However, it does provide exactly the service necessary for beam-forming applications, localization systems, and other situations in which we need to compare the relative arrival times of a signal at a set of spatially local detectors.

Expected Sources of Error

There are three main factors that affect the accuracy and precision achievable by post-facto synchronization. Roughly in order of importance, they are: receiver clock skew, variable delays in the receivers, and propagation delay of the synchronization pulse.

Skew in the receivers' local clocks. Post-facto synchronization requires that each receiver accurately measure the interval that elapses between its detection of the event and the arrival of the synchronization pulse. However, nodes' clocks do not run at exactly the same rate, causing error in that measurement. Since clock skew among the group will cause the achievable error bound to decay as time elapses between the stimulus and pulse, it is important to minimize this interval.
One way of reducing this error is to use NTP to discipline the frequency of each node's oscillator. This exemplifies our idea of multi-modal synchronization. Although running NTP ``full-time'' defeats one of the original goals of keeping the main processor or radio off, it can still be useful for frequency discipline (much more so than for phase correction) at very low duty cycles.
Variable delays on the receivers. Even if the synchronization signal arrives at the same instant at all receivers, there is no guarantee that each receiver will detect the signal at the same instant. Nondeterminism in the detection hardware and operating system issues such as variable interrupt latency can contribute unpredictable delays that are inconsistent across receivers. The detection of the event itself (audio, seismic pulses, etc.) may also have nondeterministic delays associated with it. These delays will contribute directly to the synchronization error.
Our design avoids error due to variable delays in the sender by considering the sender of the sync pulse to be a ``third party.'' That is, the receivers are considered to be synchronized only with each other, not with the beacon.
It is interesting to note that the error caused by variable delay is the same irrespective of the time elapsed between the event and the sync pulse. This is in contrast to error due to clock skew that grows over time.
Propagation delay of the synchronization pulse. Our method assumes that the synchronization pulse is an absolute time reference at the instant of its arrival--that is, that it arrives at every node at exactly the same time. In reality, this is not the case due to the finite propagation speed of RF signals. Synchronization will never be achievable with accuracy better than the difference in the propagation delay between the various receivers and the synchronization beacon.
This source of error makes our technique most useful when comparing arrival times of phenomena that propagate much more slowly than RF, such as audio. The six-order-of-magnitude difference in the speed of RF and audio has been similarly exploited in the past in systems such as the ORL's Active Bat [WJH97] and Girod's acoustic rangefinder [Gir00].
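The three error sources can be combined into a back-of-the-envelope budget. The sketch below is a simplified model under stated assumptions, not a description of our measurement method: skew error is taken to grow linearly with the stimulus-to-pulse interval, receiver delay is a constant term, and the three terms are simply added as a worst case. The function names, the 20 ppm relative skew, the 1 microsecond receiver term, and the 30 m path difference are illustrative values rather than measurements.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # RF propagation speed (vacuum value)
SPEED_OF_SOUND_M_PER_S = 343.0          # assumed: air at roughly 20 degrees C


def skew_error_us(interval_us, relative_skew_ppm):
    """Error accumulated when two clocks differ in rate by relative_skew_ppm."""
    return interval_us * relative_skew_ppm * 1e-6


def propagation_error_us(path_difference_m, speed_m_per_s=SPEED_OF_LIGHT_M_PER_S):
    """Arrival-time difference caused by unequal distances to the source."""
    return path_difference_m / speed_m_per_s * 1e6


def error_budget_us(interval_us, relative_skew_ppm, receiver_delay_us, path_difference_m):
    """Worst-case sum of the three error terms (an assumed, simplified model)."""
    return (skew_error_us(interval_us, relative_skew_ppm)
            + receiver_delay_us
            + propagation_error_us(path_difference_m))


# Illustrative comparison: the same 30 m path difference for RF and for audio.
rf_us = propagation_error_us(30.0)
audio_us = propagation_error_us(30.0, SPEED_OF_SOUND_M_PER_S)

# Illustrative budgets for a short and a long stimulus-to-pulse interval.
short_interval_budget_us = error_budget_us(1_000, 20.0, 1.0, 30.0)
long_interval_budget_us = error_budget_us(1_000_000, 20.0, 1.0, 30.0)
```

With these assumed values, a 30 m path difference costs about 0.1 microseconds for RF but tens of milliseconds for audio, which is why the sync pulse can be treated as effectively instantaneous relative to acoustic arrivals. The skew term is the only one that depends on the interval, so it dominates the budget once the interval becomes long.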

Empirical Study

We designed an experiment to characterize the performance of our post-facto synchronization scheme. The experiment attempts to measure the sources of error described in the previous section by delivering a stimulus to each receiver at the same instant, and asking the receivers to timestamp the arrival time of that stimulus with respect to a synchronization pulse delivered via the same mechanism. Ideally, if there are no variable delays in the receivers or skew among the receivers' local oscillators, the times reported for the stimulus should be identical. In reality, these sources of error cause the dispersion among the reported times to grow as more time elapses between the stimulus and the sync pulse. The decay in the error bound should happen more slowly if NTP is simultaneously used to discipline the frequency of the receivers' oscillators.

We realized this experiment with one sender and ten receivers, each of which was ordinary PC hardware (Dell OptiPlex GX1 workstations) running the RedHat Linux operating system. Each stimulus and sync pulse was a simple TTL logic signal sent and received by the standard PC parallel port.² In each trial, each receiver reported its perceived elapsed time between the stimulus and synchronization pulse according to the system clock, which has $1\mu{}$ sec resolution. We defined the dispersion to be the standard deviation from the mean of these reported values. To minimize the variable delay introduced by the operating system, the times of the incoming pulses were recorded by the parallel port interrupt handler using a Linux kernel module.
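The dispersion metric can be sketched as follows. We assume the population standard deviation here, since the text above does not specify whether the population or sample form applies; the function names and the sample values are hypothetical.

```python
import statistics


def dispersion_us(elapsed_us):
    """Standard deviation from the mean of the receivers' reported elapsed times.

    elapsed_us holds one value per receiver for a single trial, in microseconds.
    Assumption: population standard deviation (statistics.pstdev).
    """
    return statistics.pstdev(elapsed_us)


def mean_dispersion_us(trials):
    """Average the per-trial dispersion over all trials at one elapsed-time value."""
    return statistics.fmean([dispersion_us(trial) for trial in trials])


# Hypothetical data: two trials, four receivers each.
trials = [
    [1000, 1000, 1002, 1002],  # receivers disagree by up to 2 microseconds
    [1000, 1000, 1000, 1000],  # receivers agree exactly
]
per_trial = [dispersion_us(trial) for trial in trials]
average = mean_dispersion_us(trials)
```

A dispersion of zero means every receiver reported the same elapsed time, which is the ideal outcome when there is neither skew nor variable receiver delay.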

In order to understand how dispersion is affected by the time elapsed between stimulus and sync pulse, we tested the dispersion for 21 different values of this elapsed time, ranging from $2^4\mu{}$ sec to $2^{24}\mu{}$ sec ($16\mu{}$ sec to 16.8 seconds). For each elapsed-time value, we performed 50 trials and reported the mean. These 1,050 trials were performed in a random order over the course of one hour to minimize the effects of systematic error (e.g., changes in network activity that affect interrupt latency).
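A trial schedule of this shape could be generated as in the sketch below. The name `build_schedule`, the `seed` parameter, and the use of Python's `random` module are assumptions made for illustration; the randomization procedure actually used is not described here.

```python
import random
from collections import Counter

TRIALS_PER_INTERVAL = 50


def build_schedule(seed=None):
    """Return a shuffled list of stimulus-to-pulse intervals in microseconds."""
    # 21 power-of-two intervals: 2^4 through 2^24 microseconds.
    intervals_us = [2 ** k for k in range(4, 25)]
    schedule = [iv for iv in intervals_us for _ in range(TRIALS_PER_INTERVAL)]
    # Shuffle so that slow changes in the environment do not line up
    # with any particular interval value.
    random.Random(seed).shuffle(schedule)
    return schedule


schedule = build_schedule(seed=1)
counts = Counter(schedule)
```

Interleaving the intervals at random spreads any slowly varying disturbance across all 21 elapsed-time values instead of concentrating it in one of them.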

For comparison, this entire experiment was performed in three different configurations:

1. The experiment was run on the ``raw clock'': that is, while the receivers' clocks were not disciplined by any external frequency standard.
2. An NTPv3 client was started on each receiver and allowed to synchronize (via Ethernet) to our lab's stratum-1 GPS clock for ten days. The experiment was then repeated while NTP was running.
3. NTP's external time source was removed, and the NTP daemon was allowed to free-run for several days using its last-known estimates of the local clock's frequency. The experiment was then repeated.

To compare our post-facto method to the error bound achievable by NTP alone, we recorded two different stimulus-arrival timestamps when running the experiment in Configuration 2: the time with respect to the sync pulse and the time according to the NTP-disciplined local clock. As in the other configurations, a dispersion value for NTP was computed for each stimulus as the standard deviation from the mean of the reported timestamps. The horizontal line in Figure 1 is the mean of those 1,050 dispersion values--101.70 $\mu{}$ sec.

Our results are shown in Figure 1. ³

**Figure:** Synchronization error using post-facto time synchronization without external frequency discipline, with discipline from an active NTP time source, and with free-running NTP discipline (external time source removed after the oscillator's frequency was estimated). These are compared to the error bound achievable with NTP alone (the horizontal line near $100\mu {}$ sec). The breakpoint seen near 50msec is where error due to clock skew, which grows proportionally with the time elapsed from stimulus to sync pulse, overcomes other sources of error that are independent of this interval. Each point represents the dispersion experienced among 10 receivers, averaged over 50 trials.

Discussion

The results shown in Figure 1 illuminate a number of aspects of the system. First, the experiment gives insight into the nature of its error sources. The results with NTP-disciplined clocks are equivalent to those with undisciplined clocks when the interval is less than $\approx50$ msec, suggesting that the primary source of error in these cases is variable delays on the receiver (for example, due to interrupt latency or the sampling rate in the analog-to-digital conversion hardware in the PC parallel port). Beyond 50 msec, the two experiments diverge, suggesting that clock skew becomes the primary source of error at this point.

Overall, the performance of post-facto synchronization was quite good. When NTP was used to discipline the local oscillator's frequency, an error bound very near to the clock's resolution of $1\mu{}$ sec was achieved. This is significantly better than the $100\mu {}$ sec achieved by NTP alone. Clearly, the combination of NTP's frequency estimation with the sync pulse's instantaneous phase correction was very effective. Indeed, the multi-modal combination's maximum error is better than either mode can achieve alone. We find this a very encouraging indicator for the multi-modal synchronization framework we proposed at the end of Section 5.

Without NTP discipline, the post-facto method still performs reasonably well for short intervals between stimulus and sync pulse. For longer intervals, we are at the mercy of happenstance: the error depends on the natural frequencies of whatever oscillators we happen to have in our receiver set.

Perhaps the most exciting result, however, is shown in the experiment where NTP disciplined the nodes' local clocks using only the last-known estimate of frequency, after the external time source was removed. The achievable error bound was $1\mu{}$ sec: the limit of our clock's resolution and, more importantly, exactly the same as the result with NTP and an active external time standard. This result is important because it shows that extremely low-energy and low-error time synchronization is possible: after an initial frequency-training period, nodes might be able to keep their radios off for days and still instantly acquire a timebase within $1\mu{}$ sec when an event of interest arrives. That result is made possible by the multi-modal synchronization; on its own, the frequency correction provided by free-running NTP is not good enough to keep clocks precisely in phase over time due to accumulated error. (In the free-running NTP experiment, the accuracy of the timestamps when not normalized by the sync pulse was only in the tens of milliseconds.)

All of these results, while encouraging, do come with a number of caveats. Our experiments were performed under idealized laboratory conditions, using (equal-length) cables to directly connect the sender to the receivers. Real-world conditions will require wireless links that are likely far more complex, with more opportunities for variable delay. In addition, the relatively constant ambient temperature reduced the oscillators' frequency drift over time. A real sensor network deployed outdoors might not be able to let NTP free-run without an external time source for as long as we did in our experiment.
