Low Power Synchronization for Wireless Communication

by

Marcy Josephine Ammer

B.S. (Massachusetts Institute of Technology) 1997


M. Eng. (Massachusetts Institute of Technology) 1999

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy
in

Engineering – Electrical Engineering and Computer Sciences

in the

GRADUATE DIVISION

of the

UNIVERSITY OF CALIFORNIA, BERKELEY

Committee in charge:

Professor Jan Rabaey, Chair


Professor Heinrich Meyr
Professor Borivoje Nikolic
Professor Paul Wright

Fall 2004
The dissertation of Marcy Josephine Ammer is approved:

Chair Date

Date

Date

Date

University of California, Berkeley

Fall 2004
Low Power Synchronization for Wireless Communication

Copyright 2004

by

Marcy Josephine Ammer


Abstract

Low Power Synchronization for Wireless Communication

by

Marcy Josephine Ammer

Doctor of Philosophy in Engineering – Electrical Engineering and Computer Sciences

University of California, Berkeley

Professor Jan Rabaey, Chair

Synchronization is increasingly important in wireless communication devices.

Synchronization performance is critical to system performance, and it is where a large amount of design time, receiver area, and power is spent.

Not only is synchronization important, but the relevance is increasing due to four factors:

1. Decreased transmit distances use lower transmit power and, therefore, receiver power

begins to dominate.

2. The wireless channel is more frequency selective at higher transmission speeds, which requires increased synchronization functionality.

3. Trends toward higher bandwidth efficiency move modulation to higher-order constellations, where synchronization specifications are tighter.

4. The push for integration moves RF functionality to digital CMOS processes with low supply voltages, forcing the synchronization system to contend with more front-end nonidealities.

There are few places where the whole topic of synchronization is covered, and fewer still where its power consumption is considered. This research shows that significant system power

savings can be realized through systematic exploration of synchronization power consumption.

This dissertation sets up a framework for the systematic exploration of power consumption in

synchronization systems, applies this framework to a few representative problems, and uses some

system examples to show the impact of this type of exploration.

At the component level, frequency estimation and interpolation are investigated. It is shown

that frequency estimation power reductions of up to 4x are possible while simultaneously

decreasing convergence time by up to 4x. For interpolation, it is shown that proper parameter

selection can result in a 10x reduction in power consumption.

At the system level, two non-standards-based communication systems are considered. PNII is

a 1.6 Mbps personal area network system for wireless intercom type applications over short

distances (10-30 m). The original system’s frequency and phase estimation blocks are redesigned

using the framework developed here. Simultaneous reductions of 66% in synchronization energy

consumption and 72% in convergence time are achieved. PN3 is a 50 Kbps system designed for

use in wireless sensor network applications. A 300 µW synchronization system was designed for

PN3. This is low enough so that further reduction has very little impact on system energy

consumption.

Acknowledgements

I would like to thank my advisor, Jan Rabaey, for his grand vision and subtle guidance (except

when otherwise required). If I am half as successful in my career as he has been in his, I will be

fulfilled. I would also like to thank him, in conjunction with Bob Brodersen, for creating the

Berkeley Wireless Research Center. It has been a true gift to be able to earn my Ph.D. in such a

rich environment.

Thanks to Bora Nikolic for setting such high standards in his EE225C class. My final project

in that class was the genesis of this work. Thanks also for being the much needed harsh critic and

for the constant encouragement to do good work.

To Heinrich Meyr for his encouragement and for teaching his seminar on Digital

Communication Receivers where I first got hooked on synchronization. Much of my work grows

out of the fundamentals in his two volumes on communication receivers. Vielen Dank.

Thanks to Paul Wright for adding a different perspective with flair.

Thanks to Tom Knight, my original advisor at MIT, for supporting my move to Berkeley and

for his constant wisdom, both technical and non-technical.

To my lab-mates: Mike Sheets, Ian O’Donnell, Dave Sobel, and Johan Vanderhaegen.

Thanks, Mike, for being such a good friend as well as constantly saving me from the tools.

Thanks, Ian, for your endless supply of interesting conversations, your sense of humor, and your

cynical opinions. Thanks, Dave, for your clarity of thought, for always reminding me to be

methodical, and for helping me locate those pesky factors-of-two I always seem to be missing.

Thanks, Johan, for being so incredibly smart and precise. When we no longer work together, I

will sorely miss the safety net your knowledge provides.

To the old Bob-and-Jan group core: Varghese George, Marlene Wan, Vandana Prabhu, and

Jeff Gilbert. George showed me the ropes of Berkeley and all the good coffee shops and Indian

restaurants in town. Marlene is a never-ending source of fun times and the inspiration that it is

possible to “do it all”. Vandana never fails to make you smile and remind you to be carefree.

And, Jeff is the sage. Thanks also to all of you for showing me what life looks like on the other

side of graduation.

A special thanks to Rhett Davis for all his untiring work on the first chip.

To my housemates: Sunny, Carol, Marina, and Sandon. Thanks for letting me be a part of

your lives, and teaching me to squeeze every ounce out of every experience.

To Tony Gray and Olin Shivers. Two great men, whose advice I always recall when things

look bad.

To my family for all the years of love and support. Especially my sisters. I will always

remember finishing my thesis as the time when Erin stopped being my little sister and became my

friend. Marissa, for having the courage to be yourself. I am so proud of your accomplishments.

And Candy, for being the consummate enduring friend. My Mom deserves a special

acknowledgement. Her constant support and pride, even in my smallest accomplishments, has

motivated me throughout.

Finally, I would like to thank Misha. I do not have the words to describe to what extent this is

all not possible without you. ‘Here’s to being speechless and those who make us so.’

Bali, Indonesia

August 28, 2004

Table of Contents

Acknowledgements ..................................................................................................................... i

Table of Contents ...................................................................................................................... iii

List of Figures .......................................................................................................................... vii

List of Tables.............................................................................................................................. x

1 Introduction ............................................................................................................................. 1

2 Background ............................................................................................................................. 7

2.1 Introduction ...................................................................................................................... 7

2.2 Synchronization................................................................................................................ 8

2.3 Metrics for Comparing Algorithms................................................................................ 13

2.4 Wireless Channel Models............................................................................................... 16

3 Evaluation and Exploration Environment ............................................................................. 22

3.1 Introduction .................................................................................................................... 22

3.2 Simulation and HDL Description of Algorithms ........................................................... 24

3.3 Gate-level Power Estimation.......................................................................................... 26

3.4 Analog to Digital Converter Power Estimation ............................................................. 32

3.5 System Power Estimation Tool ...................................................................................... 33

3.6 Conclusion...................................................................................................................... 34

4 PNII System .......................................................................................................................... 35

4.1 Introduction .................................................................................................................... 35

4.2 System Details................................................................................................................ 36

4.3 Synchronization System................................................................................................. 37

4.3.1 Timing Recovery..................................................................................................... 38

4.3.2 Coarse Timing Estimation ...................................................... 39

4.3.3 Fine Timing and Frequency Estimation .................................................................. 40

4.3.4 Frequency Correction and Timing Tracking ........................................................... 41

4.3.5 Phase Estimation and Correction ............................................................................ 42

4.3.6 Synchronization System Performance..................................................................... 43

4.4 Results and Conclusion .................................................................................................. 44

5 Frequency Estimation............................................................................................................ 46

5.1 Introduction .................................................................................................................... 46

5.2 Frequency Estimation Algorithms ................................................................................. 47

5.3 Power Estimation Methodology..................................................................................... 52

5.4 Algorithm Comparison and Results ............................................................................... 53

5.5 Conclusion...................................................................................................................... 57

5.6 Postscript: Application to DSSS Systems ...................................................................... 57

6 PNII System Refinement....................................................................................................... 60

6.1 Introduction .................................................................................................................... 60

6.2 Frequency Estimation Refinement ................................................................................. 61

6.3 Frequency and Phase Estimation Redesign.................................................................... 63

6.3.1 Differential Modulation Penalty.............................................................................. 64

6.3.2 Phase Error vs. SNR Degradation ........................................................................... 65

6.3.3 Feed-Forward Phase Estimation.............................................................................. 69

6.3.4 Frequency and Phase Estimation Redesign............................................................. 70

6.3.5 System Results ........................................................................................................ 74

7 Interpolation .......................................................................................................................... 77

7.1 Introduction .................................................................................................................... 77

7.2 Interpolation Background............................................................................................... 78

7.3 Farrow Interpolator Exploration..................................................................................... 81

7.4 Achieving the Timing Resolution Specification ............................................................ 86

7.5 Achieving the Output SNR Specification ...................................................................... 88

7.6 Conclusion...................................................................................................................... 90

7.7 Postscript: Interpolator Hardware Implementation Specifics......................................... 91

8 PN3 System Design............................................................................................................... 92

8.1 Simplification of Synchronization ................................................................................. 93

8.2 Analog vs. Digital Implementation ................................................................................ 96

8.3 Matched Filtering ........................................................................................................... 97

8.4 Amplitude Estimation .................................................................................................... 98

8.5 Timing Estimation.......................................................................................................... 99

8.6 Digital Synchronization Scheme Summary ................................................................. 102

8.7 Analog Synchronization Scheme Summary................................................................. 103

8.8 Comparison of Synchronization Schemes.................................................................... 104

8.9 Conclusion and Future Work ....................................................................................... 106

8.10 Postscript: Simulation Environment........................................................................... 107

9 Conclusion and Future Work .............................................................................................. 109

A Power Estimation Scripts ................................................................................................... 112

A.1 Makefile ...................................................................................................................... 112

A.2 Netlist Script................................................................................................................ 115

A.3 Reporting Script .......................................................................................................... 117

A.4 Testbench .................................................................................................................... 118

A.5 Simulate Script ............................................................................................................ 120

A.6 Synthesis Script ........................................................................................................... 121

List of Figures

Figure 1-1: Area and power of digital synchronization functions as a portion of PHY layer. a,b)

Bluetooth c) PNII d) 802.11a .................................................................................................... 2

Figure 2-1: Illustration of a typical communication system............................................................ 8

Figure 2-2: Realistic communication system includes synchronization system ............................. 9

Figure 2-3: Illustration of timing error ............................................................................................ 9

Figure 2-4: Illustration of frequency error..................................................................................... 10

Figure 2-5: Feed-forward vs. feedback estimation........................................................................ 11

Figure 2-6: Synchronization algorithm classification. Highlighted blocks are those addressed in

this thesis.................................................................................................................................. 13

Figure 3-1: Flow diagram of tools used in this thesis.................................................................... 23

Figure 3-2: Example synchronization system in Simulink............................................................ 24

Figure 3-3: Estimation accuracy requirements.............................................................................. 28

Figure 3-4: Accurate gate-level power estimation flow ................................................................ 31

Figure 3-5: Proposed power estimation method comes within 15% of the EP method for a wide

range of block sizes. ................................................................................................................. 32

Figure 4-1: PNII system block diagram. ....................................................................................... 37

Figure 4-2: Flow diagram of the PNII synchronization system .................................................... 38

Figure 4-3: Coarse timing block diagram...................................................................................... 40

Figure 4-4: Joint frequency and fine timing estimation ................................................................ 40

Figure 4-5: Power loss in correlator with frequency offset ........................................................... 41

Figure 5-1: Meyr and Kay weighted and unweighted performance. ............................................. 49

Figure 5-2: Meyr weighted D = {1, 2} performance..................................................................... 51

Figure 5-3: Block diagram of the weighted Kay estimator ........................................................... 51

Figure 5-4: Block diagram of the weighted Meyr estimator ......................................................... 52

Figure 5-5: Meyr weighted vs. unweighted comparison ............................................................... 55

Figure 5-6: Kay weighted vs. unweighted comparison ................................................................. 55

Figure 5-7: Meyr vs. Kay weighted comparison ........................................................................... 56

Figure 5-8: Meyr weighted D = 1 vs. D = 2 comparison............................................................... 56

Figure 5-9: Variance of frequency estimation applied to chips versus symbols for 802.11b-like

symbols .................................................................................................................................... 58

Figure 6-1: Convergence time of different frequency estimators.................................................. 62

Figure 6-2: BER of coherent and differential QPSK, BPSK......................................................... 65

Figure 6-3: QPSK BER with Gaussian and fixed phase errors ..................................................... 67

Figure 6-4: QPSK BER with Gaussian phase error....................................................................... 68

Figure 6-5: BER vs. SNR with uniform phase error in the range of [0..lim] ................................ 69

Figure 6-6: Phase estimation variance vs. L for different SNRs ................................................... 70

Figure 6-7: System power consumption for different schemes ..................................................... 76

Figure 7-1: Block diagram of the Farrow interpolator. ................................................................. 80

Figure 7-2: Tap error (dB) vs. (N, M) and WT ............................................................................... 84

Figure 7-3: Interpolation performance for timing resolution of 1/16 ............................................ 86

Figure 7-4: Interpolation performance for timing resolution of 1/64 ............................................ 87

Figure 7-5: Interpolation performance for timing resolution of 1/1024 ........................................ 87

Figure 7-6: Interpolator performance for Wµ = 2.......................................................................... 89

Figure 7-7: Interpolator performance for Wµ = 4.......................................................................... 89

Figure 7-8: Interpolator performance for Wµ = 8.......................................................................... 90

Figure 8-1: Digital (a) and analog (b) synchronization header structure..................................... 102

Figure 8-2: Performance breakdown of the digital synchronization scheme .............................. 103

Figure 8-3: Energy-per-useful-bit vs. packet length of analog and digital schemes ................... 104

Figure 8-4: Energy savings of 0-bit and 9-bit headers vs. 18-bit headers ................................... 106

Figure 8-5: Digital algorithm high level simulation and digital synchronization block.............. 107

Figure 8-6: Simulation results of the digital synchronization system timing correlator ............. 108

List of Tables

Table 2-1: SNR degradation due to carrier phase and timing errors for PSK and QAM modulation

.................................................................................................................................................. 16

Table 2-2: Average path loss parameters for an indoor office environment at 2 GHz [ITU] ....... 18

Table 2-3: R.m.s. delay spread for 2 GHz indoor office environment [ITU]................................ 19

Table 2-4: JTC indoor office environment channel models [JTC]................................................ 20

Table 4-1: Implementation losses in PNII synchronization and detection .................................... 43

Table 4-2: BBP statistics ............................................................................................................... 44

Table 4-3: Physical layer receiver power consumption................................................................. 44

Table 6-1: New and old frequency estimation methods ................................................................ 63

Table 6-2: Frequency/phase estimation methods to be considered ............................................... 72

Table 6-3: Comparison of different Frequency/Phase Estimation Schemes ................................. 74

Table 6-4: Parameters used in system exploration ........................................................................ 74

Table 7-1: MMSE coefficients for λ = 4, (N, M) = (2, 2).............................................................. 81

Table 7-2: Vo coefficients for λ = 4, (N, M) = (2, 2) .................................................................... 81

Table 8-1: Summary of synchronization requirements for the PN3 system.................................. 96

Table 8-2: Target synchronization implementation losses ............................................................ 97

1 Introduction
“Assuming perfect synchronization…”

- Arbitrary Communication Text

Synchronization is an increasingly important component in a wireless communication device.

Synchronization performance is critical to system performance, and it is where a large amount of design time, receiver area, and power is spent. There are few places where the whole topic of

synchronization is covered. In fact, in most texts on digital communication, the topic of

synchronization is examined very briefly if at all. Further, very few sources examine the

implementation costs of synchronization, especially the power consumption. This research shows

that significant system power savings can be realized through systematic exploration of

synchronization power consumption.

Synchronization is a significant component of wireless communication devices. Figure 1-1

highlights the area and power attributed to digital synchronization functions in three commercial

radio chips ((a) [KOK], (b) [CHA], and (d) [THO]) and one academic radio from this work (c). It

is shown that synchronization can consume up to 45% of the physical layer area.

Figure 1-1: Area and power of digital synchronization functions as a portion of PHY layer.
a,b) Bluetooth c) PNII d) 802.11a

The synchronization system typically has the highest clock rates and duty cycles of all digital

blocks. This, coupled with the large area, indicates that power consumption of synchronization

blocks is a significant component of physical layer power. Indeed, as will be shown in Chapter 4,

despite efforts to reduce power, the synchronization system still consumed 18% of the physical

layer power.

Not only is synchronization important, but the relevance is increasing due to four factors:

1) Decreased transmit distances use lower transmit power and, therefore, receiver power

begins to dominate.

2) The wireless channel is more frequency selective at higher transmission speeds, which requires increased synchronization functionality.

3) Trends toward higher bandwidth efficiency move modulation to higher-order constellations, where synchronization specifications are tighter.

4) The push for integration moves RF functionality to digital CMOS processes with low supply voltages, forcing the synchronization system to contend with more front-end nonidealities.

There are few authoritative sources where the whole topic of synchronization is addressed as a

cohesive unit. The seminal volumes by Meyr [MEY] are a noted exception. Further, very little

work on synchronization systematically considers implementation issues (again, the Meyr

volumes are an exception). Often, existing research stops at approximate complexity bounds.

This is to be expected. Development of synchronization algorithms requires a deep

understanding of communication and estimation theory. It is rare that someone with these skills

also has a deep understanding of circuit implementation technologies.

Most important of all, nowhere to the author’s knowledge is the power consumption of

different algorithms systematically compared. However, power consumption is one of the most

critical factors in the design of untethered wireless devices. Most notably, in the emerging field

of wireless sensor networks, power consumption is the most important factor [RAB2].

The topic of synchronization power consumption is too large to be solved in one dissertation.

Rather, this dissertation sets up a framework for the systematic exploration of power consumption

in synchronization systems, applies this framework to a few representative problems, and uses

some system examples to show the impact of this type of exploration. The two wireless

communication systems considered here are non-standards based systems called PNII and PN3.

PNII is a 1.6 Mbps personal area network system designed to carry voice over short distances

(10-30 m) for wireless intercom type applications [AMM]. PN3 is a 50 Kbps system designed for

use in wireless sensor network applications [SHE].

While the main focus of this work is on power consumption, it is not the only significant

metric. Circuit area, convergence time, and component cost are also important. Indeed, these

metrics are not orthogonal; often they are intricately linked. Therefore, it would be simplistic to

consider power consumption in isolation from the other criteria. Certainly, the framework

developed in this thesis is applicable to these other metrics. Wherever power consumption is

considered in this work, the effect on other metrics is noted. Sometimes gains in one metric must be traded for losses in another; sometimes both can be simultaneously improved.

Specific contributions of this thesis are:

• Definition of a framework for the systematic exploration of power consumption

in synchronization systems.

• Development of a fast and accurate method for power estimation that comes within 15% of the best available method and runs over 50 times faster. This is an

enabling step in being able to systematically characterize synchronization power

consumption over a meaningfully sized parameter space.

• A systematic exploration of feed-forward data-aided frequency estimation

algorithms that resulted in the development of straightforward rules for which

algorithm to choose for a given system specification. Simultaneous reductions in

energy consumption and convergence time of more than a factor of 4 are possible

in some scenarios.

• A systematic exploration of the Farrow-style interpolating filter, which is critical to the future systematic exploration of most timing recovery algorithms. Joint optimization of interpolation and ADC power consumption illustrates the ability

of this framework to lower system power consumption, not just block power

consumption.

• Application of the frequency estimation exploration results to reduce the energy

consumption of the frequency estimation unit of the PNII system by 84% and the

convergence time by 50%.

• Systematic comparison of 4 different phase and frequency synchronization

methods for the PNII system, including consideration of differential versus coherent

modulation schemes. Synchronization energy consumption was reduced by 66%

resulting in a system energy consumption reduction of 7% for coherent schemes,

and it was determined at what packet lengths it makes sense to move to

differential modulation. This is the first instance, to the author’s knowledge, in

which differential versus coherent modulation was systematically evaluated from

a system power consumption standpoint.

• Using the framework developed here, a synchronization system was designed for

a wireless sensor network application that consumes 300 µW (including ADC

power). This is low enough so that further reduction has very little impact on

total system energy consumption.
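As a concrete illustration of the class of feed-forward, data-aided frequency estimators explored in Chapter 5, the sketch below implements a Kay-style weighted estimator: the phase increments between consecutive complex baseband samples are averaged under a parabolic weighting window whose weights sum to one. This is an illustrative sketch only, not the implementation developed in this work; the function name `kay_frequency_estimate` and the NumPy realization are assumptions made here for exposition.

```python
import numpy as np

def kay_frequency_estimate(z, T=1.0):
    """Feed-forward estimate of the frequency of a single complex tone.

    z : complex baseband samples of a (possibly noisy) tone, length N
    T : sample period
    Returns the estimated frequency in cycles per unit time.
    """
    N = len(z)
    t = np.arange(N - 1)
    # Kay's parabolic smoothing window; the weights sum to one.
    w = 1.5 * N / (N**2 - 1) * (1.0 - ((t - (N / 2 - 1)) / (N / 2)) ** 2)
    # Per-sample phase increments between consecutive samples.
    dphi = np.angle(z[1:] * np.conj(z[:-1]))
    return float(np.sum(w * dphi)) / (2.0 * np.pi * T)
```

For a clean complex exponential the estimate is exact; in noise, the parabolic weighting reduces the estimator variance relative to a uniform average of the phase increments.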
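Similarly, the interpolation exploration of Chapter 7 is built around the Farrow structure, in which a small set of fixed FIR branches produces the coefficients of a polynomial in the fractional delay µ, and µ enters only through a final Horner evaluation. The sketch below shows a cubic Lagrange instance; it is illustrative only (the tap matrix is the standard cubic Lagrange expansion, and the names `FARROW_TAPS` and `farrow_interpolate` are hypothetical, not taken from this work).

```python
import numpy as np

# Fixed FIR branch taps of a cubic Lagrange Farrow interpolator.
# Row k maps the samples x[n-1], x[n], x[n+1], x[n+2] to the
# coefficient of mu**(3-k).
FARROW_TAPS = np.array([
    [-1/6,  1/2, -1/2,  1/6],   # coefficient of mu**3
    [ 1/2, -1.0,  1/2,  0.0],   # coefficient of mu**2
    [-1/3, -1/2,  1.0, -1/6],   # coefficient of mu**1
    [ 0.0,  1.0,  0.0,  0.0],   # coefficient of mu**0 (just x[n])
])

def farrow_interpolate(x, n, mu):
    """Interpolate x at fractional time n + mu, with 0 <= mu < 1."""
    c3, c2, c1, c0 = FARROW_TAPS @ x[n - 1:n + 3]
    # Only this Horner evaluation depends on mu.
    return ((c3 * mu + c2) * mu + c1) * mu + c0
```

Because µ appears only in the Horner step, a timing-recovery loop can update the fractional delay on every output sample without recomputing any filter taps, which is what makes the structure attractive for low-power exploration.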

While full characterization of the synchronization space is not completed in this work,

following it through to completion is a worthwhile goal. Results of this work show that this type

of exploration has meaningful impact on system performance. Completion of this research will

have a few fundamental ramifications. First, it will instruct proper algorithm selection for given

synchronization parameters. Second, it will illustrate which synchronization parameters are the

most difficult to estimate in terms of power consumption or convergence time. Third, it will

highlight areas where existing algorithms are inefficient. These answers most likely change in

different channel environments and over different modulation schemes and data rates. This

information can highlight promising areas for new algorithm development. It can assist in

producing the most efficient implementation for existing wireless communication standards. And

finally, it can assist in the creation of new wireless communication standards to meet the quality

of service goals with the lowest power or lowest synchronization overhead.

The remainder of this dissertation is as follows: Chapter 2 details the background information

necessary to understand this work including defining a classification of synchronization

algorithms, and the metrics on which synchronization algorithms are evaluated. Chapter 3

describes the tools used for system simulation and analysis, implementation, and power

consumption estimation. Chapter 4 describes the PNII system, shown in Figure 1-1c, to motivate

the necessity of this research and to provide a system example for illustrating the improvements

possible with this research. Chapters 5 and 7 delve into systematic exploration of the power

consumption of specific classes of synchronization algorithms (frequency estimation and timing

recovery, respectively). These serve as examples of how algorithm exploration should be

conducted, and what information is necessary for these results to be used in a system design.

Chapters 6 and 8 move back up to the system level to apply the techniques developed here to show

the significance of the results at the system level. First, in Chapter 6, the results of Chapter 5 are

used to improve the PNII system described in Chapter 4. Second, in Chapter 8, the framework is

applied to a wireless sensor network system (where power is the primary concern) to reduce the

power consumption of the synchronization system and show the impact on the system power

consumption.

2 Background
2.1 Introduction

This chapter details all the background information required to understand this thesis. For

some topics the reader is referred to canonical references. Topics are more fully described here

when canonical sources don’t exist, the information is used in a unique way for this work, or the

information is deemed too fundamental to this work to be omitted.

It is assumed that the reader is familiar with basic digital communication theory to the extent

described in the text by Proakis [PRO]. In particular, familiarity with the standard modulation

schemes such as OOK, M-PSK, M-QAM, and DSSS is required. The concept of theoretical

bounds on BER versus SNR for different modulation schemes is assumed known; however,

specific bounds are reiterated when used. The reader is expected to be familiar with the use of

transmit filters such as the root-raised-cosine (RRC). Basic channel concepts, such as multipath

and frequency-selective vs. frequency-flat fading, and the basic techniques used to combat these

effects, such as AGC and equalizers, are assumed. Familiarity with the basic network protocol

stack (especially physical, data-link, network, and application layers) including basics of media

access control (MAC) is also helpful [ISO].

It is assumed that the reader is familiar with basic low power digital design principles to the

extent described in [RAB]. While no esoteric low power circuit techniques are used in this thesis,

these techniques can be applied orthogonally to these algorithms for further power reduction. It is

assumed that designers make use of the standard low power techniques available as built-in

functionality to industry standard tools, specifically, parallelization, optimizing out fixed

parameters from logic, gated clocks, low-leakage standard cell libraries, and using the lowest

supply voltage required for correct circuit operation.

The remainder of this chapter sets out to describe three other pieces of background

information. First, synchronization is described within the context used in this thesis. Second,

the metrics for comparing different synchronization algorithms are discussed. Last, the indoor

wireless channel model used for many examples throughout the thesis is defined.

2.2 Synchronization

A canonical communication system (Figure 2-1) typically considers the source and channel

coders (classified as outer receiver functionality), and some channel that perturbs symbols.

However, a realistic communication system (Figure 2-2) also considers what is called the inner

receiver consisting of the modulator, a waveform channel (one that perturbs transmitted

waveforms, not the more simplistic one that just perturbs symbols), and the synchronization

system in the receiver.

Figure 2-1: Illustration of a typical communication system

Figure 2-2: Realistic communication system includes synchronization system

The four salient synchronization parameters are timing (θε), phase (θφ), frequency (θΩ), and

amplitude (θΑ) (some of which may include multipath effects). Timing errors occur because of

the small mismatches in the transmitter and receiver oscillators and from the unknown time of

flight between transmitter and receiver (Figure 2-3).

Figure 2-3: Illustration of timing error

Phase errors occur because of mismatches in the transmitter and receiver carrier references

and from the unknown time of flight between the transmitter and receiver. In multipath channels,

each multipath arrival has a different time of flight, and therefore a different phase error to be

estimated.

Amplitude errors arise mostly from attenuation in the channel, with additional contributions

from mismatches in the transmitter and receiver front-end gain stages. As with phase errors, in

multipath channels, each multipath arrival takes a different path through the channel and therefore

has a different attenuation.

Frequency errors, more correctly termed carrier frequency errors, are caused by a frequency

mismatch in the transmitter and receiver carrier references (Figure 2-4). Frequency errors show

up as a rotating phase error in the received signal. While it is possible to lump frequency errors

in with phase errors, most systems correct for frequency separately from phase, and therefore, it is

classified as a separate synchronization parameter.

Figure 2-4: Illustration of frequency error

With all four parameters, there is a notion of the rate of change being either slowly-varying or

static. What is important is whether the parameter varies enough to matter over the observation

interval. If not, it can be treated as static for the purposes of synchronization. While static

parameters can be estimated once and that estimate used for the interval over which the parameter

is deemed to be static, varying parameters need to be either continually re-estimated or

continuously tracked. Of course, re-estimating or tracking parameters costs more power and area

(and potentially more synchronization preamble bits) than estimating static parameters once.

Sometimes system design can be used to reduce the number of varying parameters, and therefore

the power consumption of the synchronization system. One instance of this is using clock

references with tighter specifications so the variation over, say, one packet is negligible. Here is

where the channel model (including the variation of clock references, and front-end components)

is critical for specifying the required functionality of the synchronization system.

Estimation algorithms can be classified according to their type along two axes: the

configuration of the estimator and parameter adjustment blocks, and what additional information

is used to achieve the estimation. There are two configurations for the estimation and parameter

adjustment blocks: feed-forward (FF) and feed-back (FB) (Figure 2-5). In FF systems, the

estimator receives the input signal and computes the parameter estimate which is fed to the

parameter adjustment block. In FB systems, the estimator receives the output of the parameter

adjustment block and computes an error, which is fed back to the parameter adjustment block.

Figure 2-5: Feed-forward vs. feedback estimation

There are three categories of what additional information is used to achieve the estimation:

non-data-aided (NDA), data-aided (DA), and decision-directed (DD). When no additional

information other than the input signal is used, the estimation is termed non-data-aided. When

known data symbols are sent (such as within a synchronization header, or pilot symbols

interspersed with the data), and these known data symbols are used to help the estimation, it is

called data-aided estimation. When no known data is sent, but detected symbols are used in the

place of known-data symbols, the estimation is called decision-directed. Non-data-aided and

data-aided estimation can be performed in a feedback or feed-forward configuration; however,

since detected symbols can only be known after parameter adjustment has been made, decision-

directed estimation can only be performed in a feedback configuration.

Altogether, there are 20 different algorithm classifications (4 parameters × 5 estimation

types). Each classification can contain tens of algorithms that have been proposed in the

literature in addition to any new algorithms that are developed in the future. This thesis addresses

8 of these classifications in varying degrees (Figure 2-6). First, Chapter 5 performs a complete

exploration of 4 different feed-forward data-aided frequency estimation algorithms. The results

of this exploration are twofold. First, it is determined which among these four algorithms

achieves the lowest power for a given input SNR and variance requirement. Second, absolute

numbers for power consumption and convergence time are determined which allow these

algorithms to be evaluated in a system-level framework. Chapter 5 serves as a model for how

these comparisons should be conducted and the results that are needed to allow a system level

designer to make use of the information.

The component exploration in Chapter 5 is continued in Chapter 7. A major component of

most timing recovery algorithms is a timing interpolator that performs the parameter

adjustment. Therefore, a study of timing recovery algorithms relies on accurate power

consumption estimates of interpolators of various sizes and performance. Chapter 7 performs a

thorough study of the commonly used Farrow type of interpolator over a wide range of

parameters. The results of this work can be used to conduct the study of timing recovery

algorithms of all types.

The other three chapters explore entire synchronization systems rather than just a single block.

Within these chapters, several types of synchronization algorithms are used. In Chapter 4, timing

is performed in two steps. The coarse estimation is done with a feed-forward data-aided

algorithm. The fine timing estimation is done jointly with the frequency estimation and uses a

different feed-forward data-aided algorithm. Timing tracking is done with a non-data-aided feed-

forward algorithm. Phase acquisition is performed using a data-aided feed-back algorithm, and

phase tracking is done with a feed-back decision-directed algorithm. In Chapter 6, different

frequency and phase estimation methods are compared. In addition to the method used in

Chapter 4, a feed-forward non-data-aided phase estimation method is explored for initial

estimation and tracking. In Chapter 8 a feed-forward data-aided timing recovery method is

compared to a feed-back data-aided method. In addition, a feed-forward non-data-aided

algorithm is used for amplitude estimation.

Figure 2-6: Synchronization algorithm classification. Highlighted blocks are those
addressed in this thesis.

2.3 Metrics for Comparing Algorithms

Systems in this thesis are compared on a cost vs. performance basis. For synchronization

algorithms, cost is a multi-faceted metric. Three interrelated components are usually considered:

power consumption, area, and component cost. Area and component cost are usually inextricably

tied because each square millimeter of silicon area costs more money. However, area also

determines how small the package, and potentially the ultimate system, can be. Component

cost also includes the cost of external components, such as off-chip filters and crystal oscillators

(whose cost scales with required accuracy). Power consumption affects size and component cost

through the size of the battery and of any cooling mechanisms needed to dissipate the generated heat. Power

consumption also affects quality of service, in that the device may need the batteries recharged

more often.

Variance and convergence time are the main metrics used to measure the performance of a

synchronization algorithm. Specifically, the variance measured is that of the parameter estimate

produced (assuming the estimation is unbiased). If the estimation is biased, MMSE may be a

more appropriate metric. Convergence time is the number of symbols required to achieve that

variance. Bounds, called the Cramer-Rao bounds (CRB), are available to determine what

variance is theoretically possible for different synchronization parameters given the input SNR

and convergence time. The actual Cramer-Rao bounds, especially for timing estimation, depend

on the actual received waveform, so are dependent on modulation rate among other things, and

can be difficult to calculate exactly. Approximations are available, called modified Cramer-Rao

bounds (MCRB), given in [DAN] for phase and frequency.

MCRB(φ) = 1 / (2N·(Es/N0))                                        (2-1)

MCRB(Ω) = 6 / (N·(N² − 1)·(Es/N0))                                (2-2)

where N is the estimation length (or convergence time) and Es/N0 is the signal-to-noise ratio per
symbol.

Tighter bounds are given in [TAV] for M-PSK signals, but are more difficult to compute.

There are algorithms for phase and frequency estimation that are known to achieve the CRB at

high SNR. The CRB for timing is given in Meyr [MEY] under some realistic simplifying

assumptions: 1) independent noise samples, 2) signal pulse shape, g(t), is real, and 3) random

data.

CRB(ε) = [1 / (2N·(Es/N0))] · [ ∫|G(ω)|² dω / (T² ∫ ω²·|G(ω)|² dω) ]      (2-3)

where both integrals run over ω from −∞ to +∞.

Of course, the SNR gives a direct measure of the amplitude variance for one symbol. Therefore,

CRB(A) = 1 / (N·(Es/N0)).
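As an illustration, the bounds above are easy to evaluate numerically. A minimal sketch, with Es/N0 supplied in linear units (not dB) and function names chosen here for illustration:

```python
import math

def mcrb_phase(n, es_n0):
    """Modified Cramer-Rao bound on phase-estimate variance, eq. (2-1)."""
    return 1.0 / (2.0 * n * es_n0)

def mcrb_freq(n, es_n0):
    """Modified Cramer-Rao bound on frequency-estimate variance, eq. (2-2)."""
    return 6.0 / (n * (n ** 2 - 1) * es_n0)

def crb_amplitude(n, es_n0):
    """Amplitude bound: averaging over n symbols at the given symbol SNR."""
    return 1.0 / (n * es_n0)

# Example: a 64-symbol estimate at Es/N0 = 10 dB.
es_n0 = 10 ** (10.0 / 10.0)   # 10 dB -> linear
print(mcrb_phase(64, es_n0))
print(mcrb_freq(64, es_n0))
print(crb_amplitude(64, es_n0))
```

Note how the frequency bound falls off roughly as 1/N³ while the phase and amplitude bounds fall off as 1/N, which is why longer observation intervals help frequency estimation disproportionately.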

Next, the block-level metrics of variance and convergence time are translated to system-level

metrics. Convergence time is the easiest, since the convergence time for all synchronization

blocks can be summed to get the total convergence time (assuming no synchronization blocks

operate in parallel). However, translating the different variances for each synchronization

parameter into a global system specification is more difficult.

The official goal of the inner receiver system, as defined by Meyr [MEY], is to produce output

Y such that the outer receiver performance is as close as possible to the case where the estimated

values are equal to the actual values, i.e.

{θ̂ε, θ̂φ, θ̂A, θ̂Ω} = {θε, θφ, θA, θΩ}.                              (2-4)

This combined effect cannot be evaluated until the entire system is designed and simulated

together because it includes interactions between the synchronization parameters and the coding

used in the outer receiver. For this reason, it is impossible to partition the specification separately amongst the

different synchronization blocks. Instead, the SNR margin metric is used in practice. Typically,

a communication system will specify a data rate and uncoded BER requirement. The input SNR

to the inner receiver will contain some margin over the theoretical SNR required to achieve this

BER. This SNR margin is typically how synchronization systems are specified and evaluated.

The total SNR margin is usually divided amongst the synchronization blocks using designer

experience to get an initial partitioning, and iterating once preliminary design of the different

synchronization blocks is completed. This ad-hoc process is not guaranteed to achieve the

optimal system design, but it is the method available using current information. The results of a

complete exploration of the synchronization space would allow this process to be deterministic

and achieve the optimal design. However, such a complete exploration is currently prohibitive for any practical

system.

Formulas that compute the SNR degradation versus the variance of different synchronization

algorithms are used to convert between the algorithm metric (variance) and the metric for

the system (SNR loss). The BER degradation due to amplitude errors is easy to calculate since it can be

directly tied to SNR. The BER degradation for timing and phase errors is more difficult and is

treated in [MEY] for M-PSK, M-PAM, and M2-QAM modulation. [MEY] gives approximations

to the degradation, D (measured in dB) defined as the increase in Es/N0 required to maintain the

same BER as the receiver without synchronization errors. The approximation,

D = (10 / ln 10) · (A + 2B·(Es/N0)) · var[ψ]   (dB),                      (2-5)

is strictly valid for BER degradations < 0.2 dB (but remains reasonably accurate in most scenarios for D < 1

dB). Table 2-1 gives the parameters A and B for degradation due to carrier phase and timing

errors for M-PSK, M-PAM, and M2-QAM.

Table 2-1: SNR degradation due to carrier phase and timing errors for PSK and QAM
modulation

                     Carrier Phase Errors         Timing Errors
                     A      B                     A               B
M-PSK (M = 2)        1      cos²(π/M)             −h″(0)·T²       Σm (h′(mT)·T)²
M-PSK (M > 2)        1      cos²(π/M)             −h″(0)·T²       ½ Σm (h′(mT)·T)²
M-PAM                1      0                     −h″(0)·T²       Σm (h′(mT)·T)²
M²-QAM               1      ½                     −h″(0)·T²       ½ Σm (h′(mT)·T)²

The quantity, A, accounts for a reduction in the useful signal. The quantity, B, accounts for an

increase in the variance at the input of the decision device. Observe that the degradation due to

timing errors is dependent on the transmit pulse shape, h(t).
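To make the use of eq. (2-5) and Table 2-1 concrete, the sketch below evaluates the degradation for an M-PSK carrier phase error. The function name is illustrative; the A and B values are taken from the M-PSK carrier-phase row of the table, and the small-degradation validity limit is left to the caller:

```python
import math

def snr_degradation_db(a, b, es_n0, var_psi):
    """SNR degradation D in dB from eq. (2-5); valid only for small degradations."""
    return (10.0 / math.log(10.0)) * (a + 2.0 * b * es_n0) * var_psi

# Example: QPSK (M = 4) with a phase-error variance of 1e-3 rad^2 at Es/N0 = 10 dB.
M = 4
es_n0 = 10 ** (10.0 / 10.0)                   # convert 10 dB to linear
a, b = 1.0, math.cos(math.pi / M) ** 2        # M-PSK carrier phase entries, Table 2-1
d = snr_degradation_db(a, b, es_n0, 1e-3)
print(round(d, 3))                            # roughly 0.05 dB of margin consumed
```

The B·Es/N0 term shows why the same estimator variance costs more margin at higher operating SNR: the variance increase at the decision device scales with signal power.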

2.4 Wireless Channel Models

The simplest channel is the additive white Gaussian noise (AWGN) channel where noise with

a Gaussian distribution of zero mean and variance σ² is added to symbols in the channel. This

channel is often used when exploring outer receiver functionality. To explore inner receiver

functionality, a more complicated channel must be considered. This channel model must include

the effects of the transmitter and receiver front-ends in addition to the effects of the channel (over

the air).

Effects of the transmitter and receiver local oscillators and carrier references can be modeled

in a straightforward manner using a single offset that is the sum of the errors in both the

transmitter and receiver. To model timing offset in simulation, an interpolation filter can be used.

To model carrier frequency offset, the modulated waveform is multiplied by a rotating phasor.

The clock accuracy (specified in parts per million or ppm) is an important parameter because

it determines how often the timing needs to be re-estimated. If the required timing estimation

resolution is εT, where T is the symbol period, clocks can drift up to ½ εT over the course of the

packet before needing to be re-estimated. If the crystal accuracy (in ppm) is lower than 10⁶·ε/(2N),

where N is the maximum number of symbols in a packet and ε is the fractional timing resolution

requirement, then no timing tracking is needed. A similar equation can be used for determining

whether the frequency estimation is essentially static over the course of the packet.
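This rule of thumb is easy to encode. A minimal sketch, following the discussion above (the combined transmitter-plus-receiver accuracy should be used for the ppm figure; names are illustrative):

```python
def timing_tracking_needed(ppm, eps, n_symbols):
    """True if worst-case clock drift over a packet exceeds half the resolution eps*T.

    ppm:       combined crystal accuracy in parts per million
    eps:       fractional timing resolution requirement (fraction of T)
    n_symbols: maximum number of symbols in a packet
    """
    drift_in_symbols = ppm * 1e-6 * n_symbols   # accumulated drift, in symbol periods
    return drift_in_symbols > eps / 2.0

# Example: 1/16-symbol resolution over 1000-symbol packets.
print(timing_tracking_needed(40, 1.0 / 16.0, 1000))   # 40 ppm drifts too far -> True
print(timing_tracking_needed(20, 1.0 / 16.0, 1000))   # 20 ppm stays within eps/2 -> False
```

For this example the break-even accuracy is 10⁶·ε/(2N) = 31.25 ppm, which is why a tighter (and more expensive) crystal can eliminate the timing-tracking hardware entirely.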

Amplitude and phase models are more complicated since they depend on a combination of

factors in the wireless environment. This work uses the approach given in [ITU], reproducing

here the general channel modeling equations. However, in the interest of brevity, only the actual

coefficients for a 2GHz indoor office channel are given because that is the one used in this thesis

wherever a channel model is required.

Path loss effects are divided into two effects: average path loss, and associated shadow fading

statistics. Average path loss is that loss that is common to all multipath arrivals and is given by

Ltotal = 20·log10(f) + P·log10(d) + Lf(n) − 28   dB                       (2-6)

where P is the distance power loss coefficient, f is the frequency (in MHz), d is the separation

distance in meters between the two terminals (d > 1 m), Lf is the floor penetration loss factor in

decibels and n is the number of floors in a multi-story building between the two terminals (only

included when n ≥ 1 ). Table 2-2 outlines the parameter values used for the indoor office

environment at 2GHz.

Table 2-2: Average path loss parameters for an indoor office environment at 2 GHz [ITU]

Parameter Value
P 30
f 2,000 MHz
d 1-100 m
Lf 15+4(n-1)

Paths with a line of sight (LOS) component are dominated by free-space loss and have P=20.

The indoor shadow fading statistics are log-normal with a standard deviation of 10dB for our

channel.
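Eq. (2-6) with the Table 2-2 parameters can be sketched as follows. The function name and default arguments are illustrative; P = 20 would be substituted for LOS paths:

```python
import math

def indoor_path_loss_db(f_mhz, d_m, p=30, n_floors=0):
    """Average indoor path loss of eq. (2-6) for the 2 GHz office parameters.

    f_mhz:    carrier frequency in MHz
    d_m:      transmitter-receiver separation in meters (d > 1 m)
    p:        distance power loss coefficient (30 for the NLOS office case)
    n_floors: floors between the terminals; the floor penetration loss
              Lf = 15 + 4*(n - 1) dB is only included when n >= 1
    """
    loss = 20.0 * math.log10(f_mhz) + p * math.log10(d_m) - 28.0
    if n_floors >= 1:
        loss += 15.0 + 4.0 * (n_floors - 1)
    return loss

# Example: 2 GHz link across 10 m on the same floor.
print(round(indoor_path_loss_db(2000, 10), 1))   # about 68 dB
```

Log-normal shadowing (10 dB standard deviation for this channel) would then be added on top of this average loss in a link-budget simulation.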

The radio propagation channel varies in time and with spatial displacement. Even in the static

case where the transmitter and receiver locations are fixed, the channel can be dynamic since

scatterers and reflectors are likely to be in motion. The term multipath arises from the fact that,

through reflection, diffraction, and scattering, radio waves can travel from a transmitter to a

receiver by multiple paths. There is a time delay associated with each of these paths that is

proportional to the path length. Each delayed signal has an associated amplitude (with real and

imaginary parts) and together they form a linear filter with time-varying characteristics. Since the

radio channel is linear, it is fully described by its impulse response. The impulse response is

usually represented as a power density that is a function of excess delay, relative to the first

detectable signal.

Although the r.m.s. delay spread is very widely used, it is not always a sufficient

characterization of the delay profile. However, if an exponentially decaying profile can be

assumed, it is sufficient to express the r.m.s. delay spread instead of the power delay profile. In

this case, the impulse response can be reconstructed approximately as:

h(t) = e^(−t/τ_rms)   for 0 ≤ t ≤ t_max,   and   h(t) = 0   otherwise       (2-7)

where τ_rms is the r.m.s. delay spread and t_max is the maximum delay (t_max >> τ_rms). Table 2-3

outlines the r.m.s. delay spreads used for the example channel. Within a given building, the delay

spread tends to increase as the distance between antennas increases.

Table 2-3: R.m.s. delay spread for 2 GHz indoor office environment [ITU]

          Low value                Median value             High value
          (appearing frequently)   (appearing frequently)   (appearing rarely)
τ_rms     35 ns                    100 ns                   460 ns

One way to model the statistical nature of the channel is to replace the many scattered paths

that may exist in a real channel with only N multipath components in the model. With this

method, a complex Gaussian time variant process gn(t) models the superposition of unresolved

multipath components arriving from different angles with different delays close to the delay τn of

the n-th multipath component. Then, the impulse response h(t) is given by:

h(t) = Σ_{n=1..N} p_n · g_n(t) · δ(t − τ_n),                              (2-8)

where pn is the received power of the n-th model multipath component.

The JTC channel models [JTC] give three different instantiations of the channel for

simulations of indoor office environments. Channel A is the least severe, Channel B is

intermediate, and Channel C is extremely severe. The coefficients for the model are given in

Table 2-4. Note the indoor channel models use a flat Doppler spectrum, whereas models for an

outdoor channel usually use the Jakes Doppler Spectrum [DEN] to determine the correlation in

channel coefficients over time.

Table 2-4: JTC indoor office environment channel models [JTC]

Channel A Channel B Channel C


Relative Average Relative Average Relative Average Doppler
Delay Power Delay Power Delay Power Spectrum
Tap (ns) (dB) (ns) (dB) (ns) (dB)
1 0 0 0 0 0 0 Flat
2 50 -3.6 50 -1.6 100 -0.9 Flat
3 100 -7.2 150 -4.7 150 -1.4 Flat
4 325 -10.1 500 -2.6 Flat
5 550 -17.1 550 -5.0 Flat
6 700 -21.7 1125 -1.2 Flat
7 1650 -10.0 Flat
8 2375 -21.7 Flat
τ_rms (ns)   43             116            598
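As one concrete use of eq. (2-8) and Table 2-4, the sketch below draws a random snapshot of the three Channel A taps. Each coefficient is a zero-mean complex Gaussian scaled to the tabulated average power; a full simulation would also correlate successive snapshots according to the flat Doppler spectrum, which this sketch deliberately omits:

```python
import math
import random

# JTC Channel A taps from Table 2-4: (relative delay in ns, average power in dB)
CHANNEL_A = [(0, 0.0), (50, -3.6), (100, -7.2)]

def draw_channel_a(rng=random):
    """One random realization of the Channel A impulse response, per eq. (2-8)."""
    taps = []
    for delay_ns, power_db in CHANNEL_A:
        # Per-axis standard deviation so E[|g|^2] equals the tabulated power.
        sigma = math.sqrt(10.0 ** (power_db / 10.0) / 2.0)
        g = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        taps.append((delay_ns, g))
    return taps

# Example: a reproducible channel draw.
rng = random.Random(1234)
print(draw_channel_a(rng))
```

Complex Gaussian taps give Rayleigh-distributed magnitudes, which is the usual NLOS assumption; a LOS tap would instead add a deterministic component (Ricean fading).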

The Doppler spectrum (whether Jakes or flat) is defined by the maximum Doppler frequency

shift in the channel given by:

f_Dmax = v · fc / c                                                       (2-9)

where fc is the carrier frequency, c is the speed of light, and v is the maximum speed of objects in

the channel (whether the transmitter, receiver, or scattering or reflecting elements in the channel).

For the 2 GHz indoor channel, 10 Hz is a common value for fDmax (translating to a speed of

around 6 km/h).

The maximum Doppler frequency is an important parameter because it dictates how quickly

the channel is changing and therefore whether the phase and amplitude synchronization

parameters are static or slowly varying. Specifically, 1/f_Dmax is the coherence time of the

channel, or the time at which channel estimates become uncorrelated with each other. Therefore,

if the estimate made at the start of the packet is to be, say, x% correlated with the last symbol in

the packet, the packet length (tpacket) must be shorter than

t_packet < (1 − x%) / f_Dmax.                                             (2-10)
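Eqs. (2-9) and (2-10) combine into a quick packet-length check; a minimal sketch with illustrative names:

```python
def max_packet_time_s(correlation, v_mps, fc_hz, c_mps=3.0e8):
    """Longest packet (in seconds) whose initial channel estimate stays
    `correlation`-correlated with the last symbol, per eqs. (2-9) and (2-10)."""
    f_dmax = v_mps * fc_hz / c_mps          # maximum Doppler shift, eq. (2-9)
    return (1.0 - correlation) / f_dmax     # packet-length limit, eq. (2-10)

# Example: 2 GHz carrier, objects moving at 1.5 m/s (f_Dmax = 10 Hz),
# requiring 90% correlation across the packet.
print(round(max_packet_time_s(0.9, 1.5, 2.0e9), 6))   # -> 0.01 (10 ms)
```

At a given symbol rate, this limit translates directly into a maximum packet length in symbols before phase and amplitude must be re-estimated or tracked.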

The r.m.s. delay spread is also an important parameter because it determines whether the

channel is frequency selective or frequency non-selective (flat). A synchronization system for a

frequency selective channel must combat multipath effects (for instance with the use of an

equalizer or RAKE receiver), but no such complexity is required for the flat channel.

Specifically, 1/τ_rms is the coherence bandwidth of the channel, or the frequency difference over

which channel estimates become uncorrelated with each other. Therefore, if the bandwidth of

the signal is less than 10% of the coherence bandwidth, we say the channel is flat and multipath

effects need not be considered. However, if the signal bandwidth is greater than 10% of the

coherence bandwidth, the channel is frequency-selective, and multipath effects must be taken into

account:

BW < 0.1·(1/τ_rms):  flat;     BW > 0.1·(1/τ_rms):  frequency selective.  (2-11)
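The classification of eq. (2-11) reduces to a one-line test, sketched here with the median Table 2-3 delay spread:

```python
def channel_is_flat(signal_bw_hz, tau_rms_s):
    """Eq. (2-11): flat iff the signal bandwidth is below 10% of the
    coherence bandwidth 1/tau_rms."""
    return signal_bw_hz < 0.1 * (1.0 / tau_rms_s)

# Example: median indoor office delay spread of 100 ns (coherence BW = 10 MHz,
# so the flat vs. frequency-selective threshold sits at 1 MHz).
print(channel_is_flat(500e3, 100e-9))   # 0.5 MHz signal -> True (flat)
print(channel_is_flat(5e6, 100e-9))     # 5 MHz signal -> False (frequency selective)
```

A design that lands on the frequency-selective side of this test must budget for an equalizer or RAKE receiver in the synchronization system, which is often the dominant complexity cost.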

3 Evaluation and Exploration Environment
3.1 Introduction

Simulation and implementation tools are an important component of this research. First, a

rich simulation environment for communication algorithms is required. Second, the ability to

move quickly from simulated algorithm to implementation is also desired. Lastly, two levels of

power estimation are needed. The first is to accurately estimate the absolute power consumption

of an algorithm. The second is to compare different synchronization systems in a framework that

considers total system power consumption. A flow diagram of the different tools used in this

research is shown in Figure 3-1. To compare two algorithms, only relative accuracy is

required. However, an absolutely accurate power estimate, though more difficult to

achieve, is necessary for use in the system framework, where the power consumption of

other components is included.

Figure 3-1: Flow diagram of tools used in this thesis

Packet-based communication systems often require re-synchronization with every packet.

Especially in ad-hoc networks where transmitters communicate with different receivers at

different times, the synchronization parameters are different every time and therefore cannot be

stored between packets. In this case, the synchronization convergence time can be a significant

portion of the packet length. The energy expended in the synchronization along with the energy

transmitting and receiving the synchronization header must be calculated into a system-level

metric. Different synchronization algorithms may take different amounts of time to converge to

the required accuracy. In this case, the algorithms must be compared in a system framework.

Higher power algorithms with shorter convergence time may be favored over lower power

algorithms with longer convergence times. In order for the designer to make the appropriate

trade-off, the power estimates must be absolutely accurate, and the power consumption of other

subsystems, such as the transmitter and receiver front-end power, must be known.

3.2 Simulation and HDL Description of Algorithms

Synchronization algorithm implementation costs (area and power) are often dominated by

datapath operations such as multipliers and adders, with relatively simple control requirements.

Mathworks Simulink [MAT] was chosen for algorithm simulation. It is a graphical data flow tool

with many provided library functions which make it easy to simulate and analyze communication

systems. An additional program, Stateflow [MAT], is integrated into Simulink to allow graphical

entry of state machines for programming control functions. An example of a Simulink

synchronization system simulation is shown in Figure 3-2.

Figure 3-2: Example synchronization system in Simulink

For hardware coding, Synopsys Module Compiler [SYN] was chosen as the entry point for the

datapath portions of the algorithms. Its high-level HDL language allows an algorithm to be

parameterized and later synthesized in different configurations. (For instance, it’s possible to

synthesize a frequency estimation algorithm for different input SNRs and estimation lengths.) It

is built to optimize datapath operations with features such as allowing adder implementations to

be easily customized between carry-save and ripple-carry. It is known to achieve better area for

datapath blocks than standard HDL synthesizers [HAI].

The use of Module Compiler enables re-use of many smaller modules within larger designs.

Some built-in functions in Module Compiler have facilitated easy implementation of

communication algorithms in this thesis:

• Various adder types (carry-save, ripple-carry, etc.)

• Various multiplier types (booth, signed/unsigned, +/- A*B, A*(B+C) where C is one

bit, pipelined/unpipelined, etc.)

• Square (special multiplier for two identical inputs)

• Scalar Multiply ACcumulate (MAC)

• Comparators/Muxes/Selectors

• Shift registers

A small library of the following parameterized blocks built on the basic blocks has served to

implement most blocks in this thesis:

• Filters (fixed and adaptive coefficients are automatically detected by Module

Compiler)

• CORDIC (a single parameterized CORDIC slice is arrayed in several configurations

to implement iterative/pipelined angle-finder/rotator blocks)

• Complex MAC

By creating a Simulink library of corresponding parameterized blocks, larger designs can be

implemented and simulated in Simulink with good assurance that they can be quickly translated

to the equivalent behavior in hardware. Verification test-benches ensure that the Simulink and

hardware versions are equivalent through simulation.

For control flow, an automated tool, called SF2VHD [CAM], automatically converts Matlab

Stateflow diagrams into VHDL for synthesis. Since the control is usually a small part of the

synchronization algorithm, no effort was spent optimizing these state machine implementations

beyond compilation in standard synthesis tools.

For power comparison, each algorithm is coded as a parameterized module in Module

Compiler. Each module is synthesized as a gate-level VHDL netlist in Module Compiler for a

range of parameters, such as input SNR and estimation length. Realistic input vectors for each

block are synthesized in MATLAB by simulating the block inside a realistic system and capturing

the inputs. Each synthesized VHDL netlist from Module Compiler is sent through the gate-level

power estimation tool using the input vectors from MATLAB.

For the examples in this work, power estimation is done assuming a 0.13 µm technology. In

the component exploration sections of this thesis (Chapter 5 and 6), the impact of changing

technology on the presented results is discussed. In all cases, the highly automated flow allows

automatic re-characterization in a new process once new libraries are available.

3.3 Gate-level Power Estimation

The most accurate power estimation method available with current tools is to extract parasitics

from a post-placed-and-routed design and simulate in a switch-level simulator like Power-Mill or

Nano-Sim (called Extracted Physical or EP estimation method). Our own experience and reports

from our foundry show this method of power estimation to be within 15% accurate compared to

power consumption of actual chips. However, placing and routing a design can take considerable

time, and switch-level simulation is very slow. It can take up to two days to complete the

placement, routing, extraction, and simulation of a moderately sized block with today’s

computers and tools. Since this research relies on the accurate power estimation of several

algorithms across many different parameter sets (for instance over 100 frequency estimation

blocks), this research would be impossible with power estimation this slow. Therefore, a faster

power estimation method was required. The method should automatically characterize the same

algorithm over a set of parameters, and make as much use as possible of existing tools. In this

way, this power estimation flow benefits from the constant improvements made in the existing

tools.

Faster power estimation methods than the EP method are available, but they typically incur

errors in proportion to their estimation speed. Therefore, to get the fastest estimation feasible for

this research, it is necessary to examine the required power estimation accuracy. To reach the

correct conclusion when comparing two items, the accuracy of the estimate must be better than

the difference between the two items being compared. As stated in Chapter 1 synchronization

systems can consume around 15% of the physical layer power. In order to make an impact on

system power consumption (say greater than 5%), synchronization power consumption has to

improve by at least 30%. Estimation accuracy should be on the order of (or better than) this

desired improvement. Figure 3-3 illustrates the case of a 30% estimation accuracy and a desired

30% improvement of the revised system over the original. In order to guarantee that the

actual revised system is at least 30% better than the original system, the estimates have to show

an improvement of almost a factor of two (y=50%). Since test chips are not available for

comparison, the proposed power estimation method will be compared to the EP estimation

method. Therefore, a method accurate to within 15% of the EP method is required.

Figure 3-3: Estimation accuracy requirements

The fastest, but least accurate power estimation methods are statistical gate-level methods

(called PG for probabilistic gate-level). Here, the gate-level netlist is analyzed assuming

statistical activity factors on the inputs, which are propagated throughout the design to produce a

statistical activity factor for each net. Statistical activity factors are multiplied by statistical wire

load models, and statistical switching probabilities of the gates to produce a power estimate.

Because communication data is often highly correlated, these statistical methods, which assume

randomness, are not accurate enough for our purposes. For instance, in an illustrative experiment,

a complex MAC with 8-bit inputs and 23-bit outputs consumes 22 uW with random inputs, but

only 10uW with realistic inputs as would occur in a frequency estimator of a communication

system.
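The gap between statistical and realistic activity can be reproduced with a toy experiment (an illustrative sketch, not the actual characterization flow): count bus toggles for a uniformly random 8-bit stream, as the PG method implicitly assumes, versus a slowly varying, highly correlated one.

```python
import random

def toggles(samples, bits=8):
    """Count total bit transitions between consecutive samples on a bits-wide bus."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        total += bin((prev ^ cur) & ((1 << bits) - 1)).count("1")
    return total

random.seed(0)
n = 10000
# Uniformly random 8-bit samples: the assumption behind statistical (PG) estimation.
rand_stream = [random.randrange(256) for _ in range(n)]
# A slowly varying (highly correlated) stream, closer to real baseband data.
corr_stream = [0] * n
for i in range(1, n):
    corr_stream[i] = max(0, min(255, corr_stream[i - 1] + random.randrange(-3, 4)))

t_rand, t_corr = toggles(rand_stream), toggles(corr_stream)
```

The correlated stream toggles far fewer nets per sample, which is exactly the switching power that statistical activity factors overestimate.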

To capture the power savings from correlations in the data stream, the design must be

simulated to determine the actual activity factors on each net and within each gate. Gate level

simulation is around 50 times faster than switch level simulation not including time to place and

route, and requires fewer tools. Existing synthesis tools, such as Synopsys Design/Power

Compiler have the built-in capability to use gate-level simulation information to produce a power

estimate. However, typical gate-level power estimation with simulation (called SG for simulated

gate-level) is still not accurate enough because there are some critical components missing from

all gate-level estimation methods.

A typical flow for taking a gate-level netlist to a placed-and-routed netlist is:

• Place the gates in standard cell rows

• Insert a clock tree and route the clock net

• Insert hold time buffers to eliminate race conditions between register stages

• Route the signal nets

In comparison to the EP method, the SG power estimation is missing 3 pieces of information

which make the estimations less accurate. First and most important is the clock tree, which often

accounts for 30-50% of the block power. Second, the power of the hold time buffers can be

significant, especially where there is little combinational logic between registers (such as in

communication system components like filters and delay chains). Third, the exact wire loads are

not known until the design is placed and routed.

An accurate gate-level power estimation method (called AG for accurate gate-level) has to

address these three issues. The easiest issue to address is the wire loads. Although the exact

length of each wire is unknown before placement and routing, current tools do a good job of

estimating the average load of a wire in the system. These estimates are based on the technology

and the number of gates in the block. Since placement tools don’t use information about the

activity factor on the nets, they are just as likely to force long routes on high-activity wires as

low-activity wires. Therefore, the statistical wire load models are used. The second issue to

address is the hold time buffers. Hold time buffers are averted if there is enough combinational

circuit delay or wire delay between registers. Hold time buffers are placed assuming

statistical wire load models. Insertion of hold-time buffers is achieved in Synopsys Design

Compiler with a built-in function that fixes hold times on specified nodes. The last issue to

address is the clock tree insertion. It turns out that the exact clock tree is not necessary for power

estimation purposes. It is possible to force the tools to insert a “good enough” clock tree into the

gate level netlist. This is achieved by tagging the clock as a high-fanout node in Synopsys Design

Compiler. By placing constraints on the rise and fall times of the clock net, the tool inserts a

“good enough” clock tree into the design. By addressing these three issues, gate-level power

estimation accuracy can be within 15% of the EP method as will be shown below. Of course, the

accuracy of the estimation relies on the accuracy of the standard cell library characterization. To

achieve these results, no extra characterization was required. The foundry-supplied libraries were

characterized well enough to meet the power estimation accuracy goals.

As process technology scales, leakage power is becoming a significant source of power

consumption both when blocks are in use and when they are in standby mode. Because leakage

power can be significant, it is included in the power consumption estimates produced by the AG

method. In standby mode, aggressive low power designs have block-level gated clocks and

power rails. By gating both the clocks and power rails, standby power is reduced to near zero and

need not be considered in the system power framework.

The new AG estimation flow is shown in Figure 3-4. Each VHDL netlist is incrementally

compiled in Synopsys Design Compiler to insert a clock tree and to add buffer delays to fix hold

time violations. The block is then simulated at the gate level in ModelSim using realistic input

vectors to verify functionality and to determine the switching activity on each node. Synopsys

Power Compiler is used to estimate the power consumption of the block using the back annotated

switching activity and statistical wire load models.

Figure 3-4: Accurate gate-level power estimation flow

Five frequency estimation blocks with a wide range of parameters were compared using the

AG method versus the EP method. The results are shown in Figure 3-5 along with the SG power

estimation method. Over a wide range of block sizes, the AG estimation is within 15% of the EP

estimation (see error bars); however, the SG method had errors of 30-50%. The Makefile and

scripts for running the AG power estimation for a range of frequency estimation blocks are given

in Appendix A.

[Figure: bar chart comparing EP, AG, and SG power estimates (0 to 160 uW) for the Group 1 and Group 2 blocks]

Figure 3-5: Proposed power estimation method comes within 15% of the EP method for a
wide range of block sizes.

The AG power estimation method is over 50 times faster than EP method (not including the

time required to place-and-route the block and thereby extract accurate parasitics). The total time

to characterize 21 different chosen instantiations of a frequency estimation algorithm is less than

3 hours using the AG method. Execution time will vary with the size of the block, the duration of

the simulation interval, and different server processor and memory configurations.

3.4 Analog to Digital Converter Power Estimation

The analog to digital converter (ADC) is often a significant power-consuming component of a

communication system. Because different synchronization systems place different requirements

on the ADC, a method to estimate the power consumption of ADCs with different specifications

is required. In a survey of over 100 ADCs published in the literature from 1978 to 1999 [WAL],

a simple but accurate architecture-independent figure of merit (FOM) is determined for

comparing them:

FOM = 2^SNRbits · fsamp / Pdiss .   (3-1)

Here fsamp is the sampling rate, Pdiss is the power dissipation, and SNRbits is the equivalent

number of bits given by:

SNRbits = ( SNR (dB) − 1.76) / 6.02 . (3-2)

FOMs from the surveyed ADCs range between 1x1010 and 1.2x1012 with a mean around

1x1011. Given that the designer does the best design possible with the given process technology,

the FOM depends on how extreme the given ADC specs are relative to the fundamental

process capabilities. For instance, an fsamp that is closer to the maximum frequency of a process is

likely to achieve a lower FOM than one that has a much lower fsamp. Therefore, to predict the

power consumption of an ADC with arbitrary specifications, one needs to find an appropriate

FOM. This can be achieved by finding a similar ADC in the literature and using the same FOM,

or by extrapolating an FOM by determining how extreme the required specs are relative to the

fundamental process capabilities.
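Inverting (3-1) makes the power prediction a one-line calculation. The sketch below is illustrative; the FOM value and the converter specs (an 8-bit-class, 50 dB SNR, 100 Msps ADC at the survey-mean FOM of 1e11) are example assumptions, not entries from the survey.

```python
def adc_power_mw(snr_db, fsamp_hz, fom=1e11):
    """Predict ADC power from the survey figure of merit:
    FOM = 2^SNRbits * fsamp / Pdiss  =>  Pdiss = 2^SNRbits * fsamp / FOM."""
    snr_bits = (snr_db - 1.76) / 6.02  # equivalent number of bits, eq. (3-2)
    return (2 ** snr_bits) * fsamp_hz / fom * 1e3  # watts -> milliwatts

# Example: 50 dB SNR, 100 Msps, survey-mean FOM.
p = adc_power_mw(50.0, 100e6)
```

Note the exponential dependence on resolution: each additional 6.02 dB of SNR (one effective bit) doubles the predicted power at a fixed FOM.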

3.5 System Power Estimation Tool

To compare algorithms with different convergence times, a system-level power estimation

tool is required. For the purposes of this work, two communication system variables are

generally considered: the length of the header, and the transmit power. Other variables, such as

the number of bits per packet, and the required BER are typically fixed for a given scenario.

Because the synchronization system is well within the physical layer, a sensible metric is energy-

per-useful-bit (EPUB), taking energy over the physical layer components. EPUB may not be the

right metric for upper levels of the protocol stack, like the network or MAC layer (where network

uptime or latency may also be considered). For instance, the number of packet collisions

increases with increasing packet length. Therefore, packets with more header bits will incur more

packet collisions, and therefore, more energy per useful bit. However, for comparisons where the

difference in packet lengths is within 10%, the increased power consumption due to increased

packet collisions can be safely ignored. Therefore, EPUB is used because it is simple and

adequate for the purposes of this research.

The energy consumed by the system per packet includes the power in the transmitter and

receiver and is equal to:

EP = ( BS + BD )( PDiss ,TX + PDiss , RX ) + BS PS + BD PD (3-3)

Where BS is the number of synchronization bits, BD is the number of data bits, PDiss,TX is the

transmitter power dissipation including radiated power, PDiss,RX is the receiver front-end power,

PS is the baseband power when synchronizing, and PD is the baseband power when receiving

data. Energy per useful bit is computed by dividing EP by BD.

EPUB = EP / BD   (3-4)
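A minimal sketch of the metric (illustrative Python; each power figure is treated as energy per bit interval, as the packet-energy expression (3-3) implicitly does, and the numeric inputs are arbitrary examples):

```python
def energy_per_useful_bit(b_sync, b_data, p_tx, p_rx, p_sync, p_data):
    """EPUB per (3-3) and (3-4): total packet energy divided by data bits.
    b_sync/b_data are header and data bit counts; the p_* terms are the
    per-bit-interval transmit, receive front-end, synchronization-baseband,
    and data-baseband energies."""
    ep = (b_sync + b_data) * (p_tx + p_rx) + b_sync * p_sync + b_data * p_data
    return ep / b_data

# Shrinking the header from 50 to 10 overhead bits lowers EPUB for a fixed payload.
epub_long = energy_per_useful_bit(50, 100, 1, 1, 1, 1)
epub_short = energy_per_useful_bit(10, 100, 1, 1, 1, 1)
```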

3.6 Conclusion

The MATLAB Simulink and Stateflow tools are used for simulation and analysis of

communication algorithms. SF2VHD and the developed libraries in Synopsys Module Compiler

allow quick translation into implementation. An accurate and fast power estimation method has

been developed. The key steps to getting accurate power estimation at the gate-level are to add a

clock tree, hold time buffers, and to simulate with realistic input vectors. These steps are

achieved using Synopsys Design Compiler, Power Compiler, and ModelSim. This method has

proven to be accurate to within 15% of the EP method, and is believed to be within 30% of

real chip measurements.

Use of parameterized modules in Simulink and Module Compiler allow one hardware

description to be automatically synthesized, verified, and characterized over a wide parameter

space. Because the block-level estimation is accurate in absolute terms, it is possible to compare

algorithms in a system-level framework using the EPUB metric.

4 PNII System
4.1 Introduction

The PNII system is a 1.6 Mbps personal area network system designed to carry voice over

short distances (10-30 m) for wireless intercom type applications [AMM]. PNII was the impetus

for this research on low power synchronization. Much effort was expended to make PNII a low

power synchronization system. Despite these efforts, the synchronization system still consumed

18% of the physical layer power. Most of the power reduction effort was centered on circuit

implementation, such as choosing the right adder types and complex multiply structures, using

the lowest possible supply voltage, and gating clocks on unused blocks. Therefore, it was

determined that to further reduce synchronization power consumption it was necessary to move to

higher levels of design, such as up to algorithm selection or system design.

The preliminary design of the synchronization system was documented in [HUS]. Much of

the structure of the physical layer, from the data rate, modulation scheme, and ADC oversampling

rate was dictated by the front-end and system designers [YEE]. This is not an uncommon

scenario in radio design. Often the synchronization system is designed to accommodate

constraints dictated by other radio subsystems rather than the other way around. One goal of this

thesis is to show that this is not always an advantageous design methodology from a system

energy perspective.

This chapter is devoted to describing the original PNII synchronization system and some of

the power saving implementation methods employed. This is not to say that this system is in any

way optimal. In fact, parts of the system are provably suboptimal (as will be shown in Chapter

7). Rather, the goals here are threefold: 1) To provide an example of the design and

implementation a complete synchronization system, 2) to provide a design example for future

refinement gains to be illustrated, and 3) as motivation for the necessity of this research.

4.2 System Details

The protocol used in PNII, called Intercom, allows for ad-hoc peer-to-peer communication of

64kbps uplink/downlink between 20 sensor/communicator nodes [AMM]. A data rate of

1.6Mbps and a BER of 1e-5 is required to support this functionality.

The physical layer is made compatible with a commercially available RF front-end

(performing carrier up/down conversion), ADC, and DAC. Although the commercial

components have high power consumption resulting from their tight design specs, the PHY

accommodates significantly relaxed specs for integration with a custom, low-power front end

[YEE] (e.g. by only requiring a free-running clock with 50 ppm accuracy). The chip integrates

all other PHY receiver and transmitter functions, such as carrier detect, synchronization, and

detection.

The air-interface is direct sequence spread spectrum (DSSS) with a length 31 spreading code

at 25 Mcps (Million Chips per Second) and QPSK modulation resulting in a raw data rate of 1.6

Mbps. The primary receiver specifications are ± 100 KHz maximum carrier frequency offset

(+/-50 ppm from a 2GHz carrier reference), and a 50ppm ADC sample clock. The transmit filter

is root-raised cosine with alpha=0.3. The minimum input SNR (per chip) at the input of the ADC is

-2.9 dB for a SNR per symbol of 12dB¹. Ideal detection of QPSK symbols requires 9.6dB to

achieve a BER of 1e-5. Therefore, the 12dB input SNR gives a realistic (if overly generous)

2.4dB implementation target. The PNII supports a typical indoor frequency-selective wireless

channel with mobile units traveling at foot speeds as described in Chapter 2.

4.3 Synchronization System

Figure 4-1: PNII system block diagram.

A block diagram is shown in Figure 4-1. The RX/TX Controller interfaces with the protocol

processor and controls the data flow from one data path block to another. During receive, the

baseband signal is sampled by dual off-chip 8-bit ADCs at 100 Msps (4 samples per chip) using a

free-running clock. These 100 MHz streams are each split into 4 parallel streams of 25 MHz each

so that the BBP could operate off the slower 25 MHz chip clock, reducing power by allowing a

¹ The original synchronization design required an input SNR per chip of 5dB for a SNR per symbol of 19.9dB. However, as the system specs were dictated to require 1e-5 BER, it was determined that the original SNR spec was grossly wasteful, and a lower input SNR could achieve the design goals.
lower operating voltage. Parallel filter techniques interpolate the streams to increase the receiver

timing resolution to 8 samples per chip. Performing on-chip interpolation of the signal is lower

power than running the ADC at twice the rate. However, further reduction of the ADC sampling

rate is prohibited by the front-end filter specs.

4.3.1 Timing Recovery

Figure 4-2: Flow diagram of the PNII synchronization system

A flow diagram of the synchronization system is shown in Figure 4-2. The overall goal of the

timing recovery unit is to select the best of 8 timing instances per chip. This is completed in two

steps: a coarse timing estimation which estimates the timing to within 3/8 chip and a fine timing

estimation which estimates timing to within 1/8 chip. The timing variance due to quantization is

var(ε)_Q = ( 1 / (2·OSR) )²   (4-1)

where OSR is the relative symbol oversampling ratio. In the final timing estimation step, the

OSR is 248 (8 samples per chip * 31 chips per symbol). Therefore, the variance for the final

timing estimate is 4.1e-6. Whereas, in the initial timing estimation step, the OSR is 8/3 * 31 =

83, for a variance of 3.7e-5. The variance of the selection process must be lower than the

variance caused by the quantization in order for the final result to be quantization-limited. If the

system is not quantization-limited, energy has been wasted in the ADC, interpolation filter, and

synchronization hardware to accommodate the unnecessarily high oversampling ratio. The SNR

degradation due to timing recovery for this DSSS signal with root-raised cosine data with

alpha=0.3 is determined by simulation to be 0.3 dB.
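Equation (4-1) can be checked directly against the numbers quoted above (a small verification sketch; the quoted values are rounded):

```python
def timing_quant_var(osr):
    """Timing variance due to quantization, eq. (4-1): (1/(2*OSR))^2."""
    return (1.0 / (2 * osr)) ** 2

# Fine stage: 8 samples/chip * 31 chips/symbol = 248.
fine = timing_quant_var(8 * 31)
# Coarse stage: effective 3/8-chip resolution, OSR = 8/3 * 31 (about 83).
coarse = timing_quant_var(8 / 3 * 31)
```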

4.3.2 Coarse Timing Estimation

Before coarse timing estimation, the system performs carrier detect using an algorithm that

compares the code-matched filter output to an adaptive threshold, set using the RSSI

measurement. If the code-matched filter output exceeds the threshold twice with a delay of one

symbol between threshold crossings, it is assumed that the correct code is being sent and carrier

detect status is declared. The coarse timing block then estimates timing to within 3/8 chip by

selecting the best of streams 2, 4, and 7 using a data-aided feed-forward algorithm (Figure 4-3).

The variance of this algorithm is treated in [MEY], and for root-raised cosine data with α=0.3 is

given by:

var(ε)_T = (1/L²) · ( (1/C) · 0.3/(2·SNR) + (1/C²) · 0.8 )   (4-2)

where C is the number of chips used in the estimate, L is the number of chips per symbol, and

SNR is given per chip. The L2 factor is due to the estimate being produced in fractions of chips,

whereas we are interested in fractions of symbols. A variance of 1.1x10-5 is achieved with

estimation performed over one symbol (C=31 chips). This is lower than the quantization error of

3.7x10-5 for this stage, so the performance is sufficiently quantization-limited.
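Plugging the PNII parameters into (4-2) reproduces the quoted variance (a quick numerical check; the per-chip SNR is converted from the -2.9 dB minimum input spec):

```python
def timing_est_var(c_chips, l_chips, snr_chip_db):
    """Coarse timing estimator variance, eq. (4-2), for root-raised cosine
    data with alpha = 0.3. C chips per estimate, L chips per symbol."""
    snr = 10 ** (snr_chip_db / 10.0)  # per-chip SNR, dB to linear
    return (1.0 / l_chips**2) * ((1.0 / c_chips) * 0.3 / (2 * snr)
                                 + (1.0 / c_chips**2) * 0.8)

# One-symbol estimate: C = L = 31 chips at the -2.9 dB minimum chip SNR.
v = timing_est_var(31, 31, -2.9)
```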

Figure 4-3: Coarse timing block diagram

4.3.3 Fine Timing and Frequency Estimation

The fine timing block estimates timing to within 1/8 chip and the carrier frequency offset to

within 2.5 kHz using the unweighted Meyr algorithm (Figure 4-4). While this algorithm is

typically used solely for frequency estimation, Meyr suggests its use as a joint frequency and

timing estimator [MEY]. The timing variance of this method is not computed analytically by

Meyr, but simulation shows it to be lower than 1e-6 under worst case frequency offset conditions.

This is sufficiently smaller than the required variance of 4.1e-6. The variance of the frequency

estimation is 4.5e-5 with 35 symbol estimation, giving a 3-sigma residual offset of less than the

2.5 KHz required by the pull-in range of the PLL.

Figure 4-4: Joint frequency and fine timing estimation

4.3.4 Frequency Correction and Timing Tracking

The rotate and correlate block corrects the frequency offset, correlates the incoming signal

with the spreading code, and performs early/late detection to track the optimal timing instant

(using a FF NDA algorithm to choose the best of the chosen stream or one of its direct neighbors).

Since 50ppm clocks are used, the system should switch streams no more frequently than once

every 40 symbols.

The coarse frequency offset needs to be corrected before entering the PLL for two reasons.

First, the pull-in range of the PLL is limited. Second, the frequency offset must be corrected

before entering the code correlator to avoid the power loss associated with correlation in the

presence of a large frequency offset. Figure 4-5 shows post-correlation power loss as a function

of frequency offset. The power loss with a 200 kHz offset is 0.45dB, while the power loss with a

2.5KHz offset is less than 0.01dB.

Figure 4-5: Power loss in correlator with frequency offset
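One standard way to model this loss is the Dirichlet-kernel (sinc-like) attenuation of a coherent sum over chips that rotate by the residual offset. The sketch below is illustrative; the exact dB numbers depend on modeling details and may differ from the figure, but the qualitative behavior (negligible loss at 2.5 kHz, appreciable loss at 200 kHz) is the same.

```python
import math

def correlation_loss_db(f_offset_hz, n_chips=31, chip_rate=25e6):
    """Power loss from coherently summing n_chips chips whose phase rotates
    by a residual carrier offset (Dirichlet-kernel model)."""
    w = 2 * math.pi * f_offset_hz / chip_rate  # phase step per chip
    if w == 0:
        return 0.0
    amp = abs(math.sin(n_chips * w / 2) / (n_chips * math.sin(w / 2)))
    return -10 * math.log10(amp ** 2)

loss_small = correlation_loss_db(2.5e3)   # post-PLL residual offset
loss_large = correlation_loss_db(200e3)   # uncorrected worst-case offset
```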

The early-late detection circuit was a late addition so that the system could work with an off

the shelf radio. The original custom radio used the same clock reference to derive the carrier

reference and the sample clock. Therefore, once the carrier frequency offset was resolved, the

timing offset was known and separate timing tracking was unnecessary. This is one example of

how system-level design can greatly reduce the power consumption of the synchronization

system.

4.3.5 Phase Estimation and Correction

A digital phase locked loop (PLL) corrects the phase error of the correlated symbols using

feedback and the QPSK symbols are demodulated. During acquisition, the PLL operates in data-

aided mode using known header bits to lock to the correct phase. During data reception, the PLL

operates in decision-directed mode where sliced symbol phase is compared to the received

symbol phase to produce an error signal. As with all decision-directed algorithms, there is the

possibility of error-propagation when an incorrect decision is made. And, with all PLLs, there is

the chance of cycle-slip. Both of these occurrences typically have the catastrophic effect of

ruining the remainder of the packet. Whereas recovery from decision errors is dependent on the

loop filter coefficients, recovery from cycle-slip is highly unlikely. Where possible, coefficients

were restricted to factors of two, so that shift-and-add operations could be used instead of the

more power hungry multiplication operations.

While the complete details of optimal PLL design are beyond the scope of this thesis, the

integral and direct coefficients chosen for the second-order PLL are 1/8 and 1/2 respectively. This

results in a normalized natural frequency (ωnT) of 0.35 and a damping factor of 0.5. The loop

bandwidth is 140 KHz. The convergence time is 19 symbols. This PLL design achieves a 1 dB

SNR degradation including cycle slipping and error propagation effects (assuming 512-bits per

packet). As expected, this performance degrades with packet length. For in-depth analysis of

PLLs, the reader is referred to [STE].
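The loop described above can be sketched as a simple linearized model (illustrative only: it tracks a constant phase in data-aided mode and ignores slicing, noise, and cycle slips). Because both gains are powers of two, the hardware multiplies reduce to shifts.

```python
def run_pll(phase_in, ki=1/8, kp=1/2, steps=100):
    """Second-order digital PLL with integral gain 1/8 and direct
    (proportional) gain 1/2, acquiring a constant input phase."""
    theta, integ = 0.0, 0.0
    for _ in range(steps):
        err = phase_in - theta     # phase detector output
        integ += ki * err          # integral (frequency) branch
        theta += kp * err + integ  # NCO phase update
    return theta

locked = run_pll(0.5)  # estimated phase after acquisition of a 0.5 rad offset
```

With these gains the linearized error modes have magnitude about 0.71 per step, so the residual error is negligible well within the 19-symbol convergence budget quoted above.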

4.3.6 Synchronization System Performance

A breakdown of the implementation losses in the synchronization and detection system is

shown in Table 4-1. The loss from the phase estimation process is larger than that from timing or frequency recovery.

However, total losses are much better than the allotted 2.4dB.

Table 4-1: Implementation losses in PNII synchronization and detection

Subcomponent                                                  Loss (dB)
Timing Variance                                               0.3
Correlation Loss w/ Frequency Offset                          0.01
PLL (incl. phase noise, cycle slip, and error propagation)    1
Total Loss                                                    1.31

Total convergence time (and therefore header length) is 57 symbols. This breaks down as 2

symbols for carrier detect and coarse timing, 35 for joint frequency estimation and fine timing,

one symbol of overhead in switching from frequency estimation to phase estimation, and 19 for

the PLL convergence. This constitutes an 11% overhead on the typical packets of 512 data bits.

In the transmit mode, data bits are mapped into QPSK symbols, spread by a dual-channel

spreader, raised-cosine filtered (25-taps, alpha = 0.30), and passed to dual off-chip DACs.

In addition to the power savings already mentioned, the PNII uses other methods to save

power. The PNII incorporates 5 gated clock domains that are adaptively switched on by the

RX/TX Controller for maximal energy efficiency. Adder types were chosen between ripple-

carry, carry-save, and carry-look ahead for lowest power operation. Several structures for

complex multiply accumulate were explored and one chosen to minimize power. For code

correlation, several structures were explored for selective negation and accumulation of the chips

to form symbols including choosing the adder type in the accumulator.

4.4 Results and Conclusion

Average power consumption is measured from actual chips while sending a short packet

consisting of the header and 40 data bits. Longer packets have lower average power consumption

because the system consumes more power during synchronization than during data reception. A

chip plot of the PNII system was shown in Figure 1-1c. The PNII chip statistics are detailed in

Table 4-2.

Table 4-2: BBP statistics

Process                            Triple-well, 0.18 µm digital CMOS, with 6 metal layers
# Transistors                      600K
Area                               Core: 2.2 mm2; Die: 14.5 mm2
Package                            208-pin PGA
Core Power Supply                  1 V
I/O Voltage                        1.8 V
Clock Frequency                    25 MHz
Avg. power/node (3-node network)   3 mW (15% duty cycle)

Physical layer receiver on-state power is shown in Table 4-3. Even after a concerted effort to

reduce power, synchronization accounts for 18% of the physical layer receiver power. The fact

that most of the low power techniques were focused on the circuit level is suggestive that it is

necessary to move to higher levels of design, such as algorithm selection or system design, to

further reduce power.

Table 4-3: Physical layer receiver power consumption.

Analog RF 70 mW
Synchronization 15 mW
Percentage Synchronization 18%

This design example illustrates the need for new synchronization design methodologies

especially for types of applications like wireless sensor networks, where power is primarily

important. To highlight the drastic improvements that are necessary, it is illustrative to explore

scaling this system down to data rates used in a sensor network. It is estimated that after scaling

this system down to 100Kbps, it would consume 2mW, which is twice the entire wireless sensor

network node power budget [RAB2]. In addition, the header length of 57 symbols would impose

almost a 400% overhead on the frequently-used control packets of 30 bits (15 QPSK symbols). It

will be shown in Chapter 8 how the use of the tools and methodologies developed in this thesis

can produce a synchronization system for a wireless sensor network node which meets the

stringent low power goals.

If systematic exploration of the synchronization space is to be conducted, it makes sense to

start where there is the potential for the biggest impact. In the PNII design, the frequency

estimation has the largest cost, requiring 35 symbols of the 57 symbol convergence time (over

60%). Therefore, frequency estimation was chosen as the first synchronization parameter to

explore (see Chapter 5). The combined frequency and phase estimation of PNII takes 55 of the

57 symbols. Overhaul of this part of the design is conducted in Chapter 6 which uses the results

of Chapter 5 to significantly reduce the power consumption and convergence time of the PNII

system.

5 Frequency Estimation
5.1 Introduction

The impetus for conducting this experiment on frequency estimation was that in the PNII

system (Chapter 4), frequency estimation required the longest convergence time of all

synchronization parameters (35 out of 57 total symbols). Therefore, potentially the largest

system improvement could be achieved by reducing the frequency estimation convergence time

and power consumption. Feed-forward data-aided estimation was chosen for initial exploration

because it is the most common type of frequency estimation used in actual systems (probably because

of its relatively fast convergence time compared to FB estimation), so the results of this study

could be used by a wide variety of systems as a drop-in replacement for an existing feed-forward

data-aided frequency estimation block.

This work examines four feed-forward data-aided frequency offset estimation algorithms and

systematically compares the estimation performance and power consumption of each over

estimation length and input SNR. A modification of these algorithms is also presented that

simultaneously achieves lower power and faster convergence time.

In this study, it is determined which among these four algorithms achieves the lowest power

for a given input SNR and variance requirement. In addition, absolute numbers for power

consumption and convergence time are determined which allow these algorithms to be evaluated

in a system-level framework. This chapter serves as a model for how these algorithm

explorations should be conducted and the results that are needed to allow a system level designer

to make use of the information.

5.2 Frequency Estimation Algorithms

In a typical wireless communication system, imperfect up- and down-conversion caused by

nonidealities in the transmitter and receiver local oscillators (LO) result in a carrier offset at the

receiver. This offset causes a continuous rotation of the signal constellation, and must be

corrected for reliable demodulation of the received signal.

As described in [MEY] and [TAV], in the absence of ISI and with moderate frequency offset

(less than ~15% of the symbol rate, because larger offsets incur large power loss in the matched

filter), the sampled output of the matched filter (at one sample per symbol), assuming perfect

symbol synchronization, is given by

rn = an e j (φ + n∆ωT ) + wn , (5-1)

where an is the n-th (complex) data symbol, φ is the carrier phase, ∆ω is the carrier frequency

offset, T is the symbol duration, and wn is a complex Gaussian white noise process with

independent, zero-mean real and imaginary parts, each with variance σ² = N0/(2·Es), where Es is

the symbol energy and N0 is the one-sided spectral density of the noise. Also, as in [MEY], the

convenient notation of normalized frequency offset is used, defined as Ω = ∆ωT .

Since this study is examining the case of data-aided estimation, the known data symbols are

canceled before frequency estimation. This reduces to the problem of frequency estimation with

47
an unmodulated carrier [TAV]. This is the most common use of frequency estimation in systems

where a synchronization header is used, such as 802.11b.

There are two well-known algorithms for frequency estimation operating with timing

information derived from the maximum likelihood (ML) equations. The difference between the

two depends on whether the angle of rn is taken inside or outside the averaging function. If the

angle is taken inside, the result is the Kay estimator [TAV],

{ }
L −1
ˆ = ∑ b arg r r *
Ω n n n −1
n =1 , (5-2)

if the angle is taken outside, the result is the Meyr estimator [MEY],

ˆ = arg ⎧⎨∑ b (r r * )⎫⎬


L −1
Ω n n n −1
⎩ n =1 ⎭, (5-3)

where

6 n (L − n )
bn =
L( L2 − 1) . (5-4)

Neither algorithm requires phase unwrapping (to disambiguate phases that may have “wrapped”

around π), and both are limited to frequency offsets that obey

Ω <π
(5-5)

It should be noted that, while different weighting functions, bn, can be used, the one given in

(5-4) is optimal. A simplification, suggested in [MEY], that is often used in practice, substitutes

an integrate-and-dump filter (bn=1/L) that computes an unweighted average, for the filter function

in (5-4) that computes a weighted average. This simplification is applied to both the Meyr and

Kay estimators to expand the number of estimators considered here to be four. The variance of

the weighted Ω̂ Mw and unweighted Ω̂ Mu versions of the Meyr estimator are given in [Meyr] as,

48
[ ] L(L12− 1) 2E 1/ N
ˆ
Var Ω Mw = 2
+
12 1 L2 + 1 1
5 L L2 − 1 (2 E s / N 0 )2
s 0
(5-6)

and

ˆ
Var Ω [ ] 1
Mu = 2
2
+
1 2
L 2 E s / N 0 L (2 E s / N 0 )2
. (5-7)

The simulated performance of the four estimators (Meyr, weighted and unweighted, and Kay,

weighted and unweighted) is shown in Figure 5-1.

1.E-01
Meyr Meyr Kay Kay
1.E-02 Weighted Unweighted Weighted Unweighted
1.E-03
SNR=12dB SNR=12dB
1.E-04
Variance

1.E-05
1.E-06 SNR=24dB

1.E-07
1.E-08
SNR=48dB SNR=24dB
1.E-09
1.E-10
1.E-11
0 50 L 100 150 0 50 L 100 150

Figure 5-1: Meyr and Kay weighted and unweighted performance.

The simulations match the performance predicted by Meyr very closely. As expected, at high

SNR, the performance of the two weighted estimators approach the modified Cramer-Rao bound

given in (2-2) as

6
MCRB (Ω) =
L( L − 1)( E s / N 0 ) .
2
(5-8)

While these algorithms have been derived for the flat fading channel, in practice, they also work

for the frequency selective fading channel.

49
The improved convergence time is achieved by exploiting an underutilized modification to

these algorithms described in [MEY]. In the estimator equations, the product (rn rn*−1 ) is replaced

with (rn rn*−D ) , so instead of using the current and previous symbol, the current and D-th previous is

used. While this is not a new result, it is often ignored in the literature. The variance of the

estimator is improved roughly as D. For instance, the performance of the unweighted Meyr

estimator is given in [MEY] by

[ ]
ˆ
Var Ω Mu =
1 ⎛⎜ D
2 ⎜ 2
2
+
1 2
D ⎝ L 2 E s / N 0 L (2 E s / N 0 )2



⎠. (5-9)

The algorithms are now limited to frequency offsets that obey

ΩD < π
 (5-10)

In practice, many systems can tolerate D > 1. If following the rule of thumb that frequency

offset should be less than 15% of the symbol rate, then D ≤ 3 is possible. In 802.11b, with a

25ppm carrier offset from a 2.4GHz reference, the maximum frequency offset is ±120 KHz,

allowing D = 4 to be used. Figure 5-2 shows that even D = 2 yields a huge improvement

(decrease) in L for a given variance.

50
1.E-01
D=1 D=2
1.E-02
1.E-03
1.E-04 SNR=12dB
Variance
1.E-05
1.E-06 SNR=24dB
1.E-07
1.E-08
SNR=48dB
1.E-09
1.E-10
0 50 L 100 150

Figure 5-2: Meyr weighted D = {1, 2} performance

The block diagrams for the Kay and Meyr weighted estimators are shown in Figure 5-3 and

Figure 5-4. To implement the unweighted estimators, one or two scalar multipliers are removed

from the Kay or Meyr estimators respectively.

I
Q Z-D complex Rect

X to
Polar
X SUM
Ω̂ Kw

bn clear

Figure 5-3: Block diagram of the weighted Kay estimator

51
I
Z-D complex

XX S
Q Rect Ω̂ Mw

X SUM to
Polar

bn clear

Figure 5-4: Block diagram of the weighted Meyr estimator

The goal is to choose the frequency estimation algorithm that gives the lowest system power

for the required variance. The Mery and Kay algorithms seem to have similar hardware

complexity upon first glance because they consist of the same operations but in a different order.

However, the ordering of operations in hardware can have a large impact on the power

consumption. The simplification suggested in [MEY] of bn=1/L, reduces the hardware, but incurs

a performance penalty. It is to be investigated at what point, if any, this hardware simplification

will actually decrease energy consumption. Increasing D requires marginally more hardware but

gives a significant improvement in performance. It is expected that this will be a good trade-off

because the hardware cost is so small. It is the goal of this study to provide the information,

beyond convergence time and variance, that is required to choose the best algorithm for a low

power system.

5.3 Power Estimation Methodology

Each frequency estimation algorithm was coded as a parameterized module in Synopsys

Module Compiler. Each module was synthesized in Module Compiler for a range of parameters,

such as input SNR and estimation length. It was then placed through the block estimation tool

described in Chapter 3.

Energy, rather than power, is used as the cost metric for each block. This is because the

frequency estimation takes a different number of cycles depending on the input SNR, required

estimation variance, and which algorithm is selected. Aggressive low power designs will gate the

clock and power rails to the frequency estimation block when not in use. Therefore, the way to

52
fairly compare different blocks is the energy consumption, which is the power consumed when

the block is on times the amount of time the block needs to be on to achieve the desired variance.

The energy consumption reported here is for a 0.13um CMOS process. While the actual

energy consumption will change for different processes, the comparison of one algorithm vs.

another is valid for most contemporary processes. Obviously, the ratio of leakage power to

switching power and power consumed in the wires will vary between process and this will alter

the crossover points of the curves; however the general results will remain true.

5.4 Algorithm Comparison and Results

For each implementation, it is assumed that the number of bits at the input to the estimator is

scaled depending on the input SNR. This is a reasonable assumption because most systems

would not pay the cost penalty of implementing an ADC that converted more bits than necessary

nor a frequency estimator that achieved better precision than was needed. The bit widths are

scaled up in subsequent blocks to accommodate the growing precision. The accumulators are

pre-scaled to accommodate the summation of L samples, and the precision of the weighting taps,

bn, is increased with L. The bn coefficients are hard-wired before synthesis for the lowest power

operation. The rectangular-to-polar conversion is performed by a CORDIC [TUR] and the

number of CORDIC stages is increased depending on the required precision. These adjustments

ensure that the hardware is not significantly limiting the expected variance.

The resulting energy consumption of each estimator is shown versus variance for a range of

input SNR and L. Since lower variance and lower energy consumption are desired, data points to

the bottom and left are better. While this is the right presentation of the data for optimizing the

energy of the frequency estimator in isolation, L must be considered if a system-wide reduction in

power consumption is to be achieved because the RF and analog front-end are on for different

amounts of time. For instance, in the case where the front-end power dominates that of the

53
frequency estimation, choosing an algorithm with smaller L may optimize system energy even if

it has higher frequency estimation energy. Since the absolute energy consumption and the L for

each data point is given in the graphs, the designer can make the appropriate trade-off.

Obviously, cases where both the power consumption and convergence time (L) are decreased for

the same variance are hands-down winners.

Figure 5-5 compares the energy consumption vs. variance of the weighted and unweighted

versions of the Meyr algorithm. At low SNR and at high required variance, it is more energy

efficient to use the non-weighted version. Here, there is a small difference in variance between

the two algorithms, so the hardware simplification of unweighted combining pays off. However,

at high SNR or low variance, it is more energy efficient to use the weighting function. Here the

energy savings from the unweighted averaging are outweighed by the longer correlation times

required to overcome the degradation in variance. For instance, at 24db SNR, and a required

variance of 3x10-7, the unweighted estimator converges in 128 samples, whereas the weighted

estimator takes only 64 samples and as a result, consumes marginally less energy.

Figure 5-6 compares the weighted and unweighted versions of the Kay algorithm. For the

Kay estimator, it is almost always better to use the weighted version of the algorithm. This is due

to the variance of the unweighted version of the Kay algorithm severely under performing the

weighted version. In this case, the hardware simplification of an unweighted average is not worth

the degradation in variance. For instance, at 24db SNR, and a required variance of 2x10-6, the

unweighted estimator converges in 128 samples, whereas the weighted estimator takes only 32

samples and as a result, consumes 1/3 as much energy.

54
1.E+05
SNR=48dB SNR=24dB SNR=12dB
L=
1.E+04
128
64
32
1.E+03
16
Energy (pJ)

8
1.E+02
4
2
1.E+01

Unweighted
Weighted
1.E+00
1.E- 11 1.E-09 1.E- 07 1.E-05 1.E-08 1.E-06 1.E-04 1.E-02
1.E-06 1.E- 04 1.E- 02 1.E+00
Variance

Figure 5-5: Meyr weighted vs. unweighted comparison

1.E+04
SNR=24dB SNR=12dB

L=
1.E+03
128
64
32
Energy (pJ)

1.E+02 16
8
4
1.E+01
Unweighted 2

W eighted
1.E+00
1.E- 09 1.E-07 1.E- 05 1.E- 03 1.E- 01 1.E- 08 1.E- 06 1.E- 04 1.E- 02 1.E+00
Variance

Figure 5-6: Kay weighted vs. unweighted comparison

55
1.E+05

SNR=48dB SNR=24dB SNR=12dB


L=
1.E+04 128
64
32
1.E+03
16
Energy (pJ)

8
1.E+02
4
2
1.E+01
Meyr
Kay
1.E+00
1.E- 08 1.E-06 1.E- 04 1.E- 02 1.E- 07 1.E- 05 1.E- 03 1.E- 01
1.E- 11 1.E- 09 1.E- 07 1.E- 05
Variance

Figure 5-7: Meyr vs. Kay weighted comparison

1.E+05
SNR=24dB SNR=12dB
L=
1.E+04
128
64
32
1.E+03
16
Energy (pJ)

8
1.E+02
4
2
1.E+01
D=1
D=2
1.E+00
1.E- 08 1.E- 06 1.E- 04 1.E- 02 1.E- 06 1.E- 04 1.E- 02 1.E+00
Variance

Figure 5-8: Meyr weighted D = 1 vs. D = 2 comparison

Figure 5-7 compares the weighted versions of the Meyr and Kay algorithms. The weighted

Kay algorithm is almost always better than or equal to the weighted Meyr algorithm. At low

SNR, the marked advantage of the Kay algorithm is due to the combination of achieving better

variance and requiring considerably less hardware to implement than the Meyr algorithm. At

high SNR where the algorithms have similar variance performance and similar hardware

56
requirements, the minor differences mostly result from the correlation of the data as it flows

though the hardware. At high variance there is little difference between the two, while at low

variance the Kay algorithm wins out.

Figure 5-8 compares the weighted version of the Meyr algorithm for D=1,2. Increasing D is

usually the right choice, especially for low variance. The power penalty is very small (only one

extra register) and the convergence time can be markedly better. For example, for an input SNR

of 12dB and required estimation variance of 2x10-5, the convergence time is decreased by a factor

of 4 while simultaneously decreasing the energy consumption by a factor of 4.3.

5.5 Conclusion

Four feed-forward frequency estimators were characterized for energy consumption and

variance for a given input SNR and correlation length. It was found that the weighted Kay

estimator is a safe bet for all regions of operation, especially for high SNR and low required

variance. The unweighted Meyr estimator may be used for low SNR and high required variance.

Exploiting D is the most powerful way to simultaneously decrease convergence time and energy

consumption especially for low required variance. It is surprising to find that certain hardware

simplifications, such as using the smallest D (D=1) and unweighted averaging does not usually

result in lower energy consumption. The degradation in variance due to these simplifications

requires longer convergence times and more energy consumption.

5.6 Postscript: Application to DSSS Systems

For DSSS, it is sometimes suggested in the literature to apply these frequency estimation

algorithms to chips rather than to post-correlated symbols to maximize D; this is not usually

advantageous. Whereas the normalized frequency offset, Ω = ∆ωT, has been used thus far, when

comparing the variance between chips and symbols, the non-normalized variance,

57
Var[∆ω]=Var[Ω]/T2, must be used. For the same convergence time, and assuming minimal

power loss in the code correlator due to frequency offset, the performance when operating on

chips is significantly worse than when operating on symbols. Figure 5-9 shows the difference for

802.11b-like symbols. The difference is more pronounced for long convergence times and low

SNR. Therefore, estimation should be conducted on post-correlated symbols.

1.00E-02 Chips 12dB Chips 24dB


Symbols 12dB Symbols 24dB

1.00E-03
Symbol Variance

1.00E-04

1.00E-05

1.00E-06

1.00E-07
0 50 100 150
Convergence Time (Symbols)

Figure 5-9: Variance of frequency estimation applied to chips versus symbols for 802.11b-
like symbols

The only caveat is with a large frequency offset. If, the constraint in (5-5) is not satisfied for

symbol operation, one could operate on chips using the algorithms described here without having

to resort to more complex FFT-based algorithms. Even if (5-5) is satisfied, there is an SNR

degradation in the code correlator due to the large frequency offset. For 802.11b-like symbols

(11-bit barker sequence spreading, root-raised cosine transmit and receive filters w/ 50% excess

bandwidth), the power loss for correlation prior to frequency-offset correction is approximately

3db with a 600 KHz offset. Therefore, it may be advantageous to operate on chips because

correlation to symbols causes an SNR loss. A coarse/fine estimation may be employed where

58
coarse estimation is performed on chips, and then fine estimation performed on coarsely-

corrected symbols.

In all cases, because of the SNR degradation due to correlation in the presence of frequency

offset, even if frequency-offset estimation is performed on symbols, the frequency-offset

correction should be applied to chips.

59
PNII System Refinement
6
6.1 Introduction

The PNII design (Chapter 4) consisted of phase and frequency estimation algorithms that

required a total of 55 symbols (out of a total 57) to converge. Here, an exploration is conducted

within a system framework to improve these two systems. Two levels of refinement are applied

to the PNII system. First, keeping the existing architecture, the results of Chapter 5 are applied to

see if the frequency estimation block power consumption and convergence time can be improved

by selection a different FF algorithm. Second, more radical changes are considered including FF

phase estimation rather than the FB DPLL currently in use. Also, the use of a differential

modulation scheme is explored in place of the coherent QPSK currently in use. All exploration is

done within the context of lowering system power consumption, so the system power

consumption estimation tool is used to compare the various alternatives.

The original PNII system was designed in a 0.18 µm CMOS process. Since that process is no

longer available, the refinement of the system is conducted for a 0.13 µm process. Therefore, the

numbers are not directly comparable to the measurements in Chapter 4. However, the original

system power consumption is re-estimated for the new process and all refinements compared to

60
that estimation. A few modifications were made to the original system to correct problems

causing unnecessary power consumption in order to make a more fair comparison. First,

although clocks to the sub-blocks were gated when not in use, the input signals continued to

switch causing a non-negligible amount of power to be consumed. In this study, the input signals

are also gated (using the clock gating signal). Second, the original system included an early-late

correlator so that it could operate with an off the shelf radio front-end. However, in this study,

we are assuming the use of the custom radio front end which doesn’t require the timing tracking

unit. Therefore, the rotate/correlate block power is reduced by a factor of 3. The original system

now consumes 375 nJ over the 57 symbol synchronization header, an average of 5.4 mW during

synchronization, and 6.2 mW during data reception.

6.2 Frequency Estimation Refinement

The frequency estimation component of the PNII system described in Chapter 4 is re-

examined using Chapter 5 as a guide for reducing power consumption and convergence time.

The complete specs for the frequency offset estimation block for the PNII system is described

in Chapter 4. The relevant parameters for the following discussion are: The maximum input

frequency offset is 210 KHz. The specification for frequency estimation is a variance of ΩT =

4.5e-5 so that the 3-σ variation is within +/- 2.5 KHz frequency offset. The minimum input SNR

is 12 dB.

The original design uses the unweighted Meyr estimator with D = 1 and L = 35.

Figure 6-1 details the performance of the different Meyr and Kay algorithms achieving 4.5e-5

variance. The unweighted Meyr algorithm used in the original design performs the worst in terms

of convergence time at 35 symbols. The weighted Kay algorithm performs the best with

convergence time of 17 symbols for D = {1, 2, 3} and convergence time of 18 symbols for D = 4.

When the convergence time is small, it is not uncommon for there to be convergence time

61
increases for a larger D. In essence, the extra delay of D symbols in waiting for the first

estimation is larger than the corresponding reduction in convergence time.

Figure 6-1: Convergence time of different frequency estimators

The requirement of both the Meyr and Kay algorithms is that ΩD < π . The maximum

frequency offset of 210 KHz implies that D = 1 is the largest D possible without using a more

complicated coarse/fine estimation method. Fortunately, in this instance, the weighted Kay

algorithm with D = 1 gets the best performance and there is no reason to resort to the more

complicated coarse/fine estimation schemes.

Table 6-1 details the new frequency estimation method for PNII vs. the original one. The

convergence time for frequency estimation is reduced from 35 symbols to 17 for a savings of

50%. Over the entire original synchronization header of 57 symbols, this results in 30% savings.

Due to the shorter convergence time and lower power consumption, the new algorithm consumes

16% of the original algorithm’s energy consumption.

62
Table 6-1: New and old frequency estimation methods

Algorithm D Convergence Time Total Energy (pJ)


(symbols)
Unweighted Meyr 1 35 3407
Weighted Kay 1 17 535
Reduction 50% 84%

This shows that the work of this thesis and especially in Chapter 5 can result in significant

system energy and convergence time savings as applied to actual systems.

6.3 Frequency and Phase Estimation Redesign

Further refinement of the PNII system is explored further by deviating from the original

design. The phase and frequency estimation algorithms required a total of 55 symbols (out of a

total 57) to converge. Therefore, a joint optimization of these two systems is conducted to see if

an improvement can be made. First, alternative phase estimation schemes are explored where a

FF algorithm is used in lieu of the FB DPLL in the original design. Lastly, a change in the

modulation scheme itself is considered. Differential PSK can reduce the synchronization

overhead, but incurs a BER penalty versus the coherent QPSK modulation in the original design.

These trade-offs are explored in a system power consumption framework.

In many standards-based wireless communication systems (i.e. 802.11b), the modulation

specified is differential PSK, or DPSK. This choice is made to alleviate the synchronization

requirements since in using differential modulation schemes, a coherent phase does not need to be

estimated and tracked. As will be shown, this also relieves the frequency estimation

requirements. However, differential modulation incurs a BER penalty as compared to coherent

modulation. The PNII system was analyzed to see if system energy consumption is reduced or

increased if differential modulation is used.

63
6.3.1 Differential Modulation Penalty

To fairly evaluate the systems, it is necessary to examine the performance degradation of

differential BPSK and QPSK versus the coherent versions. Proakis [PRO] gives the bit error rate

for Differential PSK (DBPSK and DQPSK) as

1 −γ b
Pb 2 = e (7-1)
2
1
1 − ( a 2 +b 2 )
Pb 4 = Q1 (a, b ) − I 0 (ab )e 2 (7-2)
2

(
a = 2γ b 1 − 1 2 ) (7-3)

(
b = 2γ b 1 + 1 2 ) (7-4)

where Q1(a,b) is the Markum Q function and I0(x) is the modified Bessel function of the first

kind of order zero [PRO]. The results are plotted in Figure 6-2 as well as Pb for coherent BPSK

and QPSK:

⎛ 2 Eb ⎞
Pb = Q⎜⎜ ⎟.
⎟ (7-5)
⎝ N0 ⎠

The SNR degradation for DBPSK is less than 1dB for Pb < 1e-5. However, the degradation

for DQPSK is 2.3 dB for moderate to high SNR. Higher order modulation schemes, with M > 4,

typically incur a 3 dB penalty.

64
Figure 6-2: BER of coherent and differential QPSK, BPSK

6.3.2 Phase Error vs. SNR Degradation

To compare the different schemes, the designs are normalized to a common design target. The

instantiation of each scheme is designed to meet the design target, and then the system power

consumption of all the designs meeting the same design target are compared against each other.

The design target is specified as an SNR degradation versus the ideal detection scheme. Different

kinds of phase errors will affect the BER in different ways. So, a discussion of phase error effects

on BER is required.

The probability of symbol error for PSK modulation in AWGN [PRO] is

π
PM = 1 − ∫ πM pΘ r (Θ r )dΘ r (7-6)

M

where,

1 − 2γ s sin 2 Θ r ∞ −(V −
pΘ r (Θ r ) = ∫0 Ve
4γ s cos Θ r ) 2 / 2
e dV . (7-7)

Assuming gray-coded symbols, the BER, Pb can be computed as,

65
1
Pb = PM . (7-8)
log 2 M

For BPSK and QPSK, (7-6) exactly matches the analytical formulas

⎛ 2 Eb ⎞
P2 = Q⎜⎜ ⎟
⎟ (7-9)
⎝ N0 ⎠
2
⎛ 2 Eb ⎞ ⎛ ⎞
P4 = 2Q⎜⎜ ⎟ − Q⎜ 2 Eb ⎟ (7-10)
⎟ ⎜ N ⎟
⎝ N0 ⎠ ⎝ 0 ⎠

[PRO]. Therefore, the BER for both BPSK and QPSK is

⎛ 2 Eb ⎞
Pb = Q⎜⎜ ⎟,
⎟ (7-11)
⎝ N0 ⎠

which is exact for BPSK and slightly pessimistic, but fairly accurate for QPSK. There is no

closed form solution for the integral for M>4, so it is computed numerically.

The BER given a fixed phase offset, ε, can be computed by changing the integration limits on

the integral in (7-6) to [− π M + ε K π M + ε ]. The results for QPSK and errors of 1 ⋅ 10 −1 ,

1 ⋅ 10 −2 , 1 ⋅ 10 −3 radians is shown in Figure 6-3 (labeled “fixed”). It can be seen that errors

below 1 ⋅ 10 −3 have negligible BER degradation (<0.1dB) versus the optimal detection case

also shown.

66
Figure 6-3: QPSK BER with Gaussian and fixed phase errors

The BER for a random phase error can be computed by evaluating the stochastic integral

∞ π +ε
⎡ ⎤
E [PM ] = ∫ ⎢⎣ ∫−π M +ε pΘ r (Θ r )dΘ r ⎥⎦ p (ε )dε
1 − M
(7-12)
ε = −∞

where pΘ r (Θ r ) is defined as in (7-7). The results for QPSK and zero-mean Gaussian phase

noise with variance 1e-1, 1e-2, 1e-3, and 1e-4 are also shown in Figure 6-3. Simulations match

these calculations very well. It can be seen that variances lower than 1e-3 have negligible BER

degradation (< 0.1 dB) versus the optimal detection case. Note that stochastic phase errors with

variance σ2 have slightly worse performance than constant phase errors of σ especially at high

variances.

67
Figure 6-4: QPSK BER with Gaussian phase error

Meyr [MEY] computes an approximation to the BER degradation (in dB) for stochastic phase

noise as

10 ⎛ ⎞
⎜1 + 2 log 2 ( M ) cos ( pi / M ) b N ⎟ var(φ )
E
D= 2
(7-13)
ln(10) ⎝ o⎠

where, M is the constellation size, and Eb


N0 is the signal to noise ratio per bit. These are also

plotted along with the calculated values in Figure 6-4. The Meyr equation is designed to be valid

for small SNR degradations (< 0.2 dB). In practice, the Meyr equation is a good approximation

except for very large phase variances, such as 1e-1. However for variances of 1e-2 or smaller, the

Meyr approximation fits very well.

Phase errors are not always Gaussian distributed. Sometimes, as in symbols with a residual

frequency offset, the phase is uniformly distributed. The BER can be calculated using the same

technique for the Gaussian distributed errors with the Gaussian PDF replaced with a uniform

PDF. Figure 6-5 shows the BER of QPSK symbols with phase errors uniformly distributed with

different bounds.

68
Figure 6-5: BER vs. SNR with uniform phase error in the range of [0..lim]

6.3.3 Feed-Forward Phase Estimation

A FB algorithm for phase estimation, the DPLL, was used in the original PNII design. FF

algorithms are known to have faster convergence time than FB algorithms. Therefore, a FF phase

estimation scheme is considered in this exploration. The most common FF phase estimator is the

Viterbi&Viterbi (V&V) estimator [TAV],

1
arg ⎧⎨∑n =0 rn e jM arg{rn } ⎫⎬
N −1 L
φˆ = (7-14)
M ⎩ ⎭

where L = 2 has been shown to be nearly optimal for QPSK symbols. The estimator is NDA, and

therefore has a π/M phase ambiguity. This ambiguity can be resolved with the use of known

synchronization header bits.

A plot of V&V variance versus estimation lengths for different SNRs is shown in Figure 6-6.

The modified Cramer-Rao bound,

69
1
MCRB(φ ) = , (7-15)
2 N ( Es / N 0 )

for each SNR is also shown. It can be seen that SNRs of 12 dB and above essentially achieve the

MCRB.

Figure 6-6: Phase estimation variance vs. L for different SNRs

It is beyond the scope of this thesis to compare different FF phase estimation schemes or to

estimate the power consumption for a wide range of phase estimator parameters. However,

whenever a FF phase estimator is required for the following discussion, the V&V estimator will

be used, and the power consumption will be estimated for that specific instantiation of the

estimator.

6.3.4 Frequency and Phase Estimation Redesign

Finally, enough background information has been given to describe and compare the different

schemes considered in this exploration. A maximum packet length of 1024 bits is assumed along

with the typical channel model in Chapter 2. 50 ppm crystals are assumed as in the original

70
design. The design goal is an implementation margin of 1dB at 12dB input SNR for coherent

modulation and a margin of 1 dB at 14.3 dB input SNR for differential modulation.

Using (2-10), the 95% correlation time of the channel is 4032 symbols. Therefore, phase and

amplitude of the channel are static throughout the 1024-bit packet. However, the frequency offset

can be up to 210 KHz, which, unless corrected, causes a fast-changing phase offset. If the

frequency offset can be corrected to a tolerance that makes the phase seem static over the course

of the packet, phase can be estimated once and not tracked throughout the packet.

Using Figure 6-3 and Figure 6-5, it can be seen that a constant phase offset of 1e-1 (referred to

as “Bound 1”), a Gaussian phase offset with variance 7e-3 (referred to as “Bound 2”), or a

Uniform phase offset with bound 1.5e-1 (referred to as “Bound 3”) can be tolerated without

exceeding the 1dB implementation margin.

The four frequency/phase estimation methods to be considered are shown in Table 6-2. The

first scheme is the original scheme described in Chapter 5 with the improvements described in

§6.2. The second scheme uses an estimate-once method where the frequency offset and phase are

estimated only once at the start of the packet. The severe frequency estimation requirements for

this system may render this scheme impractical. The third method uses FF frequency and phase

estimation, but re-estimates the phase every symbol. This method gives the most relaxed

frequency estimation requirements, and therefore the shortest packet header length, of any of the

coherent schemes. Lastly, the differential QPSK modulation scheme is considered. The DQPSK

scheme has the most relaxed frequency estimation requirements, and no need for phase

estimation, so it achieves the shortest packet header length, but as was shown above, incurs a

2.3dB SNR penalty.

71
Table 6-2: Frequency/phase estimation methods to be considered

Method Modulation Frequency Estimation Phase Estimation


Original: FB Coherent FF, var=4.5e-5 PLL, pull-in range
Coherent 2.5KHz
Tracking
Estimate Once Coherent FF, var=9.5e-9 FF, estimate once,
var=7e-3
FF Coherent Coherent FF, var=2.8e-4 FF, re-estimate every
Tracking symbol,
Var=3.5e-3
Differential Differential FF, var=1.1e-3 none

The revised original method has been described §6.2 and takes 17 symbols for the frequency

estimation and 19 symbols for the PLL for a total of 38 (including the additional 2 for coarse

timing). Energy consumption is 271 nJ during synchronization. Power consumption averages

5.74 mW during synchronization and 6.18 mW during data reception. It was shown in Chapter 4

that this method achieves better than the required 1 dB implementation margin for frequency and

phase estimation.

The estimate once method requires a small residual frequency offset so that the maximum

phase error over the length of the packet is less than 1.5e-1 radians. Then, since the frequency

offset shows up as a constant phase error ramp over the lenght of the packet, the BER will follow

a uniform distribution of Bound 3. The required frequency estimation variance to achieve this

error is 9.5e-9. However, the convergence time for this variance is greater than 100,000 symbols.

Therefore, an estimate once scheme for phase and frequency is not practical for systems with a

large frequency offset and will not be considered further.

A FF coherent tracking method estimates the frequency once at the start of the packet, and re-

estimates the phase every N symbols. Therefore, the frequency estimation needs only be accurate

enough so that the phase error remains small over N symbols. The phase error allowance is split

evenly between the phase estimator and the frequency error over N symbols. Therefore, every N

symbols, the phase is re-estimated to within a variance of 3.5e-3 (½ Bound 2) using an estimation

72
length of L. For 3.5e-3 variance at 12 dB input SNR, the V&V estimator takes L = 5 symbols to

converge. The energy consumption of the V&V estimator for L = 5, and a symbol rate of 806

KHz is 647 uW. The initial frequency offset variance requirement is 2.78e-4/N2 (to achieve a 3-σ

variation of ½ Bound 1, the frequency offset variance is (½ Bound 1/3/N)2).

With increasing N, the power consumption during reception decreases, however, a tighter

frequency estimation variance is required thereby increasing the convergence time of the

frequency estimator. The smallest packet header is achieved when N = 1 and the frequency

estimation variance is the most relaxed. For N = 1, the required frequency estimation variance is

2.8e-4 which can be achieved using the weighted Kay estimator with an estimation length of 9.

The header length must include the time for the frequency estimation and for the first phase

estimation to be produced, for a total of 14 symbols for phase and frequency estimation.

It may seem strange to have N < L, but it is possible if the V&V phase estimator is pipelined

producing one result every symbol that is the average of the previous L symbols.
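As a rough illustration (not the implemented datapath), such a pipelined estimator can be sketched in software; the M-th-power V&V form and the M = 2 (BPSK) default below are assumptions:

```python
import cmath
from collections import deque

def vv_phase_pipelined(symbols, L, M=2):
    """Pipelined Viterbi&Viterbi phase estimator sketch for M-PSK.

    Each received sample is raised to the M-th power to strip the
    modulation; a sliding sum over the last L samples is maintained so
    that a new phase estimate (the average of the previous L symbols)
    is produced every symbol."""
    window = deque(maxlen=L)
    acc = 0j
    estimates = []
    for r in symbols:
        stripped = r ** M                 # remove the PSK modulation
        if len(window) == window.maxlen:
            acc -= window[0]              # oldest sample leaves the sum
        window.append(stripped)           # deque evicts the oldest entry
        acc += stripped
        estimates.append(cmath.phase(acc) / M)
    return estimates
```

Because the window slides by one sample per symbol, the estimator supports any re-estimation interval N, including N < L.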

Energy consumption for the frequency/phase estimator is 126 nJ over the 16 total symbols of

the packet header. Total power consumption is 5.97 mW during synchronization and 6.68 mW

during data reception. The 1 dB implementation margin is met.

The differential method requires only frequency estimation so that the phase error per symbol

is less than 1e-1 (Bound 1). The required frequency estimation variance for 3 sigma operation is

1.1e-3. The convergence time to achieve this variance is 6 symbols using the weighted Kay algorithm.

Energy consumption for the frequency estimation is 63 nJ over the 8 symbols in the synch header.

Total power consumption is 6.36 mW during synchronization and 5.78 mW during data

reception. The 1dB implementation margin is met. Table 6-3 details the results of the 4

algorithms.

Table 6-3: Comparison of different Frequency/Phase Estimation Schemes

Method                   Convergence Time    Modulation     Energy during          Power during data
                         (frequency/phase    Penalty (dB)   synchronization (nJ)   reception (mW)
                         component)
Original: FB Coherent    38                  0              271                    6.18
Tracking
Estimate Once            >100,000            0              N/A                    N/A
FF Coherent Tracking     16                  0              126                    6.68
Differential             8                   2.3            63                     5.78

6.3.5 System Results

Comparing the convergence time and base power consumption in isolation does not take into

account the system impact of the 2.3 dB modulation penalty of the differential scheme, the different synchronization

power consumption during data reception, or the front-end power that is consumed during the

longer packet headers. The system power consumption is based on a set of parameters, such as

transmitter efficiency, required BER and input SNR, receiver power consumption, baseband

power consumption during synchronization and data reception, and packet header length. The

values for the PNII system are outlined below in Table 6-4.

Table 6-4: Parameters used in system exploration

Txdiss (for 0 dBm transmit power)   30 mW
Tx efficiency                       20%
Rxdiss                              70 mW
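To make the comparison in this section concrete, the per-packet energy trade-off can be sketched as follows; the model is a simplification (not the Chapter 3 estimation tool), with the 806 KHz symbol rate, the Table 6-4 values, and a nominal 6 mW baseband figure taken from the text:

```python
def pa_input_power(p_out_dbm, efficiency):
    """DC power drawn by the PA to radiate p_out_dbm at the given efficiency."""
    return (10 ** (p_out_dbm / 10)) * 1e-3 / efficiency

def packet_energy(n_data, n_header, rate_sps, p_rx_w, p_bb_w, pa_in_w):
    """Illustrative whole-link energy for one packet: total on-time times
    the sum of receiver front-end, baseband, and PA input power."""
    return (n_header + n_data) / rate_sps * (p_rx_w + p_bb_w + pa_in_w)

# FF coherent tracking: 16-symbol header at 0 dBm transmit power.
# Differential: 8-symbol header, but a 2.3 dB modulation penalty.
pa_coh = pa_input_power(0.0, 0.2)
pa_dif = pa_input_power(2.3, 0.2)
e_coh = lambda n: packet_energy(n, 16, 806e3, 70e-3, 6e-3, pa_coh)
e_dif = lambda n: packet_energy(n, 8, 806e3, 70e-3, 6e-3, pa_dif)
```

For a 30-bit control packet the differential scheme uses less energy, while for a 1024-bit packet the coherent scheme wins, mirroring the crossover behavior shown in Figure 6-7.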

Figure 6-7 shows the system energy consumption comparing the original, FF coherent

tracking, and differential methods along with the original PNII system. It is shown that for 1024-

bit packets, the FF coherent tracking scheme is the lowest power consumer by a 1% margin vs.

the differential method. The coherent PLL scheme is 2% worse than the FF coherent tracking

scheme. For short packets where header length is the dominant factor in system power

consumption, differential schemes can pay off. Especially considering the extra design time and

extra risk associated with implementing the coherent scheme, the differential scheme looks

attractive. However, for longer packets, the differential scheme never wins. This is because the

longer header length of the coherent scheme is amortized over more data bits, and therefore the

2.3dB penalty of the differential scheme matters more. However, in this instance, the margin for

infinitely long packets is only 2%. This is due to the specific power consumption of the different

components of the system. In this system, the extra power required to transmit the additional 2.3

dB (170%) is only 2.7 mW. This is due to the low original transmit power of 0 dBm (1 mW) and

the moderately good PA efficiency of 20%. This additional power is relatively small in

comparison to the front-end power of 100 mW. The differential scheme will be even more

favorable in systems where the additional transmitted power is small compared to the front-end

power. Low transmitter efficiencies raise the additional power required to transmit the additional

2.3dB. Therefore, low transmitter efficiencies favor the coherent scheme. Compared to the

original PNII system, the FF Coherent Tracking method achieves a reduction of 66% in

synchronization energy consumption, resulting in a 7% lower system energy for packet lengths

of 512 used in the original system.

Figure 6-7: System power consumption for different schemes

The rapid power estimation tool described in Chapter 3 allows four frequency and phase

estimation schemes to be analyzed. The system power estimation tool then allows the schemes

with different convergence times and power consumption to be compared. This exploration

resulted in a reduction of 66% in synchronization energy consumption, 75% in synchronization

header length, and 7% in system energy from the original PNII system design. It also determined

packet lengths at which differential modulation is advantageous. Theoretically efficient

modulation schemes, which result in more synchronization overhead, are useful only for long

packets and long transmit distances. Otherwise, less efficient modulation schemes (such as

differential PSK) result in lower system power because of the reduced synchronization overhead.

7 Interpolation
7.1 Introduction

A major component of timing recovery algorithms of any type is a timing interpolator to

perform the parameter adjustment. Therefore, a study of timing recovery algorithms relies on

accurate power consumption estimates of interpolators of various sizes and performance.

Two styles of interpolating filters exist: those that use a lookup table (LUT) for the

coefficients, and those that compute the coefficients on the fly, called Farrow

interpolators [MEY]. LUT-based interpolators are straightforward to characterize. The output

SNR requirement is achieved by the precision and number of taps, and interpolation granularity is

achieved through the number of coefficient sets that are stored. However, the Farrow

interpolators are more complex and less straightforward to specify. A study of timing recovery

would not be complete if it didn’t consider both types of interpolation. Therefore, before a study

of timing recovery can be conducted, the Farrow interpolator must be characterized.

This chapter performs a thorough study of the Farrow type of interpolator over a wide range of

parameters. The results of this work can be used to conduct the study of timing recovery

algorithms of all types. Specification of the interpolator is inherently linked to the specifications

of the ADC preceding it. Therefore, a joint optimization of interpolation and ADC power

consumption is conducted in this study. The results of this study show the necessity in using a

system-level framework to evaluate power consumption, rather than just examining the power

consumption of individual blocks.

7.2 Interpolation Background

The main objective of an interpolation/resampling filter is to reproduce the samples of a

digitized analog signal at the desired instant, µ, with no or minimal distortion, given the

available sampled version.

the SNR degradation through the interpolator, and the granularity of the achievable timing

offsets.

This background on interpolators follows closely the description in [MEY]. The ideal linear

interpolator has a frequency response

$$H_I\left(e^{j\omega T_s}, \mu T_s\right) = \frac{1}{T_s} \sum_{n=-\infty}^{\infty} H_I\left(\omega - \frac{2\pi}{T_s} n,\ \mu T_s\right) \qquad (6\text{-}1)$$

with

$$H_I(\omega, \mu T_s) = \begin{cases} T_s \exp(j\omega\mu T_s), & |\omega| < \frac{2\pi}{2T_s} \\ 0, & \text{elsewhere} \end{cases} \qquad (6\text{-}2)$$

The corresponding impulse response is the sampled sinc(x) function. Conceptually, the filter

can be thought of as an FIR filter with an infinite number of taps. The taps are a function of µ.

For a practical receiver, the interpolator must be approximated by a finite-order FIR filter.

$$H\left(e^{j\omega T_s}, \mu\right) = \sum_{n=-N}^{N-1} h_n(\mu)\, e^{-j\omega T_s n} \qquad (6\text{-}3)$$

The 2N coefficients of the FIR filters must be pre-computed and stored in a memory for a

number L of possible values. This represents the LUT style interpolator. As a consequence, the

timing resolution suffers from a maximum discretization error of Ts/(2L). Assuming the word length of

each tap is W, implementation complexity depends on the structure parameters, L, 2N, and W.

An alternative to storing a set of coefficients is to compute them on the fly using polynomial

interpolation. Each coefficient hn(µ) is approximated by a (possibly different) polynomial in µ of

degree M(n):

$$h_n(\mu) = \sum_{m=0}^{M(n)} c_m(n)\, \mu^m \qquad (6\text{-}4)$$

For a 2N-th order FIR filter, the

$$2N + \sum_{n=-N}^{N-1} M(n) \qquad (6\text{-}5)$$

coefficients are obtained by minimizing the quadratic frequency-domain error averaged over all

µ:

$$\sigma_e^2 = \frac{\sigma_x^2}{4\pi B} \int_0^1 \int_{-2\pi B}^{2\pi B} \left| e^{j\omega T_s \mu} - \sum_{n=-N}^{N-1} \left[ \sum_{m=0}^{M(n)} c_m(n)\, \mu^m \right] e^{-jn\omega T_s} \right|^2 d\omega\, d\mu \qquad (6\text{-}6)$$

where σx2 is the input signal power. The optimization is performed within the passband of the

signal x(t). No attempt is made to constrain the frequency response outside B. The errors are

reported as a ratio vs. input signal power in dB.

Since the function is restricted to be of polynomial type, the error of the polynomial

interpolator will be larger than that for the LUT-based MMSE interpolator described by (6-3),

although it can be made arbitrarily small by increasing the degree of the polynomial. Though the

polynomial interpolator performs worse, it is often chosen because it can be implemented very

efficiently in hardware.

For simplicity, it is assumed that all polynomials have the degree M(n) = M. Inserting for

Hn(µ) the polynomial expression of the FIR transfer function and interchanging summation,

results in

$$H(z, \mu) = \sum_{m=0}^{M} \mu^m \left[ \sum_{n=-N}^{N-1} c_m(n)\, z^{-n} \right] \qquad (6\text{-}7)$$
The inner sum describes a time-invariant FIR filter that is independent of µ, and there is one of

these for each degree m of the polynomial. The polynomial interpolator can thus be realized as a

bank of M parallel FIR filters where the output of the m-th branch is first multiplied by µm and

then summed. This structure was devised by Farrow [MEY]. A block diagram of this structure is

shown in Figure 7-1.

Figure 7-1: Block diagram of the Farrow interpolator.
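As an illustrative software sketch of this structure (not the hardware evaluated later in this chapter), the branch outputs can be combined with Horner's rule in µ; the taps used here are the Table 7-1 MMSE coefficients:

```python
def farrow_interpolate(window, coeffs, mu):
    """One output sample of a Farrow interpolator (illustrative sketch).

    window : the 2N input samples the filter currently spans
    coeffs : coeffs[m][i] = c_m(n) for branch m, with i indexing the
             same 2N samples as `window`
    mu     : fractional timing offset in [0, 1)

    Implements H(z, mu) = sum_m mu^m [sum_n c_m(n) z^-n]: a bank of
    M+1 parallel FIR branches whose outputs are combined by Horner's
    rule in mu, as in Figure 7-1."""
    branches = [sum(c * x for c, x in zip(col, window)) for col in coeffs]
    y = 0.0
    for v in reversed(branches):  # Horner evaluation of the polynomial in mu
        y = y * mu + v
    return y

# MMSE taps for lambda = 4, (N, M) = (2, 2) from Table 7-1,
# transposed so that coeffs[m] lists c_m(n) for n = -2 .. 1.
MMSE_TAPS = [
    [0.0000, 1.0000, 0.0000, 0.0000],
    [-0.3688, -0.6570, 1.3430, -0.3688],
    [0.3688, -0.3430, -0.3430, 0.3688],
]
```

At µ = 0 only the m = 0 branch survives, so one input sample passes through unchanged; for a linear ramp of inputs, the output at µ = 0.5 lands close to the midpoint, the small deviation being the approximation error the MMSE fit trades for band-limited accuracy.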

The MMSE criterion (6-6) chosen here is a common one, but other metrics have been

proposed in the literature. For instance, [VO] suggests using a weighted average of time- and

frequency-domain error and timing detection error due to the imperfect interpolator. Other

metrics may include side-lobe magnitude when near band interferer rejection is a required feature

of the interpolation filter. Regardless of the optimization function, the general structure of the

taps is very similar (in terms of number of ‘0’ and ‘1’ in the taps). For instance, the Vo taps for

λ = 4 (N, M) = (2, 2) are shown in Table 7-2 along with the MMSE taps in Table 7-1. Therefore,

the power consumption results obtained here can be applied to both. However, the final metric by

which one wants to compare power consumption (i.e. power consumption vs. final SNR or power

consumption vs. timing recovery error) will change which parameters optimize the criteria. The

rapid power estimation tool described in Chapter 3 could aid the designer in re-evaluating the

power consumption of the interpolators with different tap values and optimality criteria. The rest

of this discussion assumes the use of the MMSE criteria (6-6).

         m = 0     m = 1     m = 2
n = -2   0.0000   -0.3688    0.3688
n = -1   1.0000   -0.6570   -0.3430
n =  0   0.0000    1.3430   -0.3430
n =  1   0.0000   -0.3688    0.3688

Table 7-1: MMSE coefficients for λ = 4, (N, M) = (2, 2)

         m = 0     m = 1     m = 2
n = -2   0.0000   -0.2867    0.2867
n = -1   1.0000   -0.7133   -0.2867
n =  0   0.0000    1.2867   -0.2867
n =  1   0.0000   -0.2867    0.2867

Table 7-2: Vo coefficients for λ = 4, (N, M) = (2, 2)

7.3 Farrow Interpolator Exploration

The Farrow structure has several parameters:

• Coefficients themselves

• Coefficient bit widths (WT)

• Bit-widths of input data (SNR of input data) (WI)

• Oversampling rate of input data (λ)

• Bit-widths/resolution of timing offset (Wµ)

• Filter lengths (N)

• Polynomial order (M)

The oversampling ratio (OSR) of the input data to the interpolator is dictated by the OSR of

the ADC preceding it. Although an oversampling ratio of 2 is required to meet the Nyquist

sampling requirements, larger OSRs are often used in practice. As seen in Chapter 4, there are

system considerations other than the synchronization system that may govern this

parameter. For instance, a bandwidth-limiting filter must precede an ADC to avoid aliasing of

the signal. The specs of this filter are relaxed if a higher oversampling rate is used in the ADC.

Each of the parameters above affects the two metrics (output SNR, and timing resolution) of

the output as follows:

• Timing resolution = λ ⋅ 2^Wµ timing positions per symbol (a granularity of 1/(λ ⋅ 2^Wµ))

• Output SNR = F (coefficients, WT , WI , λ , N , M ) , where F is some unknown function.

There are several factors we can use to constrain the choice of (N, M). For M=1, the error is

invariant of N. Therefore, only (1, 1) is considered. For the error required by typical

applications, N ≤ 4 suffices to produce small enough errors (below –60 dB for λ = 2, and –90 dB

for λ = 4). With N ≤ 4, there is almost no difference between M = 3 and M = 4. For M = {2,3},

the errors for N = 1 are indistinguishable. Therefore, only (1, 2), (2, 2), (3, 2), and (4, 2) are

considered for M = 2, and (2, 3), (3, 3), and (4, 3) are considered for M = 3.

The exploration is limited to λ = {2, 4} because 2 is the minimum λ which meets the sampling

theorem requirement, and λ > 4 are rarely used [MEY]. Also, the input word length WI is limited to

{2, 4, 8} bits, which covers the range most commonly used in low data rate networks that typically

use low-order modulation constellations and BERs lower than 10^-4. Bit resolutions on Wµ are limited

to {2, 4, 8}, which gives a minimum resolution of 1/512 for λ = 2 and 1/1024 for λ = 4. This is well

beyond the range that most systems would ever require (typical systems require 1/8 or 1/16

precision). WT is limited to {2, 4, 8} bits. For high SNR systems, 16 bits should be included for

M = 3 to reach errors smaller than –50 dB for λ = 2 and –60 dB for λ = 4 (see Figure 7-2). Even

with these restrictions on the parameter space, the power consumption of over 500 interpolators is

to be estimated. This would not be possible without the fast gate-level power estimation method

developed in Chapter 3.

Output SNR is calculated assuming the noise on the incoming data is AWGN, and that the interpolator

error is white as well. In that case, the variances of the noises add to obtain the output SNR.

Since the interpolator noise is not quite white, this is an optimistic, but reasonable approximation

similar to the ones made in ADC analysis.
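This variance-addition approximation can be written directly as a small helper, `interp_error_db` being the interpolator error power relative to the signal (the negative-dB figures used throughout this chapter):

```python
from math import log10

def output_snr_db(input_snr_db, interp_error_db):
    """Output SNR when input noise and interpolator error are both
    treated as white, so their variances (relative to signal power)
    simply add."""
    n_in = 10 ** (-input_snr_db / 10)
    n_interp = 10 ** (interp_error_db / 10)  # e.g. -60 for a -60 dB error floor
    return -10 * log10(n_in + n_interp)
```

With a 12 dB input and a -60 dB interpolator the output stays essentially 12 dB; an interpolator error equal to the input noise power costs the familiar 3 dB.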

Hardware complexity (i.e. the number of equivalent full adders) is approximately equal to:

$$\begin{aligned} &(M+1) \cdot (2N) \cdot W_I \cdot W_T \\ &+ (M+1) \cdot (2N-1) \cdot (W_I + W_T) \\ &+ (M+1) \cdot (2N-1) \sum_{i=1}^{2N-1} \log(i+1) \\ &+ M \cdot (W_I + W_T) \cdot (W_\mu + 1) \\ &+ M \cdot \log_2(2N) \cdot (W_\mu + 1) \\ &+ W_\mu \sum_{i=1}^{M} \log(i) \end{aligned} \qquad (6\text{-}8)$$
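A direct transcription of this complexity expression (with base-2 logarithms assumed where no base is written) gives a quick comparator across configurations:

```python
from math import log2

def farrow_fa_count(N, M, WI, WT, Wmu):
    """Approximate equivalent full-adder count of a Farrow interpolator,
    transcribed from (6-8); base-2 logs are an assumption."""
    fa = (M + 1) * (2 * N) * WI * WT                                  # tap multipliers
    fa += (M + 1) * (2 * N - 1) * (WI + WT)                           # tap-sum adders
    fa += (M + 1) * (2 * N - 1) * sum(log2(i + 1) for i in range(1, 2 * N))  # bit growth
    fa += M * (WI + WT) * (Wmu + 1)                                   # branch scaling by mu
    fa += M * log2(2 * N) * (Wmu + 1)                                 # branch combination
    fa += Wmu * sum(log2(i) for i in range(1, M + 1))                 # powers of mu
    return fa
```

As expected, the count grows with each of N, M, and the word lengths, which is why power roughly tracks this quantity when all circuitry switches every cycle.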

If all the circuitry in the system were switching every cycle, the power consumption would be

proportional to this hardware complexity times the clock frequency. However, there are several

factors that influence how much of the circuitry is switching each clock cycle:

• Density of taps (number of 1’s)

• Correlation in µ

• Correlation in input data

Correlation in µ increases with an increase in λ because it is assumed the µ value is updated at

symbol rate (as in most timing recovery systems). Therefore, µ is changed only every λ samples.

For similar reasons, correlation in input data increases with increasing λ because the symbol rate

remains the same. Therefore, there are more samples of the same symbol, which are highly

correlated. The average per-bit correlation of the input data and of µ increases when fewer bits are used. This is

because the bits used are the MSB’s and in correlated data, the MSB’s exhibit more correlation

than LSB’s. Essentially, the more bits there are, the more LSB’s there are and there is less

correlation between bits of adjacent samples.

Figure 7-2: Tap error (dB) vs. (N, M) and WT

For implementation exploration, each set of taps is truncated to WT bits. The error metric is

re-computed for each set of taps at each different resolution and plotted in Figure 7-2. The higher

order and longer filters perform worse than the lower order and shorter filters in the case of small

number of bits. The conclusion is that the higher order and longer filters take more tap resolution

to implement with desired results. WT =16 should be included if the low errors of λ = 4, M = 3

are to be achieved. Looking at the graphs, there are clearly some sets of taps that would never be

used because they perform worse than or equal to a set of taps with less complexity (and hence

less power consumption). Below are the sets of taps (N, M), λ, and WT that would be used:

WT = 2:

λ = 2: (1, 1), (2, 2)

λ = 4: (1, 1)

WT = 4:

λ = 2: (1, 2), (2, 2), (2, 3)

λ = 4: (2, 2), (3, 2)

WT = 8:

λ = 2: (2, 2), (3, 2), (2, 3), (3, 3)

λ = 4: (1, 2), (2, 2), (3, 2), (2, 3)

The current method is to optimize the infinite precision taps, and then truncate those taps to

the requisite number of bits. Future work is to optimize the taps for each WT. In this case, maybe

some of the higher order and larger filter lengths could achieve better results at the lower tap

resolutions.

Using these sets of taps listed above, the power consumption is estimated for different WI and

Wµ. The clock speed is increased proportional to λ to keep the symbol rate constant over all

simulations. A symbol rate of 1 MHz is assumed. When comparing interpolation filters with

different WI or λ, the power in the ADC preceding the interpolation filter must be included if a

fair system trade-off is to be made. ADC power is estimated using (3-1) and (3-2) with an FOM

of 1.2e12.

The results show that choosing the wrong parameters could have a detrimental impact (up to

factors of 10) on power consumption. It was expected that the results would show succinct rules

for dictating which interpolator parameters to use for the lowest power implementation of a given

spec. However, nice trends, like those that emerged in Chapter 5 for frequency estimation, are

not observed. Rather, the results highlight the importance of the rapid power estimation

framework for examining a large parameter space. The designer of a Farrow interpolator must

examine many instantiations to determine which one is lowest power for the given specification.

The rest of this chapter describes the results of the exploration. Some basic guidelines are

given for designing the interpolator to meet the final specifications for timing resolution and

output SNR with the lowest power. Interspersed with these guidelines are some examples of how

much power can be wasted if a systematic exploration of the space is not conducted.

7.4 Achieving the Timing Resolution Specification

First, the timing resolution specification is addressed. λ and Wµ are the only parameters that

influence timing resolution. Since increasing λ incurs more ADC power, it is expected that the

most power efficient way to achieve the required timing resolution is to use the minimum λ

and Wµ = log2(resolution/λ), where the resolution is expressed as the number of timing positions per symbol (e.g. 1024 for a 1/1024 resolution).
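Expressing the specification as a number of timing positions per symbol, this rule can be sketched as:

```python
from math import ceil, log2

def min_mu_bits(steps_per_symbol, lam):
    """Smallest W_mu providing at least `steps_per_symbol` timing
    positions per symbol at oversampling ratio `lam`: the total number
    of positions is lam * 2**W_mu, so W_mu = ceil(log2(steps / lam))."""
    return ceil(log2(steps_per_symbol / lam))
```

For a 1/1024 resolution this gives Wµ = 9 at λ = 2 and Wµ = 8 at λ = 4, the two configurations compared in the example that follows.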

However, there are a few rare cases where it is advantageous to move to a higher λ to achieve

a required timing resolution and SNR with lower total power. Figure 7-3 through 7-5 show

graphs of equal timing resolutions achieved with λ = 2 versus λ = 4 including ADC power. In

one example, to achieve 22 dB final SNR and 1/1024 timing resolution for 1 MHz data, the best

solution with λ = 2, Wµ = 9 achieves 22.3 dB SNR for 106 uW (79 uW in the interpolator and 27

uW in the ADC). However, the best solution with λ = 4, Wµ = 8 achieves 23.4 dB SNR for only

68 uW (15 uW in the interpolator and 53 uW in the ADC). So, there is a power savings of roughly 36%

with higher oversampling even when ADC power is taken into account. While such a case is rare

in the data examined in this study, the power penalty for choosing the wrong configuration is

high.
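The ADC side of such comparisons can be reproduced with the FOM-based model of Chapter 3; the 4-bit word length used below is inferred from the quoted 27 uW and 53 uW figures and is an assumption:

```python
def adc_power(bits, sample_rate_hz, fom=1.2e12):
    """Rough ADC power model, P = 2**bits * f_s / FOM, using the FOM of
    1.2e12 conversion steps per watt assumed in this study."""
    return (2 ** bits) * sample_rate_hz / fom
```

A 4-bit converter at 2 Msps (λ = 2 for 1 Msps data) costs about 27 uW and about 53 uW at 4 Msps, so doubling λ doubles the ADC cost; in the example above, the power saved in the much smaller λ = 4 interpolator still outweighs that increase.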

Figure 7-3: Interpolation performance for timing resolution of 1/16

Figure 7-4: Interpolation performance for timing resolution of 1/64

Figure 7-5: Interpolation performance for timing resolution of 1/1024

The relative power consumption of the ADC versus the interpolator is a result of the data rate

and the process technology parameters. As the data rate increases but the process stays the same,

the FOM of the ADC decreases, making the ADC more expensive than just linear scaling with

frequency. As process technology scales, the digital circuitry (interpolator) consumes less power,

but analog circuitry power (ADC) doesn’t scale as quickly. Therefore, unlike the frequency

estimation results which were essentially technology-independent, the results for the interpolators

are specific to the data rate and technology chosen. Therefore, the designer is encouraged to

conduct a systematic exploration of the design space for the given process and data rate to

achieve the lowest power implementation.

Another method to increase timing resolution that doesn’t incur the power penalty of the

higher speed ADC would be to run the ADC at the minimum oversampling rate of 2x, have a 2x

fixed interpolator after the ADC to upsample the data to 4x oversampling, and then run through a

λ = 4 variable interpolator. The fixed interpolator would add some noise to the signal, and some

extra power consumption, but may be worthwhile versus the corresponding increase in Wµ. This

analysis is beyond the scope of this thesis and is left for future work.

7.5 Achieving the Output SNR Specification

From a systems perspective, the lowest power solution for a given SNR spec usually has the

output SNR limited by the input SNR and not the errors in the interpolator itself (internal SNR).

Typically, achieving the input SNR has a high marginal cost because circuit power in the

transmitter, receiver front-end, and ADC has to be expended to achieve this SNR. If the

interpolator output SNR is not input SNR limited, these subsystems are wasting power achieving

an unnecessarily high input SNR.

To achieve an input SNR-limited implementation, the coefficients, N, M and WT must be

chosen so that the noise produced in the interpolator is much lower than the noise of the input.

However, when there are several configurations of N, M, and WT that achieve this goal, there are

no hard and fast rules for how to select them to achieve the lowest power result. Figure 7-6

through 7-9 show the exploration results for the interpolators examined in this study.

There are cases where increasing the performance actually requires less power. For any two

sets of points (SNR, power), (x1, y1) and (x2, y2): if x1 > x2 and y1 < y2, then (x1, y1) would always

be favored over (x2, y2) even if it exceeds the SNR requirement. For instance, for WI = 4, λ = 2,

and Wµ = 8, the configuration with N = 1, M = 2, and WT = 8 achieves (20.5 dB, 167 uW),

however the configuration with N = 2, M = 2, and WT = 2 achieves (22.3 dB, 102 uW). There is

an increase in performance by 1.8 dB for a decrease in power of almost 40%.

Figure 7-6: Interpolator performance for Wµ = 2

Figure 7-7: Interpolator performance for Wµ = 4

Figure 7-8: Interpolator performance for Wµ = 8

There can also be a large penalty in power for a marginal increase in performance for a given

WI, λ, and Wµ. For instance, for WI=2, λ=2, and Wµ=8, the configuration with N=2, M=2 and

WT=2 achieves (11.9dB, 32uW); however, the configuration with N=3, M=3, and WT=8 achieves

(12.0dB, 371uW). There is an increase of a factor of 11 in power, but only a 0.1dB increase in

performance. The designer with an output SNR specification of 12dB might make a system-level

decision to reduce it to 11.9dB to save over 300uW in the interpolator.

Because the effect on power consumption in these examples is significant, the designer is

encouraged to conduct a systematic exploration of the design space to achieve the lowest power

implementation using the framework and analysis implemented in this thesis.

7.6 Conclusion

Characterization of the Farrow interpolator is required for exploration of the timing estimation

space. Over 500 instantiations of the Farrow interpolator were examined covering the parameter

space used in typical systems. It was shown that a designer who follows heuristic guidelines to

implement these interpolators could incur huge power penalties. Interpolator design should not

be done in isolation. Joint optimization of the ADC and interpolator is necessary to achieve

optimal system performance. Designers are encouraged to conduct a systematic exploration of

the design space to achieve the lowest power implementation. The tools developed in this thesis

allow that exploration to be easily conducted.

It is expected that the Farrow-style interpolator will outperform the LUT-based interpolator

when the storage space required for the taps is large (i.e. when N ⋅ WT ⋅ 2^Wµ is large). However,

the exact crossover point has not been evaluated.

7.7 Postscript: Interpolator Hardware Implementation Specifics

The implementation chosen here follows Figure 7-1 very closely. Signed integers are used for

the input data and intermediate results throughout the interpolator. The µ data is unsigned

fractional representation to represent values in [0..1) with the appropriate precision (Wµ).

Intermediate results are scaled to accommodate growing bit precision without truncation errors.

There are several hardware improvements that can be made to improve the area and power of

the interpolators. The use of carry save arithmetic may improve the performance and/or reduce

the power consumption of the interpolator. This would be especially effective for large N and

large input bit widths. An automated tool has been developed that scales intermediate results of

FIR filters while keeping resulting overall quantization noise within a certain bound [SHI]. This

would result in area and power savings if applied to these interpolators. For specific

configurations, there are some hardware simplifications that will reduce the power consumption.

For instance, the symmetry in the filter taps for M=2 can be exploited to reduce hardware as

described in [VO2]. Incorporating these changes into the interpolation implementation is beyond

the scope of this thesis.

8 PN3 System Design
The field of Wireless Sensor Networks (WSNs) has many applications, including closed-loop

environmental control of smart buildings, ecological monitoring, and structural monitoring. The

embedded nature of these applications and the large number of nodes makes changing batteries

impractical. Therefore, these applications can only be mass deployed when power consumption

is reduced to levels below what can be scavenged from the environment. It has been calculated

that nodes consuming less than 1mW can achieve reasonable duty cycles by harvesting light or

vibration energy typical in office environments [RAB2].

In Chapter 6, it was shown that differential modulation schemes, while theoretically

inefficient, can consume less power in practice than coherent schemes if short packets are used.

One design example where short packets are indeed common is sensor networks.

Differential modulation schemes are one example of where the system specs are designed to

relax the synchronization requirements therefore reducing the length of the packet header. Going

further, fully non-coherent modulation schemes might reduce the header length even more. The

system power estimation tool can be used to evaluate whether these changes actually improve the

system power consumption.

PN3 is a 50 Kbps system designed for use in wireless sensor network applications [SHE]. The

PN3 system design is an exercise in extreme simplicity to study the effects on system power

consumption. A non-coherent modulation scheme is used, no AGC is employed, and other

system specs are selected, such as packet length and crystal accuracy, to ensure very relaxed

synchronization requirements. In addition to designing the synchronization scheme with a short

header length, both analog and digital implementations are explored.

The modem front-end in use is a MEMS-based RF transceiver designed for low power and

fast turn on times for WSNs. The carrier frequency is 2 GHz. The radio employs a self-mixing

downconversion scheme using an envelope detector [OTI]. The front-end consumes 3 mW in

receive mode, and around 6mW in transmit mode. The maximum data rate is 50 Kbps, and, since

a self-mixing scheme is employed, on off keying (OOK) is the only modulation scheme available

within one channel. (If several channels are available, FSK modulation is possible.) The 1 µs

oscillator startup time in the transmitter dictates the transmitted signal envelope.

MAC layer schemes for WSNs are described in [LIN]. WSNs use packet-based

communication where the synchronization parameters need to be re-estimated for every packet.

Packet lengths for typical applications are 30 bits for control packets and 200 bits for typical data

packets with a maximum of 500 bits. The small packet lengths dictate the need for short

synchronization headers. In contrast, the 128-bit preamble for 802.11b would impose overheads of

over 300% for control packets. MAC layer simulations show that higher data rates reduce overall

system power consumption by reducing the duration of packets and consequently the collision

probability.

8.1 Simplification of Synchronization

The low power design strategy of this work is to reduce the complexity of the synchronization

system through careful system design. A trade-off of packet length, data rate, clock accuracy,

and modulation scheme achieves simplified synchronization requirements able to be implemented

with low power consumption.

To begin with, the choice of RF radio architecture (itself a product of a simplification strategy to reduce

power) has already greatly simplified the synchronization requirements. First, interferers do not

need to be considered in the synchronization system. The high-Q MEMS channel-select filter

sufficiently suppresses out-of-band interference. In-band interferers are dealt with at the MAC

layer with carrier sense mechanisms. Second, the self-mixing architecture removes frequency

and phase information from the incoming signal. Therefore, OOK is the only available

modulation scheme, and frequency and phase synchronization are not required. For any

OOK radio, the only synchronization elements required are timing and amplitude. It is desirable

to estimate them only once rather than continuously track them throughout the packet for low

power operation.

Amplitude estimation requirements are dictated by packet length, data rate, and channel

Doppler and multipath. It is desired to keep the symbols slow enough so that multipath effects do

not need to be estimated and corrected. This scheme renders individual links vulnerable to deep

fades which may not be acceptable in all systems. However, WSNs are robust to single link

failures, relying, instead, on the ability to route packets to a group of nearby nodes, one of which

will not be suffering from a fade due to the spatial diversity.

Delay spreads of up to 460ns for indoor wireless channels have been reported in the literature.

Multipath effects are insignificant if the delay spread is much less than the symbol time. This

ensures that the channel looks flat with respect to the signal bandwidth. For all multipath arrivals

to fall within the first 10% of the symbol time, symbol times longer than 4.6 µs or data rates

slower than 200 Ksps are required. Having removed multipath effects, only the signal amplitude

needs to be estimated to achieve optimal decisions. More complicated and power hungry

circuitry, such as equalization, is avoided. Therefore, for any WSN with robustness to single-

link deep fades, no equalizer is required if symbol rates are slower than 200 Ksps.
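This rate limit follows from a one-line calculation; the Python sketch below (using the 460 ns delay spread and 10% allowance taken from the text) reproduces it:

```python
# Maximum symbol rate for which multipath can be ignored, assuming all
# arrivals must fall within the first 10% of the symbol time (per the text).
DELAY_SPREAD = 460e-9   # worst-case indoor delay spread, seconds
FRACTION = 0.10         # portion of the symbol allowed to contain multipath

min_symbol_time = DELAY_SPREAD / FRACTION    # 4.6 us
max_symbol_rate = 1.0 / min_symbol_time      # ~217 Ksps, rounded to 200 in the text

print(min_symbol_time, max_symbol_rate)
```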

To estimate the amplitude at the start of the packet only, rather than track it throughout the

packet, the entire packet must be transmitted before the channel changes substantially. The rate

of change of the channel is dictated by the Doppler rates, typically 10Hz for indoor wireless

channels. For 90% correlation between the header and the end of the packet for any indoor

wireless link, the packet duration must be less than 10 ms. Therefore, there is a trade-off between

the maximum packet length and the data rate. Faster data rates allow longer packets to be sent. If

a maximum packet length of 1024 bits is required, then data rates faster than 100 Kbps are needed.

In this particular case, the maximum data rate that can be achieved by the RF front-end is 50

Kbps. This will allow the 500-bit packets required for normal operation to be sent, but

unfortunately is slower than the 100 Kbps required to send 1024-bit packets without having to re-

estimate the signal amplitude. The 1024-bit packets used for development and debugging efforts

can either be fragmented into smaller pieces, or sent whole knowing the packet error rate (PER)

will be impacted.
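The Doppler-imposed trade-off above reduces to two small calculations; the following Python sketch (assuming the 10 ms window for 90% correlation stated in the text) reproduces the quoted numbers:

```python
# Packet length vs. data rate, limited by the channel coherence window.
MAX_PACKET_TIME = 10e-3   # seconds of ~90% channel correlation at 10 Hz Doppler

def min_rate(packet_bits):
    """Slowest data rate that fits the whole packet in the coherence window."""
    return packet_bits / MAX_PACKET_TIME

def max_bits(data_rate):
    """Longest packet sendable without re-estimating the signal amplitude."""
    return int(data_rate * MAX_PACKET_TIME)

print(min_rate(1024))   # ~102400 bps: 1024-bit packets need >100 Kbps
print(max_bits(50e3))   # 500 bits: all the 50 Kbps front-end allows
```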

Timing offset is affected by packet length, clock accuracy, and required synchronization

timing resolution. Clocks can slip up to ½ the required timing resolution before needing to be

re-estimated. For any communication system, if the crystal accuracy in ppm is better than

10⁶·ε/(2N), where N is the maximum number of symbols in a packet and ε is the fractional timing

resolution requirement, then no timing tracking is needed. In our case, with a required timing

resolution of one-tenth the symbol period, and 1024-bit packets, 50 ppm clocks are required to

not have to re-estimate the timing instant during the packet. Shorter packets and coarser timing

resolution requirements will ease the clock accuracy requirements.
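The rule above can be written directly as code; in this sketch `required_ppm` is an illustrative name:

```python
# Crystal accuracy (ppm) needed so the clock slips less than half the
# required timing resolution over a whole packet: 1e6 * eps / (2N).
def required_ppm(eps, n_symbols):
    return 1e6 * eps / (2 * n_symbols)

# eps = 0.1 symbol resolution, 1024-symbol packets:
print(required_ppm(0.1, 1024))   # ~48.8 ppm, quoted as 50 ppm in the text
```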

Finally, carrier sense is a required feature of our MAC protocol [LIN]. While there are many

possible ways of implementing carrier sense, a simple method is chosen for this system. The

chosen carrier sense method integrates the channel energy over 10 symbols and compares the

energy to a fixed threshold. Coding in the data-link-layer ensures that the data streams have at

least 5 ones in any string of 10 symbols. A coding scheme, with an extremely simple

implementation, uses one extra symbol every 9 symbols to achieve this requirement. The CS

threshold is left programmable so that the MAC layer can trade off the probability of missed

detection against the probability of false detection.
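A minimal sketch of this carrier-sense rule follows; the function name and threshold value are illustrative, not taken from the implementation:

```python
# Carrier sense: integrate matched-filter energy over a 10-symbol window
# and compare against a programmable threshold. The data-link coding
# guarantees at least 5 ones per 10 symbols, so an active transmitter
# always places energy in the window.
def carrier_sense(window, threshold):
    """window: 10 consecutive symbol energies; returns True if channel busy."""
    return sum(window) > threshold

# Noise-free illustration with unit symbol energy: 5 guaranteed ones.
print(carrier_sense([1, 0, 1, 0, 1, 1, 0, 1, 0, 0], threshold=4.5))  # True
print(carrier_sense([0] * 10, threshold=4.5))                        # False
```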

Table 8-1 details the synchronization requirements for the PN3 system. Only timing and

amplitude are required. Further, since these parameters are static over the length of the packet, it

is only necessary to estimate them once, rather than continuously track them.

Table 8-1: Summary of synchronization requirements for the PN3 system

Synchronization Parameter    Synchronization Requirement
Timing                       Estimate Once
Frequency                    Not estimated
Phase                        Not estimated
Amplitude                    Estimate Once

While the minimum SNR from the envelope detector for correct operation is 13dB, the

received signal often has higher SNR. Since no automatic gain control (AGC) is employed, the

signal amplitude can be almost 30 times larger than that of a 13 dB SNR signal. To digitize this

range of signals while keeping quantization noise from affecting the low-SNR operation, an 8-bit

analog-to-digital converter (ADC) is required. A discussion of the implications of this AGC-less

design on system power consumption is found in §8.8.

8.2 Analog vs. Digital Implementation

Two implementation methods for the synchronization system are investigated. First, an all-

digital scheme is used that takes the signal from the envelope detector directly to an ADC after

which all synchronization processing is done in the digital domain. The second scheme is

designed to eliminate the largest single power consumer in the synchronization system, the ADC,

by performing the synchronization with analog circuits that are controlled by a digital

synchronization loop. For each scheme, the synchronization header length and content are

chosen to optimize the performance of the selected algorithms.

Ideal OOK detection achieves 1e-4 BER with 11.4 dB SNR [PRO]. Therefore, the total

implementation loss in the baseband must be less than 1.6 dB to achieve the 13 dB requirement.

There are several places where loss will occur: non-ideal matched filtering, threshold variance,

and timing variance. In addition, for the digital algorithm, there will be clipping and quantization

noise in the ADC. In the analog algorithm, there will be circuit noise and non-idealities such as

offset and non-linearity. As was described in Chapter 2, the total SNR loss is initially divided

amongst the synchronization blocks using designer experience and initial estimates. From initial

calculations, it was determined that matched filtering and threshold estimation would dominate

the implementation losses. Therefore, it was decided to split 1dB loss evenly between them, and

split the remaining 0.6dB evenly between timing estimation and other losses. The design targets

for the implementation losses are detailed in Table 8-2.

Table 8-2: Target synchronization implementation losses

Matched filter losses                     0.5 dB
Threshold variance losses                 0.5 dB
Timing variance losses                    0.3 dB
Other (ADC noise, circuit noise, etc.)    0.3 dB
Total                                     1.6 dB

In the following, the two implementation schemes are described and analyzed.

8.3 Matched Filtering

Since the startup time of the oscillator is small compared to the symbol time (less than 1 µs),

the incoming signal can be matched filtered to a square wave without much loss of SNR. Square

wave matched filtering is performed by simple integration of the signal energy. This is

performed by an adder in the digital domain and an integrator in the analog domain.

It is possible to calculate the SNR loss of non-ideal matched filtering if the exact transmit and

receive waveforms are known. However, for our system, an analytical expression for the

waveforms is difficult to determine. First, the startup time of the oscillator dictates the shape of

the transmitted waveform. Second, the waveform is altered by the non-linearity in the envelope

detector of the receiver. The SNR loss can be bounded by estimating that the received signal

deviates from a square wave in the first and last 10% of the symbol. Therefore, correlation of the

received waveform with a square matched filter results in less than 0.5dB loss for both the analog

and digital schemes.

8.4 Amplitude Estimation

The threshold, for both the digital and analog schemes, is estimated by averaging the energy in

N symbols of alternating 0’s and 1’s. With N even, this threshold estimator automatically

accounts for any offset present in the signal that could result from the RF and analog circuitry and

the ADC. In addition, because of the alternating 0’s and 1’s in the header, the threshold can be

estimated without prior timing information. The variance of the estimated threshold equals the

symbol variance divided by the estimation length, N.

Threshold variance directly degrades the SNR of the symbols. This can be seen upon

examination of the decision equation for noisy OOK symbols with a noisy threshold:

E_s + N_o \;\underset{0}{\overset{1}{\gtrless}}\; Th + N_t \qquad (8-1)

Where Es is the symbol energy, No is the symbol noise, Th is the ideal threshold, and Nt is the

threshold estimate noise. Moving Nt to the other side of the equation gives the standard OOK

decision equation with increased noise.

With N = 10 symbols, this threshold estimator achieves 1/10 the variance of the symbols.

Therefore, the SNR degradation due to the amplitude estimation is approximately 0.5 dB.
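A minimal sketch of the estimator; the toy amplitude and offset values are illustrative:

```python
# Threshold estimate: average the energies of N alternating 0/1 header
# symbols. With N even, half the symbols are ones and half are zeros, so
# any DC offset from the front end or ADC folds into the estimate.
def estimate_threshold(header_energies):
    n = len(header_energies)
    assert n % 2 == 0, "need an even number of alternating symbols"
    return sum(header_energies) / n

# Noise-free toy case: unit symbol energy plus a 0.2 DC offset, N = 10.
header = [1.0 + 0.2 if i % 2 == 0 else 0.2 for i in range(10)]
print(estimate_threshold(header))   # 0.7 = offset + half the symbol energy
```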

With ideal circuitry, the analog threshold estimation scheme achieves the same variance as the

digital scheme. In actuality, the integrator used to average the 10 symbols suffers from non-ideal

gain (ideal gain is 1/10 to achieve the appropriate average), non-linearity in the integration

function, offset, and noise. It is assumed that circuits can be implemented so that these effects are

within the margin allotted for circuit impairments.

8.5 Timing Estimation

Timing estimation is the main area where the digital and analog schemes diverge. The digital

scheme can take advantage of the fact that parallelism is easily implemented with digital circuits

and use a parallel feed-forward search for the correct timing instant. Approximate estimates

show that implementing a similar algorithm in the analog domain would consume more power

than the ADC (replicating the code correlator of length 7 with 10x oversampling would require at

least 70 amplifiers, and the peak detection circuit would require several comparators and precise

gain stages). Therefore, a feedback DLL-like algorithm is used in the analog scheme.

The chosen digital synchronization scheme uses an ADC that samples at 10 times the symbol

rate. The ADC oversampling ratio (OSR) of 10 achieves the timing variance of 2.5e-3 required to

keep implementation losses below the 0.3 dB target. Theoretically, oversampling ratios as small

as 2 could be used. However, the data would then have to be digitally interpolated to achieve the

required timing resolution. Given the very low data rates of 50 Kbps used here, 10x

oversampling yields an ADC sampling rate of 500 KHz, which is achievable for under 200 µW.

The optimal timing instant is estimated using a maximum-likelihood feed-forward data-aided

estimator,

\hat{\varepsilon} = \arg\max_{\varepsilon} \sum_{n=0}^{N-1} a_n z_n(\varepsilon), \qquad (8-2)

where an are the known data bits (mapped from OOK to antipodal representation) and zn(ε) is the

received signal at the fractional timing offset ε. The input stream is correlated against the known

7-bit sequence in the header, and the maximum search is performed with a peak-detection

algorithm. In addition to performing timing estimation, this peak detection method also yields

packet synchronization because a sequence is used that has high autocorrelation and low cross

correlation with other sequences in the preamble.
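The feed-forward search can be sketched as follows; the 7-bit sequence shown is a Barker-like example, since the text does not list the actual header bits:

```python
# Feed-forward data-aided timing search in the style of (8-2): correlate
# the 10x-oversampled stream against the known 7-bit sequence (mapped to
# +/-1) at every candidate offset and keep the peak. The peak position
# gives the timing instant and packet synchronization together.
OSR = 10
SEQ = [1, 1, 1, 0, 0, 1, 0]            # hypothetical 7-bit header sequence
A = [2 * b - 1 for b in SEQ]           # OOK bits -> antipodal weights a_n

def timing_search(z):
    """z: oversampled matched-filter samples; returns best sample offset."""
    best_off, best_metric = 0, float("-inf")
    for off in range(len(z) - len(A) * OSR + 1):
        # one symbol-spaced sample per header bit at this candidate offset
        metric = sum(a * z[off + n * OSR] for n, a in enumerate(A))
        if metric > best_metric:
            best_off, best_metric = off, metric
    return best_off

# Noise-free check: embed the sequence starting 23 samples into the stream.
z = [0.0] * 23 + [float(b) for b in SEQ for _ in range(OSR)] + [0.0] * 15
print(timing_search(z))   # 23
```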

The variance of the timing estimate is a combination of the variance from the estimator and

the variance due to quantization of the timing resolution. The variance of the estimator is given

in [MEY] for root-raised cosine data. Since the transmit shape is unconventional and has a high

excess bandwidth, the expected variance is coarsely estimated as that of a root-raised cosine

shape with maximum excess bandwidth (α = 1):

\mathrm{var}(\varepsilon)_T = \frac{1}{N}\cdot\frac{0.25}{2\,\mathrm{SNR}} + \frac{1}{N^2}\cdot 0.004, \qquad (8-3)

or 1.0e-3 (σ_ε ≈ 3% of the symbol time) at 13 dB input SNR with an estimation length of 7.

The variance due to quantization of the timing estimator is largest when the optimal timing

instant is exactly halfway between two of the samples. In this case, the variance is:

\mathrm{var}(\varepsilon)_Q = \left(\frac{1}{2\,\mathrm{OSR}}\right)^{2} \qquad (8-4)

The chosen timing quantization of 10 samples per symbol yields a variance of 2.5e-3.

Typically increasing the sampling rate of the ADC to improve the timing variance is

expensive, so the optimal solution achieves var(ε)T much less than var(ε)Q. In this

implementation, the variance due to the timing estimator is one half that due to the timing

quantization.
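Evaluating (8-3) and (8-4) with the parameters used here gives the quoted numbers (a small Python check):

```python
# Timing variance budget: estimator term (8-3) plus quantization term (8-4).
N = 7                     # timing-estimation sequence length
SNR = 10 ** (13 / 10)     # 13 dB input SNR, linear
OSR = 10                  # ADC oversampling ratio

var_T = (1 / N) * 0.25 / (2 * SNR) + (1 / N ** 2) * 0.004   # ~1.0e-3
var_Q = (1 / (2 * OSR)) ** 2                                # 2.5e-3

print(var_T, var_Q)
```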

The chosen analog synchronization scheme uses a data-aided feedback algorithm to determine

the optimal timing instant. The loop filter and voltage/numerically controlled oscillator

(VCO/NCO) could be implemented in either the digital or the analog domain. However, the

symbol integration would be implemented with analog integrators. Analysis in [MEY] shows

that feedback algorithms take approximately two times the amount of time to converge to the

same variance as feed-forward algorithms. Therefore, a header sequence of length 14 would be

needed for timing estimation. The header sequence would most likely be alternating 0’s and 1’s,

which carries the most timing information.

Since alternating 0’s and 1’s are used for timing estimation, packet synchronization needs to

be performed separately in the digital domain once timing synchronization is achieved. A 7-bit

PN sequence is required to get the same detection probability of the digital algorithm.

While the impact of timing variance on SNR is treated in [MEY], the treatment requires the exact pulse

shape to be known. Since the exact pulse shape is unknown in this case, it is difficult to calculate

the exact SNR degradation due to timing variance. From the analysis in [MEY], we do know that

errors in timing yield a reduction in the useful component of the signal and introduce ISI.

For the digital scheme, rough analysis is possible with the reasonable assumption that most

timing errors have a magnitude less than 1/OSR. In the worst case, when the correct timing

instant is halfway between two samples, the timing error is 1/(2·OSR). When a fractional timing

error of x occurs, then a fraction x of the previous (or next) symbol’s energy is integrated into the

current symbol and a fraction x of the current symbol is integrated into the next (or previous) symbol. With

uniformly distributed uncorrelated data, there is a 50% chance that the previous (or next) symbol

is the same as the current one. In that case, the current symbol integrates the correct amount of

signal energy. The other 50% of the time, the previous (or next) symbol is different from the

current symbol and the fraction of signal energy integrated by the current symbol is 1 − x (without loss of

generality, assuming the current symbol is a ‘1’). Therefore, the probability of error in the worst

case can be estimated as

P_e = 0.5\,Q\!\left(\sqrt{\mathrm{SNR}}\right) + 0.5\,Q\!\left(\sqrt{\mathrm{SNR}}\left(1-\frac{1}{2\,\mathrm{OSR}}\right)\right) \qquad (8-5)

The inverse Q function can be used to determine the SNR loss.
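Carrying out that computation numerically (a sketch; the bisection-based inverse Q is an illustrative implementation):

```python
import math

def Q(x):
    # Gaussian tail probability Q(x)
    return 0.5 * math.erfc(x / math.sqrt(2))

def worst_case_pe(snr_db, osr=10):
    # (8-5): half the time the neighboring symbol matches (full energy);
    # half the time it differs and a fraction 1/(2*OSR) of the energy is lost
    s = math.sqrt(10 ** (snr_db / 10))
    return 0.5 * Q(s) + 0.5 * Q(s * (1 - 1 / (2 * osr)))

def equivalent_snr_db(pe):
    # invert Q by bisection to express Pe as an equivalent ideal-OOK SNR
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Q(mid) > pe else (lo, mid)
    return 20 * math.log10(lo)

loss_db = 13 - equivalent_snr_db(worst_case_pe(13))
print(round(loss_db, 2))   # ~0.3 dB, matching the implementation target
```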

Using (8-5), the digital scheme timing estimation variance of 2.5e-3 yields approximately 0.3

dB loss, which was the implementation target. It is assumed that the analog scheme could

achieve the same variance and implementation loss.

8.6 Digital Synchronization Scheme Summary

The synchronization header structure for the digital scheme is shown in Figure 8-1a. The

chosen digital synchronization scheme uses an ADC that oversamples the signal from the

envelope detector by a factor of 10. The threshold is estimated by averaging the energy in 10

symbols of alternating 0’s and 1’s. Next, the optimal timing instant and packet synchronization is

estimated using a maximum-likelihood feed-forward data-aided estimator that correlates the data

with a 7-bit sequence and performs peak detection to find the maximum over the 10 timing

instants provided by the ADC. Lastly, during data reception, symbols are matched filtered, and

sliced using the estimated threshold. The total synchronization header length is 18 (10 for

threshold estimation, 7 for timing estimation, and 1 for overhead because the peak detection

algorithm latency renders the first data symbol unusable).

Figure 8-1: Digital (a) and analog (b) synchronization header structure

Figure 8-2 details the simulated implementation loss of the various cumulative impairments in

the digital scheme. The first simulation includes only losses from non-ideal matched filtering

(‘MF’). The second simulation additionally includes quantization and clipping in the ADC

(‘quant’). The third simulation additionally includes the threshold variance (‘thresh var’). And,

the fourth simulation includes all the implementation impairments by finally adding in the timing

variance (‘timing var’). The total implementation loss is 1.3 dB, thereby achieving better than

1e-4 BER at 13 dB input SNR. Documentation of the simulation environment is found in §8.10.

[Plot: BER vs. SNR (dB) from 10 to 14 dB, with curves for Ideal OOK; MF; MF, quant; MF, quant, thresh var; MF, quant, thresh var, timing var; and the implementation target]

Figure 8-2: Performance breakdown of the digital synchronization scheme

8.7 Analog Synchronization Scheme Summary

The synchronization header structure for the analog scheme is shown in Figure 8-1b. In the

chosen analog synchronization scheme, an integrator that averages the energy in 10 symbols of

alternating 0’s and 1’s performs the threshold estimation. The estimated threshold value is

sampled and held for use throughout the data portion of the packet. Next, timing estimation is

performed using a data-aided feedback algorithm. Lastly, during data reception, symbols are

matched filtered (integrated), and sliced against the estimated threshold in the analog domain

using a comparator. Unlike the digital scheme, where the timing estimator also yields packet

synchronization, the analog scheme must perform this function separately. This is done in the

digital domain by correlating the received data with a 7-bit sequence. The total synchronization

header length is 31 (10 for threshold estimation, 14 for timing estimation, and 7 for packet

synchronization).

8.8 Comparison of Synchronization Schemes

Because the two schemes have different header lengths, a comparison must be done within the

system power framework described in Chapter 3. The energy per useful bit metric (3-3), includes

the number of synchronization bits (BS), the number of data bits (BD), the transmitter power

dissipation including radiated power (PDiss,TX), the receiver front-end power (PDiss,RX), the

baseband power when synchronizing (PS), and the baseband power when receiving data (PD).

It is estimated that the digital scheme will consume 300 µW. The ADC power estimation is

200 µW using an FOM of 7e11 although FOMs as high as 8.6e12 have been reported for specs in

this range [SCO]. The digital circuitry power estimate is less than 100 µW as determined by the

aforementioned power estimation tool. Since the ADC dominates the power consumption and the

digital circuitry power is reduced only slightly from synchronization mode to data reception

mode, the power during synchronization and data reception are approximately equal.

Even if the analog synchronization circuits consume no power, the analog scheme consumes

more total energy for packets shorter than 400 bits because of the increased synchronization

header length. Figure 8-3 shows the energy per useful bit for the system assuming PTX = 6 mW,

PRX = 3 mW, PS = PD = 300 µW and BS = 18 for the digital scheme, and PS = PD = 0 and BS = 31

for the analog scheme.
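A sketch of this comparison in Python, assuming the metric (3-3), which is not reproduced here, reduces to the total radio energy spent on a packet divided by the data bits delivered:

```python
# Energy per useful bit under the Figure 8-3 assumptions.
RATE = 50e3               # bit rate, bits/s
P_TX, P_RX = 6e-3, 3e-3   # transmitter and receiver front-end power, W

def energy_per_useful_bit(b_data, b_sync, p_sync, p_data):
    t_bit = 1 / RATE
    e_sync = b_sync * t_bit * (P_TX + P_RX + p_sync)   # header portion
    e_data = b_data * t_bit * (P_TX + P_RX + p_data)   # payload portion
    return (e_sync + e_data) / b_data

digital = energy_per_useful_bit(200, b_sync=18, p_sync=300e-6, p_data=300e-6)
analog = energy_per_useful_bit(200, b_sync=31, p_sync=0.0, p_data=0.0)
print(digital < analog)   # True: below ~400 bits the shorter header wins
```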

[Plot: energy per useful bit vs. packet length (10–1000 bits) for the analog and digital schemes]

Figure 8-3: Energy-per-useful-bit vs. packet length of analog and digital schemes

If an analog scheme could be devised in which the synchronization circuitry consumes ½ as

much power as the digital scheme, it could use only two extra header bits (20 total header bits)

before the system power consumption analysis would favor the digital scheme.

The conclusion is that header length is more important for system power consumption than

baseband circuit power once baseband circuit power is reduced to a small fraction of the receiver

and transmitter front-end power.

One drawback of the digital algorithm is that the ADC remains on during data reception.

Once the correct timing instant is known, the ADC could be turned off and an analog integrator

could be used to integrate the energy and slice it in the analog domain. The integrator and

comparator would consume less power than the ADC. Therefore, a hybrid scheme where

synchronization is done in the digital domain and reception is done in the analog domain would

consume less power than the all-digital scheme. However, system power analysis shows that this

hybrid digital/analog scheme has the potential to decrease energy per useful bit by only 2-3%

even assuming the analog components could be implemented for no power. For the next

generation transceiver, which has reduced PTX and PRX by a factor of two, the hybrid scheme

could reduce system energy consumption by 3-6%. It is unknown whether these small gains

would be worth the design time and increased area incurred by this scheme.

Finally, the dynamic range requirement of the ADC is large because no AGC is implemented.

A perfect AGC could reduce the ADC resolution to just over 2 bits (13 dB SNR). This would yield a 40x

decrease in the ADC power. The system power consumption would be improved as long as the

AGC power was less than the decrease in ADC power. However, even if the AGC could be

implemented with zero power and requiring no extra header bits, this would only yield a 3%

change in the system energy. Further, if inclusion of the AGC required more header bits, it would

adversely affect the system performance. Therefore, the current system design, which does not

use an AGC, has better system performance.

[Plot: energy savings vs. packet length (10–1000 bits) for 0-bit and 9-bit headers]

Figure 8-4: Energy savings of 0-bit and 9-bit headers vs. 18-bit headers

The conclusion is that the digital synchronization and detection scheme power consumption is

low enough so that further reduction has very little impact on system power performance. The

synchronization parameter that has the potential to substantially impact system performance is the

header length. As shown in Figure 8-4, a half-length header has the potential to decrease system

energy by 19% for 30-bit control packets, and 4% for typical data packets of 200 bits, while a

zero-length header has the potential to decrease the system energy by 38% for control packets and

8% for typical data packets.
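Because the radio power is roughly constant over a packet, these savings reduce to a ratio of packet airtimes; the sketch below reproduces the Figure 8-4 numbers:

```python
# Fractional energy saved by shrinking the 18-bit header, for a packet
# carrying data_bits of payload.
def header_savings(data_bits, new_header, old_header=18):
    return 1 - (data_bits + new_header) / (data_bits + old_header)

print(header_savings(30, 9))    # 0.1875 -> the 19% quoted for control packets
print(header_savings(30, 0))    # 0.375  -> 38%
print(header_savings(200, 9))   # ~0.041 -> 4% for typical data packets
print(header_savings(200, 0))   # ~0.083 -> 8%
```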

8.9 Conclusion and Future Work

Synchronization performance requirements were reduced through careful selection of

modulation, data rate, packet length, and clock accuracy. Of the four channel parameters

(frequency, timing, phase, and amplitude), only two parameters, timing and amplitude, need to be

estimated for our radio in a WSN environment. Further, these two parameters need to be

estimated only once in a packet rather than continuously tracked. Implementation schemes across

the digital/analog boundary were explored. It was determined that the digital algorithm results in

lower system power because it requires a shorter header. Because great care has been taken to

reduce synchronization requirements at a system level, and to reduce the power consumption of

the synchronization circuits at the block level, the digital algorithm power consumption is low

enough so that further reduction has very little impact on system performance. One way to

substantially reduce system energy consumption would be to further reduce the header length.

8.10 Postscript: Simulation Environment

The digital synchronization system was simulated in MATLAB Simulink and Stateflow

[MAT]. The top level simulation and main synchronization blocks are shown in Figure 8-5. The

main synchronization block consists of a few correlators and a state machine for control.

Simulation results of the timing correlator peak are shown in Figure 8-6.

Figure 8-5: Digital algorithm high level simulation and digital synchronization block.

Figure 8-6: Simulation results of the digital synchronization system timing correlator

9 Conclusion and Future Work
This thesis has shown that through a systematic exploration of synchronization power

consumption, significant system energy savings can be realized. First, a framework for exploring

power consumption in synchronization systems was defined. Then power consumption was

explored in more detail for frequency estimation and timing recovery. Lastly, these results were

used in system examples to show that the framework developed here had significant impact on

the system energy consumption.

An enabling step in this work was the development of a fast and accurate method for power

estimation that is accurate to within 15% of the best available method and over 50 times faster.

This tool was used to conduct a systematic exploration of feed-forward data-aided frequency

estimation algorithms. The results of this study were straightforward rules for which algorithm

to choose for a given system specification. Simultaneous reductions in energy consumption and

convergence times of more than a factor of 4 are possible in some scenarios.

The results of the frequency estimation study were used to improve the power consumption of

the frequency estimator block in the PNII system by 84% while simultaneously improving the

convergence time of this block by 50%. Further exploration of both the phase and frequency

estimator blocks in this system resulted in total convergence time reduction of 75%,

synchronization system energy reduction by 66%, and system energy reduction by 7%. In

addition, differential versus coherent modulation schemes were compared in a system power

consumption framework. While differential modulation schemes require more transmit power to

achieve the same BER, they can relax the constraints on the synchronization system. It was

determined at what packet length it makes sense to move to differential modulation schemes.

Following the framework developed here, a synchronization system was developed for a

wireless sensor network device that consumes 300 µW (including ADC power). This is low

enough so that further reduction has very little impact on system energy consumption.

Because synchronization requirements are heavily dependent on system parameters like

modulation type, data rate, and analog front-end performance, no two synchronization systems

are the same (even within the same wireless communication standard). Therefore,

synchronization design has long been the domain of experts who use their experience to guide

their selection of the appropriate algorithms. While this method usually produces a system that

meets the performance requirements (variance or SNR margin), it is not necessarily optimal from

a power, area, or convergence time perspective. This work has shown that in some domains (e.g.,

frequency estimation), systematic exploration of the space can result in straightforward rules for

achieving the performance (variance) with the best convergence time and power consumption.

For other spaces where straightforward rules are not available (e.g., interpolators), heuristic

design can result in substantially suboptimal designs. Tools have been developed in this thesis to

rapidly characterize the large design space and allow the correct algorithm parameters to be

selected.

There are many fundamental benefits to completing the exploration of the synchronization

space. It will highlight areas for future algorithm development where existing algorithms are

inefficient. It will assist in producing the most efficient implementation for existing wireless

communication standards by determining which algorithms are most efficient for a given

specification. Finally, it can assist in the creation of new wireless communication standards to

meet the quality of service goals with the lowest power or lowest synchronization overhead.

The importance of synchronization within the wireless system is becoming more critical as

transmit distances are decreased, higher transmission speeds and higher order modulation

schemes are used, and integration concerns move the RF circuitry into digital CMOS processes.

Therefore, the results of this work will have more impact as time goes on.

A Power Estimation Scripts
A.1 Makefile

#######################################################################

PROJ_NAME = FREQ_EST_7 #directory name consistent in all trees

SRC_ROOT = /tools/designs/tcir/synch/sim

MC_ROOT = /tools/designs/tcir/synch/mc/${PROJ_NAME}

NETLIST_SCRIPT = scripts/create_netlist_vlg.scr

REPORTING_SCRIPT = scripts/report_power.scr

TESTBENCH_SCRIPT = ${PROJ_NAME}/TB.vhd

SIMULATE_SCRIPT = scripts/simulate.do

SYNTHESIS_SCRIPT = scripts/syn_script.dc

MTI_ANALYZER = vcom

MTI_OPTS = -93 -source

MTI_WORK = -work work

VSIM_OPTS = -c -do

DC_SHELL = dc_shell

#------------------------------------------------

# report the power consumption

%.base_pwr %.node_pwr: %.pwr_scr %.mapd_vhd %.saif

dc_shell -f $(*).pwr_scr

# simulate to get the switching activity file

# depends on: .mapd_vhd, .fwd_saif .tb_vhd .do

%.saif: %.mapd_vhd %.fwd_saif %.tb_vhd %.do

-rm -r work

-vlib work

vmap work work

vmap CORE9GPLL /tools/picoradio/PN3/hw/lib/CORE9GPLL_VHDL_VITAL

vmap CORX9GPLL /tools/picoradio/PN3/hw/lib/CORX9GPLL_VHDL_VITAL

(${MTI_ANALYZER} ${MTI_OPTS} ${MTI_WORK} /tools/picoradio/PN3/hw/lib/pulls.vhd)

(${MTI_ANALYZER} ${MTI_OPTS} ${MTI_WORK} ${SRC_ROOT}/$(*).mapd_vhd)

(${MTI_ANALYZER} ${MTI_OPTS} ${MTI_WORK} ${SRC_ROOT}/$(*).tb_vhd)

vsim ${VSIM_OPTS} $(*).do

-rm $(*).old_saif

mv $@ $(*).old_saif

sed -e 's/\\\[[0-9]*\\\]/~&~/g;s/~\\\[/\\\(/g;s/\\\]~/\\\)/g' $(*).old_saif > $@

# analyze the vhdl file to create the database, mapped vhdl file, and the forward saif file

%.mapd_vhd %.fwd_saif: %.phys_v %.net_scr

dc_shell -f $(*).net_scr

# make the netlisting script

%.net_scr: $(NETLIST_SCRIPT)

sed -e 's/BASENAME/$(*)/g' $(NETLIST_SCRIPT) > $@

# make the reporting script

%.pwr_scr: $(REPORTING_SCRIPT)

sed -e 's/BASENAME/$*/g' $(REPORTING_SCRIPT) > $@

# make the simulation do file

%.do: $(SIMULATE_SCRIPT)

sed -e 's/BASENAME/$*/g' $(SIMULATE_SCRIPT) > $@

# compile vhd source with constraints to create final physical verilog

%.phys_v: %.syn_scr %.fixed_vhd

-rm -r work/*

-vlib work

dc_shell -f $(*).syn_scr | tee $(*).compile_log

# fix the vhd from module compiler

%.fixed_vhd:

sed -e "s/INVERTER/INV/" ${MC_ROOT}/$(*).vhd > ./$(*).fixed_vhd

%.syn_scr: $(SYNTHESIS_SCRIPT)

sed -e 's/BASENAME/$*/g;' $(SYNTHESIS_SCRIPT) > $@

# can't make the testbench sufficiently parameterizable with this version of Make

# need to set shell variables TEMPi and TEMPo to be the inwidth and outwidth respectively

# before making the testbench

%.tb_vhd: $(TESTBENCH_SCRIPT)

# TEMPi := $(shell echo $* | awk '{ FS = "_" } ; {print $3}')

# TEMPo := $(shell echo $* | awk '{ FS = "_" } ; {print $4}')

sed -e 's/BASENAME/$*/g;s/INWIDTH/$(TEMPi)/g;s/OUTWIDTH/$(TEMPo)/g' $(TESTBENCH_SCRIPT) > $@

# type: make foo.clean to clean all associated files

%.clean: FORCE

- rm $(*).tb_vhd $(*).do $(*).pwr_scr $(*).net_scr $(*).db \
$(*).mapd_vhd $(*).mapd_vlg $(*).fwd_saif $(*).saif $(*).old_saif \
$(*).syn_scr $(*).fixed_vhd $(*).compile_log $(*).mr $(*).st $(*).syn \
$(*)__* $(*)*.pvl temp.v post*.db $(*).dc $(*).sdc

FORCE:

A.2 Netlist Script

/* dc_shell Command Log */

/* sets useful naming rules mostly for the backend */

/* unicad_setup_file = "/tools/unicad2.4/HandOffKit_1.8.1.1/products/ptKit/etc/.synopsys_unicad_dc.setup" */

vhdlout_write_components = true;

vhdlout_use_packages = {"IEEE.std_logic_1164", "CORE9GPLL.all" };

power_preserve_rtl_hier_names = true;

bus_naming_style = "%s[%d]" ;

bus_dimension_separator_style = "][";

bus_extraction_style = "%s[%d:%d]";

analyze -format vhdl BASENAME.vhd

elaborate BASENAME

set_fix_multiple_port_nets -all

/* 500 KHz clock -- denominated in ps */

create_clock CLK -name clk -period 2000000

link

set_max_area 0

/* write the results */

define_name_rules vhdl -type port -allowed "A-Z a-z _ 0-9 () []" \

-first_restricted "0-9 _ ()

[]"

define_name_rules vhdl -type cell -allowed "A-Z a-z _ 0-9 ()" \

-first_restricted "0-9 _ ()"

define_name_rules vhdl -type net -allowed "A-Z a-z _ 0-9 () []" \
-first_restricted "0-9 _ () []"

define_name_rules vhdl -special vhdl93

change_names -hier -rules vhdl

rtl2saif -output BASENAME.fwd_saif

write -hierarchy -format db -output BASENAME.db BASENAME

write -hierarchy -format vhdl -output BASENAME.mapd_vhd BASENAME

change_names -hier -rules verilog

write -hierarchy -format verilog -output BASENAME.mapd_vlg BASENAME

exit

A.3 Reporting Script

/* dc_shell Command Log */

bus_naming_style = "%s[%d]" ;

bus_dimension_separator_style = "][";

bus_extraction_style = "%s[%d:%d]";

vhdlout_use_packages = {"IEEE.std_logic_1164", "CORE9GPLL.all" }

power_preserve_rtl_hier_names = TRUE

read BASENAME.db

find_ignore_case = true

/* instance name must be lower case below */

read_saif -input BASENAME.saif -instance tb/dut -unit ns -scale 1 -verbose

find_ignore_case = false

/* this one to report overall power */

report_power > BASENAME.base_pwr

/* this one to report power and switching activity per net */

report_power -flat -net -nosplit > BASENAME.node_pwr

exit

A.4 Testbench

-- Test Bench for BASENAME.vhd

use STD.textio.all;

library IEEE;

use IEEE.std_logic_1164.all;

use IEEE.std_logic_arith.all;

use IEEE.std_logic_textio.all;

use IEEE.std_logic_unsigned.all;

entity TB is

generic( ow : integer := OUTWIDTH;

iw : integer := INWIDTH);

end TB;

architecture stimulus of TB is

component BASENAME is

port( out_a : out std_logic_vector( ow-1 downto 0 );

in_i1 : in std_logic_vector( iw-1 downto 0 );

in_q1 : in std_logic_vector( iw-1 downto 0 );

load : in std_logic;

calc : in std_logic;

clear : in std_logic;

CLK : in std_logic );

end component;

signal oa : std_logic_vector(ow-1 downto 0);

signal oa_dum: std_logic_vector(ow-1 downto 0);

signal ii1: std_logic_vector(iw-1 downto 0);

signal iq1: std_logic_vector(iw-1 downto 0);

signal ld: std_logic;

signal ca: std_logic;

signal cl: std_logic;

signal clk: std_ulogic := '0';

file in_vec: text open read_mode is "./FREQ_EST_7/BASENAME.vec";

signal start : boolean := false;

begin

-- Clock process

clock: process

begin

-- define 1 MHz clock

wait for 500 ns;

clk <= not clk;

end process;

-- Device Under Test

dut : BASENAME

port map (oa, ii1, iq1, ld, ca, cl, clk);

-- Handle vector files

read_in : process (clk)

variable buf : line;

variable int : integer;

begin

if (clk'event and clk = '0') then

-- Read input vectors

readline(in_vec, buf);

read(buf, int);

ii1 <= conv_std_logic_vector(int, iw);

read(buf, int);

iq1 <= conv_std_logic_vector(int, iw);

read(buf, int);

ld <= conv_std_logic_vector(int, 1)(0);

read(buf, int);

ca <= conv_std_logic_vector(int, 1)(0);

read(buf, int);

cl <= conv_std_logic_vector(int, 1)(0);

assert not endfile(in_vec)

report "simulation done!"

severity FAILURE;

end if;

end process read_in;

end stimulus;

configuration cfg of TB is

for stimulus

end for;

end cfg;
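The read_in process above consumes one line of the .vec file per falling clock edge: five whitespace-separated integers, mapping in order to in_i1, in_q1, load, calc, and clear (the last three being 0/1 flags). A minimal vector file, with purely illustrative values, could be generated like this:

```shell
# Each line: in_i1 in_q1 load calc clear -- one line per falling clock edge.
# The three sample vectors below are hypothetical, not from the dissertation.
printf '%s\n' \
  '3 -2 1 0 0' \
  '5 1 0 1 0' \
  '0 0 0 0 1' > example.vec
cat example.vec
```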

A.5 Simulate Script

# execute with vsim -c -do simulate.do

vlib work

vsim +notimingchecks -t ps -c -foreign "dpfli_init /tools/synopsys/2000.11/auxx/syn/power/dpfli/lib-sparcOS5/dpfli.so" TB

read_rtl_saif BASENAME.fwd_saif TB/DUT

set_net_monitoring_policy on TB/DUT

set_toggle_region TB/DUT

# times in ps

run 70000000

toggle_start

run 130000000

toggle_stop

toggle_report BASENAME.saif 1e-9 TB/DUT

exit
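As a sanity check on the toggle window (using the 1 MHz clock from the testbench's 500 ns half-period, i.e. a 1,000,000 ps period, with all run times in ps as noted above):

```shell
# The 70,000,000 ps warm-up run covers 70 clock cycles, and the
# 130,000,000 ps toggle-capture window covers 130 cycles.
period_ps=1000000
echo "warmup cycles: $(( 70000000 / period_ps ))"
echo "toggle cycles: $(( 130000000 / period_ps ))"
# -> warmup cycles: 70
# -> toggle cycles: 130
```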

A.6 Synthesis Script

/****************************************/

/* Run all the synthesis steps */

/****************************************/

/* a combination of all the script steps used as of 12/5/03 */

/********************/

/* setup.dc */

/********************/

sh date

/* Turn off the following error messages

(EQN-10) Warning: Defining new variable

(LINT-30) Warning: issues fixed by set_fix_multiple_port_nets

(OPT-170) Information: Changed wire load model...

(OPT-171) Information: Changed minimum wire load model...

(OPT-200) Resolving this conflict by ignoring the user_function_class attribute for this library cell.

*/

suppress_errors = suppress_errors + { EQN-10 LINT-30 OPT-170 OPT-171 OPT-200 }

designer = "Josie Ammer"

company = "Berkeley Wireless Research Center"

view_background = "black";

common_lib_dir = "/tools/picoradio/PN3/hw/v1/syn/lib/"

common_script_dir = "/tools/picoradio/PN3/hw/v1/syn/lib/SCRIPTS/"

search_path = ". src SCRIPTS " + common_script_dir;

include common_script_dir + "synopsys_unicad_dc.setup"

include common_script_dir + "lib_load.dc"

define_design_lib work -path work

mydesign = BASENAME;

mc_designs = { "BASENAME" };

verilog_dest = mydesign + ".phys_v";

db_dest = mydesign + ".db";

flatten_design = 0;

opt_map_effort = "medium";

opt_area_effort = "medium";

opt_verify_effort = "none"

hdlin_enable_rtldrc_info = true;

hdlin_auto_save_templates = true;

timing_self_loops_no_skew = true;

vhdlout_use_packages = {"IEEE.std_logic_1164", "CORE9GPLL.CORE9GPLL_COMPONENTS", "CORX9GPLL.CORX9GPLL_COMPONENTS"};

/************************/

/* readfiles.dc */

/************************/

analyze -format vhdl common_lib_dir + "pulls.vhd"

foreach ( mc_file, mc_designs ) {

read -format vhdl -library work mc_file + ".rst_vhd"

}

uniquify

elaborate -library work mydesign

current_design mydesign

link

write -hierarchy -output "postread.db"

/************************/

/* constrain.dc */

/************************/

include common_script_dir + "lib_minmax.dc"

auto_wire_load_selection = true

high_fanout_net_threshold = 0

compile_auto_ungroup_hierarchy = 1;

set_fix_multiple_port_nets -all -buffer_constants

set_max_fanout 20 all_inputs() - test_se

set_max_area 0.0

clk_name = {CLK}

create_clock -name clk_name -period 2000 -waveform {0 1000} {clk_name}

set_clock_skew -uncertainty 0.1 all_clocks()

set_drive 0 clk_name

set_auto_disable_drc_nets -none

/********************************/

/* boundary_initial.con */

/********************************/

max_transition_time_io = 0.1;

max_transition_time_internal = 0.2;

input_drive_cell = wc_lib_path + ":" + lib_name + "/IVLL/Z";

output_load_cell = wc_lib_path + ":" + lib_name + "/IVLL/A";

output_load_fanout = 4;

set_driving_cell -lib_cell IVLL -library wc_lib_path + ":" + lib_name -pin Z all_inputs() - CLK - test_se

set_load {load_of(output_load_cell) * output_load_fanout} all_outputs()

set_max_transition max_transition_time_internal mydesign

set_max_transition max_transition_time_io all_inputs()

set_max_transition max_transition_time_io all_outputs()

/***********************/

/* optimize.dc */

/***********************/

uniquify

current_design mydesign

if( flatten_design ) {

ungroup -all -flatten

}

if( opt_verify_effort != "none" ) {

compile -map_effort opt_map_effort -area_effort opt_area_effort -verify -verify_effort opt_verify_effort -auto_ungroup area

} else {

compile -map_effort opt_map_effort -area_effort opt_area_effort -auto_ungroup area

}

write -hierarchy -output "postcompile1.db"

/********************************************/

/* Boundary optimization */

/********************************************/

simplify_constants -boundary

compile_delete_unloaded_sequential_cells = "true"

compile -inc -boundary

remove_unconnected_ports -blast_bus find(cell, "*", -hier)

write -hierarchy -output "postcompile2.db"

/********************************************/

/* Fix contamination delay (min path) */

/********************************************/

remove_attribute find( lib_cell, {wc_libx_path + ":" + libx_name + "/DLY*"}) dont_use

set_dont_use find( lib_cell, {wc_libx_path + ":" + libx_name + "/*X05"})

set_fix_hold all_clocks()

compile -incremental

write -hierarchy -output "postfixhold.db"

/********************************************/

/* Check the design */

/********************************************/

check_design

check_timing

/*************************/

/* writefiles.dc */

/*************************/

include common_script_dir + "change_names.dc"

write current_design -hier -format db -output postchangename.db

write current_design -hier -format verilog -output "temp.v"

write_sdc mydesign + ".sdc"

write_script -hier -format dcsh -output mydesign + ".dc"

/* Fixes extraneous "assign" statements and also renames records */

remove_design -design

read -format verilog temp.v

current_design mydesign

/* DON'T DO A COMPILE HERE WITHOUT PROPER CONSTRAINTS!! */

write current_design -hier -format db -output db_dest

write -format verilog -hier -o verilog_dest

/********************************************/

/* Write reports */

/********************************************/

report_area >report.final.area;

report_timing -delay max >report.final.timing;

report_timing -delay min >>report.final.timing;

report_test -configuration >report.final.test_cfg

check_test > report.final.test

report_test -scan_path -register > report.final.scanchains

report_constraint -all_violators >report.final.violations

report_design >report.final.design

exit;

References

[AMM] M. Josie Ammer, Michael Sheets, Tufan C. Karalar, Mika Kuulusa, and Jan Rabaey, "A Low-Energy Chip-Set for Wireless Intercom," Proceedings of the Design Automation Conference (DAC), Anaheim, CA, June 2-6, 2003.

[CAM] Kevin Camera, "SF2VHD: A Stateflow to VHDL Translator," Master's Thesis, Department of Electrical Engineering, University of California, Berkeley, May 2001.

[CHA] Glenn Chang, et al., "A Direct-Conversion Single-Chip Radio-Modem for Bluetooth," Proceedings of the International Solid-State Circuits Conference, San Francisco, CA, USA, February 2002.

[DEN] P. Dent, G. E. Bottomley, and T. Croft, "Jakes Fading Model Revisited," Electronics Letters, Vol. 29, No. 13, 24 June 1993.

[HAI] Saleem Haider, "Datapath Synthesis and Optimization for High-Performance ASICs," http://www.synopsys.com/products/datapath/datapath_bgr.html, October 1997. Referenced 10/4/04.

[HUS] Paul James Husted, "Design and Implementation of Digital Timing Recovery and Carrier Synchronization for High Speed Wireless Communication," Master's Thesis, Department of Electrical Engineering, University of California, Berkeley, May 2000.

[ISO] "Information Technology – Open Systems Interconnection – Basic Reference Model," International Organization for Standardization (ISO), Standard number ISO/IEC 7498-1:1994, 1994.

[ITU] International Telecommunication Union, Recommendation ITU-R P.1238-2, "Propagation data and prediction methods for the planning of indoor radio communication systems and radio local area networks in the frequency range 900 MHz to 100 GHz."

[JTC] Joint Technical Committee of Committee T1 R1P14 and TIA TR46.3.3 / TR45.4.4 on Wireless Access, Final Report on RF Channel Characterization, Paper No. JTC(AIR)/94.0.1.17-238R4, Jan. 17, 1994.

[KOK] Masaru Kokubo, et al., "A 2.4GHz RF Transceiver with Digital Channel Selection Filter for Bluetooth," Proceedings of the International Solid-State Circuits Conference, San Francisco, CA, USA, February 2002.

[LIN] En-Yi A. Lin, Jan M. Rabaey, and Adam Wolisz, "Power-Efficient Rendez-vous Schemes for Dense Wireless Sensor Networks," Proceedings of the International Conference on Communications, Paris, France, 2004.

[MAT] Simulink and Stateflow, from The MathWorks, Inc., see http://www.mathworks.com

[MEY] H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation and Signal Processing, Wiley Press, 1998.

[OTI] B. P. Otis, Y. H. Chee, R. Lu, N. M. Pletcher, and J. M. Rabaey, "An Ultra-Low Power MEMS-Based Two-Channel Transceiver for Wireless Sensor Networks," Proceedings of the Symposium on VLSI Circuits, Honolulu, Hawaii, 2004.

[PRO] John G. Proakis, Digital Communications, McGraw-Hill Press, 1995.

[RAB] Jan M. Rabaey, Anantha Chandrakasan, and Borivoje Nikolic, Digital Integrated Circuits: A Design Perspective, Second Edition, Pearson Education, Inc., 2003.

[RAB2] J. Rabaey, J. Ammer, T. Karalar, S. Li, B. Otis, M. Sheets, and T. Tuan, "PicoRadios for Wireless Sensor Networks: The Next Challenge in Ultra-Low-Power Design," Proceedings of the International Solid-State Circuits Conference, San Francisco, CA, February 3-7, 2002.

[SCO] Michael D. Scott, Bernhard E. Boser, and Kristofer S. J. Pister, "An Ultra-Low Power ADC for Distributed Sensor Networks," Proceedings of the European Solid-State Circuits Conference, Florence, Italy, September 2002.

[SHE] M. Sheets, B. Otis, F. Burghardt, J. Ammer, T. Karalar, P. Monat, and J. Rabaey, "A 5.8x3.3 cm^2 Self-contained Energy-scavenging Wireless Sensor Network Node," Proceedings of the Wireless Personal Multimedia Communications Conference, Abano Terme, Italy, Sept. 12-15, 2004.

[SHI] Changchun Shi, "Floating-point to Fixed-point Conversion," Ph.D. dissertation, Department of EECS, UC Berkeley, Berkeley, CA, 2004.

[STE] Stephens, Phase-Locked Loops for Wireless Communications, Second Edition.

[SYN] Module Compiler, from Synopsys, Inc., see http://www.synopsys.com

[TAV] G. Tavares, L. Tavares, and M. Piedade, "Improved Cramer-Rao Lower Bounds for Phase and Frequency Estimation With M-PSK Signals," IEEE Transactions on Communications, Vol. 49, No. 12, December 2001.

[THO] John Thomson, et al., "An Integrated 802.11a Baseband and MAC Processor," Proceedings of the International Solid-State Circuits Conference, San Francisco, CA, USA, February 2002.

[TUR] K. Turkowski, "Fixed-Point Trigonometry with CORDIC Iterations," Apple Computer White Paper, January 17, 1990.

[VO] Nguyen Doan Vo and Tho Le-Ngoc, "Optimal Interpolators for Flexible Digital Receivers," Proceedings of the IEEE Vehicular Technology Conference, Orlando, FL, USA, October 2003.

[VO2] Nguyen Doan Vo and Tho Le-Ngoc, "Low-Complexity Optimal Symmetric Interpolation Filters for SDR Receivers," Proceedings of the IEEE Canadian Conference on Electrical and Computer Engineering, Montreal, Quebec, Canada, May 2003.

[WAL] Robert H. Walden, "Analog to Digital Converter Survey and Analysis," IEEE Journal on Selected Areas in Communications, Vol. 17, No. 4, April 1999.

[YEE] D. G.-W. Yee, "A design methodology for highly-integrated low-power receivers for wireless communications," Ph.D. dissertation, Department of EECS, UC Berkeley, Berkeley, CA, 2001.
