
IET Circuits, Devices & Systems

Research Article

Hardware implementation and VLSI design of spectrum sensor for next-generation LTE-A cognitive-radio wireless network

ISSN 1751-858X
Received on 14th July 2017
Revised 26th January 2018
Accepted on 19th February 2018
doi: 10.1049/iet-cds.2017.0292
www.ietdl.org

Mahesh S. Murty¹, Rahul Shrestha²

¹Center for VLSI and Embedded System Technologies (CVEST), International Institute of Information Technology Hyderabad, India
²School of Computing and Electrical Engineering (SCEE), Indian Institute of Technology Mandi, India
E-mail: rahul_shrestha@iitmandi.ac.in

Abstract: This paper presents a reconfigurable and hardware-efficient VLSI architecture of a time-domain cyclostationary-feature detector (TCD) for spectrum sensing in the cognitive-radio wireless network. It incorporates a new architecture for the autocorrelator that supports the entire range of subcarriers used by orthogonal frequency division multiplexing signals compliant to the 4G LTE-Advanced wireless network. A novel scheme of overflow/underflow protection is proposed for the coordinate rotation digital computer engine of the TCD. Additionally, hardware-efficient techniques have been introduced for the multiply-&-accumulate and accumulator architectures of the suggested TCD design. Real-world signals are captured using universal software radio peripheral devices and are fed to its FPGA prototype. Application specific integrated circuit synthesis and post-layout simulation of the proposed detector have been performed using 65 nm-CMOS technology; the detector occupies 0.32 mm² of core area and consumes a total power of 18.5 mW at 100 MHz clock frequency. In comparison with the state-of-the-art works, the proposed detector requires 34 and 93% less hardware resource and memory, respectively.

1 Introduction

Cognitive radio (CR) is an innovative technology with an unprecedented level of intelligence that enhances the system capacity and spectrum agility of wireless networks [1]. Current spectrum-allocation policies, which license frequency bands to the primary user (PU), have made spectrum one of the scarcest resources in the wireless network. Yet the allocated spectrum is underutilised, as it remains occupied even when the PU is inactive. A CR device is capable of sensing the spectrum to determine whether the bands are free or occupied in order to communicate opportunistically and maximise the spectrum utilisation. CR technology can be a perfect complement to the carrier aggregation feature incorporated in the LTE-Advanced (LTE-A) fourth-generation (4G) wireless network [2]. Carrier aggregation has been conceived to increase spectrum utilisation and eventually to enhance the data rate. CR can be recommended as the potential technology to extend the operation of LTE-A over TV white space, where frequency spectrum is available owing to the replacement of analogue by digital TV. In such an operating environment, CR enables the non-interfering existence of LTE-A as secondary user (SU), considering digital video broadcasting-terrestrial (DVB-T) for digital TV as PU. Additionally, integration of CR features with the future 5G cellular network is feasible, as it is applicable in various scenarios: device-to-device communication without interfering with the PU; effortless transition from 4G to 5G systems; enabling higher data-rate communication via a secondary link for load balancing; and leveraging the relaxed latency as well as data rate of mobile-type communication [3].

Spectrum sensing is a significant signal-processing unit for CR devices and it is responsible for the detection of PUs in the licensed frequency band [4, 5]. Each CR node is an SU that senses the frequency band and checks its occupancy by PUs for possible communication. Such an SU selects the channel that imposes minimum interference on the PU in order to communicate with other SUs in the wireless network. Several algorithms have been formulated for spectrum sensing [6–9]. Among these, spectrum sensing based on autocorrelation and cyclostationary features delivers the best performance in the lower signal-to-noise ratio (SNR) region [10, 11]. Additionally, these detectors have good signal classification ability and fulfil the detection requirements compliant to standard wireless networks [12]. However, they involve complex mathematical computations and are intricate to implement. Energy-detection-based spectrum sensing requires a simple architecture [13, 14], but fails to deliver adequate performance in the lower SNR regime, which is essential for wireless network applications [15]. A recent survey on the implementations and performance comparison of these detectors for spectrum sensing has been carried out by Kosunen et al. [11]. They showed that the very large-scale integration (VLSI) architecture of the spatial-sign cyclic-correlation detector for spectrum sensing [9] consumes the least hardware. However, it showed a performance degradation of 3 dB in comparison with the cyclostationary feature detectors [11]. Such performance loss has an adverse effect when the detectors are designed for a next-generation wireless network where the received signals undergo temporal as well as spatial variations due to various inherent propagation mechanisms [16]. Therefore, a cyclostationary feature detector is suitable for such a scenario, and this paper focuses on the design of a hardware-efficient as well as reconfigurable time-domain cyclostationary-feature detector (TCD) with adequate performance. Recently, field programmable gate array (FPGA) implementations of cyclostationary feature detectors have been reported in the literature [11, 17–19]. Moreover, there is scope to further optimise the internal blocks of the TCD architecture to enhance the hardware efficiency and make it compliant to the standard wireless network. Additionally, this paper presents application specific integrated circuit (ASIC) synthesis and post-layout simulation of the optimised TCD architecture. Our contributions are as follows:

(i) A reconfigurable VLSI architecture for the autocorrelator module of TCD has been proposed to support the entire range of orthogonal frequency division multiplexing (OFDM) subcarriers compliant to the LTE-A wireless network. Additionally, we have suggested an overflow/underflow protection technique for the coordinate rotation digital computer (CORDIC) architecture of TCD.

(ii) New hardware-efficient techniques as well as architectures for the multiply-&-accumulate (MAC) and accumulator modules of TCD are presented. Finally, an optimised VLSI architecture for TCD has been conceived by aggregating the suggested hardware-efficient

IET Circuits Devices Syst. 1


© The Institution of Engineering and Technology 2018
blocks with the simplified version of the test-statistics calculation module.

(iii) Performance of the proposed TCD has been evaluated using an OFDM-based communication system in an additive white Gaussian noise (AWGN) channel environment. The suggested architecture is FPGA implemented and its resource utilisation is compared with the reported works.

(iv) The hardware prototype of TCD has been tested for its functionality using real-world radio frequency (RF) signals captured from universal software radio peripheral (USRP) devices, and the timing is verified using a logic analyser. Eventually, ASIC synthesis and post-layout simulation of this design are carried out using united-microelectronics-corporation (UMC) 65 nm complementary metal–oxide–semiconductor (CMOS) technology for area, power and timing analyses.

The organisation of this paper is as follows. Section 2 presents a brief explanation of the time-domain-based cyclostationary-feature detection algorithm for spectrum sensing and the system-level overview of the conventional TCD (CTCD). The proposed architecture of the reconfigurable autocorrelator module and the overflow/underflow protection technique for the CORDIC module are included in Section 3. In Section 4, hardware-efficient techniques and architectures for the MAC and accumulator modules, along with the overall TCD architecture, are presented. Subsequently, performance analysis, FPGA implementation and verification of the proposed design are carried out in Section 5. Additionally, this section includes ASIC synthesis and post-layout simulation, and a comparison with the reported works. Finally, this paper concludes in Section 6.

2 Preliminaries

This section provides a brief overview of cyclostationarity in OFDM signals and a short background of the spectrum sensing algorithm based on cyclostationary feature detection. The utility of the CORDIC algorithm in the spectrum sensing process and the CTCD architecture are subsequently discussed here.

2.1 Cyclostationarity in OFDM signals

OFDM is a special form of multi-carrier modulation that uses closely-spaced orthogonal subcarriers for communication, where each of them is individually modulated using the quadrature amplitude modulation (QAM) or phase shift keying (PSK) modulation technique [16]. A single OFDM frame comprises a cyclic prefix, which is a portion of the QAM or PSK modulated data stream, followed by the modulated data stream (samples), as shown in Figs. 1a and b. A signal $x[n]$ is said to be cyclostationary provided the time-varying expectation of its autocorrelation

$$E\{x[n] \times x^{*}[n-\tau]\} \tag{1}$$

is periodic. Periodicity in the autocorrelation of OFDM frames is due to the cyclic prefix, which is $N_{cp}$ samples appended at the start of the $N_D$ information data samples in an OFDM frame. Therefore, the lag parameter $\tau$ of the autocorrelation has a value of $\tau = N_D$. The frequency of such a periodic autocorrelated signal is termed the cyclic frequency $\alpha$ and its value is $\alpha = 1/N_s$ Hz, as shown schematically in Fig. 1c, where $N_s$ is the total number of samples in an OFDM frame. Thereby, algorithms for cyclostationary feature detection for spectrum sensing are based on the periodicity of statistical properties, such as the autocorrelation, of a cyclostationary process [20]. As discussed earlier, there are various algorithms to test the cyclostationarity of a signal [9, 11, 17, 21], and our work deals with time-domain cyclostationary-feature detection, as it is implementation friendly and delivers adequate performance [11].

Fig. 1  Autocorrelation of received OFDM signals for spectrum sensing
(a) OFDM frame with modulated data stream and cyclic prefix, (b) Equivalent representation of an OFDM frame where CP(i) and Data(i) represent the cyclic prefix and modulated data stream (excluding the data stream copied as cyclic prefix) of the ith frame. It also represents the standard time length of the OFDM frame as well as the cyclic prefix compliant to the LTE-A wireless network, (c) Autocorrelation process carried out at the receiver side for spectrum sensing

2.2 Time-domain cyclostationary-feature detection

The equivalent Fourier-series expansion of the periodic autocorrelated OFDM signal is represented as

$$\hat{\zeta}(\tau,\alpha) = \frac{1}{N}\sum_{n=0}^{N-1} x[n] \times x^{*}[n-\tau] \times \exp\left(-j\,\frac{2\pi\cdot\alpha\cdot n}{N}\right). \tag{2}$$

Here, $\hat{\zeta}(\tau,\alpha)$ is the cyclic autocorrelation function and $N$ is the total number of input samples. This paper deals with a single cyclic-frequency autocorrelation signal, although it can be multi-cyclic in nature [22]. The Neyman–Pearson hypothesis is used for testing the presence of a cyclostationary signal [23]. Thereafter, the test statistic $\hat{T}_{\varsigma}$ is computed as [11]

$$\hat{T}_{\varsigma} = N \times \hat{p} \times \psi^{-1} \times \hat{p}^{T} \tag{3}$$

where $\hat{p}^{T}$ is the transpose of $\hat{p}$, expressed as

$$\hat{p} = \left[\Re\{\hat{\zeta}(\tau,\alpha)\} \quad \Im\{\hat{\zeta}(\tau,\alpha)\}\right] \tag{4}$$

$$\hat{p} = \left[X_t(\alpha) \quad Y_t(\alpha)\right]. \tag{5}$$

Similarly, $\psi^{-1}$ is the inverse of the covariance matrix $\psi$ and is given by

$$\psi^{-1} = \frac{1}{AD - BC}\begin{bmatrix} D & -B \\ -C & A \end{bmatrix}. \tag{6}$$

From (2), it is possible to represent

$$\chi(\tau,\alpha) = x[n] \times x^{*}[n-\tau] \times \exp\left(-j\,\frac{2\pi\cdot\alpha\cdot n}{N}\right) \tag{7}$$

which is a complex value

$$\chi(\tau,\alpha) = x_t[n,\alpha] + j\cdot y_t[n,\alpha]. \tag{8}$$

Therefore, the mathematical expressions for matrix elements $A$, $B$, $C$ and $D$ are

$$A = \hat{E}\{X_t^{2}(\alpha)\} = \frac{1}{N}\sum_{n=0}^{N-1} x_t^{2}[n,\alpha], \tag{9}$$

$$B = C = \hat{E}\{X_t(\alpha) \times Y_t(\alpha)\} = \frac{1}{N}\sum_{n=0}^{N-1} x_t[n,\alpha] \times y_t[n,\alpha], \tag{10}$$

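The cyclic autocorrelation estimate in (2), together with the per-sample product in (7) and the real and imaginary parts used in (4) and (5), can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions, not the hardware implementation described in this paper: the function names `cyclic_autocorrelation` and `p_hat` are hypothetical, the input is a plain list of complex samples, and terms with n < τ are skipped because x[n − τ] is unavailable at the start of the buffer (the text does not specify this boundary handling).

```python
import cmath


def cyclic_autocorrelation(x, tau, alpha):
    """Estimate zeta_hat(tau, alpha) as written in (2).

    x     : list of complex baseband samples
    tau   : lag in samples (tau = N_D for an OFDM frame)
    alpha : cyclic frequency, in the units used by (2)
    """
    n_total = len(x)
    acc = 0j
    # Assumption: skip n < tau, where x[n - tau] would precede the buffer.
    for n in range(tau, n_total):
        # Per-sample term chi from (7): lag product times the frequency shift.
        chi = (x[n] * x[n - tau].conjugate()
               * cmath.exp(-2j * cmath.pi * alpha * n / n_total))
        acc += chi
    return acc / n_total


def p_hat(x, tau, alpha):
    """Return (X_t, Y_t), the real and imaginary parts in (4) and (5)."""
    zeta = cyclic_autocorrelation(x, tau, alpha)
    return zeta.real, zeta.imag
```

For a constant input of eight unit samples with τ = 2 and α = 0, six of the eight terms contribute a value of 1 each, so `cyclic_autocorrelation([1 + 0j] * 8, 2, 0)` evaluates to 0.75.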
Fig. 2  System-level overview for conventional architecture of TCD for spectrum sensing integrated with analogue RF section and antenna

$$D = \hat{E}\{Y_t^{2}(\alpha)\} = \frac{1}{N}\sum_{n=0}^{N-1} y_t^{2}[n,\alpha]. \tag{11}$$

The expression for test statistic computation using the above equations can be formulated as [21]

$$\hat{T}_{\varsigma} = N\,\frac{X_t(\alpha)^{2} \times D + Y_t(\alpha)^{2} \times A - 2 \times X_t(\alpha) \times Y_t(\alpha) \times B}{A \times D - B^{2}}. \tag{12}$$

Under the null hypothesis, the test statistic is $\chi_2^2$ distributed; therefore, the decision threshold is obtained from the inverse of the cumulative distribution function of the $\chi_2^2$ distribution. Consequently, the value of the decision threshold $\hat{T}_{\delta}$ used for the test is given by

$$\hat{T}_{\delta} = F^{-1}_{\chi_2^2}\left(1 - P_{\eta}\right) \tag{13}$$

which is pre-computed and fixed. In order to fix this decision threshold, it is necessary to have a trade-off between the probability of detection $P_d$ and the probability of false alarm $P_{\eta}$. Thereby, an appropriate value of $P_{\eta}$ is selected for the specific application. Eventually, the test statistic result $\hat{T}_{\varsigma}$ is compared with the threshold value $\hat{T}_{\delta}$ to get the final result from the TCD.

2.3 System overview and TCD architecture

The overall system that encompasses the TCD is depicted in Fig. 2, where the antenna at the receiver transforms electromagnetic waves to an RF signal. The band select filter passes the desired frequency signals, and this feeble signal is boosted using a low noise amplifier, as shown in Fig. 2. The first mixer $M_1$ then translates RF to an intermediate frequency (IF) signal using the fixed-frequency local-oscillator $O_1$. Subsequently, the IF signal is low-pass filtered to reject high-frequency components and passed along to the second stage of mixers $M_I$ and $M_Q$. The tunable, channel-select frequency synthesiser $O_2$ translates the low-pass IF signals to baseband signals, and the alternate channel energy is removed by the channel select filter. Thereafter, the inphase (I) and quadrature (Q) signal components are sampled and converted into digital signals using 10-bit analogue-to-digital converters (ADCs), as shown in Fig. 2. These 10-bit I and Q samples are fed to the CTCD for spectrum sensing [11]. Specifically, such input samples are fed to the autocorrelation module, which computes the conjugate autocorrelation of the input signal. The CORDIC algorithm has been incorporated to calculate $\exp(-j((2\pi\cdot\alpha\cdot n)/N))$ from (2). Euler's formula expresses this exponential as

$$e^{jx} = \cos x + j\cdot\sin x \tag{14}$$

and hence, the CORDIC algorithm computes the cosine (cos) and sine (sin) values of the input angle [11] in signed two's complement format. Further, the frequency shift is performed by feeding the outputs of the autocorrelation module to the CORDIC module, as shown in Fig. 2. Subsequently, the MAC and accumulator modules compute the values of $A$, $B$ or $C$, $D$, and $X_t(\alpha)$ and $Y_t(\alpha)$ from (9), (10), (11) and (5), respectively. Eventually, the test statistic from (12) is determined by the test-statistic calculator module and is compared with the predefined threshold $\hat{T}_{\delta}$ to obtain the final TCD output.

3 Proposed architectures and techniques for reconfigurability and accuracy

In this section, we propose the reconfigurable architecture for the autocorrelator module and the overflow/underflow protection technique for the CORDIC module of the new TCD architecture.

3.1 Reconfigurable architecture

The contemporary LTE-A wireless network uses OFDM for communication, where the PUs are allocated bandwidth based on their demand [24]. Such allocation decides the number of subcarriers required to generate an OFDM frame for transmission. In the recent trend, diverse PUs of the wireless network have different bandwidth requirements [2]. Thereby, various sizes of OFDM frames, constructed with varying numbers of subcarriers, are used for communication. For such a wireless network, the number of subcarriers allocated to construct different sizes of OFDM frames varies in powers of 2, ranging from 64 to 1024. Therefore, the transceiver design for an SU with CR capability becomes challenging in such an environment. The reason is that the CTCD from Fig. 2, which is responsible for the cognitive activity of the SU, is unable to detect PUs with varying OFDM frames. Hence, it is necessary to conceive a TCD architecture capable of detecting OFDM frames with a variable number of subcarriers. Therefore, we propose an autocorrelator architecture that incorporates reconfigurability in the TCD for detecting OFDM frames constructed using 64, 128, 256, 512 and 1024 subcarriers. These values are compliant to the LTE-A 4G wireless network [2, 24]. Additionally, the suggested autocorrelator has been designed to switch among different configurations on the fly. For this reconfigurable capability, the lag factor $\tau$ of the autocorrelator must be configurable at runtime. The conjugate autocorrelation $\hat{\lambda}(\tau)$ of the received signal $x[n]$ is given by

$$\hat{\lambda}(\tau) = x[n] \times x^{*}[n-\tau] \tag{15}$$

from (2). It is to be noted that the autocorrelator has been designed to compute the value of $\hat{\lambda}(\tau)$. To ensure the periodicity in the autocorrelation of the OFDM signal, the lag factor $\tau$ must take the values of $\pm N_D$. However, we considered only the positive value of $\tau$ because the autocorrelation signal has only a single cyclic frequency $\alpha = 1/N_s$. Hence, the value of $\tau$ has been fixed at $\tau = N_D$ for a particular OFDM frame, as shown schematically in Fig. 1. The proposed architecture of the reconfigurable autocorrelator is presented in Fig. 3, where the 'delay memory' module from the CTCD has been replaced with a series of 1024 registers, each of 20 bit, and a multiplexer. Outputs of the 64th, 128th, 256th, 512th and 1024th registers are tapped and fed to the multiplexer. Depending on the value of the select line sel, the lag factor of this autocorrelator can be configured for $\tau$ = 64, 128, 256, 512 and 1024 for different OFDM frames compliant to the LTE-A wireless network. Then the delayed signal $x[n-\tau]$ is conjugated and
complex multiplied with the received OFDM signal $x[n]$ to generate the final output $\hat{\lambda}(\tau)$, as shown in Fig. 3.

3.2 Overflow/underflow protection technique and architecture for the CORDIC module

The CORDIC algorithm has a significant role in cyclostationary feature detection for spectrum sensing, as discussed earlier in Section 2. Here, we present a new VLSI architecture for CORDIC to enhance the robustness of the overall spectrum-sensing process. The conventional CORDIC algorithm to compute sin and cos of any arbitrary angle is well presented in the literature [25–27]. The input fed to this algorithm is an angle with a bit width of 32 bit, represented in two's complement format. It computes sin and cos of this angle and generates them in 10-bit two's-complement format. This algorithm is implementation friendly, as it requires only addition, shifting and a 10 × 32-bit look-up table (LUT) to compute the arctangent value, and is valid for input angles which lie between −90° and +90°. However, for angles outside this range, the input angle must be pre-rotated, and the sin as well as cos computations run for ten iterations. In order to avoid the computation of $\arctan(2^{-i})$ at run time, this value is calculated in advance and stored in a 16 × 32-bit LUT. Fig. 4a shows the ten-stage pipelined-CORDIC architecture which is incorporated in the proposed TCD architecture. Each pipeline stage performs one iteration; as a result, ten stages are required. The internal architecture of each pipeline stage is shown in Fig. 4b, which is designed with six 10-bit adders, two right shifters and three 2:1 10-bit multiplexers.

If overflow/underflow occurs while additions are being performed in any of the intermediate pipelined stages, as shown in Fig. 4b, then the cumulative error of these pipelined stages would drastically corrupt the end result and degrade the performance of the TCD. Therefore, we suggest an architecture to mitigate this problem, shown in Fig. 4c. In order to perform the addition of two $N$-bit numbers, this adder initially sign extends both the operands using the right shifter and then performs an $N + 1$ bit addition. Thereafter, the detection of overflow or underflow is accomplished with the aid of the two consecutive most-significant-bits (msbs) of the output from the $N + 1$ bit addition, as shown in Fig. 4c. Subsequently, if there is no overflow or underflow, then this value is routed unaltered via multiplexers to the output. On the other hand, the maximum $N_{Max}$ or minimum $N_{Min}$ possible value is transferred to the output whenever there is overflow or underflow, respectively, at the output of the $N + 1$ bit addition. In order to measure the accuracy of the suggested CORDIC architecture, Table 1 compares the sin and cos floating-point values of different angles with the fixed-point values (for $N$ = 10 bit) computed by the hardware architecture of CORDIC with the overflow/underflow protection technique and without it. The error percentages $E_{cv}$ and $E_{pv}$ are computed as

$$E_{cv} = \left|\frac{F_{pv} - F_{xc}}{F_{pv}}\right| \times 100\% \quad \text{and} \quad E_{pv} = \left|\frac{F_{pv} - F_{xp}}{F_{pv}}\right| \times 100\%, \tag{16}$$

respectively, where $F_{pv}$ represents the floating-point value for the sin or cos of the angle. Similarly, $F_{xc}$ and $F_{xp}$ represent the fixed-point values for sin as well as cos of the angle without and with the overflow/underflow protection technique, respectively. On average, the proposed CORDIC architecture delivers 1.05 and 0.35% errors for the computation of sin and cos, respectively. In contrast, the conventional CORDIC architecture results in 74.8 and 44.1% average error while computing sin and cos, respectively. Although the suggested technique for CORDIC delivers a smaller error, it requires more hardware than the conventional CORDIC, as is clear from Fig. 4. To quantify this drawback, both architectures have been implemented in a 65 nm-CMOS FPGA platform and their post place-&-route report is presented in Table 2. It shows the percentages of extra hardware required by the proposed CORDIC architecture in terms of register, logic and memory.

4 Proposed hardware-efficient techniques and architectures

Multiplication of any two n-bit numbers generates a 2n-bit result that is truncated by dividing with a scale factor to maintain a constant n-bit length in the VLSI architecture [28, 29]. In contrast, architectures with full-precision arithmetic do not scale down the multiplier result. Thus, implementation of such an architecture requires higher bit-width multipliers that consume larger chip area and power. This section presents hardware-efficient techniques and architectures to reduce the multiplier sizes of the MAC as well as accumulator modules, and to replace the division of the test-statistic calculation module with a simpler design in the suggested TCD architecture.

4.1 MAC and accumulator architectures

The MAC module of the CTCD architecture reported in [11] requires a 21 × 21-bit multiplier and a 64-bit adder to generate the output. Similarly, the autocorrelation computation of this architecture uses a 20 × 20-bit complex multiplier which generates a 42-bit complex result (21 bit each for real and imaginary parts) [11]. In the proposed autocorrelator architecture, the output of the complex multiplier has been scaled down to 20 bit, comprising real and imaginary parts of 10 bit each represented in two's complement format, as shown in Fig. 3. The output from the complex multiplier that multiplies the autocorrelator and CORDIC outputs (which will be discussed in the next subsection) is also scaled down to 20 bit. Thereby, the input bit width of the suggested MAC architecture is 10 bit, as shown in Fig. 5a. It shows that the two inputs a and b are fed to a 10 × 10-bit multiplier to generate a 20-bit result. This value is scaled up by a factor which varies for different subcarriers compliant to the LTE-A wireless network; its detailed discussion is presented in the next subsection. The higher-order 10 bit of the shifter output are tapped and fed to a 20-bit adder which delivers the final output of the same bit width, as shown in Fig. 5a. The systematic process of the proposed hardware-efficient technique for MAC computation is presented in Algorithm 1 and its corresponding architecture is shown in Fig. 5a. In general, an $N$-bit MAC architecture needs one $N \times N$-bit multiplier and one $2N$-bit adder, whereas the proposed MAC requires only an $N$-bit adder along with an $N \times N$-bit multiplier. The proposed architecture for the accumulator module, in turn, is shown in Fig. 5b and has been designed with two 20-bit adders and two shifters, one each for the real and imaginary part. By comparison, two 42-bit adders are used in the accumulator architecture of the CTCD [11]. Steps of the suggested technique for the accumulator are presented in Algorithm 2. It states that the accumulator input must be scaled by shifting, as discussed earlier for the MAC architecture, and fed to the 20-bit adder. The adder combines the input value with the previously accumulated value to generate the 20-bit accumulator output, as shown in Fig. 5b. In this work, we have not utilised the full-precision outputs of the multipliers and their results are scaled down. This introduces quantisation noise; thereby, there is a need for scaling up the multiplier output of the MAC module and both the inputs of the accumulator module to tackle this issue, which is discussed in the next subsection.

Algorithm 1: Proposed technique for MAC
1: procedure MAC(a, b)
2:   mulRes ← a × b
3:   scalMulRes ← ShiftLeft(mulRes, scalFactor)
4:   accRes ← accRes + scalMulRes
5:   return accRes
6: end procedure

4.2 Handling quantisation noise and need for scaling in MAC and accumulator modules

To tackle the issue of quantisation noise induced by reducing the multiplier precision of our design, its end result, which is the test statistic $\hat{T}_{\varsigma}$ value, has been analysed. Fig. 6 shows the absolute error plot for OFDM signals with $N_D$ = 64 subcarriers where the

error is calculated by comparing the $\hat{T}_{\varsigma}$ values generated from the simulation and the fixed-point TCD architecture. This error plot has been generated for 100 Monte-Carlo simulations, of which only the first 50 samples are shown for clear visualisation. It is clear from the plot that the quantisation noise introduced in the system is within the tolerable limits, as the maximum error in the $\hat{T}_{\varsigma}$ value is 0.7. Specification details of the simulation model are presented in the next section. Test statistic calculations for the OFDM frames constructed with 128, 256, 512 and 1024 subcarriers are not within the tolerable limits and resulted in large errors due to the effect of quantisation noise. Therefore, the intermediate computation variables involved in the test statistic calculation have been transformed from their current domain to the 64-subcarrier ($N_D$ = 64) domain. Mathematically, such a transformation can be defined as

$$\Theta_T = \digamma(\vartheta_{in}, N_D) \tag{17}$$

where $\digamma()$ is the transformation function, which maps the values of the intermediate variables $\vartheta_{in}$ for $N_D$ = {128, 256, 512, 1024} subcarriers to the $N_D$ = 64 subcarrier domain.

Fig. 3  Proposed reconfigurable VLSI architecture for autocorrelator module of TCD compliant with 4G LTE-A wireless network

Fig. 4  Overall VLSI and micro-architectures of the CORDIC module used in the proposed TCD
(a) VLSI architecture for CORDIC algorithm with ten pipelined stages used in the proposed TCD where the angle α of 32-bit data width is fed as an input, (b) Internal architecture of each CORDIC stage used in our design, (c) Proposed N-bit adder architecture for overflow/underflow protection that is used in each CORDIC stage where the values N = 10 bit, NMax = 1FF Hex and NMin = 200 Hex

To estimate an implementation-friendly transformation function $\digamma()$, values of the intermediate variables for various OFDM frames are analysed for different SNRs in the range of −26 to 0 dB, which represent the worst-case channel conditions [10]. Table 3 shows the outcomes of this analysis, which indicate that the ratio of the MAC or accumulator outputs for OFDM frames with 64 subcarriers to their outputs with the rest of the subcarriers maintains constant values irrespective of varying SNR. Suppose $O_{mac}$ and $O_{acc}$ represent the outputs of the MAC and accumulator modules, respectively; then these values are mathematically expressed as

$$O_{acc} = \frac{1}{N}\sum_{n=0}^{N-1} x[n] \times y[n], \tag{18}$$

$$M \times O_{acc} = \frac{1}{N}\sum_{n=0}^{N-1} M \times \left(x[n] \times y[n]\right), \tag{19}$$

$$O_{mac} = \frac{1}{N}\sum_{n=0}^{N-1} x[n], \tag{20}$$

$$M \times O_{mac} = \frac{1}{N}\sum_{n=0}^{N-1} M \times x[n]. \tag{21}$$

From (18) and (20), it is interesting to note that the input variables $x[n]$ and $y[n]$ to the MAC and accumulator modules follow the same trend as the output variables, as they are linear operations. Hence, if a similar analysis is performed with the inputs and intermediate variables of the MAC and accumulator modules, then the same results, as shown in Table 3, will be derived. Thereby, the transformation function $\digamma()$ has been formulated to be a simple scaling function. Table 4 presents the values of the selected scale factors, which decide the bit-shift value that is to be fed via the multiplexer for different subcarriers in the MAC and accumulator modules, as shown in Fig. 5. Such optimisation is valid because the test statistic calculation is a normalised operation and is independent of the number of input samples used for the detection as well as the number of subcarriers used for the construction of the OFDM frames. To verify the transformation function discussed above, simulations similar to those for $N_D$ = 64 subcarriers are carried out for $N_D$ = 1024 subcarriers and the absolute error plot has been generated, as shown in Fig. 6. This value of 1024 subcarriers has

Table 1  Comparison of CORDIC architecture accuracies with the proposed overflow/underflow protection technique and without it for the bit width of N = 10 bit

Θ          sin Θ                                               cos Θ
           Fpv       Fxc       Fxp       Ecv, %    Epv, %      Fpv       Fxc       Fxp       Ecv, %    Epv, %
76.32°     0.9716    −0.0586   0.9707    106       0.09        0.2365    0.2461    0.2383    3.9       0.7
108.14°    0.9503    −0.0996   0.9453    110       0.57        −0.3113   −0.3457   −0.386    11        0.8
120.36°    0.8629    −0.2754   0.8632    113       0.03        −0.5054   −0.513    −0.5059   1.5       0.05
172.73°    0.1265    0.082     0.1211    35        4.27        −0.9919   −0.0176   −0.9922   101       0.03
190.95°    −0.1899   −0.1699   −0.1894   10        0.26        −0.9818   0.0371    −0.9804   103       0.14
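The difference between the protected and unprotected adders compared in Table 1 can be mimicked in software. The following Python sketch is an illustration only, not the authors' RTL: it follows the steps described for the Fig. 4c adder (sign extension to N + 1 bits, an (N + 1)-bit addition, and inspection of the two most-significant bits), with Python integers standing in for hardware words. The function names `saturating_add` and `wrapping_add` are hypothetical; the limits for N = 10 correspond to the bit patterns 1FF Hex and 200 Hex quoted in the Fig. 4 caption.

```python
def _to_signed(value, n_bits):
    """Interpret the lower n_bits of value as a two's-complement number."""
    value &= (1 << n_bits) - 1
    return value - (1 << n_bits) if value >> (n_bits - 1) else value


def wrapping_add(a, b, n_bits=10):
    """Plain N-bit adder: the result silently wraps around on overflow."""
    return _to_signed(a + b, n_bits)


def saturating_add(a, b, n_bits=10):
    """N-bit adder with overflow/underflow protection (after Fig. 4c).

    a and b are assumed to be valid N-bit two's-complement values.
    """
    n_max = (1 << (n_bits - 1)) - 1      # 0x1FF pattern, +511 for N = 10
    n_min = -(1 << (n_bits - 1))         # 0x200 pattern, -512 for N = 10
    # (N + 1)-bit sum of the sign-extended operands.
    raw = (a + b) & ((1 << (n_bits + 1)) - 1)
    msb = (raw >> n_bits) & 1            # bit N of the extended sum
    next_msb = (raw >> (n_bits - 1)) & 1 # bit N - 1 of the extended sum
    if msb == 0 and next_msb == 1:
        return n_max                     # positive overflow: clamp to N_Max
    if msb == 1 and next_msb == 0:
        return n_min                     # negative underflow: clamp to N_Min
    return _to_signed(raw, n_bits)       # in range: pass through unaltered
```

With N = 10, `wrapping_add(511, 1)` returns -512, a sign flip of the kind that would corrupt later pipeline stages, whereas `saturating_add(511, 1)` returns 511 and `saturating_add(-512, -1)` returns -512.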

been chosen to generate the plots because maximum scaling occurs at this
point, where the quantisation noise is at its peak value. It can be
observed that the maximum error of the test-statistic value for 1024
subcarriers is 1.6, which is within the tolerable limit because the error
does not cross the comparison threshold point $\hat{T}_{\delta} = 4.6052$.

Fig. 5  Proposed hardware-efficient architectures of
(a) MAC module, (b) Accumulator module for the proposed TCD design

4.3 Proposed TCD architecture

The final VLSI architecture of TCD, which aggregates all the proposed
modules, is shown in Fig. 6. In this architecture, detection is performed
over 4096 samples, and the parameters compliant to the LTE-A OFDM frame,
such as the autocorrelation delay and the scale factors of the MAC and
accumulator modules, can be configured on-the-fly using the 3-bit select
signal Sel. The complex inphase-&-quadrature (IQ) signal
$X_I + j \cdot Y_Q$ from the receiver analogue-RF front-end is fed to the
autocorrelator, where $X_I$ and $Y_Q$ are in 10-bit two's-complement
fixed-point format. As a result, this module generates the first valid
output after τ clock cycles. Simultaneously, the CORDIC module processes
the angle-generator output to generate its first valid output. As
discussed earlier, the α value is computed as $\alpha = 1/N_s$ Hz, and
computing the exponent from (2) requires the angle
$2\pi \times \alpha \times n/N$, which is in radians. This angle is
converted to degrees by multiplying it by $180/\pi$ and normalised by
multiplying it by $2^{32}/360$, which gives the final angle
$\alpha_{new} = \alpha \times 2^{32}/N$, where N = 4096 is the total
number of samples for processing. To support various OFDM subcarriers, we
computed five different $\alpha_{new}$ values depending on $N_s$. The
normalised fixed-point values of $\alpha_{new}$ for 64, 128, 256, 512 and
1024 subcarriers are $\alpha_1$ = 0x00004000, $\alpha_2$ = 0x00002000,
$\alpha_3$ = 0x00001000, $\alpha_4$ = 0x00000800 and
$\alpha_5$ = 0x00000400, respectively. These values are stored in LUTs and
are fed to the angle generator module via a multiplexer, as shown in
Fig. 6. The angle generator behaves like an accumulator: it adds the
present value of its register to the input alpha value to generate the
angle required in the current clock cycle, which feeds the subsequent
CORDIC module. The CORDIC module processes this 32-bit angle to generate
10-bit sin and cos values of the angle in two's-complement fixed-point
format. Then, the outputs from the CORDIC and autocorrelation modules are
complex multiplied to feed three different MAC modules and an accumulator
for computing A, B, D, $X_t(\alpha)$ and $Y_t(\alpha)$ from (12). In this
work, division in the accumulator and MAC modules from (2), (9), (10) and
(11) is avoided by modifying the test statistic formula as

$$\hat{T}_{\varsigma} = \frac{X_t(\alpha)^2 \times D + Y_t(\alpha)^2 \times A - 2 \times X_t(\alpha) \times Y_t(\alpha) \times B}{A \times D - B^2}. \qquad (22)$$

The test-statistic calculation module shown in Fig. 6 is a three-stage
pipelined architecture which computes the numerator and denominator of
(22). In the suggested TCD architecture, the divider and threshold
comparator of the CTCD discussed in Section 2.3 [11] are replaced by a new
repeated-subtraction module, as shown in Fig. 6. Such replacement is
feasible because the threshold value $\hat{T}_{\delta}$ computed in (13)
corresponds to a fixed value of 4.6052 when the probability of false alarm
$P_{\eta}$ is 0.1. Therefore, the denominator is repeatedly subtracted
from the numerator four times, and the inverse of the sign bit (msb) of
the result is tapped as the final output $O_{Det}$ of the proposed TCD
architecture, as shown in Fig. 6.

5 Experimental results

This section presents performance analysis as well as implementation
results of the suggested TCD architecture and its comparison with the
state-of-the-art design.

5.1 Performance analysis

The performance of the optimised TCD algorithm for spectrum sensing has
been simulated at different noise levels. The transmission side of this
simulation model comprises an OFDM transmitter which generates quadrature
phase shift keying modulated signals using $N_S$ different subcarriers.
Then, it appends the cyclic prefix and transmits this OFDM signal via an
AWGN channel towards the receiver side. Here, the analogue output from the
antenna is converted to a digital baseband signal of I and Q samples by
the analogue-RF section followed by an ADC, and these samples are fed to
the TCD algorithm. This simulation has been performed for different values
of SNR to observe the effect of noise on detection performance. During
this simulation, the TCD algorithm collects M = 524288 samples of the
received signal and processes them to detect the presence of PUs. For
every SNR value, Monte-Carlo simulations of 1000 iterations are performed
to obtain a reliable estimate of the probability of detection $P_d$.
Fig. 7 shows the $P_d$ versus SNR plot generated from this simulation for
$N_D$ = 1024 subcarriers with a cyclic prefix of 128 (12.5% of 1024). It
shows the performance of the proposed detector for negative SNRs, which is
the worst-case scenario in a wireless network environment where the noise
strength dominates the signal strength, because at positive SNRs the
detection rate is theoretically 100%, which means $P_d$ = 1. The
probability of false alarm $P_{\eta}$ has been assumed to be 0.05 for the
computation of the detection threshold $\hat{T}_{\delta}$. It can be
observed from Fig. 7 that the proposed TCD algorithm delivers equivalent
performance to CTCD and it performs 3 dB better than the
hardware-efficient spatial-sign cyclic correlator with angular

Table 2 Comparison of hardware consumed by the proposed and conventional CORDIC architectures when implemented on
FPGA platform
Conventional architecture Proposed architecture Hardware loss, %
registers 700 752 6.91
logic 2060 2851 27.74
memory 2092 bits 2851 bits 26.62
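
The angle normalisation described in Section 4.3 can be checked with a few lines of arithmetic. The following Python sketch recomputes the five $\alpha_{new}$ words from $\alpha = 1/N_s$ and N = 4096, and models the angle generator as a wrapping 32-bit accumulator. The function names, the zero initial value of the register and the emit-before-add ordering are assumptions made for illustration; they are not taken from the RTL.

```python
# Sketch of the angle-word arithmetic from Section 4.3.
# Function and constant names are illustrative, not taken from the paper's RTL.

N_SAMPLES = 4096      # N: samples processed per detection (from the text)
PHASE_BITS = 32       # the angle is a 32-bit word; 2**32 represents 360 degrees
SUBCARRIERS = (64, 128, 256, 512, 1024)


def alpha_word(n_subcarriers, n_samples=N_SAMPLES, bits=PHASE_BITS):
    """Return alpha_new = alpha * 2**bits / N with alpha = 1/Ns, as an integer."""
    return (1 << bits) // (n_subcarriers * n_samples)


def angle_sequence(word, count, bits=PHASE_BITS):
    """Model the angle generator as a wrapping accumulator.

    Assumption: the register starts at zero and its current value is emitted
    before the increment is added.
    """
    mask = (1 << bits) - 1
    angle = 0
    angles = []
    for _ in range(count):
        angles.append(angle)
        angle = (angle + word) & mask   # wrap-around models 32-bit overflow
    return angles


WORDS = {ns: alpha_word(ns) for ns in SUBCARRIERS}

for ns, word in WORDS.items():
    print(f"Ns = {ns:4d}  alpha_new = 0x{word:08X}")
print([hex(a) for a in angle_sequence(WORDS[64], 4)])
```

The printed words match the values $\alpha_1$ to $\alpha_5$ listed in the text, which suggests the LUT contents are exactly $2^{32}/(N_s \times N)$ for each supported frame size.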

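
The division-free decision of Section 4.3 can be illustrated in software as follows. This is a minimal Python sketch in which plain integers stand in for the fixed-point words; the function names are hypothetical, and a positive denominator is assumed. As described in the text, the denominator of (22) is subtracted from the numerator four times and the inverted sign bit is the decision. How the fractional part of the 4.6052 threshold is treated is not spelled out in that passage, so the sketch only tests whether the numerator is at least four times the denominator.

```python
# Sketch of the division-free decision described in Section 4.3.
# Integers stand in for the fixed-point words; all names are illustrative.

SUBTRACTIONS = 4   # number of repeated subtractions stated in the text


def statistic_parts(a, b, d, x, y):
    """Return (numerator, denominator) of the modified test statistic (22)."""
    numerator = x * x * d + y * y * a - 2 * x * y * b
    denominator = a * d - b * b
    return numerator, denominator


def detect(numerator, denominator, steps=SUBTRACTIONS):
    """Subtract the denominator `steps` times and invert the sign bit.

    Assumption: the denominator is positive, so the remainder stays
    non-negative exactly when numerator >= steps * denominator.
    """
    remainder = numerator
    for _ in range(steps):
        remainder -= denominator
    sign_bit = 1 if remainder < 0 else 0
    return 1 - sign_bit   # 1: primary user declared present, 0: absent


num, den = statistic_parts(a=2, b=1, d=3, x=4, y=5)
print(num, den, detect(num, den))
```

Because only subtractions and a sign check are involved, no divider or magnitude comparator is needed, which is the hardware saving the text attributes to the repeated-subtraction module.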
domain computation, as reported in [11]. Additionally, the performance
analyses have been carried out for longer detection times using 2M and 4M
samples, which outperform the one with M samples by 1.5 and 3 dB,
respectively, as shown in Fig. 7.

Fig. 6  (Upper) Error analysis plots of quantisation noise for the proposed
TCD which incorporates the hardware-efficient MAC and accumulator
modules. (Lower) Proposed reconfigurable and hardware-efficient TCD
architecture compliant to LTE-A wireless network

5.2 Hardware implementation and verification

The proposed TCD architecture has been described at register transfer
level (RTL) using Verilog HDL, which is synthesised, placed and routed on
65 nm-CMOS FPGA. The implementation result, which includes the resources
and power consumed by our design, is presented in Table 5. Additionally,
it includes the comparison with reported implementations of CTCD,
frequency-domain cyclostationary-feature detector (FCD) and
autocorrelation feature detector (AFD) [11]. It can be observed that the
suggested TCD consumes the least hardware and power among all.
Specifically, our design requires 23.08, 36.02, 93.75 and 42.09% less
registers, logic, memory and power, respectively, compared to CTCD [11].
The dynamic power consumed by the suggested architecture is measured at
20 MHz clock frequency, although it can be operated at up to 120 MHz. For
the real-world verification, a hardware prototype of our TCD is realised
using an Altera DE-I FPGA board (with Cyclone-II device). The analogue-RF
sections of transmitter and receiver are implemented using two USRP
devices, as shown in Fig. 8d. It also shows the USRP transmitter that
continuously transmits an OFDM signal constructed using $N_D$ = 512
subcarriers and a cyclic prefix of length 64. The centre frequency of this
transmitter is 868 MHz and its sampling rate is 200 kHz. Additionally,
Fig. 8d shows the OFDM receiver implemented on another USRP device. This
receiver is also tuned to the same centre frequency of 868 MHz. Here, the
received signal has been sampled at the rate of 1 MHz and decimated by a
ratio of 5 to produce a 200 kHz sampling rate. Further, the decimated
signal is fed to a 128-tap finite-impulse-response notch filter to remove
the DC components added to the received signal by amplifiers and other RF
circuitry. Subsequently, these filtered IQ samples are stored in a file on
disk. In order to verify the functionality of the TCD hardware prototype,
4096 samples of the real-world received signal captured using a USRP
device are transferred from the file to the on-board read-only memory
(ROM) of the FPGA. The FPGA-implemented detector fetches these values from
ROM to process them and generate the detection result. The hardware test
setup for this prototype is shown in Fig. 8a. It includes the
above-mentioned FPGA platform coupled with a universal-serial-bus (USB)
logic analyser whose output can be seen on the desktop of the personal
computer connected to it. A closer view of the Altera DE-I FPGA shows that
the on-board switch is used as the enable pin to initiate the detection
process, as shown in Fig. 8a. Additionally, there are two light-emitting
diodes (LEDs) used by this prototype: a status LED that turns green
indicating the detection process is in progress and an output LED which
shows the final result. The logic signals of these LEDs are buffered using
the general-purpose input/output pins of the FPGA, and a USB
logic-analyser probe has been connected to generate the output waveform,
as shown in Fig. 8. Here, three channels represent the clock, status
(BusyPin) and detector-output (DetOut) signals. Once the detection begins,
BusyPin outputs logic high and resets only after the detection is
complete. After this period, the logic state of DetOut switches high or
low indicating the presence or absence of PU, respectively. In this work,
detection has been performed for 5.12 ms (denoted by logic high of the
BusyPin signal) at 1 MHz clock frequency, as shown in the logic analyser
output waveform from Fig. 8b. Logic low of DetOut signal after the
detection period

Table 3 Comparison table of magnitude ratios of the proposed MAC and accumulator outputs for different SNR values such
that $O_{acc,i}$ and $O_{mac,i}$ ∀i = {64, 128, 256, 512, 1024} are their values for different OFDM subcarriers compliant to LTE-A wireless
network
SNR       ND = 128        ND = 256        ND = 512        ND = 1024
          φ1     ω1       φ2     ω2       φ3     ω3       φ4      ω4
0 dB      4.04   1.97     15.82  3.97     81.40  9.58     356.94  19.05
−6 dB     4.11   1.99     18.69  4.37     92.09  9.78     418.16  21.88
−11 dB    4.15   2.23     17.94  5.88     85.53  9.49     376.28  21.27
−14 dB    3.92   1.81     17.50  4.56     84.29  9.56     380.08  20.32
−20 dB    4.19   2.20     16.99  4.11     86.43  8.64     409.94  20.39
−26 dB    4.02   1.93     16.93  3.62     75.34  8.99     361.16  19.96
ω1 = $O_{acc,64}/O_{acc,128}$; φ1 = $O_{mac,64}/O_{mac,128}$.
ω2 = $O_{acc,64}/O_{acc,256}$; φ2 = $O_{mac,64}/O_{mac,256}$.
ω3 = $O_{acc,64}/O_{acc,512}$; φ3 = $O_{mac,64}/O_{mac,512}$.
ω4 = $O_{acc,64}/O_{acc,1024}$; φ4 = $O_{mac,64}/O_{mac,1024}$.
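
The measured ratios in Table 3 sit close to powers of two, and the shift values listed in Table 4 follow a simple pattern: the accumulator shift equals log2(ND/64) and the MAC shift is twice that. The Python sketch below reproduces the Table 4 entries from ND and prints them next to the 0 dB row of Table 3. The closed-form rule is an observation read off Table 4 rather than a formula quoted from the paper, and the function name is hypothetical.

```python
# Sketch: reproduce the shift values of Table 4 from ND.
# The closed-form rule is an observation about Table 4, not a quoted formula.

BASE_ND = 64   # reference frame size with unity scale factors


def shift_values(nd, base=BASE_ND):
    """Return (mac_shift, acc_shift) in bits for a power-of-two frame size nd."""
    k = (nd // base).bit_length() - 1   # log2(nd / base) for powers of two
    return 2 * k, k


# 0 dB row of Table 3: measured (phi, omega) ratios relative to ND = 64.
MEASURED_0DB = {128: (4.04, 1.97), 256: (15.82, 3.97),
                512: (81.40, 9.58), 1024: (356.94, 19.05)}

for nd in (64, 128, 256, 512, 1024):
    mac_shift, acc_shift = shift_values(nd)
    phi, omega = MEASURED_0DB.get(nd, (1.0, 1.0))
    print(f"ND={nd:4d}  MAC x{1 << mac_shift:<3d} ({mac_shift}-bit shift), measured {phi:7.2f}"
          f"   ACC x{1 << acc_shift:<2d} ({acc_shift}-bit shift), measured {omega:5.2f}")
```

Restricting the scale factors to powers of two is what allows them to be realised as bit shifts selected by Sel instead of multipliers or dividers.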

indicates the absence of PU. Note that the clock signal in Fig. 8b is
blurred due to its congested waveform at 1 MHz. A closer view of the logic
analyser output waveform is presented in Fig. 8c, where the DetOut signal
generates logic high after the detection period completes, indicating the
presence of PU.

Fig. 7  (Upper) Performance analysis of the proposed TCD algorithm in
AWGN channel environment for the OFDM frame size of 1024 subcarriers
with different sample values. (Lower) A chip layout of the proposed TCD
which is synthesised and post-layout simulated in 65 nm CMOS technology
node with the core area of (h = 0.579249 mm × w = 0.552121 mm) = 0.32 mm²

Fig. 8  Hardware (FPGA) implementation and testing of the proposed TCD
(a) FPGA test setup for the proposed TCD, (b) Output waveform at 1 MHz,
indicating the absence of PU after the detection period of 5.12 ms,
(c) Output waveform indicating the presence of PU under same timing
considerations, (d) Test setup to capture real-world signals using the
USRPs as OFDM transmitter and receiver of wireless network

5.3 ASIC implementation

The suggested TCD architecture has been simulated for functional
verification with appropriate test vectors using the Synopsys-VCS engine.
Simultaneously, it generates a switching-activity interface-format (SAIF)
file, which is a record of the transitions that every net in the design
undergoes during the simulation. Such a file is one of the critical inputs
for accurate power estimation. The functionally verified design has been
synthesised with the standard cell libraries of the UMC 65 nm-CMOS
(G-9LT-Logic Mixed-Mode 65N-SP-Low-K) technology node, along with various
real-world timing constraints, using the Synopsys-DC tool. This process
has been carried out with a supply voltage of 0.72 V under the worst-case
timing corner, and the synthesis report indicates that the proposed design
consumes 165263 standard cells with 62 logic levels. Thereafter, the
generated gate-level netlist is static timing analysed using the
Synopsys-PT tool as well as functionally verified via post-synthesis
simulation in the Synopsys-VCS environment.
For the physical design process, the netlist along with the six-metal
layer library-exchange-format and timing library files is imported using
the Cadence-SoC Encounter tool. Additional files for input/output (I/O)
pad integration and their orientation around the core have been imported
along with the aforementioned files. Thereafter, the core as well as die
area of the chip has been floor planned to accommodate the imported cells.
Power rings and strips are added around and across the core, respectively,
for supply and ground connections of cells and pads. These imported cells
are placed on the floor-planned core area where they are signal and power
routed as well as clock tree synthesised. Subsequently, post-route
static-timing analysis is performed iteratively until the design is free
from violations. Then, we have added core as well as I/O leaf cells and
finally performed static timing analysis (STA) for the timing closure. It
indicates that the proposed design can operate with a maximum clock
frequency up to 217.43 MHz. The physical verifications like design rule
check, layout versus schematic as

Table 4 Scale factors or shift values (in the proposed MAC
and accumulator architectures) for different ND values of
OFDM frame compliant to the LTE-A wireless network
ND      MAC scale factor     Accumulator scale factor
64      1 (0-bit shift)      1 (0-bit shift)
128     4 (2-bit shift)      2 (1-bit shift)
256     16 (4-bit shift)     4 (2-bit shift)
512     64 (6-bit shift)     8 (3-bit shift)
1024    256 (8-bit shift)    16 (4-bit shift)

Table 5 Comparison of the hardware resource and power
consumed by the proposed TCD with the reported work [11],
when implemented using FPGA platform
                Proposed    CTCD [11]    FCD [11]    AFD [11]
registers       993         1291         8802        1060
logic           5498        8593         16,591      6779
memory, bits    20,480      327,680      405,674     327,680
power, mW       20.26       34.99        61.29       35.29
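
The savings quoted in Section 5.2 follow directly from the Proposed and CTCD columns of Table 5. The short Python sketch below recomputes them, together with the combined logic-plus-registers figure cited in the abstract and conclusion. The dictionary layout and function name are chosen for illustration only.

```python
# Sketch: recompute the savings quoted in Section 5.2 from the Table 5 entries.

TABLE5 = {                      # metric: (proposed TCD, CTCD [11])
    "registers": (993, 1291),
    "logic": (5498, 8593),
    "memory_bits": (20480, 327680),
    "power_mW": (20.26, 34.99),
}


def saving_percent(proposed, reference):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


SAVINGS = {name: saving_percent(p, r) for name, (p, r) in TABLE5.items()}

# Combined "logic plus registers" figure used in the abstract and conclusion.
proposed_hw = TABLE5["registers"][0] + TABLE5["logic"][0]
reference_hw = TABLE5["registers"][1] + TABLE5["logic"][1]
SAVINGS["logic_plus_registers"] = saving_percent(proposed_hw, reference_hw)

for name, value in SAVINGS.items():
    print(f"{name:22s} {value:6.2f} %")
```

The register, logic and memory savings reproduce the quoted 23.08, 36.02 and 93.75%, the power saving agrees with the quoted 42.09% to within rounding, and the combined hardware saving comes out at roughly 34%.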

well as electrical rule check have been performed for the final layout and
its RC parasitics are extracted. The overall netlist of the physically
verified chip layout has been generated and is used along with the test
vectors as well as the RC extracted file to perform the post-layout
simulation for functional verification. This netlist along with the SAIF
file is fed to the power estimation tool to estimate the total power
consumed by the design, which is 18.4622 mW at a clock frequency of
100 MHz. The final layout of the proposed TCD with a core area of
0.32 mm² is shown in Fig. 7. The magnitudes of various design metrics
obtained from the aforementioned ASIC synthesis and post-layout simulation
are listed in Table 6.

Table 6 Summary of ASIC post-layout simulated results of
the proposed TCD architecture
Design metric                       Proposed TCD
technology, nm                      65
supply voltage, V                   0.72
core area, mm²                      0.32
gate count, k GEs                   161.39
number of logic levels              62
maximum clock frequency, MHz        217
leakage power, mW                   0.5602
dynamic power (mW) at 100 MHz       17.902
energy efficiency, pJ/bit           0.828

6 Conclusion

This work proposed a reconfigurable architecture for the autocorrelator
module that enabled TCD to process, on-the-fly, received OFDM signals of
various subcarrier sizes compliant to the LTE-A wireless network. In order
to enhance the accuracy of the CORDIC module, which is designed using a
lesser bit-width architecture, an overflow/underflow protection technique
is introduced and its corresponding architecture is presented as well. We
suggested new schemes for the design of hardware-efficient MAC and
accumulator architectures. Finally, the entire TCD architecture has been
presented by aggregating all the proposed modules along with the
simplified test-statistics calculation module. Performance analysis showed
that the suggested TCD could achieve Pd of 0.1 at −22 dB, which is on par
with the CTCD algorithm. Real-world testing of its hardware prototype
using OFDM signals is carried out using USRP-based transmitter and
receiver. FPGA implementation of our TCD showed that it requires 34 and
93% less hardware resources (logic plus registers) and memory,
respectively, compared to the existing implementation. Finally, ASIC
synthesis and post-layout simulation of this hardware-efficient design
have been performed using UMC 65 nm CMOS technology.

7 References

[1] Haykin, S.: ‘Cognitive radio: brain-empowered wireless communications’, IEEE J. Sel. Areas Commun., 2005, 23, (2), pp. 201–220
[2] Bhat, P., Nagata, S., Campoy, L., et al.: ‘LTE-Advanced: an operator perspective’, IEEE Commun. Mag., 2012, 50, (2), pp. 104–114
[3] Boccardi, F., Heath, R.W., Lozano, A., et al.: ‘Five disruptive technology directions for 5G’, IEEE Commun. Mag., 2014, 52, (4), pp. 74–80
[4] Zeng, Y., Liang, Y.C.: ‘Spectrum-sensing algorithms for cognitive radio based on statistical covariances’, IEEE Trans. Veh. Technol., 2009, 58, (4), pp. 1804–1815
[5] Vijay, G., Bdira, E.B.A., Ibnkahla, M., et al.: ‘Cognition in wireless sensor networks: a perspective’, IEEE Sens. J., 2011, 11, (3), pp. 582–592
[6] Hossain, E., Niyato, D., Han, Z., et al.: ‘Dynamic spectrum access and management in cognitive radio networks’ (Cambridge, USA, 2009)
[7] Chaudhari, S.: ‘Spectrum sensing for cognitive radios: algorithms, performance and limitations’. PhD thesis, Aalto University School of Electrical Engineering, Nov. 2012
[8] Yucek, T., Arslan, H.: ‘A survey of spectrum sensing algorithms for cognitive radio applications’, IEEE Commun. Surv. Tutorials, 2009, 11, (1), pp. 116–130
[9] Lundén, J., Kassam, S.A., Koivunen, V.: ‘Robust nonparametric cyclic correlation-based spectrum sensing for cognitive radio’, IEEE Trans. Signal Process., 2010, 58, (1), pp. 38–52
[10] Rebeiz, E., Urriza, P., Cabric, D., et al.: ‘Optimizing wideband cyclostationary spectrum sensing under receiver impairments’, IEEE Trans. Signal Process., 2013, 61, (15), pp. 3931–3943
[11] Kosunen, M., Turunen, V., Kokkinen, K., et al.: ‘Survey and analysis of cyclostationary signal detector implementations on FPGA’, IEEE J. Emerg. Sel. Topics Circuits Syst., 2013, 3, (4), pp. 541–551
[12] IEEE, ‘Cognitive wireless RAN medium access control (MAC) and physical layer (PHY) specifications’, IEEE 802.22 b Std., 2015
[13] Srinu, S., Sabat, S.L.: ‘FPGA implementation of spectrum sensing based on energy detection for cognitive radio’. Proc. Int. Conf. on Communication Control and Computing Technologies (ICCCCT), 2010, pp. 126–131
[14] Chaitanya, G.V., Rajalakshmi, P., Desai, U.B., et al.: ‘Real time hardware implementable spectrum sensor for cognitive radio applications’. Proc. Int. Conf. on Signal Processing and Communications (SPCOM), 2012, pp. 1–5
[15] Joshi, G.P., Nam, S.Y., Kim, S.W., et al.: ‘Cognitive radio wireless sensor networks: applications, challenges and research trends’, Sensors, 2013, 13, (9), pp. 11 196–11 228
[16] De la Roche, G., Alayón Glazunov, A., Allen, B.: ‘LTE-Advanced and next generation wireless networks: channel modelling and propagation’ (Wiley, USA, 2012)
[17] Turunen, V., Kosunen, M., Huttunen, A., et al.: ‘Implementation of cyclostationary feature detector for cognitive radios’. Proc. Int. Conf. Cognitive Radio Oriented Wireless Networks and Communications (CROWNCOM), 2009, pp. 1–4
[18] Kallioinen, S., Vääräkangas, M., Hui, P., et al.: ‘Multi-mode, multi-band spectrum sensor for cognitive radios embedded to a mobile phone’. Proc. Int. Conf. Cognitive Radio Oriented Wireless Networks and Communications (CROWNCOM), 2011, pp. 236–240
[19] Chaudhari, S., Kosunen, M., Mäkinen, S., et al.: ‘Performance evaluation of cyclostationary-based cooperative sensing using field measurements’, IEEE Trans. Veh. Technol., 2016, 65, (4), pp. 1982–1997
[20] Gardner, W.A., Franks, L.: ‘Characterization of cyclostationary random signal processes’, IEEE Trans. Inf. Theory, 1975, 21, (1), pp. 4–14
[21] Dandawate, A.V., Giannakis, G.B.: ‘Statistical tests for presence of cyclostationarity’, IEEE Trans. Signal Process., 1994, 42, (9), pp. 2355–2369
[22] Lundén, J., Koivunen, V., Huttunen, A., et al.: ‘Spectrum sensing in cognitive radios based on multiple cyclic frequencies’. Proc. Int. Conf. Cognitive Radio Oriented Wireless Networks and Communications (CROWNCOM), 2007, pp. 37–43
[23] Neyman, J., Pearson, E.S.: ‘On the problem of the most efficient tests of statistical hypotheses’, Philos. Trans. R. Soc. A: Math., Phys. Eng. Sci., 1933, 231, (694–706), pp. 289–337
[24] 3GPP, 3GPP TS 36.211, technical specification, V13.2.0, 3GPP Std., 06, 16
[25] Meher, P.K., Valls, J., Juang, T.-B., et al.: ‘50 years of CORDIC: algorithms, architectures, and applications’, IEEE Trans. Circuits Syst. I, Reg. Pap., 2009, 56, (9), pp. 1893–1907
[26] Walther, J.S.: ‘A unified algorithm for elementary functions’. Proc. Spring Joint Computer Conf., 1971, pp. 379–385
[27] Walther, J.S.: ‘The story of unified CORDIC’, J. VLSI Signal Process., 2000, 25, (2), pp. 107–112
[28] Crenshaw, J.W.: ‘Math toolkit for real-time development’ (CRC Press, USA, 2000)
[29] Khan, S.A.: ‘Digital design of signal processing systems: a practical approach’ (Wiley, UK, 2011)
