MICROPROCESSORS AND MICROSYSTEMS www.elsevier.com/locate/micpro Microprocessors and Microsystems 30 (2006) 43-51 # Real-time implementation of an efficient correlator for complementary sets of four sequences applied to ultrasonic pulse compression systems Fernando J. Álvarez<sup>a,\*</sup>, Álvaro Hernández<sup>b</sup>, Jesús Ureña<sup>b</sup>, Manuel Mazo<sup>b</sup>, J. Jesús García<sup>b</sup>, J. Antonio Jiménez<sup>b</sup>, Ana Jiménez<sup>b</sup> <sup>a</sup>Department of Electronics and Electromechanical Engineering, University of Extremadura, Escuela Politécnica, 10071, Cáceres, Spain <sup>b</sup>Department of Electronics, University of Alcalá, Escuela Politécnica, 28805 Alcalá de Henares, Spain Available online 23 May 2005 ## **Abstract** Complementary sequences are being employed to improve the results obtained in many sensorial systems where it is necessary to enhance the signal to noise ratio imposed by the environment. This improvement is achieved at the expense of increasing the computational complexity of the signal processing tasks. This work presents the hardware implementation of an efficient correlator for complementary sets of four sequences and its application to an ultrasonic pulse compression system. This implementation starts with a generic VHDL specification and then it is developed on an FPGA architecture. The achieved design can be adapted to different computational requirements, easily modifying its datapath and the length of the used sequences. © 2005 Elsevier B.V. All rights reserved. Keywords: Ultrasonic pulse compression system; Complementary sequences; Efficient correlator ## 1. Introduction Ultrasonic sensors provide cheap and convenient means for determining the distances to objects, which is done by measuring the time delay between sensor emissions and returning echoes. The method most frequently used for time-of-flight measuring is threshold detection, which is very simple but inaccurate, since it strongly depends on the level of the received signals. In the last years, some sonar systems have solved this problem by incorporating pulse compression techniques borrowed from radar theory [2,6,7]. Correlation detection increases the accuracy of the measurements, the spatial resolution (overlapping echoes can be clearly identified), and the overall process gain of these systems. Furthermore, it is possible to reduce the scanning times if a set of encoding sequences with low cross-correlation properties between them is selected. In this case, all the signals can be simultaneously emitted and the receivers are able to discern between them without interference, obtaining in this way a set of measurements made under similar conditions (multi-mode operation). However, the direct implementation of the correlation functions imposes large processing cost and, depending on the magnitude of the data to be processed, real-time operation could not be possible without the aid of high complexity hardware. A multi-mode sonar system which is able to perform real-time operation with low complexity hardware was previously proposed by the authors in [5]. This system performed the simultaneous emission of two pairs of Golay sequences [4], and its practical implementation was possible, thanks to the development of an efficient correlator for these codes [3,8]. This Efficient Golay Correlator (ECG) extremely reduces the number of operations to be carried out in the detection process and it was built on a simple platform based on a low-cost FPGA device [5]. The performance of this system can be notably improved if the signals are coded with complementary sets of sequences instead of Golay pairs. These sets are a generalization of Golay pairs, which contain more than two sequences and have better autocorrelation properties, allowing these systems to work under low levels of signal-to-noise ratio. Furthermore, when dealing with complementary sets, it is possible to generate more than two simultaneous emissions with ideal null cross-correlation properties (orthogonal <sup>\*</sup> Corresponding author. Tel.: +34 927 257 195 ext:7571. E-mail address: fafranco@unex.es (F.J. Álvarez). Fig. 1. Block diagram of a generic sensorial system based on the emission of four complementary sets of sequences. sets). This feature allows the system to extract more information from the environment in the same measuring period. Fig. 1 shows the block diagram of a generic sensorial system based on the emission of four orthogonal complementary sets. In order to achieve the practical implementation of this system, it is necessary to develop an efficient correlator for complementary sets of sequences equivalent to the EGC derived by Popovic for Golay pairs [8]. This work presents the hardware implementation of this correlator and its performance in the receiver module of a novel ultrasonic pulse compression system. The rest of the paper is organized as follows. In Section 2, complementary sets of sequences are briefly described. Section 3 presents the Efficient Correlator for sets of four sequences, and the implementation of this system is considered in Section 4. In Section 5, the application of the correlator to an ultrasonic pulse compression system is discussed and finally the conclusions are outlined in Section 6. ## 2. Complementary sets of four sequences A complementary set of four sequences is a set of binary sequences whose elements are either +1 or -1, having the property that the sum of their aperiodic auto-correlation functions equals zero for all nonzero time-shifts. In particular, if $\{a,b,c,d\}$ is a set of four sequences with length L, and $R_{xx}$ represents the auto-correlation function of the sequence x[i] then: $$R_{aa}[i] + R_{bb}[i] + R_{cc}[i] + R_{dd}[i] = \begin{cases} 4L, & i = 0\\ 0, & i \neq 0 \end{cases}$$ (1) Complementary sets of more than two sequences were initially studied by Tseng and Liu [9], who gave a general method for constructing sets of longer sequences from shorter ones. Based on this method a new algorithm can be proposed which allows the recursive generation of complementary sequences with length $L=4^N$ starting from the elementary delta sequences [1]. This algorithm can be expressed as follows $$\begin{aligned} a_0[i] &= b_0[i] = c_0[i] = d_0[i] = \delta[i], \\ a_n[i] &= a_{n-1}[i] - b_{n-1}[i - S_n] + w_{1,n} \cdot c_{n-1}[i - 2S_n] \\ &+ w_{1,n} \cdot d_{n-1}[i - 3S_n], \\ b_n[i] &= a_{n-1}[i] + b_{n-1}[i - S_n] + w_{1,n} \cdot c_{n-1}[i - 2S_n] \\ &- w_{1,n} \cdot d_{n-1}[i - 3S_n], \\ c_n[i] &= w_{2,n} \cdot a_{n-1}[i] - w_{2,n} \cdot b_{n-1}[i - S_n] \\ &- w_{1,n} w_{2,n} \cdot c_{n-1}[i - 2S_n] \\ &- w_{1,n} w_{2,n} \cdot d_{n-1}[i - 3S_n], \\ d_n[i] &= w_{2,n} \cdot a_{n-1}[i] + w_{2,n} \cdot b_{n-1}[i - S_n] \\ &- w_{1,n} w_{2,n} \cdot c_{n-1}[i - 2S_n] + w_{1,n} w_{2,n} \cdot d_{n-1}[i - 3S_n] \end{aligned}$$ where $a_n$ , $b_n$ , $c_n$ and $d_n$ are four sequences with length $4^n$ ; $\delta[i]$ is the Kronecker delta function; n is the number of iterations (n=1, 2,...,N); $w_{1,n}$ and $w_{2,n}$ are two coefficients having values +1 or -1, and $S_n$ represents a set of positive delays which must be selected as any permutation of $\{4^0, 4^1,...,4^{N-1}\}$ . One of the more interesting properties of this algorithm is that it is able to easily generate mutually orthogonal sets. Two sets with the same number of sequences $\{a^{(1)}, b^{(1)}, c^{(1)}, d^{(1)}\}$ and $\{a^{(2)}, b^{(2)}, c^{(2)}, d^{(2)}\}$ are said to be orthogonal when the sum of the corresponding cross-correlation Table 1 Orthogonal sets generated in the first iteration $(S_1 = 4^0)$ | $w_{1,1} = -1 \ w_{2,1} = -1$ | $w_{1,1} = -1 \ w_{2,1} = +1$ | |-------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------| | $ \overline{a^{(1)} = [+1 -1 -1 -1]} b^{(1)} = [+1 +1 -1 +1] c^{(1)} = [-1 +1 -1 -1] d^{(1)} = [-1 -1 -1 +1] $ | $a^{(2)} = [+1 -1 -1 -1]$ $b^{(2)} = [+1 +1 -1 +1]$ $c^{(2)} = [+1 -1 +1 +1]$ $d^{(2)} = [+1 +1 +1 -1]$ | | $u = \begin{bmatrix} -1 & -1 & -1 & +1 \end{bmatrix}$ | a - [+1 +1 +1 -1] | | $w_{1,1} = +1 \ w_{2,1} = -1$ | $w_{1,1} = +1 \ w_{2,1} = +1$ | functions equals zero: $$R_{a^{(1)}a^{(2)}}[i] + R_{b^{(1)}b^{(2)}}[i] + R_{c^{(1)}c^{(2)}}[i] + R_{d^{(1)}d^{(2)}}[i] = 0 \quad \forall i$$ $$(3)$$ The algorithm generates mutually orthogonal sets simply by modifying the values of the coefficients $w_{1,n}$ and $w_{2,n}$ in the first iteration and then using the same value for these coefficients in the remaining iterations. Table 1 shows the four mutually orthogonal sets generated in the first iteration with $S_1 = 4^0$ . Another interesting property of this algorithm is the versatility introduced by the delays $S_n$ appearing in its formulation. The algorithm always generates complementary sequences despite the values chosen for these delays, but these sequences are binary only when the delays belong to the set of permutations of $\{4^0, 4^1, ..., 4^{N-1}\}$ . When this permutation is $[4^0, 4^1, ..., 4^{N-1}]$ the algorithm is a particular case of the recursive method proposed by Tseng and Liu [9] and the new sequences are formed simply by concatenation of the previous ones complemented or not. ### 3. Efficient sets of sequences correlator (ESSC) As has been already stated, the practical realization of the systems based on complementary sets of sequences depends on the development of an efficient correlator able to perform the detection of these sequences with affordable computational cost. This correlator can be derived directly from the previous algorithm if we note that when the order of the delays is interchanged the algorithm generates the time-reversed versions of the sequences $(a_N[L-i], b_N[L-i], c_N[L-i], d_N[L-i])$ instead of the direct ones $(a_N[i], b_N[i], c_N[i], d_N[i])$ . By applying Z-transform to the equations of the algorithm with the delays interchanged, it can be written as: $$A_{0} = B_{0} = C_{0} = D_{0} = 1,$$ $$A_{n} = A_{n-1}[z] \cdot z^{-3S_{n}} - B_{n-1}[z] \cdot z^{-2S_{n}} + w_{1,n} \cdot C_{n-1}[z] \cdot z^{-S_{n}} + w_{1,n} \cdot D_{n-1}[z], B_{n} = A_{n-1}[z] \cdot z^{-3S_{n}} + B_{n-1}[z] \cdot z^{-2S_{n}} + w_{1,n} \cdot C_{n-1}[z] \cdot z^{-S_{n}} - w_{1,n} \cdot D_{n-1}[z],$$ $$C_{n} = w_{2,n} \cdot A_{n-1}[z] \cdot z^{-3S_{n}} - w_{2,n} \cdot B_{n-1}[z] \cdot z^{-2S_{n}} - w_{1,n}w_{2,n} \cdot C_{n-1}[z] \cdot z^{-S_{n}} - w_{1,n}w_{2,n} \cdot D_{n-1}[z],$$ $$D_{n} = w_{2,n} \cdot A_{n-1}[z] \cdot z^{-3S_{n}} + w_{2,n} \cdot B_{n-1}[z] \cdot z^{-2S_{n}} - w_{1,n}w_{2,n} \cdot C_{n-1}[z] \cdot z^{-S_{n}} + w_{1,n}w_{2,n} \cdot D_{n-1}[z]$$ $$(4)$$ These expressions represents a digital filter of N similar stages with four adders, four subtractors and three delay elements each, as can be observed in Fig. 2. The impulsive response of this filter are the time-reversed sequences $a_N[L-i]$ , $b_N[L-i]$ , $c_N[L-i]$ , $d_N[L-i]$ , and therefore, it behaves like a matching filter which correlates the input sequence r[i] with the direct complementary sequences $a_N[i]$ , $b_N[i]$ , $c_N[i]$ , $d_N[i]$ . In the case of sequences of length $L=4^N$ the total number of operations performed by the ESSC is only $8 \cdot N$ , whereas, in the straightforward matched filter implementation, it will be $2 \cdot 4^N - 1$ ( $4^N$ multiplications and $4^N - 1$ additions). Note that, at each stage, the output values are generated by addition or subtraction of four input values and thus the maximum value of the output signal could be four times the maximum value of the input signal, requiring two bits more for its representation in order to avoid overflow. If $N_{A/D}$ stands for the number of bits required to represent the values of the input signal, the total number of memory bits Fig. 2. Block diagram of the efficient sets of sequences correlator (ESSC). Fig. 3. Proposed configuration for the implementation of the ESSC. $N_{\rm m}$ in the efficient correlator is obtained as $N_{\rm m} = 6S_1 N_{A/D} + 6S_2 (N_{A/D} + 2) + \dots + 6S_N (N_{A/D} + 2(N-1)) = \sum_{i=1}^{N} 6S_i (N_{A/D} + 2(N-1))$ $$+2(N-1) = \sum_{n=1}^{N} 6S_n[N_{A/D} + 2(n-1)]$$ (5) or, taking into account that $$\sum_{n=1}^{N} S_n = \sum_{n=0}^{N-1} 4^n = \frac{1}{3}(L-1)$$ (6) Eq. (5) may be rewritten as: $$N_{\rm m} = 2N_{A/D}(L-1) + 12\sum_{n=2}^{N} (n-1)S_n$$ (7) As can be seen this number is higher than the number of memory bits needed in the straightforward implementation. However, it can be minimized when choosing the reverse permutation $S_n = [4^{N-1}, 4^{N-2}, ..., 4^0]$ . ## 4. Implementation of the efficient correlator ## 4.1. Design strategy The design of the efficient correlator has been structured around two generic parameters of the VHDL specification. On the one hand, it is possible to configure the number of stages N in the scheme, and therefore the length (in bits) of the sequences to be processed $L=4^N$ . On the other hand, it is also possible to select the number of bits employed in the codification of the input signal $N_{A/D}$ . This later parameter will directly affect the accuracy of the design and the quality of the results. Fig. 3 shows the proposed block, where r[i] is the input signal represented by $N_{A/D}$ bits; and $R_{ra}[i]$ , $R_{rb}[i]$ , $R_{rc}[i]$ and $R_{rd}[i]$ are the correlation functions between the input signal r[i] and the sequences a[i], b[i], c[i] and d[i] of the considered complementary set. Note that signals $R_{ra}[i]$ , $R_{rb}[i]$ , $R_{rc}[i]$ and $R_{rd}[i]$ require $N_{A/D} + 2 \cdot N$ bits, if overflow is avoided in internal computations. The internal architecture of the ESSC is based on multiple stages connected in cascade as is shown in Fig. 4. The number of stages is configurable in order to increase the adaptability of the system. Internally the design of these stages has been divided into two modules perfectly differentiated: one combinational block where the arithmetic elements are grouped, and one sequential block, which contains all the delay units necessary in the correlator. This structure is shown in Fig. 5. The combinational block contains all the combinational resources needed in a stage of the structure (see Fig. 6a). In practice, the multiplications can be reduced to additions and subtractions because of the binary nature of the coefficients $w_{k,1}$ and $w_{k,2}$ . The values of these coefficients determine the final configuration of the adders and the subtractors in the kth stage. The sequential block includes all the delay elements appearing in the structure of the correlator. This block has been implemented making use of the basic module SRL16, an internal element in the architecture of the more recent FPGAs from Xilinx [10]. These elements have been concatenated and configured according to the different stages present in the correlator, as is shown in Fig. 6b. In Fig. 6b, the index j has been determined trying to minimize the total number of memory bits $N_{\rm m}$ , according to (7). So, the sequential blocks follow the permutation $S_n = [4^{N-1}, 4^{N-2}, \dots 4^0]$ , where N is the number of stages in the efficient correlator ESSC, and $j=N-1, N-2,\dots,0$ . Since the width of the datapath increases, as the number of stage k does, it is better to place longer delay blocks in first stages in order to reduce the required memory. Fig. 4. Internal architecture of the ESSC. Fig. 5. Internal configuration of a generic stage. Fig. 6. Structure of the combinational and the sequential blocks. # 4.2. Performance of the hardware implementation Firstly, it is necessary to consider the quantification error introduced by the hardware implementation shown in Section 4.1. In order to do that, the values of the input sequence are normalised in the [-1, 1] range. A fixed point format has been used in the encoding of the numeric values. Starting from the values of the input sequence r[i], the width of the system datapath is increased at each stage in order to avoid any kind of overflow or truncation in the consecutive arithmetic operations. Table 2 shows the quantification error introduced in a two-stages ESSC for different numbers of bits in the fractional part of the signal r[i]. From now on, a precision of 9 bits has been considered Table 2 Quantification error for different numbers of bits assigned to r[i] | Precision (number of bits) integer + fractional | Quantification error (%) | |-------------------------------------------------|--------------------------| | 1+4 | 0.1496 | | 1+6 | 0.1111 | | 1+7 | 0.1065 | | 1+8 | 0.1012 | | 1 + 10 | 0.0993 | in the input signal r[i] (1 bit for the integer value and 8 for the fractional part). Fig. 7 shows the input signal applied to the developed design. This signal consists of a set of four complementary sequences (16 bits length) arranged one after another with Fig. 7. Input signal r[i]. Fig. 8. Ideal output signals obtained from simulation. a separation of 50 samples between them. The ideal result obtained from simulation is shown in Fig. 8 for each one of the four correlations performed by the ESSC: $R_{ra}[i]$ , $R_{rb}[i]$ , $R_{rc}[i]$ y $R_{rd}[i]$ . It can be observed that these signals present the maximum correlation value when the corresponding complementary sequence arrives, as expected. Fig. 9 shows the actual signals obtained by the designed hardware module. As can be seen, these signals are nearly equal to the ideal ones, with an error less than 0.1%. If the auto-correlation functions for every sequence $R_{aa}[i]$ , $R_{bb}[i]$ , $R_{cc}[i]$ and $R_{dd}[i]$ are extracted from Fig. 9 (see the remarked area in the mentioned figure), and added according to (1), a result similar to a delta function is obtained with a value of $4 \times L = 4 \times 4^{N} = 64$ (being N = 2) and with a noise level of 0.1%. This auto-correlation is shown in Fig. 10, where noise cannot be appreciated since its level is really reduced. The system has been implemented on a FPGA X2S100E [11]. Table 3 shows the operation frequencies (maximum delays) and the occupancy rates achieved for sequences of different lengths. The operating frequencies are constrained by the combinational delays appearing in the design; taking into account that every new acquired sample is processed by the system in only one FPGA clock cycle (real-time). Nevertheless, in some sensorial applications it could be necessary to increase these frequency rates, so the design could be modified inserting synchronization register to reduce the combinational delay. This would imply not only the increase of the frequency, but also latency in the operation (similar to other pipeline structures). #### 5. Application to an ultrasonic pulse compression system The performance of the ESSC in the receiver module of an ultrasonic pulse compression system has been tested with the aid of the experimental setup shown in Fig. 11. In this system, two orthogonal sets of four sequences (64 bits each one) are generated and continuously transmitted through a pair of ultrasonic transducers placed at different distances from a high-frequency microphone. The simultaneous emission of the four sequences in a set is accomplished by interleaving these sequences to generate a new sequence of 256 bits, and then modulating this sequence with a symbol of one period of a 50 kHz square signal. This BPSK digital modulation allows us to centre the spectrum of the emitted signals around the maximum response frequency of the Polaroid electrostatic transducers. As can be seen in Fig. 11a, the received signal is amplified and digitalized at a sampling rate of 800 kHz. This signal is demodulated by correlation with a digital correlator matched to the modulation symbol, which extracts the bits of the four sequences with 16 samples delay between them. Next, the signal enters the ESSCs where it is simultaneously correlated with the four sequences in the corresponding set. Fig. 9. Actual output signals obtained from the hardware implementation. Note that the signal entering the ESSCs is actually an interpolated version of the signal obtained by interleaving the complementary sequences, with an interpolation factor of 16 (number of samples in the modulation symbol). Therefore, in this signal two samples belonging to the same sequence are $4 \times 16 = 64$ samples apart, and it will be Fig. 10. Final auto-correlation of a set of four complementary sequences. necessary to decimate the signal by the same factor prior to carrying out the correlations. Fortunately, this decimation can be easily fulfilled just by multiplying the values of all the delay stages in the ESSCs by 64. Finally, taking into account that the bits of the interleaved sequences are delayed 16 samples, it is necessary to add three additional delay stages at the outputs of the ESSC in order to perform the in-phase sum of the autocorrelation functions. Fig. 12 shows the actual results obtained with this system when emitters 1 and 2 are placed at 3 and 3.5 m from the microphone, respectively, and the room temperature is fixed to 20 °C (sound speed equal to 343.5 m/s). As can be seen in this figure, each time a set of sequences is completely received a new peak is obtained only in the corresponding output. The system is able to discriminate between the two signals despite they are being simultaneously received in a noisy environment. Similar results are obtained when the other two orthogonal sets are transmitted. Table 3 Results of the implementation on a X2S100E for sequences of different lengths | | Occupancy (slices) (%) | Maximum delay (ns) | |----------------|------------------------|--------------------| | L=16 bits | 18 | 20 | | L=64 bits | 33 | 24 | | L=256 bits | 62 | 28 | | L = 1024 bits | 99 | 35 | Fig. 11. Ultrasonic pulse compression system: (a) block diagram of the detection process, (b) equipment and experimental setup. ## 6. Conclusions In this paper, the design of an efficient correlator for complementary sets of four sequences has been presented. This design is characterized by two basic parameters: the width of the datapath, which determines the accuracy of the system; and the number of stages in the architecture, which depends on the length of the sequences employed. In this way, the final result is a configurable module, able to be adapted to the circumstances and necessities required by different sensorial systems. The system has been implemented on a Spartan-IIE FPGA by Xilinx achieving real-time execution times and minimum levels of error between the ideal results and the real ones obtained from the hardware implementation. The computational load has been analysed, modifying the used precision and the length of the used sequences. Fig. 12. Actual results obtained from the ultrasonic pulse compression system. The developed module has been tested in an ultrasonic pulse compression system to which the efficient correlator has been adapted in order to process the previously demodulated signals. The results show that this module can be successfully used in the development of high performance multi-mode sonar systems improving the results obtained with other encoding schemes previously used. #### Acknowledgements The work has been possible by funding from the Ministry of Science and Technology of Spain (PARMEI project, ref. DIP2003-08715-C02-01) and from the University of Alcalá (ISUAP project, ref. PI2004/033). ## References [1] F.J. Álvarez, J. Ureña, M. Mazo, A. Hernández, J.J. García, J.A. Jiménez, Efficient generator and pulse compressor for complementary sets of four sequences, IEE Electronics Letters 40 (11) (2004) 703–704. - [2] K. Audenaert, H. Peremans, Y. Kawahara, Campenhout J. Van, Accurate ranging of multiple objects using ultrasonic sensors, Proceedings of the IEEE International Conference on Robotics and Automation, Nice, France, May 1992. - [3] S.Z. Budisin, Efficient pulse compressor for golay complementary sequences, IEE Electronics Letters 27 (3) (1991) 219–220. - $[4]\ \ M.J.\ Golay, Complementary\ series, IRE\ Transaction\ IT-7\ (1967)\ 82-87.$ - [5] A. Hernández, J. Ureña, D. Hernanz, J.J. García, M. Mazo, Real-time implementation of an efficient golay correlator (EGC) applied to ultrasonic sensorial systems, Microprocessors and Microsystems 27 (2003) 397–406. - [6] K-W. Jörg, M. Berg, Sophisticated mobile robot sonar sensing with pseudo-random codes, Robotics and Autonomous Systems 25 (1998) 241–251. - [7] D. Maioli, C. Narduzzi, C. Offelli, D. Petri, E. Sardini, A. Taroni, Digital time-of-flight measurement for ultrasonic sensors, IEEE Transactions on Instrumentation and Measurement 41 (1992) 93–97. - [8] B.M. Popovic, Efficient golay correlator, IEE Electronics Letters 35 (17) (1999) 1427–1428. - [9] C.C. Tseng, C.L. Liu, Complementary sets of sequences, IEEE Transactions on Information Theory IT-18 (5) (1972) 644–652. - [10] Xilinx, Inc., Using look-up tables as shift registers (SRL16) in Spartan-3 devices, Application Note v1.0, April 2003. - [11] Xilinx, Inc., Spartan-IIE 1.8V FPGA familly: complete data sheet, Product Specification, July 2003.