2.1. System Architecture
Ultrasound measurement hardware usually consists of three components. A transducer converts electrical signals into sound waves and vice versa. Electronics excite the piezoelectric elements of the transducer to oscillate and, conversely, amplify, digitize and pre-process the small received signals. Finally, the data are transferred to a PC, where the main data processing, image reconstruction, visualization and analysis take place.
To realize volumetric imaging, we used 1024-element matrix arrays whose elements can be controlled individually via the developed, fully integrated 1024-channel electronics. The transmit and receive beamforming was performed by a single PC, which was installed together with the four ultrasound electronics units in a single rack housing (Figure 1).
The 1024-channel ultrasound electronics consist of four DiPhAS ultrasound research systems, each of which integrates 256 parallel transmitter and receiver elements. A dedicated timing circuit ensures high-precision synchronicity between the individual sub-systems for transmit signal generation and parallel echo detection. Each of the four systems is connected to the PC via a PCIe interface. To connect the ultrasound transducer there are four connectors of the type ITT Cannon DLM6-360. Transducers with other pin assignments or connectors can be adapted via pin-out converters.
2.2. Transducer
With a view to future use of the system not only as a general-purpose ultrasound research platform but also, with a special focus, for prostate therapy monitoring by tracking radiation-sensitive ultrasound contrast agents [16,17] in the context of the AMPHORA project, a transducer with a footprint optimized for transperineal imaging was developed. This transducer was used for a first characterization of the multichannel electronics. Typical matrix transducers have small footprints in order to fulfill the λ/2 criterion for optimized beam-steering capability, which leads to low penetration depths. For therapy monitoring through the perineal window, a high penetration and imaging depth is necessary, at the cost of smaller beam-steering angles.
Therefore, the perineal window and the number of available channels define the maximum rectangular footprint of the aperture. With the target dimension of 30 mm × 20 mm given by the planned perineal application and 1024 dedicated channels, a pitch of 770 µm, corresponding to 2λ at a center frequency of 4 MHz, results in 26 elements in the elevation direction and 39 elements in the lateral direction. Sound field simulations with our proprietary point source synthesis simulation software “SCALP” were performed to optimize the array geometry. For optimized oscillation modes, FEM simulations suggested a 3 × 3 sub-dicing of each element. To further enhance energy transfer, the acoustic stack was built on a syntactic foam backing, with flex printed circuits connecting through it. To take full advantage of the multi-channel capabilities, each transducer element is connected to an individual electronics channel via four separate 256-core micro coax cables. Two FEM-optimized matching layers were applied for improved bandwidth and energy transfer, which is crucial in the foreseen therapy monitoring application.
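The element counts quoted above can be checked with a few lines of arithmetic. The following sketch is only an illustration: the speed of sound of 1540 m/s (a common soft-tissue value) and all function names are assumptions that do not appear in the text, and the counts are rounded to the nearest integer, so the resulting aperture may exceed the target dimension slightly.

```python
# Assumed soft-tissue speed of sound; the text does not state the value used.
SPEED_OF_SOUND_M_S = 1540.0


def wavelength_m(center_frequency_hz, c_m_s=SPEED_OF_SOUND_M_S):
    """Acoustic wavelength at the given center frequency."""
    return c_m_s / center_frequency_hz


def element_count(aperture_m, pitch_m):
    """Number of elements along one axis, rounded to the nearest integer."""
    return round(aperture_m / pitch_m)


# Pitch of 2 wavelengths at 4 MHz, target aperture 30 mm x 20 mm.
pitch = 2.0 * wavelength_m(4.0e6)
n_lateral = element_count(30.0e-3, pitch)
n_elevation = element_count(20.0e-3, pitch)
n_total = n_lateral * n_elevation

print(f"pitch = {pitch * 1e6:.0f} um")
print(f"{n_lateral} x {n_elevation} = {n_total} elements (1024 channels available)")
```

Under these assumptions the array uses fewer elements than there are channels, so every element can be wired to its own channel.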
Beyond the targeted clinical application, a commercial matrix array probe made by the French company Vermon with 32 × 32 elements was also used to characterize the volumetric ultrasound platform. The transducer has a center frequency of 2.8 MHz and a pitch of 300 µm. The aperture has a footprint size of 9.6 mm × 9.6 mm.
2.3. Multi-Channel Electronics
Each of the four 256-channel ultrasound systems is modular in design (Figure 2). Each consists of 16 front-end boards, which integrate 16 transmit and receive channels each; a main board, which handles the entire control and data management of the sub-system; a transducer connector board, which collects the individual signal lines of the front-end boards and routes them to a transducer connector; and a power supply developed in-house.
2.3.1. Main Board
The main board is the central control unit of each 256-channel sub-system. It integrates a high-performance Virtex-6 FPGA with an internal MicroBlaze soft processor, which is responsible for a wide range of tasks. The FPGA handles communication with the PC; receives, interprets, acknowledges and manages incoming commands; and parameterizes the transmit and receive circuits on the front-end boards accordingly.
In the receive phase, the digitized data of all 16 front-end boards are parallelized in the FPGA of the main board, synchronized, buffered and then transferred to the PC at high transmission speeds. For this purpose, a second-generation PCIe interface with 8 lanes per sub-system was implemented, which allows transfer rates of up to 25 Gbit/s per 256-channel sub-system. In total, the four sub-systems are connected with an overall bandwidth of 100 Gbit/s to a single PC, which performs the actual signal processing, reconstruction and visualization.
Parallel to the PCIe interface for general communication and data transfer, a bidirectional serial interface was also implemented, which can be optionally used for debugging or for service purposes for controlling a sub-system.
To ensure communication between the individual front-end FPGAs and the main board FPGA, a custom bidirectional data bus with a special transmission format was developed. Each front-end FPGA is connected to the FPGA of the main board via a total of 5 LVDS line pairs. One of these line pairs serves as a so-called CMD bus, through which commands are transferred from the MicroBlaze (running at 100 MHz) of the main board to the FPGAs of the front-end boards. The same bus is also used to synchronize the transmit/receive phases, controlled by the main board FPGA. In the opposite direction, four LVDS line pairs per front-end board are used as a DATA bus, which transports status information and, above all, the digitized data of the 16 receive channels to the main board FPGA at high transfer rates.
Physically, a total of 16 slots are provided on the main board, into which the front-end boards are plugged via card-edge connection. When using 16-channel front-end boards, up to 256 parallel send and receive channels are addressed per sub-system.
In addition to the parameterization of the transmit and receive characteristics and the data management on the receive side, the main board is also responsible for the synchronization of the 256 parallel channels of the sub-system. A high-precision clock network was set up for this purpose. When digitizing ultrasound signals, the choice of a suitable clock is essential in order not to degrade the performance of the receiver’s A/D converters. To ensure a high resolution, the applied clock must have only small phase variations. However, clocks generated by an FPGA are generally subject to comparatively high phase fluctuations. Therefore, two 14-channel clock generators of the type AD9523 from Analog Devices are used on the main board, which have an extremely low broadband jitter of < 100 fs; i.e., they can generate very clean clocks. Each of the two clock generators integrates two PLLs and a clock distributor. To reduce phase noise and thus improve the signal-to-noise ratio at the A/D converter, an external, frequency-stable, voltage-controlled crystal oscillator (VCXO) is used to generate a clock of 80 MHz. A reference frequency of 20 MHz, generated by a VCO, is fed via a clock distributor to the inputs of the two clock generators and, as a system clock, to the FPGA of the main board. Each of the two clock generators produces eight in-phase 80 MHz clocks, synchronized to the input clock, which are routed via LVDS lines to the individual A/D converters on the front-end boards (Figure 3). In the layout, care was taken to ensure that the signal line lengths to the individual front-end interfaces are practically identical.
The main board provides both input and output triggers. The input trigger allows the ultrasound system to be controlled by an external device: the ultrasound system starts a send/receive event or sequence only when a trigger signal is present. The input trigger is sampled at a frequency of 1 MHz. The output trigger works the other way around. Here, the ultrasound system generates a trigger signal (with a timing resolution given by a 120 MHz clock) for each send/receive event or sequence and thus controls an external device. Both input and output are decoupled via optocouplers and operate at +5 V TTL levels. Since the required pulse width may vary from application to application, the pulse width of the trigger can be defined via the firmware and software.
The logic contents of the FPGA are stored in a flash device, which is programmable via a JTAG interface. At power-on, an initialization routine configures the main board FPGA. Because the FPGA is not yet functional during this time, an additional PLD was installed, whose logic content is available immediately after the supply voltage is switched on and which already provides basic functionality.
In order to be prepared for future tasks, the main board has two card edge connectors, which can be used to connect PCBs to the system that can perform additional functions. The add-on cards can be controlled by the main board FPGA via a 19-bit address and 16-bit data bus. In addition, a large number of other control signals as well as some connection lines to the PLD are connected to the interface.
2.3.2. Front-End Boards
The front-end board integrates the analog/digital interfaces for a total of 16 parallel transmit and receive circuits each. A Spartan-6 FPGA with integrated MicroBlaze soft processor (running at 50 MHz) is used for control and data processing. The front-end FPGA, synchronized and controlled by the main board FPGA, handles the entire timing of a scan, such as pulse-echo measurement. For this purpose, a corresponding state machine consisting of reset, initialization, transmit and receive phases (user configurable), as well as a wait state, was implemented.
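The sequence control described above can be pictured as a small cyclic state machine. The sketch below is a software illustration only, not the firmware: the phase names follow the text, but the transition order (reset, initialization, then repeated transmit, receive and wait phases) and all identifiers are assumptions.

```python
from enum import Enum, auto


class Phase(Enum):
    RESET = auto()
    INIT = auto()
    TRANSMIT = auto()
    RECEIVE = auto()
    WAIT = auto()


# Assumed transition order; the text only lists the phases.
NEXT_PHASE = {
    Phase.RESET: Phase.INIT,
    Phase.INIT: Phase.TRANSMIT,
    Phase.TRANSMIT: Phase.RECEIVE,
    Phase.RECEIVE: Phase.WAIT,
    Phase.WAIT: Phase.TRANSMIT,
}


def run_events(n_events):
    """Return the phase trace for n_events transmit-receive events."""
    phase = Phase.RESET
    trace = [phase]
    completed = 0
    while completed < n_events:
        phase = NEXT_PHASE[phase]
        trace.append(phase)
        if phase is Phase.WAIT:
            # One transmit-receive event ends when the wait state is reached.
            completed += 1
    return trace


if __name__ == "__main__":
    print(" -> ".join(p.name for p in run_events(2)))
```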
The logic contents of the front-end FPGAs are stored centrally in a flash device on the main board, which is programmable via a JTAG interface. At power-on, an initialization routine configures the front-end FPGAs in the same way as the main board FPGA.
The parameterization of a scan is done via the programming and user interface of the PC software. Commands are generated for each channel based on the user’s input, describing the transmit and receive properties in detail. These commands are transmitted to the corresponding individual sub-systems via four PCIe interfaces. On each 256-channel sub-system, the commands are first managed by the main board FPGA and forwarded to the corresponding front-end FPGAs. There, the commands are interpreted by the integrated MicroBlaze soft processor and the sequence control characteristics and the transmit signals of each channel are programmed according to the user-defined settings.
To store the transmit patterns, a separate FPGA-internal block RAM (16 × 2048 bit) is used for each channel. This has the advantage that, during the transmission phase, a signal can be read out of the memory individually for each channel at different times. Thus, it is possible to first count down the coarsely resolved delay values, which are stored in a separate internal block RAM (16 × 1024 bit) per channel, with the resolution of a 60 MHz clock. After this, the actual transmit signal is read out from the transmit memory at a clock rate of 120 MHz and routed to the external pulser output stages via 4:1 serializers at a high clock rate of 480 MHz. To realize a high delay resolution, a specially developed sorting algorithm is used. In addition to the coarse delay already counted down, this algorithm shifts the transmit signal read from the memory by several 120 MHz clocks, depending on the delay set, before the signal is clocked out at high resolution by the serializers at the I/O interfaces. Shift operators and an AND gate are used in the FPGA for this purpose. This principle allows a high delay resolution of about 2 ns, i.e., one period of the 480 MHz serializer clock, to be achieved. In total, transmit patterns with up to 32 µs signal length can be stored in the transmit memories per channel, together with a total of 1024 different delay values.
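To illustrate how a requested transmit delay could be divided between the coarse counter and the fine shift, the following sketch quantizes a delay to 480 MHz ticks and splits it into three parts. The particular decomposition (coarse 60 MHz count, then a shift in whole 120 MHz words, then a bit offset inside the 4:1 serializer word) and all names are assumptions made for this example; the text does not specify the firmware algorithm at this level of detail.

```python
from typing import NamedTuple

F_COARSE_HZ = 60_000_000   # coarse delay counter clock (from the text)
F_WORD_HZ = 120_000_000    # transmit memory read-out clock (from the text)
F_SERIAL_HZ = 480_000_000  # serializer output clock (from the text)

TICKS_PER_COARSE = F_SERIAL_HZ // F_COARSE_HZ  # 8 fine ticks per coarse count
TICKS_PER_WORD = F_SERIAL_HZ // F_WORD_HZ      # 4 fine ticks per 120 MHz word


class DelaySplit(NamedTuple):
    coarse_counts: int  # number of 60 MHz periods to count down
    word_shift: int     # additional shift in 120 MHz words (assumed scheme)
    bit_shift: int      # offset inside the 4:1 serializer word (assumed scheme)


def split_delay(delay_s):
    """Quantize a delay to 480 MHz ticks and split it into three parts."""
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    ticks = round(delay_s * F_SERIAL_HZ)
    coarse, remainder = divmod(ticks, TICKS_PER_COARSE)
    word_shift, bit_shift = divmod(remainder, TICKS_PER_WORD)
    return DelaySplit(coarse, word_shift, bit_shift)


def realized_delay_s(split):
    """Delay actually produced by a given split, in seconds."""
    ticks = (split.coarse_counts * TICKS_PER_COARSE
             + split.word_shift * TICKS_PER_WORD
             + split.bit_shift)
    return ticks / F_SERIAL_HZ


if __name__ == "__main__":
    s = split_delay(100e-9)
    print(s, f"{realized_delay_s(s) * 1e9:.2f} ns")
```

In this scheme the quantization error of any requested delay is at most half a 480 MHz period, which is consistent with the delay resolution of about 2 ns stated above.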
The pulser output stages are capable of generating rectangular tristate burst signals that can be programmed in frequency, amplitude, number of cycles and delay per channel. A total of eight MAX4940s from Maxim Integrated are used as pulser output stages per front end, each integrating four transmit channels per chip. Each transmit output stage consists of a transistor circuit that switches through either a positive or a negative DC high voltage, depending on the digital control of the Spartan-6 FPGA (Figure 4; Figure 5, upper part). The voltage level can be set to up to +/−100 V, jointly for all channels. To increase the output current per channel to a maximum of 4 A, two outputs are always connected together. This also allows a higher-frequency excitation of the ultrasound transducer. In total, a front-end board integrates 16 parallel transmit channels.
In the receive path (Figure 5, lower part), a T/R switch is used to protect the sensitive input circuits from the high voltages of the transmit power stages. Each front end uses two MAX4937s from Maxim Integrated, each integrating eight channels and designed for transmit voltages up to +/−115 V. The transmit voltages are limited by the integrated diode bridges in such a way that only +/−0.75 V is present at the output of the component, which is uncritical for the input amplifiers of the receiver. The T/R switches are followed by analog anti-aliasing filters that can be populated with different component values depending on the application. Then, two AD9674 devices from Analog Devices are used to process the analog receive signals. Each of them features eight receive channels, including a low-noise preamplifier, a voltage-controlled attenuator, a post-amplifier, programmable filters consisting of a high-pass/low-pass combination and an A/D converter.
The receiver gain can be set to a maximum of 52 dB. The voltage-controlled attenuator allows a Time Gain Control (TGC) to be implemented in a range of up to 45 dB. To realize this, the attenuators are controlled by a TGC curve defined using up to 1024 supporting values, which specifies how high the amplification factor should be at which time. The TGC curve is generated centrally by the FPGA on the main board and is routed differentially to the individual front-end boards via a 12-bit digital-to-analog converter and a high-frequency output driver.
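As an illustration of how such a TGC curve might be prepared in software, the sketch below generates a gain ramp that is linear in dB and converts it to 12-bit DAC codes. The linear mapping of the 45 dB range onto the full DAC scale and all names are assumptions; the real relation between DAC code, control voltage and attenuation depends on the receiver device and the analog output driver.

```python
TGC_RANGE_DB = 45.0  # TGC range stated in the text
DAC_BITS = 12        # DAC resolution stated in the text
MAX_POINTS = 1024    # maximum number of supporting values stated in the text


def tgc_curve(start_db, end_db, n_points=MAX_POINTS):
    """Linear-in-dB gain ramp as a list of DAC codes (assumed linear mapping)."""
    if not 2 <= n_points <= MAX_POINTS:
        raise ValueError("n_points must be between 2 and 1024")
    for value in (start_db, end_db):
        if not 0.0 <= value <= TGC_RANGE_DB:
            raise ValueError("gain values must lie within the TGC range")
    full_scale = (1 << DAC_BITS) - 1
    codes = []
    for i in range(n_points):
        gain_db = start_db + (end_db - start_db) * i / (n_points - 1)
        codes.append(round(gain_db / TGC_RANGE_DB * full_scale))
    return codes


if __name__ == "__main__":
    curve = tgc_curve(0.0, 45.0)
    print(len(curve), curve[0], curve[-1])
```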
The amplifiers are followed by a programmable high-pass/low-pass filter combination implementing a band-pass behavior. The maximum adjustable cut-off frequencies are mainly dependent on the sampling frequency of the following analog-to-digital converter. However, the device offers the additional option of bypassing the integrated high-pass filter. As a result, the lower cutoff frequency can be reduced to 300 kHz.
After filtering, the received signals are digitized. The analog-to-digital converter is operated at a sampling rate of 80 MSPS with a resolution of 12 bits. The digitized data of the 16 channels are transmitted via LVDS lines to the Spartan-6 FPGA, parallelized for internal preprocessing, extended to a 16-bit data type, sorted and then buffered in 2 × 4 Gbit DDR3 RAMs (32 million samples per channel). Depending on the sampling rate requirements, the data can be accumulated by a factor of two or four, which reduces the resulting sampling rate while increasing the signal-to-noise ratio. After that, the data of all the front ends are transferred to the main board, where they are sorted again and then transferred via the PCIe interface to the PC using direct memory access (DMA).
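The buffer size quoted above, and the reason buffering is needed at all, can be reproduced with a short calculation. This is a back-of-the-envelope sketch: it assumes that 4 Gbit means 4 × 2^30 bit and that the memory is shared equally by the 16 channels of a front-end board; the function names are hypothetical.

```python
DDR3_BITS = 2 * 4 * 2**30     # two 4 Gbit devices per front-end board (binary Gbit assumed)
CHANNELS_PER_BOARD = 16
CHANNELS_PER_SUBSYSTEM = 256
SAMPLE_BITS = 16              # 12-bit samples extended to a 16-bit data type
SAMPLE_RATE_HZ = 80_000_000
PCIE_RATE_BIT_S = 25e9        # per sub-system, from the text


def samples_per_channel():
    """Number of 16-bit samples each channel can buffer."""
    return DDR3_BITS // (CHANNELS_PER_BOARD * SAMPLE_BITS)


def record_seconds(accumulation=1):
    """Continuous recording time until the buffer is full."""
    return samples_per_channel() * accumulation / SAMPLE_RATE_HZ


def raw_rate_bit_s():
    """Raw data rate of one 256-channel sub-system during reception."""
    return CHANNELS_PER_SUBSYSTEM * SAMPLE_RATE_HZ * SAMPLE_BITS


if __name__ == "__main__":
    print(f"{samples_per_channel() / 2**20:.0f} Mi samples per channel")
    print(f"{record_seconds():.3f} s of continuous reception at 80 MSPS")
    print(f"raw rate {raw_rate_bit_s() / 1e9:.2f} Gbit/s "
          f"vs. PCIe {PCIE_RATE_BIT_S / 1e9:.0f} Gbit/s")
```

Under these assumptions the raw receive data rate of a sub-system exceeds the PCIe transfer rate by roughly a factor of 13, which is why the data are first buffered on the front-end boards and then transferred as a block.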
2.3.3. Power Supply
A modular power supply, designed for a 12 V input source, guarantees a sufficient supply for the individual components of each sub-system. The power board integrates four different modules to generate all the voltages required for ongoing acquisition operation. In total, the power supplies of each of the four sub-units provide up to 250 W of power. In addition, there are modules that serve as a basic supply, providing only low currents, but are immediately available after the system is switched on (12 V at max. 2 A and 3.3 V at max. 3 A). A Lattice PLD and an Atmel microcontroller are responsible for control and voltage monitoring, especially during the power-up procedure. The individual voltage supply modules use voltage regulators from the manufacturer Texas Instruments with programmable properties. Thus, up to four different voltages can be generated per module. On two modules for the main supply of the ultrasound platform, the so-called LV modules (LV = low voltage), three different voltages are provided. The first module generates the voltages 1.2 V, 1.5 V and 2.5 V, the second one the voltages 1.0 V, 1.8 V and 3.3 V. All voltages are rated at 40 A each. However, 1.2 V and 1.8 V are double dimensioned due to higher loads. Other modules are responsible for supplying the integrated PC or for providing the positive and negative transmit voltages of +/−100 V.
2.3.4. Transducer Connector Board
In addition to some control lines and status LEDs, this PCB integrates the socket for connecting the ultrasound transducer. The connector chosen is the ITT Cannon ZIF DLM6-360, which is widely used in medical technology and is specially designed for the high pin count of ultrasound arrays and for the high transmit voltages of up to +/−100 V. Transducers with other pin-outs can easily be connected to the system via adapters.
2.3.5. Synchronization Scheme
A clock network makes it possible to combine several systems in synchronous operation. Each sub-system is connected to the PC via its own PCIe interface (Figure 6).
In this way, all 1024 elements of the matrix array can be addressed individually and in parallel without using a multiplexer. The synchronization of the four sub-systems is achieved by a separate synchronization circuit. Here, one system acts as the master and generates the 20 MHz master clock with which all systems are operated. In addition, the Virtex-6 FPGA of the master system generates a coded 1 MHz synchronization signal, which also serves as a trigger for an ultrasound measurement scan. The master clock and synchronization signal are passed on to the individual sub-systems via a distribution circuit. In contrast to Figure 3 in Section 2.3.1, the on-board generated clock is no longer used to drive the PLLs. Instead, the external clock, which is distributed to the individual systems, is applied via the CLK IN input by selecting it with a switch. The master system itself also receives the two signals again as input to ensure phase equality with the other sub-systems in the sequence control. Other research groups, such as the one behind the ULA-OP platform, are pursuing similar approaches [18]. While our method is based on synchronous distribution of clock and trigger signals with equal signal run lengths, ULA-OP’s synchronization circuit relies on PLLs with a programmable phase shift and on a trigger delay compensation to realize zero delay of the timing signals between the master and slave systems.
2.4. Software
The software architecture that controls the electronics and processes the measured data is based on our latest generation of DiPhAS software tools written in C# and C++. All user interface components, file handling and general logic were developed in C#; the hardware device control via the PCI Express hardware interface and the GPU-based processing were developed in C++.
The hardware operation mode is configured by the software with a set of parameters that are downloaded to the firmware before measurements are performed continuously. To minimize motion artefacts caused by tissue or material movement, which lead to incoherent summation over subsequent pulse-echo measurements, sequences comprising multiple transmit-receive events can be grouped and measured at the fastest possible pulse repetition rate. The digitized data of each receive phase are collected in the front-end memory and can be transferred as one large block after the sequence is completed. This is crucial for ultrafast ultrasound imaging.
Even though the system consists of four synchronized electronics units, the software is implemented to control a “synchronized multi DiPhAS” unit that transparently handles the programming of all 1024 channels for the user. The software internally manages which delays to send to which system and combines the measurement data stored in multiple DMA memory blocks (Figure 7).
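The bookkeeping behind such a transparent multi-unit interface can be sketched as a mapping from a global channel index to a sub-system, front-end board and local channel. The contiguous numbering used here and all function names are assumptions for illustration; the actual assignment depends on the transducer pin-out and is not given in the text.

```python
N_SUBSYSTEMS = 4
CHANNELS_PER_SUBSYSTEM = 256
CHANNELS_PER_BOARD = 16


def locate_channel(global_channel):
    """Map a global channel index (0..1023) to (sub-system, board, channel)."""
    if not 0 <= global_channel < N_SUBSYSTEMS * CHANNELS_PER_SUBSYSTEM:
        raise ValueError("channel index out of range")
    subsystem, local = divmod(global_channel, CHANNELS_PER_SUBSYSTEM)
    board, channel = divmod(local, CHANNELS_PER_BOARD)
    return subsystem, board, channel


def split_delays(delays):
    """Split 1024 per-channel transmit delays into one list per sub-system."""
    if len(delays) != N_SUBSYSTEMS * CHANNELS_PER_SUBSYSTEM:
        raise ValueError("expected one delay per channel")
    return [
        delays[i * CHANNELS_PER_SUBSYSTEM:(i + 1) * CHANNELS_PER_SUBSYSTEM]
        for i in range(N_SUBSYSTEMS)
    ]


if __name__ == "__main__":
    print(locate_channel(300))
    blocks = split_delays([0.0] * 1024)
    print([len(b) for b in blocks])
```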
The ultrasound electronics firmware does not perform any preprocessing or beamforming reconstruction on the digitized data. The data are transferred to the PC for all processing, in the sense of a fully “software-defined” ultrasound system. This approach provides the flexibility needed to develop and test new beamforming and other signal-processing techniques in interchangeable software libraries. Depending on the target application and transducer used, this versatile research unit can easily be used by multiple researchers without requiring custom firmware changes to adapt the internal processing pipeline in between.
As the system is designed to be used for real-time measurements and imaging, the acquisition of new digitized ultrafast sequence data and the processing of previously transferred data are performed in parallel. Depending on the real-time processing used, the data acquisition may need to wait before performing the next measurements. This ensures a continuous streaming operation mode without a recording limit while optimizing the resulting imaging frame rate.
For high-performance computation, we developed OpenCL-based kernels for signal conditioning, receive beamforming, signal post-processing and scan conversion. The implemented beamforming reconstruction includes classical delay-and-sum techniques and adaptive beamforming techniques, such as sign coherence beamforming [19] as well as standard deviation calculations [20]. Apart from these computationally fast adaptive beamforming implementations intended for real-time use, we enabled more complex approaches that cannot be computed online to be reconstructed after the measurements have been stored to disk.
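To make the delay-and-sum principle and the sign coherence weighting concrete, the following pure-Python sketch reconstructs a single image point from single-element channel data. It is not the OpenCL implementation described above. It assumes a plane wave emitted at t = 0 from the array plane z = 0 along +z, nearest-sample delays without interpolation, and the commonly used definition of the sign coherence factor, |1 − sqrt(1 − m²)|^p with m the mean of the sample signs; all names are hypothetical.

```python
import math


def sample_index(element_xy, point_xyz, fs_hz, c_m_s):
    """Index of the sample on one element that holds the echo from a point.

    Assumes a plane wave emitted at t = 0 from the plane z = 0 along +z.
    """
    ex, ey = element_xy
    px, py, pz = point_xyz
    t_transmit = pz / c_m_s
    t_receive = math.sqrt((px - ex) ** 2 + (py - ey) ** 2 + pz ** 2) / c_m_s
    return round((t_transmit + t_receive) * fs_hz)


def sign_coherence_factor(samples, p=1.0):
    """Sign coherence factor of the delay-aligned samples (zero counts as +)."""
    signs = [1.0 if s >= 0.0 else -1.0 for s in samples]
    mean_sign = sum(signs) / len(signs)
    sigma = math.sqrt(max(0.0, 1.0 - mean_sign ** 2))
    return abs(1.0 - sigma) ** p


def delay_and_sum(rf, elements_xy, point_xyz, fs_hz, c_m_s):
    """Return (DAS value, sign coherence factor) for one image point.

    rf[ch][n] is sample n of channel ch; samples outside the record are
    treated as zero.
    """
    aligned = []
    for trace, element in zip(rf, elements_xy):
        idx = sample_index(element, point_xyz, fs_hz, c_m_s)
        aligned.append(trace[idx] if 0 <= idx < len(trace) else 0.0)
    return sum(aligned), sign_coherence_factor(aligned)
```

Multiplying the delay-and-sum value by the coherence factor suppresses image points where the aligned channel samples disagree in sign, which is the basic idea of this class of adaptive beamformers.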
An open-source file format called “open research binary (ORB)” was developed and implemented for storing high-frequency ultrasound data from research devices. This is a modular and extendable file format, including all device-specific hardware parameters, transmit parameters for all sequences and the received single-element raw channel data. If any processing, including the receive beamforming reconstruction or volumetric scan conversion for image generation, is applied to the measured data during real-time operation, these data are also added to the file container to be saved. Based on existing import routines for C#, C++, MATLAB and Python, the processing pipeline can be applied again to the stored data sets, and optimized processing and analysis steps can later be performed with new implementations. In the future, we will also include support for the “Ultrasound File Format” proposed by ultrasound beamforming researchers in 2018 [21].
The measured data are visualized in real time on the user interface. Plots show the A-Scans (received amplitude over time) of individual transducer element receive signals or of a single reconstructed beam after beamforming. The unreconstructed single-element channel data can be viewed as a wide B-Scan showing the brightness of the received echoes as a 2D image. After spatial reconstruction with the selected beamforming, the resulting volume data are visualized using a custom orthogonal slice display and direct volume rendering techniques. The orthogonal slice display shows three images that correspond to single slices of the volume in the XZ, YZ and XY orientation. These views are linked and can be navigated by the user to browse through the data set. Furthermore, a volumetric rendering of the whole data set is computed on the GPU using direct volume rendering techniques programmed in OpenCL. Even with ultrasound-specific visualization parameters, such as echogenicity-to-density mapping of the voxel data and multiple channels for overlay data (flow or stiffness) on top of the B-Mode grey levels, this rendering technique still runs in real time at more than 60 fps.
A cine loop buffer continuously stores incoming and processed ultrasound data during normal operation in the PC system RAM. As soon as the user pauses the acquisition process, the data from the cine loop can be played back and reprocessed by the processing pipeline.
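A cine loop of this kind is essentially a ring buffer that always keeps the most recent frames. The sketch below shows the idea with a bounded deque; the class name, the capacity and the callback interface are assumptions and do not describe the actual software.

```python
from collections import deque


class CineLoop:
    """Keeps the most recent frames; the oldest are dropped automatically."""

    def __init__(self, capacity):
        self._frames = deque(maxlen=capacity)

    def push(self, frame):
        """Store a newly acquired frame during live operation."""
        self._frames.append(frame)

    def replay(self, process=lambda frame: frame):
        """Run the buffered frames, oldest first, through a processing step."""
        return [process(frame) for frame in self._frames]


if __name__ == "__main__":
    loop = CineLoop(capacity=3)
    for frame_id in range(5):
        loop.push(frame_id)
    print(loop.replay())
```

Because the buffered frames are raw or processed data held in RAM, passing a different processing function to the replay step corresponds to reprocessing the paused acquisition with changed settings.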
The ultrasound device operating software includes these tools for live operation, including volume reconstruction and interactive imaging for predefined beamforming techniques, such as plane wave imaging and diverging waves with virtual point sources, as well as additional software tools for individual beamforming measurements using custom precomputed transmit delay sets and storage of the received single-element channel data. On top of these standard components, the unit can be programmed using software development kits for the programming languages C#, C++, MATLAB and Python. As our main ultrasound system software core was developed in Microsoft .NET, the C# SDK uses these components directly. In the same way, the integrations in MATLAB and Python (IronPython) use our libraries directly through their own integrated Common Language Infrastructure (CLI) support. The frequently used C++ SDK was implemented using a C++/CLI library that wraps the managed .NET objects for the native C++ environment. Both the MATLAB and C++ interfaces encapsulate the integration and usage of the .NET components without exposing these language-specific elements to the user.
2.5. System Integration
One goal of the development was to integrate the 1024-channel system into a standard 19-inch rack case (Figure 8) that can be used as a single ultrasound system, even for clinical studies after normative testing regarding electrical safety and electromagnetic compatibility. The mobile rack system on wheels is equipped with telescopic levels so that the individual sub-systems and the PC can be easily accessed and maintained. Thanks to this mobile design, the system can be moved easily from one room to another. In total, the cart has a height of 800 mm.
The housing is divided into three levels. The PC with a height of 4U is integrated in the lower third. The upper two levels contain the ultrasound electronics with the master and the three slave sub-systems. All necessary connections between the individual sub-systems, such as the PCIe interfaces and the synchronization circuitry, are routed inside the case and do not have to be made externally. Only the interfaces relevant to the user are located at the front of the system. These include the four ITT Cannon DLM6-360 ports for the transducer, two USB ports for connecting external peripherals to the PC, such as external storage devices, and one trigger input and one trigger output. The power button is also located on the front of the system.
The PC installed in the system provides a total of 10 PCIe interfaces (Supermicro X10DRG-O+-CPU, X9DRG-O-PCIE) and integrates a dual Intel Xeon processor (2x E5-2697AV4, 32 cores). In addition to the four PCIe interfaces of the 256-channel ultrasound sub-systems, further PCIe interfaces are used to connect four additional Nvidia graphics cards, which are intended for further signal processing on the GPUs [22]. In total, the system has 128 Gbytes of DDR4 memory, which is used for the DMA transfer of the raw ultrasound data, allowing the user to access the single-element channel data of all 1024 channels in parallel.
The total power required by the four sub-systems and the integrated PC is provided by two AC/DC power supplies. A 1000 W power supply with an output voltage of +12 V (DC) supplies the ultrasound systems, while a 2000 W power supply provides the necessary power for the PC.