1. Introduction
Ultrasound imaging is a frequently used method for postoperative monitoring of the urinary bladder. Depending on the surgical context, different clinical conditions that need close monitoring can occur. Post-operative urinary retention (POUR) is a frequent problem for various reasons (e.g., intravesical blood clotting) that can lead to bladder overdistension and needs rapid detection and medical intervention. On the other hand, invasive procedures such as catheterization present significant discomfort for patients and can lead to infections or even trauma of the urinary tract. In contrast, ultrasound imaging is fully non-invasive and has already shown its potential for bladder monitoring [
1,
2,
3]. Accurate bladder volumes can be extracted from 3D ultrasound data; however, reliable qualitative information about potential bladder overdistension can already be derived from 2D B-mode (brightness mode) ultrasound images.
In order to efficiently prevent POUR and directly initiate therapeutic measures if an increased amount of urine or blood is detected in the urinary bladder, the bladder should be monitored at frequent intervals, which is not possible in a clinical environment where ultrasound investigations are mostly performed using standard sonography equipment based on hand-held probes. Accordingly, when defining a tool for ideal postoperative follow-up and the prevention of related complications in the 24–48 h period after surgery, different challenges and requirements arise. First, the system must be portable, such that the mobility of the patient is ensured. Second, the probe must be self-adhesive or pad-like (in contrast to hand-held probes that require the presence of a sonographer). Third, (image or signal) data must be automatically analyzed to retrieve diagnostic features that are relevant for the identification of a potential complication (e.g., a bladder volume above a defined threshold in the context of POUR monitoring or specific scattering properties as a result of blood clots in the bladder). In a research context, where the optimal signal and image processing still needs to be defined, this results in a need for RF or even better pre-beamformed channel data access. In particular, the third requirement allows the use of analysis methods beyond pure image-based segmentation and classification. We recently showed in other applications that machine learning approaches can be applied to raw radio-frequent ultrasound data prior to image formation for classification tasks with a high accuracy [
4]. Radio-frequent data with a high dynamic range (16-bit amplitude quantization) and ultrasonic wave phase information at high digitalization rates of up to 50 MHz contain a lot more informational content than scan-converted ultrasound images. During scan conversion, typically more than 90% of the raw ultrasound wave information is lost during image formation and cannot be used in image-based processing.
To the best of our knowledge, there are no systems available that fulfil the above defined requirements. The use of advanced classification approaches is not possible with classical clinical sonography systems, as they do not provide access to radio frequent ultrasound data. Ultrasound systems for research applications such as the Vantage Ultrasound System (Verasonics, Inc. Redmond, WA, USA), the ULA-OP [
5], the systems from the Technical University of Denmark [
6] or the DiPhAS by Fraunhofer IBMT (Sulzbach, Germany) [
7] provide access to this type of data, but are mostly not certified for clinical use, and more importantly, they are complex, bulky and costly devices. The latest generation of point of care ultrasound (POCUS) devices, such as the Butterfly iQ or Vscan [
8] by GE has decreased the costs by an order of magnitude when compared to high-end sonography machines, and can be used in bedside settings due to their miniaturization. However, the availability of care staff still represents a limiting factor when it comes to frequent monitoring postoperatively. Finally, dedicated devices for bladder monitoring such as DFree (Triple W, Tokyo, Japan) or SENS-U [
9] (Novioscan, Nijmegen, The Netherlands) have a particular focus on incontinence management. These systems directly generate bladder filling level-related parameters and do not provide access to the underlying ultrasound signals. Other ultrasound systems optimized for urological applications such as BladderScan (Verathon, WA, USA) measure the bladder volume, but are based on hand-held probes, which limits their suitability for continuous monitoring.
In summary, all these devices optimized for the daily clinical routine (or for home-care settings in the case of DFree or SENS-U) are difficult to utilize in research applications, where customized signal and image processing algorithms need to be applied to the data. In particular, machine-learning based approaches have been shown to have tremendous potential for automated segmentation of ultrasound data [
10], and have been reported in particular for breast imaging [
11], coronary arteries [
12], and thyroid [
13] or different tumors [
14]. In comparison to these applications, where the anatomy is more complex and the contrast difference is reduced, bladder segmentation represents an ideal use case for ML-based approaches due to the low echogenity and the resulting high contrast to surrounding tissue. Multiple ultrasound imaging devices, including mobile ones like Butterfly iQ+ (Butterfly Network Inc, Guilford, CT, USA), already include automated bladder segmentation and volume estimation, but the shape of the hand-held transducer does not allow long-time monitoring as a wearable.
In light of the somewhat contrary requirements of an ideal urinary bladder monitoring system that also provides full data access, and thereby can be flexibly used in research applications, we developed a new portable ultrasound system (mobile ultrasound equipment—MoUsE). Despite being validated in this first application in the context of bladder monitoring, the MoUsE can also be used as a general-purpose ultrasound research system since full access to the transmit and receive pipeline is provided.
2. Materials and Methods
2.1. Portable Multichannel Electronics with Research Interface
The MoUsE is a compact ultrasound system integrated into a 3D printed housing (
Figure 1) with dimensions of 184 mm × 123 mm × 33 mm and a total weight of 610 g, thus ensuring its portability. It is driven by a 12V medical power supply which can be replaced by lithium-ion battery packs for future fully mobile applications. Detailed specifications are given in
Table 1. All system functionalities, including generation of transmit signals, amplification and digitization of receive signals, storage and communication (via USB 3.0) to a PC/tablet controlling the device are implemented on the same main printed circuit board (PCB). Data management, communication and sequence control are handled in the integrated ZYNQ-7 FPGA. An on-board low voltage (LV) power supply generates the required power levels for the logic components.
A compact high voltage (HV) power supply that generates the ± 100 V of transmit voltage for each of the octal (8-channel) transmit receive ICs was implemented on a second PCB mounted on the main PCB. A frequency range of 100 kHz–10 MHz was defined as the transmit bandwidth.
In principle, transmit signals can be freely defined within the limits of the tri-state programmable ICs, for instance, using pulse width modulation (PWM); however, only rectangular bursts with adjustable length and frequency have been implemented in the software so far. The internal system clock of 160 MHz is used for the definition of the transmit signals. Receive signals are digitized with up to 50 MSa/s with a resolution of 12 bit and are transferred as pre-beamformed channel data via USB 3.0 to a PC/tablet for image reconstruction. The receive data can be amplified by up to 44.3 dB with different linear or customized TGC settings. No analog preprocessing is performed on-board beyond bandpass filtering and (optional) data accumulation (corresponding to averaging) for improvement of the signal to noise ratio (SNR). Interfaces for wireless (IEEE 802.11 b/g/n (1 × 1)) communication and the transfer of pre-beamformed channel data are foreseen in the hardware design but not yet implemented. The system uses a sleep mode to switch off the transceiver ICs for stand-by between long-term measurements to reduce power consumption.
2.2. Transducer Design and Manufacturing
The MoUsE can be driven with all kinds of 32-element transducers using the given pinout or via transducer connection adapter. However, in the context of the first application being used for automated bladder monitoring, a 32-element phased array transducer was developed. The transducer properties were defined in a sound field simulation study using the in-house developed sound field simulation software tool SCALP based on point source synthesis (
Figure 2). A pitch of 500 µm with a kerf of 50 µm and element sub-dicing were chosen as a compromise between sensitivity (profiting from larger element size) and beam steering capabilities (decreasing with larger element size). To improve the elevational resolution, a focusing silicon lens was applied to the element of elevational size of 11.5 mm. The array was manufactured from a soft PZT material (3203 HD), the center frequency was adjusted to 3 MHz and two matching layers were applied for improved bandwidth. Connection to the MoUsE electronics was achieved by two 16-core micro-coax cables directly soldered to the customized connector PCB, which was preferred over a solution involving a commercial connector for the sake of compactness. The acoustic block was finally integrated into a 3D printed cylindrical housing of 40 mm in diameter and a height of 17.5 mm. For long-term monitoring applications, a fixation concept involving an acoustically transparent adhesive tape could optionally be used.
2.3. Beamforming and Software
Image reconstruction is performed in real-time using a GPU (OpenCL, Khronos Group, Beaverton, OR, USA)-based implementation of plane wave compounding [
15] approach in the in-house developed clinical style user interface USPilot (
Figure 3). Other reconstruction methods can easily be implemented via an SDK. The number of plane wave angles, as well as the increment can be freely selected by the user. Other transmit parameters such as the frequency, the burst count or the voltage can be adjusted as well. On the receive side, the data sampling rate, averaging factor and TGC can be selected.
The reconstruction can be adjusted in terms of the size and resolution of the reconstruction grid (lateral and axial pixel/sample count), the speed of sound and apodization. Furthermore, customized algorithms (e.g., bandpass filtering or alternative beamforming approaches) can be inserted into the (real-time) reconstruction pipeline. The software allows the visualization of reconstructed (compounded) B-scan images as well as the pre-beamformed channel data (in time or frequency) domain, which makes it ideal not only for clinical research, but also for educational purposes or research on reconstruction algorithms. In addition to controlling the system via the USPilot, an open programming interface (C#/C++/Matlab with SDK) is made available, which provides access to the same transmit, receive and beamforming parameters as in the case of the UI. A custom but open binary data format (*.orb) is chosen for storage of the pre-beamformed and reconstructed ultrasound data. Meta-data such as transmit and receive parameters are stored with the actual ultrasound data by default and import tools for Matlab/Python/C/C++ are made available.
2.4. ML-Based Segmentation Algorithm
We trained a neural network to segment the bladder into abdominal ultrasound images and encountered two main challenges when implementing the network. On the one hand, the limited space and computational resources available at inference time and on the other hand, the quality of abdominal ultrasound images can be very challenging.
Figure 4 shows an example: in the left sub-figure, a (partially filled) bladder appears mainly as a dark region in the image since little sound is reflected by the fluid. In addition, the bladder is only partially imaged and merges seamlessly into the black area outside the ultrasound fan. This situation is usually the case in corpulent patients. As can be seen in the upper part of the segment, weak echoes might occur in cases where the side lobes of the ultrasound beam intersect with the bladder tissue. A very different situation is shown in the middle sub-figure. Here, the (almost empty) bladder is located in the middle of the ultrasound fan. Lastly, the right sub-figure depicts a situation where other anatomical structures, e.g., the pubic bone or the colon, generate a large dark region that might fuse with the bladder. Please note that the appearance of different anatomical parts can be very similar in the images.
We started development using a Mask-R-CNN architecture [
16,
17] to segment the bladder. However, we found that the model size of approximately 0.5 GByte was way too large for the intended purpose. A second drawback was that the network tended to overfit to the data, since only very few images (253) were available for training. We therefore decided to use a U-Net architecture [
18] in a minimal configuration. We set the network up to compute a 2-class segmentation (bladder, non-bladder). The original ultrasound images (1056 × 720) were down-sampled to a resolution of 528 × 352 and reduced to a single color channel. In total, we acquired 253 data sets (each consisting of one B-mode image), which were acquired from 20 human volunteers as training data for the CNN. For both the contracting and expanding paths, we used 5 successive blocks. We started with 6 channels for the first layer and doubled the channel number with each successive layer, resulting in a total of 96 channels at the bottleneck. For expansion we used up-sampling followed by convolution. Training was performed using a batch size of 4 with a learning rate of 0.00002. Using these parameters, the network converged within 400 epochs. The resulting network was less prone to overfitting than the original attempt. We found however that the amount of data was still too low. More importantly, we found that the network had issues in detecting the virtual border of a scan in the image. In particular, if the ultrasound response for a partially imaged bladder was very weak, the resulting segment often extended into the illegal region of the image, i.e., outside the ultrasound fan. Additionally, we often found cases where other dark regions were segmented as bladder.
To this end, we extended the dataset by re-sampling the images so that one of the fan sides coincided with one of the image borders. Furthermore, we flipped images and ground truth on the vertical as well as the horizontal axis. This way we increased the number of images by a factor of 20. Training the network using the augmented data effectively prevented overfitting. Furthermore, and probably much more importantly, the network learned how artifacts and the bladder differ.
For the trained network, we computed an IoU above 0.75 but below 0.9 for all images. We checked segmentations and ground truth and found, interestingly, that the computed segmentations were consistently tighter (smaller) than the ground truth provided by medical experts. The ground truth segmentation was performed by one experienced urologist using the VIA annotation tool [
19]. A second experienced urologist performed the validation of the ground truth. Consulting with the experts revealed that the network only segmented the interior of the bladder while the experts partially included the bladder tissue. This unintended result proved to be beneficial for the application at hand. Since we want to estimate the bladder volume, including the tissue would lead to a systematic error that, in particular, depends on the volume itself.
We are working on a further reduction in the network size. The original network size was 340 MBytes. We were able to reduce its size with various pruning strategies [
20] to under 300 MBytes without significantly sacrificing the quality of the results. This size is still too large to be run efficiently on a mobile device. Additionally, the inference times need to be decreased significantly. Currently, the network does inference on the target device at approximately 6.4 s per frame. Although this would be more than sufficient for a regular check of the bladder volume, the system would not be able to perform any other tasks in the meanwhile. In the use-case of regular checks of the bladder volume and content, such an inference frame rate might still be acceptable for long-time monitoring. The integration of such a model in the processing pipeline will be implemented by supporting the ONNX model format with the C# runtime using Microsoft ML.NET in the future.
4. Discussion
We developed a new portable low-cost ultrasound research system designed for continuous bladder imaging and characterized its (hard- and software) components in first phantom and proband experiments to assess its potential for later use in post-operative bladder monitoring. With dimensions of 18 × 12 × 3 cm3 and a weight of 610 g, the system is compact enough for applications where portability is required. The ultrasound probe was integrated into compact housing (diameter of 40 mm, height of 17.5 mm) and equipped with a self-adhesive foil, which allows long-term use without manual probe positioning. The system was designed, manufactured, assembled and tested in the ultrasound department of Fraunhofer IBMT. In the design process, the focus was set not only on the performance but also on cost efficiency and limiting the total material cost for the electronics to approximately €1000. The system was designed to be as flexible as possible, and therefore it provides full control to the transmit parameters and full access to the receive data pipeline, where receive and beamforming parameters can be selected and custom filters and reconstruction algorithms can be integrated into the real-time pipeline. Full data access to the receive pipeline and in particular real-time availability of the pre-beamformed channel data (up to 300 frames/s in our study) is not provided by clinical sonography systems and makes the system future-proof for other types of applications such as raw radio-frequent signal processing and ML modeling. On the other hand, most research systems are not certified for medical use. Accordingly, the combination of low-cost and the above-described flexibility makes the MoUsE system an ideal tool for research and educational purposes in ultrasound imaging. In order to ease the transfer of new ultrasound imaging approaches into clinics, the technical prerequisites such as data access must be provided and regulatory constraints must be respected as well. For this reason, we performed various tests according to safety standards for medical devices, such as electrical safety, electromagnetic compatibility and acoustic safety. Compliance to these standards was shown and the corresponding test protocols are available; this is of great value when seeking an ethics clearance for exploratory clinical studies.
In order to cover most of the clinical applications of diagnostic ultrasound, we chose a frequency range of 100 kHz–10 MHz as the target specification and validated the bandwidth in our study. The imaging performance of the MoUsE is mainly dependent on the transducer that is used. Our phased array probe with 32 elements represents a compromise between opening angle and sensitivity. A smaller pitch would have been preferred since a larger opening would have resulted, which is crucial for effective plane wave compounding. On the other hand, given the demonstrated image depth of more than 10 cm in the standard CIRS phantom and more than 15 cm in the human abdomen, imaging of the entire bladder would have been difficult to achieve with a smaller aperture size generating less acoustic energy output. The comparison of the image metrics obtained with different beamforming approaches underlines the potential of software-based reconstruction methods, and thereby, the need to have access to pre-beamformed channel data.
Having high-contrast image data is particularly needed when subsequent image processing steps are performed for automated analysis of the data, such as in our first application of bladder segmentation. We demonstrated the general functionality of our CNN for segmentation in abdominal ultrasound images. However, the analysis showed that a high contrast is crucial to prevent segmentation artifacts. This is underlined by the comparison with earlier work on the use of CNNs for bladder segmentation from ultrasound data [
26,
27], where a higher correlation between the automatically determined and the manually segmented bladder volumes was obtained. However, it should be mentioned, that the cited work was based on the use of high-end clinical ultrasound devices, which provide higher contrast, and two orthogonal B-mode images were acquired for obtaining quantitative values for the bladder volume [
26]. The assessment of the actual bladder volume can hardly be achieved with high accuracy using single cross-sectional B-mode images, and therefore it is beyond the scope of the presented work. However, the impact of the lower SNR when compared to ultrasound data acquired with high-end clinical ultrasound machines needs to be closely investigated, particularly since the bladder cross-sectional surface was systematically underestimated. This was due to clutter signals occurring at the bladder border that were recognized as background tissue by the CNN. On the other hand, background tissue was very reliably identified with very few “false positive” areas (background tissue falsely identified as bladder).
Despite the first proof-of-concept, further investigation is needed to enhance the performance of the overall approach. In particular, the network size needs to be improved in order to allow better use on mobile devices with limited computing capabilities. Since the analysis has shown the importance of SNR for the accurate segmentation and the potential of more sophisticated beamforming approaches for contrast improvement, the optimization of image CR and CNR will be the focus of our future work. Furthermore, we will investigate if training the algorithm with more diverse data (different, and in particular, lower contrast levels) will yield higher accuracy. In summary, in applications such as the monitoring of POUR, where a significant or even dramatic and thereby potentially harmful increase in bladder volume can occur, the proposed approach provides sufficient sensitivity. However, for applications where a precise quantitative assessment of the bladder volume is needed, further enhancement of the performance is needed and will be investigated using refined beamforming approaches and improved training of the CNN.
Finally, beyond this first study on bladder monitoring, we will seek to use MoUsE and its unique combination of device mobility, flexibility and data access in other medical ultrasound applications. Although the number of transmit/receive channels is currently limited to 32, a synchronization scheme that combines several MoUsE systems for a higher total channel count is currently under development. A wireless interface is already available on the hardware, but is not implemented in software, which is also a work in progress and would allow easier use in future mobile ultrasound applications. A battery-powered version of MoUsE is in development as well. The possibility of transferring existing classification tasks using machine learning on radio-frequent data in addition to the image-based approach will also be investigated.