Accelerating development of a single-molecule localisation and tracking technique

MathWorks Australia

By Maged F Serag*
Friday, 29 April, 2016


Pioneered more than 30 years ago, single-molecule localisation and tracking (SMLT) is a technique for characterising the motion of individual molecules. By measuring diffusion coefficients and characterising molecular motion as random, directed or constrained, scientists can investigate subcellular dynamics in live cells, including viral infection, gene transcription and the behaviour of receptors on cell surfaces.

Despite its relatively long history and many applications, SMLT has several drawbacks. It does not, for example, tell us the molecule’s shape and size or how these change over time. In addition, SMLT is inefficient and sometimes fails to work due to statistical errors resulting from out-of-focus motion of the molecules.

My research group at King Abdullah University of Science and Technology (KAUST) developed a method for measuring single-molecule diffusion that has none of these limitations. Rather than quantifying diffusion from the spatial and temporal components of the molecule’s trajectory, as in traditional SMLT, our MATLAB-based method quantifies diffusion by analysing the increase in the cumulative area (CA) occupied by the molecule in space over time (Figure 1). We validated our approach in MATLAB by comparing the statistical distribution of diffusion coefficients calculated by traditional SMLT techniques with the distribution calculated by the new CA method. The CA method measured the diffusion dynamics of the DNA molecules we tested more reproducibly than traditional SMLT.

Figure 1: Increase in the cumulative area occupied by fluorescent nanospheres over 0.48 s (top) and fluorescently dyed DNA over 0.4 s (bottom).

The core of our work (image processing, curve fitting and mathematical calculations on microscope images) is done in MATLAB. MATLAB offers three key advantages that make it a good fit for our research. First, it is easy to learn. Even though my background is in pharmacy, not programming, I mastered MATLAB well enough to conduct this research in just one month. It would have taken me six times longer to reach a similar level of mastery in a language like C++ or Java. Second, KAUST has a Total Academic Headcount (TAH) licence, which makes it easy for researchers across KAUST to access MATLAB and the large collection of capabilities and functions in its add-on toolboxes anywhere on campus. Third, the SMLT and CA methods are computationally intensive, requiring hundreds of thousands of Gaussian fittings for a single experiment. Parallel Computing Toolbox and MATLAB Distributed Computing Server enabled me to accelerate these methods and shorten processing times for multiple experiments from days to hours.

Creating image sequences of simulated particles, nanospheres and DNA molecules

Both SMLT and CA methods involve analysing a sequence of image frames, typically captured from a microscope, with one or several molecules visible in each frame. We applied the CA method to characterise the motion of particles and calculate diffusion coefficients in three separate scenarios. The first uses simulated data to create the sequence of images. The second and third use sequences of images obtained using a custom-built wide-field epifluorescence microscope in our lab.

We designed the first scenario to validate the CA method. In MATLAB, we generated random-walk trajectories of particles in 2D space using predetermined diffusion coefficients of 1, 1.5 and 2 µm²/s. For each step on the random walk, the x and y positions of a particle were used to define the centre of a five-pixel cross in a single frame in the image sequence (Figure 2). We then used the CA method to calculate the diffusion coefficient from the simulated particles and verified that the results (1.10, 1.51 and 1.98 µm²/s, respectively) were in agreement with our predetermined values.

Figure 2: Cumulative area for a simulated 2D diffusion trajectory. At 0 s, the simulated particle is in its original position, represented by a five-pixel cross.
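
The simulated scenario can be illustrated with a short sketch. It is written in Python rather than MATLAB and is not the authors' code. It assumes that each random-walk step is drawn from a Gaussian with per-axis variance 2·D·Δt (the standard model for free 2D diffusion), that the simulation uses the 6.4 ms frame interval quoted for the microscope and that one pixel spans 0.1 µm. The time step and pixel size are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random

def simulate_walk(d_um2_per_s, dt_s, n_steps, seed=0):
    """2D random walk; each axis step is Gaussian with variance 2*D*dt."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * d_um2_per_s * dt_s)  # per-axis step std in µm
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        path.append((x, y))
    return path

def cross_pixels(x_um, y_um, um_per_pixel, size=512):
    """Pixel coordinates of a five-pixel cross centred on the particle."""
    col = int(round(x_um / um_per_pixel)) + size // 2
    row = int(round(y_um / um_per_pixel)) + size // 2
    cross = [(row, col), (row - 1, col), (row + 1, col),
             (row, col - 1), (row, col + 1)]
    # Discard any arm of the cross that falls outside the frame.
    return {(r, c) for r, c in cross if 0 <= r < size and 0 <= c < size}

# Assumed values: D = 1.5 µm²/s, 6.4 ms per frame, 0.1 µm per pixel.
path = simulate_walk(d_um2_per_s=1.5, dt_s=0.0064, n_steps=1000)
frames = [cross_pixels(x, y, um_per_pixel=0.1) for x, y in path]
```

Each entry in `frames` is the set of lit pixels for one frame, which is a compact stand-in for a full 512 × 512 image.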

For the second and third scenarios, we tracked yellow fluorescent polymer nanospheres about 0.2 µm in diameter and double-stranded DNA molecules of different lengths and topological forms. We captured images of the nanospheres and molecules at a rate of 1 frame per 6.4 ms. We processed these images using both SMLT and CA methods.

Implementing the CA method

Working in MATLAB, we developed an algorithm to implement the CA method (Figure 3). Using the sequences of thousands of 512 × 512 pixel frames generated through simulation or captured in the lab, the algorithm first invokes Image Processing Toolbox functions to remove the background based on an initial threshold. The algorithm calculates this threshold by fitting the frequency distribution of the intensity of all pixels in the frame with a Gaussian function using Curve Fitting Toolbox.
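
As a rough illustration of the thresholding idea, the Python sketch below estimates a Gaussian for the pixel intensities of a frame and removes everything below a cut-off derived from it. It makes two assumptions that the article does not state: the Gaussian is estimated from the sample mean and standard deviation rather than by a least-squares fit to the histogram, and the threshold is set at the mean plus three standard deviations. The function names and the toy frame are hypothetical.

```python
import statistics

def gaussian_background_threshold(frame, k=3.0):
    """Return mean + k*sigma of all pixel intensities in the frame.

    The mean and sigma are moment estimates, a stand-in for a least-squares
    Gaussian fit to the intensity histogram; k = 3 is an assumed cut-off.
    """
    pixels = [value for row in frame for value in row]
    mu = statistics.fmean(pixels)
    sigma = statistics.pstdev(pixels)
    return mu + k * sigma

def remove_background(frame, threshold):
    """Zero every pixel at or below the threshold and keep the rest."""
    return [[v if v > threshold else 0 for v in row] for row in frame]

# Toy 5 x 5 frame: background near 10 counts with one bright pixel.
frame = [
    [10, 11, 9, 10, 10],
    [10, 10, 11, 9, 10],
    [9, 10, 90, 10, 11],
    [10, 11, 10, 10, 9],
    [11, 9, 10, 10, 10],
]
threshold = gaussian_background_threshold(frame)
cleaned = remove_background(frame, threshold)
```

In this toy frame only the bright pixel survives. In a real frame most pixels are background, so the estimate is dominated by the background distribution.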

After removing noise pixels from the frame, the algorithm gradually increases the background threshold until just five pixels remain, defining the area of the space occupied by the molecule in that frame.
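
The threshold-raising step could look something like the following Python sketch. The step size, the starting threshold and the toy frame are assumptions, and `isolate_molecule` is a hypothetical name, not a function from the authors' code.

```python
def isolate_molecule(frame, start_threshold, step=1, target=5):
    """Raise the threshold until at most `target` pixels remain above it.

    Returns the surviving pixel coordinates and the final threshold.
    """
    threshold = start_threshold
    while True:
        kept = {(r, c)
                for r, row in enumerate(frame)
                for c, value in enumerate(row)
                if value > threshold}
        if len(kept) <= target:
            return kept, threshold
        threshold += step

# Toy frame with a bright five-pixel spot on a dim background.
frame = [
    [3, 4, 3, 2, 3],
    [4, 6, 20, 5, 3],
    [3, 22, 40, 21, 4],
    [2, 5, 23, 6, 3],
    [3, 3, 4, 3, 2],
]
pixels, final_threshold = isolate_molecule(frame, start_threshold=4)
```

If several pixels share the same intensity, the loop can skip past five and return fewer pixels. A real implementation would need a rule for such ties.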

When all frames in the sequence have been processed, the algorithm superimposes them one after another to generate the cumulative area occupied by the molecule up to each frame. It then subtracts the cumulative areas of adjacent frames to find the cumulative area difference, which is used to calculate the diffusion coefficient.

Figure 3: Sequence of steps in the CA method, including background subtraction, noise removal, superimposition, calculation of cumulative area differences and calculation of the diffusion coefficient.
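
The superimposition and differencing steps can be sketched in Python as set operations on pixel coordinates. This is only an illustration with hypothetical names and toy frames. The article does not give the formula that converts the cumulative area differences into a diffusion coefficient, so that final step is left out.

```python
def cumulative_areas(frames):
    """Superimpose frames in order; return the occupied pixel count after each."""
    occupied = set()
    areas = []
    for pixels in frames:
        occupied |= pixels          # superimpose this frame on all earlier ones
        areas.append(len(occupied))
    return areas

def area_differences(areas):
    """Difference in cumulative area between each pair of adjacent frames."""
    return [later - earlier for earlier, later in zip(areas, areas[1:])]

def cross(row, col):
    """Five-pixel cross centred on (row, col)."""
    return {(row, col), (row - 1, col), (row + 1, col),
            (row, col - 1), (row, col + 1)}

# Four toy frames: the molecule moves, pauses, then moves again.
frames = [cross(10, 10), cross(10, 11), cross(10, 11), cross(11, 12)]
areas = cumulative_areas(frames)   # [5, 8, 8, 11]
diffs = area_differences(areas)    # [3, 0, 3]
```

A frame in which the molecule does not move adds no new pixels, so its area difference is zero.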

Accelerating the process with parallel and distributed computing

With a single experiment requiring about 200,000 Gaussian fittings, we soon found that running experiments on a single processor took too long to be practical. To shorten processing times we used Parallel Computing Toolbox to perform the computations on a workstation with multiple cores. Using four cores, experiments took about 3 h; with 16 cores, they took 45 to 50 min.

Of course, we often need to run many simulations and experiments to obtain valid statistical results. To further accelerate the process we began running our jobs on 512 cores at a time on the IT Research Computing clusters at KAUST with MATLAB Distributed Computing Server. These clusters offer more than 10,000 cores to users. Using this set-up we can complete a set of experiments that took 24 h on a multicore machine in just 15 min.
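
The timings quoted above imply the following speed-ups, worked through in a few lines of Python. The calculation uses only the figures in the text and treats "about 3 h" as exactly 180 min. The number of cores in the multicore machine behind the 24 h figure is not stated, so no parallel efficiency is computed for the cluster.

```python
def speedup(time_before_min, time_after_min):
    """Ratio of run times, both in minutes."""
    return time_before_min / time_after_min

four_core_min = 3 * 60            # about 3 h on four cores
sixteen_core_min = (45, 50)       # 45 to 50 min on 16 cores

# Going from 4 to 16 cores is a fourfold increase in core count.
workstation_speedups = [speedup(four_core_min, t) for t in sixteen_core_min]
parallel_efficiency = [s / (16 / 4) for s in workstation_speedups]

# 24 h on a multicore machine versus 15 min on 512 cluster cores.
cluster_speedup = speedup(24 * 60, 15)

print(workstation_speedups)  # [4.0, 3.6]
print(parallel_efficiency)   # [1.0, 0.9]
print(cluster_speedup)       # 96.0
```

In other words, the workstation scaled almost linearly from four to 16 cores, and the cluster set-up completed the experiment set roughly 96 times faster than the multicore machine.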

Visualising and interpreting results

We are currently interpreting the results of our simulations and experiments. With MATLAB we visualise experimental results to better understand how the CA method is performing compared with SMLT.

To enable a comparison of the CA method with traditional SMLT on the same experimental data, we implemented SMLT in MATLAB. Our SMLT algorithm applies 2D Gaussian fittings over the pixels in each frame to determine the position of the molecule’s centre of mass. After repeating this process for each frame, the algorithm connects the centres of mass across frames to create a trajectory and then performs mean squared displacement analysis of the trajectories to characterise the molecule’s motion (Figure 4).

Figure 4: The random movement of a single fluorescent nanosphere in solution. The particle’s movement is tracked using SMLT and MATLAB.
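
The mean squared displacement (MSD) step can be sketched in Python as follows. The sketch assumes that the trajectory has already been localised and that motion is free diffusion in 2D, for which MSD(t) = 4·D·t. The function names are hypothetical and the localisation by 2D Gaussian fitting is not shown.

```python
def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD for lags 1..max_lag; track is a list of (x, y).

    max_lag must be smaller than the number of points in the track.
    """
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [(x2 - x1) ** 2 + (y2 - y1) ** 2
                   for (x1, y1), (x2, y2) in zip(track, track[lag:])]
        msd.append(sum(squared) / len(squared))
    return msd

def diffusion_coefficient_2d(msd, dt):
    """Slope of a through-origin least-squares line of MSD vs time, over 4."""
    times = [dt * (i + 1) for i in range(len(msd))]
    slope = sum(t * m for t, m in zip(times, msd)) / sum(t * t for t in times)
    return slope / 4.0

# Directed motion: one unit per frame along x gives a quadratic MSD.
directed = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(mean_squared_displacement(directed, max_lag=3))   # [1.0, 4.0, 9.0]

# A linear MSD of 4, 8, 12 at lags of 1, 2, 3 s corresponds to D = 1.
print(diffusion_coefficient_2d([4.0, 8.0, 12.0], dt=1.0))  # 1.0
```

The shape of the MSD curve is what separates the motion types: it grows linearly for random motion, faster than linearly for directed motion and levels off for constrained motion.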

We are using dynamic time warping (DTW) techniques implemented in MATLAB to measure similarities and differences between the SMLT and CA method results. Early results suggest that the CA method has a smaller statistical error, in addition to the ability to provide scientists with information on molecular size and frequency of conformational changes.
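
For readers unfamiliar with DTW, the Python sketch below shows the classic dynamic-programming form of the distance between two numeric series. The article does not say which series are compared or which DTW variant is used, so the inputs here are hypothetical and the absolute difference is an assumed local cost.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric series."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# The same shape stretched in time has zero DTW distance.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
# A constant offset cannot be warped away.
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

Because DTW allows one series to be stretched relative to the other, it tolerates timing differences between two results while still penalising differences in value.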

*********************************************************

Running MATLAB on research computing clusters

By Dr Matthijs van Waveren^

MATLAB Distributed Computing Server enables researchers at KAUST to run their computationally intensive MATLAB programs on the computer clusters maintained and managed by the university’s IT Research Computing group.

To make it easier for researchers to use the clusters, our group worked with MathWorks consultants to develop a high-performance computing (HPC) add-on for MATLAB. Researchers can use this add-on from within the MATLAB environment to execute their scripts on hundreds of workers. The add-on takes care of transferring data files and scripts to the cluster, running the jobs and then transferring the results back to the researcher’s MATLAB environment.

The HPC add-on made it easier for researchers to use clusters for their MATLAB jobs. As a result, demand for cluster time increased dramatically. To meet this demand, we built a virtual cluster using OpenStack and a set of Linux workstations. We then updated the HPC add-on so that users could run their jobs either on one of the original clusters or on the new virtual cluster. While not as fast as the original clusters, the virtual cluster is available for researchers who do not want to wait for their jobs to be scheduled on the original clusters during periods of high demand.

*Dr Maged Serag is a postdoctoral researcher at King Abdullah University of Science and Technology (KAUST). He is working on new single-molecule fluorescence imaging techniques for visualising diffusive and conformational dynamics of DNA simultaneously. Dr Serag holds a PhD in chemistry and biotechnology from Nagoya University in Japan.

^Dr Matthijs van Waveren is a research applications specialist at KAUST with more than 20 years of experience in IT as a software engineer, researcher, supercomputer consultant and marketing coordinator. He wishes to thank Raymond Norris and Amine El Helou of MathWorks for assistance in developing the HPC add-on.
