Calculate Mean Squared Displacement

A particle’s mean squared displacement (MSD) tells you the squared distance it moves over time. By analyzing the shape of the MSD curve, we can gain mechanistic insight. The general form of the MSD curve is:

\[\langle x^2(t) \rangle = K_{\alpha}t^{\alpha}\]

where \(K_{\alpha}\) is the generalized diffusion coefficient. In cases were \(\alpha=1\), the MSD is linear and \(K_{\alpha}\) is equal to the standard diffusion coefficient measured for particles undergoing Brownian motion. Albert Einstein was the first to prove that the MSD of a Brownian particle is linear and that its slope is proportional to the diffusion coefficient \(D\):

\[\frac{d MSD}{dt} \propto 2NDt\]

where \(N\) is the number of spatial dimensions over which the MSD was measured. When \(\alpha > 1\) diffusion occurs faster than expected and is termed superdiffusion. When \(\alpha < 1\), diffusion is slower than expected and called subdiffusion.

There are two ways one can calculate MSD. The first and simplest way is the ensemble MSD:

\[\langle x^2(t) \rangle = \langle x(t) - x(0) \rangle^2\]

The ensemble MSD measures the distance traveled from a particle’s initial position. The second way measures and averages the particle’s displacement over all possible time lags, \(\tau\), and is called the time-averaged MSD:

\[\overline{x^2(\tau)} = \dfrac{1}{T - \tau}\int_{0}^{T - \tau} (x(t + \tau) - z(t))^2 dt\]

where \(T\) is the length of the trajectory. The time-averaged MSD is more statistically robust than the ensemble average. For ergodic systems, both methods of calculation will yield the same result. However, with less data, the time-averaged MSD will give tighter error bars.

msd.py can calculate both types of MSDs for a particle. If you believe the particle undergoes Brownian motion, then you can also measure the diffusion constant. The script can be run from the command line or its classes can be imported for use in other analysis scripts. Both usages are explained below.

Command Line Interface

Calculate mean squared displacement (MSD) and diffusion coefficientfor a specific residue or set of atoms in a trajectory

usage: msd.py [-h] [-t TRAJECTORY] [-g GRO] [-r RESIDUE]
              [-atoms ATOMS [ATOMS ...]] [-b NBOOT] [-f FRONTFRAC]
              [-F FRACSHOW] [-a AXIS] [-nboot NBOOT] [-ensemble] [-compare]
              [-power_law] [-tails] [-pores] [-pr PORE_RADIUS] [-acf] [-acov]
              [-nofit] [--update] [-wt WT_WATER] [-s SAVENAME] [-begin BEGIN]
              [-end END]

Named Arguments

-t, --trajectory
 

Path to input file

Default: “wiggle.trr”

-g, --gro

Name of .gro coordinate file

Default: “wiggle.gro”

-r, --residue Name of residue whose diffusivity we want
-atoms Name of atoms whose collective diffusivity is desired
-b, --nboot

Number of bootstrap trials to be run

Default: 200

-f, --frontfrac
 

Where to start fitting line on msd curve

Default: 0

-F, --fracshow

Percent of graph to show, also where to stop fitting line during diffusivity calculation

Default: 0.2

-a, --axis

Which axis to compute msd along

Default: “z”

-nboot

Number of bootstrap trials for error estimation

Default: 200

-ensemble, --ensemble
 

Calculate MSD as ensemble average

Default: False

-compare, --compare
 

Compare time-averaged and ensemble-averaged time series

Default: False

-power_law, --power_law
 

Fit MSD to a power law

Default: False

-tails, --tails
 

Only look at transport within tails

Default: False

-pores, --pores
 

Only look at transport within pores

Default: False

-pr, --pore_radius
 

Radius of pores. Anything greater than this distance from the pore center will not be included in calculation

Default: 1.5

-acf, --autocorrelation
 

Plot step autocorrelation function

Default: False

-acov, --autocovariance
 

Plot step autocovariance function

Default: False

-nofit, --nofit
 

Do not attempt to fit any curve to MSD

Default: False

--update

update database with MD MSD values

Default: False

-wt, --wt_water
 

Weight percent water in system. Only needto adjust this if updating database.

Default: 10

-s, --savename

If specified, save the MSD curve in a pickled .pl file of this name.

Default: False

-begin, --begin
 

First frame to analyze

Default: 0

-end, --end

Last frame to analyze

Default: -1

Classes

class msd.Diffusivity(traj, gro, axis, begin=0, end=-1, startfit=0.01, endfit=1, residue=False, atoms=(), restrict=(), solute_indices=())
__init__(traj, gro, axis, begin=0, end=-1, startfit=0.01, endfit=1, residue=False, atoms=(), restrict=(), solute_indices=())

Calculate the mean squared displacement of a selection from an unwrapped coordinate trajectory

Parameters:
  • traj – unwrapped trajectory (i.e. gmx trjconv with -pbc nojump)
  • gro – representative coordinate file
  • axis – axis along which to compute MSD (choices: x, y, z, xy, yz, xz, xyz)
  • startfit – start linear fit to MSD startfit % into trajectory
  • endfit – end linear fit to MSD endfit % into trajectory
  • residue – if specified, the residue whose center of mass MSD will be measured
  • atoms – if specified, group of atoms whose center of mass MSD will be measured
  • restrict – restrict selection to certain indices. For example, if you want to calculate MSD of a certain residue, but only include a fraction of the total residues in the system.
  • solute_indices – different from restrict, this specifies the indices of the solute centers of mass of

which to calculate the MSDs. For example, if there are 24 solutes and you only want the MSD of the first two solutes, you would pass [0, 1].

bootstrap(N, fit_line=True, confidence=68)

Estimate error at each point in the MSD curve using bootstrapping

Parameters:
  • N (int) – number of bootstrap trials
  • fit_line (bool) – Fit a line to the MSD curve
  • confidence (float) – percent confidence interval (out of 100)
bootstrap_power_law(N)

Bootstrap fits of a power law to the MSD

Parameters:N (int) – number of bootstrap trials
calculate(ensemble=False)

Calculate mean squared displacement of trajectory

Parameters:ensemble (bool) – Calculate the ensemble MSD instead of the time-averaged MSD
fit_linear()

Fit a line to the MSD curve. This function will prompt the user to define the linear region as there is no rule of thumb for determining exactly where to fit the MSD curve. Make sure to report what you chose as the linear region.

fit_power_law(y)

Fit power law to MSD curve.

Parameters:y (numpy.ndarray) – y values of MSD curve
Returns:Coefficient and exponent in power low of form [coefficient, power]
plot(fracshow=0.5, save=False, savename='diffusivity.pdf', savedata=False, show=False)

Plot the MSD curve

Parameters:
  • fracshow (float) – fraction of MSD curve to show on plot
  • save (bool) – save plot
  • savename (str) – name under which to save plot (including file extension)
  • savedata (bool) – save MSD curve data in a numpy compressed .npz file
  • show (bool) – show plot when done
plot_autocorrelation(show=True)

Plot autocorrelation function

Parameters:show (bool) – show plot
Returns:
restrict_to_pore(r, dwell_fraction=0.95, tails=False, build_monomer='NAcarb11V', spline=False, buffer=0, npores=4)

Restrict calculations to center of masses (COMs) that primarily stay in the pore OR tail region

Parameters:
  • r (float) – radius of pore. Anything greater than r from the pore center is considered the tail region
  • dwell_fraction – Fraction of time spent in region of interest required in order to keep trajectory
  • tails (bool) – if True, then restrict calculations to COMs primarily in the tail region
  • build_monomer (str) – monomer coordinate file of which liquid crystal membrane is mode
  • spline (bool) – track pore centers with a 3D spline
  • buffer (float) – Do not count molecules below _buffer_ or above z-box-vector - _buffer_
stationarity(axis=2)

Determine if increments (resulting from first-order differencing) are stationary using the Augmented Dickey Fuller test. The null hypothesis of the ADF test is that the time series has a unit root, i.e. it is non-stationary. If the returned p-value is below some threshold (often 0.05), then one can reject the null hypothesis and claim stationarity. The p-value is calculated from the ADF statistic. A more negative ADF statistic indicates a greater chance of rejecting the null hypothesis.

Parameters:axis (int) – axis along which to measure increments. (x = 0, y = 1, z = 2)
Returns:p value for each trajectory
Return type:numpy.ndarray
step_autocorrelation()

Calculate autocorrelation of step length in the direction of axis.