Calculate Mean Squared Displacement¶

A particle’s mean squared displacement (MSD) tells you the squared distance it moves over time. By analyzing the shape of the MSD curve, we can gain mechanistic insight. The general form of the MSD curve is:

\[\langle x^2(t) \rangle = K_{\alpha}t^{\alpha}\]

where \(K_{\alpha}\) is the generalized diffusion coefficient. In cases were \(\alpha=1\), the MSD is linear and \(K_{\alpha}\) is equal to the standard diffusion coefficient measured for particles undergoing Brownian motion. Albert Einstein was the first to prove that the MSD of a Brownian particle is linear and that its slope is proportional to the diffusion coefficient \(D\):

\[\frac{d MSD}{dt} \propto 2NDt\]

where \(N\) is the number of spatial dimensions over which the MSD was measured. When \(\alpha > 1\) diffusion occurs faster than expected and is termed superdiffusion. When \(\alpha < 1\), diffusion is slower than expected and called subdiffusion.

There are two ways one can calculate MSD. The first and simplest way is the ensemble MSD:

\[\langle x^2(t) \rangle = \langle x(t) - x(0) \rangle^2\]

The ensemble MSD measures the distance traveled from a particle’s initial position. The second way measures and averages the particle’s displacement over all possible time lags, \(\tau\), and is called the time-averaged MSD:

\[\overline{x^2(\tau)} = \dfrac{1}{T - \tau}\int_{0}^{T - \tau} (x(t + \tau) - z(t))^2 dt\]

where \(T\) is the length of the trajectory. The time-averaged MSD is more statistically robust than the ensemble average. For ergodic systems, both methods of calculation will yield the same result. However, with less data, the time-averaged MSD will give tighter error bars.

msd.py can calculate both types of MSDs for a particle. If you believe the particle undergoes Brownian motion, then you can also measure the diffusion constant. The script can be run from the command line or its classes can be imported for use in other analysis scripts. Both usages are explained below.

Command Line Interface¶

Calculate mean squared displacement (MSD) and diffusion coefficientfor a specific residue or set of atoms in a trajectory

usage: msd.py [-h] [-t TRAJECTORY] [-g GRO] [-r RESIDUE]
              [-atoms ATOMS [ATOMS ...]] [-b NBOOT] [-f FRONTFRAC]
              [-F FRACSHOW] [-a AXIS] [-nboot NBOOT] [-ensemble] [-compare]
              [-power_law] [-tails] [-pores] [-pr PORE_RADIUS] [-acf] [-acov]
              [-nofit] [--update] [-wt WT_WATER] [-s SAVENAME] [-begin BEGIN]
              [-end END]

Named Arguments¶

`-t, --trajectory`
	Path to input file Default: “wiggle.trr”
`-g, --gro`	Name of .gro coordinate file Default: “wiggle.gro”
`-r, --residue`	Name of residue whose diffusivity we want
`-atoms`	Name of atoms whose collective diffusivity is desired
`-b, --nboot`	Number of bootstrap trials to be run Default: 200
`-f, --frontfrac`
	Where to start fitting line on msd curve Default: 0
`-F, --fracshow`	Percent of graph to show, also where to stop fitting line during diffusivity calculation Default: 0.2
`-a, --axis`	Which axis to compute msd along Default: “z”
`-nboot`	Number of bootstrap trials for error estimation Default: 200
`-ensemble, --ensemble`
	Calculate MSD as ensemble average Default: False
`-compare, --compare`
	Compare time-averaged and ensemble-averaged time series Default: False
`-power_law, --power_law`
	Fit MSD to a power law Default: False
`-tails, --tails`
	Only look at transport within tails Default: False
`-pores, --pores`
	Only look at transport within pores Default: False
`-pr, --pore_radius`
	Radius of pores. Anything greater than this distance from the pore center will not be included in calculation Default: 1.5
`-acf, --autocorrelation`
	Plot step autocorrelation function Default: False
`-acov, --autocovariance`
	Plot step autocovariance function Default: False
`-nofit, --nofit`
	Do not attempt to fit any curve to MSD Default: False
`--update`	update database with MD MSD values Default: False
`-wt, --wt_water`
	Weight percent water in system. Only needto adjust this if updating database. Default: 10
`-s, --savename`	If specified, save the MSD curve in a pickled .pl file of this name. Default: False
`-begin, --begin`
	First frame to analyze Default: 0
`-end, --end`	Last frame to analyze Default: -1

Classes¶

class msd.Diffusivity(traj, gro, axis, begin=0, end=-1, startfit=0.01, endfit=1, residue=False, atoms=(), restrict=(), solute_indices=())¶

__init__(traj, gro, axis, begin=0, end=-1, startfit=0.01, endfit=1, residue=False, atoms=(), restrict=(), solute_indices=())¶

Calculate the mean squared displacement of a selection from an unwrapped coordinate trajectory

Parameters:

traj – unwrapped trajectory (i.e. gmx trjconv with -pbc nojump)
gro – representative coordinate file
axis – axis along which to compute MSD (choices: x, y, z, xy, yz, xz, xyz)
startfit – start linear fit to MSD startfit % into trajectory
endfit – end linear fit to MSD endfit % into trajectory
residue – if specified, the residue whose center of mass MSD will be measured
atoms – if specified, group of atoms whose center of mass MSD will be measured
restrict – restrict selection to certain indices. For example, if you want to calculate MSD of a certain residue, but only include a fraction of the total residues in the system.
solute_indices – different from restrict, this specifies the indices of the solute centers of mass of

which to calculate the MSDs. For example, if there are 24 solutes and you only want the MSD of the first two solutes, you would pass [0, 1].

bootstrap(N, fit_line=True, confidence=68)¶

Estimate error at each point in the MSD curve using bootstrapping

Parameters:	N (int) – number of bootstrap trials fit_line (bool) – Fit a line to the MSD curve confidence (float) – percent confidence interval (out of 100)

bootstrap_power_law(N)¶

Bootstrap fits of a power law to the MSD

Parameters:	N (int) – number of bootstrap trials

calculate(ensemble=False)¶

Calculate mean squared displacement of trajectory

Parameters:	ensemble (bool) – Calculate the ensemble MSD instead of the time-averaged MSD

fit_linear()¶: Fit a line to the MSD curve. This function will prompt the user to define the linear region as there is no rule of thumb for determining exactly where to fit the MSD curve. Make sure to report what you chose as the linear region.

fit_power_law(y)¶

Fit power law to MSD curve.

Parameters:	y (numpy.ndarray) – y values of MSD curve
Returns:	Coefficient and exponent in power low of form [coefficient, power]

plot(fracshow=0.5, save=False, savename='diffusivity.pdf', savedata=False, show=False)¶

Plot the MSD curve

Parameters:	fracshow (float) – fraction of MSD curve to show on plot save (bool) – save plot savename (str) – name under which to save plot (including file extension) savedata (bool) – save MSD curve data in a numpy compressed .npz file show (bool) – show plot when done

plot_autocorrelation(show=True)¶

Plot autocorrelation function

Parameters:	show (bool) – show plot
Returns:

restrict_to_pore(r, dwell_fraction=0.95, tails=False, build_monomer='NAcarb11V', spline=False, buffer=0, npores=4)¶

Restrict calculations to center of masses (COMs) that primarily stay in the pore OR tail region

Parameters:

r (float) – radius of pore. Anything greater than r from the pore center is considered the tail region
dwell_fraction – Fraction of time spent in region of interest required in order to keep trajectory
tails (bool) – if True, then restrict calculations to COMs primarily in the tail region
build_monomer (str) – monomer coordinate file of which liquid crystal membrane is mode
spline (bool) – track pore centers with a 3D spline
buffer (float) – Do not count molecules below _buffer_ or above z-box-vector - _buffer_

stationarity(axis=2)¶

Determine if increments (resulting from first-order differencing) are stationary using the Augmented Dickey Fuller test. The null hypothesis of the ADF test is that the time series has a unit root, i.e. it is non-stationary. If the returned p-value is below some threshold (often 0.05), then one can reject the null hypothesis and claim stationarity. The p-value is calculated from the ADF statistic. A more negative ADF statistic indicates a greater chance of rejecting the null hypothesis.

Parameters:	axis (int) – axis along which to measure increments. (x = 0, y = 1, z = 2)
Returns:	p value for each trajectory
Return type:	numpy.ndarray

step_autocorrelation()¶: Calculate autocorrelation of step length in the direction of axis.