Time Series Analysis

from LLC_Membranes.llclib import timeseries

Commonly used functions for working with time series.

Classes

class timeseries.VectorAutoRegression(timeseries, r)
__init__(timeseries, r)

Fit a vector autoregressive (VAR) process to data using statsmodels.tsa.vector_ar. The output object is just a reduction and renaming of the attributes produced by running the fit() method of the VAR class.

For more detailed docs, see: https://www.statsmodels.org/dev/vector_ar.html#module-statsmodels.tsa.vector_ar

For a multidimensional time series, one could write a system of dependent autoregressive equations:

\[Y_t = A_1*Y_{t-1} + ... + A_p*Y_{t-p} + u_t\]

where

\[\begin{split}Y_t = \begin{bmatrix} y_{1,t} \\ y_{2,t} \\ ... \\ y_{k,t} \end{bmatrix}, Y_{t-1} = \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \\ ... \\ y_{k,t-1} \end{bmatrix}, ...\end{split}\]

The matrices \(A_i\) are K x K matrices where K is the number of dimensions of the trajectory. \(A_1\) contains the 1st time lag autoregressive coefficients. If

\[\begin{split}A_1 = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.4 \end{bmatrix}\end{split}\]

the associated system of equations for a VAR(1) process would be:

\[ \begin{align}\begin{aligned}y_{1,t} = 0.5y_{1,t-1} + u_{1,t}\\y_{2,t} = 0.4y_{2, t-1} + u_{2,t}\end{aligned}\end{align} \]

Of course, adding cross-terms to A would create more complex dynamical behavior.

\(u_t\) is a K-dimensional vector of multivariate Gaussian noise generated from the covariance matrix of the data.

Parameters:
  • timeseries (numpy.ndarray) – a T x K matrix where T is the number of observations and K is the number of variables/dimension
  • r (int) – autoregressive order. Number of past points on which the current point depends
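To make the VAR(1) example above concrete, the following minimal sketch simulates the process with the diagonal \(A_1\) shown earlier and recovers the coefficients by ordinary least squares. It uses plain NumPy as a stand-in and does not call VectorAutoRegression or statsmodels; the identity noise covariance, the series length and the variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Coefficient matrix from the example above (no cross-terms)
A1 = np.array([[0.5, 0.0],
               [0.0, 0.4]])
cov = np.eye(2)  # assumed noise covariance, for illustration only

# Simulate Y_t = A1 * Y_{t-1} + u_t as a T x K array
T = 20000
Y = np.zeros((T, 2))
noise = rng.multivariate_normal(np.zeros(2), cov, size=T)
for t in range(1, T):
    Y[t] = A1 @ Y[t - 1] + noise[t]

# Regress Y_t on Y_{t-1}. In row form Y[1:] = Y[:-1] @ A1.T,
# so the least-squares solution is the transpose of A1.
coef, *_ = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)
A1_hat = coef.T
print(np.round(A1_hat, 2))
```

The array Y has the T x K layout described for the timeseries parameter, so the same array could in principle be passed to the class with r=1.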

Functions

timeseries.acf_slow(d)

Calculate the autocorrelation function of a time series. The time complexity of this method is O(n^2).

Parameters:d – numpy array of length n, with time series values {x1, x2 … xn}
Returns:autocorrelation function
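As an illustration of what an O(n^2) autocorrelation looks like, here is a minimal sketch that sums the lagged products directly. The name acf_direct is hypothetical, and subtracting the mean and normalizing by the lag-0 value are assumptions; acf_slow() may use a different convention.

```python
import numpy as np

def acf_direct(d):
    """Direct O(n^2) autocorrelation, normalized so that lag 0 equals 1."""
    d = np.asarray(d, dtype=float)
    d = d - d.mean()  # assumed: work with deviations from the mean
    n = d.size
    # One dot product of length n - k for every lag k
    acov = np.array([np.dot(d[:n - k], d[k:]) for k in range(n)])
    return acov / acov[0]

rho = acf_direct([1.0, 2.0, 3.0, 4.0])
print(rho)  # [ 1.    0.25 -0.3  -0.45]
```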
timeseries.acf(t, largest_prime=500, autocov=False)

Quickly calculate the autocorrelation function of a time series, t. This gives the same results as acf_slow() but uses FFTs. This method is faster than numpy.correlate.

Parameters:
  • t – time series array (npoints, nseries)
  • largest_prime – the largest prime factor of the array length allowed. The smaller, the faster. 1.6M points takes about 5 seconds with largest_prime=1000. Be aware that truncating discards data, but losing 5-6 data points is not a big deal for large arrays.
  • autocov – return the autocovariance function instead (which is just the unnormalized autocorrelation)
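The FFT approach can be sketched as follows for a single 1D series. This is not the implementation of acf(): it zero-pads the series instead of truncating it to a length with small prime factors, and the name acf_fft is hypothetical. The comparison against numpy.correlate is included only to show that the two routes agree.

```python
import numpy as np

def acf_fft(t):
    """Autocorrelation of a 1D series via FFT with zero padding."""
    t = np.asarray(t, dtype=float)
    t = t - t.mean()
    n = t.size
    # Pad to length 2n so the circular correlation does not wrap around
    f = np.fft.rfft(t, n=2 * n)
    acov = np.fft.irfft(f * np.conj(f), n=2 * n)[:n]
    return acov / acov[0]

rng = np.random.default_rng(1)
series = rng.normal(size=1000).cumsum()
fast = acf_fft(series)

# Reference: direct sum over all lags with numpy.correlate
centered = series - series.mean()
full = np.correlate(centered, centered, mode="full")[series.size - 1:]
slow = full / full[0]
print(np.allclose(fast, slow))
```

Zero padding keeps every data point at the cost of a longer transform, whereas limiting the largest prime factor trades a few points for speed.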

timeseries.autocov(joint_distribution, varied_length=False)

Calculate the autocovariance function of the joint distribution of multiple realizations of a time series model

See pages 45-46 of Time Series Analysis (1st edition?) by James Hamilton.

y_t : time series values at time t
y_t-j : time series values at time t - j

covariance_j = E[(y_t - mu)(y_t-j - mu)]

In words: the covariance at lag j equals the expected value of the product of the deviations of y_t and y_t-j from the mean, mu. y_t and y_t-j are not necessarily independent, so you can’t assume that E(y_t * y_t-j) equals E(y_t)*E(y_t-j).

Parameters:joint_distribution – n x m numpy array with n independent realizations of a time series consisting of m data points (observations) per realization.
Returns:autocovariance of joint distribution as function of lag j
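The definition covariance_j = E[(y_t - mu)(y_t-j - mu)] can be estimated from an n x m array of realizations roughly as below. This is a sketch under stated assumptions, not the body of autocov(): the name ensemble_autocov and the max_lag argument are hypothetical, the process is assumed stationary with a single mean, and the expectation is taken over both realizations and time. The AR(1) test data is generated only for illustration.

```python
import numpy as np

def ensemble_autocov(joint_distribution, max_lag):
    """Estimate the autocovariance at lags 0..max_lag from n realizations."""
    y = np.asarray(joint_distribution, dtype=float)
    dev = y - y.mean()  # assumed: one mean shared by all times and realizations
    m = y.shape[1]
    gamma = np.empty(max_lag + 1)
    for j in range(max_lag + 1):
        # Average (y_t - mu)(y_{t-j} - mu) over realizations and over t
        gamma[j] = np.mean(dev[:, j:] * dev[:, :m - j])
    return gamma

# Illustrative data: 200 realizations of a stationary AR(1) with phi = 0.5
rng = np.random.default_rng(2)
phi, n, m = 0.5, 200, 2000
y = np.empty((n, m))
y[:, 0] = rng.normal(scale=np.sqrt(1 / (1 - phi ** 2)), size=n)
for t in range(1, m):
    y[:, t] = phi * y[:, t - 1] + rng.normal(size=n)

gamma = ensemble_autocov(y, max_lag=3)
print(gamma / gamma[0])  # close to phi ** j for this process
```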

timeseries.msd_straightforward(x, axis)

Straightforward way to calculate MSD. Gives the same answer as msd()

Parameters:
  • x – positions of centers of mass of all particles for each frame, numpy array [nframes, natoms, dim]
  • axis – list of indices to include in msd calculation (x = 0, y = 1, z = 2)
Returns:Average MSD and individual particle MSDs
timeseries.msd(x, axis, ensemble=False, nt=1)

Calculate mean square displacement based on particle positions

Parameters:
  • x (ndarray (n_frames, n_particles, 3)) – particle positions
  • axis (int or list of ints) – axis along which you want MSD (0, 1, 2, [0, 1], [0, 2], [1, 2], [0, 1, 2])
  • ensemble (bool) – if True, calculate the ensemble MSD instead of the time-averaged MSD
Returns:

MSD of each particle
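The difference between the time-averaged and ensemble MSD can be sketched as follows. The function msd_sketch is hypothetical and is not the code behind msd(): it uses a plain loop over lags, ignores the nt argument, and assumes the output has one row per lag and one column per particle.

```python
import numpy as np

def msd_sketch(x, axis, ensemble=False):
    """MSD per particle for positions x of shape (n_frames, n_particles, 3)."""
    # Keep only the requested dimensions (an int or a list of ints)
    x = np.asarray(x, dtype=float)[:, :, np.atleast_1d(axis)]
    n_frames, n_particles = x.shape[:2]
    if ensemble:
        # Squared displacement relative to the first frame only
        return ((x - x[0]) ** 2).sum(axis=2)
    out = np.zeros((n_frames, n_particles))
    for lag in range(1, n_frames):
        # Average over every pair of frames separated by `lag`
        disp = x[lag:] - x[:-lag]
        out[lag] = (disp ** 2).sum(axis=2).mean(axis=0)
    return out

# One particle moving 2 units per frame along x
x = np.zeros((5, 1, 3))
x[:, 0, 0] = 2 * np.arange(5)
tamsd = msd_sketch(x, axis=0)
emsd = msd_sketch(x, axis=0, ensemble=True)
print(tamsd[:, 0])  # [ 0.  4. 16. 36. 64.]
```

For this constant-velocity trajectory both variants agree; they differ when the dynamics depend on the time origin.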

timeseries.bootstrap_msd(msds, N, confidence=68, median=False)

Estimate error at each point in the MSD curve using bootstrapping

Parameters:
  • msds (np.ndarray) – mean squared displacements to sample
  • N (int) – number of bootstrap trials
  • confidence (float) – percentile for error calculation
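A minimal sketch of the bootstrapping idea: resample particles with replacement, recompute the average (or median) MSD curve for each trial, and take percentiles across trials. The function bootstrap_msd_sketch, its seed argument, the (n_frames, n_particles) layout of msds and the returned (lower, upper) pair are all assumptions and may differ from what bootstrap_msd() actually does.

```python
import numpy as np

def bootstrap_msd_sketch(msds, N, confidence=68, median=False, seed=None):
    """Percentile bounds on the average MSD curve from N bootstrap trials."""
    rng = np.random.default_rng(seed)
    msds = np.asarray(msds, dtype=float)  # assumed shape (n_frames, n_particles)
    n_frames, n_particles = msds.shape
    stat = np.median if median else np.mean
    trials = np.empty((N, n_frames))
    for i in range(N):
        # Draw particles with replacement and recompute the curve
        picks = rng.integers(0, n_particles, size=n_particles)
        trials[i] = stat(msds[:, picks], axis=1)
    half = (100 - confidence) / 2
    lower, upper = np.percentile(trials, [half, 100 - half], axis=0)
    return lower, upper

rng = np.random.default_rng(3)
msds = rng.random((5, 20)).cumsum(axis=0)  # illustrative data only
lower, upper = bootstrap_msd_sketch(msds, N=200, seed=0)
print(lower.shape, upper.shape)
```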
timeseries.step_autocorrelation(trajectories, axis=0)

Calculate autocorrelation of step length and direction

Parameters:
  • trajectories (numpy.ndarray) – array of position vs time (n_frames, n_particles, n_dimensions)
  • axis (int or list) – axis along which to calculate step lengths ({x:0, y:1, z:2})
timeseries.correlograms(zt)

Plot correlograms of (z - zmean), (z - zmean)^2, (z - zmean)^3, (z - zmean)^4

Parameters:zt – time series of probability integral transforms

timeseries.switch_points(sequence)

Determine points in a discrete state time series where switches between states occur. NOTE: includes the first and last points of the time series

Parameters:sequence (list or numpy.ndarray) – series of discrete states
Returns:list of indices where switches between states occur
Return type:numpy.ndarray
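One way to locate switches is to compare each state with its predecessor. In this hypothetical sketch, a switch is reported at the index of the first frame in the new state; that convention is an assumption and switch_points() may report the frame before the change instead.

```python
import numpy as np

def switch_points_sketch(sequence):
    """Indices where the state changes, plus the first and last index."""
    sequence = np.asarray(sequence)
    # Index of the first frame after each change of state (assumed convention)
    changes = np.flatnonzero(sequence[1:] != sequence[:-1]) + 1
    # np.unique sorts and removes a duplicate if the last frame is a switch
    return np.unique(np.concatenate(([0], changes, [sequence.size - 1])))

points = switch_points_sketch([0, 0, 1, 1, 1, 2, 2])
print(points)  # [0 2 5 6]
```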
timeseries.calculate_moving_average(series, n)

Calculate moving average of a time series

Parameters:n (int) – Number of previous points to average
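A moving average over n points can be written as a convolution with a uniform window. The sketch below is hypothetical and only keeps windows that fit entirely inside the series, so the output has len(series) - n + 1 points; calculate_moving_average() may treat the edges differently.

```python
import numpy as np

def moving_average_sketch(series, n):
    """Average of each run of n consecutive points."""
    series = np.asarray(series, dtype=float)
    # Uniform window of weight 1/n; 'valid' drops partially filled windows
    return np.convolve(series, np.ones(n) / n, mode="valid")

smoothed = moving_average_sketch([1, 2, 3, 4, 5], n=2)
print(smoothed)  # [1.5 2.5 3.5 4.5]
```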