Monday, 9 September 2013

Algorithmic Strategy

Price

The logarithmic middle price $x$ is defined as $$ x(t_i) = x(\Delta t,t_i) = \frac{\text{log } p_{bid}(t_i) + \text{log } p_{ask}(t_i) } {2} $$ where $t_i$ is the homogeneous sequence of times regularly spaced by time intervals of size $\Delta t$. The following figure shows $x$ from 10:00 to 10:10 GMT for the case of $\Delta t$ = 1 second.
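As a minimal sketch of the definition above, the logarithmic middle price can be computed from a bid/ask pair as follows. The function name `log_mid_price` and the quote values are hypothetical and only illustrate the formula; the quotes are assumed to be already sampled on the regular 1-second grid.

```python
import math

def log_mid_price(bid, ask):
    """Logarithmic middle price: x = (log p_bid + log p_ask) / 2."""
    return 0.5 * (math.log(bid) + math.log(ask))

# Hypothetical (bid, ask) quotes, one per second; illustrative values only.
quotes = [(100.00, 100.02), (100.01, 100.03), (99.99, 100.01)]
x = [log_mid_price(bid, ask) for bid, ask in quotes]
```

Note that $x$ is the logarithm of the geometric mean of bid and ask, not of their arithmetic mean.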

Return

The return at time $t_i$, $r(t_i)$, is defined as $$ r(t_i) = r(\Delta t, t_i) = x(t_i) - x(t_i - \Delta t) $$ In the normal case, $\Delta t$ is the interval of the homogeneous series, and $r(t_i)$ is the series of the first differences of $x(t_i)$. If the return interval is chosen to be a multiple of the series interval, we obtain overlapping intervals.
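In the normal case described above, the returns are simply the first differences of the log price series. A minimal sketch, assuming `x` is a plain list of log middle prices on a regular grid (the function name and the values are hypothetical):

```python
def first_difference_returns(x):
    """r(t_i) = x(t_i) - x(t_i - dt) for a regularly spaced series x."""
    return [x[i] - x[i - 1] for i in range(1, len(x))]

# Hypothetical log middle prices, one per second.
x = [4.6052, 4.6054, 4.6051, 4.6055]
r = first_difference_returns(x)  # one element shorter than x
```

Because these are log returns, they add up over time: the sum of `r` equals the last value of `x` minus the first.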

Realized Volatility

The realized volatility $\upsilon(t_i)$ at time $t_i$ is computed from historical data and is also called historical volatility. It is defined as $$ \upsilon(t_i) = \upsilon(\Delta t,n,p;t_i) = \left[\frac{1}{n} \sum_{j=1}^n | r(\Delta t; t_{i-n+j}) |^p \right]^{1/p} $$ where $n$ is the number of observations. There are two time intervals: the return interval $\Delta t$ and the size of the total sample, $n \Delta t$. The exponent $p$ is often set to 2 so that $\upsilon^2$ is the variance of the returns about zero. In many cases, a value of $p=1$ is preferred as it is less sensitive to outliers.

In order to compute realized volatility, the return interval $\Delta t$ and a sample of length $n \Delta t$ need to be chosen. By inserting $\Delta t = 10$ minutes in the above equation, one can compute the volatility of regularly spaced 10-minute returns.
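The definition above translates almost directly into code. The following is a sketch, assuming `r` is a list of returns already computed at the chosen interval $\Delta t$ and that the window ends at the most recent return; the function name and the sample values are hypothetical.

```python
def realized_volatility(r, n, p=2):
    """[(1/n) * sum_j |r_j|^p]^(1/p) over the last n returns in r."""
    if n <= 0 or n > len(r):
        raise ValueError("n must be between 1 and len(r)")
    window = r[-n:]  # the n most recent returns
    return (sum(abs(v) ** p for v in window) / n) ** (1.0 / p)

# Hypothetical returns; illustrative values only.
r = [0.001, -0.002, 0.0005, 0.004, -0.001]
v2 = realized_volatility(r, n=5, p=2)  # root mean square of the returns
v1 = realized_volatility(r, n=5, p=1)  # mean absolute return
```

On the same sample the $p=1$ value never exceeds the $p=2$ value, and the single large return of 0.004 pulls `v2` up more than `v1`, which illustrates the lower sensitivity to outliers.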

Bid-Ask Spread

A suitable variable for research studies is the relative spread $s(t_j)$: $$ s(t_j) = \text{log } p_{ask} (t_j) - \text{log } p_{bid} (t_j) $$ where $j$ is the index of the original inhomogeneous time series. The advantage of $s(t_j)$ over the nominal spread $(p_{ask} -p_{bid})$ is that the latter is in units of the underlying price, whereas the former is dimensionless.

A homogeneous time series of spreads $s(t_i)$ generated by interpolation is defined as
$$ s(t_i) = \text{log } p_{ask} (t_i) - \text{log } p_{bid} (t_i) $$ The following figure shows $s(t_i)$ where $t_i$ is the homogeneous sequence of times regularly spaced by time intervals of size 1 second.
A more suitable alternative is to compute average spreads within time windows and to build a homogeneous time series of these average spreads.
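The windowed alternative can be sketched as below. This assumes ticks arrive as `(time_in_seconds, bid, ask)` tuples and that each window is right-closed and labelled by its end time; both conventions, the function names, and the tick values are assumptions made for illustration.

```python
import math
from collections import defaultdict

def relative_spread(bid, ask):
    """s = log p_ask - log p_bid (dimensionless)."""
    return math.log(ask) - math.log(bid)

def average_spreads(ticks, window):
    """Mean relative spread per window, keyed by the window end time.

    A tick at time t falls into the window (end - window, end].
    Windows without any tick are omitted from the result.
    """
    buckets = defaultdict(list)
    for t, bid, ask in ticks:
        end = math.ceil(t / window) * window
        buckets[end].append(relative_spread(bid, ask))
    return {end: sum(s) / len(s) for end, s in sorted(buckets.items())}

# Hypothetical irregularly spaced ticks: (time in seconds, bid, ask).
ticks = [(0.4, 100.00, 100.02), (0.9, 100.00, 100.04), (1.7, 100.01, 100.03)]
avg = average_spreads(ticks, window=1.0)
```

Empty windows are left out here; how to fill them (carry the previous value forward, or mark them as missing) is a separate design choice.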

Tick frequency

The tick frequency at time $t_i$, $f(t_i)$, is defined as $$ f(t_i) = f(\Delta t;t_i) = \frac{1}{\Delta t} N \{ x(t_j) | t_i - \Delta t < t_j \leq t_i \} $$ where $N\{x(t_j)\}$ is the counting function and $\Delta t$ is the size of the time interval in which ticks are counted. The "log tick frequency", $\text{log } f(t_i)$, has been found to be more relevant. We can also define the average time interval between ticks, which is simply the inverse tick frequency, $f^{-1} (t_i)$.
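A small sketch of the counting function, assuming the tick times are held in a sorted list of seconds; the function name and the sample times are hypothetical. Two binary searches implement the half-open interval $t_i - \Delta t < t_j \leq t_i$.

```python
from bisect import bisect_right

def tick_frequency(tick_times, t_i, dt):
    """Number of ticks with t_i - dt < t_j <= t_i, divided by dt.

    tick_times must be sorted in ascending order.
    """
    lo = bisect_right(tick_times, t_i - dt)  # first index with t_j > t_i - dt
    hi = bisect_right(tick_times, t_i)       # first index with t_j > t_i
    return (hi - lo) / dt

# Hypothetical tick arrival times in seconds.
tick_times = [0.2, 0.5, 1.0, 1.1, 1.9, 2.0, 3.5]
f = tick_frequency(tick_times, t_i=2.0, dt=1.0)
```

In this example the tick at exactly 1.0 is excluded and the tick at exactly 2.0 is included, so three ticks are counted. An interval with no ticks gives a frequency of zero, for which neither the log tick frequency nor the inverse is defined, so such intervals need separate handling.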

Volatility Ratio

The volatility ratio is the ratio of two volatilities of different time resolutions: $$ \upsilon_{ratio} = \frac { \upsilon_{ann}(m \Delta t,n,p;t_i) } { \upsilon_{ann}(\Delta t,mn,p;t_i) } $$ The volatility ratio is around 1 for a Brownian motion of $x$, high if $x$ follows a trend, and low if $x$ has mean-reverting noise. The volatility ratio is thus a tool to detect trending behavior.
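The sketch below illustrates the ratio on two artificial series. It rests on an assumption the text does not spell out: that $\upsilon_{ann}$ is the volatility annualized by square-root-of-time scaling, so that the annualization factors reduce to a division by $\sqrt{m}$. The function names and the test series are hypothetical.

```python
import math

def realized_volatility(r, p=2):
    """[(1/n) * sum_j |r_j|^p]^(1/p) over all returns in r."""
    return (sum(abs(v) ** p for v in r) / len(r)) ** (1.0 / p)

def volatility_ratio(x, m, p=2):
    """Ratio of coarse (m*dt) to fine (dt) volatility over the same sample.

    x holds log prices at spacing dt, with len(x) = m*n + 1.
    Assumption: annualization by sqrt-of-time scaling, which leaves
    a factor sqrt(m) in the denominator.
    """
    if m < 1 or len(x) < m + 1 or (len(x) - 1) % m != 0:
        raise ValueError("len(x) - 1 must be a positive multiple of m")
    fine = [x[i] - x[i - 1] for i in range(1, len(x))]       # m*n returns
    coarse = [x[i] - x[i - m] for i in range(m, len(x), m)]  # n returns
    fine_vol = realized_volatility(fine, p)
    if fine_vol == 0.0:
        raise ValueError("constant prices: the ratio is undefined")
    return realized_volatility(coarse, p) / (math.sqrt(m) * fine_vol)

trend = [0.001 * i for i in range(9)]         # steady upward drift
zigzag = [0.001 * (i % 2) for i in range(9)]  # pure mean reversion
ratio_trend = volatility_ratio(trend, m=4)
ratio_zigzag = volatility_ratio(zigzag, m=4)
```

Under this scaling the trending series gives a ratio near $\sqrt{m} = 2$, well above 1, while the zigzag series gives 0 because its coarse returns cancel exactly.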

Overlapping Returns

Some variables, notably returns, are related to time intervals, not only single time points. When statistically investigating these variables, we need many observations. The number of observations can be increased by choosing overlapping intervals. For returns, a modified version is used: $$ r_i = r(t_i) = x(t_i) - x(t_i - m \Delta t) = x_i - x_{i-m} $$ where $t_i$ is again a regular sequence of time points (for any choice of time scale), separated by intervals of size $\Delta t$. The interval of the return, however, is $m \Delta t$, an integer multiple of the basic interval $\Delta t$. If $r_i$ is considered for every $i$, we obtain a homogeneous series of overlapping returns with an overlap factor $m$. The corresponding series of non-overlapping returns would be $r_m,r_{2m},r_{3m},...$
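A short sketch of the difference between the two series, assuming `x` is a list of log prices on the basic grid (the function name and values are hypothetical):

```python
def overlapping_returns(x, m):
    """r_i = x_i - x_{i-m} for every i >= m (overlap factor m)."""
    return [x[i] - x[i - m] for i in range(m, len(x))]

# Hypothetical log prices at spacing dt.
x = [0.0, 0.1, 0.3, 0.2, 0.5, 0.4, 0.6]
m = 3
r = overlapping_returns(x, m)  # r_m, r_{m+1}, ..., one per grid point
non_overlapping = r[::m]       # r_m, r_2m, ...: every m-th element
```

The overlapping series has roughly $m$ times as many observations, but neighbouring values share $m-1$ of their $m$ basic intervals and are therefore strongly correlated, which matters when computing standard errors.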

Cumulative 5 minute returns for the month of Jan


Cumulative 1 minute returns for Jan 3rd 2013


Cumulative 1 second returns from 13:59 to 14:01 on Jan 3rd 2013 (pit open at 14:00)


Goal of strategy: adjust the position in a single contract according to predictions made on a 1-second basis, using features extracted from multiple contracts.

- Create the feature matrix.
- Classify the examples.
- Split the data: Train - Validate - Test.
- Fit the model to Train and report the Train and Validate error.
- Starting with the full model, drop one feature at a time and report the Train/Validate error (which features work?).
- Is the model good at predicting out-of-sample?
  - Yes: backtest the strategy.
  - No: run diagnostics. Is it a bias or a variance problem?
- Define the strategy. Hyperparameters control decision making; can these be optimized with a GA?
- Show the equity curve, with and without the bid/ask spread.
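The Train - Validate - Test split in the outline could be sketched as follows. The outline does not say how the split is made; this sketch assumes a chronological split (no shuffling), which is a common choice for time series because it keeps future observations out of the training set. The function name and the 60/20/20 proportions are hypothetical.

```python
def chronological_split(rows, train_pct=60, validate_pct=20):
    """Split time-ordered rows into train, validate and test blocks.

    Assumption: rows are sorted by time and are not shuffled, so the
    validate and test blocks always lie after the training block.
    """
    if train_pct < 0 or validate_pct < 0 or train_pct + validate_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    n = len(rows)
    n_train = n * train_pct // 100
    n_validate = n * validate_pct // 100
    train = rows[:n_train]
    validate = rows[n_train:n_train + n_validate]
    test = rows[n_train + n_validate:]
    return train, validate, test

# Hypothetical feature rows, here just their time index.
rows = list(range(10))
train, validate, test = chronological_split(rows)
```

With ten rows this yields six for training, two for validation, and two for testing, in time order.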
