fedecaccia avellaneda-stoikov: Avellaneda-Stoikov HFT market making algorithm implementation

profit and loss

Thus, the DQN approximates a Q-learning function by outputting, for each input state s, a vector of Q-values, which is equivalent to checking the row for s in a Q(s, a) matrix to obtain the Q-value for each action from that state. A discount factor (γ) gives future rewards less weight than more immediate ones when estimating the value of an action (an action’s value is its relative worth in terms of maximizing the cumulative reward at termination time). For asymptotic expansions when T is large, see the paper by Guéant, Lehalle, and Fernandez-Tapia, or Guéant's book The Financial Mathematics of Market Liquidity. There are many exciting models out there with different approaches, and with HFTs dominating the market-making scene in recent years, there is a lot for our team to explore.
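To make the "row lookup" picture and the role of γ concrete, here is a minimal sketch. The table sizes, values and function names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Hypothetical toy problem: 3 states, 2 actions (sizes chosen for illustration).
q_table = np.array([
    [0.1, 0.5],
    [0.7, 0.2],
    [0.0, 0.3],
])

def q_values(state: int) -> np.ndarray:
    """Return the vector of Q-values for `state`, one entry per action.

    A DQN plays the same role, but computes this vector with a network
    instead of reading a stored row.
    """
    return q_table[state]

def discounted_return(rewards, gamma: float) -> float:
    """Sum the rewards, weighting the reward at step t by gamma**t."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

best_action = int(np.argmax(q_values(1)))               # greedy action in state 1
total = discounted_return([1.0, 1.0, 1.0], gamma=0.5)   # 1 + 0.5 + 0.25
```

With γ close to 1 the three rewards would count almost equally; with γ = 0.5 each later reward counts half as much as the one before it.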

  • It is inversely proportional to the asymmetry between the bid and ask order amount.
  • Figure 3 depicts one simulation of the profit and loss function of the market maker at any time t during the trading session in the left panel.

For every day of data, the number of ticks occurring in each 5-second interval had a positively skewed, long-tailed distribution. The means of these thirty-two distributions ranged from 33 to 110 ticks per 5-second interval, the standard deviations from 21 to 67, the minimums from 0 to 20, the maximums from 233 to 1338, and the skew from 1.0 to 4.4. The prediction DQN receives as input the state-defining features, with their values normalised, and it outputs a value between 0 and 1 for each action. The DQN has two hidden layers, each with 104 neurons, all applying a ReLU activation function. An ε-greedy policy is followed to determine the action to take during the next 5-second window, choosing between exploration, with probability ε, and exploitation, with probability 1-ε.
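The network shape and the ε-greedy choice described above can be sketched as follows. Only the two hidden layers of 104 ReLU neurons come from the text. The feature count, the action count, the random weights and the sigmoid output (used here to keep each output between 0 and 1) are assumptions made for illustration.

```python
import numpy as np

HIDDEN = 104      # two hidden layers of 104 neurons each, as stated in the text
N_FEATURES = 10   # assumption: number of state-defining features
N_ACTIONS = 4     # assumption: number of available actions

def init_network(n_features: int, n_actions: int, rng: np.random.Generator):
    """Create random (weights, bias) pairs for a 2-hidden-layer network."""
    sizes = [n_features, HIDDEN, HIDDEN, n_actions]
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x: np.ndarray) -> np.ndarray:
    """Forward pass: ReLU on the hidden layers, sigmoid on the output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)        # ReLU
    return 1.0 / (1.0 + np.exp(-x))       # assumed output squashing into (0, 1)

def epsilon_greedy(scores: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
    """Explore with probability epsilon, otherwise pick the best-scoring action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(scores)))   # exploration: uniform random action
    return int(np.argmax(scores))               # exploitation: greedy action

rng = np.random.default_rng(0)
params = init_network(N_FEATURES, N_ACTIONS, rng)
state = rng.random(N_FEATURES)                  # stand-in for normalised features
scores = forward(params, state)
action = epsilon_greedy(scores, epsilon=0.1, rng=rng)
```

Setting epsilon to 0 makes the policy purely greedy, while setting it to 1 makes every choice random.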


That is introduced by Avellaneda and Stoikov and handled by a quadratic approximation approach. It is worth mentioning that the trader changes her qualitative behavior depending on variations in the liquidation and penalizing constants and on her inventory position as time approaches maturity. On the optimal quotes will have just the opposite effect of when k is employed.

The results obtained from these tests are discussed in Section 6. The concluding Section 7 summarises the approach and findings, and outlines ideas for model improvement. We have designed a market making agent that relies on the Avellaneda-Stoikov procedure to minimize inventory risk. The agent can also skew the bid and ask prices output by the Avellaneda-Stoikov procedure, tweaking them and, by so doing, potentially counteracting the limitations of a static Avellaneda-Stoikov model by reacting to local market conditions. The agent learns to adapt its risk aversion and skew its bid and ask prices under varying market behaviour through reinforcement learning, using two variants (Alpha-AS-1 and Alpha-AS-2) of a double DQN architecture.

Setting up Hummingbot

More recently, Baldacci et al. have studied the optimal control problem for an option market maker, with a Heston model for the underlying asset, using the vega approximation for the portfolio. For more developments in the optimal market making literature, we refer the reader to Guéant, Ahuja et al., Cartea et al., Guéant and Lehalle, Nyström and Guéant et al.

Indeed, this result is particularly noteworthy, as the Avellaneda-Stoikov method sets as its goal precisely to minimize the inventory risk. Nevertheless, the flexibility that the Alpha-AS models are given to move and stretch the bid and ask price spread entails that the Alpha-AS models can, and sometimes do, operate locally with higher risk. Overall performance is more meaningfully obtained from the other indicators (Sharpe, Sortino and P&L-to-MAP), which show that, at the end of the day, the Alpha-AS models’ strategy pays off. Nevertheless, it is still interesting to note that AS-Gen performs much better on this indicator than on the others, relative to the Alpha-AS models.


The mean and the median of the Sortino ratio were better for both Alpha-AS models than for the Gen-AS model, and for the latter it was significantly better than for the two non-AS baselines. The Sharpe ratio is a measure of mean returns that penalises their volatility. Table 2 shows that one or the other of the two Alpha-AS models achieved better Sharpe ratios, that is, better risk-adjusted returns, than all three baseline models on 24 (12+12) of the 30 test days. Furthermore, on 9 of the 12 days for which Alpha-AS-1 had the best Sharpe ratio, Alpha-AS-2 had the second best; conversely, there are 11 instances of Alpha-AS-1 performing second best after Alpha-AS-2. Thus, the Alpha-AS models came 1st and 2nd on 20 out of the 30 test days (67%).
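For readers who want to see how the two ratios differ, here is a minimal sketch using common textbook definitions. It assumes a zero risk-free rate and a zero target return, and the sample returns are made up. The exact conventions used for the tables above may differ.

```python
import numpy as np

def sharpe_ratio(returns) -> float:
    """Mean return divided by the sample standard deviation of returns."""
    r = np.asarray(returns, dtype=float)
    return float(r.mean() / r.std(ddof=1))

def sortino_ratio(returns, target: float = 0.0) -> float:
    """Mean excess return divided by the downside deviation.

    Only returns below `target` contribute to the denominator, so upside
    volatility is not penalised. Undefined if no return falls below `target`.
    """
    r = np.asarray(returns, dtype=float)
    downside = np.minimum(r - target, 0.0)
    downside_dev = np.sqrt(np.mean(downside**2))
    return float((r.mean() - target) / downside_dev)

# Hypothetical per-period returns, for illustration only.
sample_returns = [0.02, -0.01, 0.03, -0.02, 0.01]
sharpe = sharpe_ratio(sample_returns)
sortino = sortino_ratio(sample_returns)
```

Because the Sortino ratio ignores upside swings, a strategy with large gains and small losses scores better on Sortino than on Sharpe.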

[Level 1] Basic Concepts of Crypto Trading

With the risk aversion parameter, you tell the bot how much inventory risk you want to take. Inventory Risk Aversion is a quantity between 0 and 1 that measures the compromise between mitigating inventory risk and profitability. A value close to 1 indicates that you don’t want to take much inventory risk, and Hummingbot will “push” the reservation price further to reach the inventory target.

This will set “boundaries” on the calculated optimal spread, so Hummingbot will never create your orders with a spread smaller than the minimum or bigger than the maximum. The reasoning behind this parameter is that, as the trading session gets close to its end, the market maker wants to have an inventory position similar to the one he had when the trading session started. Another feature of the model that you can notice in the above picture is that the reservation price is below the market mid-price in the first half of the graphic. The second part of the model is about finding the optimal position for the market maker's orders on the order book to increase profitability. If the γ value is close to zero, the reservation price will be very close to the market mid-price.
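The behaviour described above follows from the closed-form expressions in the original Avellaneda-Stoikov paper: the reservation price r = s - q·γ·σ²·(T - t) and the total spread γ·σ²·(T - t) + (2/γ)·ln(1 + γ/k). The sketch below implements those two formulas. The function names, the sample numbers and the spread clamp (expressed here in price units) are illustrative assumptions and do not reproduce Hummingbot's actual code.

```python
import math

def reservation_price(mid: float, q: float, gamma: float,
                      sigma: float, time_left: float) -> float:
    """r = s - q * gamma * sigma^2 * (T - t)."""
    return mid - q * gamma * sigma**2 * time_left

def optimal_spread(gamma: float, sigma: float,
                   time_left: float, k: float) -> float:
    """Total bid-ask spread: gamma * sigma^2 * (T - t) + (2 / gamma) * ln(1 + gamma / k)."""
    return gamma * sigma**2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / k)

def quotes(mid, q, gamma, sigma, time_left, k,
           min_spread: float = 0.0, max_spread: float = math.inf):
    """Return (bid, ask) centred on the reservation price.

    min_spread and max_spread are hypothetical bounds in price units.
    """
    r = reservation_price(mid, q, gamma, sigma, time_left)
    spread = min(max(optimal_spread(gamma, sigma, time_left, k), min_spread), max_spread)
    return r - spread / 2.0, r + spread / 2.0

# Long inventory (q > 0) pulls the reservation price below the mid-price.
r_long = reservation_price(mid=100.0, q=2.0, gamma=0.1, sigma=2.0, time_left=0.5)
# With gamma near zero the reservation price stays very close to the mid-price.
r_small_gamma = reservation_price(mid=100.0, q=2.0, gamma=1e-6, sigma=2.0, time_left=0.5)
bid, ask = quotes(mid=100.0, q=2.0, gamma=0.1, sigma=2.0, time_left=0.5, k=1.5)
```

As time_left shrinks towards zero, the inventory term vanishes and the reservation price converges to the mid-price.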

For instance, Lee and Jangmin used Q-learning with two pairs of agents cooperating to predict market trends (through two “signal” agents, one on the buy side and one on the sell side) and determine a trading strategy (through a buy “order” agent and a sell “order” agent). RL has also been used to dose buying and selling optimally, in order to reduce the market impact of high-volume trades, which would damage the trader’s returns. This consideration makes rb and ra reasonable reference prices around which to construct the market maker’s spread. Avellaneda and Stoikov define rb and ra, however, for a passive agent with no orders in the limit order book. In practice, as Avellaneda and Stoikov did in their original paper, when an agent is running and placing orders, both rb and ra are approximated by the average of the two, r.

  • The more specific context of market making has its own peculiarities.
  • Conversely, the gains may also be greater, a benefit which is indeed reflected unequivocally in the results obtained for the P&L-to-MAP performance indicator.
  • With these values, the AS model will determine the next reservation price and spread to use for the following orders.
  • By our numerical results, we deduce that the jump effects and comparative statistics metrics provide us with the information for the traders to gain expected profits.

The target for the random forest classifier is simply the sign of the difference in mid-prices at the start and the end of each 5-second timestep. That is, classification is based on whether the mid-price went up or down in each timestep. The Q-value iteration algorithm assumes that both the transition probability matrix and the reward matrix are known. Hasselt, Guez and Silver developed an algorithm they called double DQN. Double DQN is a deep RL approach, more specifically deep Q-learning, that relies on two neural networks, as we shall see shortly (in Section 4.1.7). In this paper we present a double DQN applied to the market-making decision process.
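As a rough illustration of why double DQN uses two networks, the sketch below computes the standard double DQN learning target from Hasselt, Guez and Silver: the online network selects the next action and the target network evaluates it. The function name and the numbers are hypothetical, and the paper's own variant (its Section 4.1.7) may differ in detail.

```python
import numpy as np

def double_dqn_target(reward: float, next_q_online: np.ndarray,
                      next_q_target: np.ndarray, gamma: float,
                      done: bool = False) -> float:
    """Target y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    next_q_online / next_q_target are the two networks' Q-value vectors
    for the next state s'. If the episode has ended, the target is just r.
    """
    if done:
        return float(reward)
    best_action = int(np.argmax(next_q_online))   # selection: online network
    return float(reward + gamma * next_q_target[best_action])  # evaluation: target network

# Hypothetical Q-value vectors for a next state with three actions.
q_online = np.array([0.2, 0.8, 0.5])
q_target = np.array([0.4, 0.3, 0.9])
y = double_dqn_target(reward=1.0, next_q_online=q_online,
                      next_q_target=q_target, gamma=0.9)   # 1.0 + 0.9 * 0.3
```

A single-network DQN target would instead take the maximum of q_target directly (0.9 here), which tends to overestimate action values. Decoupling selection from evaluation is what reduces that bias.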

Finally, we demonstrate the significance of this novel system in multiple experiments. The two most important features for all three methods are the latest bid and ask quantities in the orderbook, followed closely by the bid and ask quantities immediately prior to the latest orderbook update and the latest best ask and bid prices. There is a general predominance of features corresponding to the latest orderbook movements (i.e., those denominated with low numerals, primarily 0 and 1). This may be a consequence of the markedly stochastic nature of market behaviour, which tends to limit the predictive power of any feature to proximate market movements. Nevertheless, the prices 4 and 8 orderbook movements prior to the action setting instant also make a fairly strong appearance in the importance indicator lists, suggesting the existence of a slightly longer-term predictive component that may be tapped into profitably.


There is a lot of mathematical detail in the paper explaining how they arrive at this factor by assuming exponential arrival rates. There are many different models around, with varying methodologies for calculating the value. The model was created before Satoshi Nakamoto mined the first Bitcoin block, and before the creation of trading markets that are open 24/7. On Hummingbot, the value of q is calculated based on the target inventory percentage you are aiming for.


To overcome this limitation and to reduce the expert’s subjectivity, in this study an adaptive membership function based on CUB model is suggested to pre-transform Likert-type variables into fuzzy numbers before the adoption of a clustering algorithm. After a theoretical presentation of the method, an application using real data will be presented to demonstrate how the method works.

The performance results for the 30 days of testing of the two Alpha-AS models against the three baseline models are shown in Tables 2–5. All ratios are computed from Close P&L returns (Section 4.1.6), except P&L-to-MAP, for which the open P&L is used. Figures in bold are the best values among the five models for the corresponding test days.


The Avellaneda-Stoikov strategy will not place any orders if you do not have sufficient balance on either side of the order. We aim to teach new users the basics of market-making while enabling experienced users to exercise more control over how their bots behave. By default, when you run create, we ask you to enter the basic parameters needed for a market-making bot.


From this point, the RL agent can gradually diverge as it learns by operating in the changing market. Tables 2 to 5 show performance results over 30 days of test data, by indicator (2. Sharpe ratio; 3. Sortino ratio; 4. Max DD; 5. P&L-to-MAP), for the two baseline models, the Avellaneda-Stoikov model with genetically optimised parameters (AS-Gen) and the two Alpha-AS models. In its beginner mode, the user will be asked to enter min and max spread limits and their aversion to inventory risk, scaled from 0 to 1. Additionally, sensitivity to volatility changes will be included through a dedicated parameter, vol_to_spread_multiplier, which modifies spreads in high-volatility scenarios.

In Section 2, we introduce some basic concepts and describe the input LOB datasets. The resulting Gen-AS model, two non-AS baselines (based on Gašperov) and the two Alpha-AS model variants were run with the rest of the dataset, from 9th December 2020 to 8th January 2021, and their performance compared. The dataset used contains the L2 orderbook updates and market trades for the btc-usd (bitcoin–dollar) pair, for the period from 7th December 2020 to 8th January 2021, with 12 hours of trading data recorded for each day.