Architecture
This file explains
- Implementation
- Technology choices & justification
Config Files
Language:
- JSON
Implementation
- Contains
  - Features
  - Labelling logic
  - ML model settings
  - Execution settings
  - Training tickers
  - Live trading tickers
- Acts as a Single Source of Truth (SSOT)
  - Prevents feature misalignment
  - Keeps every service uniform
  - Every service the user calls should be given the same file
Justification:
- Universally recognized and well supported in `Python` and `Go`
- Why not `YAML`?
  - `YAML` is loosely typed compared to `JSON`
  - `JSON` is easier to parse in `Python` and `Go`
Drawbacks
- Cannot comment in `JSON` files to take notes
  - A section not found in `config/demo.json` could be created to hold notes instead
    - The program never reads sections the user adds outside of the required sections
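The behaviour described above (required sections are read, anything else is ignored) can be sketched as follows. This is a minimal illustration, not the project's loader: the section names in `REQUIRED_SECTIONS`, the `load_config` helper, and the example values are all hypothetical, inferred from the "Contains" list, and the real keys in `config/demo.json` may differ.

```python
import json

# Hypothetical section names inferred from the "Contains" list;
# the real keys in config/demo.json may differ.
REQUIRED_SECTIONS = (
    "features",
    "labelling",
    "model",
    "execution",
    "training_tickers",
    "live_tickers",
)


def load_config(text: str) -> dict:
    """Parse a JSON config and keep only the required sections."""
    raw = json.loads(text)
    missing = [name for name in REQUIRED_SECTIONS if name not in raw]
    if missing:
        raise KeyError(f"config is missing sections: {missing}")
    # Anything outside REQUIRED_SECTIONS (e.g. a "notes" section) is dropped.
    return {name: raw[name] for name in REQUIRED_SECTIONS}


# Illustrative values only; none of these come from the real config file.
EXAMPLE = """
{
  "notes": "free-form comments live here and are never read",
  "features": ["rsi_14", "sma_20"],
  "labelling": {"horizon": 5, "threshold": 0.002},
  "model": {"type": "RandomForestClassifier"},
  "execution": {"cycle_seconds": 60},
  "training_tickers": ["AAPL", "MSFT"],
  "live_tickers": ["AAPL"]
}
"""

config = load_config(EXAMPLE)
```

Because every service would parse the same file through the same required keys, a user-added `notes` section works as a stand-in for comments without affecting any service.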
ML Training Pipeline (V4)
Language:
- Python
Technologies:
- `Pandas`
- `NumPy`
- `Scikit-learn`
- `TA-Lib`
- `Rich`
External API(s):
- YFinance
Implementation
- Reads features, labelling logic, and ML settings
- Constructs a `Pandas DataFrame` using config file settings
- Trains and dumps the model
Justification
- `Python`
  - Mature ecosystem for data and ML libraries
- `Pandas`
  - More intuitive to use than 2D `NumPy` arrays
  - Although slower than `NumPy`, runtime speed at training time is not critical
- `Rich`
  - Used for improved UX on the CLI
- `TA-Lib`
  - Mature library to compute technical indicators
  - Why write these methods again from scratch when it supports more than 150 technical indicators?
- `Scikit-learn`
  - Offers a wide selection of models to users
  - Builds upon the modularity and flexibility of the platform
- `YFinance`
  - Returns historical stock data as `DataFrames`
  - Reduces extra code for `DataFrame` conversions
Drawbacks
- `YFinance`
  - The finest data interval we can get back is 1 minute
    - This means ML models can be trained on 1-minute data at the finest
    - This puts an implied floor of 60-second cycles on the execution engine, since predictions made more often than every 60 seconds can lead to degraded performance
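One way to make the implied 60-second floor explicit is to clamp whatever cycle length is requested. The sketch below is an assumption about how this could be enforced, not how the engine does it; `MIN_BAR_SECONDS` and `effective_cycle_seconds` are hypothetical names.

```python
MIN_BAR_SECONDS = 60  # finest YFinance bar (1 minute), expressed in seconds


def effective_cycle_seconds(requested_seconds: float) -> float:
    """Clamp a requested cycle length to the 60-second floor."""
    return max(float(requested_seconds), float(MIN_BAR_SECONDS))
```

With this guard, a request for a 15-second cycle would run at 60 seconds, while a 300-second cycle would be left unchanged.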
Backtesting (V1)
Language:
- Python
Technologies:
- `Pandas`
- `NumPy`
- `Scikit-learn`
- `TA-Lib`
- `Rich`
External API(s):
- YFinance
Implementation
- Borrows the `DataFrame` fitting method from the `ML Pipeline`
- Walks an ML model down the features, making predictions and marking buy and sell signals
- Returns final P/L
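The walk described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's backtester: the `backtest` function, the binary convention (1 = buy, 0 = sell), the all-in position sizing, the absence of fees, and plain lists standing in for a `DataFrame` are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Signal = Tuple[int, str, float]  # (bar index, "buy" or "sell", price)


def backtest(
    closes: Sequence[float],
    features: Sequence[Sequence[float]],
    predict: Callable[[Sequence[float]], int],
    cash: float = 10_000.0,
) -> Tuple[float, List[Signal]]:
    """Walk the model down the feature rows; return (final P/L, signals)."""
    start_cash = cash
    shares = 0.0
    signals: List[Signal] = []
    for i, (price, row) in enumerate(zip(closes, features)):
        prediction = predict(row)
        if prediction == 1 and shares == 0:
            # Assumed convention: 1 means buy; go all-in at this close.
            shares = cash / price
            cash = 0.0
            signals.append((i, "buy", price))
        elif prediction == 0 and shares > 0:
            # Assumed convention: 0 means sell; close the whole position.
            cash = shares * price
            shares = 0.0
            signals.append((i, "sell", price))
    # Mark any position still open to market at the last close.
    final_value = cash + shares * closes[-1]
    return final_value - start_cash, signals
```

Here `predict` stands in for a trained model's per-row prediction; any callable that maps a feature row to 0 or 1 would work in this sketch.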
Justification
- See ML Pipeline
- Same `DataFrame` fitting method
  - Using the same method to fit `DataFrames` at train time and validation time ensures that a validation `DataFrame` is not built differently from the ones the model was trained on
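The idea of sharing one fitting method can be illustrated with a single builder that both training and backtesting would call. This is a sketch only: `build_rows`, the feature names `return_1` and `sma_3`, and the use of plain lists in place of a `DataFrame` are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Sequence


def build_rows(closes: Sequence[float], feature_names: Sequence[str]) -> List[List[float]]:
    """Build one feature row per bar, with columns in config order."""
    # Hypothetical features; a real pipeline would take these from the config.
    builders: Dict[str, Callable[[int], float]] = {
        "return_1": lambda i: closes[i] / closes[i - 1] - 1.0,
        "sma_3": lambda i: sum(closes[i - 2 : i + 1]) / 3.0,
    }
    warmup = 2  # bars needed before every feature above is defined
    return [
        [builders[name](i) for name in feature_names]
        for i in range(warmup, len(closes))
    ]
```

Because the column order comes only from `feature_names`, calling the same function with the same config at train time and at backtest time yields rows with an identical layout.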
Drawbacks
- Current implementation only shows the final P/L
  - This is fine for the MVP, but true performance analysis needs more data
Runtime Engine (V4)
Languages:
- Go
- Python
Technologies:
- `FastAPI`
- `Gorilla WebSocket`
- `TA-Lib`
External APIs:
- Alpaca
Implementation
- There are two pieces used during live execution
  - Runtime Engine
  - ML API
- Runtime Engine
  - Using the config file, manages features for the declared ticker(s)
  - Uses `TA-Lib` for technical indicator initialization
- ML API
  - Hosts the ML model and is parallelized to handle n inferences at once
  - Workers are spawned at a 1:1 ratio per ticker on server startup
- Combined
  - Both communicate asynchronously via WebSockets
    - `Go` sends computed features to the server
    - ML inference is run and returns an inference value
    - `Go` communicates with `Alpaca` to place orders
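The message flow above (features in, one inference out, one worker per ticker) can be sketched in a single process. This is a loose illustration, not the real system: `asyncio.Queue` objects stand in for the WebSocket connection, Python stands in for the `Go` engine, and `predict`, `worker`, `run`, and the message fields are all hypothetical.

```python
import asyncio
from typing import Dict, List


def predict(features: List[float]) -> int:
    """Placeholder model: 1 when the mean feature is positive, else 0."""
    return 1 if sum(features) / len(features) > 0 else 0


async def worker(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """One inference worker; serves a single ticker until it receives None."""
    while True:
        message = await inbox.get()
        if message is None:
            break
        reply = {"ticker": message["ticker"], "inference": predict(message["features"])}
        await outbox.put(reply)


async def run(batches: Dict[str, List[float]]) -> Dict[str, int]:
    """Spawn one worker per ticker (1:1) and collect one inference each."""
    outbox: asyncio.Queue = asyncio.Queue()
    inboxes = {ticker: asyncio.Queue() for ticker in batches}
    tasks = [asyncio.create_task(worker(q, outbox)) for q in inboxes.values()]
    # Engine side: send computed features, then signal shutdown with None.
    for ticker, features in batches.items():
        await inboxes[ticker].put({"ticker": ticker, "features": features})
        await inboxes[ticker].put(None)
    results: Dict[str, int] = {}
    for _ in batches:
        reply = await outbox.get()
        results[reply["ticker"]] = reply["inference"]
    await asyncio.gather(*tasks)
    return results


inferences = asyncio.run(run({"AAPL": [0.2, 0.1], "MSFT": [-0.3, 0.1]}))
```

In the design described here, the engine side would be `Go` using `Gorilla WebSocket` and the worker side would be the `FastAPI` server; order placement through `Alpaca` is left out of the sketch.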
Justification
- Language Split
  - Concurrency is harder in `Python` because of the GIL; `Go` is compiled and makes concurrency performant
  - Splitting between `Python` and `Go` establishes a clear separation of concerns
  - `Scikit-learn` models cannot be easily used in `Go`
- `TA-Lib`
  - Uses an established library to initialize technical indicator values
- `FastAPI`
  - Allows for parallelized ML inference by dynamically adjusting the workers on the server to match the tickers being tracked (1:1)
- `Gorilla WebSocket`
  - Rather than use `HTTP REST` to communicate with the `FastAPI` server, we use WebSockets
  - Since we constantly need to send and receive updates from the server, WebSockets reduce latency and connection overhead
- `Alpaca`
  - Chosen as the exclusive broker due to ease of integration and extensive API support for trading
Drawbacks
- Language Split
  - Introduces more complexity as the project must maintain two different languages
- `FastAPI` and `Gorilla WebSocket`
  - Introduce latency when getting ML inferences
  - Can be overcome for non-`Scikit-learn` models by using `ONNX`