Architecture

This file explains

Config Files

Language: - JSON

Implementation

Justification:

Universally recognized and well supported in Python and Go
Why not YAML?
- YAML is loosely typed compared to JSON
- JSON is easier to parse in Python and Go

Drawbacks

JSON files cannot contain comments for taking notes
- As a workaround, the user could create a section not found in config/demo.json
- The program never reads sections the user adds outside of the required sections
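
The note-taking workaround can be sketched as follows. This is a minimal illustration, not the project's actual loader: the required section names (`tickers`, `features`, `model`), the `_notes` key, and the `load_config` helper are all hypothetical, since the real layout is defined by config/demo.json.

```python
import json

# Hypothetical required section names; the real ones are defined by config/demo.json.
REQUIRED_SECTIONS = ("tickers", "features", "model")


def load_config(text: str) -> dict:
    """Parse a JSON config and keep only the required sections."""
    raw = json.loads(text)
    missing = [name for name in REQUIRED_SECTIONS if name not in raw]
    if missing:
        raise KeyError(f"missing required sections: {missing}")
    # Any other section (for example "_notes") is never read by the program.
    return {name: raw[name] for name in REQUIRED_SECTIONS}


example = """
{
  "_notes": "Switched to a 14-period RSI after testing.",
  "tickers": ["AAPL"],
  "features": {"rsi": 14},
  "model": "random_forest"
}
"""

config = load_config(example)
print(sorted(config))  # ['features', 'model', 'tickers']
```

Because the loader selects sections by name, a free-form notes section can sit next to the real settings without affecting behavior.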

Language: - Python

Technologies:

External API(s): - YFinance

Implementation

Justification

Python
- Mature ecosystem for Data and ML libraries
Pandas
- More intuitive to use than 2D NumPy arrays
- Although slower than NumPy, runtime speed at training time is not critical
Rich
- Used for improved UX on the CLI
TA-lib
- Mature library to compute technical indicators
- Why write these methods again from scratch if this supports >150 technicals?
Scikit-learn
- Offers a wide selection of models to users
- Builds upon the modularity and flexibility of the platform
YFinance
- Returns historical stock data as DataFrames
- Reduces extra code for DataFrame conversions

Drawbacks

YFinance
- The finest data interval we can get back is 1 minute
- This means ML models can be trained on 1-minute data at the finest
- This puts an implied floor of 60-second cycles on the execution engine
- Making predictions more often than every 60 seconds could degrade performance

Language: - Python

Technologies

External API(s):

Implementation

Borrows the DataFrame fitting method from the ML Pipeline
Walks an ML model down the features, making predictions and marking buy and sell signals
Returns the final P/L
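
The walk described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: plain dicts stand in for DataFrame rows, `stub_predict` is a hypothetical stand-in for a trained model, the signal encoding (1 = buy, -1 = sell, 0 = hold) is assumed, and the strategy holds at most one unit at a time.

```python
def backtest(rows, predict, price_key="close"):
    """Walk rows in order, mark buy/sell signals, and return (final P/L, signals)."""
    pnl = 0.0
    entry_price = None  # None means no open position
    signals = []
    for row in rows:
        signal = predict(row)  # assumed encoding: 1 = buy, -1 = sell, 0 = hold
        price = row[price_key]
        if signal == 1 and entry_price is None:
            entry_price = price
            signals.append("buy")
        elif signal == -1 and entry_price is not None:
            pnl += price - entry_price
            entry_price = None
            signals.append("sell")
        else:
            signals.append("hold")
    # Close any position still open at the last observed price.
    if entry_price is not None:
        pnl += rows[-1][price_key] - entry_price
    return pnl, signals


def stub_predict(row):
    """Hypothetical model: buy when RSI is low, sell when RSI is high."""
    if row["rsi"] < 30:
        return 1
    if row["rsi"] > 70:
        return -1
    return 0


rows = [
    {"close": 100, "rsi": 25},
    {"close": 104, "rsi": 50},
    {"close": 110, "rsi": 75},
    {"close": 108, "rsi": 28},
    {"close": 111, "rsi": 55},
]

final_pnl, signals = backtest(rows, stub_predict)
print(final_pnl)  # 13.0
print(signals)    # ['buy', 'hold', 'sell', 'buy', 'hold']
```

The first trade earns 10 (buy at 100, sell at 110) and the position left open at the end is closed at 111 for another 3.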

Justification

See ML Pipeline
Same DataFrame fitting method
- Using the same method to fit DataFrames at train time and validation time ensures that a validation DataFrame is never built differently from the ones the model was trained on
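
A small sketch of why a shared fitting method helps. Everything here is hypothetical: `fit_frame`, the feature names, and the bar layout are invented for illustration, and lists of dicts stand in for DataFrames.

```python
# Hypothetical feature names, in the order the model would expect them.
FEATURE_COLUMNS = ("ret_1", "bar_range")


def fit_frame(bars):
    """Turn raw price bars into feature rows with a fixed column order."""
    rows = []
    for prev, cur in zip(bars, bars[1:]):
        rows.append({
            "ret_1": cur["close"] / prev["close"] - 1.0,  # one-bar return
            "bar_range": cur["high"] - cur["low"],         # high-low range
        })
    return rows


train_bars = [
    {"high": 101.0, "low": 99.0, "close": 100.0},
    {"high": 103.0, "low": 100.0, "close": 102.0},
    {"high": 104.0, "low": 101.0, "close": 103.0},
]
valid_bars = [
    {"high": 106.0, "low": 103.0, "close": 105.0},
    {"high": 107.0, "low": 104.0, "close": 106.0},
]

# The same function builds both sets, so their columns cannot drift apart.
train_rows = fit_frame(train_bars)
valid_rows = fit_frame(valid_bars)
assert all(tuple(row) == FEATURE_COLUMNS for row in train_rows + valid_rows)
```

If training and validation each had their own feature code, a renamed or reordered column in one of them would go unnoticed until the model produced poor predictions.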

Drawbacks

The current implementation only shows the final P/L
- This is fine for the MVP, but true performance analysis needs more data

Languages:

Technologies:

External APIs:

Implementation

There are two pieces used during live execution
- Runtime Engine
- ML API
Runtime Engine
- Using the config file, manages features for declared ticker(s)
- Uses TA-lib for technical indicator initialization
ML API
- Hosts the ML model and is parallelized to handle n inferences at once
- 1:1 ratio of workers spawned per ticker on server startup
Combined
- Both communicate via websockets asynchronously
- Go sends computed features to the server
- The server runs ML inference and returns an inference value
- Go communicates with Alpaca to place orders
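
The request and response flow, with one worker per ticker, can be sketched in-process as follows. This is only an analogy under loud assumptions: `asyncio` queues stand in for the websocket connections, `stub_model` is a hypothetical stand-in for the hosted ML model, and the `main` coroutine plays the role of the Go runtime engine. It does not use FastAPI, Gorilla WebSocket, or Alpaca.

```python
import asyncio


def stub_model(features):
    """Hypothetical stand-in for a trained model: mean of the feature values."""
    return sum(features) / len(features)


async def ticker_worker(inbox, outbox):
    """One worker per ticker: receive features, run inference, send the value back."""
    while True:
        features = await inbox.get()
        if features is None:  # shutdown sentinel
            break
        await outbox.put(stub_model(features))


async def main(tickers):
    # 1:1 ratio: one worker and one queue pair per ticker, created at startup.
    channels = {t: (asyncio.Queue(), asyncio.Queue()) for t in tickers}
    workers = [
        asyncio.create_task(ticker_worker(inbox, outbox))
        for inbox, outbox in channels.values()
    ]
    results = {}
    # Runtime engine role: send computed features, wait for the inference value.
    for i, (ticker, (inbox, outbox)) in enumerate(channels.items()):
        await inbox.put([1.0 + i, 2.0 + i, 3.0 + i])
        results[ticker] = await outbox.get()
    # Ask every worker to stop, then wait for them to finish.
    for inbox, _ in channels.values():
        await inbox.put(None)
    await asyncio.gather(*workers)
    return results


results = asyncio.run(main(["AAPL", "MSFT"]))
print(results)  # {'AAPL': 2.0, 'MSFT': 3.0}
```

In the real system the order placement step would follow once the inference value arrives; it is left out here because it depends on the broker API.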

Justification

Language Split
- Concurrency is harder in Python because of the GIL; Go makes it straightforward and, being compiled, performant
- Splitting between Python and Go establishes clear separation of concerns
- Scikit-learn models cannot be easily used in Go
TA-lib
- Uses an established library to initialize technical indicator values
FastAPI
- Allows for parallelized ML inference by dynamically adjusting the number of workers on the server to match the tickers being tracked (1:1)
Gorilla Websocket
- Rather than use HTTP REST to communicate with the FastAPI server, we use websockets
- Since we constantly need to send and receive updates from the server, websockets reduce latency and connection overhead
Alpaca
- Chosen as the exclusive broker due to ease of integration and extensive API support for trading

Drawbacks

Language Split
- Introduces more complexity as the project must maintain two different languages
FastAPI and Gorilla WebSocket
- Introduces latency to getting ML inferences
- Can be overcome for non-Scikit-learn models by using ONNX