building an API to scrape public financial data
I’ve had a side project in mind for a while: a web API that organizes financial data from publicly available SEC filings.
This is a great side project for me because:
- I would use it. Always nice to work on something that would benefit yourself.
- Others would use it (as long as it’s correct). Extracting data from SEC filings is notoriously manual, and the numbers need to be right every time. Building an API that reliably produces correct numbers is a challenging technical problem, but one where the solution is bound to be useful to the people who rely on it.
- The technical challenges that I’ve chosen for this project are novel for me:
  - Exposing an API in a new language (in this case, Rust)
  - Automating the deployment of AWS infrastructure using Terraform
  - Running asynchronous jobs to retrieve and normalize the financial data
  - Organizing the data to be client-agnostic, since this is an API-first product
what I’ve done so far:
Here’s what I’ve accomplished:
- Organized a basic Rust project using a variation of the multi-crate structure that folks tend to use for large Rust projects.
- Built a basic working REST API using Rust + Rocket 🚀
- Deployed most of the infrastructure to AWS using Terraform. There are some nuances around Fargate that I’m still working through (setting up environment variables, using AWS Secrets Manager), but overall Terraform has made it an absolute breeze to set up and tear down dozens of resources in AWS within a couple of seconds. Game changer!
- Right now I’m working on a sub-crate that I’m calling `filer-status`. Its goal is to scrape sec.gov for a given entity and store a `boolean` in the birb datastore recording whether or not the entity is an active filer. Dig into the `README` for more details!
- Once that’s done, I will probably revisit my infrastructure so that I can run `filer-status` in production and work out the process of setting up jobs/queues/workers in AWS. I’m pumped to learn this process since I’ve never done it, and I want to optimize for (1) fault tolerance and (2) monitoring/debugging.
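
To make the filer-status idea a little more concrete, here is a minimal sketch of how its core could be shaped. Everything in it is an assumption made for illustration: the names `FilerStatusStore`, `is_active_filer`, and `record_filer_status` are hypothetical, the in-memory `HashMap` only stands in for the birb datastore, and the rule "an entity is active if it has filed within the last year" is a placeholder rather than the crate's real definition. The scraping itself is left out, so the sketch starts from filing years that have already been extracted from sec.gov.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the birb datastore: entity id -> active-filer flag.
#[derive(Default)]
struct FilerStatusStore {
    statuses: HashMap<String, bool>,
}

impl FilerStatusStore {
    /// Store (or overwrite) the boolean for an entity.
    fn set_active(&mut self, entity_id: &str, is_active: bool) {
        self.statuses.insert(entity_id.to_string(), is_active);
    }

    /// Look up the stored boolean; `None` means the entity was never checked.
    fn is_active(&self, entity_id: &str) -> Option<bool> {
        self.statuses.get(entity_id).copied()
    }
}

/// Placeholder rule (an assumption): an entity is active if at least one
/// filing year falls within `max_gap_years` of `current_year`.
/// Years after `current_year` are ignored.
fn is_active_filer(filing_years: &[u16], current_year: u16, max_gap_years: u16) -> bool {
    filing_years
        .iter()
        .any(|&year| year <= current_year && current_year - year <= max_gap_years)
}

/// Decide the status from already-scraped filing years and persist it.
fn record_filer_status(
    store: &mut FilerStatusStore,
    entity_id: &str,
    filing_years: &[u16],
    current_year: u16,
) -> bool {
    let active = is_active_filer(filing_years, current_year, 1);
    store.set_active(entity_id, active);
    active
}

fn main() {
    let mut store = FilerStatusStore::default();
    // Made-up entities and years, purely for demonstration.
    record_filer_status(&mut store, "example-entity", &[2017, 2018], 2019);
    record_filer_status(&mut store, "dormant-entity", &[2009], 2019);
    println!("{:?}", store.is_active("example-entity"));
    println!("{:?}", store.is_active("dormant-entity"));
}
```

Keeping the decision (`is_active_filer`) separate from the storage step means the rule can be unit-tested without touching the network or a database, which should also help later with the fault-tolerance and debugging goals for the production job.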
You can see how I’m progressing by visiting the repository on GitHub.