Architectural Overhaul of a Historical Digitization System
My Role
Lead Full Stack Developer, System Designer, Project Manager
Related Skills
KrakenD, Azure Cloud, Docker, FastAPI, Windmill.dev, Postgres, Redis, Apache Solr, OpenStack Swift
The Brief
Since 1978, the Canadiana staff have been digitizing Canadian history for long-term preservation and access. In the '70s, this meant capturing documents on microfiche and distributing copies throughout the country. By 2011, they were using high-tech scanners, distributed storage systems, a search engine, and a library search website. Perl scripts were configured to transfer the various forms of data created by librarians and digitization staff into one single database, which acted as the "source of truth" for the library website, a monolithic Catalyst application. Over time, this script system grew outdated, complex, and error-prone. It was time to re-envision how heritage content went from paper to online web pages!
The Solution
So, what did I do?

Assess the current architecture
By the time I joined the project, the previous developers had retired or moved on. There was little documentation, and years of revisions to the Perl code had made things quite complex. To get a good handle on how the system functioned, I interviewed the digitization staff about their tooling and dove deep into the source code, creating detailed flow diagrams. I also analyzed logs and monitoring data to identify bottlenecks, and highlighted them in the flow diagrams (a sketch of that kind of log analysis follows below).
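As an illustration of the log analysis, here is a small sketch of the kind of script I mean. The log format, stage names, and item identifiers are hypothetical, since the real pipeline logs are internal:

    import re
    from collections import defaultdict
    from datetime import datetime

    # Hypothetical pipeline log line:
    # 2019-03-04T10:15:00 stage=solr-export item=example-item status=done
    LINE = re.compile(
        r"(?P<ts>\S+) stage=(?P<stage>\S+) item=(?P<item>\S+) status=(?P<status>\S+)"
    )

    def stage_durations(path):
        """Pair start/done events per (stage, item) and total the time spent in each stage."""
        starts, totals = {}, defaultdict(float)
        with open(path) as fh:
            for line in fh:
                match = LINE.match(line)
                if not match:
                    continue
                ts = datetime.fromisoformat(match["ts"])
                key = (match["stage"], match["item"])
                if match["status"] == "start":
                    starts[key] = ts
                elif match["status"] == "done" and key in starts:
                    totals[match["stage"]] += (ts - starts.pop(key)).total_seconds()
        # Slowest stages first: these are the bottleneck candidates.
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)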
Define the desired architecture
Grounded in the requirements gathered through interviews with the digitization staff, I researched open source software that could replace the old custom Perl system. Following a keep-it-simple approach, and learning from the bottlenecks found in the old system, I decided on a microservice-based architecture for the new access platform.

This would alleviate the main bottleneck: the slow, overloaded pipeline for transferring data from various sources into the main database used by the Catalyst application. That single script had to run every time anything changed, and it was designed to pull all of the data from every source for any change. This was very inefficient, delayed digitization projects, and forced complex coordination among digitization staff. Instead of one shared pipeline, each data source would get its own service that could be updated independently and exposed to the front end through its own REST API (a sketch of one such service follows below). Ditching the "single source of truth" database and removing the interdependencies between the software supporting different business processes was the way forward!

Next, I decided to integrate an open source workflow engine into the architecture to provide load balancing and run history for our automations. Instead of configuring multiple cron jobs on a server, every automation would now be queued and run in one holistic system that is aware of all the other jobs running. This ensures our servers no longer get overloaded like they used to.
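To make the "one service per data source" idea concrete, here is a minimal sketch of what one such microservice could look like in FastAPI. The endpoint path, record fields, and in-memory stand-in datastore are illustrative assumptions, not the production code:

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel

    app = FastAPI(title="Metadata Access API")  # one independent service per data source

    class Record(BaseModel):
        id: str
        title: str
        source: str

    # In-memory stand-in for the service's own datastore (Postgres in production).
    FAKE_DB = {
        "example-item": Record(id="example-item", title="Example Item", source="metadata"),
    }

    @app.get("/records/{record_id}", response_model=Record)
    def get_record(record_id: str):
        """Serve one record from this service's own data, with no shared database."""
        record = FAKE_DB.get(record_id)
        if record is None:
            raise HTTPException(status_code=404, detail="Record not found")
        return record

For the workflow engine, Windmill runs ordinary Python scripts whose entry point is a main function, queuing each invocation as a job and recording its result history. A queued automation could look roughly like this; the parameters and body are placeholders:

    # Windmill executes main() as a queued job, aware of every other job in the system.
    def main(item_id: str, force: bool = False) -> dict:
        # ...fetch the item's source data and refresh this service's datastore...
        return {"item": item_id, "status": "updated", "forced": force}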


Plan the migration
With the new architecture design ready, I broke each system component down into a project. I determined key tasks and created technical GitHub issues in their respective repositories. Then I created a user-focused development roadmap with non-technical meta-issues to report progress to non-technical stakeholders.
Implement the changes
Following an Agile Scrum methodology, I act as the Scrum Master, leading the daily stand-ups, sprint retrospectives, and sprint planning meetings. The technical stack involves quite a few different data storage systems, server-side technologies, and frameworks, each chosen to optimize ease of digitization and performance for the application or microservice it serves. The architecture components are deployed on Azure Cloud, apart from the OpenStack Swift object storage, which is managed in house (a small sketch of talking to Swift follows below).
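As a small aside on the in-house storage, a service can read digitized objects from OpenStack Swift with the python-swiftclient library. This is a hedged sketch; the auth URL, credentials, container, and object names are placeholders:

    import swiftclient

    # Placeholder credentials; in practice these come from configuration or a secrets store.
    conn = swiftclient.Connection(
        authurl="https://swift.example.org/auth/v1.0",
        user="digitization",
        key="s3cret",
    )

    # Fetch one digitized page image from its container.
    headers, image_bytes = conn.get_object("scanned-pages", "example-item/0001.jpg")
    print(headers.get("content-type"), len(image_bytes), "bytes")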


In-depth testing
By performing load tests with Locust and revising the code where performance enhancements proved necessary, I gained a baseline understanding of the resources needed to meet the system's expected load (a minimal Locust test in this style is sketched below). Beyond that, an open beta will let us soft-launch the new platform and gather further feedback, both from end users and from system performance metrics. A "see our new design" option on our existing Catalyst application will let end users try the open beta, and the old architecture will run in parallel with the new one during this period.
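For reference, a Locust load test is just a Python class; a minimal sketch in this style, with hypothetical endpoint paths, might look like:

    from locust import HttpUser, task, between

    class ReaderUser(HttpUser):
        """Simulates a patron browsing the new access platform."""
        wait_time = between(1, 5)  # pause 1-5 seconds between simulated actions

        @task(3)
        def search(self):
            # Hypothetical search endpoint, backed by Apache Solr.
            self.client.get("/search", params={"q": "confederation"})

        @task(1)
        def view_item(self):
            # Hypothetical item page served by one of the new microservices.
            self.client.get("/records/example-item")

Running locust -f locustfile.py --host <beta URL> ramps up simulated users and reports response times per endpoint.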
Deploy the new architecture
Once the beta test is complete, we will launch the new access platform and sunset the old architecture!
