Challenge: Building an analytics and data enrichment solution for a sales automation platform.
Solution: Serverless data pipeline in AWS that involves data cleanup, normalisation, enrichment.
User Group: Enterprise, SMEs, business intelligence, business analytics
The first Proof of Concept for uplifty.io involved scripts that fetch the data and store it in a relational database. This allowed us to get familiar with the semantics, volume, variety, and structure of the data fetched from the source API. The architecture choices involved:
- Plain Python for the ETL component - to allow for flexibility and quick iterations, since building an MVP always surfaces insights that can change both the details of the data transformations and the high-level data flow;
- Amazon RDS - a relational database was the obvious choice given the structured nature of the data and the analytics use case;
- Amazon ECS - as a (serverless) runtime that introduces minimal infrastructure overhead while allowing for easy local testing and for fetching data in long-running batches.
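As a rough illustration of the fetch-and-store scripts described above, here is a minimal sketch, not the project's actual code. Everything in it is an assumption made for illustration: the `fetch_records` generator stands in for the source API, the `contacts` table and its columns are hypothetical, and an in-memory SQLite database replaces the RDS instance so that the example runs locally.

```python
import sqlite3
from typing import Iterable, Iterator


def fetch_records() -> Iterator[dict]:
    """Hypothetical stand-in for paging through the source API."""
    yield {"id": 1, "email": " Ann@Example.com ", "company": "Acme"}
    yield {"id": 2, "email": "bob@example.com", "company": None}
    # The same contact delivered again by a later batch.
    yield {"id": 1, "email": "ann@example.com", "company": "Acme"}


def normalise(record: dict) -> dict:
    """Clean up one raw record: trim and lowercase the email, fill a missing company."""
    return {
        "id": record["id"],
        "email": record["email"].strip().lower(),
        "company": record["company"] or "unknown",
    }


def load(conn: sqlite3.Connection, records: Iterable[dict]) -> int:
    """Store normalised records and return the number of rows in the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts ("
        "id INTEGER PRIMARY KEY, email TEXT NOT NULL, company TEXT NOT NULL)"
    )
    rows = [normalise(r) for r in records]
    with conn:  # commit on success, roll back on error
        # INSERT OR REPLACE keeps re-runs of a batch from creating duplicates.
        conn.executemany(
            "INSERT OR REPLACE INTO contacts (id, email, company) "
            "VALUES (:id, :email, :company)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load(connection, fetch_records()))
```

Keeping the transformation in a small pure function such as `normalise` is one way to get the quick iterations mentioned above: the rule can change without touching the fetch or load steps, and it can be tested locally before a long-running batch is started.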
Because the work was largely exploratory, it involved many small proof-of-concept phases. Communicating effectively about the data, constraints, and workarounds was very important, and having a tech-savvy (though non-engineer) client was a major time saver.
The challenges of the time zone difference were overcome by two things:
- Clear and detailed writing - both by the client, who wrote well-defined requirements and product milestone documents, and by Kombinat’s engineering team, which had to communicate data issues, workarounds, and the consequences of each trade-off to the client. Written communication, when done well, offsets the problems of the time zone difference and turns asynchronous work into an advantage. Complex problems are easier to explain, analyze, and solve when written down.
- Regular calls and synchronous updates - sometimes it takes a week or two to find a solution or a workaround to a problem, and a quick progress update on a call or via Slack was important for keeping engagement high and building trust between the client and the engineering team.
The security baseline that was implemented provided an excellent foundation and a quick start. Building security in from the beginning is a minor investment, but it allows the platform to scale without major rework as its user base and visibility grow.
data analytics, data engineering, business intelligence