In the rapidly growing field of data engineering, restructuring data pipelines has become fundamental to driving business growth and operational efficiency. Manohar Sai Jasti, Software Development Engineer at Workday, shares his journey of implementing innovative solutions and ensuring scalability in data pipelines. In this interview, we explore his experiences and insights into reshaping data pipelines to empower businesses with data-driven decision-making.
What are some key projects involving data pipeline restructuring, and what results did you achieve?
When I worked at Stord, a leading cloud supply chain and fulfillment platform, I was the sole data engineer there. My responsibility was to lead several critical projects that reshaped our data infrastructure. One of the most significant was the Log-Based Replication (LBR) Migration project, which I spearheaded in collaboration with our Site Reliability Engineering (SRE) team.
Before this project, we faced substantial data discrepancies between our source system and BigQuery, which led to inefficiencies and slower data updates. The migration yielded remarkable results.
To be precise, we achieved cost savings of $72,000 per year, or $6,000 per month. Data discrepancies were virtually eliminated, falling by almost 100%. Data refresh rates also improved by at least 30%.
This project was a huge undertaking and affected all the major datasets for both Stord One Commerce and Stord One Warehouse, which are cloud-based order management and warehouse management products. Because of these results, I was recognized with the “Performance Driver” award.
Another key project was the Essential Orders Dataflow Enhancement. I owned this critical data flow, where the goal was to consolidate information across Stord’s legacy and new systems. This project significantly improved our data aggregation and reporting capabilities. Its main benefit was providing logistics customers with detailed and accurate insights into their supply chain operations.
Additionally, I completed all data-side migrations from Veracore to Stord One Commerce, which was a big customer obsession win. This migration improved operational efficiency, grew revenue, and enhanced our services.
Currently, as an Analytics Engineer at Workday since May 2024, I’m involved in developing and maintaining robust data transformation pipelines. I’m part of the Performance, Resilience, and Scalability (PRS) Engineering Tools Group. My role involves developing a complete data pipeline, from data warehouse to data science applications, empowering Workmates with data-driven decisions at their fingertips.
Here, I’ve been extensively leveraging DBT, the data build tool, to enhance our FinOps practices and create models that ingest and transform billing data from various cloud providers. This work has improved our ability to analyze costs across our multi-cloud infrastructure, providing valuable insights for resource allocation and spend optimization.
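To make the idea of ingesting billing data from several cloud providers into one model more concrete, here is a minimal sketch. It is written in Python rather than as a DBT model, and the provider names, column names, and sample figures are all hypothetical; it only illustrates mapping differently shaped billing records onto one shared schema before totaling costs.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical mapping from each provider's export columns to a shared schema.
# Real billing exports use their own column names; these are placeholders.
FIELD_MAP = {
    "provider_a": {"service": "product", "cost": "amount"},
    "provider_b": {"service": "service_name", "cost": "charge"},
}


def normalize(provider, record):
    """Translate one raw billing record into the shared schema."""
    fields = FIELD_MAP[provider]
    return {
        "provider": provider,
        "service": record[fields["service"]],
        # Decimal avoids floating-point rounding errors on currency values.
        "cost_usd": Decimal(str(record[fields["cost"]])),
    }


def total_by_provider(records):
    """Sum normalized costs per provider."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for rec in records:
        totals[rec["provider"]] += rec["cost_usd"]
    return dict(totals)


# Hypothetical sample data, not real billing figures.
raw = [
    ("provider_a", {"product": "compute", "amount": "12.50"}),
    ("provider_b", {"service_name": "storage", "charge": "3.25"}),
    ("provider_a", {"product": "storage", "amount": "1.50"}),
]
normalized = [normalize(provider, record) for provider, record in raw]
totals = total_by_provider(normalized)
```

Once every record shares the same fields, cost comparisons across providers reduce to ordinary grouping and aggregation.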
Data product governance is crucial for preventing siloed development and ensuring consistent, high-quality data assets across an organization. In my current role at Workday, I’ve been addressing this challenge by implementing comprehensive data governance practices for the data products used by analysts, data scientists, and others. These practices include cross-functional collaboration, standardization, access management, and data pipeline life cycle management.
Scalability and flexibility are cornerstones of any robust data infrastructure. How do you ensure your systems can scale seamlessly while supporting business growth?
Scalability and flexibility are indeed critical in our work, and they were especially so at Stord. We had rapidly expanded our cloud supply chain services, and to support this growth and ensure that all new solutions stayed flexible, I focused on several key areas.
The first was query performance improvements. I restructured our data infrastructure by strategically separating fact tables. This restructuring dramatically enhanced query performance and optimized data retrieval processes for Stord’s complex logistics operations.
Another key area was the transition to DBT (Data Build Tool). I moved critical data processing logic that powers most of our dashboards from traditional stored procedures to DBT. This paid off: overall operational efficiency and our alerting systems both improved. As a result, it became easier to adapt to new requirements without overhauling the entire system.
Comprehensive alerting and monitoring were also a priority. I implemented 100% alerting and monitoring coverage across all pipelines and critical processes. This minimized data downtime and improved our ability to respond quickly to issues.
In my current role at Workday, I continue to focus on scalability and flexibility. I use a range of tools, including DBT, Trino/Presto, Jupyter Notebooks, Python, Apache Airflow, AWS RDS, MySQL/PostgreSQL, and Git for data processing and analysis.
What steps have you taken to modernize data processing workflows, and how have these improvements impacted efficiency and accuracy?
At Stord, one of the most impactful changes I made in modernizing data workflows was the Log-Based Replication Migration. It solved data accuracy issues, improved refresh rates, and cut costs, which helped us provide real-time insights into logistics operations.
I also introduced DBT to manage critical data processes. This allowed us to handle data more efficiently and made it easier for team members to work together on updates.
Another project involved improving how we handle master order data. These updates gave us a clearer picture of warehouse activities and made our reports more valuable for customers.
At Workday, I’ve focused on multi-cloud infrastructure, developing pipelines that ensure accurate and up-to-date data for cost analysis. These improvements have helped teams make decisions faster and with more confidence.
Let’s talk innovation: how have automated monitoring and machine learning shaped your approach to managing data?
At Stord, innovation was all about staying ahead in how we managed data. One major improvement was introducing automated monitoring and alerting for all pipelines. With 100% coverage, we could catch and fix issues before customers were affected. This was especially useful in ensuring accurate logistics tracking and reporting.
I also worked on enhancing our alerting system to handle problems like stale or duplicate data. These improvements helped us maintain high data quality and improved customer trust in our analytics.
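As a rough illustration of what stale and duplicate data checks can look like, the following Python sketch flags a dataset whose newest record is older than a freshness threshold and reports repeated keys. The row layout, the `order_id` and `updated_at` field names, and the six-hour threshold are assumptions made for this example, not details of the alerting system described in the interview.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def find_duplicates(rows, key="order_id"):
    """Return the key values that appear more than once, sorted."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def is_stale(rows, now, max_age=timedelta(hours=6), ts_field="updated_at"):
    """Return True when the newest row is older than max_age, or no rows exist."""
    if not rows:
        return True
    newest = max(row[ts_field] for row in rows)
    return now - newest > max_age


def check_dataset(rows, now):
    """Collect human-readable alert messages for one dataset."""
    alerts = []
    duplicates = find_duplicates(rows)
    if duplicates:
        alerts.append(f"duplicate keys: {duplicates}")
    if is_stale(rows, now):
        alerts.append("data is stale")
    return alerts


# Hypothetical sample rows: one repeated key, newest update seven hours old.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": "A1", "updated_at": now - timedelta(hours=8)},
    {"order_id": "A1", "updated_at": now - timedelta(hours=7)},
    {"order_id": "B2", "updated_at": now - timedelta(hours=9)},
]
alerts = check_dataset(rows, now)
```

In practice, checks like these would typically run on a schedule and send their messages to whatever notification channel the team already uses.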
At Workday, I’ve continued to prioritize innovation by developing tools and processes that make our data products better. For example, I’m working on improving alerting systems to identify issues faster and create smoother workflows for our teams.
Speaking of current trends, machine learning is now transforming almost every data-driven business. Can you share how you’ve integrated machine learning into data processing and its impact on analytics quality and timeliness?
During my time at Stord, I was involved in exploring the integration of machine learning technologies into our data processing. One of my key projects was building an AI-powered chatbot in collaboration with cross-functional teams. This chatbot used generative AI to handle analytical queries, allowing users to ask questions in plain language and get SQL-based answers quickly.
We also added error-handling mechanisms that helped the chatbot learn and improve over time. This not only reduced response times for ad-hoc queries but also gave our teams faster access to the data they needed.
At Workday, I’m applying this experience to build a knowledge bot that uses generative AI. The bot is designed to help users ask questions about how to use analytics tools, cutting down the need for documentation and providing real-time support. It’s an exciting project that’s making analytics easier and faster for everyone involved.
As we wrap up, what hurdles did you face during projects like log-based replication, and how did you overcome them?
The Log-Based Replication Migration at Stord had its share of challenges. The main technical hurdle was the complexity of supply chain data. It was also important to integrate the new system without disrupting ongoing logistics operations.
We sometimes ran into unexpected problems, which we called “black swan” issues, after making updates to master orders logic. These required deep troubleshooting and teamwork to resolve.
To tackle these challenges, I made sure to test thoroughly at every step. I worked closely with the SRE team to resolve technical problems and collaborated with stakeholders to keep everyone aligned on goals.
In my current role at Workday, I’ve faced different challenges related to multi-cloud infrastructure. For example, ensuring data accuracy across different cloud platforms is critical. To solve this, I built checks to validate data and created a system to flag stale data before it affected customers. This proactive approach has helped ensure our analytics are always reliable and up-to-date.