Python Automation Developer for Data Cleansing Project
Description
Large datasets often contain inconsistencies, duplicates, and formatting issues that slow down business processes. This role focuses on building Python automation scripts that clean and organize data efficiently, so that downstream systems receive high-quality inputs.
The developer will rely heavily on Pandas for data manipulation, along with SQL for structured database operations. APIs may also be integrated to fetch and validate information in real time, making the process more dynamic and reliable.
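As a rough illustration of how Pandas and SQL might be combined for validation, here is a minimal sketch. The `orders` table, its columns, and the in-memory SQLite database are assumptions made for the example; the project's actual database and schema are not specified in this posting.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_email TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "a@example.com", 20.0),
        (2, "a@example.com", 20.0),
        (2, "b@example.com", 35.5),  # repeated order_id
        (3, None, 12.0),             # missing email
    ],
)

# SQL-side validation: find order IDs that occur more than once.
duplicate_ids = pd.read_sql_query(
    "SELECT order_id, COUNT(*) AS n FROM orders "
    "GROUP BY order_id HAVING COUNT(*) > 1",
    conn,
)

# Pandas-side validation: find rows with a missing email.
orders = pd.read_sql_query("SELECT * FROM orders", conn)
missing_email = orders[orders["customer_email"].isna()]

conn.close()
```

Splitting the checks this way keeps aggregate checks in the database and row-level inspection in Pandas; other divisions of labor are equally reasonable.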
Automation is key to this project. Instead of manual data checks, scripts will run scheduled tasks to detect errors, standardize fields, and maintain consistency across multiple sources. This minimizes human error and increases operational speed.
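The kind of script this involves might look like the sketch below, which standardizes fields, removes duplicates, and flags missing values. The `clean_customers` function and the `name` and `email` columns are hypothetical, chosen only for illustration; a scheduler such as cron or Airflow would call a function like this on a fixed interval.

```python
import pandas as pd


def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize text fields, drop duplicates, and flag missing emails."""
    out = df.copy()
    # Standardize fields: trim whitespace and normalize letter case.
    out["name"] = out["name"].str.strip().str.title()
    out["email"] = out["email"].str.strip().str.lower()
    # Drop rows that become identical once the fields are standardized.
    out = out.drop_duplicates(subset=["name", "email"], keep="first")
    # Flag missing emails rather than silently dropping the rows.
    out["email_missing"] = out["email"].isna()
    return out.reset_index(drop=True)


# Hypothetical sample input with inconsistent formatting.
raw = pd.DataFrame(
    {
        "name": ["  alice smith", "Alice Smith ", "bob jones"],
        "email": ["Alice@Example.com", "alice@example.com ", None],
    }
)
cleaned = clean_customers(raw)
```

Here the two "Alice Smith" rows collapse into one after standardization, and the row without an email is kept but flagged for follow-up.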
Core responsibilities and skills include:
Writing Python scripts for automated data cleansing
Using Pandas for advanced transformations and aggregations
Designing SQL queries for validation and optimization
Integrating APIs for external data verification
Automating workflows with schedulers (cron, Airflow, etc.)
Handling duplicates, missing values, and outliers
Creating reusable functions for long-term scalability
Logging and monitoring script performance
Documenting processes for team handover
Testing scripts to ensure reliability in production
Maintaining compliance with data securi...