Data Transformation Made Easy: How to Build an Agile Data Transformation Stack
With today’s abundance of data (big or small), an organization’s ability to capture, understand, and process new content is key to its success. As lead developer and architect of the Provider Directory, Martin Magdinier has developed a custom data transformation stack to integrate over a hundred eclectic data feeds into a single repository. His process goes through three stages:
- Data discovery and exploration,
- Rapid data transformation prototyping and
- Automation of the data cleaning and transformation process.
In his presentation, Martin will review the challenges specific to each step of the integration process. He will describe the tools used (OpenRefine, Talend, CrowdFlower) and the processes developed to address those challenges while keeping the overall stack agile and flexible.
Since 2007, Martin Magdinier has been engaged with innovative startup and open data communities in France, Vietnam, and Canada. Through his recent projects (TTCPass, 2012 Google Places API Developer Challenge Judge’s Choice Award and Objectif Neige), his involvement with the OpenRefine community (reviewer of Using OpenRefine, published by Packt), and his consulting positions, Martin has become intimately familiar with data massaging and transformation techniques. Approaching data from a business perspective, he focuses on data management and transformation tools that empower the business user.
- Martin Magdinier, Data Transformation Made Easy