Creating Perl Modules for Data Quality
Introduction:
A brief history of how I started to create Perl modules for data quality.
The Modules:
A run-through of the design and functionality of modules for name and address parsing and record deduplication. A description of creating a formal grammar with Parse-RecDescent. Using regression testing to validate changes. Creating usable documentation. See http://search.cpan.org/~kimryan
How hard was it to develop CPAN modules:
Amount of time needed to create modules and keep them up to date. Dealing with complexity.
Building on the work of others:
How to find the best CPAN modules, which can save you from reinventing the wheel. Researching current designs and algorithms.
Making sure it works:
Building up a base of users who can assist with testing and supplying sample data. Handling requests for bug fixes and enhancements. Keeping control of the scope.
Getting people to use it:
How to promote your module to the Perl community, other programmers and end users.
How does it stack up against commercial software:
A comparison of the features and accuracy of Perl modules with their commercial equivalents.
What's next:
Other data quality modules that still need to be developed
GUI and command line interfaces
SourceForge hosting
Integration with other tools, data warehousing, ETL
Keywords: Perl, CPAN, Data Quality, Parsing, Software Design
Kim Ryan
Software Developer, Self-employed
Ref: OS5P0019


