Open Source Developers' Conference 2005

Creating Perl Modules for Data Quality

By: Kim Ryan

Introduction: A brief history of how I started to create Perl modules for data quality.

The Modules: A run-through of the design and functionality of modules for name and address parsing and record deduping. Creating a formal grammar with Parse-RecDescent. Using regression testing to validate changes. Creating usable documentation. See http://search.cpan.org/~kimryan

How hard was it to develop CPAN modules: Amount of time needed to create modules and keep them up to date. Dealing with complexity.

Building on the work of others: How to find the best CPAN modules that can save you from reinventing the wheel. Researching current designs and algorithms.

Making sure it works: Building up a base of users who can assist with testing and supplying sample data. Handling requests for bug fixes and enhancements. Keeping control of the scope.

Getting people to use it: How to promote your module to the Perl community, other programmers and end users.

How does it stack up against commercial software: Comparing the features and accuracy of the Perl modules with their commercial equivalents.

What's next:
Other data quality modules that still need to be developed
GUI and command-line interfaces
SourceForge hosting
Integration with other tools, data warehousing, ETL


Keywords: Perl, CPAN, Data Quality, Parsing, Software Design
Stream: Perl
Presentation Type: 30 minute Paper Presentation in English
Paper: Creating Perl Modules for Data Quality


Kim Ryan

Software Developer, Self employed
Australia

I have authored five CPAN modules, mainly in the areas of data quality and text parsing. For details see http://search.cpan.org/~kimryan/

Ref: OS5P0019

 
 
Copyright © 2005
OSDC 2005 hosted by Melbourne Perl Mongers
For further information contact Scott Penrose
Hosting provided by Digital Dimensions and DList
Web site and logo design by Amanda Penrose