IRMAC Business Intelligence/Data Warehouse Special Interest Group 


An Introduction to Text Mining

Overview

The term and concept of data mining has been around for almost a decade and has come to represent an approach to the analysis of data and to the understanding of patterns or relationships within that data. Less well known, in terms of both its methodology and its application, is the concept of text mining. Yet it is commonly estimated that about 80% of information is in the form of unstructured text. This seminar has been designed to discuss document warehousing, text mining, and the use of text mining together with data mining. It aims to provide an understanding of the potential for applying automated methods to extract concepts from text, and then using those concepts as the inputs to a data mining exercise.

Outline

  1. Agenda
  2. An Overview of Document Warehousing
  3. Understanding Unstructured Text
    1. Unstructured text
    2. Natural Language Processing
    3. The concept of a concept
  4. Concept Extraction
  5. Text Mining
    1. Working with text
    2. Combining text and data
  6. Data Mining
    1. The Cross Industry Standard Process for Data Mining (CRISP-DM)
    2. An example project, end to end
  7. An Introduction to Categorization
    1. Rules of language
    2. Learning by examples
    3. Using categories

Biography

Tim Daciuk has worked with SPSS Inc. for 10 years, and with SPSS software for over 20 years. He started working with SPSS software in 1980 during his undergraduate studies at the University of Toronto. Tim has worked with SPSS as a consultant for 9 years and, for the past two years, has been Services Manager, Canada, for SPSS. His role involves assisting with sales, training and consulting for SPSS in Canada, the United States, Europe and Asia. In training, Tim has developed courses for SPSS and has taught SPSS and related product courses in North America, Europe and Asia, including both scheduled SPSS courses and customized courses in the private and public sectors. Tim has worked with statistical analysis, data mining, text analysis and text mining software.


Time and Location of Meeting

Meeting Details: Tuesday, October 28, 2003

Location: Beeton Room West, Toronto Reference Library, Yonge Street at Asquith Ave. (one block north of Bloor), Toronto

5:45 pm: Registration for a prompt 6:00 pm start
7:30 pm: Expected finish

No need to pre-register - just show up!

No charge for IRMAC members. Non-members: $15.

For questions, please contact


Copyright © 2002-3 Information Resource Management Association of Canada. All rights reserved.
Revised: 2003-11-06.