The Park Library, School of Journalism and Mass Communication
University of North Carolina at Chapel Hill

JOMC 54: DATA MINING*

Data Miners

Databases

Data Sources (U.S.) 

Data Sources (N.C.)

Polling Data and Public Opinion Research

"Dirty Data, Poisoned Archival Wells, and Setting the Record Straight."

Data Miners and Analysts 

Blogs: Reporters and News Researchers:

Data Deceptions

It's hard to distinguish between "good" and "bad" data. Seek the advice and guidance of data miners (listed above) to help sort one from the other. Remember that any data can be one or more of the following:

  • incomplete

  • incorrect 

  • out of date (example from Johns Hopkins)

  • out of context (example: Newspaper Research Journal article)
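
Two of the problems listed above, incomplete and out-of-date data, can be screened for mechanically. The following is a minimal sketch, not a standard tool: the function name `flag_record`, the field names, the one-year threshold, and the sample record are all hypothetical. Incorrect or out-of-context data still has to be checked by a person against the original source.

```python
from datetime import date

def flag_record(record, required_fields, today, max_age_days=365):
    """Return a list of warnings for one record (a plain dict)."""
    warnings = []
    # Incomplete: a required field is absent, None, or an empty string.
    missing = [f for f in required_fields if record.get(f) in (None, "")]
    if missing:
        warnings.append("incomplete: missing " + ", ".join(missing))
    # Out of date: the record's last update is older than the allowed age.
    updated = record.get("last_updated")
    if updated is not None and (today - updated).days > max_age_days:
        warnings.append("out of date: last updated " + updated.isoformat())
    return warnings

# Made-up sample record, for illustration only.
sample = {"county": "Orange", "population": None, "last_updated": date(2000, 4, 1)}
result = flag_record(sample, ["county", "population"], today=date(2003, 9, 1))
print(result)
```

A record that passes both checks can still be wrong; the sketch only narrows down which records to question first.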

Databases

Databases (in Park Library) [Floor plan of the Park Library]

Databases (in UNC-CH Online Resources) 

Reference Sources (Print)

  • Almanac of Higher Education (in Park Library)

  • The Encyclopedia of Associations (in Park Library)

The Encyclopedia of Associations is a comprehensive source of detailed information on over 135,000 nonprofit membership organizations worldwide. It corresponds to the printed Encyclopedia of Associations family of publications as follows: National Organizations of the U.S., which covers more than 22,200 American associations of national scope; International Organizations, which covers some 22,300 multi-national, bi-national, and non-U.S. national associations; and Regional, State, and Local Organizations, which covers more than 115,000 U.S. associations with interstate, state, intrastate, city, or local scope or membership. The Encyclopedia of Associations database provides addresses and descriptions of professional societies, trade associations, labor unions, cultural and religious organizations, fan clubs, and other groups of all types.

Handouts (in Park Library)

Beat Reporting Ideas, Handouts, and Tips 

  • IRE (Investigative Reporters & Editors)

  • Backgrounding People on the Internet

  • Most Wanted: How to Find People

  • Internet Sites by the Beat (crime/courts, health/medicine, education, environment, business, government, telephone directories, and more)

  • Advanced Internet Searching Tips

  • Unlocking the Invisible Web

  • The Places Journalists Should Go 

Polls and Public Opinion:

Data Sources (U.S.)

Statistical Abstract of the United States
    
Links to full data tables in PDF.

Statistical Data Warehouse
     Tables from the Centers for Disease Control on mortality, leading causes of death, and live births.

National Center for Disease Control: Health Statistics
     State and territorial data on diseases and accidents. Links to health surveys with downloadable data. 

U.S. Statistical Reference Shelf
    
Note especially the link to "Briefing Rooms," which gives access to the latest federal data on economic and social indicators.

Statistical CenStats Database
   
Building permit statistics on new residential and nonresidential construction for individual municipalities. Updated monthly. Also includes a Census Tract Street Locator, County Business Patterns, and Detailed Occupation by Race, Hispanic Origin, and Sex. The last of these is drawn from 1990 Census data.

Center for Media Research
     For planners and buyers of advertising media. Some data are restricted, but free reports and studies are available.

FirstGov for Citizens: Facts and Figures about Your Community
    
Access to data, statistics, graphics and photos (current and historical), laws and regulations on topics of general interest.

U.S. Department of State
    
Offers links to fact sheets, special reports, Background Notes & Links to Country Pages.

Consumer Data and Statistics
    
Ratings and media use. 

AdAge Data Center
    
Selected advertising and marketing data.

AdAge Annual Ranking of the top 300 U.S. Magazines

Gasoline Prices: Historical     

African-American History Month (February): U.S. Census Fact Sheet
    
Print source: The African American Encyclopedia, 2001 (in The Park Library)

2004 U.S. Presidential Election: Facts and Trends

Data Sources (N.C.)

North Carolina Demographic & Statistical Data  
    
Database from the North Carolina State Data Center.  Provides access to over 1200 data series from state and federal agencies.

North Carolina State Government Employee Telephone Directory (searchable database)

North Carolina Budget: "Our State, Our Money: A Citizen's Guide to the North Carolina Budget." Prepared by the N.C. Progress Board. [Note: Report can be downloaded by logging on to The Progress Board's web site.]

North Carolina County Population Estimates by Age, Race and Hispanic Origin

North Carolina Criminal Justice sites:

North Carolina State Board of Elections

Official Web Site for State of North Carolina

UNC-CH School of Government

North Carolina Counties

Find NC: Central Gateway to NC Government Information

North Carolina Health Info

North Carolina Crime Statistics: 2002 Annual Summary Report

North Carolina Criminal Records Check

North Carolina Department of Justice

North Carolina Department of Public Instruction

North Carolina Department of Commerce: County Economic Profiles

DIRTY DATA

Article: "How Sources, Reporters View Math Errors in News." By Scott Maier. Newspaper Research Journal, Fall 2003, pages 49-63.

Johns Hopkins University Medical Research Error

Journalists who have lied and plagiarized (partial list from Poynter):
     Jayson Blair of The New York Times; Janet Cooke of The Washington Post; Jack Kelley, who just resigned from USA Today; Charlie LeDuff and Bernard Weinraub of The New York Times; Frank Deford of Sports Illustrated; Ben McCarthy, formerly of the University Daily Kansan; Tonya Dawson and Demetra Karamanos of the Cavalier Daily at the University of Virginia; and Catherine Fitzpatrick, formerly of the Milwaukee Journal Sentinel.

Semonche's Guide for "Fact Checkers"

Data Mining Definitions

The currently accepted definition of "data mining" as used by information analysts is something like this: 
     Data mining is sorting through vast amounts of data to
     identify unique patterns and establish relationships.
     Sometimes data mining is referred to as ". . . a form of
     artificial intelligence that uses automated processes to
     find information." This technique involves clustering, link
     analysis, association/sequence analysis and summarization.
     Examples: relationship between migraine headaches and
     magnesium deficiency; connection between estrogen and
     Alzheimer's disease.
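
As a rough illustration of the association analysis mentioned above, the sketch below counts how often pairs of attributes appear together in a set of records. It is a toy example under stated assumptions, not a production data mining tool: the function name `pair_support` is hypothetical, and the records are invented and do not represent real medical findings.

```python
from collections import Counter
from itertools import combinations

# Invented records: each set lists the attributes noted for one case.
records = [
    {"migraine", "low magnesium"},
    {"migraine", "low magnesium", "insomnia"},
    {"insomnia"},
    {"migraine"},
]

def pair_support(rows):
    """Return the share of rows in which each pair of attributes appears together."""
    counts = Counter()
    for row in rows:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(row), 2):
            counts[pair] += 1
    return {pair: n / len(rows) for pair, n in counts.items()}

support = pair_support(records)
for pair, share in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, share)
```

A pair that turns up together often is only a lead to investigate. Co-occurrence alone does not show that one attribute causes the other.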

However, a general interpretation of the phrase runs something like this: 
     Digging for hard-to-get nuggets of data,
     such as reliable, valid statistics.


Copyright 2003 - The Park Library - School of Journalism and Mass Communication -
University of North Carolina at Chapel Hill