Category Archives: Data Sets

Funding Visualization Tool

Libraries are overlooked and underfunded organizations that play a critical role in today’s society by providing free programs, resources, and services to millions of adults, children, and youth every day across the United States. But many libraries lack the resources and support to innovate and build on the ways they meet their communities’ needs.

With support from the John S. and James L. Knight Foundation, Foundation Center developed the Visualizing Funding for Libraries data tool to help libraries and their supporters find funding opportunities, better understand funding sources, and track funding trends: http://libraries.foundationcenter.org/

Collaborative Information Seeking Lab Experiments Dataset

Rutgers University announces the availability of the Collaborative Information Seeking Lab Experiments Dataset. The data come from a set of lab experiments conducted by Chirag Shah and Roberto Gonzalez-Ibanez at Rutgers University in 2010-2011. The dataset contains interaction logs (queries, page visits, relevance judgments, and collected snippets) from a total of 160 participants in 80 teams, with each team working on an exploratory search task for about 30 minutes in a controlled lab setting. The data were collected using Coagmento (http://coagmento.org/). The dataset can be downloaded from http://infoseeking.org/data.php#cis2010
The following is a small selection of papers that use the data and can be cited:
  • Shah, C., and Gonzalez-Ibanez, R. (2011). Evaluating the synergic effect of collaboration in information seeking. Proceedings of ACM SIGIR, pp. 913-922. Beijing, China.
  • Shah, C., and Gonzalez-Ibanez, R. (2012). Spatial context in collaborative information seeking. Journal of Information Science (JIS), 38(4), 333-349.
  • Gonzalez-Ibanez, R., Haseki, M., and Shah, C. (2013). Let’s search together, but not too close! An analysis of communication and performance in collaborative information seeking. Information Processing & Management, 49(5), 1165-1179.

Note that this dataset is also being used for the Second International Workshop on the Evaluation on Collaborative Information Seeking and Retrieval (ECol) to be held at the ACM CHIIR 2017 conference in Oslo, Norway on March 11, 2017. More details are here: https://www.irit.fr/ECol2017/.

The authors hope this dataset, which took months to collect, will be a useful resource for researchers working on interactive IR and on social/collaborative search.

California Data Library

The California Employment Development Department (EDD) announces its new online Data Library, where users can access California labor market information. The new portal gives users a single point of access to search, view, and download current and historical data on California industries, occupations, employment projections, wages, and labor force. “EDD’s new Data Library improves the way we deliver California labor market information, provides users with a new streamlined search tool for everything from finding localized data to developing custom data presentations, and supplies the foundation for enhanced features in the future,” EDD Director Patrick W. Henning Jr. said. “The EDD is proud to partner with other California agencies in an open government initiative making public data easier to access and use.” To read more, go to http://edd.ca.gov/About_EDD/pdf/nwsrel16-29.pdf

Open Data Resources

Open data is data that “can be freely used, modified and shared by anyone for any purpose” (http://opendefinition.org/). Open data increases access, preservation, and impact. Increasingly, it is also a requirement as part of federal funding.

Librarians can facilitate the collection and organization of open data, as well as physical and intellectual access to it. Librarians can also help researchers with grantsmanship.

American Library Association’s EBSS section suggests these resources:

Here is another list of free data sets: http://www.datasciencecentral.com/forum/topics/more-free-data-sets

Data Set Repositories

Data Science Central lists these free or public data set repositories:

More data sets can be found here.

Open Data Sources

Zygimantas Jacikevicius lists the top free data sources available online: http://www.datasciencecentral.com/profiles/blog/show?id=6448529%3ABlogPost%3A390993

1. Data.gov.uk: the UK government’s open data portal, including the British National Bibliography (metadata on all UK books and publications since 1950).

2. Data.gov: search through 194,832 USA data sets on topics ranging from education to agriculture.

3. US Census Bureau: the latest population, behaviour, and economic data in the USA.

4. Socrata: a software provider that works with governments to provide open data to the public; it also has its own open data network to explore.

5. European Union Open Data Portal: thousands of datasets on a broad range of topics in the European Union.

6. DBpedia: a crowdsourced community trying to create a public database of all Wikipedia entries.

7. The New York Times: a searchable archive of all New York Times articles from 1851 to today.

8. Dataportals.org: datasets from all around the world collected in one place.

9. The World Factbook: information prepared by the CIA about what seems like all of the countries of the world.

10. NHS Health and Social Care Information Centre: data sets from the UK National Health Service.

11. Healthdata.gov: detailed USA healthcare data covering a wide range of health-related topics.

12. UNICEF: statistics about the situation of children and women around the world.

13. World Health Organization: statistics concerning nutrition, disease, and health.

14. Amazon Web Services: a large repository of interesting data sets, including the Human Genome Project, NASA’s database, and an index of 5 billion web pages.

15. Google Public Data Explorer: search through the repositories already mentioned and lesser-known open data repositories.

16. Gapminder: a collection of datasets from the World Health Organization and World Bank covering economic, medical, and social statistics.

17. Google Trends: analyse the shift of searches throughout the years.

18. Google Finance: real-time finance data going back as far as 40 years.

19. UCI Machine Learning Repository: a collection of databases for the machine learning community.

20. National Climatic Data Center: the world’s largest archive of climate data.

Free Mapping Online

To promote the spatial study of the US and China, the University of Michigan Spatial Data Center and China Data Center are pleased to jointly announce a new release of “Free Mapping Online”. This web-based spatial system offers tens of thousands of free maps built from US and China census data and business data. It also allows users to upload their own data (as an Excel file) and make US or China maps online without any GIS tools or experience. This service is free to the public. See details at http://spatialdataonline.org.

Here is a list of maps currently available in “Free Mapping Online”:

  • US Census Data: 1970, 1980, 1990, 2000, and 2010 (State, Metropolitan, County)
  • US County Business Patterns: 1986-2012 (State, Metropolitan, County)
  • US Business Data: 1997-2013 (State, Metropolitan, County)
  • China Census Data: 2000 and 2010 (Province, City, County)
  • China Economic Census Data: 2004, 2008 (Province, City, County)
  • China Basic Unit Census Data: 2001 (Province, City, County)
  • China Industrial Census Data: 1995 (Province, City, County)
  • China Land Use Data: 1990, 1995, 2000, 2005, 2010 (Province, City, County)
  • China Nighttime Light Data: 1992-2011 (Province, City, County)

This service is compatible with IE6/7, Firefox 3, or Chrome. Flash Player 9 or higher is required. Please contact spatialdata@umich.edu if you have any questions.