Journal of eScience Librarianship

In February, librarians at the University of Massachusetts Medical School’s Lamar Soutter Library announced the publication of the inaugural issue of the Journal of eScience Librarianship (JESLIB). The open access journal will be published entirely online, with three issues coming out each year – you can read more about JESLIB’s mission and scope here. The journal establishes a platform for scholarly literature pertaining to eScience activities in libraries. Librarians working in and around Massachusetts are heavily represented in JESLIB’s first issue, and about half of the articles are authored or co-authored by UMass faculty. But the journal has a clear and inviting article submission policy, which will hopefully spawn discourse between librarians at different institutions who are trying to provide similar services.

Posted in Uncategorized | Comments Off

What We’re Reading: Data Reproducibility. Should we have a “DataMed Central”?

We always find it useful when large-circulation journals like Science or Nature tackle the data issue, especially when they devote a special issue to this topic.  They typically take a structured approach, providing a brief, understandable overview of the topic at hand, followed by specific examples or case studies as illustration.

Science’s special issue, on Data Replication and Reproducibility (December 2, 2011) was the latest in our list of “make note of these helpful special issues”.  True to form, the Introduction laid out the issues, noting how “replication — the confirmation of results and conclusions from one study obtained independently in another — is considered the scientific gold standard”, but acknowledging that this concept is complicated by the amounts of data produced, the approaches taken to research, and the complexity of the question.

The first article, by Roger D. Peng from the Johns Hopkins Bloomberg School of Public Health on reproducible research in computational science, raised many interesting points about the potential for reproducibility as a minimum standard for assessing the value of scientific claims.  While full replication of a study is the gold standard for evaluating published findings, it is often not feasible.  He uses the example of environmental epidemiology to illustrate: reproducing a large cohort study designed to examine health effects of pollution would be difficult if not impossible.  Such a study is very expensive and requires a long follow-up time.

Other reproducibility barriers exist, as he notes, related to technical issues with instrumentation, cultural issues within the researcher community, and the lack of an integrated infrastructure for distributing research results.  But progress is being made, through journals like Biostatistics that encourage authors to make their work reproducible by others.  Peng suggests small steps can be made now to reach the overall goal of reproducibility.  Authors can publish their code, even if it is not “clean or beautiful”.  Free repositories exist for this purpose.  Publishing code plus data sets is another step.  His final recommendation is that the science community create a “DataMed Central” and a “CodeMed Central”, similar to PubMed Central, that is, a repository for data, metadata, and code, with links to each other and corresponding publications.
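To make the idea of linked deposits concrete, here is a minimal Python sketch of how a repository of this kind might connect its records. Nothing here comes from Peng's article or from PubMed Central: the `Record` and `Repository` classes, the record kinds, and the identifiers are all hypothetical, chosen only to illustrate data, metadata, code, and publications that point to one another.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One deposited item (hypothetical structure, for illustration only)."""
    record_id: str
    kind: str   # "data", "metadata", "code", or "publication"
    title: str
    links: set = field(default_factory=set)  # IDs of related records


class Repository:
    """Toy in-memory repository that keeps links symmetric."""

    def __init__(self):
        self._records = {}

    def deposit(self, record):
        self._records[record.record_id] = record

    def link(self, id_a, id_b):
        # Link in both directions so either record leads to the other.
        self._records[id_a].links.add(id_b)
        self._records[id_b].links.add(id_a)

    def related(self, record_id, kind):
        # Return the IDs of linked records of the requested kind, sorted.
        linked = self._records[record_id].links
        return sorted(i for i in linked if self._records[i].kind == kind)


# Made-up example records; the IDs and titles are placeholders.
repo = Repository()
repo.deposit(Record("pub-1", "publication", "Example cohort study"))
repo.deposit(Record("data-1", "data", "Example cohort data set"))
repo.deposit(Record("code-1", "code", "Example analysis scripts"))
repo.link("pub-1", "data-1")
repo.link("pub-1", "code-1")
repo.link("data-1", "code-1")

print(repo.related("pub-1", "data"))   # data sets behind the publication
print(repo.related("data-1", "code"))  # code that analyzes the data set
```

The point of the two-way links is that a reader can start from any one piece (the article, the data set, or the code) and find the others.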

What do you think of a resource like a “DataMed Central”?


The DMPTool: online guidance for writing data management plans

This past November marked the release of a new resource intended to simplify the process of writing data management plans (DMPs) for grant proposals – the DMPTool. The product is freely available, and was developed through a collaborative effort between the University of Virginia, the University of California Curation Center, the California Digital Library and several other institutions. By providing templates and sample plans that are specific to each funding agency’s guidelines, the DMPTool aims to expedite and standardize the creation of DMPs. With more funders mandating them in grant proposals, this tool will hopefully make DMP creation easier, and therefore make the grant writing process more efficient. Although the tool is currently focused on DMP requirements from the National Science Foundation (NSF), it anticipates data management imperatives from more funding agencies in the future and plans to add templates customized to fit these needs.

The DMPTool also has the ability to recognize organizational affiliation and offer advice that is specific to policies expressed by the researcher’s institution.  The system uses open-source software called Shibboleth to allow those affiliated with partner institutions to log into the DMPTool through their home institution. Right now, specific guidance is limited to researchers from the DMPTool’s founding organizations. However, the project’s emphasis on collaboration and open-source principles invites new institutions to become involved and submit guidance based on their policies regarding data and research materials. The vision for the tool is that it will offer more than just generic advice for how to write a DMP – it will help researchers create a meaningful, nuanced, discipline-specific and institutionally approved strategy for preserving the products of their research.
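The fallback from institution-specific advice to generic advice can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DMPTool's actual code: the `GUIDANCE` table, the `select_guidance` function, and the guidance strings are all hypothetical placeholders. In the real tool the affiliation comes from the researcher's institutional login; here it is simply passed in as a string.

```python
GENERIC = "generic"  # hypothetical key for guidance that applies to everyone

# Placeholder content only; these strings are not real DMPTool guidance.
GUIDANCE = {
    "NSF": {
        GENERIC: "Placeholder: general advice for an NSF data management plan.",
        "University of Virginia": "Placeholder: advice reflecting UVa policy.",
    },
}


def select_guidance(guidance, funder, institution=None):
    """Return institution-specific guidance if available, else the generic text."""
    if funder not in guidance:
        raise KeyError(f"No template for funder: {funder}")
    by_institution = guidance[funder]
    # Unknown or missing institutions fall back to the generic guidance.
    return by_institution.get(institution, by_institution[GENERIC])


print(select_guidance(GUIDANCE, "NSF", "University of Virginia"))
print(select_guidance(GUIDANCE, "NSF", "Some Other University"))
```

Adding a new partner institution in this sketch means adding one entry under the relevant funder, which mirrors how new institutions could contribute guidance without changing the generic templates.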

If you are a librarian who has been getting questions from researchers about writing DMPs, or if you are new to the landscape of data services and would like more information, then the DMPTool may be a good place to start. Our colleagues at UVa’s Scientific Data Consulting Group (SciDaC) were instrumental in planning and developing the DMPTool, so if you are interested in learning more about how it works you should check out their blog post announcing its release.

Posted in DMP, Uncategorized | Tagged | Comments Off

E-Science: What are Your Librarian Colleagues Doing?

It’s common to want to know what’s going on at other health sciences libraries, especially in emerging areas like e-science.  We had this question after working with Claude Moore Health Sciences librarians to host an e-science bootcamp educational program in March 2011.  Librarians at the event were at different stages in supporting e-science at their institutions.  Was this true of other health sciences librarians across the board?  What did this landscape look like?

We discussed methods for investigating these issues: should we embark on a benchmarking initiative?  Interview colleagues?  Perform a survey?  In the end, we decided on a survey, influenced, no doubt, by the opportunity to re-use questions from the Association of Research Libraries’ (ARL) well-designed August 2010 survey on e-science and data support services in research libraries.

We adapted ARL’s survey to create our own, adding questions specific to health sciences libraries.  It was sent to members of the Association of Academic Health Sciences Libraries (AAHSL) in September 2011.

Though the response rate was modest (27 responses), the information we gathered was interesting and useful.  At many institutions, libraries were addressing e-science and data support issues alongside campus stakeholders, chiefly information technology units.  Libraries were using their existing librarians and staff to develop and provide e-science services.  Many had supported training opportunities such as conferences and workshops so that staff could develop their skills.  Libraries were investigating technologies surrounding e-science initiatives and data, and were exploring how to support data management efforts at their institutions.

Interested in our complete findings?  See the full report.

Has your library identified e-science and data support as a priority?


The NSF DMP Requirement

As of January 18, 2011, all proposals for funding from the National Science Foundation (NSF) are required to include a plan for the management of data collected during the project.  According to the NSF, the data management plan (DMP) should be a supplementary document, no more than two pages in length, that describes “how the proposal will conform to NSF policy on the dissemination and sharing of research results.”  By encouraging investigators to outline immediate and long-term data management strategies in their proposals, the NSF is working towards ensuring a sustainable future for the research that it funds.  Data managed today are data accessible tomorrow.  The NSF isn’t alone in requiring DMPs – the National Endowment for the Humanities (NEH) has a similar DMP requirement for proposals submitted through its Office of Digital Humanities. And with a data sharing policy in place since 2003, it isn’t hard to imagine the National Institutes of Health (NIH) implementing a DMP mandate for all grants and renewals at some point.

Librarians at many institutions are recognizing that DMP requirements could have critical implications for researchers.  The danger is that researchers might view the DMP requirement as just another bureaucratic hurdle, and (without appropriate guidance) could hastily draw up a plan that either falls short of the funder’s expectations or commits them to data management practices that their institutions cannot support. Since funders (the NSF in particular) are sometimes less than helpful in describing exactly what they expect in the DMP, librarians are stepping in to synthesize guidance and policies into usable DMP templates, checklists and questionnaires for researchers.

Below are several examples of how librarians are providing DMP support services:

* University of Virginia Library’s Scientific Data Consulting Group (SciDaC)

* Georgia Tech’s Library Research Data Project

* University of Wisconsin’s Research Data Services


The Data Life Cycle

[Figure: SciDaC Data Life Cycle]

The first step in offering data management support is to identify what kinds of services are needed. But that may be easier said than done, especially for those of us who aren’t spending the better part of the day in a lab. That’s why several organizations have created models of “the data life cycle” to illustrate the fundamental steps in the research method, and to show how data management fits into that process. The figure above comes from the University of Virginia Library’s Scientific Data Consulting Group. It demonstrates that data can be re-used for as long as they are relevant to the research community. Data are more than just results – in many cases they are the foundations for new ideas, proposals and discoveries.

The data life cycle is a helpful way for non-researchers to conceptualize the significance of data management. But who outside of the research community is interested in managing data? As they’ve done with many other types of scholarly output, librarians are taking on the responsibilities of data stewardship. Librarians bring a unique perspective that combines an interest in preserving access to information with an interest in fostering innovation. The data life cycle allows librarians to assess the research process and determine where and when their expertise could help researchers. While the leap from books to bytes may seem extreme, many data management activities can easily be described in the language of traditional librarianship:

Data “Cataloging”: Documenting data at the point of collection is critical. Librarians are in a position to consult with researchers and help them describe data in formats that are standardized within a particular domain. For example, the MIT libraries provide guidance for adhering to the Data Documentation Initiative (DDI) standards for tagging social and behavioral sciences data.

Data “Weeding”: Not all data are created equal…at least that’s what some librarians are saying. Because data at different stages of analysis are often lumped together when stored, librarians at the University of Florida are using the data life cycle to separate “valuable” data from the data that researchers have exhausted.

Data “Archiving”: Many universities are looking to institutional repositories to fill the need for long-term preservation of datasets – and for the most part, these repositories are being built and operated by librarians. Check out Cornell’s DataStaR, for example.

Posted in Uncategorized | Tagged | Comments Off

What is e-Science to Libraries, Anyway?

Like many of you, at our academic health sciences library we’re seeking new ways to serve our patrons, including our researchers.  To do this we need to be more aware of and involved in the realm of e-Science, commonly described as how research has been affected by large-scale computing.  However, we often find ourselves extending this definition to include other emerging services (perhaps sometimes described as Research 2.0).

The ARL/DLF E-Science Institute (April 2011) provided a broader interpretation for e-science, to include “all aspects and types of research that are performed digitally, such as data production and curation, social interaction, publishing and scholarly communication, and the use of physical space for specialized group activities”.  

What do you think of this definition?  We agree that this seems to give a better context for library involvement as a whole.  Libraries have made strides in many of these areas already and will continue to have impact as they develop and refine services.  Our survey of academic health sciences libraries’ e-Science activities also seems to reflect that libraries are interpreting “e-Science” in many different ways.  That makes sense, given that many of us provide resources and services that reflect efforts specific to our campus communities, and these vary as well.

You’ll notice that we named this blog RaD Librarians not E-Science librarians.  That decision was based on the fact that “research and data” seems to give us a broader context for describing our services (and perhaps avoids a bit of jargon).  But we admit we’ve yet to really talk to many patrons about how they define these terms and what they mean to them.  An important next step!

Just a few thoughts on how we’re defining e-Science — no doubt we will have this conversation again.  Oh, and more survey details soon — Bart Ragon and I are compiling results and will share with the library community.


Welcome to RaD Librarians!

Welcome aboard!  We’ve created this blog as a place for health sciences librarians involved with Research and Data issues (RaD, get it?). We want to provide a forum where we can learn about and discuss issues, resources, and tools related to supporting e-science, data management, researcher networks, and more.

“We” are staff at the Claude Moore Health Sciences Library, including Andrea Horne, Pete Nagraj, and Bart Ragon. We’re directing this blog to health sciences libraries issues, thinking that it’s best to start with what we know best. But frankly, this is new territory for us, so we hope that you’ll join us on our journey to explore what issues face today’s researchers and how we can best support them.

Thanks for joining us!
