This spreadsheet and accompanying instructional document show how metadata for law review articles can be easily transferred from HeinOnline and other sources to Digital Commons batch upload spreadsheets.


Like many law libraries, Santa Clara University Law recently moved its student law reviews to an open access publishing model using the Bepress Digital Commons (DC). To have the complete holdings of each journal in our DC, we purchased the journal backfiles from HeinOnline. After the purchase, we needed to gather metadata for each article in the backfiles and upload it, along with a PDF of the article, into the DC. With potentially thousands of articles in each journal backfile, this can become a time-consuming task that requires a substantial number of staff to complete.

Faced with this task, SCU devised a method to automate the metadata gathering process using a screen scraper. The scraped metadata was then parsed into the needed format using Excel functions, and from the parsed data a final spreadsheet ready for uploading to the DC can be easily produced. This process was recently presented at the 2012 CALI conference in San Diego. A PDF version of the slides presented by David Holt and me at the CALI conference is available on the SCU Digital Commons (http://digitalcommons.law.scu.edu/librarian/8/) or at the CALI site (http://conference.cali.org/2012/sessions/moving-your-law-reviews-open-access-publishing-model-using-bepress-digital-commons).
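
For readers who prefer to see the parsing step outside of Excel, the following is a minimal sketch of the same idea in Python. It is not the SCU workbook's method, which relies on Excel functions. The layout of the scraped line ("Author; Title, Volume Journal FirstPage (Year)") and the column names are assumptions made purely for illustration; the actual scraped text and the columns in your DC batch upload spreadsheet will differ.

```python
import csv
import io
import re

# Hypothetical layout of one scraped line (an assumption, not HeinOnline's format):
#   "Author; Title, Volume Journal FirstPage (Year)"
CITATION = re.compile(
    r"^(?P<author>[^;]+);\s*"      # everything before the first semicolon
    r"(?P<title>.+),\s*"           # greedy, so commas inside the title are kept
    r"(?P<volume>\d+)\s+"
    r"(?P<journal>.+?)\s+"
    r"(?P<fpage>\d+)\s+"
    r"\((?P<year>\d{4})\)$"
)

# Assumed column names; replace them with the headers in your own spreadsheet.
FIELDS = ["title", "author", "volume", "fpage", "year"]


def parse_citation(line):
    """Split one scraped line into fields, or return None if it does not match."""
    match = CITATION.match(line.strip())
    if match is None:
        return None
    return {name: match.group(name).strip() for name in FIELDS}


def to_csv(lines):
    """Build CSV text with one row per line that could be parsed."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        row = parse_citation(line)
        if row is not None:  # unparseable lines are skipped for manual review
            writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = ["Jane Doe; Water Rights, Revisited, 12 Santa Clara L. Rev. 345 (1972)"]
    print(to_csv(sample))
```

Lines that do not match the assumed pattern are skipped here, not guessed at, so they can be corrected by hand before upload.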

In response to feedback from that presentation, SCU has created a "plug and play" version of the Excel workbook used in its process. All you need to do is plug your metadata into the workbook. The functions included in the workbook then automatically parse the data into an almost-final spreadsheet, from which the final spreadsheet, ready to upload into the DC, can be easily created. Start-to-finish instructions for the project, including scraping your metadata into spreadsheets, are provided.

Using this method should allow you to quickly gather all of the metadata for a journal from multiple sources. In most cases the metadata can be gathered in less than a day, and the process of loading the metadata into the DC can then begin. Metadata is gathered from three sources: HeinOnline, Index to Legal Periodicals and Books, and Dropbox (or another public file hosting service). See the instructions for more information.

The spreadsheets can be easily adapted for collecting and parsing all of your faculty publication data so it can be uploaded to the DC. More broadly, this process demonstrates an effective method for creating useful data sets from loosely structured information sources on the Web.

If you are interested, the Excel worksheet is presented here and the instruction file is located under "Additional Files."

If you have any comments or questions concerning our process, please do not hesitate to contact me at walexander@scu.edu or 408-554-2733.

