Data Documentation


This is the documentation page for Neighborhood Public Art in Boston, covering the data and data collection methods used. A broader introduction to the project itself can be found here.

Collecting the data 

The data for this project was originally sourced from three spreadsheets: one from Sarah Hutt, a Boston-based artist and public art consultant; one from the Boston Art Commission; and one from the Boston Public Library. In 2020, Danielle Rose, a Boston Research Center (BRC) graduate student worker who earned her MA in Public History from Northeastern University in 2021, merged the separate spreadsheets in the following steps:

  • Each entry was assigned a unique ID.
  • Because our initial goal was to create a map visualization of the artworks, the following columns were retained from the originating datasets: name of artwork, name of artist, installation dates, type of piece, description, materials, neighborhood, location description, and geographic coordinates.
  • Each column was then normalized separately in OpenRefine to fix spelling, capitalization, and punctuation discrepancies between duplicate items.
  • The data was then exported to Excel, where the entries were sorted and deduplicated.
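
The merge steps above were carried out in OpenRefine and Excel, not in code. Purely as an illustration, the same sequence (assign IDs, keep the shared columns, normalize, deduplicate) might look like the following Python sketch. The column names, the `NPA-` ID format, and the choice to match duplicates on artwork and artist name are all assumptions made for the example, not details taken from the project.

```python
import re

# Columns retained from the source spreadsheets (hypothetical column names).
KEEP = ["artwork", "artist", "installed", "type", "description",
        "materials", "neighborhood", "location", "coordinates"]


def normalize(value):
    """Collapse runs of whitespace and trim the ends; missing values become ''."""
    return re.sub(r"\s+", " ", str(value or "")).strip()


def dedupe_key(row):
    """Build a comparison key that ignores case and punctuation (assumed rule)."""
    return tuple(
        re.sub(r"[^\w\s]", "", row[col]).casefold()
        for col in ("artwork", "artist")
    )


def merge_sources(*sources):
    """Assign IDs, keep the shared columns, normalize, then drop duplicates."""
    merged = []
    seen = set()
    next_id = 1
    for rows in sources:
        for row in rows:
            # Step 1: every incoming entry gets a unique ID (format is made up).
            entry = {"id": f"NPA-{next_id:04d}"}
            next_id += 1
            # Steps 2 and 3: keep only the shared columns and normalize each one.
            for col in KEEP:
                entry[col] = normalize(row.get(col))
            # Step 4: keep the first occurrence of each artwork/artist pair.
            key = dedupe_key(entry)
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged
```

Because IDs are assigned before deduplication, as in the steps above, the surviving entries may have gaps in their ID sequence. In practice, near-duplicates often need manual review rather than an automatic key match.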

After the data was initially cleaned, Amy Ruskin, Data Engineer in Northeastern University’s Digital Scholarship Group, reviewed the spreadsheet item by item, researching the works to confirm the existing information and fill in missing information. She also deleted additional entries with unconfirmed or inaccurate information. Finally, she uploaded the initial set of items representing works and artists to Wikidata via OpenRefine.

Using Wikidata

Wikidata is a free, collaborative knowledge base hosted by the Wikimedia Foundation. We chose it as the repository of this project’s data for multiple reasons, described in detail here.

First, anyone can view and edit Wikidata, so it lets us make our data available for the general public to reuse and invites collaborators to contribute their own knowledge. The BRC’s role in this process is to facilitate the standardization and aggregation of data about Boston neighborhood public art in Wikidata, without ultimately owning or exclusively controlling the data.

On the more technical side, the structural capabilities of Wikidata make it possible to capture more complex information than we could with spreadsheets, while also offering more flexibility than a relational database. We could take advantage of the work that has already been done in Wikidata by linking to and modifying existing items in order to build up a connected world of knowledge about Boston's neighborhood public art.
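
To make the contrast with a spreadsheet concrete, here is a small Python sketch of the general idea. It is not the project's data model or the Wikidata API: the dictionary layout, the sample artwork, and the `row_to_item` helper are hypothetical. The property labels refer to the Wikidata properties "creator" (P170) and "inception" (P571). A flat row has to squeeze two artists into one cell, whereas a statement-based record can hold one statement per artist, each with its own reference.

```python
# Flat spreadsheet row: one cell per column, so co-creators share a single string.
row = {"artwork": "Example Mural", "artist": "A. Artist; B. Painter", "installed": "1998"}


def row_to_item(row, source):
    """Turn a flat row into a statement-based record (hypothetical layout)."""
    def statement(value):
        # Each statement carries its own list of references.
        return {"value": value, "references": [source]}

    artists = [name.strip() for name in row["artist"].split(";") if name.strip()]
    return {
        "label": row["artwork"],
        "statements": {
            "creator (P170)": [statement(name) for name in artists],
            "inception (P571)": [statement(row["installed"])],
        },
    }


def values(item, prop):
    """List every value recorded for one property."""
    return [s["value"] for s in item["statements"].get(prop, [])]


item = row_to_item(row, source="example spreadsheet")
print(values(item, "creator (P170)"))
```

In Wikidata itself, the values of a property like "creator" would normally be links to other items (the artists) rather than plain strings, which is what allows the connected structure described above.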

Wikidata contains WikiProjects, which are organized by groups of people who collaborate in adding or editing items and establishing community modeling practices within specific domains. We used the WikiProject Public art data model as a starting point and eventually created our own Neighborhood Public Art in Boston WikiProject with more project-specific criteria for how data should be added or edited. 

If you would like to add information on public art in Boston to this project, start by visiting the WikiProject homepage for instructions on how to create new items or improve existing ones.

Adding to the WikiProject

After the initial items were uploaded to Wikidata, additional entries were added directly to Wikidata by members of the Boston Research Center and other users, including participants in BRC-hosted edit-a-thons. The BRC has hosted two edit-a-thons so far: one for Northeastern library staff and another via Digital Humanities Open Office Hours, which was open to the public. For both of these edit-a-thons, Amy Ruskin prepared a spreadsheet with information about artworks already filled in. The attendees created new items and accompanying statements in Wikidata based on this spreadsheet. A total of 30 new items were created as part of these edit-a-thons. 

Other information

For more information on methodology, please visit the Location and Reference Data page.