I recently came back from a conference in Bahrain that focused on, among other things, artificial intelligence and machine learning in art. I am as excited as anybody about the potential to apply these new tools to art and art history, but we do not have all that much data about art in a format that is clean, accessible, and easy to analyze. Moreover, without quality data, these new machine learning tools do not add much value to the discourse and use of art.
Lack of data has caused other problems, as well. People debate the exact number (which is likely unknowable), but many people suggest that 15-20% of art in museums and on the market is either forged or misattributed. A lack of quality data on art in an easily accessible format contributes to this problem.
So how do we solve the problems around quantity, quality, and accessibility of data in art? This question has been my focus for the last five years as I have built out the Artnome database of artists’ complete works along with new analytics that can only be derived from such a database. However, tackling a problem of this scale requires collaboration and effort from many different experts and groups attacking the problem from many different angles, including museums, collectors, estates, galleries, and auction houses.
In this first part of my series on art and data, I speak with Neal Stimler, Senior Advisor at the Balboa Park Online Collaborative. Neal served over a decade at The Metropolitan Museum of Art in New York City in successive positions. He worked on rights and permissions, designed digitization workflows for The Met’s collection at scale, oversaw partnerships with the Google Cultural Institute and Wikimedia communities, among other organizations, and was the project manager for The Metropolitan Museum of Art’s Open Access program that launched in 2017. Neal’s expertise in cultural heritage has deep roots in data and digital asset management, but it also incorporates areas of practice that include copyright policy, education, public engagement, operations management, and cross-reality technologies.
JB: Thanks for joining us, Neal. Let’s start with the basics. What is Open Access?
NS: The term open access is derived from open academia, where the standard is Creative Commons Attribution license or better. Open-Access (OA) content - whether we are talking about a piece of art, a piece of writing, or another work - is free of most copyright and licensing restrictions and is often available to the user without a fee. For a work to be OA, the copyright holder grants everyone the ability to copy, use, and build upon the work without restriction. I recommend the essential book Open Access by Peter Suber and Creative Commons’ overview on the topic. The video that most inspired my work in Open Access was “A Shared Culture.” A key aspect of engaging Open Access, too, is awareness and dedication to supporting the public domain.
The adoption of open access in museums and the GLAM sector is more recent than in the academy. In the cultural heritage sector, professionals and supporters center around the GLAM-Wiki and OpenGLAM communities of practice. These communities advocate for open-access policies for data, digital assets, and publication resources from galleries, libraries, archives, and museums (GLAMs). Practitioners within and external to cultural institutions build tools to make these world heritage resources available to the public for uses ranging from commercial to creative to scholarly.
JB: What is involved with a museum making its collection available online? How long does it take for a museum to transition from being closed to open access [OA]?
NS: Some resources to consult in this process include The Rights and Permissions Handbook (American Alliance of Museums OSCI 1st Edition; Rowman and Littlefield, 2nd Edition), “Copyright Checkpoint,” and the “Copyright Cortex.” Some museums may also consider RightsStatements.org and the International Image Interoperability Framework (IIIF) to address back-end rights management and image services. The “Collections As Data” project and “Museum APIs” wiki may also be useful resources.
After performing a thorough rights assessment on the assets in question and after consulting with licensed legal counsel in their jurisdiction, museums then need to build tools to provide mass self-serve access to data and digital asset sets. These tools typically come in the form of a museum's collection online website, a public application programming interface (API), and a GitHub repository of data in the .CSV and .JSON formats. Data should be offered with the same permissions and legal frameworks as associated image assets.
Importantly, for a data set to be useful to the broadest spectrum of the public, it must include not only identifying or “tombstone” data for objects, but also rich contextual data like object descriptions, provenance, bibliography, artist biographies, or other data that help users to interpret and understand objects. The API serves application developers and partners, while .CSV and .JSON formatted data mainly supports researchers and scholars. Open-access content should be hosted in partnership with crucial aggregation platforms such as Wikidata, Wikimedia Commons, and Internet Archive. Other partners and aggregators might be impactful given the nature of the type of collections. Museums, too, should be mindful to evaluate and make decisions with respect to cultural and ethical considerations of open access in collaboration with communities and scholars.
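To make the distinction between “tombstone” and contextual data concrete, here is a minimal Python sketch of how a researcher might read a museum's .CSV export and keep only the open-access records. The column names (object_id, is_public_domain, provenance, and so on) and the sample rows are invented for illustration; real museum exports define their own schemas.

```python
import csv
import io
import json

# Hypothetical sample export; real museum CSV files use their own column names.
SAMPLE_CSV = """object_id,title,artist,date,is_public_domain,description,provenance
1,Study of a Tree,Jane Example,1890,True,Pencil study on laid paper.,Gift of a private collector
2,Untitled,Living Artist,2015,False,,
3,Harbor at Dusk,John Placeholder,1875,True,,Purchased at auction
"""

# Identifying ("tombstone") fields versus richer contextual fields (assumed split).
TOMBSTONE_FIELDS = ("object_id", "title", "artist", "date")
CONTEXT_FIELDS = ("description", "provenance")


def open_access_records(csv_text):
    """Return public-domain rows, split into tombstone and contextual data."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["is_public_domain"].strip().lower() != "true":
            continue  # skip works that are not marked as open access
        records.append({
            "tombstone": {field: row[field] for field in TOMBSTONE_FIELDS},
            # Keep only contextual fields that are actually filled in.
            "context": {field: row[field] for field in CONTEXT_FIELDS if row[field]},
        })
    return records


records = open_access_records(SAMPLE_CSV)
print(json.dumps(records, indent=2))
```

In this sketch the third record carries provenance but no description, which is the kind of gap in contextual data that limits how far users can interpret an object from the data set alone.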
The process from being “closed” to going open access depends on an institution’s preparedness. An advanced level of digital transformation is required for an institution to manifest policies and deliver the necessary tools in order to provide quality open access services to the public. An absolute commitment to open access and sincere leadership are required at the executive level and upper-level management for open access initiatives to succeed. Open access should represent a broader philosophical shift across all aspects of the museum’s operations and programming. An internal working group or project team from relevant areas across the organization should be assembled. The internal group should be led by a project manager who sets the project vision and has ultimate decision-making authority. Partnering with allied organizations engaged with an institution’s users and working directly with Creative Commons are strongly recommended to implement the best-practice approach.
JB: What are the benefits of institutions implementing open access policies?
NS: The benefits of museums adopting open access policies are certain, clear, and proven. First, museum users expect open access by default. Museums need to redefine their obligation to “access” in the 21st century. The collection is not theirs; they hold it in the public’s trust, and that comes with responsibilities to serve a broad spectrum of users.
Museums employing a clear Creative Commons standards-based policy and well-developed technology platforms in their open access initiatives may receive a significant positive public response. Museums may also see an increase in website traffic on their sites at the time of launch. This web traffic can extend over the long tail through placing data and digital assets onto partner sites’ platforms, where engaged communities of practice make use of the content. Two crucial partners are Wikimedia platforms and Internet Archive, which authentically serve engagement goals with user communities, as well as provide analytics.
Second, digital humanities and other researchers, as well as data scientists, can perform new models of research and publication with unambiguously marked open content. Open access content enables the building of new intersectional and multimodal knowledge systems that are not possible with the restrictions of “closed” content.
Third, open access museums find that their collections, having been opened, become the go-to sources for data and images by journalists and scholars seeking quickly accessible, high-quality, and confidently rights-cleared content for their publications. Simply put, open access data and images are used, and closed data and images are increasingly not used due to the omnipresent burdens of time, money, and process needed to solve rights issues.
Fourth, museums that make the transition to open access improve operational efficiency, save money on operations (the image request process), and reduce friction for the benefit of users. Image revenue and licensing as a business for public domain artworks continue to decline. Staff who previously wasted resources manually processing burdensome rights-clearing requests for works in the public domain may now focus on rights cataloging for newly acquired and backlogged objects; can build more accurate and complete collection records; and can increase the amount of comprehensive data that provide greater possibilities for the use and interpretation of collections.
JB: What can users do today with open access collection content?
NS: We do not yet know the full extent of what is possible. Let’s examine several examples and potential applications for how users can engage with open access collections content as a guide.
The Next Rembrandt, a collaboration with ING and Microsoft with advisement from the Technical University of Delft, Mauritshuis, and Museum het Rembrandthuis, produced a “new” Rembrandt painting using data to algorithmically generate a composite portrait based on defining characteristics of Rembrandt’s style. The project drew upon many data sources, including data and images of Rembrandt portraits, which are largely in the public domain. Without further clarification, this project cannot be considered an open access example per se, in that the research data, code, and final image do not appear to be available for reuse by others with an open access license. This kind of research and production does, though, provide a useful example of how public domain collections can foster creative potential for making new art and re-interpreting art history through data. Future examples could be made with open access artworks and data. Watch the video.
Artificial Intelligence and Machine Learning
Andrew Lih and members of the Wikimedia community used the Wikimedia “The Distributed Game” in a specific iteration “Depicts” to assist AI and machine learning to tag images from The Metropolitan Museum of Art’s open access collections. The new data created through this effort helps create standardized data on a decentralized platform of Wikidata where all can benefit from it rather than the data being solely confined to The Met’s collection online. This is a breakthrough for museums and scholars worldwide. Lih stated the project was, “...a powerful demonstration of how to combine AI-generated recommendations and human verification. Now, with more than 3,500 judgments recorded to date, the Wikidata game continues to suggest labels for artworks from The Met and other museums that have made their metadata available.” In conclusion, Lih wrote, “One benefit of interlinking metadata across institutions is that scholars and the public gain new ways to browse and interact with humanity's artistic and cultural objects.”
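The pattern Lih describes, machine-generated suggestions that people confirm or reject, can be sketched in a few lines of Python. This is only an illustration of the general idea, not how the Wikidata game itself is implemented: the artwork identifiers, labels, votes, and the min_yes threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical judgments: (artwork_id, machine_suggested_label, human_vote)
JUDGMENTS = [
    ("Q1", "horse", "yes"),
    ("Q1", "horse", "yes"),
    ("Q1", "boat", "no"),
    ("Q2", "tree", "yes"),
    ("Q2", "tree", "no"),
    ("Q2", "tree", "yes"),
]


def accepted_depicts(judgments, min_yes=2):
    """Keep a suggested label only if enough people confirmed it
    and confirmations outnumber rejections (an assumed rule)."""
    yes, no = Counter(), Counter()
    for artwork_id, label, vote in judgments:
        tally = yes if vote == "yes" else no
        tally[(artwork_id, label)] += 1

    accepted = {}
    for (artwork_id, label), count in yes.items():
        if count >= min_yes and count > no[(artwork_id, label)]:
            accepted.setdefault(artwork_id, []).append(label)
    return accepted


result = accepted_depicts(JUDGMENTS)
print(result)
```

Only labels that survive human review would then be written back as shared data, which is what separates this approach from publishing raw machine output.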
Creative developer Andrei Taraschuk is an art fan who makes Twitter art bots for individual artists to share their work on social media. Taraschuk also created art bots for each curatorial department that The Cleveland Museum of Art made available with its #CMAOpenAccess program. These artworks are now being shared more widely than any one institution could do within the confines of their own social media program. Watch Andrei’s Ignite Boulder 37 talk, “Enriching Social Media Through the Power of Art, Bots and AI.”
Commercial Art Platforms
Artsy is a unique platform in the art data environment for collecting and discovering art because of its museum partnerships, research in the Art Genome Project, and its incorporation of open access images and data from third-party providers. Artsy presents art information from the marketplace along with related works in museum collections. Artsy's collections online website is a rare opportunity to examine and find artworks in museum collections with similar works currently for sale. Artsy's approach is valuable for the history of collecting and studying connoisseurship at the intersection of the art market and art history on a nuanced digital platform. Artsy, for example, incorporates open access artworks from The Cleveland Museum of Art. Artsy also has a focus on open source software development, and its public API provides educational and non-commercial access to images and information for historical and public domain artworks.
Open access museum collection data can be interpreted and perhaps better understood through computational methods such as data visualization. A key leader in museum data is Jeff Steward at Harvard University Art Museums. Jeff’s 2015 "obJECT" lecture, which is part of the Sightlines series of The Digital Futures Consortium, gives an excellent overview of how museum collection data can be creatively visualized. Watch a video of the “collection blooms” visualization. Read more on Harvard University Art Museum’s Index and explore the API and GitHub pages. In addition, The Tate, from 2013 to 2015, developed a digital strategy and open access digital collections data initiatives. Key figures included John Stack, Elena Villaespesa Cantalapiedra, and Richard Barrett-Small. Data researcher Florian Kräutli created visualizations and provided analysis on the data for Tate and The Museum of Modern Art. The Cleveland Museum of Art partnered with Pandata to do a visualization of their collection with the launch of Cleveland’s open access initiative.
Open access museum images have been used in design collaborations with the Rijksmuseum and Etsy, as well as the National Gallery of Denmark and Shapeways. The Rijksstudio Awards by Rijksmuseum featured a top 30 finalist submission by Dr. Andrea Wallace called the “Pixel and Metadata” dress, where the museum collection data itself became a design object.
Smarthistory is one of the most accessible online learning resources for public and digital art history. It is an open educational resource, or OER. Its mission is to “open museums and cultural sites up to the world” through blog posts, essays, images, timelines, and videos on art history. Smarthistory has a deep corpus of content that serves learners at high school and undergraduate university levels, as well as lifelong learners. Its content is clearly communicated, well researched, and critically engaged, making it a reliable and progressive learning platform. Smarthistory uses Creative Commons legal tools for the licensing of its publication overall, as well as utilizes Creative Commons designated images to populate its essays and videos. Imagine if museums treated their websites like Smarthistory, using Creative Commons legal tools for content so that others could more freely build, create, and share art online. New types of art publications could be created algorithmically and by humans in the future with a more open approach modeled on and expanded from Smarthistory. Smarthistory was founded by Dr. Beth Harris and Dr. Steven Zucker, who are the executive directors. Dr. Naraelle Hohensee is the managing editor.
JB: What copyright and data frameworks are the museums you are working with using? What are those frameworks? It seems like institutions have areas of consensus, but also differences in their approaches to open access.
NS: Working in open access means building resources and working “in the commons.” An institution does not have to undertake the open access process in isolation and risk creating a bespoke policy that does not follow the established practices of leading open access institutions and allied organizations like Creative Commons. Creative Commons provides the most widely used, interoperable, and globally standardized legal framework for open access. The Creative Commons Zero Public Domain Dedication is the most open and permissive tool, as well as being the most commonly used by leading cultural institutions who seek to assertively remove as many barriers as possible to foster the use, reuse, and remix of their collections. Note that it would not be considered open access if a museum applied a Creative Commons Attribution license to digitized objects in the public domain.
Some GLAM institutions have implemented conditions that require users to “share-alike,” meaning that creators who use “share-alike” content must offer their new creation or derivative work under the same conditions as the source material. While the “share-alike” concept may appear more progressive, it may potentially hinder the freedom of expression, individual liberty, and interpretation of others with its dependent contingencies. Share-Alike was intended to help build and expand the commons, but it may more often act as a deterrent, causing users to look elsewhere for content that can be used without undue burden on their creative production and consistent with other harmonious terms like Creative Commons Zero. Furthermore, museums may not have a right to license under share-alike, therefore creating confusion for both institutions and users. The application of other licenses like share-alike or non-commercial should only be considered for works created by the institution, where they hold the copyright as opposed to the digitization of underlying works that are in the public domain.
Some museums, early in the development of open access, created specific policies for open access in their terms and conditions or by using the statement “public domain.” It is important that cultural institutions understand that the concepts and legal framework for “public domain” are determined by a range of factors and are often dependent on country-specific or national definitions. Some institutions may use the Creative Commons Public Domain Mark for collection images and data, but this tool does come with considerations around works that may have a “hybrid” public domain status, meaning they have a status that is “public domain in some jurisdictions but may also be known to be restricted by copyright in others.”
Museums especially should opt for Creative Commons Zero when applicable to digitized collections or museum produced content because it, as stated on the Creative Commons website, “provides the best and most complete alternative for contributing a work to the public domain given the many complex and diverse copyright and database systems around the world,” and “clarifies the status of your work unambiguously worldwide and facilitates reuse.” The commons of the Internet is a realm of production beyond any one nation or group. Museums doing open access should desire to see their collections engaged and used assertively on a global scale.
JB: I always get super excited every time another museum makes its collection open access, but to be honest, it is not always clear how to engage with this content. I feel like in addition to making data and assets available, we are missing the tools to make it easier for the average person to consume, filter, and mine all of this data for exciting insights and to tell their own stories or do their own research using the content. Do you agree? Are you aware of efforts to make museums’ collections easier to analyze and consume?
NS: Baseline content elements and tools for museums to deliver open access are identified in this text. They are more mature than people may realize. The GLAM sector does need to improve tools for working collaboratively at scale with decentralized and distributed data and digital assets at the peer level. Between museums and partners, what is needed are highly automated and sustainable pipelines for digital assets to connect and to be distributed online to partners and subsequently end-users. In terms of tools for end-users, there are exemplary artists, developers, and scholars working with museum content. Those creators, whether independent or institutionally affiliated, have the tools they need to make in their contexts. Active partnership with museums can maximize creative output and benefit makers. Museums also need to do the due diligence of documenting and sharing open access projects made from their collections that they admire to inspire others and build a greater corpus of relevant examples.
The first plateau for any museum to reach is to make data, images, and publications open access. After that best practice step, museums must understand and commit to the future development of open access initiatives for the long term as being equal to exhibition-making, collecting objects, conservation, and scholarly publishing. Open access is a pillar of both museum content development and community engagement. Open access is not a “set it and forget it” scenario. Open access requires not only ongoing operational and technical maintenance, but sincere incorporation into the programmatic functions of a museum such as education, public programs, and scholarly publishing. The answer to the public engagement question for the long term with open access museum collections is not one-time contests, festivals, or hack-a-thons. These short-term tactics will not achieve a museum’s goal of deep and authentic engagement with users because they do not scale and are not part of annually budgeted programmatic efforts.
The critical opportunity for museums is to co-produce knowledge systems and experiences of collections built in collaboration with users. I’ve written about this in detail recently in my conference paper for Museums and the Web 2019 in Boston, addressing the historical development of collections online and “Wikification.” Users need to be able to see their contributions manifested, reflected, and impacting the ways that museums carry out their missions on a data level, whether on a museum’s collection platform, a third-party site such as Wikimedia, or a user’s independent creative project. Museums need to commit to working together on tool development and resources that work well beyond small consortia and self-selecting peer groups. The impact and scale of museum collections come to fruition when they are part of an ecosystem of content on popular and commercial applications that are familiar to, and widely adopted by, users with a diverse range of interests and skills.
Making museum data easier to analyze, consume, and create with for users is a necessary part of the hard work of digital transformation that responsible museums must do. Museums can remain relevant by providing essential services for cultural production and consumption in the digital world. Museums must prioritize an operational philosophy and practice that efficaciously meets the transactional customer expectations of not only millennials, the rising dominant global generation, but also the successive born-digital generations who will have even higher levels of synchronicity between digital and physical lives. Artificial intelligence and machine learning, along with human interaction, have the potential with open access to help museums make more meaningful user connections through accessible, multilingual, and translated content, as well. Commercial businesses have already prioritized customer needs with new technology developments. Museums also can optimize the use of these tools to offer potential benefits for human connectivity and greater mutual understanding, especially when engaged with open access museum content.
JB: Have you spoken to museums that are afraid to make their collections open access? If so, what drives the fear and how do you overcome this?
NS: Yes, I am in frequent conversations with museum clients about how to make the open access transition for their institutions. Most fear in this regard stems from the same conditions that undermine positive improvement in other aspects of business and life: uninformed anecdotes; too much self-focus; and a misguided sense of tradition that says “this is the way we have always done it” or “we are limited by an edge case.” While there may be real on-the-ground obstacles to taking on open access for an institution, it is important to face change and have the will to move forward. In addition to pointing to the major open access success stories of leading institutions, I encourage executive leaders and staff throughout GLAM communities to remember their missions and responsibilities to the public they serve. Open Access is “mission critical” for museums.
JB: In addition to museum collection data, there are catalogue raisonné data and gallery and auction records. The New York Public Library defines catalogue raisonné as “a comprehensive, annotated listing of all the known works of an artist either in a particular medium or all media.” In a perfect world, we would have an artist-level view of all of the works an artist has created, where they currently reside, and where they have been in the past. How could catalogue raisonné data be useful working across museums and with estates, galleries, libraries, or auction houses in a unified and decentralized manner?
NS: Catalogue raisonné data is particularly interesting because, in aggregate with open access, it has the potential to transform how the history of collecting and provenance are studied across public and private collections over time. Catalogue raisonné numbers are facts that are not copyrightable. The difficulty in many cases with this data is that catalogue raisonné data is mostly still only in print format. In the case of 20th or 21st-century artists, this data remains the purview of artists’ estates or representatives whose primary interest is focused on the accounting and value promotion of a particular artist’s work rather than building shared knowledge through comparative research with other artists or collections. Catalogue raisonné projects published in print, or those in a digital format with restrictive or closed access, are prime examples of costly, inefficient, and outdated knowledge production processes. Moreover, they are “data silos.”
If catalogue raisonné numbers and data were published as open access, they would provide richer cataloging records for museum collections around the world through shared bibliographic data and enable museums to focus energy on creating new catalogue records for new or unprocessed collections. Catalogues raisonnés are collaborative publications in which academics and curators work together to produce knowledge, although typically in an enclosed and invite-only process. The perspectives contributed by external and independent scholars in making a catalogue raisonné entry are not often incorporated at the same level of authority as internal curatorial knowledge within museum collections online and may only be incorporated as citations or when absorbed into summary knowledge as presented to the public in an object description or label. Wikidata can act as a unified and decentralized platform where catalogue raisonné numbers and data could have a broader impact. From Wikidata, catalogue raisonné data could be used by museums as well as auction houses, collectors, and scholars. Wikimedia contributor Jane Darnell mentioned to me in a tweet that she digitizes catalogue raisonné data from old publications for use on Wikidata as related to WikiProject Sum of All Paintings. Jane shared examples of catalogue raisonné and Wikidata work on the paintings of Hofstede de Groot and Bartholomeus van der Helst.
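One way to picture how open catalogue raisonné numbers could connect records across institutions is to treat the catalogue and number together as a shared key. The short Python sketch below groups records from different holders by that key. It is an assumption-laden illustration: the sources, catalogue name, numbers, and titles are made up, and this is not a description of how Wikidata stores such data.

```python
from collections import defaultdict

# Hypothetical records; the catalogue name and numbers are invented.
RECORDS = [
    {"source": "Museum A", "catalogue": "Example 1910", "number": "12",
     "title": "Portrait of a Man"},
    {"source": "Auction House B", "catalogue": "Example 1910", "number": "12",
     "title": "Portrait of a Gentleman"},
    {"source": "Museum C", "catalogue": "Example 1910", "number": "47",
     "title": "Still Life"},
]


def group_by_catalogue_number(records):
    """Map each (catalogue, number) pair to the sources that cite it."""
    grouped = defaultdict(list)
    for record in records:
        key = (record["catalogue"], record["number"])
        grouped[key].append(record["source"])
    return dict(grouped)


linked = group_by_catalogue_number(RECORDS)
for key, sources in linked.items():
    print(key, sources)
```

Here the museum record and the auction record use different titles for what the catalogue treats as the same work, and the shared number is what lets them be matched.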
Some examples of digital catalogues raisonnés include the SFMOMA Rauschenberg Research Project, the Pieter and Jan Brueghel sites, and Artifex Press. A model that points to a more progressive future is the Paul Mellon Centre for Studies in British Art catalogue raisonné on the artist Francis Towne. In its copyright page, the Francis Towne catalogue provides nuanced details about the rights status of the overall publication and elements within it. The Towne catalogue [as a whole publication] is offered under a Creative Commons Attribution-NonCommercial 3.0 Unported license, along with acknowledgment of sourcing open access images using Creative Commons Zero. The online publication provides a search filter to find open access images directly within the catalogue itself that may be downloaded at high resolution and reused as related to terms of the source image. For other museum examples, although not catalogues raisonnés, consult Ancient Terracottas: From South Italy and Sicily In the J. Paul Getty Museum, and The Digital Walters. These digital publications offer downloadable and rich content packages and use Creative Commons legal tools.
Challenges concerning the current states of catalogues raisonnés speak to ongoing difficulties in the education, training, skills development, and present condition of digital art history and scholarly practice. Art historians are still working mainly in outmoded practices of knowledge production that can be made more collaborative, transparent, and synchronized when compiling catalogues raisonnés not only as digital first, but open access as publications beyond images and text into data and code. The code can also be published with companion open source legal tools with Creative Commons licensed content and may work in conjunction with a Creative Commons Zero Public Domain Dedication.
JB: How do museums manage rights and permissions issues? Do museums own the copyright for the images of their collections? Can people feel free to modify or share the images they find through the open access initiatives for these museums?
NS: Rights and permissions management for art museums can involve many roles internal and external to an organization. It may include staff within an organization such as rights and permissions managers, collections managers, legal counsel, registrars, curators, conservators, and in important cases, even museum directors. External to an organization are artists’ estates and their representatives, who may be the exclusive agent representing an artist’s rights for copyright use requests. For works in copyright, art museum staff work in close coordination with artists, estates, and their representatives to review use and permission requests on a case-by-case basis. Loans and other restrictions can apply to works, as well, often defined on a contractual basis between parties. It is crucial to distinguish the fees charged by art museums for digitization vs. fees charged for rights and permissions requests. Assessing fees for digitization may be appropriate for the costs of museum staff labor (e.g., handling objects, photography, post-production), time, and resources.
The rights and permissions process is a highly manual, labor-intensive, time-consuming, and often costly process for the museum and end user. Fees are assigned to projects based on a variety of factors. The rights and permissions process within art museums acts more like “gatekeeping” to deny access to the use of artworks by the public, either at the behest of the specific institution or by the rights holder. A significant limitation of the rights and permissions process across the GLAM sector is that it is primarily focused on processing image requests, largely leaving no standard mechanisms or process for other content packages such as code, data, text, and multimedia asset requests. Another limitation is that these requests are typically handled through email or online web forms that take days to weeks to process.
JB: Improving art data to preserve and protect our art historical records is something I think about a lot. I worry that we may not get there in my lifetime. How would you describe your view of the need to improve art data? How does this look? How long do you think it will take us to get there? What are the biggest stumbling blocks to improving art data? How do we overcome them?
NS: We are already on the way to improving the quality of art data in the broadest sense of the concept. The GLAM sector continues to see steady progress for its commitment to open access around the world. There are successions of new institutions joining the open access wave. Just think about what has been achieved already and where we are right now. Some of the world’s leading and most significant institutions have made the open access transition with sincere public declarations and celebrations of their collections. Those institutions that lag behind must be held to account by their directors, boards, and staff to implement an open access future. Open access is a plateau that institutions must reach as soon as possible if they wish to participate in the next tier of digital, educational, and culturally relevant efforts that are inextricably interlinked with global technological innovation. Much has been achieved. More is to be done.
I see a future of open art data where entire ecosystems and suites of content (e.g., code, data, images, multimedia assets, and texts) are circulating in creative production between humans and machines, or what Director of MIT Media Lab Joi Ito refers to as “extended intelligence.” I can imagine a landscape where museum publishing becomes increasingly automated by bots pulling from open access texts, which is an exciting opportunity, but also speaks to the urgent need to improve infrastructure and copyright policy to expand our possibilities for making an inclusive and boundary-traversing art history. I see new applications being built by the commercial sector in partnership with museums that improve the user experience of exhibitions and collections. I imagine new commercial products being made in brand partnerships with new businesses that increase revenue and operational sustainability for museums. The road will be built collaboratively with iterative joint efforts from commercial and prosocial actors. Wikimedia platforms can have a vital role to play as a shared and unified, yet decentralized, third space where the integrated knowledge systems can be formed as they have not been before.
The biggest stumbling blocks are apathy, doubt, and fear. Museums and those allied across the cultural heritage communities can overcome these obstacles with dedication, mutual support, and ultimate concern for our users: the public. Museums, too, must prioritize users' liberty and individual self-actualization. As Merete Sanderhoff, Curator and senior advisor at the National Gallery of Denmark, stated in “The Only Way is Open,” open access aims to make “human creativity from all times and all corners of the world accessible to all citizens, to foster new knowledge and inspire new creativity.”
JB: Is there anything else you want to share, Neal?
NS: I want to thank my colleagues Nik Honseysett, Daniel Brennan, Michael Weinberg, and Ryan Merkley for their constructive feedback on this interview. Thank you, Jason, for the invitation to collaborate on this project. Those interested in working with me as a consultant can send me a message on the contact page of my website, on Twitter, or via LinkedIn.