This week, Christie’s will be auctioning one of the most important collections of American art, the Barney A. Ebsworth collection. The collection is valued at $300M and is brimming with work by artists like Georgia O’Keeffe and Edward Hopper, whom most of us feel like we know pretty well. But what do most of us really know about these artists from a quantitative perspective? The answer is not very much.

In this article on inventing new art analytics, we:

Outline a new approach to descriptive art analytics using Artnome’s database of artists’ complete works.
Chart a never-before-seen view of Georgia O’Keeffe’s full body of work.
Share Artnome’s data scientist Kyle Waters’ early approach to predictive analytics using a random forest machine learning model.
Make predictions on four works from the Ebsworth collection going to auction this week.
Share artists’ price history, performance, and comps provided by our good friends at MutualArt.

Descriptive Art Analytics

Traditionally, art analytics are derived exclusively from auction databases, but only a fraction of an artist’s complete works ever make it to auction. What about the rest of the works? Should we pretend that the majority of works by an artist don’t exist when doing art analytics simply because it is convenient? I don’t think so.

In art, scarcity and uniqueness of a work drive much of its value. But without a database covering many artists’ complete works, neither scarcity nor uniqueness can be calculated. Most experts would be hard pressed to tell you how many oil paintings a popular artist like Georgia O’Keeffe made, fewer could tell you if that number is high or low compared to other artists, and none could tell you the average size of her paintings.

Over the last three years, Artnome has spent thousands of hours and tens of thousands of dollars building the world’s largest database of complete works by blue chip artists to help answer these (and other) questions. In this article, we pull from that database to provide descriptive statistics that paint a macro view of the artists’ complete works. This “big picture” in turn allows us to quantify what makes any singular work “unique” or “scarce” relative to the full body of work.

For example, Georgia O’Keeffe has 2,076 works listed in her official catalogue raisonné. The graph below gives a breakdown of O’Keeffe’s complete works by primary medium. It then breaks down the 810 oil paintings by substrate. Finally, we look at whether each oil painting is listed in a public or private collection. We can speculate that works by artists with a low percentage of privately held work are less likely to come to auction and are therefore scarcer.
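The drill-down described above (all works by medium, then oils by substrate, then oils by public vs. private collection) boils down to a few group-by counts. Here is a minimal Python sketch, assuming each catalogue entry is a record with hypothetical `medium`, `substrate`, and `collection` fields. The records are made-up placeholders, not O’Keeffe data, and the field names are not Artnome’s actual schema.

```python
from collections import Counter

# Hypothetical catalogue records; field names and values are illustrative only.
works = [
    {"medium": "oil", "substrate": "canvas", "collection": "public"},
    {"medium": "oil", "substrate": "board", "collection": "private"},
    {"medium": "watercolor", "substrate": "paper", "collection": "private"},
    {"medium": "oil", "substrate": "canvas", "collection": "private"},
    {"medium": "charcoal", "substrate": "paper", "collection": "public"},
]

def breakdown(records, field):
    """Count records by the value of one field."""
    return Counter(r[field] for r in records)

def private_share(records):
    """Fraction of records held in private collections (0.0 if there are none)."""
    if not records:
        return 0.0
    return sum(r["collection"] == "private" for r in records) / len(records)

by_medium = breakdown(works, "medium")           # step 1: all works by medium
oils = [w for w in works if w["medium"] == "oil"]
by_substrate = breakdown(oils, "substrate")      # step 2: oils by substrate
oil_private = private_share(oils)                # step 3: share of oils held privately
```

A low `oil_private` value would correspond to the speculation above: fewer privately held works, fewer chances to come to auction.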

Below we show O’Keeffe’s complete oil paintings by width and height. We then further break it down by showing average surface area and total surface area painted for each year she was active. Contextually, an otherwise small work may be large for its year and vice versa. Not only is it interesting to see how the artist’s working habits evolved over time, but our model also shows how dimensions correlate to sale price at auction.
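The per-year view and the idea that a work can be “large for its year” can be sketched in a few lines of Python. The sketch assumes each painting is a hypothetical `(year, width_in, height_in)` tuple; the numbers are placeholders, and `relative_size` is our own illustrative helper, not part of Artnome’s model.

```python
from collections import defaultdict

# Hypothetical (year, width_in, height_in) tuples; not real catalogue data.
paintings = [
    (1926, 30, 40),
    (1926, 9, 14),
    (1937, 9, 14),
    (1937, 20, 16),
    (1937, 36, 30),
]

def area_by_year(rows):
    """Return {year: (average_area, total_area)} in square inches."""
    areas = defaultdict(list)
    for year, width, height in rows:
        areas[year].append(width * height)
    return {y: (sum(a) / len(a), sum(a)) for y, a in sorted(areas.items())}

def relative_size(width, height, year, stats):
    """Area of one work divided by the average area for its year (>1 means large for its year)."""
    average, _ = stats[year]
    return (width * height) / average

stats = area_by_year(paintings)
```

Dividing by the yearly average rather than the career-wide average is what lets the same physical size read as “large” in one year and “small” in another.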

Why collect all this data? We believe the “Holy Grail” of art analytics will stem from a database of complete works enriched with auction data. We see the potential to harvest meaningful data sets from the images themselves and have already started experimenting towards that end using off-the-shelf solutions for identifying and searching objects shown within paintings.

Predictive Art Analytics

For our first attempt at predictive art analytics, Artnome data scientist Kyle Waters trained a random forest to make predictions on pricing using data across several dozen artists. The random forest is a popular machine learning model that uses many decision trees and makes predictions by averaging predictions from component trees. In general, the random forest is more accurate in its predictions than a single decision tree. The model learns basic relationships from training data to then predict new outcomes.
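To make the averaging concrete, here is a minimal, self-contained Python sketch of a random forest regressor. Each tree is grown on a bootstrap sample and considers a random subset of features at each split, and the forest predicts the mean of its trees’ predictions. This is a toy for illustration, not Kyle’s model: the function names, hyperparameters, and synthetic data are all our assumptions. A real pipeline would more likely use a library implementation such as scikit-learn’s `RandomForestRegressor`.

```python
import random
from statistics import mean

def _sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, depth, max_features, rng):
    """Grow a regression tree; each leaf stores the mean target of its samples."""
    if depth == 0 or len(y) < 2 or len(set(y)) == 1:
        return {"leaf": mean(y)}
    n_features = len(X[0])
    # A random feature subset at each split is what separates a forest from plain bagging.
    features = rng.sample(range(n_features), min(max_features, n_features))
    best = None  # (sse, feature, threshold)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return {"leaf": mean(y)}
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in li], [y[i] for i in li], depth - 1, max_features, rng),
        "right": build_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, max_features, rng),
    }

def predict_tree(node, x):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

def fit_forest(X, y, n_trees=25, max_depth=3, max_features=1, seed=0):
    """Train n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, max_features, rng))
    return forest

def predict_forest(forest, x):
    """The forest's prediction is the average of its trees' predictions."""
    return mean(predict_tree(tree, x) for tree in forest)

# Synthetic toy data: the target is twice the first feature; the second feature is noise.
X = [[i, i % 3] for i in range(20)]
y = [2.0 * i for i in range(20)]
forest = fit_forest(X, y)
low, high = predict_forest(forest, [1, 1]), predict_forest(forest, [19, 1])
```

Because every prediction is an average of leaf means, a forest like this can never predict outside the range of prices it was trained on, which is one intuition for why rare, record-setting masterpieces are hard for it.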

Machine learning is an exciting new tool that could help improve art analytics. However, what many people fail to realize is that a machine learning model is only as good as the quantity and quality of the data it is trained on. We believe this gives Artnome an advantage. Because our database covers complete works and not just those that have gone to auction, we have a larger data set from which to train the model.

Again, because we have the complete works in our database, we can also create estimates for all of an artist’s work, not just those that happen to be at auction at any given moment. For this reason, we like to think of ourselves as the “Zillow of blue chip art.”

Our pricing model is admittedly in its early stages and has lots of room for improvement. For example, our current model performs poorly on the works that typically sell for the most (often the ones that are also getting the most public attention).

*Chop Suey*, Edward Hopper, 1929

Works from the Ebsworth auction like Hopper’s Chop Suey and Pollock’s Composition with Red Strokes are masterpieces. This means they carry “masterpiece” price tags. Both works are estimated to sell for roughly 10x the artist’s average sale price at auction (since 2000). Like the largest mansions on Zillow, these masterpieces are the hardest prices to predict using historical data because there simply aren’t that many of them. Additionally, there are a limited number of buyers who can afford them, which makes it that much more difficult to predict a hammer price at auction.

Human experts also struggle with predicting prices for top works. Just this week, Van Gogh’s Coin de jardin avec papillons failed to sell at $30M despite estimates around $40M prior to the auction. As Christie’s CEO Guillaume Cerutti shared with The Wall Street Journal’s Kelly Crow, “The air is just thin at that price.”

*Coin de jardin avec papillons*, Vincent Van Gogh, 1887

Though we struggled with the most expensive works, as you will see, our model did a very respectable job at predicting prices for works estimated at $3M or less, as we have a high volume of relevant data in our model for works in this price range. Our predictions are so good for works in this range that they may actually seem boring at first: our model essentially came up with the same estimates as the experts at Christie’s. We are thrilled at these early signs that we could potentially automate pricing estimates at scale for an artist’s complete works using a machine learning model.

Selected Artworks for Analysis

We selected four works from the Ebsworth collection going to auction this week at Christie’s for analysis based on strength and availability of data from our database.

Horn and Feather - Georgia O’Keeffe, 1937
Cottages at North Truro - Edward Hopper, 1938
My-Hell Raising Sea - John Marin, 1941
Long Island - Arthur Dove, 1940

We compare Christie’s estimates to our estimates from the Artnome prediction model for each of the above works. You can see the correlation between variables in our model in the heat map below. (You may recognize this heat map as the feature image for this article. I thought it looked like a rather nice modernist painting, so I stripped off the annotations and repurposed it as art.)
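A correlation heat map is a grid of pairwise correlation coefficients, colored by value. Here is a minimal Python sketch of the numbers behind such a grid, assuming Pearson correlation (our assumption; the article does not say which coefficient was used) and hypothetical feature columns with placeholder values.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Pairwise correlations for every pair of named columns: one cell per heat map square."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical feature columns; names and values are placeholders, not Artnome data.
columns = {
    "width": [10, 20, 25, 30, 36],
    "height": [12, 28, 30, 40, 30],
    "year": [5, 1, 4, 2, 3],
    "log_price": [1.0, 2.5, 2.0, 3.5, 3.0],
}
matrix = correlation_matrix(columns)
```

The diagonal is always 1 and the matrix is symmetric, which is why heat maps like this one show a bright diagonal with mirror-image halves.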

We have also partnered with our good friends at MutualArt who offer access to auction prices and data on over 300,000 artists as part of their services. MutualArt’s insight analyst Kate Todd generously prepared pricing trends for the artists we cover in this article, as well as comps for the individual works we will be analyzing from the Ebsworth auction.

Georgia O’Keeffe - Horn and Feather

*Horn and Feather*, Georgia O’Keeffe, 1937

Georgia O'Keeffe (1887-1986)
Horn and Feather
oil on canvas
9 x 14 in. (22.9 x 35.6 cm.)
Painted in 1937

Christie’s Low/High Estimate: $700,000 - $800,000
Artnome Model Estimate: $720,000

Above: The average lot value for works by Georgia O’Keeffe. Note the spike for 2014 when her Jimson Weed/White Flower No. 1 sold for $44.4M at Sotheby's.

*Jimson Weed/White Flower No. 1*, Georgia O’Keeffe, 1932
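A spike like 2014’s is what happens when a single record sale lands in a yearly average. Here is a minimal Python sketch of average lot value by year, using hypothetical `(year, price)` records rather than MutualArt data. We add the median only as an illustration of a statistic that one outlier barely moves; it is not a figure MutualArt reports here.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (year, price_usd) sale records; values are illustrative only.
sales = [
    (2013, 1_000_000), (2013, 1_200_000), (2013, 800_000),
    (2014, 900_000), (2014, 1_100_000), (2014, 40_000_000),
]

def lot_value_by_year(records):
    """Return {year: (mean_price, median_price)} for each sale year."""
    by_year = defaultdict(list)
    for year, price in records:
        by_year[year].append(price)
    return {y: (mean(p), median(p)) for y, p in sorted(by_year.items())}

stats = lot_value_by_year(sales)
```

In this toy data, one outlier lifts the 2014 mean to $14M while the median stays at $1.1M.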

O’Keeffe’s Horn and Feather is a lovely work, but few would mistake it for a masterpiece like Jimson Weed/White Flower No. 1. The estimate from Christie’s (as well as the Artnome estimate) reflects this. In fact, as a lifelong O’Keeffe fan, I’m not sure I would be able to identify Horn and Feather as O’Keeffe’s work out of context. It lacks the magnified, heavily cropped composition that is O’Keeffe’s signature treatment of small objects. Instead, the two objects float in a relatively passive sea of negative white space.

Our friends at MutualArt provided us with a great comp for Horn and Feather. Shell (Shell IV, The Shell, Shell I), painted in 1937 (the same year as Horn and Feather), sold at Sotheby’s last year for $1,515,000, which is 78% above its estimate.

*Shell (Shell IV, The Shell, Shell I)*, Georgia O’Keeffe, 1937

Though it shares a similar subdued color palette, I think Shell is a superior work as it exhibits the cropping and use of negative space we expect from a work by O’Keeffe. While this is a subjective observation on my part, Artnome believes these types of observations are also quantifiable and we are working toward that end.

If you are a regular Artnome reader, then you know that I believe all paintings by female artists are currently undervalued (research shows by as much as 47%) and worth investing in. As a data point, O’Keeffe (who may be the best-known female painter of all time) has an average lot value of $2,340,715 for paintings, far below that of Edward Hopper, her male contemporary, whose average lot value is $8,963,652 (2000-present). For this reason, I always root for O’Keeffe and other female artists to outperform their estimates, and I will be rooting for Horn and Feather.

Edward Hopper - Cottages at North Truro

*Cottages at North Truro*, Edward Hopper, 1938

Edward Hopper (1882-1967)
Cottages at North Truro
signed 'Edward Hopper' (lower right)
watercolor and pencil on paper
20 x 28 in. (50.8 x 71.1 cm.)
Executed in 1938.

Christie’s Low/High Estimate: $2,000,000 - $2,500,000
Artnome Model Estimate: $2,220,834

Above: The average lot value for works by Edward Hopper.

As a painter trained in both watercolor and oils, I see Hopper as every bit as accomplished a watercolorist as he is a painter in oils. So while works on paper generally fetch less than oils on canvas, I would not at all be surprised if Hopper’s Cottages at North Truro achieved the $2,220,834 estimate from our machine learning model.

While Hopper’s works on paper average just $318,554 at auction (since 2000), superior works can sell for much higher sums. In 2001, Charleston Slum, a Hopper watercolor on paper, sold for $1,876,000 at Christie’s on an estimate of $500,000 - $700,000.

*Charleston Slum*, Edward Hopper, 1929

Our friends at MutualArt provided two additional comps below, both of which brought in significantly less at auction than Charleston Slum and less than our estimate for Cottages at North Truro.

*Vermont Sugar House*, Edward Hopper, 1938

Hopper’s Vermont Sugar House sold at Christie’s in 2007 for $881,000 and Shacks at Pamet Head sold at Sotheby’s in 2004 for $702,400. Both works exceeded estimates of $500,000 - $700,000. It will be interesting to see if Cottages at North Truro can rally past these prices to meet our estimate.

John Marin - My-Hell Raising Sea

*My-Hell Raising Sea*, John Marin, 1941

John Marin (1870-1953)
My-Hell Raising Sea
signed and dated 'Marin 41' (lower right)--inscribed with title (on the reverse)
oil on canvas
25 x 30 in. (63.5 x 76.2 cm.)
Painted in 1941.

Christie’s Low/High Estimate: $250,000 - $350,000
Artnome Model Estimate: $803,372

Above: The average lot value for works by John Marin.

And finally, a prediction from our model outside of the range of Christie’s own estimates. Our model likes this painting. Even though the trends suggest that the market for Marin may be headed downward, our estimate has it at $803,372, over twice Christie’s middle estimate of $300,000.
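The comparison running through these sections (is the model estimate inside Christie’s low/high range, and how far is it from the midpoint?) is simple arithmetic. Here is a minimal Python sketch using the figures reported in this article; the helper name `compare_to_estimate` is our own. For My-Hell Raising Sea it gives a $300,000 midpoint and a ratio of roughly 2.68, which is where “over twice” comes from.

```python
def compare_to_estimate(model, low, high):
    """Place a model estimate relative to an auction house's low/high range."""
    midpoint = (low + high) / 2
    if model < low:
        position = "below"
    elif model > high:
        position = "above"
    else:
        position = "within"
    return position, midpoint, model / midpoint

# (Artnome model estimate, Christie's low, Christie's high), as reported in this article.
lots = {
    "Horn and Feather": (720_000, 700_000, 800_000),
    "Cottages at North Truro": (2_220_834, 2_000_000, 2_500_000),
    "My-Hell Raising Sea": (803_372, 250_000, 350_000),
}
for title, (model, low, high) in lots.items():
    position, midpoint, ratio = compare_to_estimate(model, low, high)
    print(f"{title}: {position} range, {ratio:.2f}x the ${midpoint:,.0f} midpoint")
```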

I also don’t have access to a condition report for My-Hell Raising Sea, but it does look like there may be a crease of some sort on the right side of the canvas. Condition is likely the most important variable missing from our model, and we are actively seeking solutions to resolve this moving forward.

Marin was among the first American artists to paint abstracts and is a bridge between figurative painters and the abstract expressionists. For that reason, works that highlight his tendency toward the abstract like Sailboat, Brooklyn Bridge, New York Skyline (which sold for $1,248,000 in 2005) have done well.

*Sailboat, Brooklyn Bridge, New York Skyline*, John Marin, 1934

As a lifetime New Englander who is happiest on the northern shores of Maine, I strongly prefer Marin’s seascapes - they capture that landscape as well as any other painter, Winslow Homer included. But our model does not care about my fondness for the Maine seacoast.

The comps from MutualArt suggest that Christie’s experts have it right on this one and that our model may be too high. But we are of course standing by the estimate from the model.

*Two Sloops on a Squally Sea*, John Marin, 1939

Our first comp, Marin’s Two Sloops on a Squally Sea, sold at Sotheby’s in 2016 for $212,500. While it exceeded its own estimate of $120,000 - $180,000, it fell well short of our $803,372 estimate for My-Hell Raising Sea.

*Cape Split, Maine*, John Marin, 1945

And our second comp, Cape Split, Maine, had an estimate of $400,000 - $600,000, but failed to find a buyer at auction just a year ago at Sotheby’s.

Arthur Dove - Long Island

Arthur G. Dove (1880-1946)
Long Island
signed 'Dove' (lower center)
oil on canvas
20 x 32 in. (50.8 x 81.3 cm.)
Painted in 1940.

Christie’s Low/High Estimate: $1,000,000 - $1,500,000
Artnome Model Estimate: $2,801,572

Like Marin, Arthur Dove is among the earliest American abstract painters. His works are simpler abstractions, and I mean that as a compliment. They have an organic feel that is supported by the use of an earthy palette.

Long Island is not a particularly sexy painting, even for Dove, but it has grown on me. It is unmistakably Dove in its pared-down composition, which features a nice balance of the sun (or moon) dwarfed by two massive monolithic forms resting on wave-like dunes. Our prediction model liked the piece more than I do, pricing it at $2,801,572, well above Christie’s estimate.

I personally much prefer the comp sent to us from MutualArt, Dove’s 1941 Lattice and Awning.

Lattice and Awning, Arthur Dove, 1941 — *Lattice and Awning*, Arthur Dove, 1941

Lattice and Awning last sold for $1,685,000 against an estimate of $1,200,000 - $1,800,000 in 2013 at Sotheby’s. I believe it to be a stronger composition than Long Island, but my data suggests Dove may not have made many paintings compared to the other artists in my database, with just 459 works listed in his catalogue raisonné. If this is the case (I think an expanded catalogue is in the works), it may drive up the desirability of Long Island.

Moving Forward

In the future, we plan on leveraging deep learning to harvest data from the images themselves. We believe using the image data is useful for predictive accuracy because we can then detect things like color, subject matter, artistic style, composition, etc. These are all variables that people clearly visualize and use to determine the price of an artwork, but that have not been quantifiable in a scalable and manageable way. Until now.
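Deep learning is the plan, but even a hand-crafted metric shows what turning “color” into a model variable could look like. As an illustrative stand-in (not the deep learning approach described above, and not something Artnome has said it uses), here is the colorfulness measure proposed by Hasler and Süsstrunk, computed in Python over a list of RGB pixel tuples. Loading pixels from an image file is left out.

```python
from math import sqrt
from statistics import mean, pstdev

def colorfulness(pixels):
    """
    Hasler and Süsstrunk colorfulness for a list of (r, g, b) tuples in 0..255.
    Grayscale images score 0; more saturated, more varied color scores higher.
    """
    rg = [r - g for r, g, b in pixels]              # red-green opponent channel
    yb = [0.5 * (r + g) - b for r, g, b in pixels]  # yellow-blue opponent channel
    spread = sqrt(pstdev(rg) ** 2 + pstdev(yb) ** 2)
    offset = sqrt(mean(rg) ** 2 + mean(yb) ** 2)
    return spread + 0.3 * offset
```

A score like this could sit alongside width, height, and year as one more column in the feature table.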

We also have some early thoughts on how to improve detection and prediction of masterpieces as outliers in our model. One idea is to include data on exhibitions from the top museums and galleries. If we have data showing that several works from a single exhibition or combination of exhibitions have led to a dramatic increase in sale price, other works that were in those same exhibitions may also receive a bump from the model. Recent research suggests the number of institutions that are influential in establishing the value of art and artists is relatively small, so this may be a fairly manageable undertaking. What I like about this approach is that we are essentially factoring in the good judgment of the best curators of the last 100 years in a quantifiable way for our model.
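One possible way to encode the exhibition idea as a model feature is sketched below in Python. It assumes we had exhibition membership data and a price-to-estimate ratio for each sold work; both inputs, the function name, and the choice of a simple average are hypothetical, not a description of a feature Artnome has built.

```python
from collections import defaultdict
from statistics import mean

def exhibition_signal(sold, exhibitions):
    """
    Hypothetical feature: for each work, average the price-to-estimate ratios of
    *other* sold works that appeared in at least one exhibition with it.
    sold:        {work_id: price_to_estimate_ratio}
    exhibitions: {exhibition_id: set of work_ids shown}
    Returns {work_id: average ratio, or None when no co-exhibited work has sold}.
    """
    peers = defaultdict(set)
    for members in exhibitions.values():
        for work in members:
            peers[work] |= members - {work}  # everything shown alongside this work
    signal = {}
    for work in set(peers) | set(sold):
        ratios = [sold[p] for p in peers[work] if p in sold]
        signal[work] = mean(ratios) if ratios else None
    return signal
```

Excluding a work’s own sale from its signal keeps the feature from simply echoing the price the model is trying to predict.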

Summary and Conclusion

In this article, we looked at new descriptive analytics driven by Artnome’s complete works database that gave us a unique view of O’Keeffe’s complete works. We then used our data to explore predictive analytics around auction prices for several pieces from the Ebsworth collection that will be going to auction this week. We also provided further context with performance history and comps thanks to our friends at MutualArt.

We think there is a ton of low-hanging fruit when it comes to applying modern analytical tools and practice to art and the art market. In addition to building better prediction models, improving available data on art and artists helps us understand these works in a new light and provides a much-needed barrier against forgery.

At Artnome, we are looking to onboard three to five clients in the next few months who are interested in benefiting from early access to our prediction model and the insights from our unique database. We would ideally like to develop a long-term relationship with a few key clients as we grow the strength of both our one-of-a-kind database and our machine learning driven models. I can be reached at jason@artnome.com.

For those looking for a more mature solution, MutualArt offers both full advisory services including authentication as well as self-serve tools for auction data and analysis. While there are dozens of data and analytics providers to choose from, we like Zohar and his team at MutualArt because they share our vision of data and machine learning leading to better analytics and a stronger art market.