The Museum of Modern Art Research Dataset

kjell · on Aug 3, 2015

To my knowledge, the Cooper Hewitt was the first museum to post its collection metadata on GitHub:

https://github.com/cooperhewitt/collection http://labs.cooperhewitt.org/2012/releasing-collection-githu...

A few other museums have done so since; https://github.com/tategallery/collection and https://github.com/artsmia/collection are the two I can think of right now.

The Rijksmuseum and the Walters both offer CC0 metadata and images, but through a queryable API rather than downloadable CSV/JSON.

https://www.rijksmuseum.nl/en/api http://api.thewalters.org/

wjnc · on Aug 3, 2015

It would be awesome if someone used datasets like these to estimate the value of those musea.

Did you know that musea do not know, or do not publish, the valuation of all their works? And that they only ever sell works, even those they will never show to the public, in order to buy new works? And that by selling a few percent of their assets they could basically provide free access, with no loss to the works they show to the public? All these gems, and more, come from a great article and podcast [1], [2]. (Totally not affiliated, but they were major recent eye-openers for me.)

[1] http://www.democracyjournal.org/36/museums-can-change-will-t... [2] http://www.econtalk.org/archives/2015/05/michael_ohare_o.htm...

zz1 · on Aug 3, 2015

[It might not seem so, but this is a constructive criticism]

You clearly have no idea whatsoever what museums are, how they work and what they do.

No, museums don't sell their works. Only some US museums do, and they are harshly criticized by museums all over the world.

No, works in storage aren't "works that the public won't see": they are essential to making the pieces usually on display last as long as possible, since those pieces are periodically rested through a planned rotation. And no, storage isn't a trove of unexploited treasures: it holds some important gems and a lot of rubbish.

A museum is not a shop but a conservation institution: it works on a hundred-year perspective. Sure, sell a couple of works to grant free access to 100k people over 10 years. And what do we do for the next 390 years?

Do you know anything about the art market? Clearly not: there is no value, just trends and fancy. Do you know anything about economics? The simple fact that a work is kept in a museum (and is thus off the market) changes the prices of similar works available on the market (if such a thing even exists: we are talking about unique pieces, and none is ever "similar" to another). So you can't use an artist's market prices to estimate the value of an artwork (putting another one on the market would dilute the value of the others). And on top of that, you would need to know about history and how provenance can affect the market price of an artwork.

Do you know anything about art history? It would teach you a thing or two about how museums exist to keep works and transmit a legacy that outlasts short periods of time, such as a lifespan. A museum isn't there to sell works, because it isn't there to follow present trends (you might want to find out how highly Caravaggio was regarded in the 19th century, or Georges de La Tour around the same time).

And please, please, tell me: how would you evaluate museums? Is the one with the most value the best one? The one with the most items? Well, make an inventory of a museum first, and then let me know whether you've changed your mind once you find out that "item" has absolutely no meaning.

Also: your Latin purism is plain silly and makes it a bit harder for your readers to understand you (I had to think it over a little before realizing it wasn't a typo). Neither the Greeks nor the Romans (Latins) had museums. The first one was established in Rome in the 15th century: the word "musea" was never a thing.

_delirium · on Aug 3, 2015

Getting an idea of a collection's total value would be interesting. But where would the valuation estimates of individual works come from, to add up? Predicting the likely sale price of a given work of art seems like a pretty difficult prediction problem. The primary data source would probably be past sale prices of similar items, but "similar" has quite a bit of complexity to it, and there isn't even a good public dataset of past sale prices. Most collector-to-collector sales are private, and galleries/dealers are secretive about their own data. As far as I can tell, auctioneers like Sotheby's come up with estimated price ranges for works coming up for sale through fairly labor-intensive, case-by-case research that draws on non-public information about past prices and current market interest (and even their price estimates are frequently way off).
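To make the "past sale prices of similar items" idea concrete, here is a minimal sketch of a comparables-based estimate. Everything in it is an assumption for illustration: the `Sale` record, the similarity rule (same artist and medium, ranked by closeness of creation year), and the choice of the median of the k nearest sales. It also shows why the total is shaky: every work without comparables simply drops out of the sum.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Sale:
    # Hypothetical record of a past public sale; real data would be sparse and private.
    artist: str
    medium: str
    year_made: int
    price_usd: float


def estimate_from_comparables(artist: str, medium: str, year_made: int,
                              sales: list[Sale], k: int = 3) -> Optional[float]:
    """Median price of the k most similar past sales, or None if none qualify."""
    # Assumed similarity rule: same artist and medium, ranked by closeness of creation year.
    comps = [s for s in sales if s.artist == artist and s.medium == medium]
    if not comps:
        return None  # no comparables at all: the hard case
    comps.sort(key=lambda s: abs(s.year_made - year_made))
    return median(s.price_usd for s in comps[:k])


def collection_estimate(works, sales, k=3):
    """Sum the estimates that exist and count the works that could not be priced."""
    total, unpriced = 0.0, 0
    for artist, medium, year_made in works:
        est = estimate_from_comparables(artist, medium, year_made, sales, k)
        if est is None:
            unpriced += 1
        else:
            total += est
    return total, unpriced
```

Returning the count of unpriced works alongside the total keeps the gap visible instead of silently treating those works as worth zero.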

d--b · on Aug 3, 2015

Ha! The data is completely useless. None of it is uniform. Some artists appear twice under different names. The dates are totally unusable. The author field may contain one or more authors, etc.
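A minimal sketch of the janitorial step for the artist problems above: split a multi-artist field, then flag spellings that collapse to the same comparison key. The separators (`,`, `;`, ` and `) and the normalization rule are assumptions, not MoMA's actual format; a "Surname, Forename" convention would break the split. Raw spellings are kept rather than overwritten, so a human still decides what counts as a duplicate.

```python
import re
import unicodedata
from collections import defaultdict


def split_artists(field: str) -> list[str]:
    """Split a free-text artist field, assuming ',', ';' and ' and ' separate names."""
    parts = re.split(r"\s*[,;]\s*|\s+and\s+", field.strip())
    return [p for p in parts if p]


def name_key(name: str) -> str:
    """Comparison key: accents stripped, periods and extra spaces collapsed, case-folded."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[.\s]+", " ", no_accents).strip().casefold()


def candidate_duplicates(fields: list[str]) -> dict[str, set[str]]:
    """Map each key to the raw spellings seen, keeping only keys with more than one."""
    seen = defaultdict(set)
    for field in fields:
        for raw in split_artists(field):
            seen[name_key(raw)].add(raw)  # the raw spelling is kept, never overwritten
    return {k: v for k, v in seen.items() if len(v) > 1}
```

Only flagging candidates, instead of merging them, avoids equating names that are deliberately distinct.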

MoMA wanted to look tech-savvy and trendy by publishing this data to GitHub. But it just makes them look like an old institution that doesn't know how to keep its data clean and meaningfully structured. This is a bit sad. Plus, GitHub isn't really meant for publishing content. Of course you can, but it's weird.

Yes, art is hard to classify, but come on... we're talking about one of the richest museums in the world, and they can't properly manage 200k of data? I would be pretty afraid of lending them anything...

zachrose · on Aug 3, 2015

I suspect art cataloging has a very specific taxonomy, or even deliberate examples of flouting taxonomy for artistic reasons.

Normalization risks equating Prince and the artist formerly known as Prince, which is a worse picture of reality for a lot of purposes.

d--b · on Aug 4, 2015

I understand it is impossible to make these things fit in a proper relational model. For instance, modeling dates for artworks made circa a given year, between two dates, or started by one person and finished by someone else is very complex. That said, they should at least try. Being able to sort artworks by date seems pretty fundamental to me!
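One hedged way to make "circa" and "between two dates" sortable is to store every date as a span of plausible years plus a circa flag. The `DateSpan` name and the midpoint sort rule below are hypothetical choices, not anything from the dataset; works started by one artist and finished by another would need a separate list of per-artist phases, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DateSpan:
    """A creation date as a range of plausible years (hypothetical model)."""
    earliest: int
    latest: int
    circa: bool = False  # the source text said "c." or "circa"

    def __post_init__(self):
        if self.latest < self.earliest:
            raise ValueError("latest must not precede earliest")

    def sort_key(self) -> tuple[float, int]:
        # Midpoint first, then earliest year, so single years, ranges and
        # circa dates can all be sorted together.
        return ((self.earliest + self.latest) / 2, self.earliest)

    def label(self) -> str:
        prefix = "c. " if self.circa else ""
        if self.earliest == self.latest:
            return f"{prefix}{self.earliest}"
        return f"{prefix}{self.earliest}-{self.latest}"
```

Sorting by midpoint is only one defensible rule; sorting by `earliest` alone would be just as easy to swap in.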

Look at the data released by the Cooper-Hewitt that someone posted below. They actually have some kind of structure!

minimaxir · on Aug 3, 2015

I took a look at the data. The data schema is disorganized to the point that a lot of janitorial work would be necessary to make it usable for any analysis or visualization.

For example, some works have a date of "1896" and others "1976-77" or "c.1937"; artist bios can have nationality, year of birth and year of death, but not necessarily all three; dimensions can be '12 1/2 × 12 1/4" (31.8 × 31.1 cm)' or "204 x 48 x 48 inches (variable)", etc.
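The janitorial work for the quoted examples might start like this minimal sketch. It only recognizes the patterns shown above (a four-digit year, a "1976-77" style range, a "c." or "circa" prefix, and a parenthesized "(… cm)" dimension); the function names are hypothetical, and anything else returns `None` for manual review rather than being guessed.

```python
import re
from typing import Optional


def parse_date(raw: str) -> Optional[tuple[int, int, bool]]:
    """Parse '1896', '1976-77', 'c.1937' into (earliest, latest, circa); None otherwise."""
    s = raw.strip()
    prefix = re.match(r"(?i)(?:c\.|circa)\s*", s)
    circa = prefix is not None
    if prefix:
        s = s[prefix.end():]
    m = re.fullmatch(r"(\d{4})(?:\s*[-–]\s*(\d{4}|\d{2}))?", s)
    if not m:
        return None  # anything else is left for manual review
    start = int(m.group(1))
    end_text = m.group(2)
    if end_text is None:
        end = start
    elif len(end_text) == 4:
        end = int(end_text)
    else:
        # Two-digit end year: borrow the start year's century, rolling over if needed.
        end = start - start % 100 + int(end_text)
        if end < start:
            end += 100
    return start, end, circa


def parse_cm(raw: str) -> Optional[list[float]]:
    """Extract the metric part, e.g. '(31.8 × 31.1 cm)' -> [31.8, 31.1]."""
    m = re.search(r"\((\d+(?:\.\d+)?(?:\s*[×x]\s*\d+(?:\.\d+)?)*)\s*cm\)", raw)
    if not m:
        return None
    return [float(v) for v in re.split(r"\s*[×x]\s*", m.group(1))]
```

Relying on the metric part sidesteps parsing fractional inches, at the cost of skipping records that only give inches.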

_pmf_ · on Aug 3, 2015

> I took a look at the data. The data schema is disorganized to the point that a lot of janitorial work would be necessary to get it usable and perform any analysis or visualization.

In other words, it is real world data.

mrspeaker · on Aug 3, 2015

This is so cool! (But I can't help being a little disappointed that no image resources are included; even some low-res thumbnails would be great!)

danso · on Aug 3, 2015

I've only briefly perused the dataset... but there is a URL field, and many of the works have it filled in. MoMA has always had one of the better-structured websites, so it wouldn't be hard to write a scraper to grab the associated image.

e.g.

http://www.moma.org/collection/works/101730

http://www.moma.org/media/W1siZiIsIjIxNzgzOCJdLFsicCIsImNvbn...

(I guess one thing MoMA could improve on is moving their image assets to a CDN... that thing took a while to load.)
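A scraper along those lines could be sketched with only the standard library. This assumes, without having checked moma.org, that each work page advertises its main image in an Open Graph `<meta property="og:image">` tag; if it doesn't, the parser needs a different selector. The class and function names are hypothetical, and `fetch_image_url` makes a real network request.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.request import urlopen


class OgImageParser(HTMLParser):
    """Records the content of the first <meta property="og:image"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.image_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here via handle_startendtag.
        if tag == "meta" and self.image_url is None:
            attr = dict(attrs)
            if attr.get("property") == "og:image":
                self.image_url = attr.get("content")


def extract_og_image(html: str) -> Optional[str]:
    parser = OgImageParser()
    parser.feed(html)
    parser.close()
    return parser.image_url


def fetch_image_url(page_url: str) -> Optional[str]:
    # Network call; assumes the page is UTF-8 and exposes og:image (unverified for moma.org).
    with urlopen(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_og_image(html)
```

Keeping the parsing separate from the fetch makes it testable offline; when running it over the whole dataset, pausing between requests would be polite given how slowly the images load.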

zz1 · on Aug 3, 2015

For a museum like MoMA, copyright is a tricky issue: publishing pictures online requires lengthy inquiries to identify the rights holder and then getting him (or them) to agree (and they might not, or might ask for money...). The cost of this kind of operation for a vast collection can run into the millions of dollars. I guess you can't blame MoMA for spending their money differently; you'd rather blame the copyright mess.

ZoeZoeBee · on Aug 3, 2015

It's absolutely amazing how vast the collections in museum storage are; most items will hardly ever be displayed.

ilzmastr · on Aug 3, 2015

It's not just museums; a lot of collectors I've read about do something similar: http://bit.ly/1N49UaR

ilzmastr · on Aug 3, 2015

I used to work here and think it's an amazing API: http://developers.artsy.net

The playground is very cool.