hoard it: stealing your data #mw2009
"illegal" collections spidering project, see www.hoard.it
How: screen-scraper + spider
Difficulties:
- must have collections online
- must have a consistent template
- not real time
- technical variations
- flash/forms a barrier
- normalization of dates and locations
They spidered 20 museum websites and 70,000 objects; mostly Dublin Core data was spidered.
What must change:
- Services are king! (not data and content)
- Execution is more important than ideas.
- In-house development (seems to be a trend at this conference; IMA also does everything in-house...)
- Use simple APIs (REST, not SOAP/OAI)
- Produce value elsewhere.
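To make the "screen-scraper + consistent template" idea concrete, here is a minimal sketch in Python. It is not hoard.it's actual code: the CSS class names, the `TEMPLATE` mapping, the sample HTML and the helper names are all made up for illustration. It only shows why a consistent template matters (one mapping per site is enough) and why dates need normalizing afterwards.

```python
import re
from html.parser import HTMLParser

# Hypothetical template: CSS classes on one museum's object pages,
# mapped to Dublin Core elements. A real site needs its own mapping.
TEMPLATE = {
    "obj-title": "dc:title",
    "obj-maker": "dc:creator",
    "obj-date": "dc:date",
}


class ObjectPageScraper(HTMLParser):
    """Collects the text of elements whose class appears in the template."""

    def __init__(self, template):
        super().__init__()
        self.template = template
        self.record = {}
        self._field = None  # Dublin Core field currently being captured

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.template:
            self._field = self.template[css_class]
            self.record[self._field] = ""

    def handle_data(self, data):
        if self._field:
            self.record[self._field] += data

    def handle_endtag(self, tag):
        # Simplification: capture stops at the first closing tag,
        # so nested markup inside a field is not handled.
        self._field = None


def normalize_year(text):
    """Reduce a free-text date such as 'c. 1850' to a four-digit year, or None."""
    match = re.search(r"\b(\d{4})\b", text)
    return match.group(1) if match else None


def scrape(html):
    """Parse one object page and return a dict of Dublin Core fields."""
    parser = ObjectPageScraper(TEMPLATE)
    parser.feed(html)
    parser.close()
    record = {key: value.strip() for key, value in parser.record.items()}
    if "dc:date" in record:
        record["dc:date"] = normalize_year(record["dc:date"])
    return record


# Made-up object page following the hypothetical template above.
SAMPLE = (
    '<div><h1 class="obj-title">Tea pot</h1>'
    '<span class="obj-maker">Unknown</span>'
    '<span class="obj-date">c. 1850</span></div>'
)

if __name__ == "__main__":
    print(scrape(SAMPLE))
```

A spider would fetch each object URL and pass the HTML to `scrape`. Content hidden behind Flash or search forms never reaches a parser like this, which is the barrier noted in the difficulties list.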
