www.sjwaller.com

Portfolio

Archaeotools

December 1st, 2009

Archaeotools was an ambitious research and development project, carried out in collaboration with the Natural Language Processing Research Group at the University of Sheffield, that gave me the opportunity to explore data mining, faceted classification and e-archaeology.

  • Dates: September 2007 to September 2009
  • Technology: Java EE 5, Apache SOLR
  • Responsibility: Ontology/Thesauri and data preparation, interface development, integration with Redsquid architecture.

The first aim was to index the ADS database of over one million metadata records describing sites and monuments in the UK according to three criteria: When, What and Where. The project used the techniques of faceted classification, derived from information science and demonstrated in the Archaeobrowser project, to let users navigate the ‘three-dimensional space’ created by the classification scheme easily and intuitively. A map-based interface was developed for exploring the spatial dimension.
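To make the idea of navigating by When, What and Where more concrete, here is a minimal in-memory sketch of faceted filtering and counting. It is not the project's Solr-based implementation: the class `FacetSketch`, the `SiteRecord` fields and the sample records are all hypothetical and invented purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of faceted navigation; not the project's actual code.
class FacetSketch {

    // One metadata record classified along the three facets (field names are assumptions).
    static class SiteRecord {
        final String when;
        final String what;
        final String where;

        SiteRecord(String when, String what, String where) {
            this.when = when;
            this.what = what;
            this.where = where;
        }

        // Look up the value of a facet by name.
        String facet(String name) {
            switch (name) {
                case "when": return when;
                case "what": return what;
                case "where": return where;
                default: throw new IllegalArgumentException("Unknown facet: " + name);
            }
        }
    }

    // Keep only the records that match every selected facet value.
    static List<SiteRecord> filter(List<SiteRecord> records, Map<String, String> selected) {
        List<SiteRecord> matches = new ArrayList<SiteRecord>();
        for (SiteRecord r : records) {
            boolean ok = true;
            for (Map.Entry<String, String> e : selected.entrySet()) {
                if (!r.facet(e.getKey()).equals(e.getValue())) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                matches.add(r);
            }
        }
        return matches;
    }

    // Count how many records fall under each value of one facet.
    static Map<String, Integer> counts(List<SiteRecord> records, String facetName) {
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (SiteRecord r : records) {
            String value = r.facet(facetName);
            Integer n = result.get(value);
            result.put(value, n == null ? 1 : n + 1);
        }
        return result;
    }

    // Made-up sample data, for illustration only.
    static List<SiteRecord> sampleRecords() {
        return Arrays.asList(
            new SiteRecord("Roman", "villa", "Yorkshire"),
            new SiteRecord("Roman", "fort", "Yorkshire"),
            new SiteRecord("Bronze Age", "barrow", "Wiltshire"),
            new SiteRecord("Roman", "villa", "Kent"));
    }

    public static void main(String[] args) {
        Map<String, String> selected = new HashMap<String, String>();
        selected.put("when", "Roman");
        List<SiteRecord> roman = filter(sampleRecords(), selected);
        // Show which "what" values remain once "when" is narrowed to Roman.
        System.out.println(counts(roman, "what"));
    }
}
```

Each selection narrows the result set, and the counts for the remaining facets tell the user which refinements are still available. That is the behaviour a faceted search engine provides at scale.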

Secondly, the project employed natural language processing (NLP) so that automated tools could search within documents for terms belonging to known classification schemes and add them to the faceted index, providing much deeper and richer access to unpublished archaeological literature.
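As a rough illustration of spotting known classification terms in free text, the sketch below does a plain case-insensitive, whole-word dictionary lookup. This is a deliberately simple stand-in, not the NLP pipeline used in the project; the class `TermSpotter`, the vocabulary and the sample sentence are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical dictionary-based term spotter; a simplification, not the project's NLP tooling.
class TermSpotter {

    // Return every vocabulary term that occurs in the text as a whole word or phrase.
    static Set<String> findTerms(String text, List<String> vocabulary) {
        Set<String> found = new TreeSet<String>();
        for (String term : vocabulary) {
            // Pattern.quote keeps any punctuation in the term from being read as regex syntax.
            Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(term) + "\\b", Pattern.CASE_INSENSITIVE);
            if (p.matcher(text).find()) {
                found.add(term);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up vocabulary and sentence, for illustration only.
        List<String> vocabulary = Arrays.asList("villa", "round barrow", "hillfort", "fort");
        String text = "Excavation revealed a Roman Villa beside an earlier round barrow.";
        System.out.println(findTerms(text, vocabulary));
    }
}
```

The terms found this way could then be added to a document's facet values. A real system would also need to handle plurals, spelling variants and ambiguous words, which simple matching like this does not attempt.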

Thirdly, these tools were used to investigate whether index terms could also be identified and harvested from older antiquarian literature, as represented by back runs of archaeological journals that are currently being digitised and made available online. Natural language processing allowed place names to be recognised and harvested, then passed to an existing service (GeoCrossWalk), which looked the names up in an online gazetteer and returned precise grid coordinates that were added to the index.
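The place-name step can be pictured as a lookup from harvested names to grid coordinates. The sketch below uses a small in-memory map in place of the online gazetteer; it does not call or imitate the GeoCrossWalk API, and the class `GazetteerSketch`, the place names and the coordinates are placeholders invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a gazetteer lookup; not the GeoCrossWalk service or its API.
class GazetteerSketch {

    // A grid reference as an easting/northing pair.
    static class GridRef {
        final int easting;
        final int northing;

        GridRef(int easting, int northing) {
            this.easting = easting;
            this.northing = northing;
        }
    }

    // Placeholder gazetteer keyed by lower-cased name; names and coordinates are made up.
    static Map<String, GridRef> placeholderGazetteer() {
        Map<String, GridRef> gazetteer = new HashMap<String, GridRef>();
        gazetteer.put("exampleton", new GridRef(400000, 300000));
        gazetteer.put("sampleford", new GridRef(450000, 350000));
        return gazetteer;
    }

    // Resolve each harvested name to a grid reference; names with no entry are skipped.
    static Map<String, GridRef> resolve(List<String> placeNames, Map<String, GridRef> gazetteer) {
        Map<String, GridRef> resolved = new LinkedHashMap<String, GridRef>();
        for (String name : placeNames) {
            String trimmed = name.trim();
            GridRef ref = gazetteer.get(trimmed.toLowerCase(Locale.ROOT));
            if (ref != null) {
                resolved.put(trimmed, ref);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> harvested = Arrays.asList("Exampleton", " SAMPLEFORD ", "Nowhere");
        for (Map.Entry<String, GridRef> e : resolve(harvested, placeholderGazetteer()).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue().easting + " " + e.getValue().northing);
        }
    }
}
```

Names that cannot be resolved are simply left out here. In practice they might be logged for manual review, since historic spellings in antiquarian texts often differ from modern gazetteer entries.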

Visit the project website: http://ads.ahds.ac.uk/project/archaeotools
