How should I position my talk? How many in the room are technical, how many are in management?
Exploring integrated search: trying to change the way we serve information to our users. Why do we want to do this? Our integrated catalog project is called Summa, and it will be open sourced this coming Friday (December 14th).
Behind the search program: user expectations and a survey – we tried to observe users and had them fill out diaries for 28 days. The resulting statements were tested with a questionnaire. We identified three personas of library users:
- Library enthusiasts: they know about resources and databases. These are the ones we like, but they are the minority.
- Drive-in users: we don't see them. They spend as little time in the library as possible, think they know how to search, and consider themselves experts.
- Working students: students who sit in the library to study but don't use its resources. They feel they work better in the library and want peace and quiet, with books on the shelves around them.
Results of the study: library users don't use resources the way we expect them to; they use other tools to locate resources. The library catalog is good only at finding a known item, not at discovery. Students use Amazon to find books, then come back to the catalog to search for specific titles.
Question asked of users in the survey: how do you discover resources? Answers included Google, chaining (using a known source to browse), the OPAC, the Danish union catalog, advice from teachers, advice from peers, and advice from librarians. Students who use Google tend to know about library resources as well.
How do we capture the appeal of Amazon? There are two disparate worlds of research – librarians have a number of tools to use, whereas students have problems, research topics, goals, and tasks. Suggestions, advice, and user involvement should be part of blending these two worlds.
Inhibitors to using licensed library resources – lack of awareness, difficulty navigating the library website, students searching the catalog for articles without understanding that it isn't meant for this, and authentication barriers.
Let users tailor the way search results are displayed. Including different web services, such as research portals, in catalog searching was generally well perceived.
Verificative search (exact terms, federated search) v. exploratory search (approximate search terms, tools to support refinement are essential, need to operate on all available data to make this functional).
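To make the distinction concrete, here is a minimal sketch contrasting the two modes on a few hypothetical records. It assumes simple title-word matching and uses Python's `difflib.SequenceMatcher` similarity ratio as a stand-in for "approximate" matching; the talk does not say how Summa actually implements either mode.

```python
from difflib import SequenceMatcher

# Hypothetical sample records; not Summa's real data model.
RECORDS = [
    {"title": "Information retrieval in practice"},
    {"title": "Library user studies"},
    {"title": "Retrieving information from archives"},
]


def verificative_search(query, records):
    """Exact search: every query term must appear verbatim as a title word."""
    terms = query.lower().split()
    return [r for r in records
            if all(t in r["title"].lower().split() for t in terms)]


def exploratory_search(query, records, threshold=0.7):
    """Approximate search: a record matches if any title word is similar
    enough (ratio >= threshold) to any query term."""
    terms = query.lower().split()
    hits = []
    for r in records:
        words = r["title"].lower().split()
        if any(SequenceMatcher(None, t, w).ratio() >= threshold
               for t in terms for w in words):
            hits.append(r)
    return hits
```

With these records, the exact query "information retrieval" finds only the first title, while the approximate query "retrieval" also picks up "Retrieving information from archives".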
Students used Google more often for fact-finding than to identify and locate articles.
How do we liberate the data from catalogs and databases? We need to develop a methodology with a discovery layer, a logistics layer, and a delivery layer that are kept separate.
Summa – open source catalog project. The idea behind it: a variety of information containers (OPACs A, B, and C, e-journals, institutional repositories) are searched through an index that supports both discovery-based and exploratory searches. The resource delivery component still needs work.
Summa – what is it? A search engine and front end for indexing metadata or full text; integrated search; modular; open source.
What it's not – it's not a library system. It has nothing to do with federated search. It's not another Google. It's a resource discovery layer on top of existing systems.
How Summa is structured – a storage layer, then a computational layer, then a business layer, then a presentation layer. A white paper describing Summa in more detail will accompany the open source release. See http://www.statsbiblioteket.dk/summa, or wiki.statsbiblioteket.dk/summa to try Summa.
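The four-layer structure can be illustrated with a toy pipeline in which each layer talks only to the one directly below it. All class and method names here are hypothetical, and what each layer does (an inverted index in the computational layer, AND semantics plus title sorting in the business layer) is an assumption for illustration; the white paper is the authority on what Summa's layers actually contain.

```python
class StorageLayer:
    """Holds raw records (hypothetical stand-in for Summa's storage)."""
    def __init__(self, records):
        self._records = list(records)

    def all_records(self):
        return list(self._records)

    def get(self, rec_id):
        return self._records[rec_id]


class ComputationalLayer:
    """Builds an inverted index (title word -> record ids) over storage."""
    def __init__(self, storage):
        self._storage = storage
        self._index = {}
        for rec_id, rec in enumerate(storage.all_records()):
            for word in rec["title"].lower().split():
                self._index.setdefault(word, set()).add(rec_id)

    def lookup(self, term):
        return self._index.get(term.lower(), set())

    def record(self, rec_id):
        return self._storage.get(rec_id)


class BusinessLayer:
    """Applies search rules: all terms must match; results sorted by title."""
    def __init__(self, computation):
        self._computation = computation

    def search(self, query):
        terms = query.split()
        if not terms:
            return []
        ids = set.intersection(*(self._computation.lookup(t) for t in terms))
        hits = [self._computation.record(i) for i in ids]
        return sorted(hits, key=lambda r: r["title"])


class PresentationLayer:
    """Formats business-layer results for display."""
    def __init__(self, business):
        self._business = business

    def render(self, query):
        hits = self._business.search(query)
        lines = [f"{n}. {r['title']}" for n, r in enumerate(hits, 1)]
        return "\n".join(lines) or "No results"
```

Keeping the layers separate means, for example, that the presentation layer could be replaced without touching indexing, which matches the talk's emphasis on modularity.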
Search results – we've "stolen" Google's suggested alternate search terms (the "did you mean?" feature), Amazon's basket feature, and relevancy ranking (with sorting by year or title as alternatives). "Try to do what people expect you to do." Subject librarians are integrated into the search results – the survey showed that people had little knowledge that subject librarians existed. This combines the human part of searching with library resources.
"Cluster analysis" – groups search terms by broadly described subjects (not a controlled vocabulary). The principle is to use different mechanisms to guide a search, such as suggestions for narrowing it down.
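A "did you mean?" feature can be approximated by replacing unknown query terms with the closest word in the index vocabulary. This sketch uses `difflib.get_close_matches` from Python's standard library; the function name `did_you_mean`, the cutoff value, and the sample vocabulary are hypothetical, and the talk does not describe how Summa generates its suggestions.

```python
from difflib import get_close_matches


def did_you_mean(query, vocabulary, cutoff=0.75):
    """Return a corrected query string if some term is not in the
    vocabulary but has a close match; otherwise return None."""
    known = set(vocabulary)
    candidates = sorted(known)  # sorted for deterministic tie-breaking
    corrected, changed = [], False
    for term in query.lower().split():
        if term in known:
            corrected.append(term)
            continue
        matches = get_close_matches(term, candidates, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(term)  # no good suggestion; keep the user's term
    return " ".join(corrected) if changed else None
```

For example, with the vocabulary `["library", "search", "catalog", "resources"]`, the query "libray serach" yields the suggestion "library search", while a correctly spelled query yields `None`, so no suggestion is shown.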
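As a rough illustration of suggestions for narrowing a search, the sketch below counts the subject terms attached to the current hits and offers the most frequent ones as refinements. Note that this is a simple facet count, not true cluster analysis; the `subject` field and the function name are hypothetical.

```python
from collections import Counter


def narrowing_suggestions(hits, field="subject", top=3):
    """Count subject terms across the hits and return the `top` most
    frequent as (term, count) pairs to offer as refinements."""
    counts = Counter(term for hit in hits for term in hit.get(field, []))
    return counts.most_common(top)
```

Given three hits tagged `["history", "denmark"]`, `["history", "libraries"]`, and `["history", "denmark"]`, the top two suggestions are `("history", 3)` and `("denmark", 2)`; a user could click one to restrict the result set.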
Nothing in Summa is being done for the first time – “we steal with honor.” Open sourced to gain the wisdom and innovation of the community.
Relevance and Quality – results can be ranked by users, which helps us work out how we define "relevance."
Protecting the data from obscurity – when corporations digitize books, what information will be made available to libraries, and at what cost?