Sunday, 30 October 2011

Information retrieval in the field of legal research

A study conducted in 2000 estimated that the amount of data produced per year was the equivalent of about 250 megabytes per person on the planet (Lyman et al., 2000). When the study was carried out again in 2003, the figure had increased to 800 megabytes (Lyman et al., 2003). Eight years later, it is safe to assume that the figure has increased again, possibly dramatically. It is clear, then, that we need technologies to help us store, search for, and retrieve information efficiently. This essay looks at one such technology, information retrieval systems, in the context of legal research, using the website Westlaw UK as an example.

Westlaw UK is one of the leading online research services for legal professionals in the UK. The website covers a wide range of legal materials, including case law covering UK court decisions going back to 1220, legislation covering Acts dating back to 1267, and EU law. Westlaw UK also provides access to thousands of full-text articles and over half a million article abstracts from specialist legal journals, as well as a current awareness service and coverage of over 1000 full-text news sources (Thomson Reuters (Professional) UK Limited, 2011).

When presented with this staggering amount of material, how can users locate information that is relevant to their needs? Although it is possible to browse through much of the content available on Westlaw UK, doing so is unlikely to satisfy a user’s information need efficiently and effectively. Users, then, will need to use the website’s search system, following the information retrieval (IR) model illustrated by Broder (2002).

At first glance, the classic IR model appears to illustrate a fairly simple process. A user comes to an IR system with an information need they are seeking to fulfil. The user submits a query to the system, which selects documents matching the query. After evaluating the relevance of the results, the user may need to refine their query, repeating this process until the initial information need is (hopefully) satisfied (Broder, 2002).

In practice, however, users need to be aware of a number of issues regarding IR systems in general, as well as the search options available on the particular IR system they are using. To look more closely at some of these issues and the search system on Westlaw UK, let us take a hypothetical example – a user (perhaps a solicitor) researching case law on who is liable for a fire that damages a neighbour’s property.

Westlaw UK’s front page presents users with a basic search option – a single field for entering search terms, along with five areas that can be searched within: cases, legislation, journals, current awareness, and the European Union. All five areas are ticked by default, but users can narrow their search by deselecting areas.

Submitting a query using ‘fire’ as a search term on the front page and selecting only cases brings up 4000 results. This is Westlaw UK's limit, and users are told that their search has returned too many results. Users will need to reformulate or refine their search in some way, most commonly by combining terms with Boolean operators such as AND, OR, and NOT, or by adding more search terms. Our hypothetical user does both, and also uses Westlaw UK’s truncation character ‘!’, searching for ‘fire’ AND ‘neighbour’ AND ‘liab!’. (It should be noted that Westlaw UK automatically assumes the use of AND between search terms, so it doesn’t need to be entered.) However, this search returns 639 results: much reduced from the initial 4000, but still far too many to evaluate efficiently.
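To make the mechanics of this refinement concrete, here is a minimal sketch in Python of an implicit-AND search with a trailing ‘!’ truncation character. It is a toy model only and says nothing about how Westlaw UK is actually implemented: the corpus, the case identifiers, and the function names are all invented for illustration.

```python
import re

# Hypothetical toy corpus; these texts are invented, not real case summaries.
DOCS = {
    "case_a": "Fire spread from the defendant's land and he was held liable to his neighbour.",
    "case_b": "A neighbour entered via the fire escape; liability for the trespass was disputed.",
    "case_c": "Liability for flooding of a neighbouring field.",
    "case_d": "A fire destroyed the warehouse.",
}

def term_to_regex(term):
    """Build a whole-word pattern; a trailing '!' matches any word ending."""
    if term.endswith("!"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

def search_and(query, docs):
    """Return the ids of documents that contain every term in the query.

    Terms are ANDed together implicitly, so an explicit AND is simply dropped.
    """
    patterns = [term_to_regex(t) for t in query.split() if t.upper() != "AND"]
    return sorted(doc_id for doc_id, text in docs.items()
                  if all(p.search(text) for p in patterns))

print(search_and("fire", DOCS))                  # ['case_a', 'case_b', 'case_d']
print(search_and("fire neighbour liab!", DOCS))  # ['case_a', 'case_b']
```

Adding terms shrinks the result set, but the second search still returns case_b, where the terms appear in an unrelated context. Matching words alone does not guarantee relevance.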

Looking through these results reveals that while the recall of the search is high, the precision is very low (Shneiderman, Byrd and Croft, 1997). In other words, the search terms have been found, but the results are often not relevant to our user’s information need, because the terms appear in a number of very different contexts. One example is a case involving a neighbour's mistaken access to a fire escape and their liability for damages arising from this mistaken access (Ramzan v Brookwide Ltd, 2011).

At this point it becomes clear that users need a way of further refining their search beyond using Boolean operators and adding terms to the search string. For each search result Westlaw UK displays subjects and keywords; these are terms that are indexed in the website’s Legal Taxonomy (Thomson Reuters (Professional) UK Limited, 2011). Looking through the keywords for the search results above, ‘fire’ appears a number of times, in terms such as ‘fire’, ‘fire precautions’, and ‘fire escapes’. Unfortunately users cannot search by keyword on the website’s basic search page. It is necessary, then, for users to access the site’s advanced search options.

Westlaw UK provides these search options in a number of what Morville and Rosenfeld call ‘search zones’ (2007, p.151): cases, legislation, journals, current awareness, EU, books, and news. Each search zone presents a number of searchable fields, corresponding to terms indexed on the website. For example, the cases search zone allows users to search by (amongst other things) judge, court, and keyword.

Using ‘fire’ as a keyword in the cases search zone returns 683 results. Searching for ‘fire’ as a keyword along with ‘neighbour’ in the free text field returns seven results, a much more manageable number.
  
Users need to be careful, though – a search that is very precise comes at the expense of recall (Morville and Rosenfeld, 2007, p.159). In other words, it's possible to construct a search that is too precise and actually miss results that are highly relevant. For a solicitor basing an argument upon precedents in case law, this could be disastrous.
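The trade-off can be made concrete with the standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The sketch below uses invented document ids and an assumed set of five relevant cases. The numbers are illustrative only and are not taken from the Westlaw UK searches described here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Assumption for illustration: documents 1 to 5 are the truly relevant cases.
relevant = {1, 2, 3, 4, 5}

# A broad search returns 100 documents, including all five relevant ones.
print(precision_recall(range(1, 101), relevant))  # (0.05, 1.0)

# A very narrow search returns only three documents, all of them relevant.
print(precision_recall({1, 2, 3}, relevant))      # (1.0, 0.6)
```

The narrow search looks ideal when judged by its results list alone, yet it has silently missed two of the five relevant cases.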

In the above example the system has only searched for the term ‘neighbour’ - terms such as ‘neighbourhood’ and ‘neighbouring’ would have been excluded. Running the search again using ‘fire’ as a keyword and ‘neighbour!’ in the free text field returns 20 results. One of the new cases found (Maloco v Littlewoods Organisation Ltd, 1987) addresses a situation of neighbouring properties being damaged by a fire - something directly relevant to our hypothetical user’s information need.
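The effect of truncation in a fielded search can be sketched in the same way. The example below assumes each case record has a list of indexed keywords and a free-text body, and that the keyword field is matched exactly. The record structure, field names, and contents are all hypothetical and do not describe Westlaw UK's actual data model.

```python
import re

# Hypothetical records; names, keywords, and text are invented for illustration.
CASES = [
    {"name": "Case A", "keywords": ["fire", "negligence"],
     "text": "The neighbour sued after the blaze."},
    {"name": "Case B", "keywords": ["fire", "occupiers' liability"],
     "text": "Damage to neighbouring properties."},
    {"name": "Case C", "keywords": ["fire escapes"],
     "text": "The neighbour used the fire escape."},
    {"name": "Case D", "keywords": ["nuisance"],
     "text": "Smoke from a fire annoyed the neighbour."},
]

def matches_free_text(term, text):
    """Whole-word match; a trailing '!' also accepts any word ending."""
    truncated = term.endswith("!")
    stem = term[:-1] if truncated else term
    suffix = r"\w*" if truncated else r"\b"
    return re.search(r"\b" + re.escape(stem) + suffix, text, re.IGNORECASE) is not None

def fielded_search(cases, keyword, free_text):
    """Return names of cases with an exact keyword match and a free-text match."""
    return [c["name"] for c in cases
            if keyword in c["keywords"] and matches_free_text(free_text, c["text"])]

print(fielded_search(CASES, "fire", "neighbour"))   # ['Case A']
print(fielded_search(CASES, "fire", "neighbour!"))  # ['Case A', 'Case B']
```

Without the truncation character, Case B is excluded because its text contains ‘neighbouring’ rather than ‘neighbour’. With it, the relevant case is recovered while the keyword field still filters out Cases C and D.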
   
As we can see, then, using information retrieval systems to carry out legal research efficiently requires a degree of knowledge and skill from users. Users need not only to be familiar with IR issues such as the use of Boolean operators, truncation characters, and recall versus precision, but also to be aware of the various search options available on the IR system itself. As the volume of information we are required to navigate continues to grow, these skills will become ever more indispensable.

REFERENCES

Broder, A. (2002) A taxonomy of web search, SIGIR Forum, [online]. Available at: http://www.sigir.org/forum/F2002/broder.pdf [Accessed: 23 October 2011].

Eeles, C. (2011) Information retrieval in the field of legal research, Imaginary neko, [blog] 30 October. Available at: http://imaginaryneko.blogspot.com/2011/10/information-retrieval-in-field-of-legal.html [Accessed: 30 October 2011].

Lyman, P. et al. (2000) How much information? [online] University of California. Available at: http://www2.sims.berkeley.edu/research/projects/how-much-info/ [Accessed: 23 October 2011].

Lyman, P. et al. (2003) How much information? [online] University of California. Available at: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/ [Accessed: 23 October 2011].

Maloco v Littlewoods Organisation Ltd [1987] 2 W.L.R. 480.

Morville, P. and Rosenfeld, L. (2007) Information architecture for the World Wide Web. 3rd ed. Sebastopol: O'Reilly Media.

Ramzan v Brookwide Ltd [2011] N.P.C. 95.

Shneiderman, B., Byrd, D. and Croft, W.B. (1997) Clarifying search: a user-interface framework for text searches, D-Lib Magazine, [online]. Available at: http://www.dlib.org/dlib/january97/retrieval/01shneiderman.html [Accessed: 23 October 2011].

Thomson Reuters (Professional) UK Limited (2011) Westlaw UK. [online] Available at: http://www.westlaw.co.uk [Accessed: 25 October 2011].
