Sunday, 8 January 2012

Web 2.0 and the legal sector

Introduction

Web 2.0 can be described very briefly as a collection of Internet tools that allow users to create and publish digital content just as easily as they can access online content published by other people. This has had an enormous impact on society, culture, and industry, transforming the way in which we communicate and do business.

The legal sector, which can have a reputation for being rather conservative and slow to change, has begun to adapt in order to take advantage of the possibilities offered by Web 2.0 technologies. For example, in 2008 the Australian Capital Territory Supreme Court granted permission to serve legal documents via Facebook (BBC News, 2008), while in 2009 the High Court in England allowed an injunction to be served via Twitter (Johnson, 2009). In 2011 the Lord Chief Justice issued guidance on the use of live text communication by journalists in courts in England and Wales, allowing reporters to use mobile email, social media (such as Twitter) and laptops in court without having to make a previous application to do so (Lord Chief Justice, 2011).

A number of law firms, like many other commercial businesses, have attempted to harness the social media element of Web 2.0 as a marketing tool, launching blogs about law and legal practice (referred to as “blawgs” – Holmes, 2011) and creating Twitter accounts. This has seen varying degrees of success: a report by web consultancy Intendance found that while 66% of the top 50 UK law firms had created at least one account on Twitter, 19 of those accounts had tweeted nothing at all, running the risk of damaging the firms’ reputations by creating an image of laziness (Rothwell, 2011).

However, the use of Web 2.0 tools for marketing and business development purposes is very much customer or client facing. How are these technologies being used within law firms to publish and share information in ways that are accessible and effective? How are information professionals in the legal sector using these technologies to perform their tasks more efficiently? What innovative methods and tools does Web 2.0 offer for representing and organising digital data? This post will examine these questions, with a focus on law firms in the UK.

Publishing and sharing information in law firms: wikis and blogs

For law firms, like most organisations, the primary form of communication is email. Communicating via email has a number of advantages, such as speed, security, and the ability to attach digital documents (for example, spreadsheets and photos). However, email also has a number of problems, particularly with regard to distributing and accessing information widely.

For example, emails can limit access to information – anything disclosed in an email is only accessible to whomever it was sent to. What happens if, at a later date, someone in the firm who didn’t receive the original email would benefit from knowing the information? What if they didn’t even know who sent or received the original message? What if they weren’t even aware the information existed at all? Email, then, can be viewed as a closed system (Gould, 2010).

Emails can also lead to a state of information overload, with lawyers becoming snowed under by the number of messages in their inbox, which could lead to crucial updates being missed.

These problems prompt a number of questions. How can firms, particularly large ones, leverage the knowledge of the people in their organisations? How can people in different offices, or even in different countries, share information easily and effectively?

Firms are starting to turn to wikis and blogs as a way of resolving these issues. The open, social nature of these tools allows people to collaborate, share and publish information much more easily and much more widely. For example, important documents that need to be referred to often by a group of people (for example, instructions regarding a particular client) can be turned into blog posts or wikis, enabling them to be updated easily by members of the group, as opposed to the much more involved process of locating the file the document is in, updating the document, and then re-uploading it to the firm’s intranet.

Wikis and blogs also allow people to access and discover information much more easily, regardless of where or how it is stored. Having this information openly available, instead of in an email in someone’s inbox, resolves the issue of people asking for the information to be resent, as well as the issue of people not even knowing the information exists.


An example of a law firm that has successfully implemented the use of wikis and blogs to create a much more open system of publishing and sharing information is Addleshaw Goddard. A particularly interesting element of their system is a plugin that allows the use of tags across the whole blog system. The following quote from Mark Gould, Head of Knowledge Management at the firm, explains how this works:

“As people begin to tag their content, it becomes possible to track activity wherever it appears. So, for example, the BD [Business Development] team might write about an opportunity they are pursuing with a particular company and tag it with the name of the company. At the same time, a post on the PDP [Partner Development Programme] blog might also mention the company and be tagged as well. The tag collection allows readers to click on a tag and read all the blog posts with that tag, so the BD and PDP posts would appear together, even though they exist on separate blogs.” (Gould, 2010, pp.11-12)
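The cross-blog tag collection Gould describes can be pictured as a simple index from tags to posts. What follows is only a minimal sketch of that idea: the post data, blog names, and the `build_tag_index` function are made up for illustration, and nothing here reflects how the plugin at Addleshaw Goddard is actually implemented.

```python
from collections import defaultdict

# Hypothetical posts living on two separate internal blogs.
posts = [
    {"blog": "BD", "title": "Opportunity with Acme Ltd", "tags": {"acme ltd", "pitch"}},
    {"blog": "PDP", "title": "Partner session notes", "tags": {"acme ltd"}},
    {"blog": "BD", "title": "Quarterly pipeline", "tags": {"pipeline"}},
]


def build_tag_index(posts):
    """Map each tag to every post that carries it, whichever blog it sits on."""
    index = defaultdict(list)
    for post in posts:
        for tag in post["tags"]:
            index[tag].append(post)
    return index


tag_index = build_tag_index(posts)

# "Clicking" a tag pulls together posts from both blogs.
for post in tag_index["acme ltd"]:
    print(post["blog"], "-", post["title"])
```

The point of the design is that the index is built over all blogs at once, so a reader following a tag never needs to know which blog a post was written on.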

The use of wikis and blogs can be seen then to help people within a firm become more aware of the information and expertise available within their organisation – essentially, the firm becomes more self-aware, and as a consequence becomes more efficient.

Information Professionals: current awareness and RSS feeds

One of the most important services provided by information professionals in law firms is what is referred to in the industry as “current awareness” (Edwards, 2009). It is vital for lawyers to be up to date on any developments in their chosen areas of practice, such as amendments to legislation, judgments that set new legal precedents, publication of government and industry reports, announcements by agencies and executives, updates to any cases of interest that are currently in court, and statements by judicial bodies, along with news articles and commentary in legal journals and publications. In law firms the task of monitoring these developments falls to the information professionals employed by the firm. Carrying out this task manually by regularly visiting relevant websites to check for updates and scanning through journals and newspapers would be an incredibly time-consuming and laborious process for a large team, if not outright impossible for a smaller team.

RSS feeds, one of the key elements of Web 2.0, are of enormous benefit for information professionals carrying out current awareness. RSS (most commonly expanded as “Really Simple Syndication”, although earlier versions were known as “RDF Site Summary” and “Rich Site Summary”) is a format in which a website publishes its updates as a feed that a browser or feed reader checks automatically, so that new content comes to the reader rather than the reader having to go looking for it. This means that when a website is updated with new content, the RSS feed for the site delivers that same content (or an excerpt, or a link to it) to the browsers of interested readers. As a consequence, by selecting the RSS feeds of sites and information sources they are interested in, users can create personalized summaries of new content. This benefits users by saving them the time and effort of regularly visiting the sites themselves to see if anything new has been added, and ensures they don’t miss anything important.
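To make the idea of a feed a little more concrete, here is a minimal sketch of reading one. It assumes an RSS 2.0 style document, in which each update is an `item` element with a `title` and a `link`; the sample feed, its URLs, and the `parse_items` function are invented for illustration, and a real feed reader would fetch the feed from a website on a schedule rather than use a hard-coded string.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document standing in for a real feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example legal updates</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>New judgment published</title>
      <link>http://example.com/judgment-1</link>
      <description>Short excerpt of the judgment.</description>
    </item>
    <item>
      <title>Amendment to legislation</title>
      <link>http://example.com/amendment-2</link>
      <description>Short excerpt of the amendment.</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return the title and link of every item in an RSS document."""
    root = ET.fromstring(feed_xml)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


items = parse_items(SAMPLE_FEED)
for entry in items:
    print(entry["title"], "->", entry["link"])
```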

Because information is so vital to the legal sector, it comes as no surprise that many online legal information sources provide RSS feeds. For example, Lawtel, one of the leading legal news and research sources on the Internet, provides RSS feeds focusing on practice areas (for example, human rights or personal injury) and particular types of content (such as case law or legislation), while the PLC (Practical Law Company) website, another popular online legal resource, also provides RSS feeds focusing on practice areas (Thomson Reuters (Professional) UK Limited, 2012; Practical Law Publishing Limited; Practical Law Company Limited, 2012). The BAILII (British and Irish Legal Information Institute) website has RSS feeds for the judgments and decisions of individual courts throughout the U.K. and Ireland (British and Irish Legal Information Institute, 2012).

Monitoring these RSS feeds instead of checking for updates manually allows information professionals to carry out their current awareness tasks much more efficiently and accurately.

Why not publish the RSS feeds in a single place, such as a page on the firm’s intranet? Not every lawyer in a firm will work in the same practice areas, or be following the same cases, or be interested in the same news. Publishing all of the feeds together would still lead to too much information for lawyers to trawl through.

As a consequence it is still necessary for a law firm’s information professionals to monitor RSS feeds, selecting and sending out relevant updates to interested lawyers, or groups of lawyers. In large firms with hundreds of lawyers and many different practice areas this process of monitoring, sorting, collating, and resending information can still take up some time. This leads to the need for ever more innovative methods and tools for organising digital information.

Innovative methods and tools: smart alerts

A number of UK law firms have begun using RSS aggregator services such as Linex (used by Macfarlanes) and Attensa StreamServer (used by Reynolds Porter Chamberlain) to create ‘smart alerts’ (Attensa, 2011; Linex Systems Ltd, 2011). These services bring together the content of RSS feeds along with content from paid subscription sources into a single place, and then automatically create targeted custom alerts for groups and individuals in the firm.

These alerts can be published to intranet pages (such as pages for different divisions in the firm), delivered as personalized emails, or (in the case of StreamServer) delivered to users’ personal web dashboards.
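The general idea behind a targeted alert (match aggregated items against what each person has said they are interested in) can be sketched in a few lines. This is purely illustrative: the items, the subscriber names, and the `build_alerts` function are all hypothetical, and it makes no claim about how Linex or Attensa StreamServer work internally.

```python
# Hypothetical aggregated items, each already labelled with a practice area.
items = [
    {"title": "Court of Appeal ruling on damages", "area": "personal injury"},
    {"title": "New guidance on detention", "area": "human rights"},
    {"title": "Update to limitation periods", "area": "personal injury"},
]

# Hypothetical subscriptions: the practice areas each lawyer follows.
subscriptions = {
    "alice": {"personal injury"},
    "bob": {"human rights", "personal injury"},
    "carol": {"employment"},
}


def build_alerts(items, subscriptions):
    """Give each subscriber only the items in the areas they follow."""
    alerts = {}
    for person, areas in subscriptions.items():
        matched = [item["title"] for item in items if item["area"] in areas]
        if matched:  # nobody gets an empty alert
            alerts[person] = matched
    return alerts


alerts = build_alerts(items, subscriptions)
for person, titles in alerts.items():
    print(person, titles)
```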

Furthermore, these services also aggregate content generated within the firm, picking up on content added to the firm’s wikis, blogs, and intranet pages. Such services help firms to further capitalize on the investment they have made in social tools such as wikis and blogs, heightening awareness of the knowledge and expertise that exists within the firm.

Conclusion

It can be seen that the use of wikis, blogs, and RSS feeds is allowing law firms to share and distribute information within their organisations much more easily. The same tools also allow information professionals working within law firms to perform their tasks more efficiently. However, with firms starting to introduce social tools and frameworks we are seeing what could be the start of a paradigm shift – a move away from a document-centered view of information to something else.



References

Anon (200-) What is RSS? RSS explained, [online]. Available at: http://www.whatisrss.com/ [Accessed: 30 December 2011].

Attensa (2011) Attensa StreamServer, [online]. Available at: http://www.attensa.com/what-is-streamserver/ [Accessed: 4 January 2012].

BBC News (2008) Legal papers served via Facebook, [online] 16 December. Available at: http://news.bbc.co.uk/1/hi/7785004.stm [Accessed: 28 December 2011].

British and Irish Legal Information Institute (2012) BAILII RSS Feeds, [online]. Available at: http://www.bailii.org/rss/ [Accessed: 30 December 2011].

Edwards, P. (2009) Using Web 2.0 within the organisation, Internet Newsletter for Lawyers & Law 2.0, September/October, pp.11-12.

Eeles, C. (2012) Web 2.0 and the legal sector, Imaginary neko, [blog] 9 January. Available at: http://imaginaryneko.blogspot.com/2012/01/web-20-and-legal-sector.html [Accessed: 9 January 2012].

Gould, M. (2010) Social software at Addleshaw Goddard, Internet Newsletter for Lawyers & Law 2.0, March/April, pp.10-12.

Holmes, N. (2011) Blogging – been there, done that?, Internet Newsletter for Lawyers, January/February, pp.1-2.

Johnson, B. (2009) High court approves injunction via Twitter, Guardian, [online] 1 October. Available at: http://www.guardian.co.uk/technology/2009/oct/01/twitter-injunction [Accessed: 28 December 2011].

Linex Systems Ltd (2011) Linex Systems, [online]. Available at: http://www.linexsystems.com/ [Accessed: 4 January 2012].

Lord Chief Justice (2011) Guidance on Live, Text-Based Communications from Court, Judiciary of England and Wales, [online]. Available at: http://www.judiciary.gov.uk/publications-and-reports/guidance/2011/courtreporting [Accessed: 28 December 2011].

O’Reilly, T. (2006) Web 2.0 Compact Definition: Trying Again, [online] 10 December. Available at: http://radar.oreilly.com/2006/12/web-20-compact-definition-tryi.html [Accessed: 30 December 2011].

Practical Law Publishing Limited; Practical Law Company Limited (2012) Practical Law Company, [online]. Available at: http://uk.practicallaw.com/ [Accessed: 4 January 2012].

Rothwell, R. (2011) Law firms’ poor use of Twitter risks ‘damaging their brand’, Law Society Gazette, [online] 17 January. Available at: http://www.lawgazette.co.uk/news/law-firms-poor-use-twitter-risks-039damaging-their-brand [Accessed: 29 December 2011].

Thomson Reuters (Professional) UK Limited (2012) Lawtel, [online]. Available at: http://www.lawtel.com/Login.aspx [Accessed: 4 January 2012].

Thursday, 24 November 2011

DITA - week five: Web 2.0 (Part One: exactly what the hell is Web 2.0?)

I like to think I'm fairly "with it" when it comes to the Web. Mention "Web 2.0" technologies to me and I'll probably say, "hmmm" and nod sagely. But if you held a gun to my head and demanded that I explain what Web 2.0 is, I'd probably mumble something like, "erm, Facebook, um, social media, and ah, something about blogs, maybe?" (I'd probably also ask if the gun was really necessary for the conversation.)

So, what exactly is Web 2.0? Well, it doesn't help that there is no standard, agreed-upon definition of the term - which means that everyone wants to have a go at defining it. Tim O'Reilly, who is closely associated with the term thanks to a conference in 2004, describes it thus:


Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform. Chief among those rules is this: Build applications that harness network effects to get better the more people use them.

Clear on that? Yeah, me neither. Let's see if we can try this again (although we will return to a couple of points in O'Reilly's definition) ...

A simple way of defining and thinking about Web 2.0 is as a series of internet services where users find it just as easy to create and publish their own content as to read other people's content. In other words, we have reached the point where technologies have become cheap enough, accessible enough, and easy enough to use that you don't need to be an internet guru to put content up on the Web. In fact, I'm doing it right here, right now, for free, with this humble blog.

There's certainly no way I could have been doing this when I first started using the Web in earnest back in 1998. This has led some to describe Web 1.0 as the "read-only" Web, with most users searching for and consuming content, but (by and large) not creating it themselves - a rather passive experience. Taking the analogy further, Web 2.0 could be described as the "read/write" Web, with users continuing to search for content, but now also creating it themselves - a much more active and interactive experience. (I'm trying so very hard not to use the word prosumer here!) Tim Berners-Lee however, none other than the inventor of the Web, takes exception to this description - he claims that the Web was always intended to be "read/write", so he views the term Web 2.0 as just a piece of jargon.

Nonetheless, we reached a point somewhere around 2005/2006 at which it became just as easy for us to create/publish content on the Web as to read other people's content. (It's interesting to note that no new technology was developed in order for this change to happen - as I said above, it's only because existing technologies have become cheaper and easier to use.) This has led us to some pretty interesting developments:

  • Collaboration: As well as being incredibly easy now for people to create and publish their own content online, it's just as easy for people to comment on each other's content, and to interact and collaborate with each other. Blogs and blog comments are a very simple example of this. 
  • Creation of online "social spaces": Taking the idea of collaboration/interaction further has led people to create virtual online communities and social spaces, where users can interact and share ideas, photos, videos, jokes ... well, the list of what people can share is virtually endless these days. Social networking sites such as Facebook are an example of this. (This takes us back to O'Reilly's quote above about "applications that harness network effects to get better the more people use them." In other words, what he meant was that applications like Facebook get better/more interesting the more people use them. You could go and sign up for Friendster, but it wouldn't be a very interesting experience if your network of friends is over on Facebook.)
  • Dynamic content: Because there are now so many of us constantly creating and publishing new content online, the content on Web 2.0 sites isn't static, but is instead dynamic - constantly changing. This is referred to as the "flow internet", and requires tools such as RSS to manage effectively without feeling like you're drowning in information.
  • Internet as platform: Once upon a time, computers were platforms, and a user's experience would differ significantly depending on what platform they were using (for example, Mac versus Windows). Today, computers are becoming our portals onto the internet - and it is the internet itself which is the platform everything operates on (this again takes us back to the O'Reilly quote - he explicitly mentions "the internet as platform" as part of Web 2.0). More and more of our data, and even the applications we use, are being stored on the internet, and not on our computers' hard drives.

Looking through this list, the benefits of Web 2.0 are no doubt apparent, such as the ability to share and exchange information like never before, to connect and interact with people the world over, and the democratisation of many kinds of discourse. Of course, I'm sure a number of problems and issues have occurred to you too, such as the trivialisation of discourse, loss of privacy, and a sense of drowning in an ever-expanding sea of "updates".

One thing's for sure though: we're all becoming prosumers now, producing and consuming more and more content - the floodgates have opened, and there's no going back.

Sunday, 30 October 2011

Information retrieval in the field of legal research

A study conducted in 2000 estimated that the amount of data produced per year was the equivalent of about 250 megabytes per person on the planet (Lyman, et al., 2000). When the study was carried out again in 2003 the number had increased to 800 megabytes (Lyman, et al., 2003). Eight years later, it would be safe to assume that number has increased again - possibly dramatically so. It is clear then, that we need technologies to help us store, search for, and retrieve information efficiently. This essay will look at one such technology, information retrieval systems, in the context of legal research and using the website Westlaw UK as an example.

Westlaw UK is one of the leading online research services for legal professionals in the UK. The website covers a wide range of legal materials including case law covering UK court decisions going back to 1220, legislation covering Acts dating back to 1267, and EU law. Westlaw UK also has access to thousands of full-text articles and over half a million article abstracts from specialist legal journals, as well as providing a current awareness service and coverage of over 1000 full text news sources (Thomson Reuters (Professional) UK Limited, 2011).

When presented with this staggering amount of material, how can users locate information that is relevant to their needs? Although it is possible to browse through much of the content available on Westlaw UK, doing so is unlikely to satisfy a user’s information need efficiently and effectively. Users, then, will need to use the website’s search system, following the information retrieval (IR) model illustrated by Broder (2002).

At first glance, the classic IR model appears to illustrate a fairly simple process. A user comes to an IR system with an information need they are seeking to fulfill. The user submits a query to the system, which selects documents matching the query. After evaluating the relevance of the results, the user may need to refine their query, repeating this process until the initial information need is (hopefully) satisfied. (Broder, 2002)

In practice, however, users need to be aware of a number of issues regarding IR systems in general, as well as the search options available on the particular IR system they are using. To look more closely at some of these issues and the search system on Westlaw UK, let us take a hypothetical example – a user (perhaps a solicitor) researching case law on who is liable for a fire that damages a neighbour’s property.

Westlaw UK’s front page presents users with a basic search option – a single field for entering search terms, along with five areas that can be searched within: cases, legislation, journals, current awareness, and the European Union. All five areas are ticked by default, but users can narrow their search by deselecting areas.

Submitting a query using ‘fire’ as a search term on the front page and selecting only cases brings up 4000 results - this is Westlaw UK's limit and users are told that their search has returned too many results. Users will need to reformulate or refine their search in some way, most commonly by modifying the search using Boolean operators such as AND, OR, and NOT, or adding more search terms. Our hypothetical user does both, and also uses Westlaw UK’s truncation character ‘!’, searching for ‘fire’ AND ‘neighbour’ AND ‘liab!’. (It should be noted that Westlaw UK automatically assumes the use of AND between search terms, so it doesn’t need to be entered.) However, this search comes up with 639 results; much reduced from the initial 4000, but still far too many to evaluate efficiently.
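The way an implicit AND combines with a truncation character can be illustrated with a small sketch. It assumes a very simple model in which a plain term must match a whole word and a term ending in ‘!’ matches any word beginning with the stem; the documents and the `term_to_pattern` and `matches_all` functions are hypothetical, and Westlaw UK's own matching rules may well differ.

```python
import re


def term_to_pattern(term):
    """Whole-word match for a plain term; a trailing '!' allows any word ending."""
    if term.endswith("!"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def matches_all(text, terms):
    """Implicit AND: every term has to appear somewhere in the text."""
    return all(term_to_pattern(term).search(text) for term in terms)


# Made-up document summaries standing in for case reports.
documents = {
    "doc1": "The defendant was liable for a fire that spread to a neighbour.",
    "doc2": "Liability for fire damage to neighbouring property.",
    "doc3": "A dispute over a fire escape.",
}

query = ["fire", "neighbour", "liab!"]
hits = [name for name, text in documents.items() if matches_all(text, query)]
print(hits)
```

Under these assumptions ‘liab!’ picks up both "liable" and "Liability", while the untruncated ‘neighbour’ does not match "neighbouring", so the second document is missed unless that term is truncated too.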

Looking through these results reveals that while the recall of the search is high, the precision is very low (Shneiderman, Byrd and Croft, 1997). In other words, documents containing the search terms have been found, but many of them are not relevant to our user’s information need, because the terms appear in a number of very different contexts, such as a case involving a neighbour's mistaken access to a fire escape and their liability for damages caused due to this mistaken access (Ramzan v Brookwide Ltd, 2011).

At this point it becomes clear that users need a way of further refining their search beyond using Boolean operators and adding terms to the search string. For each search result Westlaw UK displays subjects and keywords; these are terms that are indexed in the website’s Legal Taxonomy (Thomson Reuters (Professional) UK Limited, 2011). Looking through the keywords for the search results above, ‘fire’ appears a number of times, in terms such as ‘fire’, ‘fire precautions’, and ‘fire escapes’. Unfortunately users cannot search by keyword on the website’s basic search page. It is necessary, then, for users to access the site’s advanced search options.

Westlaw UK provides these search options in a number of what Morville and Rosenfeld call ‘search zones’ (2007, p.151): cases, legislation, journals, current awareness, EU, books, and news. Each search zone presents a number of searchable fields, corresponding to terms indexed on the website. For example, the cases search zone allows users to search by (amongst other things) judge, court, and keyword.

Using ‘fire’ as a keyword in the cases search zone returns 683 results. Searching for ‘fire’ as keyword along with ‘neighbour’ in the free text field returns seven results, a much more manageable number.
  
Users need to be careful, though – a search that is very precise comes at the expense of recall (Morville and Rosenfeld, 2007, p.159). In other words, it's possible to construct a search that is too precise and actually miss results that are highly relevant. For a solicitor basing an argument upon precedents in case law, this could be disastrous.
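The trade-off can be made concrete with the usual definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The sketch below uses invented document IDs and counts (they are not the Westlaw UK figures quoted in this essay) and a hypothetical `precision_recall` helper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    found = len(retrieved & relevant)
    precision = found / len(retrieved) if retrieved else 0.0
    recall = found / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: four documents are genuinely relevant.
relevant = {1, 2, 3, 4}

# A broad search returns 40 documents, including all four relevant ones.
broad = precision_recall(range(1, 41), relevant)

# A narrow search returns only two documents, both relevant.
narrow = precision_recall({1, 2}, relevant)

print("broad search  (precision, recall):", broad)
print("narrow search (precision, recall):", narrow)
```

In this toy case the broad search finds everything relevant but buries it among other results, while the narrow search returns nothing irrelevant but misses half of what matters.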

In the above example the system has only searched for the term ‘neighbour’ - terms such as ‘neighbourhood’ and ‘neighbouring’ would have been excluded. Running the search again using ‘fire’ as a keyword and ‘neighbour!’ in the free text field returns 20 results. One of the new cases found (Maloco v Littlewoods Organisation Ltd, 1987) addresses a situation of neighbouring properties being damaged by a fire - something directly relevant to our hypothetical user’s information need.
   
As we can see then, using information retrieval systems to carry out legal research efficiently requires a degree of knowledge and skill from users. Users need not only to be familiar with IR issues such as the use of Boolean operators, truncation characters and recall versus precision, but also to be aware of the various search options available on the IR system itself. As the volume of information we are required to navigate continues to grow, these skills will become ever more indispensable.

REFERENCES

Broder, A. (2002) A taxonomy of web search, SIGIR Forum, [online]. Available at: http://www.sigir.org/forum/F2002/broder.pdf [Accessed: 23 October 2011].



Eeles, C. (2011) Information retrieval in the field of legal research, Imaginary neko, [blog] 30 October. Available at: http://imaginaryneko.blogspot.com/2011/10/information-retrieval-in-field-of-legal.html [Accessed: 30 October 2011].


Lyman, P. et al. (2000) How much information? [online] University of California. Available at: http://www2.sims.berkeley.edu/research/projects/how-much-info/ [Accessed: 23 October 2011].

Lyman, P. et al. (2003) How much information? [online] University of California. Available at: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/ [Accessed: 23 October 2011].

Maloco v Littlewoods Organisation Ltd [1987] 2 W.L.R. 480.

Morville, P. and Rosenfeld, L. (2007) Information architecture for the World Wide Web. 3rd ed. Sebastopol: O'Reilly Media.

Ramzan v Brookwide Ltd [2011] N.P.C. 95.

Shneiderman, B., Byrd, D. and Croft, W.B. (1997) Clarifying search: a user-interface framework for text searches, D-Lib Magazine, [online]. Available at: http://www.dlib.org/dlib/january97/retrieval/01shneiderman.html [Accessed: 23 October 2011].

Thomson Reuters (Professional) UK Limited (2011) Westlaw UK. [online] Available at: http://www.westlaw.co.uk [Accessed: 25 October 2011].

Tuesday, 11 October 2011

DITA - week three: the joys of structuring and querying information stored in databases

"Woah, hold up there a minute!" I hear you say. "Week three? What happened to week two?" Erm, yes, I regret to say there's been a change to our regularly scheduled DITA services due to me still grappling with some of the finer points of the week two tasks - I'll be finishing those shortly (after which you will all be able to marvel at my mad HTML skillz), but for now let's talk about databases and (more importantly for us) how to query them using SQL.

Databases - what they're good for

First of all, why are databases good? Databases allow us to centralize information, thereby reducing the inevitable inconsistencies that creep in when that same information is stored in a number of different places. The example given in the lecture was that of a company, with different departments of that company storing information on its employees. If each department (for example, human resources, accounts/payroll, etc.) keeps their own files on employees (recording information on, for example, home address, phone number, bank account, etc.) you've got a lot of information being duplicated. And when any of that information changes (let's say somebody's phone number), each department will need to record that change. That's a lot of work being repeated, and once you start dealing with a large number of employees it's inevitable that someone somewhere won't update that information. It's much simpler to store all that information in one place that can be accessed and searched by all of the different departments - changes only need to be made once, saving time, eliminating redundant information, and minimizing inconsistencies. In other words, bring on the database!

Databases are also good because an organization (or individual) can structure them to suit their own needs. They can be made homogeneous in structure, and can be designed to create efficient searches. These three points, however, apply when you create or own the data involved - if it's not your data a database isn't so appropriate (information retrieval becomes more appropriate in that case - but more on that in week four).

What exactly is a database?

Okay, so we know that databases are useful - but what exactly is a database? Basically, a database is a collection of data tables, where each table is a two-dimensional table of data (in other words, rows and columns).  Each table in the database describes the attributes of a single thing - or, to use the technical term, an entity. For example, one table in a database could be about publishers, with that table listing different attributes of publishers, such as the name, address, city, and phone number. Another table in the same database could be about book titles, listing attributes such as title, year published, ISBN.

Each table in a database will also share a relationship with at least one other table in that database. In other words, each table will share a piece of information (an attribute) with another table - for example, your publishers table and book titles table may both contain the attribute publisher ID. These relationships are extremely important, as they allow the tables in the database to all connect to each other - which in turn allows us to search across the tables of the database, pulling out the information we are looking for.
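Here's a rough sketch of what those two tables and their shared attribute might look like. It uses Python's built-in sqlite3 module with an in-memory database rather than whatever DBMS you happen to have, and the column names (pubid, year_published and so on) and sample rows are my own assumptions, not the ones from the exercises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE publishers (
    pubid INTEGER PRIMARY KEY,
    name  TEXT,
    city  TEXT
);
CREATE TABLE titles (
    isbn           TEXT PRIMARY KEY,
    title          TEXT,
    year_published INTEGER,
    pubid          INTEGER REFERENCES publishers(pubid)  -- the shared attribute
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO publishers VALUES (1, 'Alpha Press', 'New York')")
conn.execute("INSERT INTO titles VALUES ('000-0', 'Philosophy for Beginners', 2001, 1)")

# A title pointing at a publisher that doesn't exist breaks the relationship.
try:
    conn.execute("INSERT INTO titles VALUES ('000-1', 'Orphan Book', 2002, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("orphan title rejected:", orphan_rejected)
```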

Searching databases with SQL


We search databases using SQL - Structured Query Language - which is a language that allows communication with a DBMS (Database Management System). Using SQL we can create databases, and insert, modify, and delete data from the data tables, but here we'll be focusing on using SQL to query what's in the data tables.

The basic syntax of an SQL search is pretty simple. You:

Select fields
From tables in the database
Where something is true;

That semicolon at the end is important, by the way - it signifies the end of the command.

So, if you wanted to see all of the names of the publishers in the publishers table, you would type:
select name from publishers;

If you wanted to put in some kind of condition, for example, all of the publishers from New York, you would type:
select name from publishers where city = "New York";
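If you want to try these two queries without a database server to hand, here's a minimal sketch using Python's built-in sqlite3 module. The publishers and their cities are made up, and I've used single quotes around 'New York' because that's the standard SQL way of writing text values and what SQLite expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publishers (pubid INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Made-up sample data.
conn.executemany(
    "INSERT INTO publishers (name, city) VALUES (?, ?)",
    [("Alpha Press", "New York"), ("Beta Books", "London"), ("Gamma House", "New York")],
)

# select ... from ...
all_names = [row[0] for row in conn.execute("SELECT name FROM publishers ORDER BY pubid")]

# select ... from ... where ...
ny_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM publishers WHERE city = 'New York' ORDER BY pubid"
    )
]

print(all_names)
print(ny_names)
```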

Of course, what starts off as being "pretty simple" quickly turns into "headache inducingly complex". To create more precise searches there are a number of different operators you can use. For example:

select * allows you to select everything
= is equal to
< is less than
> is greater than
<= is less than or equal to
>= is greater than or equal to
<> is not equal to
order by (column name) allows you to order the results of your search in a particular way
order by (column name) desc allows you to order the results in descending order
% is your wildcard character. Importantly, this isn't used with =, but is instead used with like. For example, select title from titles where title like "%philosoph%"; will give you all of the titles that have variations of the word philosophy anywhere in the title.
and combines two conditions, both of which must be true
or combines two conditions, at least one of which must be true
not reverses a condition


Using these operators allows us to create very precise searches - however, it's important in SQL to enter these precisely, and in the correct order. If you make a mistake, or miss anything out, the database will refuse to play ball with you.
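Here's a small sketch of several of these operators working together (like with %, >=, and, and order by ... desc). It assumes SQLite through Python's sqlite3 module, and the titles and years are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title text, year_published integer)")
conn.executemany(
    "insert into titles values (?, ?)",
    [("A History of Philosophy", 1946),          # hypothetical sample rows
     ("Philosophical Investigations", 1953),
     ("Databases for Beginners", 1973),
     ("The Philosopher's Toolkit", 2002)],
)

# Titles containing "philosoph", published in 1950 or later, newest first.
query = """
select title, year_published
from titles
where title like '%philosoph%' and year_published >= 1950
order by year_published desc;
"""
rows = conn.execute(query).fetchall()

for title, year in rows:
    print(year, title)
```

Note the order of the clauses: select, from, where, then order by. Swapping them around gives a syntax error.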

Searching across the tables in a database is particularly tricky. In order to do this you'll need to know the relationship between the different data tables. You then need to put this relationship in your SQL command, joining the two tables. Earlier, I gave the example of a publishers table and book titles table both containing the attribute publisher ID. To get information from both of these tables you need to mention that particular relationship in your SQL command. So, if you wanted to find out the titles and publishers of books that had variations of the word philosophy in their title, you would type:
select title, company_name from publishers, titles where publishers.pubid = titles.pubid and title like "%philosoph%";

The part in the command where publishers.pubid = titles.pubid is what joins the two tables together, thereby allowing you to search across both tables.
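As a sketch of that join in action, the snippet below builds two tiny tables and runs the same kind of query. SQLite via Python's sqlite3 module and all of the sample rows are assumptions for illustration. It also runs the equivalent join ... on form, which is another way of writing the same relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table publishers (pubid integer primary key, company_name text)")
conn.execute("create table titles (title text, pubid integer)")
conn.executemany("insert into publishers values (?, ?)",
                 [(1, "Acme Press Ltd"), (2, "Harbour Books Inc")])  # made-up rows
conn.executemany("insert into titles values (?, ?)",
                 [("A History of Philosophy", 1),
                  ("Databases for Beginners", 1),
                  ("Philosophical Essays", 2)])

# The where clause ties each title to its publisher through pubid.
where_join = conn.execute("""
select title, company_name
from publishers, titles
where publishers.pubid = titles.pubid
  and title like '%philosoph%'
order by title;
""").fetchall()

# The same join written with explicit join ... on syntax.
explicit_join = conn.execute("""
select title, company_name
from publishers
join titles on publishers.pubid = titles.pubid
where title like '%philosoph%'
order by title;
""").fetchall()

print(where_join)
```

Both queries return the same rows; the explicit form just keeps the relationship separate from the search condition.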

Okay, this post is already far too long - time to wrap things up with some thoughts and reflections on the exercises for this week.

Reflections on this week's exercises


I found this week's exercises to be a little hard-going at first, although by the end of the session I think I had started to wrap my head around using SQL. Understanding the relationships between the tables in a database seems pretty crucial - you can do this with the commands show tables; and desc (table name); - the first command shows you what tables are available, and the second shows you the details of one of the data tables. By going through each data table you'll see what the relationships are between them (which you really need to know for searching across the database).
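The show tables; and desc commands are MySQL-style, and not every DBMS has them. As a rough equivalent, here's a sketch of the same exploration in SQLite (my assumption, using Python's sqlite3 module), where the table list lives in sqlite_master and pragma table_info plays the role of desc. The two tables are made-up stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table publishers (pubid integer primary key, company_name text, city text)")
conn.execute("create table titles (isbn text, title text, year_published integer, pubid integer)")

# Rough equivalent of "show tables;"
tables = [row[0] for row in conn.execute(
    "select name from sqlite_master where type = 'table' order by name;")]

# Rough equivalent of "desc (table name);" - column 1 of each row is the column name.
columns = {
    table: [col[1] for col in conn.execute(f"pragma table_info({table})")]
    for table in tables
}

# Columns that appear in both tables are candidates for the relationship.
shared = set(columns["publishers"]) & set(columns["titles"])

print(tables)
print(columns)
print(shared)
```

Here the only shared column is pubid, which is exactly the attribute you'd use to join the two tables.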

Mistakes that I made early on were forgetting to end my commands with ; and forgetting to put in underscores in attribute names (for example, typing in select year published instead of select year_published). I realized pretty quickly that I wasn't going to make much progress if I kept forgetting these!

Getting the syntax of commands exactly right is something that's extremely important. One of the questions was to find the name of the publisher who published a book with the ISBN 0-0280074-8-4. I typed select name from publishers, titles where publishers.pubid = titles.pubid and isbn = 0028007484;


This didn't work. I needed to type in the ISBN with the dashes. However, that didn't work either. Why? It turns out I was forgetting to put quotation marks around the ISBN - it should've been isbn = "0-0280074-8-4";


I needed the quotation marks because the ISBN, dashes and all, is a string of characters rather than a number - and if you have a string of characters in your command, you need to put it in quotation marks. (Without them, the dashes are read as minus signs. Pure numbers, however, don't need quotation marks.)

So for example:
select name from publishers where city = "New York";
select title, company_name from publishers, titles where publishers.pubid = titles.pubid and title like "%philosoph%"; 
select title from titles where year_published = 1973;
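To see why the unquoted ISBN fails, here's a short sketch, assuming SQLite through Python's sqlite3 module. The ISBN is the one from the exercise, but the book titles and the second row are invented. Without quotes, the ISBN is evaluated as a subtraction sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title text, isbn text, year_published integer)")
conn.executemany(
    "insert into titles values (?, ?, ?)",
    [("Book One", "0-0280074-8-4", 1973),   # hypothetical titles
     ("Book Two", "0-1111111-1-1", 1985)],
)

# Without quotes the "ISBN" is arithmetic: 0 - 280074 - 8 - 4
as_number = conn.execute("select 0-0280074-8-4;").fetchone()[0]

# So this compares isbn with a number and matches nothing.
unquoted = conn.execute(
    "select title from titles where isbn = 0-0280074-8-4;").fetchall()

# With quotes, the ISBN is compared as a string of characters.
quoted = conn.execute(
    "select title from titles where isbn = '0-0280074-8-4';").fetchall()

# Pure numbers, such as a year, need no quotes.
by_year = conn.execute(
    "select title from titles where year_published = 1973;").fetchall()

print(as_number, unquoted, quoted, by_year)
```

The unquoted version doesn't raise an error here; it simply finds no rows, which can be even more confusing.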


I also discovered that if you are searching across two (or more) tables, and are selecting information that is contained in both tables (their relationship), then you need to specify exactly which table you want that information to come from. For example, let's say you're looking for company name and publishers ID for publishers that have published books with the word philosophy in the title. At first, the temptation is to type:
select company_name, pubid from publishers, titles where publishers.pubid = titles.pubid and title like "%philosophy%";


But this won't work - instead, you'll be told that the pubid in field list is ambiguous. This is because you haven't specified which of the two tables you want the publishers ID info to come from. It doesn't matter which one you specify (the information is the same in both tables). So, you should type:
select company_name, publishers.pubid from publishers, titles where publishers.pubid = titles.pubid and title like "%philosophy%";
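The sketch below reproduces the ambiguity, assuming SQLite via Python's sqlite3 module with a couple of made-up rows. The exact wording of the error differs between database systems, so the message printed here won't necessarily match the one quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table publishers (pubid integer primary key, company_name text)")
conn.execute("create table titles (title text, pubid integer)")
conn.execute("insert into publishers values (1, 'Acme Press Ltd')")            # made-up row
conn.execute("insert into titles values ('An Introduction to Philosophy', 1)")  # made-up row

join = ("from publishers, titles "
        "where publishers.pubid = titles.pubid and title like '%philosophy%'")

# pubid exists in both tables, so the database can't tell which one is meant.
try:
    conn.execute("select company_name, pubid " + join)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Naming the table removes the ambiguity.
rows = conn.execute("select company_name, publishers.pubid " + join).fetchall()

print(error_message)
print(rows)
```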


Finally, I learnt that you can search across three (or more) tables in a single search - you just need to put all of their relationships in the command. For example:
select author, title, titles.isbn from authors, title_author, titles where titles.isbn = title_author.isbn and authors.au_id = title_author.au_id and title = "a beginner's guide to basic";


In this command I have searched across the authors, title author, and titles tables, using the relationships to join the titles and title author tables and the title author and authors tables.
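Here's a sketch of a three-table search like this one, assuming SQLite through Python's sqlite3 module. The author names and ISBNs are invented. Because the title contains an apostrophe, I've passed it in as a parameter (the ?) rather than typing it into the SQL, and I've matched the capitalisation exactly, since = is case-sensitive in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table authors (au_id integer primary key, author text)")
conn.execute("create table titles (isbn text primary key, title text)")
conn.execute("create table title_author (isbn text, au_id integer)")

# Hypothetical sample data: one book with two authors, one with a single author.
conn.executemany("insert into authors values (?, ?)",
                 [(1, "Jane Example"), (2, "Sam Sample")])
conn.executemany("insert into titles values (?, ?)",
                 [("0-0000000-0-1", "A Beginner's Guide to Basic"),
                  ("0-0000000-0-2", "Another Book")])
conn.executemany("insert into title_author values (?, ?)",
                 [("0-0000000-0-1", 1), ("0-0000000-0-1", 2), ("0-0000000-0-2", 2)])

# title_author is the linking table: it joins to titles on isbn and to authors on au_id.
rows = conn.execute("""
select author, title, titles.isbn
from authors, title_author, titles
where titles.isbn = title_author.isbn
  and authors.au_id = title_author.au_id
  and title = ?
order by author;
""", ("A Beginner's Guide to Basic",)).fetchall()

for row in rows:
    print(row)
```

Each extra table just adds one more relationship to the where clause.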

And that's all for this week. Stay tuned for when we go back in time to DITA week two - Cameron's adventures with HTML!

Monday, 26 September 2011

DITA - week one

The first session of Digital Information Technologies and Architectures (more affectionately known to you and me as DITA) was an introduction to computing. More specifically, we looked at the basic building blocks of how information is stored as data, discussing such things as what a bit actually is (it's a "binary digit" by the way - 0 or 1) and how these build up into bytes, then kilobytes, then megabytes, and so on. We then went on to look at some different formats, and the importance of using the right program to open these formats - particularly in the case of proprietary formats, such as Microsoft Word files.


In the lab we played around with some of these different formats. We saved text in ASCII (a system for encoding alphanumeric characters as seven-digit binary sequences), then saved that same text into a proprietary format (Word), and then saw what happens when you view a Word document in ASCII using Notepad - you get a whole mass of gobbledygook. This is because Word had added all sorts of proprietary info that could only be interpreted by Word, so the file couldn't be viewed as what we'd consider to be any sort of meaningful text by just using the ASCII code. (This touches on some interesting implications regarding the business side of proprietary formats, such as Microsoft's market dominance, and the current tension between Apple and Adobe over Flash.)
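To make the seven-digit binary idea concrete, here's a tiny Python sketch (not something from the lab, just an illustration) that shows the ASCII code and bit pattern for each character in a short piece of text.

```python
text = "DITA"

# ord() gives the character's ASCII code; format(..., "07b") writes it as 7 binary digits.
codes = [format(ord(ch), "07b") for ch in text]

# Saved as plain ASCII, each character takes up one byte in the file.
size_in_bytes = len(text.encode("ascii"))

for ch, bits in zip(text, codes):
    print(ch, ord(ch), bits)
print(size_in_bytes, "bytes")
```

A plain ASCII file contains nothing but these codes, which is why Notepad can display it, whereas a Word file wraps the text in extra data that only makes sense to Word.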


We also saved a document in HTML, viewed a document marked-up in HTML, created an image file, and then linked to that image file in a document. (Very useful to know - if the image file is changed, the image in the document changes too.)


I found it interesting to examine the actual "building blocks" of data, and how these blocks are used to store information in different formats. As a user, I've never really given much thought to how data is built - I've only ever been interested in searching for and accessing that information. Yet when one is searching for information, it's worthwhile knowing how that information is stored - to take a look at the engine every so often, if you will, instead of just driving down the highway.


Something that I'm personally interested in is what happens when file formats and technology become obsolete or are no longer supported, and what this means for society. How can we access information on these "lost"/"historical" file formats that are not supported by new technologies? While we are creating ever more information, swimming (or perhaps drowning?) in an ever-expanding sea of data, how much information are we also losing?