Information flow part 1: Overview
This post gives an overview of how our information flow is set up on the intranet. All the information systems are "free-for-all", meaning that they contain only public information; the only access control is whether your computer is inside the firewall, i.e. on the intranet. We have been working on this architecture since 2006, and only lately have we been able to get this far, in large part thanks to open source software (much of which is mentioned in the illustration) and open standards (e.g. Atom and PubSubHubbub). This is the first in a series of posts (list corrected and updated on 16 December):
- Information flow overview (this post)
- Information and metadata
- Why persistent links are important
- Search statistics for our enterprise search
- Real-time information syndication and search on our intranet
- And then what?
I would really appreciate any feedback about this post. The coming posts will go through every part of the flow in more detail, and I would really like to write about and explain the things that you are interested in.
The information is stored, for example, in a Document Management System, but has often (until now) lacked shared metadata (master metadata). Our master metadata, controlled vocabularies and taxonomies (for example MeSH) are stored in the metadata-service (documentation in Swedish only). The controlled vocabularies and taxonomies are primarily used by the keyword-service, which analyzes the information (content) and gives the editor suggestions for keywords to be stored as metadata, connected to the information. The metadata-service also supplies controlled lists of topics, document types, target groups etc. that the editor can choose from to add additional metadata. These lists change depending on the type of information; for example, a news item only has three types of editor-provided metadata:
- target group
- organisational info (what department/unit etc).
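To make the keyword suggestion step a bit more concrete, here is a minimal sketch in Python. It is not how the keyword-service is actually implemented; the vocabulary, the synonym lists and the function name `suggest_keywords` are all made up for illustration, and it only does plain phrase matching against a tiny controlled vocabulary.

```python
import re

# Hypothetical mini-vocabulary: preferred term -> alternative labels.
# Illustrative only; a real vocabulary such as MeSH is far larger.
VOCABULARY = {
    "Diabetes Mellitus": ["diabetes"],
    "Hypertension": ["high blood pressure"],
    "Asthma": [],
}


def suggest_keywords(text, vocabulary=VOCABULARY):
    """Return the preferred terms whose label or synonym occurs in the text."""
    lowered = text.lower()
    suggestions = []
    for preferred, synonyms in vocabulary.items():
        labels = [preferred.lower()] + [s.lower() for s in synonyms]
        # Match whole words/phrases only, so "asthma" does not match "asthmatic".
        if any(re.search(r"\b" + re.escape(label) + r"\b", lowered)
               for label in labels):
            suggestions.append(preferred)
    return sorted(suggestions)
```

The editor would then accept or reject each suggestion, so the stored keyword is always a preferred term from the vocabulary and never free text.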
On top of that we have system-generated metadata such as creation date, updated, published by, expires on and so on. The metadata is presented in Dublin Core format where applicable.
The information in the system is made searchable by feeding the search engine via the Sitemaps standard, by using an HTML crawler or by indexing RSS feeds. Sometimes we also index databases directly.
All URLs are shortened via the URL-service (documentation coming), which works just like bit.ly, to give each piece of information a permanent URL that stays unchanged for the lifecycle of the information. It is highly probable that the actual URL will change (e.g. the information is moved somewhere else or merged with other information), but the shortened URL will not.
When the information has been indexed, it is searchable. Most information is indexed within the hour, but some takes up to 24 hours. Some very basic intranet search stats:
- 260 000 documents (this will at least double, maybe even increase tenfold over the next few months)
- information is indexed from about 11 different systems
- about 3000 searches per day (average over the whole week)
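The permanent-link idea mentioned above (a short URL that stays fixed while the real address changes) can be sketched in a few lines of Python. This is an assumption-laden illustration, not the actual URL-service: the class `UrlService`, its methods and the example addresses are hypothetical, and everything is kept in memory.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols


def encode_id(n):
    """Encode a non-negative integer as a short base-62 key."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class UrlService:
    """Hypothetical in-memory short-URL registry."""

    def __init__(self):
        self._targets = {}  # short key -> current real URL
        self._next_id = 1

    def shorten(self, url):
        """Register a URL and return its permanent short key."""
        key = encode_id(self._next_id)
        self._next_id += 1
        self._targets[key] = url
        return key

    def move(self, key, new_url):
        """Point an existing key at a new location; the key itself never changes."""
        if key not in self._targets:
            raise KeyError(key)
        self._targets[key] = new_url

    def resolve(self, key):
        """Return the current real URL for a key."""
        return self._targets[key]
```

The point of the design is that only `move` ever touches the real address, so anything that stored the short key (metadata, search index, bookmarks) keeps working after the information is moved or merged.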
Another way of giving users access to the information is by using the Atom format. When we publish information in feeds, we will (in about a month's time) use PubSubHubbub (PuSH) instead of polling the sources constantly. This way we achieve real-time publishing (and distribution) of the information. And when the search engine listens to the information flowing through the Pubsub-service (English documentation), we will get it indexed in real time too!
The final part is to make the information readable by the users. We have portlets displaying Atom (and RSS) feeds (documentation in Swedish) in our portal solution, whose main purpose is to give single sign-on access to many different information systems and to give the user personalized information. We also have a template page in our Web Content Management system that displays Atom feeds. The content of our WCM is accessible to anyone who connects a computer to our intranet; no login is needed.
Our search engine also provides search results in Atom format (Swedish documentation) based on the OpenSearch standard. This means that a user can get a search alert, as a search is "saved" in an Atom feed and it (the search result) gets updated continuously! That's about it, nothing special really.
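For readers who want to see what the PuSH step looks like on the subscriber side (for example the search engine), here is a rough Python sketch. It is not our Pubsub-service code: the function names and example URLs are hypothetical, and it only covers two pieces of the PubSubHubbub protocol, namely building the form body of a subscription request and answering the hub's verification request by echoing the challenge.

```python
from urllib.parse import urlencode, parse_qs


def build_subscribe_body(topic_url, callback_url):
    """Form-encoded body a subscriber POSTs to the hub to subscribe to a feed."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the Atom feed we want pushed to us
        "hub.callback": callback_url,  # where the hub should deliver updates
    })


def answer_verification(query_string, expected_topic):
    """Handle the hub's verification GET on the callback URL.

    Returns (status_code, body): echo the challenge for a topic we asked
    for, otherwise refuse with 404.
    """
    params = parse_qs(query_string)
    topic = params.get("hub.topic", [None])[0]
    challenge = params.get("hub.challenge", [None])[0]
    if topic != expected_topic or challenge is None:
        return 404, ""
    return 200, challenge
```

Once the subscription is verified, the hub POSTs new Atom entries to the callback as they are published, which is what removes the need for polling.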