Bioclipse: Integration of Data and Software in the Life Sciences

Abstract: New high-throughput experimental techniques have turned the life sciences into a data-intensive field. Scientists face new types of problems, such as managing voluminous sources of information, integrating heterogeneous data, and applying the proper analysis algorithms, all in order to reach reliable conclusions. These challenges call for an infrastructure of algorithms and technologies that supplies researchers with the tools and methods necessary to maximize the usefulness of the data. eScience has emerged as a promising approach to these challenges; the term denotes integrated science carried out in highly distributed network environments, or science that makes use of large data sets and requires high-performance computing resources. In this thesis I present standards, exchange formats, algorithms, and software implementations for empowering researchers in the life sciences with the tools of eScience. The work is centered on Bioclipse, an extensible workbench developed as part of this thesis, which provides users with instruments for carrying out integrated research and hides technical details behind simple graphical interfaces. Bioclipse is a Rich Client that takes full advantage of the many offerings of eScience, such as networked databases and online services. The benefits of mixing local and remote software in a unifying platform are demonstrated with an integrated approach for predicting metabolic sites in chemical structures. To overcome the limitations of the commonly used technologies for interacting with networked services, I also present a new technology based on XMPP. This enables service discovery and asynchronous communication between the client and server, which is ideal for long-running analyses.
To maximize the usefulness of the available data, there is a need for standards, ontologies, and exchange formats that define what information should be captured and how it should be structured and exchanged. A novel format for exchanging QSAR data sets in a fully interoperable and reproducible form is presented, together with an implementation in Bioclipse that takes advantage of eScience components during the setup process. Bioclipse has been well received by the scientific community, has attracted a large group of international users and developers, and has been awarded three international prizes for its innovative character. With continued development, the project has a good chance of becoming an important component in a sustainable infrastructure for the life sciences.
