The Netherlands at SC2003

The Netherlands presents itself at SC2003 in Phoenix, Arizona, with demonstrations of several innovative and interesting projects.

REPRO

John Romein

Vrije Universiteit

john@cs.vu.nl

We present a novel, parallel algorithm for generating top alignments. Top alignments are used for finding internal repeats in biological sequences like proteins and genes. Our algorithm replaces an older, sequential algorithm (Repro), which was prohibitively slow for sequence lengths higher than 2000. The new algorithm is an order of magnitude faster (O(n^3) rather than O(n^4)).

We present a three-level parallel implementation of the algorithm: using SIMD multimedia extensions found on present-day processors (a novel technique that can be used to parallelize any application that performs many sequence alignments), using shared-memory parallelism, and using distributed-memory parallelism. It allows processing the longest known proteins (nearly 35000 amino acids). We show exceptionally high speed improvements: between 548 and 889 on a cluster of 64 dual-processor machines, compared to the new sequential algorithm. Especially for long sequences, extreme speed improvements over the old algorithm are obtained.

A paper on the subject is also presented at the technical program (Wednesday, Nov. 19, 3:30PM-4:00PM, room 38-39).
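As a rough illustration of the kind of computation involved, the sketch below scores local alignments with the textbook Smith-Waterman recurrence and looks for an internal repeat by aligning each prefix of a sequence against the remaining suffix. The split-point search, the scoring values and the function names are assumptions made for this example; it is not the parallel algorithm presented above, and this naive version simply performs one quadratic alignment per split point.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local (Smith-Waterman) alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_internal_repeat(seq):
    """Try every split point and align the prefix against the suffix.

    Returns (score, split) for the first split with the highest score.
    """
    best_score, best_split = 0, None
    for split in range(1, len(seq)):
        score = local_alignment_score(seq[:split], seq[split:])
        if score > best_score:
            best_score, best_split = score, split
    return best_score, best_split


if __name__ == "__main__":
    # "ABCD" occurs twice, separated by "XY".
    print(best_internal_repeat("ABCDXYABCD"))
```

Because every split point is independent of the others, a loop of this shape is a natural candidate for the shared-memory and distributed-memory parallelism described above.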

VR-applications for Mining Genomics Data – MasterWorks presentation

Anton Koning (speaker), Paul Wielinga, Bram Stolk, Jorrit Adriaanse, Jeroen Akershoek

SARA Computing and Networking Services

anton@sara.nl

The amount of genome-related data that is collected worldwide and the speed at which more information becomes available pose a problem to the (bio)medical researcher: how to identify the proteins that are the most promising drug targets or find homologous genes that may cause side effects, and how to improve the decision-making process during drug development.

By integrating information from various databases -- e.g. location of genes on chromosomes, association with diseases, (co)expression data -- the virtual reality application developed by SARA Computing and Networking Services in cooperation with Johnson & Johnson Pharmaceutical Research and Development offers pharmaceutical researchers a novel and better way to identify gene functionality and disease markers and thus discover new drug targets.

Three-dimensional trees and graphs show interrelations between proteins, while multi-species chromosome maps identify their homologs and metabolic pathways indicate the processes in which they play a role. Using an immersive virtual environment like the CAVE makes it possible to view and interpret much larger datasets than would be possible on ordinary computer displays.

An invited paper on the subject is also presented at the technical program (Life Sciences Session, Thursday, Nov. 20, 11:15AM - 12:00PM, room 16-18).

Virtual Lab

Robert Belleman

University of Amsterdam

robel@science.uva.nl

VLAM-G, the Grid-based Virtual Laboratory AMsterdam, provides a science portal for distributed analysis in applied scientific research. It makes use of a number of services provided by the Globus toolkit to handle remote data access, resource allocation, security issues and access to external devices.

The demos will show the basic features of the VLAM-G project:

1- Introductory demos.

Two simple demos will be used to show how both computing resources and external devices can be controlled from within VL. The first demo, a histogram demo, shows how multi-domain distributed computing is made available to VLAM-G users. The second demo, a floating ball demo, shows how external devices can be easily used in the VLAM-G environment. The demonstration consists of controlling a simple device connected to a remote grid node via the VLAM-G environment.

2- The MRI-Scan demo.

This demo shows how medical data can be accessed from a remote storage device and visualized on screen through a remote computing system.

3- The Material Analysis of Complex surfaces (MACS).

The MACS demo shows a real scientific experiment performed at AMOLF in the domain of chemistry and physics, dedicated to the handling of device-generated data within a distributed environment. These data originate from apparatus at geographically different locations and should subsequently be combined to create a virtual Surface Analysis Laboratory for materials analysis at the micrometer scale.

Coupled ocean models on the grid using Cactus

Fokke Dijkstra*, Aad van der Steen* and Henk Dijkstra^

* High Performance Computing group Department of Physics Utrecht University
^ Institute for Marine and Atmospheric Research Utrecht (IMAU), Utrecht University

fokke.dijkstra@phys.uu.nl

The time to reach equilibrium for the thermohaline ocean circulation is on the order of thousands of years. This makes high-resolution calculations with the existing ocean models almost infeasible.

In order to achieve high resolution and long time scales we have developed a coupled ocean model using implicit and explicit time integration. The Cactus Computational toolkit has been used as a framework for the coupling of these models. Two existing ocean codes have been integrated into this framework. The first is an implicit model developed at the IMAU. The second is the regular explicit model MOM4 (beta).
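The difference between the two integration styles can be illustrated on a toy problem. The sketch below applies explicit and implicit Euler steps to the decay equation dy/dt = -k*y; the equation, the parameter values and the function names are assumptions chosen for illustration and have nothing to do with the actual ocean codes.

```python
def explicit_euler(y0, k, dt, steps):
    """Forward Euler for dy/dt = -k*y: the update uses the current value."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-k * y)
    return y


def implicit_euler(y0, k, dt, steps):
    """Backward Euler for dy/dt = -k*y.

    Solves y_new = y + dt * (-k * y_new), which gives y_new = y / (1 + k*dt).
    """
    y = y0
    for _ in range(steps):
        y = y / (1.0 + k * dt)
    return y


if __name__ == "__main__":
    # With k*dt = 5 the explicit scheme is outside its stability limit (k*dt < 2).
    print("explicit:", explicit_euler(1.0, k=10.0, dt=0.5, steps=20))
    print("implicit:", implicit_euler(1.0, k=10.0, dt=0.5, steps=20))
```

For this test equation the explicit result grows without bound once the time step is too large, while the implicit result keeps decaying, which is the general reason implicit schemes are attractive when very long time scales have to be covered.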

One of the nice features of Cactus is the modular approach, which also allows us to make use of existing Cactus thorns (modules) for e.g. I/O or grid computing. We are therefore now able to run our coupled ocean model on the Dutch DAS2 test grid, while streaming the model output to a visualization machine using HDF5. This will be demonstrated live at SC2003.

A Concurrent Algorithm for Shape Preserving Connected Set Filtering, and its Application to Interactive Visualization

A. Meijster

Centre for High Performance Computing and Visualisation

University of Groningen

a.meijster@rc.rug.nl

A method is presented for combined interactive filtering and visualization of volumetric data on shared memory architectures. The user can interactively set the filter parameters of a shape preserving class of morphological filters, called connected filters, and immediately see a volume rendering of the resulting filtered volume data set. The filters work by computing some attribute describing the shape or size for each connected component of the volume. The user can decide which to preserve based on some threshold.

We use a method in which the computation of attributes and connected component analysis is separated from the decision stage of the filtering process. For both stages a concurrent algorithm has been developed. The first stage is a sort of initialization for the (faster) decision stage, which can be run many times with different threshold values, allowing interactive filtering and visualization of the results.

We implemented the algorithms using POSIX Threads on an SGI Onyx 3400 with 16 CPUs, which is the system driving a visualization facility at the University of Groningen, consisting of a reality theatre, and a CAVE. We ran some tests on a 256×256×256 magnetic resonance angiography data set. We ran the program while a radiologist from the university hospital was standing in the CAVE steering the application. The user can interactively manipulate the filtering of the data set. The user can filter, scale, rotate, and translate the data set, and can also change colors by manipulating a color lookup table (color transfer function). These manipulations are quite interactive, performing at typical frame rates of about 10 to 20 frames per second. The radiologist seemed quite comfortable with this level of interactivity.
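The separation into an expensive analysis stage and a cheap decision stage can be sketched on a much simpler case. The example below works on a small 2D binary image, uses 4-connectivity, and takes component area as the attribute; all of these are simplifying assumptions for illustration, whereas the method described above handles gray-scale volumes and runs concurrently.

```python
from collections import deque


def label_components(image):
    """Stage 1: label 4-connected foreground components and record their areas."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    areas = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            area = 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            areas[next_label] = area
    return labels, areas


def apply_threshold(labels, areas, threshold):
    """Stage 2: keep whole components whose attribute reaches the threshold."""
    return [[1 if lab and areas[lab] >= threshold else 0 for lab in row]
            for row in labels]


if __name__ == "__main__":
    image = [[1, 1, 0, 0],
             [1, 0, 0, 1],
             [0, 0, 0, 0],
             [0, 1, 1, 1]]
    labels, areas = label_components(image)   # run once
    for threshold in (1, 2, 4):               # rerun cheaply per threshold
        print(threshold, apply_threshold(labels, areas, threshold))
```

Since the second stage only reads precomputed attributes, it can be repeated for every new threshold the user picks without redoing the component analysis, and components are always kept or removed as a whole, so surviving shapes are not distorted.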

Maximum intensity projections of a magnetic resonance angiography volume data set filtered with an attribute thinning as shape filter. The attribute used was the moment of inertia divided by the volume of a peak component to the power 5/3. This attribute is a shape dependent number that expresses elongation. The top left-hand image is the original, in the others the attribute threshold was 0.5, 1.0, 2.0, 3.0 and 4.0, respectively.
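Read literally, the elongation attribute in this caption can be written as below. The symbols A, I and V are introduced here for the attribute, the moment of inertia and the volume of a peak component; they do not appear in the original text.

\[
A = \frac{I}{V^{5/3}}
\]

Assuming uniform density, I grows with the fifth power of the linear size of a component and V with the third power, so dividing by V^{5/3} cancels the dependence on size and leaves a number that depends on shape only.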

Protein World

Tim Hulsen

CMBI (Nijmegen, The Netherlands)

timhulse@cmbi.kun.nl

Classification of proteins is becoming an increasingly important means of coping with the large amount of data resulting from the flood of sequences from large-scale genome sequencing projects. Currently more than 100 genomes have been completely sequenced. Comparisons of their genes and encoded proteins have revealed a tremendous amount of information. For example, finding unique sequences in certain bacterial species has made elucidation of their pathogenicity possible and revealed novel pathways for drug intervention. For understanding the aetiology of complex disorders, learning about the complexity in higher eukaryotes, and enabling more in-depth comparative studies, exact mapping of interspecies relationships (orthologs and paralogs) is key to significant progress.

Currently several curated protein databases exist, like Swiss-Prot and PIR. Despite their curation, protein databases do not present a well-organized way of discovering protein characteristics like intra- and inter-species relationships to other proteins. Several efforts have shown the value of mapping similarities in classifying proteins based upon their primary structure. Some examples are the ClusTr database (EBI), Protomap (Cornell University) and Systers (MPI), which try to give a very concise view of all proteins and their similarities to other proteins, creating a comprehensive view of the protein universe. Classification enables the designation of so-called superfamilies, families and sub-families. Other databases like Pfam (EBI), SMART and PRINTS classify domains, families and other characteristics.

All of the examples mentioned above are built upon sequence comparisons that lack sensitivity and do not permit easy updating without a complete recalculation of all protein sequence similarities. Most of them use a simple BLAST comparison with certain fine-tuned parameters. A more advanced approach has been the iterative BLAST, PSI-BLAST, chosen because more advanced algorithms like Smith-Waterman are computationally intensive. The Z-value is an attempt to estimate the statistical significance of a Smith-Waterman dynamic alignment score (SW-score) through the use of a Monte Carlo process. In the latter approach, highly similar sequences are shuffled randomly 100 times, the similarities of the shuffled sequences are determined, and their significance is measured relative to the already obtained sub-optimal alignments.
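A minimal sketch of the shuffling idea follows. It assumes the common definition Z = (observed score - mean of shuffled scores) / standard deviation of shuffled scores, shuffles only the second sequence, and uses a simple match/mismatch scoring scheme; these choices and all function names are assumptions for illustration, not the procedure used in the project.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (illustrative scoring values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def z_value(a, b, shuffles=100, seed=0):
    """Estimate how far the observed score lies above scores of shuffled input."""
    rng = random.Random(seed)       # fixed seed keeps the estimate reproducible
    observed = sw_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(shuffles):
        rng.shuffle(letters)        # same composition and length, random order
        shuffled_scores.append(sw_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    spread = statistics.pstdev(shuffled_scores)
    if spread == 0:
        return 0.0                  # degenerate case: no variation to compare with
    return (observed - mean) / spread


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up sequence for the demo
    print(z_value(protein, protein))
```

Each Z-value needs one alignment per shuffle on top of the original alignment, so with 100 shuffles the cost per sequence pair is roughly a hundred times that of a single Smith-Waterman comparison.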

Z-values offer a very precise definition of protein similarities and partly reduce the bias induced by the composition and length of the sequences (e.g. database size). The method is, however, severely hampered by its CPU-intensive character. In an international collaborative effort between Gene-IT (Paris, France), EBI (Hinxton, UK), CMBI (Nijmegen, The Netherlands), SARA (Amsterdam, The Netherlands) and Organon (Oss, The Netherlands), funded by the NCF (Den Haag, The Netherlands), a project has been initiated in which all of the currently known and predicted proteins are being compared and their Z-scores determined. This dataset will be the basis for many projects that involve the clustering of proteins.

OGSA front end for the Bandwidth on Demand (BoD) services using Network Element provisioning through multiple administrative domains

Bas van Oudenaarde

Universiteit van Amsterdam

oudenaar@science.uva.nl

Our goal is to allow the creation of an end-to-end lightpath using an OGSA based grid services interface in each domain that is part of the path. We hereby explore the dynamic interaction of service invocations between grid services in a multi domain set-up. We discuss a Broker based service model that interacts with various Network Element managers in different domains, such as the Bandwidth on Demand (BoD) service based on Generic AAA (RFC2903) in the NetherLight domain. The signaling to provision the Network Elements for the end-to-end lightpath will be based on grid service messages and controlled by a broker service. This case study is meant to give us a feeling of the primitives and policy usages in multi administrative domain scenarios.
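The broker idea can be sketched in a few lines. Everything below is hypothetical: the class, the method names, the capacity counter and the roll-back policy are assumptions for illustration and do not reflect the OGSA interfaces, the Generic AAA messages or the actual BoD service.

```python
class DomainManager:
    """Hypothetical stand-in for the Network Element manager of one domain."""

    def __init__(self, name, free_lightpaths):
        self.name = name
        self.free_lightpaths = free_lightpaths

    def reserve(self):
        """Reserve one path segment in this domain; False if none is free."""
        if self.free_lightpaths == 0:
            return False
        self.free_lightpaths -= 1
        return True

    def release(self):
        """Give back a previously reserved segment."""
        self.free_lightpaths += 1


def provision_end_to_end(domains):
    """Broker logic: reserve a segment in every domain along the path.

    If any domain refuses, release the segments already reserved so that no
    partial lightpath is left behind (an assumed policy for this sketch).
    """
    reserved = []
    for domain in domains:
        if domain.reserve():
            reserved.append(domain)
        else:
            for done in reversed(reserved):
                done.release()
            return False
    return True


if __name__ == "__main__":
    path = [DomainManager("domain-a", 1), DomainManager("domain-b", 0)]
    print(provision_end_to_end(path))   # domain-b has no capacity
    print(path[0].free_lightpaths)      # domain-a's segment was released
```

In a real multi-domain setting each call would also have to pass the authorization policy of the domain it targets, which is where the AAA component comes in.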

Collaborative Genomics Visualization using SARAgene

Paul Wielinga

SARA

wielinga@sara.nl

SARA is developing the SARAgene data mining application for genomics research in close cooperation with the Pharmaceutical Research and Development department of Johnson & Johnson. Using the 'infinite' space of the virtual world makes it possible to visualize large amounts of data simultaneously and to investigate it at different levels. This way a biologist can study gene loci, sequence data, molecular structures, metabolic pathways and expression information of a large number of proteins at the same time.

Currently, SARA is extending SARAgene into a collaborative version, which is using the principle of “Augmented Virtuality” by including live video streams into the VR environment created by the application. This project was sponsored by SURFnet, the Dutch Research Network Organization.

At SC03, SARA and SURFnet will demonstrate this collaborative version of SARAgene. Three different bioinformatics sites in the Netherlands will connect to the Dutch research booth in Phoenix. Scientists will use a Reality Cube at the University of Groningen, an Immersadesk at Erasmus Medical Center in Rotterdam and the CAVE at SARA in Amsterdam to collaborate on a number of genomics databases, while using video conferencing for the interaction.

SARAgene runs on IA32/Linux and MIPS/Irix and is best used in combination with a virtual reality display system, like an ImmersaDesk, Passive Stereo Projection system or a CAVE(-like) Immersive VR environment.

SARA

NetherLight demos

Demos 1, 2 and 4 are from the Netherlands booth.

Demo# 1

Topic: Wirespeed IPv6

Lead: SURFnet, Cisco

Contact: Jac Kloots <Jac.Kloots@surfnet.nl>

Wishes: up to 2.5Gbps

Between: SURFnet / Abilene (NYC)

Demo# 2

Topic: Lightpath switching and AAA

Members: University of Amsterdam, Jason Leigh, Oliver Yu, Joe Mambretti, Leon Gommans, Bas Oudenaarde, etc.

Contact: Freek Dijkstra <fdijkstr@science.uva.nl>

Wish: 1x or 2x 1Gbps lightpath

Interface: 1000BaseLX

Endpoints: Amsterdam / Chicago

Demo# 3

Topic: 10G tests using S2IO 10GE NICs in Optron systems

Members: UvA, LBL et al

Contact: Antony Antony <antony@nikhef.nl>

Wish: 10G connectivity

Interface: OC192

Endpoints: Amsterdam / Chicago


Demo# 4

Topic: 10G tests using Intel 10GE NICs in Itanium systems

Members: UvA, Bob Grossman et al

Contact: Antony Antony <antony@nikhef.nl>

Wish: 10G connectivity

Interface: OC192

Endpoints: Amsterdam / Chicago

Demo# 5

Topic: Realitygrid2003

Members: BT, EVL, others

Contact: Erik Radius <Erik.Radius@surfnet.nl>

Wish: 2x 1GE connectivity

Interface: 1000BaseSX

Endpoints: Amsterdam / Chicago

Demo# 6

Topic: NCDM

Members: Bob Grossman

Contact: Bob Grossman <grossman@uic.edu>

Wish: 10x 1GE connectivity to NL 6509

Interface: 1000BaseSX

Endpoints: Amsterdam / Chicago