
The Netherlands presents itself at SC2003 in Phoenix, Arizona, with demonstrations
of several innovative projects.
REPRO
John Romein
Vrije Universiteit
john@cs.vu.nl
We present a novel, parallel algorithm for generating top alignments.
Top alignments are used for finding internal repeats in biological
sequences like proteins and genes. Our algorithm replaces an older,
sequential algorithm (Repro), which was prohibitively slow for
sequence lengths higher than 2000. The new algorithm is an order of
magnitude faster (O(n^3) rather than O(n^4)).
We present a three-level parallel implementation of the algorithm: using SIMD
multimedia extensions found on present-day processors (a novel
technique that can be used to parallelize any application that
performs many sequence alignments), using shared-memory parallelism,
and using distributed-memory parallelism. It allows processing the
longest known proteins (nearly 35000 amino acids). We show
exceptionally high speed improvements: between 548 and 889 on a
cluster of 64 dual-processor machines, compared to the new sequential
algorithm. Especially for long sequences, extreme speed improvements
over the old algorithm are obtained.
A paper on the subject is also presented in the technical program (Wednesday, Nov. 19,
3:30PM-4:00PM, room 38-39).
VR-applications for Mining Genomics Data –
MasterWorks presentation
Anton Koning (speaker), Paul
Wielinga, Bram Stolk, Jorrit Adriaanse, Jeroen Akershoek
SARA
Computing and Networking Services
anton@sara.nl
The amount of genome-related data collected worldwide, and the
speed at which more information becomes available, pose a problem for
the (bio)medical researcher: how to identify the proteins that are
the most promising drug targets, how to find homologous genes that may
cause side effects, and how to improve the decision-making process
during drug development.
By integrating information from
various databases -- e.g. location of genes on chromosomes,
association with diseases, (co)expression data -- the virtual reality
application developed by SARA Computing and Networking Services in
cooperation with Johnson & Johnson Pharmaceutical Research and
Development offers pharmaceutical researchers a novel and better way
to identify gene functionality and disease markers and thus discover
new drug targets.
Three-dimensional trees and graphs show
interrelations between proteins, while multi-species chromosome maps
identify their homologs and metabolic pathways indicate the processes
in which they play a role. Using an immersive virtual environment
like the CAVE makes it possible to view and interpret much larger
datasets than would be possible on ordinary computer displays.
An invited paper on the subject is also presented at the technical
program (Life Sciences Session, Thursday, Nov. 20, 11:15AM
- 12:00PM, room 16-18).
Virtual Lab
Robert Belleman
University of Amsterdam
robel@science.uva.nl
VLAM-G, the Grid-based Virtual Laboratory AMsterdam, provides a science portal for
distributed analysis in applied scientific research. It makes use of
a number of services provided by the Globus toolkit to handle remote
data access, resource allocation, security issues and access to
external devices.
The demos will show the basic features of the VLAM-G project:
1- Introductory demos.
Two simple demos will be used to show how both computing resources and external devices can
be controlled from within VL. The first demo, a histogram demo, shows
how multi-domain distributed computing is made available to VLAM-G users. The second demo,
a floating ball demo, shows how external devices can be easily used
in the VLAM-G environment. The demonstration consists of controlling
a simple device connected to a remote grid node via the VLAM-G
environment.
2- The MRI-Scan demo.
This demo shows how medical data can be accessed from a remote storage device and
visualized on screen through a remote computing system.
3- The Material Analysis of Complex surfaces (MACS).
The MACS demo shows a real scientific experiment performed at AMOLF in the domain of
chemistry and physics, dedicated to the handling of device-generated
data within a distributed environment. These data originate from
apparatus at geographically different locations and should
subsequently be combined to create a virtual Surface Analysis
Laboratory for materials analysis at the micrometer scale.
Coupled ocean models on the grid using Cactus
Fokke Dijkstra*, Aad van der Steen* and Henk Dijkstra^
* High Performance Computing group Department of Physics Utrecht University
^ Institute for Marine and Atmospheric Research Utrecht (IMAU), Utrecht University
fokke.dijkstra@phys.uu.nl
The time to reach equilibrium for the thermohaline ocean circulation is on the order of
thousands of years. This makes high-resolution calculations
with existing ocean models almost infeasible.
In order to achieve high resolution and long time scales we have developed a coupled
ocean model using implicit and explicit time integration. The Cactus
Computational toolkit has been used as a framework for the coupling
of these models. Two existing ocean codes have been integrated into
this framework. The first is an implicit model developed at the IMAU.
The second is the regular explicit model MOM4 (beta).
One of the nice features of Cactus is the modular approach, which also allows us to
make use of existing Cactus thorns (modules) for e.g. I/O or grid
computing. We are therefore now able to run our coupled ocean model
on the Dutch DAS2 test grid, while streaming the model output to a
visualization machine using HDF5. This will be demonstrated live at SC2003.
A Concurrent Algorithm for Shape Preserving Connected Set Filtering, and its Application to Interactive Visualization
A. Meijster
Centre for High Performance Computing and Visualisation
University of Groningen,
a.meijster@rc.rug.nl
A method is presented for combined interactive filtering and
visualization of volumetric data on shared memory architectures. The
user can interactively set the filter parameters of a shape
preserving class of morphological filters, called connected
filters, and immediately see a volume rendering of the resulting
filtered volume data set. The filters work by computing an
attribute describing the shape or size of each connected component
of the volume. The user decides which components to preserve by setting a
threshold on that attribute. We use a method in which the computation of attributes and
connected component analysis is separated from the decision stage of
the filtering process. For both stages a concurrent algorithm has
been developed. The first stage is a sort of initialization for the
(faster) decision stage, which can be run many times with different
threshold values, allowing interactive filtering and visualization of
the results. We implemented the algorithms using POSIX Threads on an
SGI Onyx 3400 with 16 CPUs, which is the system driving a
visualization facility at the University of Groningen, consisting of
a reality theatre, and a CAVE. We ran some tests on a 256×256×256
magnetic resonance angiography data set. We ran the program while a
radiologist from the university hospital was standing in the CAVE
steering the application. The user can interactively manipulate the
filtering of the data set. The user can filter, scale, rotate, and
translate the data set, and can also change colors by manipulating a
color lookup table (color transfer function). These manipulations are
quite interactive, performing at typical frame rates of about 10 to
20 frames per second. The radiologist seemed quite comfortable with
this level of interactivity.
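The separation of the two stages can be illustrated with a much simplified sketch. It assumes a small binary 2D image, a plain size attribute and sequential code, whereas the method described above works concurrently on grey-level volumes; the function names `analyse` and `decide` are hypothetical and chosen only for this illustration.

```python
from collections import deque


def analyse(grid):
    """Stage 1 (run once): label 4-connected foreground components
    and record a size attribute for each label."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    sizes = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            count = 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                count += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            sizes[next_label] = count
    return labels, sizes


def decide(labels, sizes, threshold):
    """Stage 2 (run per threshold): keep whole components whose
    attribute reaches the threshold; no relabelling is needed."""
    return [[1 if lab and sizes[lab] >= threshold else 0 for lab in row]
            for row in labels]


grid = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
labels, sizes = analyse(grid)           # expensive part, done once
for threshold in (2, 4):                # cheap part, repeated interactively
    print(threshold, decide(labels, sizes, threshold))
```

Because components are either kept whole or removed whole, the shapes of the surviving structures are not altered, and only the inexpensive `decide` step has to be repeated when the threshold changes.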
Maximum intensity projections of a magnetic resonance angiography volume data set
filtered with an attribute thinning as shape filter. The attribute
used was the moment of inertia divided by the volume of a peak
component to the power 5/3. This attribute is a shape dependent
number that expresses elongation. The top left-hand image is the
original; in the others the attribute threshold was 0.5, 1.0, 2.0,
3.0 and 4.0, respectively.
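As a rough illustration of why this attribute expresses elongation, the sketch below computes it for two voxel sets of equal volume. It assumes that each voxel is a unit point mass and that the moment of inertia is taken about the centroid; the exact definition used for the images may differ (for example by a correction for voxel extent), and the function name `elongation` is hypothetical.

```python
def elongation(voxels):
    """Moment of inertia about the centroid divided by volume ** (5/3).

    Assumption: voxels are (x, y, z) tuples treated as unit point masses.
    """
    volume = len(voxels)
    centroid = [sum(v[axis] for v in voxels) / volume for axis in range(3)]
    inertia = sum((v[axis] - centroid[axis]) ** 2
                  for v in voxels for axis in range(3))
    return inertia / volume ** (5 / 3)


# Two components with the same volume (64 voxels) but different shapes.
cube = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
rod = [(x, 0, 0) for x in range(64)]

print(round(elongation(cube), 3))  # compact shape, small value
print(round(elongation(rod), 3))   # elongated shape, much larger value
```

Both the moment of inertia and the volume to the power 5/3 grow with the fifth power of the linear size, so the ratio depends mainly on shape rather than on scale.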
Protein World
Tim Hulsen
CMBI (Nijmegen, The Netherlands)
timhulse@cmbi.kun.nl
Classification of proteins is becoming an increasingly important means of coping with
the large amount of data resulting from the flood of sequences from
large-scale genome sequencing projects. Currently more than 100
genomes have been completely sequenced. Comparisons of their genes
and encoded proteins have revealed a tremendous amount of
information. For example, finding unique sequences in certain
bacterial species has made elucidation of their pathogenicity
possible and revealed novel pathways for drug intervention. For
understanding the aetiology of complex disorders, learning about the
complexity in higher eukaryotes and enabling more in-depth
comparative studies, exact mapping of interspecies relationships
(orthologs and paralogs) is key to significant progress.
Currently several curated protein databases exist, such as Swiss-Prot and PIR. Despite
their curation, these databases do not offer a well-organized way
of discovering protein characteristics such as intra- and inter-species
relationships to other proteins. Several efforts have shown the value
of mapping similarities in classifying proteins based upon their
primary structure. Examples are the ClusTr database
(EBI), Protomap (Cornell University) and Systers (MPI), which try to
give a very concise view of all proteins and their similarities to
other proteins, creating a comprehensive view of the protein universe.
Classification enables the designation of so-called superfamilies,
families and sub-families. Other databases, such as Pfam (EBI), SMART and
PRINTS, classify domains, families and other characteristics.
All of the examples mentioned above are built upon sequence comparisons that lack
sensitivity and do not permit easy updating without a complete
recalculation of all protein sequence similarities. Most of them use
a simple BLAST comparison with certain fine-tuned parameters. A more
advanced option has been the iterative BLAST, Psi-BLAST, chosen because
more advanced algorithms like Smith-Waterman are computationally
intensive. The Z-value is an attempt to estimate the statistical
significance of a Smith-Waterman dynamic alignment score (SW-score)
through the use of a Monte Carlo process. In this approach,
highly similar sequences are shuffled randomly 100 times, the
similarities of the shuffled sequences are determined, and the significance of the
already obtained (sub-optimal) alignments is measured against them.
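The Monte Carlo estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the scoring parameters (match, mismatch, linear gap penalty) are placeholders for a real substitution matrix, only the second sequence is shuffled, and the Z-value is taken as the observed score minus the mean of the shuffled scores, divided by their standard deviation. The names `sw_score` and `z_value` are hypothetical.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def z_value(a, b, shuffles=100, seed=0):
    """Compare the real score with scores against shuffled copies of b."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    letters = list(b)
    background = []
    for _ in range(shuffles):
        rng.shuffle(letters)  # keeps composition and length, destroys order
        background.append(sw_score(a, letters))
    mean = statistics.mean(background)
    spread = statistics.pstdev(background)
    if spread == 0:  # all shuffles scored the same
        return 0.0 if observed == mean else float("inf")
    return (observed - mean) / spread


protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence
print(z_value(protein, protein))
```

Shuffling preserves the composition and length of the sequence, which is why the resulting value is less biased by those properties than the raw score. Each Z-value costs one alignment per shuffle on top of the original one, which illustrates the CPU cost of the approach.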
Z-values offer a very precise definition of protein similarities and partly reduce the bias
induced by the composition and length of the sequences (e.g. database
size). The method is, however, severely hampered by its CPU-intensive
character. In an international collaborative effort between Gene-IT
(Paris, France), EBI (Hinxton, UK), CMBI (Nijmegen, The Netherlands),
SARA (Amsterdam, The Netherlands) and Organon (Oss, The Netherlands),
funded by the NCF (Den Haag, The Netherlands), a project has been
initiated in which all of the currently known and predicted proteins
are being compared and their Z-values determined. This dataset will be the
basis for many projects that involve the clustering of proteins.
OGSA front end for the Bandwidth on Demand (BoD) services using Network Element
provisioning through multiple administrative domains
Bas van Oudenaarde
Universiteit van Amsterdam
oudenaar@science.uva.nl
Our goal is to allow the creation of an end-to-end lightpath using an OGSA-based grid
services interface in each domain that is part of the path. We thereby
explore the dynamic interaction of service invocations between grid
services in a multi-domain set-up. We discuss a broker-based service
model that interacts with various Network Element managers in
different domains, such as the Bandwidth on Demand (BoD) service
based on Generic AAA (RFC2903) in the NetherLight domain. The
signaling to provision the Network Elements for the end-to-end
lightpath will be based on grid service messages and controlled by a
broker service. This case study is meant to give us a feel for the
primitives and policy usage in multi-administrative-domain
scenarios.
Collaborative Genomics Visualization using SARAgene
Paul Wielinga
SARA
wielinga@sara.nl
SARA is developing the SARAgene data mining application for genomics
research in close cooperation with the Pharmaceutical Research and
Development department of Johnson & Johnson. Using the 'infinite'
space of the virtual world makes it possible to visualize large
amounts of data simultaneously and to investigate it at different
levels. This way a biologist can study gene loci, sequence data,
molecular structures, metabolic pathways and expression information
of a large number of proteins at the same time.
Currently, SARA is extending SARAgene into a collaborative version, which
applies the principle of “Augmented Virtuality” by
including live video streams in the VR environment created by the
application. This project was sponsored by SURFnet, the Dutch
Research Network Organization.
At SC03, SARA and SURFnet will demonstrate this collaborative version of
SARAgene. Three different bioinformatics sites in the Netherlands
will connect to the Dutch research booth in Phoenix. Scientists will
use a Reality Cube at the University of Groningen, an ImmersaDesk at
Erasmus Medical Center in Rotterdam and the CAVE at SARA in Amsterdam
to collaborate on a number of genomics databases, while using video
conferencing for the interaction.
SARAgene runs on IA32/Linux
and MIPS/Irix and is best used in combination with a virtual reality
display system, such as an ImmersaDesk, a Passive Stereo Projection system
or a CAVE(-like) immersive VR environment.
SARA
NetherLight demos
Demos 1, 2 and 4 are from the Netherlands booth
Demo# 1
Topic: Wirespeed IPv6
Lead: SURFnet, Cisco
Contact: Jac Kloots <Jac.Kloots@surfnet.nl>
Wishes: up to 2.5Gbps
Between: SURFnet / Abilene (NYC)
Demo# 2
Topic: Lightpath switching and AAA
Members: University of Amsterdam, Jason Leigh, Oliver Yu, Joe
Mambretti, Leon Gommans, Bas Oudenaarde, etc.
Contact: Freek Dijkstra <fdijkstr@science.uva.nl>
Wish: 1x or 2x 1Gbps lightpath
Interface: 1000BaseLX
Endpoints: Amsterdam / Chicago
Demo# 3
Topic: 10G tests using S2IO 10GE NICs in Optron systems
Members: UvA, LBL et al
Contact: Antony Antony <antony@nikhef.nl>
Wish: 10G connectivity
Interface: OC192
Endpoints: Amsterdam / Chicago
Demo# 4
Topic: 10G tests using Intel 10GE NICs in Itanium systems
Members: UvA, Bob Grossman et al
Contact: Antony Antony <antony@nikhef.nl>
Wish: 10G connectivity
Interface: OC192
Endpoints: Amsterdam / Chicago
Demo# 5
Topic: Realitygrid2003
Members: BT, EVL, others
Contact: Erik Radius <Erik.Radius@surfnet.nl>
Wish: 2x 1GE connectivity
Interface: 1000BaseSX
Endpoints: Amsterdam / Chicago
Demo# 6
Topic: NCDM
Members: Bob Grossman
Contact: Bob Grossman <grossman@uic.edu>
Wish: 10x 1GE connectivity to NL 6509
Interface: 1000BaseSX
Endpoints: Amsterdam / Chicago