BSD and Science?
Trish Lynch <[email protected]>
[ This is an article that first appeared in the
October 2000 issue of Open
Magazine which was submitted to us by the author
for republication here. ]
Open Source software grew out of the IT industry's
attempt to streamline development cycles and improve
ease of use for computer professionals. How many times
has one said, "Now if I only had the source"? Still,
Open Source had to be accepted by IT management. Industry
analysts and IT crystal-ball watchers have all turned
that question of acceptance every which way: Where
doesn't Open Source work in the here and now? What
fields represent hurdles to Open Source principles and
methods? Can Open Source thrive in an environment
where the core expertise is not specifically
computer-related?
This BSD writer has found a match: An industry that
has turned away from Microsoft solutions due to
massive processing needs and a strong academic
tradition, and has turned to commercial UNIX vendors
such as SGI, Compaq/DEC, and Sun for their computing
solutions. Welcome to the hot field of
bioinformatics.
Bioinformatics is one of the fields where users are
truly end users. Most are scientists who don't want
to know anything about the underlying OS; they just
want the machines to work and the OS to run. As such,
we are talking about an audience where Open Source
faces impediments.
[ An interdisciplinary research area where computer
science is used in life sciences, bioinformatics is
where the two sciences interface to solve complex
biological problems. It includes the collection,
organization, storage, and retrieval of biological
information and databases. What's more,
bioinformatics is having a major impact on drug
discovery through the retrieval, storage, and analysis
of genetic information. Large-scale databases of
genomic information requiring high-performance
computing systems are foundations for
compute-intensive search and analysis algorithms and
applications that make bioinformatics a major focus in
drug discovery. Andrew Pollack of The New York
Times casts bioinformatics as part of biology's shift
from a wet science, with work done in test tubes, to a
partly dry one, where crucial analysis is done on
computers. ]
Considering all this, one easily sees that Open
Source traditionally has worked in other kinds of
settings.
The roots of Open Source are in the Internet
industry, where users have computing skills, computers
are not required to have massive processing
capabilities, and there is typically not a lot of
money to throw at a problem. In turn, people in
the Internet/e-commerce arenas tend to go for
self-supported systems such as Linux and BSD.
Bioinformatics, on the other hand, is a place where
research is done entirely within the silicon structure
of a computer. Many biologists today cannot be
competitive without young, new talent from the IT fields.
Johann Visagie of Electric Genetics says, "With new
blood came new ideas, and some of the ideas we brought
with us said that there are far more sensible ways of
doing things than paying millions upon millions to
hardware and software vendors. The idea that Open
Source should play an important part in bioinformatics
is quickly taking off. If Open Source bioinformatics
becomes the norm, you can bet that Open Source
operating systems will become common in the field.
Linux is finding acceptance quickly."
Visagie is quite optimistic about BSD usage in this
field. If one is going to build massive processing
boxes out of inexpensive hardware, one may as well use
an OS that is "stable, mature, and above all very
manageable in a large, networked environment," he
states.
There are several biotech software packages ported to
BSD now, such as emboss, sim4, bioperl, and biopython,
located in /usr/ports/biology in the
FreeBSD ports collection; feedback from the authors of
these packages has been very supportive. This kind of
endorsement is what BSD (also Linux and other Open
Source software) needs to succeed in that field.
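As a rough illustration of how the ports tree is laid
out, the following Python sketch lists the ports found
under a category directory such as /usr/ports/biology.
It assumes the conventional layout, in which each port
is a subdirectory holding its own Makefile; the function
name list_ports and its parameters are made up for this
example and are not part of any FreeBSD tool.

```python
import os


def list_ports(category, ports_root="/usr/ports"):
    """Return the sorted names of the ports in one category.

    Assumption: a subdirectory counts as a port if it contains
    a Makefile, which is the usual ports tree convention.
    """
    category_dir = os.path.join(ports_root, category)
    if not os.path.isdir(category_dir):
        # No ports tree (or no such category) on this machine.
        return []

    ports = []
    for name in os.listdir(category_dir):
        port_dir = os.path.join(category_dir, name)
        if os.path.isfile(os.path.join(port_dir, "Makefile")):
            ports.append(name)
    return sorted(ports)


if __name__ == "__main__":
    # Print one biology port per line, if the ports tree is present.
    for port in list_ports("biology"):
        print(port)
```

On a FreeBSD machine with the ports tree installed, this
should print one biology port per line; on a machine
without /usr/ports it simply prints nothing.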
What's more, hardware is an important factor. By
nature, bioinformatics is very CPU-intensive. [ As
noted above, bioinformatics handles enormous data
volumes. ]
Besides x86 and Alpha hardware, which all of the Open
Source BSDs support, NetBSD and BSD/OS (BSDi's
closed-source BSD offering) have UltraSparc ports. Big
Iron support is a significant advantage in the
bioinformatics field due to the processing power
needed.
With BSD optimists like Visagie, why has Open Source
not pervaded the world of bioinformatics? The
impediments are largely sociological.
The end users are set in their ways, as evidenced in
many academic environments. Can an Open Source
solution do the job? No question. It's a matter of
acceptance within a community that is largely shielded
from technical details.
Might BSD's academic background be used to leverage
greater acceptance of the OS in biotechnical fields?
This remains to be seen. What we do know is that
people like Johann Visagie, in their quest for
inexpensive, powerful computing in this field, are
laying the groundwork needed for long-term
success.
- Trish