Chemical information software for searchers: A brief review

By: McBride, Matthew; Felter, Laura M.
Publication: Searcher
Date: Wednesday, October 1 2003

Let's get the disclaimer out of the way at the start. This review will not evaluate or analyze the usefulness or accuracy of chemical information software at the level of a chemist or biologist. As former scientists (biologist and chemist, respectively), we will avoid any discussion of the advanced drawing features, complex databases, and combinatorial chemistry features others have discussed (1). Instead, we intend to review the available chemical information software from the perspective of the needs of information professionals. The average searcher is often called upon to research and analyze subjects outside their immediate expertise and this may require combining multiple data into complete packages tailored to the client. The software packages reviewed can all be used by individual searchers who have only minor or intermediate knowledge of chemical structures to integrate with other data sources for delivery to their clients. All the systems we looked at can integrate and import data with the most popular enterprise chemical information systems.

Imagine merging chemical structures received from a researcher with external physical or biological property data from CAS or other sources, bibliographic citations, patent data, etc., and then passing that report back to a scientist who can then perform additional analysis on the chemical substances. It's a clear case of scientific value-added research.
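The merging workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the compound identifiers, the property values, and the citation strings are made-up sample data, and the function names `merge_report` and `to_csv` are hypothetical helpers, not part of any package reviewed here.

```python
import csv
import io


def merge_report(structures, properties, citations):
    """Join structures, property data, and citations on a shared compound ID."""
    rows = []
    for compound_id, structure in structures.items():
        row = {"id": compound_id, "structure": structure}
        # Property data may be missing for some compounds.
        row.update(properties.get(compound_id, {}))
        row["citations"] = "; ".join(citations.get(compound_id, []))
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize merged rows to CSV text, leaving absent values empty."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative sample data only (structures given as SMILES strings).
structures = {"cmpd-1": "CCO", "cmpd-2": "c1ccccc1"}
properties = {"cmpd-1": {"boiling_point_c": 78.4}}
citations = {"cmpd-1": ["Hypothetical citation A", "Hypothetical patent B"]}

rows = merge_report(structures, properties, citations)
print(to_csv(rows))
```

The resulting CSV text opens directly in a spreadsheet, which is roughly the kind of package a searcher might hand back to a scientist for further analysis.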

Chemical Information Software Background

File Types

If you have ever struggled with the multitude of file types (.wpd, .wk1, etc.) on the various Microsoft platforms, the problems of dealing with chemical structure data will seem familiar. Table I on page 53 lists some of the interoperable file types available across chemical information software. Among these file types, the industry standard formats for chemical information are called chemical table files (CTFiles), developed by MDL Information Systems, Inc. (2). Think of the various CTFiles as equivalent to Rich Text Format for text documents. CTFiles contain all the information necessary to recreate a chemical substance, including additional non-chemical data.

As with any major software "Office" application, the import/export of data is extremely important. Interoperability among the major chemical information software packages is usually enabled by one of the MDL file types, often a Molfile (.mol) for single structures and an SDfile (.sdf) for multiple structures with associated data. All of the software packages under review can import/export in at least one of these formats.
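To make the Molfile/SDfile distinction concrete, here is a minimal Python sketch of how an SDfile can be split into records. It assumes the common V2000 layout: each record is a Molfile block ending in `M  END`, followed by data items introduced by a `>  <FIELD>` header line, and terminated by a `$$$$` line. The sample record, including the identifier `NSC-0001`, is invented for illustration, and the parser is deliberately simplified rather than a full implementation of the CTFile specification.

```python
import re


def parse_record(lines):
    """Parse one SDfile record into its name, molblock, and data fields."""
    # The structure block runs through the "M  END" line.
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")), -1)
    molblock = "\n".join(lines[: end + 1])
    fields = {}
    current = None
    for line in lines[end + 1:]:
        header = re.match(r">.*<([^>]+)>", line)
        if header:
            current = header.group(1)
            fields[current] = []
        elif line.strip() == "":
            current = None  # a blank line closes the data item
        elif current is not None:
            fields[current].append(line)
    # In the V2000 counts line, the first three characters hold the atom count.
    atom_count = int(lines[3][:3]) if end >= 3 else 0
    return {
        "name": lines[0].strip(),
        "atom_count": atom_count,
        "molblock": molblock,
        "data": {key: "\n".join(value) for key, value in fields.items()},
    }


def parse_sdfile(text):
    """Split SDfile text on '$$$$' delimiter lines and parse each record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if current:
                records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    return records


# Invented single-atom example record, for illustration only.
SAMPLE = "\n".join([
    "Methane",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    ">  <ID>",
    "NSC-0001",
    "",
    ">  <MW>",
    "16.04",
    "",
    "$$$$",
])

records = parse_sdfile(SAMPLE)
print(records[0]["name"], records[0]["data"])
```

A single-structure Molfile is simply the part of a record up to `M  END`; the data items are what let an SDfile carry associated data alongside each structure.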

In addition, all the packages integrate with Microsoft Office applications through the object-linking and embedding (OLE) procedure in Windows. For example, copying chemical information from ChemOffice into the Windows clipboard, then pasting the data into Word does not eliminate the ability to alter the structure. Double-clicking (or right-clicking) the chemical object starts the original drawing program. In Office XP, the editing features are as seamless as editing any Word document. Transfer of chemical objects between the various programs can also be accomplished by this process, although you may lose some data.

Software Drawing

If you consider yourself fairly experienced with chemical structures, then the basics of drawing in any of the chemical information software packages should come very easily to you. Just point and click. If you are not very experienced, then take care in replicating any structure from other sources, since precise order and stereochemistry are critical for chemical substances (3). Most searchers will probably prefer to use the tools available in some software packages that allow conversion of the International Union of Pure and Applied Chemistry (IUPAC) standard names to chemical structures, or vice versa. In addition, most of the software will calculate basic chemical property information, while some packages use more complex formulas to give predictive values for boiling/melting points, experimental Henry's Law constants, and water solubility, for example. Most programs also allow the addition of other graphical elements or text as an option.

Review of Selected Software

For this article, we selected three comprehensive chemical information software packages (sometimes referred to as "Office" or integrated suites) and two Microsoft Excel Add-Ins for review. In selecting the software, we applied the following criteria: previous experience with the packages, well-established business presence in the chemical information industry, and the ability of the software to integrate with existing business practices and systems in the chemical or pharmaceutical industries. Software cost was not a primary factor in the evaluation. Searchers who operate on smaller budgets or only need the software infrequently should review Tom Kuppens's "Tom's Free Chemistry Software" for alternative options (4).

Table I. Selected Chemical File Types

Table II. Bundled Software

Chemical Information Software Packages

* ChemOffice 2002 (CambridgeSoft Corporation)

* ACD/Labs 7.0 (Advanced Chemistry Development, Inc. (ACD/Labs))

* Chemistry 4-D Draw Office 7.0 (ChemInnovation Software, Inc.)

Microsoft Excel Add-Ins

* Accord for Excel 5.0 (Accelrys)

* MDL ISIS for Excel 2.0 (MDL Information Systems, Inc.)

Some of the software packages (as reviewed) come bundled with additional software, as listed in Table II on page 53.

Each software package was evaluated independently of others, focusing on the following factors: features, installation, user manual, ease of use, import/export/compatibility, and Microsoft Office integration.

Sample chemical data: Databases of structures used in tests of the software packages came from the National Cancer Institute (NCI) Open Database Compounds as retrieved from Erlangen/Bethesda Data and Online Services [http://cactus.nci.nih.gov/ncidb2/] and the Enhanced NCI Database Browser (Release 2). We selected 1,000 nucleotide metabolism regulators with a molecular weight of 250-300 to evaluate the database programs.

Reviews

ChemOffice 2002

Features: ChemOffice is a full-featured chemical information suite, including an advanced chemical structure editor (ChemDraw), a fully customizable chemical database program (ChemFinder), a 3-D structure viewing program (Chem3D), and a chemical plug-in that gives any Web browser ChemDraw functionality when embedded structures appear in a page (ChemDraw Pro Plugin).

Installation: ChemOffice installed flawlessly, although the full ChemOffice package, including accessory databases, tools, and the Merck Index, takes up seven CD-ROMs.

Manual/Help: ChemOffice contains detailed printed manuals for each piece of bundled software, including an overview manual on the integrated office environment. Each program also contains an extensive help file with full search capabilities. CambridgeSoft offers extensive online (Web-based) customer support.

Ease of Use: ChemOffice is an extremely flexible and integrated software package. Functions from all the subprograms are always available. For instance, from within ChemDraw, you can import a chemical substance from an MDL molfile, generate the IUPAC name, represent the structure as a 3-D molecule, and then finally export it into a ChemFinder database. Software updates are available online, as well as searchable reference, chemical, and reactions databases. ChemOffice is sufficiently simple for novice users and equally powerful for expert searchers.

Import/Export/Compatibility: ChemDraw and ChemFinder have the largest array of import/export options available from any of the software packages we reviewed, offering almost every chemical file and graphic format available.

MS Office Integration: ChemOffice is fully compatible with Microsoft Office 2000 and XP using object linking and embedding. In addition, no problems were identified in chemical transfers between ChemDraw Ultra and other chemical information software.

ACD/Labs 7.0

Features: ACD/ChemSketch allows users to create professional chemical structures, work with text and graphics simultaneously, and transfer the resulting structure to any OLE-supported software. The software allows the calculation of the chemical formula, molecular weight, and percentage composition, as well as predictive chemical properties such as density and refractive index. ACD/ChemSketch includes an extensive chemical dictionary with over 103,000 trivial and systematic names with corresponding structures.
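The basic calculations mentioned here (molecular weight and percentage composition from a chemical formula) are simple enough to sketch in Python. This is not how ACD/ChemSketch or any other reviewed package implements them; it is an illustrative example that assumes a flat formula string without parentheses or charges, and the small `ATOMIC_MASS` table covers only a handful of elements using rounded standard atomic weights.

```python
import re

# Rounded standard atomic weights for a few common elements (assumed subset).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "S": 32.06, "Cl": 35.45}


def parse_formula(formula):
    """Return element counts for a flat formula such as 'C8H10N4O2'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"element not in table: {symbol}")
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic masses over all atoms in the formula."""
    return sum(ATOMIC_MASS[s] * n for s, n in parse_formula(formula).items())


def percent_composition(formula):
    """Return the mass percentage contributed by each element."""
    total = molecular_weight(formula)
    return {s: 100.0 * ATOMIC_MASS[s] * n / total
            for s, n in parse_formula(formula).items()}


caffeine = "C8H10N4O2"
print(round(molecular_weight(caffeine), 3))
print({s: round(p, 2) for s, p in percent_composition(caffeine).items()})
```

Predictive properties such as density or refractive index rely on far more elaborate models and are outside the scope of a sketch like this.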

Chemical Editing Window of ChemDraw Ultra.

Chemical Database Window of ChemFinder Ultra.

Chemical Editing Window of Chemistry 4-D Draw.

Typical Accord for Excel worksheet.

ACD/ChemFolder is an advanced chemical database program that manages separate files with chemical structures, reactions, and reports. Records can be retrieved by structure similarity, substructures, exact structures, or other data.

ACD/Name and ACD/Name to Structure can generate accurate systematic names according to IUPAC recommendations and CAS Index rules for almost any organic structure and selected classes of biochemical, organometallic, and inorganic structures. ACD/Name to Structure also reports warnings about ambiguity of chemical names before structure generation.

Installation: The installation of ACD/Labs performed flawlessly, including all the electronic documentation.

Manual/Help: No printed materials were available for review; however, an extensive set of electronic documentation is available. Additional online (Web-based and newsgroup) customer support is available.

Ease of Use: Similar to the ChemOffice suite, ACD/Labs is a complex software package that requires some reading before proper use. It maintains a standard Windows interface that most users should find fairly easy to learn. Many of the more powerful functions are available as toolbar buttons. Incremental software updates are available online from ACD/Labs, including the option for the software to autodetect the availability of updates upon startup. Unlike some Windows-based software, ACD/Labs prompts you as to whether you would like to modify your file associations to utilize ACD/Labs versus other software packages, a particularly nice feature.

Import/Export/Compatibility: ACD/ChemSketch supports importing of a wide range of file formats, including Windows Metafile (*.wmf), MDL Molfile (*.mol), ChemSketch 1.0 Molfile (*.mst), ChemSketch 1.0 RPT file (*.rpt), CS ChemDraw CHM file (*.chm), CS ChemDraw CDX file (*.cdx), REACCS RXN file (*.rxn), and ISIS/Sketch BIN file (*.skc). Exporting options include Adobe Acrobat PDF, various graphic formats, and all import formats.

MS Office Integration: ACD/Labs is fully compatible with Microsoft Office 2000 and XP using object linking and embedding. Sometimes a structure transferred with the Cut/Paste method may appear as a set of numbers and figures instead of the actual chemical structure. ACD/Labs recommends using the Paste Special feature in the application to resolve the problem.

Chemistry 4-D Office 7.0

Features: Chemistry 4-D Office 7.0 comes bundled with several programs integrated into a single, functional program environment:

Chemistry 4-D Draw is ChemInnovation's advanced drawing tool for chemical structures. It works with structures, text, and graphics simultaneously and has the ability to transfer the information to any OLE-supported software. It uses a standard Windows software environment with simple, well-organized drop-down menus and toolbars.

NamExpert converts IUPAC chemical names into corresponding structures in one of three styles: shorthand, Kekule, or semistructural formula. Additionally, it adds labels to appropriate atoms and groups. NamExpert supports stereochemistry and includes 8,000 drug names and structures. In tests with common chemical moieties, pesticides, and generic pharmaceuticals, NamExpert performed flawlessly. You can define new structures with the NamExpert database. Unless you are extremely familiar with IUPAC nomenclature rules, you should stick to simple chemical names to assure accurate structure generation.

Nomenclator performs the opposite function of NamExpert by automatically assigning systematic names to organic structures according to IUPAC nomenclature rules. Nomenclator recognizes hydrocarbons, fused rings, heterocyclic systems, halogens, alcohols, ketones, aldehydes, carboxylic acids, amines, and more.

The Chem4-D Database module manages databases of molecular structures, graphics, and information associated with the data. It helps you search and reuse structures and graphics you have created.

Installation: Installation of Chem 4-D Office performed without a flaw. The software prompts the user for a software key during the installation process. The installer also includes a troubleshooting solution (similar to Microsoft installers) in case of future program errors.

Manual/Help: Chemistry 4-D Draw includes a well-written printed user's guide that has examples of how to utilize the features of Chem 4-D Database and NamExpert. Online (Web-based) customer support is not available; however, the program does include a simple help file.

Ease of Use: Although Chemistry 4-D Draw is not the most powerful editor among the selected software packages, it does succeed in providing one of the easiest-to-use interfaces for novices. The NamExpert software quickly converts structures to the standard IUPAC naming convention, or vice versa.

Import/Export/Compatibility: Unfortunately, Chemistry 4-D Draw supports a very limited range of file types, including the import of MDL Molfiles (*.mol), and the export of MDL Molfiles, Encapsulated PostScript (EPS), and Windows Metafiles.

Displaying and expanding chemical structures in Accord for Excel.

Typical ISIS for Excel worksheet.

MS Office Integration: Chemistry 4-D Draw is fully compatible with Microsoft Office 2000 and XP using object linking and embedding. Double-clicking a chemical object in any Word document starts the Chemistry 4-D Draw program.

Accord for Excel 5.0

Features: The Accord for Excel 5.0 add-in provides the ability to manage, analyze, and merge chemical structures with additional data in an Excel spreadsheet. Unlike other chemical database programs, Accord for Excel leverages the power and simplicity of a specific spreadsheet environment, though probably every searcher's favorite (unless you are a WordPerfect Quattro Pro user). All Accord functions are available from either drop-down menus or toolbar buttons.

Installation: There are several different editions of Accord for Excel that can be installed; however, the most useful version for information professionals is the Accord for Excel Professional Edition. The installer gives the user the option to perform an Evaluation install (30-day time limitation) or a full install. The installation program also contains a simplified, complimentary chemical drawing program, CASDraw, although the Accord for Excel software is compatible with other chemical drawing software. Accord also offers add-on modules that can perform physical chemistry property calculations and combinatorial chemistry functions. To activate the software after installation, you may have to go to Excel's "Tools -> Add-Ins..." and select the appropriate Accord tools. The software successfully installed with both Excel 2000 and Excel 2002 under Windows XP.

Summary Review

Software Package Costs

Manual/Help: Accord for Excel includes an extensive set of printed user's guides that cover the use of Accord, the CombiChem Add-On program, as well as a thorough reference guide on the chemistry engine. Online (Web-based) customer support is available to registered customers only, though a detailed help file comes with the program. Since the add-in uses Visual Basic to perform its function, the Help file includes detailed instructions on how to use the commands for more advanced applications.

Ease of Use: Accord for Excel is analogous to Microsoft Word. On the face of it, the software is extremely simple. Deep down, Accord provides a host of more complex functions that require a more thorough understanding of the software; however, all basic functions are available from the drop-down menu. Structure files are quickly imported and displayed and chemical physical property data (such as molecular mass) can be calculated using the provided functions. Accord provides a nice blend of features for both novice and experienced chemical information professionals.

Import/Export/Compatibility: Structures in Accord for Excel spreadsheets are editable with ISIS/Draw, ChemDraw, or the included editor, CASDraw. The software can import and export a wide range of file types: MDL SD and RD Files (*.sdf, *.rdf); MDL Molfiles (*.mol), Rxnfiles & Sketch files (*.skc); CS ChemDraw (*.cdx); CAS CXF/CXG; SMD 4.3 (*.smd); SMILES; Questel DARC-F1 (*.fld).

MS Office Integration: Since Accord for Excel is an add-in product, the integration with MS Office is extremely well executed. The software is compatible with Microsoft Office 2000 or XP using object linking and embedding. We tested Accord for Excel with Excel 2000 and Excel 2002 for this review.

MDL ISIS for Excel 2.0

Features: MDL ISIS for Excel is similar in use to Accord for Excel, but with less functionality. Some physical property calculation tools, as well as the limited import/export functions, make the tool slightly easier for new users; however, more experienced chemical searchers may prefer Accord.

Table III. Vendor Contacts

Installation: The installation application was very quick and easy, with no serial number or product activation key required. MDL releases periodic service pack (SP) updates to correct any errors with the product. On Windows XP, the operating system warned us that the ISIS installer had overwritten some key files from older versions without prompting; fortunately, no further complications arose. The installation automatically configured the Add-In and the software successfully installed with both Excel 2000 and Excel 2002 under Windows XP.

Manual/Help: A well-written, although brief, set of electronic (PDF) documentation is available after installation. Online (Web-based) customer support is available and a modest Help file is included with the program.

Ease of Use: Using ISIS for Excel is extremely simple, so simple that you almost don't need the manual. Structure files can be imported in a single step or one molecule at a time. Structures can be displayed by selecting individual cells or an entire column. Chemical property data is available, although not nearly as conveniently as with Accord, since you can only display one molecule at a time. ISIS for Excel is perfectly suited for novice users to easily import chemical data and merge it with additional data sources.

Import/Export/Compatibility: ISIS for Excel is more limited in its ability to import chemical file types. Only MDL SD (.sdf) and Molfiles (.mol) can be imported. Additionally, the export functions are limited to ISIS List (.lst), MDL SD (.sdf), ISIS Database (.db), and Molfiles (.mol).

MS Office Integration: As with Accord for Excel, ISIS for Excel is well integrated with both MS Office and Excel. The software is compatible with Microsoft Office 97, 2000, or XP using object linking and embedding. ISIS for Excel was tested as compatible with Excel 2000 and Excel 2002 for this review.

Summary

We could not select a winner among the selected software packages, simply because each has unique positive and negative features depending on your application (see Table 3 above). The most important factor to consider is which software package and features will most easily integrate into the information delivery strategy that your business partners use or require. The superb Excel integration features of both ISIS for Excel and Accord for Excel Add-Ins may be ideal, if you only need a tool to integrate chemical structure data with additional information. However, a more robust (and expensive) solution that can generate IUPAC names, deal with complicated structures, or integrate with an in-house system would require either ChemOffice or ACD/Labs, the two premier chemical information packages. In addition, these two packages are the most flexible in terms of sharing large chemical databases with other users. Chemistry 4-D Office is a good choice for a simplified chemical drawing package that can perform some limited naming functions. Whether you choose one of the full software suites, or the most easily utilized Excel Add-Ins, these chemical information tools should add value to your products for clients who need chemical information merged with additional data.

SIDEBAR

Academic Chemical Information Opportunities

by Laura M. Felter

When I was an undergrad in the early 1990s conducting inorganic research at Purdue University, all my experimental data was kept in a laboratory notebook. Key NMR, mass spectroscopy, and crystallography spectra were printed and kept with other hardcopy data.

I've worked in several industries since then - specialty chemical, pharmaceutical, and consumer electronics - and I've had access to sophisticated chemical information databases, either off-the-shelf or customized, that provided reactions databases and access to the chemical activities of whole groups of researchers, past and present. These centralized databases are used as knowledge management and project management tools. These databases can provide a resource for tracking and searching the organization's proprietary information, as well as serving as a rich resource of research activities with valuable contact information on contributors and their specific areas of expertise.

From my chemical database experiences and my career transition from analytical chemist to corporate librarian, I've wondered what chemical database tools the academic community has been using in their research laboratories over the last decade.

Specifically, I wondered about departments that use chemical software as a centralized repository of research activities. A cursory search of both the information science and chemical literature provided some studies on information storage and retrieval systems used in academic chemical research laboratories, but there hasn't been much published since the days of my experience. There may very well be activity in this area, but not much incentive to publish on activities other than funded research. Or maybe I was missing something? So I called my former research advisor, who now heads the Chemistry Department at Purdue, for some perspective.

My former advisor confirmed that, in many circumstances, the best way to appraise an academic research laboratory's activities is to search the published literature. We discussed the options available to academic organizations, along with his experiences with corporations and their needs, such as collecting and distributing electronic laboratory data for intellectual property interests and tracking R&D attempts, both to document past research efforts and to plan future ones.

The primary barriers for instituting such centralized chemical knowledge management tools in an academic research setting appear to be the time involved for data input and the integration of data from various analytical database systems to make the data meaningful enough for mining and extracting. Perhaps, for now, the desire and drive for IP-based information differs in the academic environment? Compounding these issues are the familiar scenarios existing in all organizations: multiple system platforms, ownership issues of choosing and ensuring collaborative use of such systems, and overcoming IT difficulties and funding shortages impeding the implementation and maintenance of an organization-wide database. The desire and need to track all research must be strong to overcome these obstacles. Without that internal commitment, ultimately, the efforts of research documentation storage must focus on successful, published research.

How are these centralized research efforts addressed in academic areas across all the scientific and engineering fields? How will this change with the increased collaborations among universities and corporations, especially in the cutting-edge areas of biomedical and biotechnology? As universities see increased funding possibilities extend from nontraditional sources, such as corporate America, it will be interesting to note the changes in academic institutions and the increased expenditure of effort and funding into centralized research databases and repositories.

References

1 For an online updating review of selected chemical drawing programs from a chemist's perspective, visit the Web site of Tamas Gunda of the University of Debrecen in Hungary at http://dragon.klte.hu/~gundat/rajzprogramok/dprog.html.

2 More information on MDL CTFile formats appears in a White Paper at http://www.mdl.com/solutions/white_papers/ctfile_formats.jsp.

3 For the authoritative source on stereochemistry, visit the IUPAC Basic Terminology of Stereochemistry (1996): http://www.chem.qmul.ac.uk/iupac/stereo/.

4 For a comprehensive list of free chemistry software, visit the extensive collection of links at Tom Kuppens's recently redesigned Web site, Tom's Free Chemistry Software, at http://allserv.ugent.be/~tkuppens/chem/.

by Matthew McBride

Principal Information Consultant

ClnC, Inc.

and Laura M. Felter

Information Consultant

ClnC, Inc.