Database


The information summarized on this website is taken from a database that I created and maintain. Its primary purpose is to track geographic distributions and hosts based on specimen data, and it is the source of data for creating distribution maps and analyzing patterns of host use. It tracks specific specimens deposited in museums or cited in the literature. Although it is incomplete in many respects, it has grown to the point that the information it contains is of interest and use to a broader audience. The database includes more than 26,000 collection records, 3,680 names (all species found in the New World), an additional 1,200 synonyms, and information on 1,500 images of nearly 500 species.


HISTORY AND FUTURE


The database began as a tool for keeping track of specimens that I had seen on loan and during museum visits, and it later expanded to include literature references relevant to specific research papers and projects. It has passed through versions in Excel, FoxPro, and SAS; in its primary form it is now maintained as a relational database in FileMaker Pro (currently version 10). The data tables are gradually being converted to MySQL (5.1), which allows the data to be used dynamically to summarize distribution and host information and to create dynamic distribution maps, lists, and indices.


For the foreseeable future, the MySQL version will be a secondary mirror of the primary FileMaker Pro database. I intend to gradually make more of the information in the database available to a wider audience, but I have no short-term plans to allow others to contribute records. This may change depending on collaborative interest or external support.


I am still actively adding records so the database will continue to grow in size and completeness.


GEOGRAPHIC COVERAGE


In decreasing order of priority, the database tracks species found in:


  1. Southeastern U.S.: >95% complete for published information, but still lacking the South American collection event data from Wood (2007). Beyond published information, the database also includes data from numerous specimens in collections.
  2. Mexico: ~60% complete for published information and >80% complete for significant collections.
  3. North America, including Central America and the Caribbean: ~40% complete.
  4. The rest of the New World.

A fundamental guiding principle has been to include all New World distribution information for any species that occurs within a region of interest such as Mexico or the southeastern U.S. For example, Dendroctonus valens is found in montane regions of the southeastern U.S., so the database currently includes all published records of this species from Honduras to the Northwest Territories. Consequently, for many species found in the northern and western United States, there may be considerable or even complete distribution information available if the species also occurs in the southeastern U.S. or Mexico.


My main priority is to fill in the gaps for the two regions of major interest to me, but I am making an extra effort to keep up with new publications, so coverage of the literature from the last 25 years (excluding catalogs) is nearly complete.


Until recently, I made little attempt to include locality information from outside the New World, either for introduced species or for New World species that have become established elsewhere.


STRUCTURE


The database consists of numerous tables, but the most important include information on:


  • Collection events: who, what, when, where, etc.
  • Species: current names, synonyms, species attributes.
  • Bibliography: information on literature from which records are taken.
  • Images: information on images.

Other tables include information on museums and collections, taxonomic authorities, taxonomic transactions, higher classification, and location of specimens. More detail is available on request.
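The relationships among the core tables can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the actual FileMaker Pro or MySQL schema: every table name, column name, and the helper state_summary are hypothetical. In this sketch a synonym points to its valid name, and publications link to collection events through a separate many-to-many table, so a distribution summary for a valid species also picks up records entered under its synonyms.

```python
import sqlite3

# Hypothetical schema for the core tables; all names are illustrative only.
SCHEMA = """
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    valid_id   INTEGER REFERENCES species(species_id)  -- NULL = current name
);
CREATE TABLE publication (
    pub_id   INTEGER PRIMARY KEY,
    citation TEXT NOT NULL
);
CREATE TABLE collection_event (
    event_id     INTEGER PRIMARY KEY,
    species_id   INTEGER NOT NULL REFERENCES species(species_id),
    country      TEXT,
    state        TEXT,
    locality     TEXT,
    collected_on TEXT,
    collector    TEXT,
    host         TEXT,
    n_specimens  INTEGER,
    depository   TEXT
);
-- Many-to-many: one collection event may be cited in several publications.
CREATE TABLE event_citation (
    event_id      INTEGER NOT NULL REFERENCES collection_event(event_id),
    pub_id        INTEGER NOT NULL REFERENCES publication(pub_id),
    identified_by TEXT,
    PRIMARY KEY (event_id, pub_id)
);
CREATE TABLE image (
    image_id   INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(species_id),
    file_name  TEXT
);
"""


def state_summary(conn, valid_name):
    """Count collection events per country and state for a valid species,
    including events recorded under any of its synonyms."""
    return conn.execute(
        """
        SELECT e.country, e.state, COUNT(*) AS n_events
        FROM collection_event e
        JOIN species s ON s.species_id = e.species_id
        JOIN species v ON v.species_id = COALESCE(s.valid_id, s.species_id)
        WHERE v.name = ?
        GROUP BY e.country, e.state
        ORDER BY e.country, e.state
        """,
        (valid_name,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder data: "Species B" is entered as a synonym of "Species A".
    conn.executemany("INSERT INTO species VALUES (?, ?, ?)",
                     [(1, "Species A", None), (2, "Species B", 1)])
    conn.executemany(
        "INSERT INTO collection_event (species_id, country, state) VALUES (?, ?, ?)",
        [(1, "USA", "Mississippi"), (2, "USA", "Mississippi"), (1, "Mexico", "Chiapas")],
    )
    print(state_summary(conn, "Species A"))
    # [('Mexico', 'Chiapas', 1), ('USA', 'Mississippi', 2)]
```

Resolving synonyms through a self-reference on the species table keeps each record under the name it was originally reported with, while summaries can still be produced under the current name.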


RECORDS


The database is designed as a taxonomic tool for capturing data on the collection of specimens. Information on and links to nomenclature, collections, and bibliography are included, but these are supporting functions and not the primary focus at this time. The emphasis on taxonomy has some very specific consequences for the design and use of the database.


  1. All taxonomically useful data are taken from actual specimens that are deposited in museums and available for verification by any subsequent reviewer. In addition to basic fields on locality and circumstances of capture, there are numerous fields for curatorial details such as the number of specimens, their physical location, and the person who made the identification.
  2. The basic record is the series, or collection event: all specimens of the same species with identical collection data. Specimen-level databasing has value, but it requires almost an order of magnitude more effort.
  3. When dealing with published sources, the information must be dissected into “collection image” events to the degree possible. This breaks the information into discrete records that match the format of physical specimens actually seen and vouchered. The amount of information included varies considerably. For example, in his revision of the genus Pityophthorus, Bright (1991) includes complete specimen collection data and museum data for almost all species and specimens seen, which makes direct and complete data entry possible. At the other extreme, his treatment of the bark and ambrosia beetles of Canada (Bright, 1976) includes no specimen data at all, and only province-level data (presence/absence) can be extracted. Although dot distribution maps are included, the data cannot be reliably extracted from these low-resolution figures. Between these two extremes, some level of collection and curatorial data can be extracted. Even when the numbers and locations of specimens are not known, it can generally be assumed that the authors saw and verified at least some of the specimens. Finally, valuable distribution and biological information can also be gleaned from non-taxonomic publications. For example, a recent paper on pheromones of Hylesinus pruinosus (Shepherd et al., 2010) gives a new locality in Mississippi (not previously reported from that state) and lists where voucher specimens are held and who made the identification.
  4. One consequence is that a “bottom-up” approach is favored over a “top-down” approach. A top-down approach would start, for example, with the most recent taxonomic catalogs (Wood and Bright, 1992a, 1992b) and supplements (Bright and Skidmore, 1997, 2002). The problem is that these sources are highly condensed and list only countries, states, and provinces. In practice a combined approach is needed, citing the catalogs only when a more specific reference cannot be found. In a similar vein, no attempt has been made to capture all data from all publications, or even all data within a specific publication. When entering data from checklists and revisions, the general rule of thumb is to enter a record only if it provides significant range or host information. Non-specific records (e.g., “Alabama”) are generally not added if a more specific record has already been included. On the other hand, I don’t go out of my way to eliminate records after the fact.
  5. Particular groups of specimens may have been reviewed by more than one person and cited in more than one publication. This degree of detail may not interest general users but is vital for further taxonomic research. In some cases it is obvious that the same specimens are involved; in others, a certain amount of speculation is required. Dodge (1938) listed specific collection data for many specimens collected in Minnesota, although it is not clear where these specimens are today. Wood (1982) cited some of exactly the same localities, strongly suggesting that he reviewed at least some of the same specimens. Finally, I have personally seen specimens of some of these species with the same collection data on visits to the USNM. When possible, these different publications and identifications are linked to the same collection data record, which preserves the additional detail while avoiding a proliferation of duplicate records.
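The record-handling rules above (one record per collection event, several citations linked to the same event, and skipping a non-specific record when a more specific one already exists) can be sketched in Python. This is a hypothetical illustration, not the actual data-entry logic: the classes CollectionKey, CollectionEvent, and EventIndex, and the choice of fields that count as "identical collection data", are assumptions, and all example values are placeholders rather than real records.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CollectionKey:
    """Hypothetical set of fields treated as 'identical collection data'."""
    species: str
    country: str
    state: str
    locality: str = ""   # empty when a source gives only state-level data
    date: str = ""
    collector: str = ""


@dataclass
class CollectionEvent:
    """One series: all specimens of one species with identical data."""
    key: CollectionKey
    # Each citation is (source, identified_by); several sources may
    # refer to the same series of specimens.
    citations: list[tuple[str, Optional[str]]] = field(default_factory=list)


class EventIndex:
    """Hypothetical in-memory index of collection events."""

    def __init__(self) -> None:
        self.events: dict[CollectionKey, CollectionEvent] = {}

    def has_specific_record(self, species: str, country: str, state: str) -> bool:
        # Here a record counts as "specific" if it has a locality below state level.
        return any(
            k.species == species and k.country == country
            and k.state == state and k.locality != ""
            for k in self.events
        )

    def add(self, key: CollectionKey, source: str,
            identified_by: Optional[str] = None) -> bool:
        """Link a citation to a collection event, creating the event if needed.

        Returns False if a non-specific (state-only) record is skipped
        because a more specific record for the same species and state
        already exists. Existing records are never removed after the fact.
        """
        if key.locality == "" and self.has_specific_record(
                key.species, key.country, key.state):
            return False
        event = self.events.setdefault(key, CollectionEvent(key))
        event.citations.append((source, identified_by))
        return True


# Placeholder data only: two publications citing the same series.
index = EventIndex()
series = CollectionKey("Species A", "USA", "Minnesota",
                       "Locality X", "date X", "Collector X")
index.add(series, "Publication 1")
index.add(series, "Publication 2", identified_by="Identifier 1")
index.add(CollectionKey("Species A", "USA", "Minnesota"), "Checklist")  # skipped
print(len(index.events))                    # 1
print(len(index.events[series].citations))  # 2
```

Because the check runs only when a record is entered, a state-level record entered before a more specific one is kept, in line with not going out of the way to eliminate records after the fact.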