Constructing the Moby-Dick Gazetteer
- All place-names were identified through a manual review of the Modern Library 1926 edition of Moby-Dick.
- The Gutenberg online version was used to copy the entire sentence in which each place-name occurred and paste it into the database.
- All occurrences of each place-name were captured.
- All duplicates were retained, which allows different spellings or punctuation of a place-name to be identified and analyzed.
Key to styles on this page:
- Red, bold, sans-serif = Coded value.
- Courier, bold = literal text from Moby-Dick.
- Sans-serif, bold = field or column name.
Rules for capturing place-names:
Capitalization and mappability
To be captured, the place-name must be able to be found on a map, chart, or globe of the earth or be a known object in the sky. The mappable location can be very specific (e.g. Eddystone Lighthouse) or quite vague (e.g. Polar Seas). If it is a named place (on Earth or in space), then it is in The Modern Moby-Dick Gazetteer.
To be captured, the place-name must be capitalized. If more than one word is capitalized (e.g. Pacific Ocean, Heidelburgh Tun), then all capitalized words making up the place-name were captured. [The only exception to the capitalization rule is milky way, which Melville capitalized the first time he used it and wrote in lower case the next two times.]
Category and type coding
A valuable feature of the Moby-Dick Gazetteer is that it includes coded attributes for each place-name. These are the category and type attributes.
- Category is a broad categorization of place-names. The categories are:
- Celestial Feature. Planets, Constellations, Galaxies, etc.
- Cultural Feature. Statues, Monuments, Bridges, Parks, etc.
- Geographical Feature. Equator, Poles, Regions. Regions are somewhat ill-defined places that are named by people because of some human-centered and often arbitrary definition. For example, New England is a known place that is mappable. But what is it? It is a region defined by humans. It is not an island (landform), a continent (landform), or a state (political feature).
- Landform. Naturally occurring landforms such as islands, continents, mountain ranges, etc.
- Political Feature. Places whose location and boundaries are defined by political entities. States, Counties, Countries, etc.
- Populated Place. Cities, villages, etc.
- Water Feature. Oceans, seas, lakes, canals, etc.
- See the Summaries page to get lists and counts of categories and types.
Rules for assigning category and type
- Secondary word capitalized. Named things that include a capitalized place-name with a capitalized modifier are included when they can be mapped to a specific place. Examples:
- Tower of London (category = Cultural Feature; type = tower)
- Heidelburgh Tun (category = Cultural Feature; type = barrel)
- Phidias's marble Jove (category = Cultural Feature; type = statue)
- Secondary word not capitalized. If a secondary word is not capitalized, it is not captured as part of the place-name, but is retained in the modifier column. Example:
- Nantucket market will be captured as
- place-name = Nantucket
- modifier = market
- This place-name will be coded as category = Landform and type = Island because the place-name Nantucket is the only clue as to location. Lower case market signifies that it is a generic market, not a known market with a specific location.
- If the text had been Nantucket Market (Market capitalized), then it would be captured as follows:
- place-name = Nantucket Market
- The capitalization of Market signifies that this is a particular named market with a mappable location (whether known or not).
- This place-name would have been coded as category = Cultural Feature and type = Market.
- Greenland whale will be captured as
- place-name = Greenland
- modifier = whale
- This place-name will be coded as category = Landform and type = Island because whale is not capitalized AND because a whale is not a specific, mappable object.
- This coding may be somewhat confusing, but the Gazetteer is HEAVILY location-oriented. If the place-name includes a non-capitalized modifier and the compound place-name cannot be mapped (a Greenland whale is not a place on a map), then only the place-name (e.g. Greenland) is used in assigning a category and type.
- Place-names with more than one meaning (e.g. Indian). The context and meaning of the sentence is used to identify the meaning of a place-name:
- Melville uses Indian in three ways: for the country, for the ocean, and as a descriptor for Native Americans.
- The context of the sentence in which the term is used determines how the category and type are coded in these cases.
- NOTE: When Melville uses Indian as an adjective to describe someone (e.g. Queequeg), then Indian describes a people and NOT a mappable place. Therefore, in this case Indian is NOT captured.
- Islands that are also countries. Coding is somewhat subjective, but here are the general rules:
- If the place did not have global importance as a political or cultural feature during the period Melville wrote Moby-Dick, then the place is coded as category = Landform and type = Island:
- Iceland
- Greenland
- Fiji
- Cuba
- Malta
- Tahiti
- Formosa
- If the place is an island and country, and its political and cultural importance was greater than its importance as an island, then it is coded as category = Political Feature and type = country:
- If the island is part of more than one country, then it is coded as category = Landform, type = Island:
- If the place is an island and a province of a country (and not the whole country), then it is coded as category = Landform and type = Island, not a province:
- If the place is made up of islands and at least some of the islands are named, then the islands are coded as category = Landform and type = island and the country is coded category = Political Feature and type = country:
- Three of Japan's islands are named and each is coded as category = Landform and type = Island.
- The place-name for Japan is coded as category = Political Feature and type = country rather than an island chain.
- If a country is made up of islands, but none of the islands are named, then the name is coded as category = Political Feature and type = country.
- New Zealand is coded as category = Political Feature and type = country and not as an island chain because none of its islands (North Island, South Island) is named.
- Countries that are Continents. Australia is coded as type = country because that is a more useful interpretation than type = continent.
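The capitalization rules for secondary words (Nantucket market versus Nantucket Market) can be sketched in Python. This is only an illustration of the rule, not the process used to build the Gazetteer, which was compiled by manual review. The function name split_place_name is hypothetical, and the sketch assumes the phrase starts at the place-name.

```python
def split_place_name(phrase):
    """Split a phrase into (place_name, modifier) by capitalization.

    Assumption for illustration: the phrase begins with the place-name.
    The leading run of capitalized words is the place-name; the first
    lower-case word after that run, if any, is the modifier.
    """
    words = phrase.split()
    name_words = []
    for word in words:
        if word[0].isupper():
            name_words.append(word)
        else:
            break  # first lower-case word ends the place-name
    rest = words[len(name_words):]
    modifier = rest[0] if rest else ""
    return " ".join(name_words), modifier


print(split_place_name("Nantucket market"))   # lower-case market becomes the modifier
print(split_place_name("Nantucket Market"))   # capitalized Market stays in the place-name
print(split_place_name("Greenland whale"))
```

A purely mechanical rule like this does not handle names with lower-case connecting words, such as Tower of London or Phidias's marble Jove, and it cannot resolve words with more than one meaning, such as Indian. Those cases still need the judgment described in the rules above.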
Handling quoted dialogue
If a sentence is part of quoted dialogue, then the following will occur regarding quotation marks:
- If the sentence is at the beginning of quoted dialogue, the sentence in the gazetteer will contain the opening double quote.
- If the sentence is at the end of quoted dialogue, the sentence in the gazetteer will contain the closing double quote.
- If the sentence is the entire piece of quoted dialogue, then both beginning and ending double quotes will be included.
- If the sentence falls in the middle of a multi-sentence piece of quoted dialogue, the gazetteer will not contain any double quotes.
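The four quotation-mark cases above can be sketched as a small Python function. The function name and the sample sentences are placeholders, not text from Moby-Dick, and the sketch assumes straight double quotes.

```python
def dialogue_sentences_as_stored(sentences):
    """Return the sentences of one quoted speech as they would be stored.

    Only the first sentence keeps the opening double quote and only the
    last sentence keeps the closing double quote; a one-sentence speech
    keeps both, and middle sentences keep neither.
    """
    stored = []
    last = len(sentences) - 1
    for i, sentence in enumerate(sentences):
        text = sentence
        if i == 0:
            text = '"' + text      # beginning of the quoted dialogue
        if i == last:
            text = text + '"'      # end of the quoted dialogue
        stored.append(text)
    return stored


# Placeholder sentences standing in for a three-sentence speech.
for line in dialogue_sentences_as_stored(["First.", "Middle.", "Last."]):
    print(line)
```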
Purpose of the normalized field
The normalized field contains a single (i.e. normalized) name for a set of place-names having the same root. For example:
- Nantucket, Nantucketer, Nantucketers, Nantucketer's in the place-name field are all coded as Nantucket in the normalized field.
- Sunda and Straits of Sunda in the text are both coded as Sunda Strait in the normalized field.
- All variations of the place-name Rome (Roman, Romish, Rome's) are coded as Rome in the normalized field.
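One way to see what the normalized field makes possible is a short Python sketch that counts occurrences per normalized name. The lookup table holds only the examples listed above, and the names NORMALIZED and count_by_normalized are hypothetical. In the Gazetteer itself the normalized value is stored with each record rather than looked up.

```python
from collections import Counter

# Illustrative lookup built only from the examples above.
NORMALIZED = {
    "Nantucket": "Nantucket",
    "Nantucketers": "Nantucket",
    "Nantucketer's": "Nantucket",
    "Sunda": "Sunda Strait",
    "Straits of Sunda": "Sunda Strait",
    "Rome": "Rome",
    "Roman": "Rome",
    "Romish": "Rome",
    "Rome's": "Rome",
}


def count_by_normalized(place_names):
    """Count captured place-names under their normalized name.

    Names missing from the lookup are counted under themselves.
    """
    counts = Counter(NORMALIZED.get(name, name) for name in place_names)
    return dict(counts)


captured = ["Nantucket", "Nantucketers", "Roman", "Rome's", "Sunda"]
print(count_by_normalized(captured))
```

Grouping on the normalized name lets every variant of a place share one map location while the place-name field still preserves the spelling Melville used.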
Purpose of the modifier field
The modifier field, when used, holds the word immediately before or after the place-name. This adds context and clarifies the meaning of the term. The modifier field uses special punctuation as follows:
- parentheses () to enclose a term that is implied, but not stated. So, the record for Greenland or right whale in the text is coded as Greenland in the place-name field and (whale) in the modifier field. This indicates that whale did not follow Greenland, but was implied.
- backslash \ to indicate that the modifier term comes before the place-name. So Mississippi, preceded by mighty, would contain the text \mighty in the modifier field.
- backslash and parentheses \() to indicate an implied modifier that comes before the place-name. So, half baffled Channel billows in the text would contain Channel as the place-name and \(English) in the modifier field to indicate that English is implied before Channel.
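The punctuation above can be decoded mechanically. The following Python sketch is one possible reading of the notation; the function names parse_modifier and phrase_with_modifier are hypothetical and are not part of the Gazetteer.

```python
def parse_modifier(raw):
    """Decode a modifier field value into (word, position, implied).

    position is "before" when the value starts with a backslash,
    otherwise "after". implied is True when the word is wrapped in
    parentheses, meaning it does not appear in the text itself.
    """
    text = raw.strip()
    position = "after"
    if text.startswith("\\"):
        position = "before"
        text = text[1:]
    implied = text.startswith("(") and text.endswith(")")
    if implied:
        text = text[1:-1]
    return text, position, implied


def phrase_with_modifier(place_name, raw_modifier):
    """Rebuild the readable phrase, e.g. Channel + \\(English) -> English Channel."""
    word, position, _implied = parse_modifier(raw_modifier)
    if not word:
        return place_name
    if position == "before":
        return f"{word} {place_name}"
    return f"{place_name} {word}"


print(parse_modifier("(whale)"))
print(phrase_with_modifier("Mississippi", "\\mighty"))
print(phrase_with_modifier("Channel", "\\(English)"))
```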
Locations
How locations were collected:
- I called up each normalized place name in maps.google.com and grabbed the lat/long from the (near) center of the place-named object.
- The place-name data table contains the latitude and longitude coordinates and zoom level in separate fields. The maps.google.com link is constructed from those data values.
- The latitude and longitude for an island chain may be a generally central point among the islands or it may be on the center of the largest island of the island chain.
- For features that were not included in maps.google.com (for example, Babylon, Assyria, etc.), I went to Wikipedia to call up a map of the approximate location and used a point somewhere near the center of whatever map of the region or place was available.
- Named oceans and seas are somewhat problematic because Melville's contemporary usage may differ from current use or because he is not clear about what he is actually writing about. When it is reasonably clear what he means, then a reasonably centered location was chosen. When it wasn't clear (for example, it is not clear whether he means North Pacific or South Pacific), then the location is placed at the rough center of the Pacific Ocean. Since features such as these are commonly known, I didn't worry too much about what point I chose.
- The maps.google.com URL structure includes both the lat/long to move to and a zoom level that determines how close in or far out the scene is displayed. I picked a reasonable zoom level for each feature.
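As a rough sketch of how a map link can be built from the latitude, longitude, and zoom fields, the Python below uses the center and zoom parameters from Google's documented Maps URLs format. Whether the Gazetteer's own links use this exact form is an assumption. The function name google_maps_link is hypothetical, and the coordinates are approximate values for Nantucket chosen for illustration, not values taken from the data table.

```python
from urllib.parse import urlencode


def google_maps_link(lat, lng, zoom):
    """Build a Google Maps link centered on (lat, lng) at the given zoom.

    Uses the Maps URLs form with api=1 and map_action=map; the link
    format actually used by the Gazetteer may differ.
    """
    query = urlencode({
        "api": 1,
        "map_action": "map",
        "center": f"{lat},{lng}",  # the comma is percent-encoded as %2C
        "zoom": zoom,
    })
    return "https://www.google.com/maps/@?" + query


# Approximate, illustrative coordinates for Nantucket.
print(google_maps_link(41.28, -70.1, 10))
```

Keeping latitude, longitude, and zoom in separate fields, as the data table does, means the link can be regenerated if the URL format ever changes.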
Flow-chart of Categorization and Typing