Posted by Tracy Thorleifson on Fri, Sep 03, 2010
One of the key goals of APDM 5 was to simplify manual data maintenance. Because PODS ESRI Spatial 5 borrows extensively from the APDM 5 core, many of the activities critical to the maintenance of an APDM 5 geodatabase also pertain to a PODS ESRI Spatial 5 geodatabase. This is the first of a series of practical articles dealing with manually populating an APDM 5 or PODS ESRI Spatial 5 geodatabase. This particular post covers some necessary background on the key identifiers used in APDM and PODS ESRI Spatial geodatabases. Future posts will get down to the nitty-gritty of data loading.
In an APDM or PODS ESRI Spatial geodatabase the primary key (or unique identifier) of each object / feature class is a field named EventID. You might ask, why not just use the geodatabase ObjectID? The answer is that ObjectIDs are not immutable; they can change when geodatabases are exported, copied or merged. If an ObjectID changes, all relationships that reference that ObjectID are broken. EventIDs never change, so relationships based on them cannot break.
The EventID field has a data type of GUID. GUID stands for Globally Unique Identifier. Needless to say, in order to guarantee global uniqueness, a GUID cannot just be a simple integer (like an ArcMap ObjectID). Although the actual storage mechanism for GUIDs varies a bit depending on the underlying relational database, a GUID is actually a 16-byte (128-bit) integer value. This is a huge number. Given the randomness inherent in the GUID algorithm and the size of GUIDs, you could theoretically generate a billion random GUIDs per second for roughly 85 years before reaching a one in two chance of generating a single duplicate.
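The odds of a duplicate can be estimated with the standard birthday approximation, p ≈ 1 - exp(-n² / 2N), where n is the number of GUIDs generated and N is the number of possible values. The sketch below assumes randomly generated (version 4) GUIDs, which carry 122 random bits; the function name and the sample rates are illustrative only.

```python
import math

# Assumption: version 4 (random) GUIDs, where 6 of the 128 bits are fixed.
RANDOM_BITS = 122
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def collision_probability(guids_per_second: float, years: float) -> float:
    """Approximate chance of at least one duplicate (birthday approximation)."""
    n = guids_per_second * years * SECONDS_PER_YEAR  # total GUIDs generated
    space = 2.0 ** RANDOM_BITS                       # number of possible values
    # p ~= 1 - exp(-n^2 / (2 * space)); expm1 keeps precision when p is tiny.
    return -math.expm1(-(n * n) / (2.0 * space))


if __name__ == "__main__":
    for rate in (1e6, 1e9):
        p = collision_probability(rate, years=85)
        print(f"{rate:.0e} GUIDs per second for 85 years: p = {p:.3g}")
```

Running it prints the estimate for two generation rates, which makes it easy to see how strongly the odds depend on volume.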
GUIDs have been around and in use for a long time. For instance, every Microsoft COM class ever created is identified by a GUID, as a quick examination of the Windows registry will reveal:

In ArcMap GUIDs appear as complex, thirty-eight character strings, e.g. {FF87DB81-8620-40D0-80FF-139286392ACD}. The curly brackets and dashes are just for (Microsoft CLSID-style) formatting; the remaining 32 alphanumeric characters are the actual hexadecimal representation of the GUID. (For this reason you'll never see a letter in a GUID higher than 'F.' Hexadecimal digits run from 0-9,A-F.)
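The relationship between the 38-character display string, the 32 hexadecimal digits and the 16 bytes of storage can be checked with Python's standard `uuid` module. This is only a sketch; the helper names `to_braced` and `parse_guid` are made up for illustration.

```python
import uuid

# Example value taken from the text above.
ARCMAP_STYLE = "{FF87DB81-8620-40D0-80FF-139286392ACD}"


def to_braced(guid: uuid.UUID) -> str:
    """Format a GUID with curly brackets, dashes and upper-case hex digits."""
    return "{" + str(guid).upper() + "}"


def parse_guid(text: str) -> uuid.UUID:
    """Parse a GUID string; uuid.UUID strips braces and dashes itself
    and raises ValueError if what remains is not 32 hex digits."""
    return uuid.UUID(text)


if __name__ == "__main__":
    guid = parse_guid(ARCMAP_STYLE)
    print(len(ARCMAP_STYLE))   # characters, including brackets and dashes
    print(len(guid.hex))       # hexadecimal digits only
    print(len(guid.bytes))     # bytes of actual storage
    print(to_braced(guid) == ARCMAP_STYLE)
```

A string containing a letter beyond 'F' fails to parse, which is a quick way to spot a hand-typed value that is not a valid GUID.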
On the plus side, GUIDs are, for all practical purposes, guaranteed to be globally unique. This makes it easy to combine or merge different APDM or PODS ESRI Spatial geodatabases without any fear of primary key collisions. A bit more subtly, you don't actually have to request new key values for new records directly from the database; you can do it on the client side. This makes it very easy to create data in disconnected applications (e.g. field data collection) that can be loaded to the geodatabase with minimal massaging.
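To illustrate client-side key generation, here is a minimal sketch in which two disconnected "field crews" create records independently and the results are merged afterwards. It shows the idea only, not an actual geodatabase loading workflow; the record names and the helpers `new_event_id` and `collect_records` are hypothetical.

```python
import uuid


def new_event_id() -> str:
    """Generate a GUID on the client, with no round trip to the database."""
    return "{" + str(uuid.uuid4()).upper() + "}"


def collect_records(names):
    """Simulate a disconnected application: each record gets its own EventID."""
    return {new_event_id(): name for name in names}


if __name__ == "__main__":
    crew_a = collect_records(["Valve 1", "Valve 2"])  # hypothetical field data
    crew_b = collect_records(["Valve 3", "Valve 4"])
    merged = {**crew_a, **crew_b}
    # If no keys collided, every record survives the merge.
    print(len(merged) == len(crew_a) + len(crew_b))
```

With integer keys handed out by each crew's local database, both crews would likely have started at 1 and the merge would have required renumbering.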
On the minus side, GUIDs are just a tad cumbersome; some might even say unaesthetic. There is also some underlying performance overhead with GUIDs. Keep in mind that a GUID requires 16 bytes of storage, whereas a 64-bit integer requires eight bytes and a 32-bit integer only four bytes. For this reason, database indexes built on GUIDs are more voluminous than indexes built on integers. By extension, queries with joins based on GUIDs tend to be more expensive than queries with joins based on integers. Five to ten years ago when GUIDs first came into use as primary keys, a variety of legitimate performance concerns were raised. These concerns have largely been mitigated by hardware, operating system and database advances (particularly with Microsoft SQL Server). For even the largest pipeline databases, as long as you are running on current hardware and database software, query performance with GUIDs should not be an issue.
It ought to be fairly obvious that just banging out an arbitrary thirty-eight character string in no way guarantees a valid GUID value. So generating GUID values can be a royal pain in the derriere, unless you have the appropriate trick up your sleeve. The next article in the series teaches you that trick. And with that trick, generating GUIDs is a snap!
Posted by Tracy Thorleifson on Mon, Jul 26, 2010
In the previous post we examined the pros and cons of the APDM, comparing and contrasting it with PODS. Here we take a look at the ESRI incarnation of PODS Spatial. As of this writing, the PODS ESRI Spatial 5.0 geodatabase model is scheduled for official release by the PODS organization in early September, 2010.
The PODS Spatial effort undertaken over the past couple of years by the PODS organization is largely a response to the lack of standards for spatially enabling PODS. There are actually two flavors of PODS Spatial, one based on Oracle Spatial technology, and the second based on ESRI geodatabase technology. Only the latter is considered here.
PODS ESRI Spatial (hereafter referred to simply as PODS Spatial) has recently been released for comment by the PODS organization and is thus the latest incarnation of the PODS model. It is essentially an ESRI geodatabase implementation of PODS, as shown below.

The PODS Spatial geodatabase design borrows freely from APDM concepts; the PODS Spatial class hierarchy is a close cousin to the APDM class hierarchy. Unlike the APDM, PODS Spatial remains a standards-based model. The content of PODS Spatial in terms of feature classes is virtually identical to the event tables of the traditional PODS relational model. As such, PODS Spatial occupies an interesting middle ground between the traditional PODS relational model and the APDM.
PODS Spatial might be a good fit for your organization if:
- Your company already makes use of ESRI geodatabase technology
- Your company is willing to maintain relational data integrity via application logic
- Your company is already making use of SOA technology
- Your company views a standards-based data model as being important
- Your company is committed to ArcGIS as the GIS technology of choice
- Your company views the ability to incorporate a wide variety of ESRI-based, third-party tools as important
This post concludes our brief look at the world of pipeline data models. Again, there is no single "best" pipeline data model. What's important is picking a pipeline data model that best fits your organization.
In future posts we'll concentrate on manual techniques for loading and maintaining APDM / PODS Spatial geodatabases. We'll touch on topics such as enabling archiving, and the peculiarities of versioned editing with these geodatabases. We'll also visit techniques for implementing ESRI geometric networks with these data models.
Posted by Christopher Moravec on Fri, Jun 04, 2010
Have you ever heard the phrase, "there's an app for that"? Who hasn’t?
Times are changing. We hear phrases like this, or even just "maps and apps," but what does it all mean, and how can we use these apps to our advantage? Recently, we have seen an influx of mobile devices into our world, so what can we do with this technology we already have at hand? Well, it turns out there are lots of things we can do!
In this three-part series I will cover the basics of GIS Data Collection and describe five different methods for using existing and emerging technology to collect data in the field, store it in the central GIS, and return it to the field.
Current methods for GIS Data Collection can often leave you with the feeling of wanting more!
Fortunately, there are several ways to avoid this, and even come out of the process with a good feeling!
In the GIS world, data is our life blood, but many times we are not the true owners of the data, merely the stewards. It is our job to actively direct the affairs of the system, to ensure that the data is well kept, always available, stored in an efficient manner, and delivered to the subject matter experts and the users. It is very easy to fall into the trap of trying to make the data “perfect” before distributing it, and therefore not releasing it in a timely manner or at all! Boy do I have news for you!
It's never going to be perfect!
That can be hard to hear, but there are many ways to work on and with the data. The most important activity is getting the raw data from the people who know it, and returning the finished product back to them quickly. There are many different types of GIS Data Collection that can help you achieve this, some of them better suited than others, but more on that later.
Often, GIS environments fail because they “hoard” the data in the central office, creating a “black hole” where data goes in and never comes out. When designing a GIS and its implementation, always be conscious of the true owners of the data, and how and when they will get the data back.
There are several ways to approach this problem. You can:
- Provide a web portal where users can log in and search for events and submit changes to their attributes.
- Provide high accuracy GPS devices for users to review and collect data while in the field.
- Provide “lite-weight” applications for laptops or tablet computers to view, edit and collect data.
- Provide simple applications on smart phones to view, edit and collect data.
- Provide data editing and collection by leveraging Smart Alignment Sheets.
This blog series will focus on each of these items and provide the pros and cons of each. Check back next week for the next posting!
Posted by Tracy Thorleifson on Thu, May 20, 2010
In the previous post we examined PODS at a high level; here we examine the ArcGIS Pipeline Data Model (APDM) and compare and contrast it with the Pipeline Open Data Standard (PODS).
The APDM differs from PODS in important ways. First and foremost, as its name implies, the APDM is dependent on ESRI ArcGIS technology. Unlike PODS, the APDM works only with ArcGIS. If your company does not currently use, or is not at least considering the use of ArcGIS, then the APDM is not the right fit for your organization.
The APDM is an ESRI Geodatabase model. In an enterprise-level installation the geodatabase is implemented within a Relational Database Management System (RDBMS) such as Oracle or Microsoft SQL Server, as shown below.

Geodatabase technology is fundamentally object-relational technology. Even though the geodatabase is contained within the RDBMS, it is not purely a relational database. This leads to several interesting differences with respect to a relational database:
- Under most circumstances, Structured Query Language (SQL) cannot be reliably used to access or manipulate the data in a versioned geodatabase
- Although the geodatabase implements relationship classes and utilizes code domains, relational integrity is not strictly enforced
On the other hand, ESRI geodatabase technology provides functionality that is not available out-of-the-box in an RDBMS:
- Long transaction capability is built into the geodatabase via versioning
- History tracking is built in via archiving
- Complex topology management is built in via geometric networks and topologies
- And most importantly, the geodatabase is spatially enabled out-of-the-box
By virtue of its underlying object-relational framework, the APDM takes advantage of a data modeling concept known as inheritance. Inheritance facilitates the creation of a class hierarchy; classes at the end of the inheritance tree automatically inherit the content and behaviors of their ancestors. (The ancestors are referred to as abstract classes.) All feature classes and tables in the APDM are categorized according to their abstract class inheritance. Pipeline-specific abstract classes include, for example, categories such as online facility features. Mainline valves are classified as online point facility features.
As a result of the use of abstract class data categories, the APDM is a template-based data model. This affords the APDM a much higher degree of flexibility than a standards-based model such as PODS. As long as the abstract class hierarchy is adhered to, the APDM can be readily customized to suit your company’s particular needs. A software vendor (or in-house developer) that designs applications to work with the APDM abstract class framework can easily support wide variation in data model content without resorting to software modifications.
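The abstract class idea can be sketched in a few lines of ordinary object-oriented code. The class and attribute names below are hypothetical stand-ins rather than the actual APDM schema, and `CustomSensor` represents a class a company might add on its own.

```python
from abc import ABC, abstractmethod


class OnlineFeature(ABC):
    """Hypothetical abstract class: anything located along a pipeline route."""

    def __init__(self, event_id: str, route_event_id: str):
        self.event_id = event_id
        self.route_event_id = route_event_id

    @property
    @abstractmethod
    def class_name(self) -> str:
        """Name of the concrete class; supplied by each descendant."""


class OnlinePointFacility(OnlineFeature):
    """Hypothetical abstract class: a facility located by a single measure."""

    def __init__(self, event_id: str, route_event_id: str, measure: float):
        super().__init__(event_id, route_event_id)
        self.measure = measure

    def locate(self) -> str:
        # Behavior defined once here and inherited by every descendant.
        return (f"{self.class_name} on route {self.route_event_id} "
                f"at measure {self.measure:.1f}")


class MainlineValve(OnlinePointFacility):
    class_name = "MainlineValve"


class CustomSensor(OnlinePointFacility):
    """A company-specific addition; no application change is needed."""
    class_name = "CustomSensor"


if __name__ == "__main__":
    features = [MainlineValve("e1", "r1", 12.5), CustomSensor("e2", "r1", 40.0)]
    for feature in features:  # written against the abstract class only
        print(feature.locate())
```

The loop at the bottom plays the role of a vendor application: it knows only about `OnlinePointFacility`, so a newly added class works without modifying it.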
Although SQL data access is not a strong point of ESRI geodatabase technology, there have been recent improvements on this front. Be aware, however, that taking advantage of these improvements tends to compromise some of the long transaction functionality of the geodatabase. A better way to interact with the geodatabase at a low level is by employing Service Oriented Architecture (SOA) technologies. Web services are the primary components of an SOA; ESRI’s ArcGIS Server technology platform provides a robust mechanism for implementing web services for use with the APDM.
The APDM might be a good fit for your organization if:
- Your company already makes use of ESRI geodatabase technology
- Your company is willing to maintain relational data integrity via application logic
- Your company is already making use of SOA technology
- Your company views data model flexibility as important
- Your company is committed to ArcGIS as the GIS technology of choice
- Your company views the ability to incorporate a wide variety of ESRI-based, third-party tools as important
For the latest on the APDM, check out What's New in APDM Version 5.
Next, we'll take a look at the newest of the pipeline data models, PODS Spatial.
Posted by Tracy Thorleifson on Wed, May 05, 2010
In the previous post, we defined a basic approach to selecting a pipeline data model. The important thing is to determine which model best fits the needs of your organization. In this post we examine the high-level characteristics of the PODS data model, and lay out some criteria for deciding whether PODS is a good fit for you.
PODS is fundamentally a relational data model, meaning that it is intended to be implemented on a Relational Database Management System (RDBMS) platform such as Oracle or Microsoft SQL Server. PODS is neutral with respect to Geographic Information System (GIS) technology by design. Although optimized for use with ESRI Linear Referencing technology, PODS can readily be implemented with a wide variety of GIS technologies, as shown below.

As a relational data model, PODS benefits from the strengths of RDBMS technology:
- Relational integrity is automatically enforced
- Data in the model can be readily accessed via Structured Query Language (SQL)
- RDBMS processing tools such as stored procedures can be used to manipulate the data
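The first two points can be demonstrated with a small, self-contained sketch. It uses SQLite (via Python's standard library) as a stand-in for Oracle or SQL Server, and the `route` and `valve` tables are hypothetical, not actual PODS tables.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one parent and one child table."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE route (route_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE valve (
            valve_id INTEGER PRIMARY KEY,
            route_id INTEGER NOT NULL REFERENCES route(route_id),
            measure  REAL NOT NULL
        );
        INSERT INTO route VALUES (1, 'Mainline A');
        INSERT INTO valve VALUES (10, 1, 1250.0);
        """
    )
    return conn


def try_orphan_insert(conn: sqlite3.Connection) -> bool:
    """Return True if the database rejects a valve on a nonexistent route."""
    try:
        conn.execute("INSERT INTO valve VALUES (11, 99, 10.0)")
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    conn = build_demo_db()
    print(try_orphan_insert(conn))
    # Plain SQL join across the relationship.
    query = ("SELECT r.name, v.measure FROM valve v "
             "JOIN route r ON r.route_id = v.route_id")
    print(conn.execute(query).fetchall())
```

The database itself refuses the orphaned record, so no application code is needed to keep the relationship intact.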
As its name implies, PODS is intended to be a data standard. This means that the content of the data model is rigorously defined and must be adhered to closely for compliance purposes. The upside of this approach is a rich data model with extensive content; the downside is lack of flexibility. The PODS data model essentially embodies a one-size-fits-all approach.
A potential weakness of PODS is that the mechanism for spatially enabling the model with a GIS is not specified; there is no one approved method for spatially enabling PODS. In practice this means that every PODS software vendor sells a proprietary solution for spatially enabling PODS. You’ll have to exercise considerable care if you choose to implement a vendor-supplied PODS solution because not all PODS software providers play nicely with others.
PODS might be a good fit for your organization if:
- Your company has deep expertise in a particular RDBMS technology
- Your company regards relational data integrity as being extremely important
- Your company regards access to and integration with data at the SQL level as important
- Your company views a standards-based data model as being important
- Your company makes use of several different computer mapping technologies
- Your company is not averse to potentially being locked into a single provider spatial solution, OR your company has sufficient expertise to spatially enable PODS on its own
In the next post, we'll review the APDM in the same manner.
Posted by Tracy Thorleifson on Mon, Apr 12, 2010
Lions and tigers and bears! Oh, my! Or at least that’s how it feels when you're trying to select an industry standard pipeline data model for use by your organization. It seems like every expert has a different opinion, all of us vendors are trying to sell you something, and you don’t know who to trust. Well, fear not, Dorothy! You, too, can follow the yellow brick road through the dark forest of pipeline data model selection. Just remember, it's not a matter of which model is best; it's a matter of which model is the best fit for your organization. This is the first of a four-part series called "Picking a pipeline data model", or "How to follow the yellow brick road to pipeline Oz."
As for the trust question, we're not trying to sell anything in these blog posts (at least not directly). Since Eagle Information Mapping offers solutions for all three data models, we truly are data model neutral. Eagle has a tremendous range of experience with each of these data models; we have been intimately involved in their designs; we have used each of them since their inceptions. So we understand their strengths and weaknesses better than most.
The Pipeline Open Data Standard (PODS, www.pods.org), the ArcGIS Pipeline Data Model (APDM, www.apdm.net) and PODS Spatial differ from each other in both concept and execution. Understanding these differences will help you make the best choice for your organization. In the following series we'll concentrate on the high-level differences between the three models to give you ideas for further investigation. We'll tackle PODS first, then the APDM, and finish with PODS Spatial. And remember, the yellow brick road will lead you to a pipeline Oz that will work for you.