ogr_sql - OGR SQL The OGRDataSource supports executing commands against

NAME

       ogr_sql - OGR SQL The OGRDataSource supports executing commands against
       a datasource via the OGRDataSource::ExecuteSQL() method. While in
       theory any sort of command could be handled this way, in practice the
       mechanism is used to provide a subset of SQL SELECT capability to
       applications. This page discusses the generic SQL implementation
       implemented within OGR, and issue with driver specific SQL support.

Supported SQL syntax

       OGR SQL supports the following pseudo-syntax:

       SELECT <field-list> FROM <table_def>
            [LEFT JOIN <table_def>
             ON [<table_ref>.]<key_field> = [<table_ref>.].<key_field>]*
            [WHERE <where-expr>]
            [ORDER BY <sort specification list>]

       <field-list> ::= <column-spec> [ { , <column-spec> }... ]

       <column-spec> ::= <field-spec> [ <as clause> ]
                        | CAST ( <field-spec> AS <data type> ) [ <as clause> ]

       <field-spec> ::= [DISTINCT] <field_ref>
                        | <field_func> ( [DISTINCT] <field-ref> )
                        | Count(*)

       <as clause> ::= [ AS ] <column_name>

       <data type> ::= character [ ( field_length ) ]
                       | float [ ( field_length ) ]
                       | numeric [ ( field_length [, field_precision ] ) ]
                       | integer [ ( field_length ) ]
                       | date [ ( field_length ) ]
                       | time [ ( field_length ) ]
                       | timestamp [ ( field_length ) ]

       <field-func> ::= AVG | MAX | MIN | SUM | COUNT

       <field_ref>  ::= [<table_ref>.]field_name

       <sort specification list> ::=
                     <sort specification> [ { <comma> <sort specification> }... ]

       <sort specification> ::= <sort key> [ <ordering specification> ]

       <sort key> ::=  <field_ref>

       <ordering specification> ::= ASC | DESC

       <table_def> ::= [’<datasource name>’.]table_name [table_alias]

       <table_ref> ::= table_name | table_alias

SELECT

       The SELECT statement is used to fetch layer features (analygous to
       table rows in an RDBMS) with the result of the query represented as a
       temporary layer of features. The layers of the datasource are analygous
       to tables in an RDBMS and feature attributes are analygous to column
       values. The simpliest form of OGR SQL SELECT statement looks like this:

       SELECT * FROM polylayer

       In this case all features are fetched from the layer named ’polylayer’,
       and all attributes of those features are returned. This is essentially
       equivelent to accessing the layer directly. In this example the ’*’ is
       the list of fields to fetch from the layer, with ’*’ meaning that all
       fields should be fetched.

       This slightly more sophisticated form still pulls all features from the
       layer but the schema will only contain the EAS_ID and PROP_VALUE
       attributes. Any other attributes would be discarded.

       SELECT eas_id, prop_value FROM polylayer

       A much more ambitious SELECT, restricting the features fetched with a
       WHERE clause, and sorting the results might look like:

       SELECT * from polylayer WHERE prop_value > 220000.0 ORDER BY prop_value DESC

       This select statement will produce a table with just one feature, with
       one attribute (named something like ’count_eas_id’) containing the
       number of distinct values of the eas_id attribute.

       SELECT COUNT(DISTINCT eas_id) FROM polylayer

   Field List Operators
       The field list is a comma separate list of the fields to be carried
       into the output features from the source layer. They will appear on
       output features in the order they appear on in the field list, so the
       field list may be used to re-order the fields.

       A special form of the field list uses the DISTINCT keyword. This
       returns a list of all the distinct values of the named attribute. When
       the DISTINCT keyword is used, only one attribute may appear in the
       field list. The DISTINCT keyword may be used against any type of field.
       Currently the distinctness test against a string value is case
       insensitive in OGR SQL. The result of a SELECT with a DISTINCT keyword
       is a layer with one column (named the same as the field operated on),
       and one feature per distinct value. Geometries are discarded. The
       distinct values are assembled in memory, so alot of memory may be used
       for datasets with a large number of distinct values.

       SELECT DISTINCT areacode FROM polylayer

       There are also several summarization operators that may be applied to
       columns. When a summarization operator is applied to any field, then
       all fields must have summarization operators applied. The summarization
       operators are COUNT (a count of instances), AVG (numerical average),
       SUM (numericla sum), MIN (lexical or numerical minimum), and MAX
       (lexical or numerical maximum). This example produces a variety of
       sumarization information on parcel property values:

       SELECT MIN(prop_value), MAX(prop_value), AVG(prop_value), SUM(prop_value),
              COUNT(prop_value) FROM polylayer WHERE prov_name = ’Ontario’

       As a special case, the COUNT() operator can be given a ’*’ argument
       instead of a field name which is a short form for count all the records
       though it would get the same result as giving it any of the column
       names. It is also possible to apply the COUNT() operator to a DISTINCT
       SELECT to get a count of distinct values, for instance:

       SELECT COUNT(DISTINCT areacode) FROM polylayer

       Field names can also be prefixed by a table name though this is only
       really meaningful when performing joins. It is further demonstrated in
       the JOIN section.

   Using the field name alias
       OGR SQL supports renaming the fields following the SQL92 specification
       by using the AS keyword according to the following example:

       SELECT select *, OGR_STYLE AS ’STYLE’ FROM polylayer

       The field name alias can be used as the last operation in the column
       specification. Therefore we cannot rename the fields inside an
       operator, but we can rename whole column expression, like:

       SELECT COUNT(areacode) AS ’count’ FROM polylayer

       We can optionally omit the AS keyword in the field name aliases, like:

       SELECT *, OGR_STYLE ’STYLE’ FROM polylayer

   Changing the type of the fields
       OGR SQL supports changing the type of the columns by using the SQL92
       compliant CAST operator according to the following example:

       SELECT *, CAST(OGR_STYLE AS character(255)) FROM rivers

       Currently casting to the following target types are supported:

       1.  character(field_length)
       2.  float(field_length)
       3.  numeric(field_length, field_precision)
       4.  integer(field_length)
       5.  date(field_length)
       6.  time(field_length)
       7.  timestamp(field_length)
       Specifying the field_length and/or the field_precision is optional.
       Conversion to the ’integer list’, ’double list’ and ’string list’ OGR
       data types are not supported, which doesn’t conform to the SQL92
       specification.
   Field List Limitations
       1.  Field arithmetic, and other binary operators are not supported, so
           you can’t do something like:
       SELECT prop_value / area FROM invoices

       2.  Lots of operators are missing.

   WHERE
       The argument to the WHERE clause is a fairly simplistic logical
       expression used select records to be selected from the source layer. In
       addition to its use within the WHERE statement, the WHERE clause
       handling is also used for OGR attribute queries on regular layers.
       A WHERE clause consists of a set of attribute tests. Each basic test is
       of the form fieldname operator value. The fieldname is any of the
       fields in the source layer. The operator is one of =, !=, <>, <, >, <=,
       >=, LIKE and ILIKE and IN.
       Most of the operators are self explanitory, but is is worth nothing
       that  != is the same as <>, the string equality is case insensitive,
       but the <, >, <= and >= operators are case sensitive. Both the LIKE and
       ILIKE operators are case insensitive.
       The value argument to the LIKE operator is a pattern against which the
       value string is matched. In this pattern percent (%) matches any number
       of characters, and underscore ( _ ) matches any one character.
           String             Pattern       Matches?
           ------             -------       --------
           Alberta            ALB%          Yes
           Alberta            _lberta       Yes
           St. Alberta        _lberta       No
           St. Alberta        %lberta       Yes
           Robarts St.        %Robarts%     Yes
           12345              123%45        Yes
           123.45             12?45         No
           N0N 1P0            %N0N%         Yes
           L4C 5E2            %N0N%         No
       The IN takes a list of values as it’s argument and tests the attribute
       value for membership in the provided set.
           Value              Value Set            Matches?
           ------             -------              --------
           321                IN (456,123)         No
           ’Ontario’          IN (’Ontario’,’BC’)  Yes
           ’Ont’              IN (’Ontario’,’BC’)  No
           1                  IN (0,2,4,6)         No
       In addition to the above binary operators, there are additional
       operators for testing if a field is null or not. These are the IS NULL
       and IS NOT NULL operators.
       Basic field tests can be combined in more complicated predicates using
       logical operators include AND, OR, and the unary logical NOT.
       Subexpressions should be bracketed to make precidence clear. Some more
       complicated predicates are:
       SELECT * FROM poly WHERE (prop_value >= 100000) AND (prop_value < 200000)
       SELECT * FROM poly WHERE NOT (area_code LIKE ’N0N%’)
       SELECT * FROM poly WHERE (prop_value IS NOT NULL) AND (prop_value < 100000)
   WHERE Limitations
       1.  The left of any comparison operator must be a field name, and the
           right must be a literal value. Fields cannot currently be compared
           to fields.
       2.  Fields must all come from the primary table (the one listed in the
           FROM clause, and must not have any table prefix ... they must just
           be the field name.
       3.  No arithmetric operations are supported. You can’t test ’WHERE
           (a+b) < 10’ for instance.
       4.  All string comparisons are case insensitive except for <, >, <= and
           >=.
   ORDER BY
       The ORDER BY clause is used force the returned features to be reordered
       into sorted order (ascending or descending) on one of the field values.
       Ascending (increasing) order is the default if neither the ASC or DESC
       keyword is provided. For example:
       SELECT * FROM property WHERE class_code = 7 ORDER BY prop_value DESC
       SELECT * FROM property ORDER BY prop_value
       SELECT * FROM property ORDER BY prop_value ASC
       SELECT DISTINCT zip_code FROM property ORDER BY zip_code
       Note that ORDER BY clauses cause two passes through the feature set.
       One to build an in-memory table of field values corresponded with
       feature ids, and a second pass to fetch the features by feature id in
       the sorted order. For formats which cannot efficiently randomly read
       features by feature id this can be a very expensive operation.
       Sorting of string field values is case sensitive, not case insensitive
       like in most other parts of OGR SQL.
   JOINs
       OGR SQL supports a limited form of one to one JOIN. This allows records
       from a secondary table to be looked up based on a shared key between it
       and the primary table being queried. For instance, a table of city
       locations might include a nation_id column that can be used as a
       reference into a secondary nation table to fetch a nation name. A
       joined query might look like:
       SELECT city.*, nation.name FROM city
            LEFT JOIN nation ON city.nation_id = nation.id
       This query would result in a table with all the fields from the city
       table, and an additional ’nation.name’ field with the nation name
       pulled from the nation table by looking for the record in the nation
       table that has the ’id’ field with the same value as the city.nation_id
       field.
       Joins introduce a number of additional issues. One is the concept of
       table qualifiers on field names. For instance, referring to
       city.nation_id instead of just nation_id to indicate the nation_id
       field from the city layer. The table name qualifiers may only be used
       in the field list, and within the ON clause of the join.
       Wildcards are also somewhat more involved. All fields from the primary
       table (city in this case) and the secondary table (nation in this case)
       may be selected using the usual * wildcard. But the fields of just one
       of the primary or secondary table may be selected by prefixing the
       asterix with the table name.
       The field names in the resulting query layer will be qualified by the
       table name, if the table name is given as a qualifier in the field
       list. In addition field names will be qualified with a table name if
       they would conflict with earlier fields. For instance, the following
       select would result might result in a results set with a name,
       nation_id, nation.nation_id and  nation.name field if the city and
       nation tables both have the nation_id and name fieldnames.
       SELECT * FROM city LEFT JOIN nation ON city.nation_id = nation.nation_id
       On the other hand if the nation table had a continent_id field, but the
       city table did not, then that field would not need to be qualified in
       the result set. However, if the selected instead looked like the
       following statement, all result fields would be qualified by the table
       name.
       SELECT city.*, nation.* FROM city
           LEFT JOIN nation ON city.nation_id = nation.nation_id
       In the above examples, the nation table was found in the same
       datasource as the city table. However, the OGR join support includes
       the ability to join against a table in a different data source,
       potentially of a different format. This is indicated by qualifying the
       secondary table name with a datasource name. In this case the secondary
       datasource is opened using normal OGR semantics and utilized to access
       the secondary table untill the query result is no longer needed.
       SELECT * FROM city
         LEFT JOIN ’/usr2/data/nation.dbf’.nation ON city.nation_id = nation.nation_id
       While not necessarily very useful, it is also possible to introduce
       table aliases to simplify some SELECT statements. This can also be
       useful to disambiguate situations where ables of the same name are
       being used from different data sources. For instance, if the actual
       tables names were messy we might want to do something like:
       SELECT c.name, n.name FROM project_615_city c
         LEFT JOIN ’/usr2/data/project_615_nation.dbf’.project_615_nation n
                   ON c.nation_id = n.nation_id
       It is possible to do multiple joins in a single query.
       SELECT city.name, prov.name, nation.name FROM city
         LEFT JOIN province ON city.prov_id = province.id
         LEFT JOIN nation ON city.nation_id = nation.id
   JOIN Limitations
       1.  Joins can be very expensive operations if the secondary table is
           not indexed on the key field being used.
       2.  Joined fields may not be used in WHERE clauses, or ORDER BY clauses
           at this time. The join is essentially evaluated after all primary
           table subsetting is complete, and after the ORDER BY pass.
       3.  Joined fields may not be used as keys in later joins. So you could
           not use the province id in a city to lookup the province record,
           and then use a nation id from the province id to lookup the nation
           record. This is a sensible thing to want and could be implemented,
           but is not currently supported.
       4.  Datasource names for joined tables are evaluated relative to the
           current processes working directory, not the path to the primary
           datasource.
       5.  These are not true LEFT or RIGHT joins in the RDBMS sense. Whether
           or not a secondary record exists for the join key or not, one and
           only one copy of the primary record is returned in the result set.
           If a secondary record cannot be found, the secondary derived fields
           will be NULL. If more than one matching secondary field is found
           only the first will be used.

SPECIAL FIELDS

       The OGR SQL query processor treats some of the attributes of the
       features as built-in special fields can be used in the SQL statements
       likewise the other fields. These fields can be placed in the select
       list, the WHERE clause and the ORDER BY clause respectively. The
       special field will not be included in the result by default but it may
       be explicitly included by adding it to the select list. When accessing
       the field values the special fields will take pecedence over the other
       fields with the same names in the data source.
   FID
       Normally the feature id is a special property of a feature and not
       treated as an attribute of the feature. In some cases it is convenient
       to be able to utilize the feature id in queries and result sets as a
       regular field. To do so use the name FID. The field wildcard expansions
       will not include the feature id, but it may be explicitly included
       using a syntax like:
       SELECT FID, * FROM nation
   OGR_GEOMETRY
       Some of the data sources (like MapInfo tab) can handle geometries of
       different types within the same layer. The OGR_GEOMETRY special field
       represents the geometry type returned by OGRGeometry::getGeometryName()
       and can be used to distinguish the various types. By using this field
       one can select particular types of the geometries like:
       SELECT * FROM nation WHERE OGR_GEOMETRY=’POINT’ OR OGR_GEOMETRY=’POLYGON’
   OGR_GEOM_WKT
       The Well Known Text representation of the geometry can also be used as
       a special field. To select the WKT of the geometry OGR_GEOM_WKT might
       be included in the select list, like:
       SELECT OGR_GEOM_WKT, * FROM nation
       Using the OGR_GEOM_WKT and the LIKE operator in the WHERE clause we can
       get similar effect as using OGR_GEOMETRY:
       SELECT OGR_GEOM_WKT, * FROM nation WHERE OGR_GEOM_WKT
          LIKE ’POINT%’ OR OGR_GEOM_WKT LIKE ’POLYGON%’
   OGR_STYLE
       The OGR_STYLE special field represents the style string of the feature
       returned by OGRFeature::GetStyleString(). By using this field and the
       LIKE operator the result of the query can be filtered by the style. For
       example we can select the annotation features as:
       SELECT * FROM nation WHERE OGR_STYLE LIKE ’LABEL%’

CREATE INDEX

       Some OGR SQL drivers support creating of attribute indexes. Currently
       this includes the Shapefile driver. An index accelerates very simple
       attribute queries of the form fieldname = value, which is what is used
       by the JOIN capability. To create an attribute index on the nation_id
       field of the nation table a command like this would be used:
       CREATE INDEX ON nation USING nation_id
   Index Limitations
       1.  Indexes are not maintained dynamically when new features are added
           to or removed from a layer.
       2.  Very long strings (longer than 256 characters?) cannot currently be
           indexed.
       3.  To recreate an index it is necessary to drop all indexes on a layer
           and then recreate all the indexes.
       4.  Indexes are not used in any complex queries. Currently the only
           query the will accelerate is a simple ’field = value’ query.

DROP INDEX

       The OGR SQL DROP INDEX command can be used to drop all indexes on a
       particular table, or just the index for a particular column.
       DROP INDEX ON nation USING nation_id
       DROP INDEX ON nation

ExecuteSQL()

       SQL is executed against an OGRDataSource, not against a specific layer.
       The call looks like this:
       OGRLayer * OGRDataSource::ExecuteSQL( const char *pszSQLCommand,
                                             OGRGeometry *poSpatialFilter,
                                             const char *pszDialect );
       The pszDialect argument is in theory intended to allow for support of
       different command languages against a provider, but for now
       applications should always pass an empty (not NULL) string to get the
       default dialect.
       The poSpatialFilter argument is a geometry used to select a bounding
       rectangle for features to be returned in a manner similar to the
       OGRLayer::SetSpatialFilter() method. It may be NULL for no special
       spatial restriction.
       The result of an ExecuteSQL() call is usually a temporary OGRLayer
       representing the results set from the statement. This is the case for a
       SELECT statement for instance. The returned temporary layer should be
       released with OGRDataSource::ReleaseResultsSet() method when no longer
       needed. Failure to release it before the datasource is destroyed may
       result in a crash.

Non-OGR SQL

       All OGR drivers for database systems: MySQL, PostgreSQL and PostGIS
       (PG), Oracle (OCI), SQLite, ODBC and ESRI Personal Geodatabase (PGeo)
       override the OGRDataSource::ExecuteSQL() function with dedicated
       implementation and, by default, pass the SQL statements directly to the
       underlying RDBMS. In these cases the SQL syntax varies in some
       particulars from OGR SQL. Also, anything possible in SQL can then be
       accomplished for these particular databases. Only the result of SQL
       WHERE statements will be returned as layers.