         ================================================
         generateDS.py -- Generate Python Data Structures
         ================================================

----------
What is it
----------

generateDS.py generates Python data structures from an Xschema
document.  It generates a file containing: (1) a Python class for
each element definition and (2) parsers (which use the Python
minidom module) for XML documents that satisfy the Xschema
document.  The class definitions contain:

- A constructor with initializers for member variables.

- Get and set methods for member variables.

- A 'build' method used during parsing to populate an instance.

- An 'export' method that will re-create the XML element in an XML
  document.

- An 'exportLiteral' method that will write out a text (literal)
  Python data structure that represents the content of the XML
  document.
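
As a rough illustration of the list above, here is a hand-written
sketch of the general shape such a class might take.  It is *not*
actual generateDS.py output: the class name Person, the member
name, the method signatures, and the modern Python syntax are all
assumptions made for the example, and real generated code is more
elaborate.

```python
import io
from xml.dom import minidom
from xml.sax.saxutils import escape


class Person:
    """Hand-written illustration only; not real generateDS.py output."""

    def __init__(self, name=""):
        # Constructor with an initializer for each member variable.
        self.name = name

    # Get and set methods for the member variable.
    def get_name(self):
        return self.name

    def set_name(self, name):
        self.name = name

    def build(self, node):
        # Populate this instance from a minidom element node.
        for child in node.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "name":
                self.name = "".join(
                    t.data for t in child.childNodes
                    if t.nodeType == t.TEXT_NODE)

    def export(self, outfile, level=0):
        # Re-create the XML element as text.
        indent = "    " * level
        outfile.write("%s<person>\n" % indent)
        outfile.write("%s    <name>%s</name>\n" % (indent, escape(self.name)))
        outfile.write("%s</person>\n" % indent)

    def exportLiteral(self, outfile, level=0):
        # Write a Python literal that represents the same content.
        outfile.write("%sPerson(name=%r)\n" % ("    " * level, self.name))


# Parse a small document, build an instance, then export it both ways.
doc = minidom.parseString("<person><name>Alice</name></person>")
person = Person()
person.build(doc.documentElement)

xml_out = io.StringIO()
person.export(xml_out)
literal_out = io.StringIO()
person.exportLiteral(literal_out)
```

In this sketch, export writes the element back out as XML, while
exportLiteral writes a line of Python source describing the same
data.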


---------------------------
How to build and install it
---------------------------

Newer versions of Python have XML support in the Python standard
library.  For older versions of Python, install PyXML.  You can
find it at:

    http://pyxml.sourceforge.net/

De-compress the generateDS distribution file.  Use something like
the following:
 
    tar xzvf generateDS-1.5a.tar.gz

Then, the regular Distutils commands should work:

    python setup.py build
    python setup.py install


-------------
How to use it
-------------

See generateDS.html for documentation.

Produce class definitions and sub-class definitions with something
like the following:

    $ python generateDS.py -o people.py -s people_subs.py people.xsd

Here is a test using the enclosed sample Xschema file:

    $ python generateDS.py -o people.py people.xsd
    $ python people.py people.xml


----------------
More information
----------------

More information on generateDS.py is in generateDS.html.

There is more information on PyXML at:

    http://pyxml.sourceforge.net/
    http://www.python.org/sigs/xml-sig/


-----------
Limitations
-----------

XML Schema limitations -- There are lots of things in Xschema that
are not supported.  You will have to use a restricted sub-set of
Xschema to define your data structures.  See the documentation
(generateDS.html) for supported features.  See people.xsd and
people.xml for examples.

Mixed content -- generateDS.py generates a parser and data
structures that do not handle or represent mixed content.  Here is
an example of mixed content:

    <note>This is a <bold>nice</bold> comment.</note>

My only, and somewhat feeble, excuse for this is that
generateDS.py is intended for structured data rather than
marked-up text.  However, whether my excuse is a good one or a
feeble one, you should be warned that if you anticipate needing
mixed content, do *not* use generateDS.py.
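
To see why mixed content is awkward for this kind of data
structure, the following small sketch (plain minidom, independent
of generateDS.py; the variable names are made up for the example)
lists the children of the <note> element shown above.  The text
pieces and the <bold> element are interleaved, and their relative
order carries meaning, which a class holding one member variable
per child element does not capture.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<note>This is a <bold>nice</bold> comment.</note>")
note = doc.documentElement

# Record each child of <note> in document order.
children = []
for child in note.childNodes:
    if child.nodeType == child.TEXT_NODE:
        children.append(("text", child.data))
    elif child.nodeType == child.ELEMENT_NODE:
        children.append(("element", child.tagName))

for kind, value in children:
    print(kind, repr(value))
```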

Large documents -- The parser generated by generateDS.py uses
minidom.  This means that the entire XML document must be read and
a DOM tree constructed in memory.  In addition, the data
structures generated by generateDS.py must occupy memory.  This
means that generateDS.py is not well-suited for applications that
read large XML documents, although what "large" means depends on
your hardware.  Notice that the parsing functions (parse() and
parseString()) over-write the variable doc so as to enable Python
to reclaim the space occupied by the DOM tree, which may help
alleviate the memory problem to some extent.
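
The idea of releasing the DOM tree once the data structures have
been built can be sketched as follows.  This is an illustration
under assumptions, not the generated parser itself: the functions
build_names and parse_string are hypothetical stand-ins, and the
call to minidom's unlink() is an extra step that this README does
not say the generated code performs.

```python
from xml.dom import minidom


def build_names(root):
    # Stand-in for the generated build step: copy the data we need
    # out of the DOM tree into ordinary Python objects.
    return [el.firstChild.data for el in root.getElementsByTagName("name")]


def parse_string(text):
    doc = minidom.parseString(text)
    names = build_names(doc.documentElement)
    # Assumption for this sketch: break the DOM's internal reference
    # cycles, then over-write doc so the tree can be reclaimed.
    doc.unlink()
    doc = None
    return names


sample = (
    "<people>"
    "<person><name>Alice</name></person>"
    "<person><name>Bob</name></person>"
    "</people>"
)
result = parse_string(sample)
```

Note that, even in this sketch, the DOM tree and the built objects
both exist in memory until the build step finishes, so peak memory
use is not reduced.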


-------
License
-------

Copyright (c) 2002 Dave Kuhlman

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


--------------
Change history
--------------

Version 1.11b (11/19/2007)
  * Fixed bug that caused an infinite loop when a class has a
    simple type as a base/super class.
  * Added additional simple types to the list of recognized simple
    types.  For a list of simple types, see:
    http://www.w3.org/TR/xmlschema-0/#SimpleTypeFacets
  * Added additional Python keywords to list of transformed names.
    See global variable NameTable.

Version 1.11a (10/11/2007)
  * Various added features contributed by Chris Allan.  For more
    information see:
    http://www.rexx.com/~dkuhlman/generateDS.html#additional-features

Version 1.10a (08/21/2007, again)
  * Added xs:int basic type.  Handle same as xs:integer.
  * Generate tests so that, for elements declared with
    minOccurs="0" and maxOccurs="1" that have an empty value,
    export does not generate output.

Version 1.10a (05/11/2007)
  * Added support for user methods.  See section "User Methods" in
    the documentation.

Version 1.9a (03/21/2007, again)
  * Added process_includes.py which can be used as a pre-processor
    to process include elements and create an XML Schema document
    containing all included content.
  * Modified generateDS.py so that it will read its input from a
    pipe when given the command line argument "-" (dash).

Version 1.9a (02/13/2007, again)
  * Changed naming of getter and setter methods.  Default is to
    use get_var() and set_var() instead of getVar() and setVar().
    The old behavior is available using the flag
    --use-old-getter-setter.

Version 1.9a (01/30/2007, again)
  * Fix so that validator methods for simpleType are also
    generated when the <xs:simpleType> occurs within an
    <xs:element>.

Version 1.9a (12/04/2006, again)
  * Fixed errors (occurring on import of superclass module) when
    an element is defined as an extension of an element that is
    defined as a simpleType restriction on an xs:string.

Version 1.9a (11/27/2006, again)
  * Fix for elements that have attributes and no nested children.
    Eliminated writing out new line chars in export methods.

Version 1.9a (10/22/2006, again)
  * Fix to capture text content of nodes defined with attributes
    but with no nested elements into member variable valueOf_.

Version 1.9a (10/10/2006)
  * Added minimal support for simpleType.
  * Generate stubs for and calls to validator methods for
    simpleType.
  * Retrieve bodies for validator methods for simpleTypes from
    files in a directory specified with the --validator-bodies
    command line flag.

Version 1.8d (10/4/2006, again)
  * Fixed several errors related to anyAttribute.  It was
    generating bad code if an element was defined with
    anyAttribute but had no other attributes.  And, in the same
    situation, it was not generating export code properly.

Version 1.8d (7/26/2006, again)
  * Allowed dot/period as special character in element tags/names.
  * Fixed several errors in generation of export and exportLiteral
    functions.  Special names (e.g. 'type', 'class') were not
    being mapped to special spellings (e.g. 'ttype', 'klass').
  * Fixed error in determining ExplicitDefine, which was
    preventing export of some objects.

Version 1.8d (7/19/2006, again)
  * Added support for empty elements, i.e. elements that have no
    children and no attributes.  Formerly, they were ignored due
    to a quirk in logic.

Version 1.8d (4/13/2006) 
  * Added support for the following simple types: duration, anyURI
    and unsignedShort.  They are coerced to (and treated the same
    as) xs:string, xs:string, and xs:integer, respectively.

Version 1.8c (12/22/2005, again)
  * Fixed use of mapped names in generateExportLiteralFn().

Version 1.8c (12/20/2005, again)
  * Fix to generation of getters and setters for attributes.
    Formerly generating accessors that handled *lists* of attribute
    values.

Version 1.8c (12/15/2005, again)
  * Fix generated code so that it uses documentElement instead of
    childNodes[0] to get the root element.

Version 1.8c (5/10/2005, again)
  * Patch for <xs:attribute ref="xxxx"/> -- Use the value of ref
    as the name of the attribute.  I'm not sure whether this is
    correct in all situations.
  * Fix for generation of ctor for mixed type elements.  Before
    this fix, generateDS.py was failing to generate the
    initializers in the __init__ method signature.
  * Fix for generation of "class" declaration for extension
    classes whose base class name is qualified with a namespace
    (e.g., <xs:extension base="iodef:TextAbstractType">).  Removed
    the namespace.  This fix also corrected the order of
    generation of classes so that the base class is now correctly
    generated *before* the subclass.

Version 1.8c (4/26/2005)
  * Added support for several simple types: xs:token, xs:short,
    xs:long, xs:positiveInteger, xs:negativeInteger,
    xs:nonPositiveInteger, xs:nonNegativeInteger, xs:date.
  * Fixed error produced when an element definition inherits from
    a simple type.

Version 1.8b (2/25/2005)
  * Added support for anyAttribute.

Version 1.8a (2/23/05, again)
  * Fixed incorrect generation of name and type for export
    functions for root element.
  * Fixed reference to root element type when root element name
    and type are different.

Version 1.8a (1/13/05, again)
  * Fixed incorrect handling of extension of in-line element
    definition.
  * Code cleanup in support of the above.

Version 1.8a (12/22/04)
  * Added support for attributeGroup.  Enables an XML Schema to
    define attribute groups and then include them in
    element/complexType definition.
  * Added support for substitutionGroup.  Enables use of any of a
    set of element types as alternatives to another element type.
    Limitation: Does not work with simple element types.

Version 1.7b (11/15/04)
  * From an XML Schema, it is not possible to determine the
    outer-most element in instance documents.  generateDS.py now
    generates a parser (parseSelect) that first uses a small SAX
    parser to determine the outer-most element in the input
    document, then invokes the normal parser with that element as
    the root.

Version 1.7a (10/28/04)
  Thanks very much to Lloyd Kvam for help with these fixes and
  improvements.  His ideas, suggestions, and work have been
  extremely valuable.
  * Implemented partial support for <xsd:extension base="">.
    Limitation: extension elements cannot override members
    defined in a base.
  * Refactored generated methods export and build, so that they
    can be called by subclasses.
  * The generated method exportLiteral has been left behind during
    recent work.  Brought it up-to-date.
  * For Python, a super-class must be defined before the
    sub-classes that reference it.  Implemented a delaying
    mechanism that enforces this ordering of generation of classes.
  * Fixed a bug that occurred when an element is defined with
    maxOccurs given a value other than "1" or "unbounded".

Version 1.6d (10/1/04)
  * Several bug fixes.
  * Added command-line flag --subclass-suffix="X".  Changes the
    suffix appended to the class name in subclass files.  Default
    if omitted is "Sub".
  * Added an underscore to certain local variables to avoid name
    conflicts.
  * Thanks to Lloyd Kvam for help with this release.  Lloyd found
    and fixed a number of these problems.
  * Added command-line flag "--root-element", which makes a
    specified element name the assumed root of instance documents.
  * In some schemas, attributes on a nested <complexType> pertain
    to the containing <element>.  Added code to copy the
    attributes from the <complexType> to the <element>, if it is
    nested.

Version 1.6c (9/15/04)
  * generateDS.py was not walking lower levels of the tree data
    structure collected by the SAX parser that describes the
    classes to be generated.  Now, function generate() calls
    function generateFromTree() to recursively walk lower levels
    of this tree structure.
  * Fixed various errors that were introduced or uncovered by the
    above change.
  * Strengthened handling of mixed content.  When an element
    definition (<element> or <complexType>) contains the attribute
    "mixed=" with a true value, then we generate the code for text
    content, e.g. getValue(), setValue(), capture value in
    build(), etc.

Version 1.6b (9/10/04, yet again)
  * Still fixing bug related to generating all the sub-class
    stubs.  All sub-classes were not being generated when no
    superclasses were generated (-o flag omitted), because there
    are data structures that are created when superclasses are
    generated and which are needed during sub-class generation.
    Now we *always* write out super-classes, but write them to a
    temp file if they are not requested.

Version 1.6b (8/26/04, again)
  * Fixed bug -- complexTypes defined in-line were omitted from the
    sub-class file.  Now these sub-classes are being generated.

Version 1.6b (8/18/04)
  * Added ability to access the text content of elements that are
    defined but have *no* nested elements.  The member variable is
    "valueOf_" (note underscore which will hopefully avoid name
    conflicts) and the getter and setter methods are "getValueOf_"
    and "setValueOf_".
  * Fixes to generation of exportLiteral methods.  Formerly,
    export of attributes was omitted.
  * Removed un-used function that contained "yield" statement,
    which caused problems with older versions of Python.

Version 1.6a (7/23/04, again)
  * Added optional generation of new style classes with
    properties.  This is experimental and, admittedly, not very
    useful, as the property functions are simple getters and
    setters.  Maybe someday ...  Use the "-m" flag to see the
    resulting code.

Version 1.6a (7/9/04, again)
  * Minor fixes.  Replaced dashes in names used as attributes (see
    cleanupName()).

Version 1.6a (7/6/04, again)
  * For XMLBehaviors, implemented ability to retrieve
    implementation bodies for behaviors and for ancillaries
    (pre-conditions and post-conditions) from a Web address (URL).

Version 1.6a (6/30/04)
  * Added generation of behaviors.  An XML document can be used to
    specify behaviors (methods) to be added to generated sub-class
    files, including DBC (design by contract) pre- and
    post-condition tests.  See generateDS.html for more
    information on XMLBehaviors.

Version 1.5b (6/20/04, again)
  * Fixed handling namespace prefix in the XMLSchema file itself.
    generateDS.py now attempts to pick-up the namespace prefix
    (alias) from the "xmlns:" attribute on the "schema" element.

Version 1.5b (5/7/04)
  * Fixed several minor problems related to XML namespaces.
    Namespace prefix ignored when creating Python names (e.g. of
    classes) and namespace prefix ignored during parsing.  That's
    about the best I know to do right now.
  * Fixed problems in generating code for names containing dashes.
    Now using underscore in place of dashes for Python names.

Version 1.5a (3/24/04)
  * Added keyword arguments to the generated factory functions.
  * Added generation of method "exportLiteral" and related support
    to export elements/instances to Python data structure
    literals.

Version 1.4c (3/10/04)
  * Element <complexType> in XSchema file not handled correctly.
    Fixed this so that when <complexType> is at top level, it will
    be handled the same way that an <element> is handled.  Note:
    We still have problems with <complexType> elements that are
    more deeply nested.

Version 1.4c (3/8/04)
  * Added ability to pass namespace abbreviation from the command
    line.  For example, the "-a" option enables you to replace
    "xs:" with "xsd:".

Version 1.4b (9/30/03, again)
  * Removed dependence on PyXML.  Will now import XML support from
    PyXML, if it is available, but if not, imports from the
    Python standard library.

Version 1.4b (9/30/03)
  * Fixed name conflict in factory function (added underscore).
  * Added generation of saxParseString function (parse string, not
    file/URL).
  * Fixed error -- some constructors not using factory.

Version 1.4a (9/17/03)
  * Added generation of a SAX parser.

Version 1.3c (9/11/03)
  * Fixed problem caused by shared content model, i.e. when a
    field (content) is declared with a complex type and the name
    and the type are different.  The fix enabled the field name
    and the type of the object in that field to be different.

Version 1.3b (9/9/03)
  * Fixed error when a separate xs:element declaration is used for
    elements declared with a simple type.

Version 1.3a (8/18/03)
  * Removed YAML support.
  * Fixed error in name generation in generateBuildFn().
  * Various fixes and cleanup in tests/ and Demo/.

Version 1.2a (again, 5/16/03)
  * Fixed error in code generation for boolean attributes.
  * Fixed error in code generation for float values.
  * Added very simple unit tests in tests directory.  Can be run
    with:
        cd tests
        python test.py

Version 1.2a (3/14/03)
  * Added support for XML Schema xs:double and xs:boolean types.

Version 1.1a (8/13/02)
  * Added ability to generate subclass stubs for user method
    implementation.
  * A bit of clean-up to the command line options.

Version 1.0a (3/15/02)
  * Initial release


-----
To do
-----

The following enhancements and fixes remain to be done:

- Nested (and sometimes anonymous) types -- In some cases, instead
  of defining a type at top level (with a type name) and then
  referencing that type inside another type definition, the
  definition itself can be nested inside another type definition.
  generateDS.py does not handle this.  In effect, generateDS.py
  requires that you declare all type definitions at top level and
  that you name them (give them a "name" attribute).  A future
  enhancement would be to enable generateDS.py to handle nested
  element/type definitions, which may, optionally, be anonymous
  (i.e. have no "name" attribute).

- The <sequence> element can have "minOccurs" and "maxOccurs"
  attributes.  I'm guessing, but am not sure, that this specifies
  repeated groups.  For example, the following:

      <xs:sequence minOccurs="0" maxOccurs="unbounded">
          <xs:element name="description" type="xs:string"/>
          <xs:element name="size" type="xs:integer"/>
      </xs:sequence>

  specifies that we can have any number of pairs of elements
  "description" and "size".  A future enhancement to generateDS.py
  would enable us to specify and enforce this restriction.

- And so many more complexities in the XSchema specifications.


Dave Kuhlman
dkuhlman@rexx.com
http://www.rexx.com/~dkuhlman

