SGMLS - class for postprocessing the output from the sgmls and nsgmls parsers.
use SGMLS;

my $parse = new SGMLS(STDIN);

my $event = $parse->next_event;
while ($event) {

  SWITCH: {

    ($event->type eq 'start_element') && do {
      my $element = $event->data;    # An object of class SGMLS_Element
      [[your code for the beginning of an element]]
      last SWITCH;
    };

    ($event->type eq 'end_element') && do {
      my $element = $event->data;    # An object of class SGMLS_Element
      [[your code for the end of an element]]
      last SWITCH;
    };

    ($event->type eq 'cdata') && do {
      my $cdata = $event->data;      # A string
      [[your code for character data]]
      last SWITCH;
    };

    ($event->type eq 'sdata') && do {
      my $sdata = $event->data;      # A string
      [[your code for system data]]
      last SWITCH;
    };

    ($event->type eq 're') && do {
      [[your code for a record end]]
      last SWITCH;
    };

    ($event->type eq 'pi') && do {
      my $pi = $event->data;         # A string
      [[your code for a processing instruction]]
      last SWITCH;
    };

    ($event->type eq 'entity') && do {
      my $entity = $event->data;     # An object of class SGMLS_Entity
      [[your code for an external entity]]
      last SWITCH;
    };

    ($event->type eq 'start_subdoc') && do {
      my $entity = $event->data;     # An object of class SGMLS_Entity
      [[your code for the beginning of a subdoc entity]]
      last SWITCH;
    };

    ($event->type eq 'end_subdoc') && do {
      my $entity = $event->data;     # An object of class SGMLS_Entity
      [[your code for the end of a subdoc entity]]
      last SWITCH;
    };

    ($event->type eq 'conforming') && do {
      [[your code for a conforming document]]
      last SWITCH;
    };

    die "Internal error: unknown event type " . $event->type . "\n";
  }

  $event = $parse->next_event;
}
The SGMLS package consists of several related classes: see "SGMLS", "SGMLS_Event", "SGMLS_Element", "SGMLS_Attribute", "SGMLS_Notation", and "SGMLS_Entity". All of these classes are available when you specify
use SGMLS;
Generally, the only object which you will create explicitly will belong to the "SGMLS" class; all of the others will then be created automatically for you over the course of the parse. Much fuller documentation is available in the ".sgml" files in the "DOC/" directory of the "SGMLS.pm" distribution.
The "SGMLS" class
This class holds a single parse. When you create an instance of it, you specify a file handle as an argument (if you are reading the output of sgmls or nsgmls from a pipe, the file handle will ordinarily be "STDIN"):
my $parse = new SGMLS(STDIN);
The most important method for this class is "next_event", which reads and returns the next major event from the input stream. It is important to note that the "SGMLS" class deals with most ESIS events itself: attributes and entity definitions, for example, are collected and stored automatically and invisibly to the user. The following list contains all of the methods for the "SGMLS" class:
"next_event()": Return an "SGMLS_Event" object containing the next
major event from the SGML parse.
"element()": Return an "SGMLS_Element" object containing the current
element in the document.
"file()": Return a string containing the name of the current SGML
source file (this will work only if the "-l" option was given to sgmls
or nsgmls).
"line()": Return a string containing the current line number from the
source file (this will work only if the "-l" option was given to sgmls
or nsgmls).
"appinfo()": Return a string containing the "APPINFO" parameter (if
any) from the SGML declaration.
"notation(NNAME)": Return an "SGMLS_Notation" object representing the
notation named "NNAME". With newer versions of nsgmls, all notations
are available; otherwise, only the notations which are actually used
will be available.
"entity(ENAME)": Return an "SGMLS_Entity" object representing the
entity named "ENAME". With newer versions of nsgmls, all entities are
available; otherwise, only external data entities and internal entities
used as attribute values will be available.
"ext()": Return a reference to an associative array for user-defined
extensions.
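As a small illustration of the "file()" and "line()" methods listed above, the following sketch builds a location prefix for diagnostic messages. The helper name "location_prefix" is hypothetical and not part of SGMLS.pm. What these methods return when the "-l" option was not given is not stated here, so the sketch assumes it may be either undef or an empty string and treats both as "unknown".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): build a "file:line: " prefix
# for diagnostics from any object that provides file() and line(), such as
# an SGMLS parse object or an SGMLS_Event object.
sub location_prefix {
    my ($source) = @_;
    my $file = $source->file;
    my $line = $source->line;

    # Assumption: without the "-l" option these may be undef or empty.
    return ''           unless defined $file && length $file;
    return "${file}: "  unless defined $line && length $line;
    return "${file}:${line}: ";
}
```

Inside the event loop, a call such as warn location_prefix($parse) . "unexpected element\n" would then point the message at the SGML source rather than at the Perl script.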
The "SGMLS_Event" class
This class holds a single major event, as generated by the "next_event" method in the "SGMLS" class. It recognises the following methods:
"type()": Return a string describing the type of event:
"start_element", "end_element", "cdata", "sdata", "re", "pi", "entity",
"start_subdoc", "end_subdoc", and "conforming". See "SYNOPSIS", above,
for the values associated with each of these.
"data()": Return the data associated with the current event (if any).
For "start_element" and "end_element", returns an "SGMLS_Element"
object; for "entity", "start_subdoc", and "end_subdoc", returns an
"SGMLS_Entity" object; for "cdata", "sdata", and "pi", returns a
string; and for "re" and "conforming", returns the empty string. See
"SYNOPSIS", above, for an example of this method’s use.
"key()": Return a string key to the event, such as an element or entity
name (otherwise, the same as "data()").
"file()": Return the current file name, as in the "SGMLS" class.
"line()": Return the current line number, as in the "SGMLS" class.
"element()": Return the current element, as in the "SGMLS" class.
"parse()": Return the "SGMLS" object which generated the event.
"entity(ENAME)": Look up an entity, as in the "SGMLS" class.
"notation(NNAME)": Look up a notation, as in the "SGMLS" class.
"ext()": Return a reference to an associative array for user-defined
extensions.
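Because "type()" returns a plain string, the chain of tests on the event type can also be written as a hash lookup. The following is a minimal sketch of that alternative, not something provided by SGMLS.pm: "dispatch_event" and the handler table are hypothetical names, and each handler simply receives whatever "data()" returns for its event type.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): look up a handler by the
# event's type() and call it with the event's data() and the event itself.
# Returns the handler's result, or nothing when no handler is registered.
sub dispatch_event {
    my ($event, $handlers) = @_;
    my $handler = $handlers->{ $event->type };
    return unless $handler;
    return $handler->($event->data, $event);
}

# Example handler table; the keys are event types returned by type().
my %handlers = (
    start_element => sub { my ($element) = @_; return '<'  . $element->name . '>'; },
    end_element   => sub { my ($element) = @_; return '</' . $element->name . '>'; },
    cdata         => sub { my ($text)    = @_; return $text; },
    re            => sub { return "\n"; },
);
```

Inside the parse loop, dispatch_event($event, \%handlers) would take the place of the chain of type tests. Note one difference in behaviour: this sketch silently skips event types with no handler, whereas a loop that ends in a die reports them.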
The "SGMLS_Element" class
This class is used for elements, and contains all associated information (such as the element’s attributes). It recognises the following methods:
"name()": Return a string containing the name, or Generic Identifier,
of the element, in upper case.
"parent()": Return the "SGMLS_Element" object for the element’s parent
(if any).
"parse()": Return the "SGMLS" object for the current parse.
"attributes()": Return a reference to an associative array of attribute
names and "SGMLS_Attribute" structures. Attribute names will be all in
upper case.
"attribute_names()": Return an array of strings containing the names of
all attributes defined for the current element, in upper case.
"attribute(ANAME)": Return the "SGMLS_Attribute" structure for the
attribute "ANAME".
"set_attribute(ATTRIB)": Add the "SGMLS_Attribute" object "ATTRIB" to
the current element, replacing any other attribute structure with the
same name.
"in(GI)": Return "true" (i.e. 1) if the string "GI" is the name of the
current element’s parent, or "false" (i.e. 0) if it is not.
"within(GI)": Return "true" (i.e. 1) if the string "GI" is the name of
any of the ancestors of the current element, or "false" (i.e. 0) if it
is not.
"ext()": Return a reference to an associative array for user-defined
extensions.
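The "parent()" and "name()" methods are enough to reconstruct an element's position in the document tree. The sketch below walks up the chain of parents and collects the names. The helper name "element_path" is hypothetical, and the loop assumes that "parent()" returns a false value (undef or an empty string) for the document element, which is not spelled out above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): return the names of an
# element and all of its ancestors, outermost first, by following parent().
sub element_path {
    my ($element) = @_;
    my @names;

    # Assumption: parent() returns a false value (undef or '') at the root.
    while ($element) {
        unshift @names, $element->name;
        $element = $element->parent;
    }
    return @names;
}
```

For an element PARA inside SECTION inside DOC, join('/', element_path($element)) would give a string of the form DOC/SECTION/PARA. The names after the first step up are the same ancestors that "within(GI)" tests against.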
The "SGMLS_Attribute" class
Each instance of an attribute for each "SGMLS_Element" is an object belonging to this class, which recognises the following methods:
"name()": Return a string containing the name of the current attribute,
all in upper case.
"type()": Return a string containing the type of the current attribute,
all in upper case. Available types are "IMPLIED", "CDATA", "NOTATION",
"ENTITY", and "TOKEN".
"value()": Return the value of the current attribute, if any. This will
be an empty string if the type is "IMPLIED", a string of some sort if
the type is "CDATA" or "TOKEN" (if it is "TOKEN", you may want to split
the string into a series of separate tokens), an "SGMLS_Notation"
object if the type is "NOTATION", or an "SGMLS_Entity" object if the
type is "ENTITY". Note that if the type is "CDATA", the value will not
have escape sequences for 8-bit characters, record ends, or SDATA
processed -- that will be your responsibility.
"is_implied()": Return "true" (i.e. 1) if the value of the attribute is
implied, or "false" (i.e. 0) if it is specified in the document.
"set_type(TYPE)": Change the type of the attribute to the string "TYPE"
(which should be all in upper case). Available types are "IMPLIED",
"CDATA", "NOTATION", "ENTITY", and "TOKEN".
"set_value(VALUE)": Change the value of the attribute to "VALUE", which
may be a string, an "SGMLS_Entity" object, or an "SGMLS_Notation"
object, depending on the attribute’s type.
"ext()": Return a reference to an associative array available for
user-defined extensions.
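Since "value()" returns a string for some attribute types and an object for others, code that only wants text has to branch on "type()". The following sketch shows one way to do that. The helper name "attribute_strings" is hypothetical, and it follows the description above in assuming that an "ENTITY" value is a single "SGMLS_Entity" object. "CDATA" values are passed through with their escape sequences untouched.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): reduce an attribute to a list
# of plain strings, branching on type() as described for value() above.
# Call it in list context.
sub attribute_strings {
    my ($attribute) = @_;
    my $type  = $attribute->type;
    my $value = $attribute->value;

    return ()                 if $type eq 'IMPLIED';   # no value specified
    return ($value)           if $type eq 'CDATA';     # escapes left unprocessed
    return split(' ', $value) if $type eq 'TOKEN';     # one string per token
    return ($value->name)     if $type eq 'NOTATION' || $type eq 'ENTITY';

    die "Unknown attribute type: $type\n";
}
```

With this in place, something like my @tokens = attribute_strings($element->attribute('CLASS')) yields zero or more strings regardless of how the attribute was declared; the attribute name CLASS is only an example.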
The "SGMLS_Notation" class
All declared notations appear as objects belonging to this class, which recognises the following methods:
"name()": Return a string containing the name of the notation.
"sysid()": Return a string containing the system identifier of the
notation, if any.
"pubid()": Return a string containing the public identifier of the
notation, if any.
"ext()": Return a reference to an associative array available for
user-defined extensions.
The "SGMLS_Entity" class
All declared entities appear as objects belonging to this class, which recognises the following methods:
"name()": Return a string containing the name of the entity, in mixed
case.
"type()": Return a string containing the type of the entity, in upper
case. Available types are "CDATA", "SDATA", "NDATA" (external entities
only), "SUBDOC", "PI" (newer versions of nsgmls only), or "TEXT" (newer
versions of nsgmls only).
"value()": Return a string containing the value of the entity, if it is
internal.
"sysid()": Return a string containing the system identifier of the
entity (if any), if it is external.
"pubid()": Return a string containing the public identifier of the
entity (if any), if it is external.
"filenames()": Return an array of strings containing any file names
generated from the identifiers, if the entity is external.
"notation()": Return the "SGMLS_Notation" object associated with the
entity, if it is external.
"data_attributes()": Return a reference to an associative array of data
attribute names (in upper case) and the associated "SGMLS_Attribute"
objects for the current entity.
"data_attribute_names()": Return an array of data attribute names (in
upper case) for the current entity.
"data_attribute(ANAME)": Return the "SGMLS_Attribute" object for the
data attribute named "ANAME" for the current entity.
"set_data_attribute(ATTRIB)": Add the "SGMLS_Attribute" object "ATTRIB"
to the current entity, replacing any other data attribute with the same
name.
"ext()": Return a reference to an associative array for user-defined
extensions.
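To show how the entity accessors fit together, the sketch below condenses an entity into a one-line summary, which can be handy when debugging an "entity" or "start_subdoc" event. The helper name "describe_entity" and the output layout are hypothetical. The sketch assumes that "sysid()", "pubid()", and "notation()" return undef or an empty value when the item is absent.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SGMLS.pm): summarise an entity on one
# line using name(), type(), sysid(), pubid(), and notation().
sub describe_entity {
    my ($entity) = @_;
    my @parts = ($entity->name, $entity->type);

    # Assumption: absent identifiers come back as undef or ''.
    my $sysid = $entity->sysid;
    my $pubid = $entity->pubid;
    push @parts, qq{sysid="$sysid"} if defined $sysid && length $sysid;
    push @parts, qq{pubid="$pubid"} if defined $pubid && length $pubid;

    # Assumption: notation() is false when no notation is associated.
    my $notation = $entity->notation;
    push @parts, 'notation=' . $notation->name if $notation;

    return join(' ', @parts);
}
```

In an "entity" event handler, print STDERR describe_entity($event->data), "\n" would log each external entity as it is referenced.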
Copyright 1994 and 1995 by David Megginson, "dmeggins AT aix1 DOT uottawa DOT ca". Distributed under the terms of the GNU General Public License (version 2, 1991) -- see the file "COPYING" which is included in the SGMLS.pm distribution.
See also SGMLS::Output and SGMLS::Refs.