Make NSXMLParser your friend..

As promised, here is a little How-I-did-it / How-To.

First off: I am not an experienced SAX user, so this approach may be tackling the problem from the wrong end, but it is one that DOM users should feel comfortable with ;)

Let’s assume we want to parse the following XML:

transit.xml

<root>
    <schedules>
        <schedule id="0">
            <from>SourceA</from>
            <to>DestinationA</to>
            <links>
                <link id="0">
                    <departure>2008-01-01 01:01</departure>
                    <arrival>2008-01-01 01:02</arrival>
                    <info>With food</info>
                    <parts>
                        <part id="0">
                            <departure>2008-01-01 01:01</departure>
                            <arrival>2008-01-01 01:02</arrival>
                            <vehicle>Walk</vehicle>
                        </part>
                        <part id="1">
                            <departure>2008-01-01 01:01</departure>
                            <arrival>2008-01-01 01:02</arrival>
                            <trackfrom>1</trackfrom>
                            <trackto>2</trackto>
                            <vehicle>Train</vehicle>
                        </part>
                    </parts>
                </link>
                <link id="1">
                    ...
                </link>
                <link id="2">
                    ...
                </link>
            </links>
        </schedule>
        <schedule id="1">
            ...
        </schedule>
        <schedule id="2">
            ...
        </schedule>
    </schedules>
</root>

In human-readable terms: we have multiple schedules, each with from/to etc. Each schedule consists of multiple links (different connections for the same route) with departure/arrival etc. Each link in turn consists of multiple parts/sections, whose elements are not all guaranteed to be present..

With a “let’s just find the element called ‘part’” approach, you won’t get anywhere..

The Basics

So what do we want to achieve? We want a list/array of Schedules, which have the given members. One member is a list/array of Links, which again have the given members plus a list/array of Parts with their respective members.

This is also the basic idea behind my approach: for every new node-container, use a new class/object (an array will also work, but it’s kinda crap..)

Now we have a Schedule class, a Link class and a Part class.

This is an example of the Link class interface:

Link.h

#import "Part.h"

@interface Link : NSObject {
    NSString *departure;
    NSString *arrival;
    NSString *info;
    NSMutableArray *parts;
}

@property (nonatomic, retain) NSString *departure;
@property (nonatomic, retain) NSString *arrival;
@property (nonatomic, retain) NSString *info;
@property (readonly, retain) NSMutableArray *parts;

- (void)addPart:(Part *)part;

@end

We use an accessor method for the parts, because it just feels better when dealing with arrays. (Instead of later using [foo.myArray addObject:..] we have [foo addMe:..])

Also, we make life easier for ourselves by using retain properties..
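The post only shows Link.h. Here is a minimal sketch of what the matching implementation could look like, assuming manual reference counting as in the rest of this post (with a current compiler that means building with -fno-objc-arc). The Part class is a cut-down stand-in with a single member, and creating the parts array in init is my assumption, not something shown above.

```objc
#import <Foundation/Foundation.h>

// Cut-down stand-in for the article's Part class (assumption: the real one
// also carries departure, arrival, trackfrom and trackto).
@interface Part : NSObject {
    NSString *vehicle;
}
@property (nonatomic, retain) NSString *vehicle;
@end

@implementation Part
@synthesize vehicle;

- (void)dealloc {
    [vehicle release];
    [super dealloc];
}
@end

// Same interface as Link.h above.
@interface Link : NSObject {
    NSString *departure;
    NSString *arrival;
    NSString *info;
    NSMutableArray *parts;
}
@property (nonatomic, retain) NSString *departure;
@property (nonatomic, retain) NSString *arrival;
@property (nonatomic, retain) NSString *info;
@property (readonly, retain) NSMutableArray *parts;
- (void)addPart:(Part *)part;
@end

@implementation Link
@synthesize departure, arrival, info, parts;

- (id)init {
    if ((self = [super init])) {
        parts = [[NSMutableArray alloc] init]; // owned by this Link
    }
    return self;
}

- (void)addPart:(Part *)part {
    [parts addObject:part]; // the array retains the part for us
}

- (void)dealloc {
    [departure release];
    [arrival release];
    [info release];
    [parts release];
    [super dealloc];
}
@end
```

A Schedule class would presumably follow the same pattern, with from/to properties and its own array of links.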

The Parser setup

A short introduction into SAX:

The parser goes node by node and is not nesting-sensitive. That means we first get root, then schedules, then schedule, then from, then to, then links, then link, then departure etc. When the parser hands you a nested node, you no longer know which schedule you were in. As long as you have a clearly defined structure in which every element is always present, you could track this with a counter, but as soon as you have multiple nodes with no defined count, you have a problem.
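To see this flat stream of events for yourself, here is a small sketch that simply records every opening tag in the order the parser reports it. EventLogger and StartElementOrder are hypothetical names made up for this illustration, and the code again assumes manual reference counting.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: remembers every opening tag in the order it is reported.
// (NSXMLParserDelegate is a formal protocol on current SDKs.)
@interface EventLogger : NSObject <NSXMLParserDelegate> {
    NSMutableArray *names;
}
@property (nonatomic, readonly) NSMutableArray *names;
@end

@implementation EventLogger
@synthesize names;

- (id)init {
    if ((self = [super init])) {
        names = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    // No nesting information arrives here, only the element name.
    [names addObject:elementName];
}

- (void)dealloc {
    [names release];
    [super dealloc];
}
@end

// Returns the opening tags of `xml` in document order, joined by spaces.
static NSString *StartElementOrder(NSString *xml) {
    NSData *data = [xml dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    EventLogger *logger = [[EventLogger alloc] init];
    [parser setDelegate:logger];
    [parser parse];
    NSString *result = [logger.names componentsJoinedByString:@" "];
    [parser release];
    [logger release];
    return result;
}
```

Feeding it a schedule with one link gives you one flat list of names. Nothing in a single callback tells you that a departure belongs to a link rather than a part; that is the memory we have to add ourselves.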

What we do is known as recursive parsing. What does this mean? We implement some kind of memory.

In our parser, we have 5 properties and 1 method (to make actual use of the parser..):

@property (nonatomic, retain) NSMutableString *currentProperty;
@property (nonatomic, retain) Schedule *currentSchedule;
@property (nonatomic, retain) Link *currentLink;
@property (nonatomic, retain) Part *currentPart;
@property (nonatomic, readonly) NSMutableArray *schedules;

- (void)parseJourneyData:(NSData *)data parseError:(NSError **)error;

(Yes, currentProperty needs to be an NSMutableString, because the text of a node can arrive in several pieces..)

Your parseJourneyData method should look similar to the following:

parseJourneyData

- (void)parseJourneyData:(NSData *)data parseError:(NSError **)err {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];

    // schedules is a readonly property, so we set the ivar directly
    [schedules release];
    schedules = [[NSMutableArray alloc] init]; // Create our schedule list

    [parser setDelegate:self]; // The parser calls methods in this class
    [parser setShouldProcessNamespaces:NO]; // We don't care about namespaces
    [parser setShouldReportNamespacePrefixes:NO]; // ..or about their prefixes
    [parser setShouldResolveExternalEntities:NO]; // We just want data, no other stuff

    [parser parse]; // Parse that data..

    if (err && [parser parserError]) {
        // Keep the error alive after the parser is released below
        *err = [[[parser parserError] retain] autorelease];
    }

    [parser release];
}
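The error handling at the end of that method can be tried out in isolation. The following is only a sketch: ParseData is a hypothetical free function, not part of the parser class, and it runs NSXMLParser without a delegate just to check whether the data is well-formed. It assumes manual reference counting like the rest of the post.

```objc
#import <Foundation/Foundation.h>

// Hypothetical free function mirroring the error handling of parseJourneyData:.
// Returns YES if `data` is well-formed XML, otherwise NO and (optionally) an error.
static BOOL ParseData(NSData *data, NSError **err) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];

    // No delegate set: we only care whether parsing succeeds.
    BOOL ok = [parser parse];

    if (!ok && err) {
        // Retain/autorelease so the error survives the release of the parser.
        *err = [[[parser parserError] retain] autorelease];
    }

    [parser release];
    return ok;
}
```

Passing NULL for err is fine if you are not interested in the reason. With something like <root><a></root> (the closing tag of a is missing) you should get NO back and a non-nil NSError.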

Now we need those delegate methods.

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string

This method is called by the parser when it reads text between tags. For <info>blah</info>, for example, it would report “blah”. The method may be called multiple times within one node, which is why we append instead of assign. As you will see later, we only set the property “currentProperty” when we enter a node we care about. That’s why we check it here: if it is nil, we are not interested in this text. This will then look something like this:

Parser

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if (self.currentProperty) {
        [currentProperty appendString:string];
    }
}
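Here is a small self-contained sketch of that accumulation, assuming manual reference counting. TextCollector and CollectInfo are made-up names for this illustration, and the sketch only cares about <info> nodes. The chunks counter is there purely so you can see how often foundCharacters: actually fired.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects the text of every <info> element.
@interface TextCollector : NSObject <NSXMLParserDelegate> {
    NSMutableString *currentProperty; // non-nil only while inside <info>
    NSMutableArray *values;           // finished <info> texts
    NSUInteger chunks;                // how often we appended a piece of text
}
@property (nonatomic, retain) NSMutableString *currentProperty;
@property (nonatomic, readonly) NSMutableArray *values;
@property (nonatomic, readonly) NSUInteger chunks;
@end

@implementation TextCollector
@synthesize currentProperty, values, chunks;

- (id)init {
    if ((self = [super init])) {
        values = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"info"]) {
        self.currentProperty = [NSMutableString string]; // start collecting
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    if (self.currentProperty) { // ignore text of nodes we don't care about
        [currentProperty appendString:string];
        chunks++;
    }
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (self.currentProperty && [elementName isEqualToString:@"info"]) {
        [values addObject:[NSString stringWithString:currentProperty]];
    }
    self.currentProperty = nil; // stop collecting
}

- (void)dealloc {
    [currentProperty release];
    [values release];
    [super dealloc];
}
@end

// Parses `xml` and returns the (autoreleased) collector with its results.
static TextCollector *CollectInfo(NSString *xml) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    TextCollector *collector = [[[TextCollector alloc] init] autorelease];
    [parser setDelegate:collector];
    [parser parse];
    [parser release];
    return collector;
}
```

For input such as <info>Fish &amp; chips</info> the collected value is the complete text "Fish & chips", even though chunks may well be larger than 1, because the parser is free to deliver the text in pieces.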

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict

This is called when the parser finds an opening element. There are a few cases we need to distinguish:

Either it’s a standard property of the schedule (like <from> etc.) or it’s a deeper nested node (like <links>); the same goes for all the other nodes.

How to? We define that we only set a member while we are inside that node. That means currentPart is only set once we have entered a <part>, otherwise it’s nil. The same goes for the others.

We then need to check them in reverse order of their nesting level.. Why? Because if we checked currentLink before currentPart, currentLink would also evaluate to YES/true while we are inside a <part>, and we would have a problem if there are elements with the same name (like <departure>). If we aren’t in any node, then there is probably a new main node coming -> that’s the else..

When we hit a nested node, we need to allocate the respective member of our class, so we can use it when the parser gets deeper into it.

This will look like this:

Parser

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName attributes:(NSDictionary *)attributeDict {
    if (qName) {
        elementName = qName;
    }

    if (self.currentPart) { // Are we in a <part>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"] || [elementName isEqualToString:@"arrival"] || [elementName isEqualToString:@"vehicle"] || [elementName isEqualToString:@"trackfrom"] || [elementName isEqualToString:@"trackto"] ) {
            self.currentProperty = [NSMutableString string];
        }
    } else if (self.currentLink) { // Are we in a <link>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"] || [elementName isEqualToString:@"arrival"] || [elementName isEqualToString:@"info"]) {
            self.currentProperty = [NSMutableString string];
        // Check for deeper nested node
        } else if ([elementName isEqualToString:@"part"]) {
            // Create the element; autorelease balances the alloc, the retain property keeps it alive
            self.currentPart = [[[Part alloc] init] autorelease];
        }
    } else if (self.currentSchedule) { // Are we in a <schedule>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"from"] || [elementName isEqualToString:@"to"]) {
            self.currentProperty = [NSMutableString string];
        // Check for deeper nested node
        } else if ([elementName isEqualToString:@"link"]) {
            self.currentLink = [[[Link alloc] init] autorelease]; // Create the element
        }
    } else { // We are outside of everything, so we need a <schedule>
        // Check for deeper nested node
        if ([elementName isEqualToString:@"schedule"]) {
            self.currentSchedule = [[[Schedule alloc] init] autorelease];
        }
    }
}
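Why the deepest level has to be checked first is easiest to see in a cut-down version. The following sketch only knows <link>, <part> and <departure>, and it uses two BOOL flags as stand-ins for the nil checks on currentLink / currentPart. NestingDemo and OwnersOfDepartures are hypothetical names, and manual reference counting is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical cut-down delegate that notes which level each <departure> belongs to.
@interface NestingDemo : NSObject <NSXMLParserDelegate> {
    NSMutableString *currentProperty;
    BOOL insideLink;         // stand-in for "currentLink != nil"
    BOOL insidePart;         // stand-in for "currentPart != nil"
    NSMutableArray *entries; // e.g. "part.departure=..."
}
@property (nonatomic, retain) NSMutableString *currentProperty;
@property (nonatomic, readonly) NSMutableArray *entries;
@end

@implementation NestingDemo
@synthesize currentProperty, entries;

- (id)init {
    if ((self = [super init])) {
        entries = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if (insidePart) { // deepest level first
        if ([elementName isEqualToString:@"departure"]) {
            self.currentProperty = [NSMutableString string];
        }
    } else if (insideLink) {
        if ([elementName isEqualToString:@"departure"]) {
            self.currentProperty = [NSMutableString string];
        } else if ([elementName isEqualToString:@"part"]) {
            insidePart = YES;
        }
    } else if ([elementName isEqualToString:@"link"]) {
        insideLink = YES;
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [currentProperty appendString:string]; // does nothing while currentProperty is nil
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (insidePart) {
        if ([elementName isEqualToString:@"departure"]) {
            [entries addObject:[NSString stringWithFormat:@"part.departure=%@", currentProperty]];
        } else if ([elementName isEqualToString:@"part"]) {
            insidePart = NO;
        }
    } else if (insideLink) {
        if ([elementName isEqualToString:@"departure"]) {
            [entries addObject:[NSString stringWithFormat:@"link.departure=%@", currentProperty]];
        } else if ([elementName isEqualToString:@"link"]) {
            insideLink = NO;
        }
    }
    self.currentProperty = nil;
}

- (void)dealloc {
    [currentProperty release];
    [entries release];
    [super dealloc];
}
@end

// Parses `xml` and returns the recorded entries in document order.
static NSArray *OwnersOfDepartures(NSString *xml) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    NestingDemo *demo = [[NestingDemo alloc] init];
    [parser setDelegate:demo];
    [parser parse];
    NSArray *result = [[demo.entries copy] autorelease];
    [parser release];
    [demo release];
    return result;
}
```

If you swapped the two outer branches so that insideLink is tested first, the <departure> inside a <part> would be booked on the link, because insideLink is still YES at that point.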

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName

Basically, the same things apply as for didStartElement above. This time, we need to assign the collected values (if they are set) and clean things up :) It’s a bit of a pity, since it’s a lot of code for not so much..

It’s the same checker-structure..

If we are in a deeper nested node (like <link>) and we hit the ending element of that nested node (like </link>), then we need to add this element to the parent (like <schedule>) and set it to nil.

See for yourself:

Parser

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if (qName) {
        elementName = qName;
    }

    if (self.currentPart) { // Are we in a <part>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"]) {
            self.currentPart.departure = self.currentProperty;
        } else if ([elementName isEqualToString:@"arrival"]) {
            self.currentPart.arrival = self.currentProperty;
        } else if ([elementName isEqualToString:@"vehicle"]) {
            self.currentPart.vehicle = self.currentProperty;
        } else if ([elementName isEqualToString:@"trackfrom"]) {
            self.currentPart.trackfrom = self.currentProperty;
        } else if ([elementName isEqualToString:@"trackto"]) {
            self.currentPart.trackto = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"part"]) {
            [currentLink addPart:self.currentPart]; // Add to parent
            self.currentPart = nil; // Set nil
        }
    } else if (self.currentLink) { // Are we in a <link>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"departure"]) {
            self.currentLink.departure = self.currentProperty;
        } else if ([elementName isEqualToString:@"arrival"]) {
            self.currentLink.arrival = self.currentProperty;
        } else if ([elementName isEqualToString:@"info"]) {
            self.currentLink.info = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"link"]) {
            // Add to parent (assumes Schedule declares addLink:, analogous to addPart: in Link)
            [currentSchedule addLink:self.currentLink];
            self.currentLink = nil; // Set nil
        }
    } else if (self.currentSchedule) { // Are we in a <schedule>?
        // Check for standard nodes
        if ([elementName isEqualToString:@"from"]) {
            self.currentSchedule.from = self.currentProperty;
        } else if ([elementName isEqualToString:@"to"]) {
            self.currentSchedule.to = self.currentProperty;
        // Are we at the end?
        } else if ([elementName isEqualToString:@"schedule"]) {
            [schedules addObject:self.currentSchedule]; // Add to the result list
            self.currentSchedule = nil; // Set nil
        }
    }

    // We reset the currentProperty, for the next textnodes..
    self.currentProperty = nil;
}

Finally..

Well, that’s it. You can expand / shrink this principle as you like. You can also add a maxElements counter, like in the SeismicXML example of the iPhone SDK, to get only a certain number of elements. You can abort the parser with [parser abortParsing]. It is important that you don’t abort while in a deeper nested node, because this could leave you with half-filled objects. You will need to skip those instead..
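A minimal sketch of such a counter, assuming manual reference counting: the delegate below only counts closed <schedule> elements and calls abortParsing once the limit is reached, that is at a point where no nested node is open. ScheduleLimiter, initWithMaxElements: and CountSchedules are names invented for this example; this is not the SeismicXML code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: stops the parser after maxElements schedules.
@interface ScheduleLimiter : NSObject <NSXMLParserDelegate> {
    NSUInteger maxElements;
    NSUInteger count;
}
@property (nonatomic, readonly) NSUInteger count;
- (id)initWithMaxElements:(NSUInteger)max;
@end

@implementation ScheduleLimiter
@synthesize count;

- (id)initWithMaxElements:(NSUInteger)max {
    if ((self = [super init])) {
        maxElements = max;
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    // Guard with count < maxElements so a late callback can never overcount.
    if ([elementName isEqualToString:@"schedule"] && count < maxElements) {
        count++;
        if (count == maxElements) {
            // A complete <schedule> was just closed, so nothing is half-filled.
            [parser abortParsing];
        }
    }
}
@end

// Parses `xml` and returns how many schedules were seen (at most `max`).
static NSUInteger CountSchedules(NSString *xml, NSUInteger max) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:[xml dataUsingEncoding:NSUTF8StringEncoding]];
    ScheduleLimiter *limiter = [[ScheduleLimiter alloc] initWithMaxElements:max];
    [parser setDelegate:limiter];
    [parser parse]; // returns NO when we aborted; that is expected here
    NSUInteger n = limiter.count;
    [parser release];
    [limiter release];
    return n;
}
```

In the real parser you would put the check into the branch that adds the finished schedule to the schedules array. Keep in mind that parse reports failure after an abort, so the caller has to tell that apart from a genuine XML error.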

Please note that I wrote this while watching TV, so you may need to fix some syntax errors ;) But I hope you get the idea..

Comments

BW
August 4, 2008 at 2:56 pm

I think I’m missing something: at what point is the external XML file loaded?

Marc
August 4, 2008 at 3:41 pm

Right here:
- (void)parseJourneyData:(NSData *)data parseError:(NSError **)err {
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];

You can also load it with “initWithContentsOfURL:”, or from a string with [NSData initWithBytes:length:] (and provide the XML string), or with [NSData initWithContentsOfFile:]

Marc
August 4, 2008 at 3:43 pm

And this file is included in your controller/wherever class, where you downloaded and need your data. You then call this parser with
[myParser parseJourneyData:myData parseError:&err];

David
August 11, 2008 at 9:35 am

As far as I can see you are not releasing properly; as in e.g.

self.currentPart = [[Part alloc] init]; // Create the element

the Part instance should be released just after the assignment because the currentPart property has a retain attribute.

marc
August 11, 2008 at 4:50 pm

David: You are absolutely right. I actually copied the property part from some real code and wrote the other in the blog editor. Will correct it, as soon as I find some time. Thanks!

matt
August 25, 2008 at 1:25 pm

Thanks for this excellent tutorial - this is exactly what I’ve been searching for.

Joseph Crawford
September 29, 2008 at 9:45 pm

Thanks for this write-up. I am just starting out with Cocoa and event-driven XML, but will be sure to give this a second read tomorrow.

Any particular reason you used the dot syntax?

Agustin
October 2, 2008 at 2:05 pm

@Marc:

Do you have the code please??

is giving me an error

yile lv
October 7, 2008 at 6:50 am

Thanks so much, I got the link from a friend, I will have a try~


© 2008 some rights reserved by codesofa