11.1 Parsing an XML Document

PHP 5's new SimpleXML module makes parsing an XML document, well, simple. It turns an XML document into an object that provides structured access to the XML.

To create a SimpleXML object from an XML document stored in a string, pass the string to simplexml_load_string( ). It returns a SimpleXML object. In Example 11-3, $channel holds XML that represents the <channel> part of an RSS 0.91 feed.

Example 11-3. Parsing XML in a string
$channel =<<<_XML_
<channel>
 <title>What's For Dinner</title>
 <link>http://menu.example.com/</link>
 <description>These are your choices of what to eat tonight.</description>
</channel>
_XML_;

$xml = simplexml_load_string($channel);
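
simplexml_load_string( ) returns false when the string is not well-formed XML, so it is worth checking the return value before you use the object. The following is a minimal sketch, assuming PHP 5.1 or later (for libxml_use_internal_errors( )); the $broken string is a made-up input with a deliberately misspelled closing tag.

```php
<?php
// Hypothetical input: the closing tag is misspelled, so this is not well-formed XML.
$broken = '<channel><title>What\'s For Dinner</title></chanel>';

// Collect parse errors in a list instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$xml = simplexml_load_string($broken);
$messages = array();

if ($xml === false) {
    // Gather the parser's messages, then empty the error list.
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    print "Could not parse XML: " . implode('; ', $messages) . "\n";
}
```

Comparing with === matters here: it distinguishes a failed parse (false) from a valid object that happens to evaluate as empty.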

The contents of XML elements are available as the data stored in the SimpleXML object. Example 11-4 prints some data inside the $xml object created in Example 11-3.

Example 11-4. Printing XML element contents
print "The $xml->title channel is available at $xml->link. ";
print "The description is \"$xml->description\"";

Example 11-4 prints:

The What's For Dinner channel is available at http://menu.example.com/. The 
description is "These are your choices of what to eat tonight."
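
Inside a double-quoted string, an element is converted to its text for you. Elsewhere, $xml->title is still an object, so a strict comparison against a string fails unless you cast it first. A short sketch of the difference, using a cut-down version of the channel from Example 11-3:

```php
<?php
$xml = simplexml_load_string('<channel><title>What\'s For Dinner</title></channel>');

// The element is a SimpleXMLElement object, not a string.
$is_object = ($xml->title instanceof SimpleXMLElement);

// Cast it to get the element's text as a real string.
$title = (string) $xml->title;

if ($title === "What's For Dinner") {
    print "The titles match\n";
}
```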

To descend into the hierarchy of XML elements, chain together the element names with arrows. Example 11-5 loads a full RSS feed into a SimpleXML object and prints channel information.

Example 11-5. Printing subelement contents
$menu=<<<_XML_
<?xml version="1.0" encoding="utf-8" ?>
<rss version="0.91">
 <channel>
  <title>What's For Dinner</title>
  <link>http://menu.example.com/</link>
  <description>These are your choices of what to eat tonight.</description>
  <item>
   <title>Braised Sea Cucumber</title>
   <link>http://menu.example.com/dishes.php?dish=cuke</link>
   <description>Gentle flavors of the sea that nourish and refresh you.</description>
  </item>
  <item>
   <title>Baked Giblets with Salt</title>
   <link>http://menu.example.com/dishes.php?dish=giblets</link>
   <description>Rich giblet flavor infused with salt and spice.</description>
  </item>
  <item>
   <title>Abalone with Marrow and Duck Feet</title>
   <link>http://menu.example.com/dishes.php?dish=abalone</link>
   <description>There's no mistaking the special pleasure of abalone.</description>
  </item>
 </channel>
</rss>
_XML_;

$xml = simplexml_load_string($menu);

print "The {$xml->channel->title} channel is available at {$xml->channel->link}. ";
print "The description is \"{$xml->channel->description}\"";

Example 11-5 prints the same text as Example 11-4. The curly braces are necessary around the element names so that the PHP interpreter can properly interpolate the values in the string.

Attributes of XML elements are treated like array indices. Example 11-6 uses the SimpleXML object created in Example 11-5 to access the version attribute of the <rss> tag.

Example 11-6. Printing XML element attributes
print 'This RSS feed is version ' . $xml['version'];

Example 11-6 prints:

This RSS feed is version 0.91
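
If you don't know in advance which attributes an element has, the attributes( ) method lets you loop over them, and isset( ) tells you whether a particular one is present. The sketch below uses a cut-down <rss> element; the encoding attribute it tests for is a hypothetical name chosen because it is absent.

```php
<?php
$xml = simplexml_load_string('<rss version="0.91"><channel/></rss>');

// Cast a single attribute to a string before comparing or storing it.
$version = (string) $xml['version'];

// attributes() yields each attribute name with its value.
$found = array();
foreach ($xml->attributes() as $name => $value) {
    $found[$name] = (string) $value;
    print "$name = $value\n";
}

// isset() reports whether an attribute exists on the element.
$has_encoding = isset($xml['encoding']);
```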

Because there are multiple <item> tags in the RSS feed, you need to use array index notation to access a particular item. The first is item[0]. Example 11-7 prints the title of each item.

Example 11-7. Accessing identically named elements
print "Title: " . $xml->channel->item[0]->title . "\n";
print "Title: " . $xml->channel->item[1]->title . "\n";
print "Title: " . $xml->channel->item[2]->title . "\n";

Example 11-7 prints:

Title: Braised Sea Cucumber
Title: Baked Giblets with Salt
Title: Abalone with Marrow and Duck Feet

You can treat the items as an array with a foreach( ) loop. Example 11-8 iterates through the items with foreach( ) to print the titles.

Example 11-8. Looping through identically named elements with foreach( )
foreach ($xml->channel->item as $item) {
    print "Title: " . $item->title . "\n";
}

Example 11-8 prints the same text as Example 11-7.
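
When you need to know how many identically named elements there are, or whether a given index exists, count( ) and isset( ) can help. This is a sketch that assumes PHP 5.3 or later, where SimpleXML objects can be passed to count( ); the two-item feed is a shortened stand-in for the menu.

```php
<?php
$xml = simplexml_load_string(
    '<rss><channel>'
    . '<item><title>Braised Sea Cucumber</title></item>'
    . '<item><title>Baked Giblets with Salt</title></item>'
    . '</channel></rss>'
);

// Count the <item> elements inside <channel>.
$total = count($xml->channel->item);
print "There are $total items\n";

// Check that an index exists before reading from it.
$has_third = isset($xml->channel->item[2]);
if (! $has_third) {
    print "There is no third item\n";
}
```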

In addition to groups of the same element (such as <item>), you can also use foreach( ) with any individual SimpleXML object. This is an easy way to iterate through all the children of a particular element. Example 11-9 prints all the children of the first <item> in the RSS feed.

Example 11-9. Looping through child elements with foreach( )
foreach ($xml->channel->item[0] as $element_name => $content) {
    print "The $element_name is $content\n";
}

Example 11-9 prints:

The title is Braised Sea Cucumber
The link is http://menu.example.com/dishes.php?dish=cuke
The description is Gentle flavors of the sea that nourish and refresh you.

Each time the PHP interpreter goes through the foreach( ) loop in Example 11-9, it sets $element_name to the name of a child element and $content to the text contents of that child element.
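
The same loop can be used to copy an element's children into an ordinary array, which is often easier to pass around than the SimpleXML object. The sketch below is one way to do it, assuming PHP 5.1.3 or later for getName( ); $fields is a name made up for this illustration.

```php
<?php
$item = simplexml_load_string(
    '<item>'
    . '<title>Braised Sea Cucumber</title>'
    . '<link>http://menu.example.com/dishes.php?dish=cuke</link>'
    . '</item>'
);

// Build an array of element name => text content.
$fields = array();
foreach ($item->children() as $child) {
    $fields[$child->getName()] = (string) $child;
}

print_r($fields);
```

If two children share a name, the later one overwrites the earlier one in $fields, so this approach suits elements whose children are uniquely named.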

To change an element or an attribute, assign a new value to it. Example 11-10 changes the version attribute of the <rss> tag, uppercases the title of the channel, and replaces the hostname in each item's <link>.

Example 11-10. Changing elements and attributes
$xml['version'] = '6.3';
$xml->channel->title = strtoupper($xml->channel->title);

for ($i = 0; $i < 3; $i++) {
    $xml->channel->item[$i]->link = str_replace('menu.example.com',
        'dinner.example.org', $xml->channel->item[$i]->link);
}
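
The for( ) loop in Example 11-10 assumes there are exactly three items. As a sketch of an alternative, a foreach( ) loop makes the same change without a hard-coded count, because each $item it produces refers to the element inside the document. The two-item feed here is a shortened stand-in for the menu.

```php
<?php
$xml = simplexml_load_string(
    '<rss><channel>'
    . '<item><link>http://menu.example.com/dishes.php?dish=cuke</link></item>'
    . '<item><link>http://menu.example.com/dishes.php?dish=giblets</link></item>'
    . '</channel></rss>'
);

// Assigning to $item->link changes the <link> inside $xml itself.
foreach ($xml->channel->item as $item) {
    $item->link = str_replace('menu.example.com', 'dinner.example.org',
        (string) $item->link);
}

$first = (string) $xml->channel->item[0]->link;
$second = (string) $xml->channel->item[1]->link;
print "$first\n$second\n";
```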

You've seen how to print individual parts of the SimpleXML object. To print everything in the object as an XML document, use the asXML( ) method. Example 11-11 prints the RSS document we've been working with after the modifications made in Example 11-10.

Example 11-11. Printing an entire XML document
print $xml->asXML( );

Example 11-11 prints:

<?xml version="1.0" encoding="utf-8"?>
<rss version="6.3">
 <channel>
  <title>WHAT'S FOR DINNER</title>
  <link>http://menu.example.com/</link>
  <description>These are your choices of what to eat tonight.</description>
  <item>
   <title>Braised Sea Cucumber</title>
   <link>http://dinner.example.org/dishes.php?dish=cuke</link>
   <description>Gentle flavors of the sea that nourish and refresh you.</description>
  </item>
  <item>
   <title>Baked Giblets with Salt</title>
   <link>http://dinner.example.org/dishes.php?dish=giblets</link>
   <description>Rich giblet flavor infused with salt and spice.</description>
  </item>
  <item>
   <title>Abalone with Marrow and Duck Feet</title>
   <link>http://dinner.example.org/dishes.php?dish=abalone</link>
   <description>There's no mistaking the special pleasure of abalone.</description>
  </item>
 </channel>
</rss>

As with sending a CSV file (as in Example 10-15), sending a page that consists only of XML back to a web client requires a special header. Example 11-12 shows how to call the header( ) function with the appropriate argument. For an XML document, you only need to specify a Content-Type with header( ). You don't need the second call to header( ) for Content-Disposition, as in Example 10-14.

Example 11-12. Changing the page type to XML
header('Content-Type: text/xml');

As with setcookie( ) and session_start( ), you must call header( ) before any output is sent (or you must use output buffering). Example 11-13 is a complete program that sends a header and then uses SimpleXML to load an XML document from a string, modify it, and print it.

Example 11-13. Sending an XML document to the web client
<?php
$menu=<<<_XML_
<?xml version="1.0" encoding="utf-8" ?>
<rss version="0.91">
 <channel>
  <title>What's For Dinner</title>
  <link>http://menu.example.com/</link>
  <description>These are your choices of what to eat tonight.</description>
  <item>
   <title>Braised Sea Cucumber</title>
   <link>http://menu.example.com/dishes.php?dish=cuke</link>
   <description>Gentle flavors of the sea that nourish and refresh you.</description>
  </item>
  <item>
   <title>Baked Giblets with Salt</title>
   <link>http://menu.example.com/dishes.php?dish=giblets</link>
   <description>Rich giblet flavor infused with salt and spice.</description>
  </item>
  <item>
   <title>Abalone with Marrow and Duck Feet</title>
   <link>http://menu.example.com/dishes.php?dish=abalone</link>
   <description>There's no mistaking the special pleasure of abalone.</description>
  </item>
 </channel>
</rss>
_XML_;

// Create the SimpleXML object
$xml = simplexml_load_string($menu);

// Modify the SimpleXML object
$xml['version'] = '6.3';
$xml->channel->title = strtoupper($xml->channel->title);

for ($i = 0; $i < 3; $i++) {
    $xml->channel->item[$i]->link = str_replace('menu.example.com',
        'dinner.example.org', $xml->channel->item[$i]->link);
}

// Send the XML document to the web client
header('Content-Type: text/xml');
print $xml->asXML( );
?>
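
If your program prints before it is ready to send the header, output buffering is one way to keep header( ) usable. The following sketch assumes a shortened, hypothetical feed; the headers_sent( ) check is a defensive extra, and $body is a name made up here to hold a copy of the buffered document.

```php
<?php
// Hold all output in a buffer instead of sending it right away.
ob_start();

$xml = simplexml_load_string(
    '<rss version="0.91"><channel><title>What\'s For Dinner</title></channel></rss>'
);
print $xml->asXML();

// The printed XML is still in the buffer, so header() can be called after print.
if (! headers_sent()) {
    header('Content-Type: text/xml');
}

// Keep a copy of the buffered document, then send it and stop buffering.
$body = ob_get_contents();
ob_end_flush();
```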

So far, the source and destination of your XML documents have been strings: simplexml_load_string( ) creates a SimpleXML object from a string, and asXML( ) returns a string representation of a SimpleXML object. However, you can also load XML documents from (and save them to) files.

To process an XML document that is in an existing file, create the SimpleXML object with simplexml_load_file( ) instead of simplexml_load_string( ). Pass the filename of the XML document to simplexml_load_file( ), and it returns a SimpleXML object populated with the XML elements from the document. Example 11-14 creates a SimpleXML object from the XML document in a file called menu.xml.

Example 11-14. Loading an XML document from a file
$xml = simplexml_load_file('menu.xml');

Once the SimpleXML object is created by simplexml_load_file( ), it behaves the same way as if it had been created with simplexml_load_string( ).
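
Like simplexml_load_string( ), simplexml_load_file( ) returns false if the file can't be read or parsed, so a check before use is prudent. The sketch below writes a small sample document to a temporary file so that it can run on its own; in your own program, $path would simply be the name of your XML file.

```php
<?php
// Create a sample file to load (stands in for an existing file such as menu.xml).
$path = tempnam(sys_get_temp_dir(), 'menu');
file_put_contents($path,
    '<rss version="0.91"><channel><title>What\'s For Dinner</title></channel></rss>');

// Collect parse errors quietly rather than emitting warnings.
libxml_use_internal_errors(true);

$title = null;
$xml = is_readable($path) ? simplexml_load_file($path) : false;

if ($xml === false) {
    print "Could not load $path\n";
} else {
    $title = (string) $xml->channel->title;
    print "Loaded the $title channel\n";
}

// Remove the sample file.
unlink($path);
```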

If you want to parse an XML document located on a remote web server, you can still use simplexml_load_file( ). Just pass the URL of the XML document to simplexml_load_file( ). The function retrieves the remote page and puts it into a SimpleXML object. Example 11-15 prints an HTML list of item titles from the Yahoo! News "Oddly Enough" RSS feed.

Example 11-15. Loading a remote XML document
$xml = simplexml_load_file('http://rss.news.yahoo.com/rss/oddlyenough');

print "<ul>\n";
foreach ($xml->channel->item as $item) {
    print "<li>$item->title</li>\n";
}
print "</ul>";

The content of the Yahoo! News feed is always changing, but Example 11-15 prints something like:

<ul>
<li>Apologetic Arkansas Peeping Tom Leaves Cash, Note (Reuters)</li>
<li>She Closed Airport to Avoid Vacation with Boyfriend (Reuters)</li>
<li>'First' Pet Cat Found in Tomb (Reuters)</li>
<li>Eeeyew!!!! (Reuters)</li>
<li>Cross-Dressing Heats Up Republican Race (Reuters)</li>
<li>Authorities Finally Catch Rampaging Pig (AP)</li>
<li>"First" pet cat found in Cypriot tomb (Reuters)</li>
<li>9-Year-Old Girl Arrested for Rabbit Theft (AP)</li>
<li>Prostitutes Charge NATO Troops More (AP)</li>
<li>Police Track Down Elusive Fugitive Pig (AP)</li>
<li>No sex please -- we're giant pandas (Reuters)</li>
<li>Bored? Try Molvania, birthplace of whooping cough (Reuters)</li>
<li>Fat German hamster triggers police rescue (Reuters)</li>
</ul>
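
Titles from a feed you don't control can contain characters such as & or < that have special meaning in HTML. One cautious approach is to pass each title through htmlspecialchars( ) before printing it. The sketch below uses a made-up local feed instead of a remote URL so that it runs without a network connection; the item titles are hypothetical.

```php
<?php
$feed = <<<_XML_
<rss version="0.91">
 <channel>
  <item><title>Salt &amp; Pepper Squid</title></item>
  <item><title>Soup &lt;of the day&gt;</title></item>
 </channel>
</rss>
_XML_;

$xml = simplexml_load_string($feed);

// Build the list, escaping each title for safe display in HTML.
$html = "<ul>\n";
foreach ($xml->channel->item as $item) {
    $html .= '<li>' . htmlspecialchars((string) $item->title) . "</li>\n";
}
$html .= "</ul>";

print $html;
```

The cast to a string gives the decoded text of the title (for example, "Salt & Pepper Squid"), and htmlspecialchars( ) then re-encodes it for the HTML page.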

You can also save the XML document that asXML( ) generates directly to a file by passing a filename to asXML( ). Example 11-16 retrieves the Yahoo! News "Oddly Enough" feed and saves it to the file odd.xml.

Example 11-16. Saving an XML document to a file
$xml = simplexml_load_file('http://rss.news.yahoo.com/rss/oddlyenough');
$xml->asXML('odd.xml');
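
When you pass a filename, asXML( ) returns true if the file was written and false otherwise, so you can check whether the save worked. The sketch below saves a small, hypothetical document to a temporary file and loads it back to confirm the contents; "Odd News" is a made-up title.

```php
<?php
$xml = simplexml_load_string(
    '<rss version="0.91"><channel><title>Odd News</title></channel></rss>'
);

// Save to a temporary file (stands in for a name such as odd.xml).
$path = tempnam(sys_get_temp_dir(), 'odd');
$saved = $xml->asXML($path);

if ($saved) {
    print "Saved the feed to $path\n";
} else {
    print "Could not write $path\n";
}

// Load the file back to confirm what was written, then clean up.
$reloaded = simplexml_load_file($path);
$title = (string) $reloaded->channel->title;
unlink($path);
```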
