php – How to join two XML files with a matching node

Exception or error:

I need to find a way to join two XML files when they have a matching node. From what I gather, this could be accomplished in many different languages. Is there a PHP or an AJAX way to do this? From other posts on SO I see XSLT solutions that I don't really get. Is this the best/preferred method? If so, do you know of any helpful XSLT tutorials?

For example XML-1 is like :

<FOO>
    <A/>
    <B/>
    <C/>
    <D/>
</FOO>

and XML-2 :

<FOO>    
    <B/>
    <E/>
</FOO>

What would be the best approach for checking where <B>==<B> and then adding <E>?

Update

Well, I can't get this to work with my hypothetical example, so I thought I would update with what I am really doing to see if anyone can help me figure this out. I have tried the methods from below and others I have found on SO, with no luck.

The real schema is like :

file1.xml

<?xml version="1.0"?>
<DATA>
  <ITEM>
    <PRODUCT_TYPE>simple</PRODUCT_TYPE>
    <STYLE_COLOR>1524740007</STYLE_COLOR>
    <SHORT_DESCRIPTION>Black Shoe</SHORT_DESCRIPTION>
    <CLASS_NAME>FOOTWEAR</CLASS_NAME>
    <STATUS>Disabled</STATUS>
  </ITEM>
 ...
</DATA>

file2.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="merge.xsl" ?>
<DATA>
  <ITEM>
    <STYLE_COLOR>1524740007</STYLE_COLOR>
    <NEXT_ARRIVAL>2011-08-05</NEXT_ARRIVAL>
  </ITEM>
  ....
</DATA>

What I need to figure out is how to generate a new XML file that merges the nodes with identical STYLE_COLOR, so that it looks like:

<DATA>
  <ITEM>
    <PRODUCT_TYPE>simple</PRODUCT_TYPE>
    <STYLE_COLOR>1524740007</STYLE_COLOR>
    <SHORT_DESCRIPTION>Black Shoe</SHORT_DESCRIPTION>
    <CLASS_NAME>FOOTWEAR</CLASS_NAME>
    <NEXT_ARRIVAL>2011-08-05</NEXT_ARRIVAL>
    <STATUS>Disabled</STATUS>
  </ITEM>

I tried creating a merge.xsl that looks like :

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="ISO-8859-1" indent="yes" />
  <xsl:output indent="yes"/>
  <xsl:variable name="with" select="'file-2.xml'" />
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="scene">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
      <xsl:variable name="info" select="document($with)/DATA/ITEM[STYLE_COLOR=current()/STYLE_COLOR]/." />
      <xsl:for-each select="$info/*">
        <xsl:if test="name()!='STYLE_COLOR'">
          <xsl:copy-of select="." />
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:transform>

I also tried a merge like this:

<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:variable name="input2" select="document('file-2.xml')/DATA/ITEM"/>
    <xsl:template match="STYLE_COLOR">
        <xsl:copy>
            <xsl:apply-templates select="*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*">
        <xsl:choose>
            <xsl:when test="$input2/*[name()=name(current())]">
                <xsl:copy-of select="$input2/*"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:copy-of select="."/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet> 

Neither of these methods is working. Sorry, XSLT is very new to me, so I am not sure what I am doing and would really appreciate some hand-holding on this one.

How to solve:

This is the original transform, slightly modified to fit the new requirements. The merge is performed by checking against the elements of file2.xml. For the current ITEM in file1, a child element of the matching ITEM in file2 is merged only if the file1 ITEM does not already contain an element with that name.


[XSLT 1.0]

<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:variable name="input2" select="document('test_input2.xml')/DATA"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="ITEM">
        <xsl:variable name="item" select="
            $input2/ITEM[STYLE_COLOR=current()/STYLE_COLOR]"/>
        <xsl:variable name="ITEM" select="."/>

        <xsl:if test="$item">
            <xsl:copy>

                <xsl:for-each select="$item/*">
                    <xsl:if test="count($ITEM/*[name()=name(current())])=0">
                        <xsl:copy-of select="." />
                    </xsl:if>
                </xsl:for-each>

                <xsl:apply-templates select="*"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

</xsl:stylesheet> 

Applied on this input1.xml:

<DATA>
  <ITEM>
    <PRODUCT_TYPE>simple</PRODUCT_TYPE>
    <STYLE_COLOR>1524740007</STYLE_COLOR>
    <SHORT_DESCRIPTION>Black Shoe</SHORT_DESCRIPTION>
    <CLASS_NAME>FOOTWEAR</CLASS_NAME>
    <STATUS>Disabled</STATUS>
  </ITEM>
  <ITEM>
    <PRODUCT_TYPE>simple</PRODUCT_TYPE>
    <STYLE_COLOR>1524740008</STYLE_COLOR>
    <SHORT_DESCRIPTION>Black Shoe</SHORT_DESCRIPTION>
    <CLASS_NAME>FOOTWEAR</CLASS_NAME>
    <STATUS>Disabled</STATUS>
  </ITEM>
  <ITEM>
    <PRODUCT_TYPE>simple</PRODUCT_TYPE>
    <STYLE_COLOR>777</STYLE_COLOR>
    <SHORT_DESCRIPTION>Black Shoe</SHORT_DESCRIPTION>
    <CLASS_NAME>FOOTWEAR</CLASS_NAME>
    <STATUS>Disabled</STATUS>
  </ITEM>
</DATA>

and this input2.xml to merge (the stylesheet above loads it as test_input2.xml):

<DATA>
  <ITEM>
    <STYLE_COLOR>1524740007</STYLE_COLOR>
    <NEXT_ARRIVAL>2011-08-05</NEXT_ARRIVAL>
    <CLASS_NAME>XXX</CLASS_NAME>
    <OTHER>YYY</OTHER>
  </ITEM>
  <ITEM>
    <STYLE_COLOR>1524740008</STYLE_COLOR>
    <NEXT_ARRIVAL>2011-08-05</NEXT_ARRIVAL>
  </ITEM>
</DATA>

produces:

<DATA>
   <ITEM>
      <NEXT_ARRIVAL>2011-08-05</NEXT_ARRIVAL>
      <OTHER>YYY</OTHER>
      <PRODUCT_TYPE>simple</PRODUCT_TYPE>
      <STYLE_COLOR>1524740007</STYLE_COLOR>
      <SHORT_DESCRIPTION>Black Shoe</SHORT_DESCRIPTION>
      <CLASS_NAME>FOOTWEAR</CLASS_NAME>
      <STATUS>Disabled</STATUS>
   </ITEM>
   <ITEM>
      <NEXT_ARRIVAL>2011-08-05</NEXT_ARRIVAL>
      <PRODUCT_TYPE>simple</PRODUCT_TYPE>
      <STYLE_COLOR>1524740008</STYLE_COLOR>
      <SHORT_DESCRIPTION>Black Shoe</SHORT_DESCRIPTION>
      <CLASS_NAME>FOOTWEAR</CLASS_NAME>
      <STATUS>Disabled</STATUS>
   </ITEM>
</DATA>

Notice that:

  • the transform does not override existing elements of a given ITEM; it only copies the missing ones
  • an ITEM in input1.xml is copied to the output only if it has a match in input2.xml
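
For readers who find the XSLT hard to follow, the two rules above can be sketched procedurally. This is only an illustration in Python with the standard library's xml.etree.ElementTree, not the answer's method; the function name merge_missing and the trimmed-down sample data are made up for the example, and it assumes each STYLE_COLOR value occurs at most once in the second document.

```python
import copy
import xml.etree.ElementTree as ET

# Trimmed-down stand-ins for input1.xml and input2.xml (assumed sample data).
INPUT1 = """<DATA>
  <ITEM><STYLE_COLOR>1524740007</STYLE_COLOR><CLASS_NAME>FOOTWEAR</CLASS_NAME></ITEM>
  <ITEM><STYLE_COLOR>777</STYLE_COLOR><CLASS_NAME>FOOTWEAR</CLASS_NAME></ITEM>
</DATA>"""
INPUT2 = """<DATA>
  <ITEM><STYLE_COLOR>1524740007</STYLE_COLOR><NEXT_ARRIVAL>2011-08-05</NEXT_ARRIVAL><CLASS_NAME>XXX</CLASS_NAME></ITEM>
</DATA>"""


def merge_missing(xml1, xml2, key="STYLE_COLOR"):
    """Keep only ITEMs of xml1 that have a match in xml2; add the elements they lack."""
    root1 = ET.fromstring(xml1)
    root2 = ET.fromstring(xml2)
    # Index the second document's ITEMs by key value (assumes unique keys).
    by_key = {}
    for item in root2.findall("ITEM"):
        value = item.findtext(key)
        if value is not None:
            by_key[value] = item

    merged = ET.Element(root1.tag)
    for item in root1.findall("ITEM"):
        match = by_key.get(item.findtext(key))
        if match is None:
            continue  # no counterpart in the second document: ITEM is dropped
        out = ET.SubElement(merged, "ITEM")
        present = {child.tag for child in item}
        # Missing elements come first, mirroring the stylesheet's output order.
        for extra in match:
            if extra.tag not in present:
                out.append(copy.deepcopy(extra))
        # Existing elements are copied unchanged and are never overridden.
        for child in item:
            out.append(copy.deepcopy(child))
    return merged


merged = merge_missing(INPUT1, INPUT2)
print(ET.tostring(merged, encoding="unicode"))
```

Here the ITEM with STYLE_COLOR 777 disappears because it has no match, and CLASS_NAME keeps its file1 value because it already exists there.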

Answer:

This transformation (c:/temp/file1.xml and c:/temp/file2.xml are as provided in the question):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="pDoc1" select="'file:///c:/temp/file1.xml'"/>
 <xsl:variable name="vDoc1" select="document($pDoc1)"/>

 <xsl:param name="pDoc2" select="'file:///c:/temp/file2.xml'"/>
 <xsl:variable name="vDoc2" select="document($pDoc2)"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/">
  <xsl:apply-templates select="$vDoc1/node()"/>
 </xsl:template>

 <xsl:template match="ITEM">
  <ITEM>
   <xsl:apply-templates/>

   <xsl:apply-templates select=
     "$vDoc2/*/ITEM
               [STYLE_COLOR = current()/STYLE_COLOR]
                /node()[not(self::STYLE_COLOR)]
     "/>
  </ITEM>
 </xsl:template>
</xsl:stylesheet>

when applied on any XML document (not used / ignored), produces the wanted, correct result:

<DATA>
   <ITEM>
      <PRODUCT_TYPE>simple</PRODUCT_TYPE>
      <STYLE_COLOR>1524740007</STYLE_COLOR>
      <SHORT_DESCRIPTION>Black Shoe</SHORT_DESCRIPTION>
      <CLASS_NAME>FOOTWEAR</CLASS_NAME>
      <STATUS>Disabled</STATUS>
      <NEXT_ARRIVAL>2011-08-05</NEXT_ARRIVAL>
   </ITEM> ...
</DATA>

Explanation:

  1. First we process (apply templates to) the nodes of the first XML document.

  2. The identity rule/template copies every node as is.

  3. There is a single template that overrides the identity rule. It matches any element named ITEM and creates an element also named ITEM. It then processes all child nodes of the matched element, which the identity template copies. Finally, it applies templates to the children of every ITEM in the second XML document whose STYLE_COLOR child has the same string value as the STYLE_COLOR child of the current (matched) element, excluding the STYLE_COLOR child itself; the identity template copies these nodes as well.

  4. Do note that the file paths of the two documents are passed as parameters to the transformation, which makes it more flexible and able to work with any two XML files without any modification. We use the XSLT document() function to load and parse these two documents so that they can be processed by the XSLT transformation.

  5. Do note that no xsl:for-each or xsl:if or any other XSLT conditional instructions are used in this transformation. This results in simpler and easier to maintain and understand code.

  6. Finally, note the use and overriding of the identity rule/template. This is the most fundamental and powerful XSLT design pattern.
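
As a rough cross-check of the steps above, here is the same join written procedurally. It is a sketch in Python using the standard library's xml.etree.ElementTree, not a replacement for the stylesheet; join_items and the inline sample documents are hypothetical names and data chosen for the example. Like the stylesheet, it takes the two sources as parameters, keeps every ITEM of the first document, and appends the non-key children of all matching ITEMs from the second.

```python
import copy
import io
import xml.etree.ElementTree as ET


def join_items(source1, source2, key="STYLE_COLOR"):
    """Append to each ITEM of source1 the non-key children of matching ITEMs in source2.

    source1 and source2 may be file paths or file-like objects.
    """
    root1 = ET.parse(source1).getroot()
    root2 = ET.parse(source2).getroot()

    # Group the second document's ITEMs by key; a list per key keeps every match.
    index = {}
    for item in root2.findall("ITEM"):
        value = item.findtext(key)
        if value is not None:
            index.setdefault(value, []).append(item)

    for item in root1.findall("ITEM"):
        for match in index.get(item.findtext(key), []):
            for child in match:
                if child.tag != key:  # the key element itself is not duplicated
                    item.append(copy.deepcopy(child))
    return root1


# In-memory stand-ins for file1.xml and file2.xml (assumed sample data).
doc1 = io.StringIO(
    "<DATA><ITEM><STYLE_COLOR>1524740007</STYLE_COLOR>"
    "<STATUS>Disabled</STATUS></ITEM></DATA>"
)
doc2 = io.StringIO(
    "<DATA><ITEM><STYLE_COLOR>1524740007</STYLE_COLOR>"
    "<NEXT_ARRIVAL>2011-08-05</NEXT_ARRIVAL></ITEM></DATA>"
)
result = join_items(doc1, doc2)
print(ET.tostring(result, encoding="unicode"))
```

The dictionary lookup plays the role of the predicate [STYLE_COLOR = current()/STYLE_COLOR]; as in the stylesheet's output, the added elements land after the existing children of each ITEM.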

Answer:

XSLT is quite powerful; however, I must admit that I’m not that fluent with it, so I will give you a suggestion for a manual transformation:

print "<FOOCONTAINER>\n";
readfile($xml1file);
readfile($xml2file);
print "</FOOCONTAINER>\n";

Anything else can easily be accomplished in further processing, since the result is XML.

Edit: This works only for the first XML offered by the OP.
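
One likely reason for that limitation (an assumption, the answer does not say) is that the real files start with an <?xml ...?> declaration, which is only allowed at the very beginning of a document. A minimal Python sketch of the same wrapping idea that strips a leading declaration first; the wrap helper and the sample string are made up for the example, and the result is still a plain concatenation, not a join on STYLE_COLOR.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration at the start of a document. It does not match
# other processing instructions such as <?xml-stylesheet ...?>.
DECLARATION = re.compile(r"^\s*<\?xml\s[^>]*\?>\s*")


def wrap(*documents, container="FOOCONTAINER"):
    """Concatenate XML documents (as strings) inside one container element."""
    bodies = [DECLARATION.sub("", doc, count=1) for doc in documents]
    return "<{0}>\n{1}\n</{0}>".format(container, "\n".join(bodies))


with_decl = '<?xml version="1.0"?>\n<DATA><ITEM/></DATA>'

# Naive concatenation leaves the declaration in the middle of the document,
# which the parser rejects.
try:
    ET.fromstring("<FOOCONTAINER>" + with_decl + "</FOOCONTAINER>")
    naive_ok = True
except ET.ParseError:
    naive_ok = False

# With the declarations stripped, the combined document parses.
combined = ET.fromstring(wrap(with_decl, with_decl))
print(naive_ok, len(combined.findall("DATA")))
```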
