Lemmatizing English words in XML-formatted sentences with Python

Python 2.7, NLTK 3.0
The input XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<sentences version="1.0">
<item id="1" asks-for="cause" most-plausible-alternative="1">
<p>my body cast a shadow over the grass . </p>
<a1>the sun be rise . </a1>
<a2>the grass be cut . </a2>
</item>


<item id="2" asks-for="cause" most-plausible-alternative="1">
<p>the woman tolerate the woman friend 's difficult behavior . </p>
<a1>the woman know the woman friend be go through a hard time . </a1>
<a2>the woman felt that the woman friend take advantage of her kindness . </a2>
</item>

...

</sentences>
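
To check this layout before lemmatizing anything, a minimal sketch like the one below walks the items and collects their attributes and sentences. It parses an inline string instead of input.xml; the sample is trimmed from the file above, and the reading of p as the premise and a1/a2 as the two alternatives is an assumption based on the attribute names.

```python
import xml.etree.ElementTree as ET

# Inline sample trimmed from the input shown above (assumption: the real
# file follows the same <item>/<p>/<a1>/<a2> layout throughout).
SAMPLE = """<sentences version="1.0">
<item id="1" asks-for="cause" most-plausible-alternative="1">
<p>my body cast a shadow over the grass . </p>
<a1>the sun be rise . </a1>
<a2>the grass be cut . </a2>
</item>
</sentences>"""

root = ET.fromstring(SAMPLE)
rows = []
for item in root:
    # Attributes come back as plain strings, including numeric-looking ones.
    item_id = item.get("id")
    asks_for = item.get("asks-for")
    # Each child (p, a1, a2) carries one sentence; p is presumably the
    # premise and a1/a2 the alternatives (assumption).
    sentences = [(child.tag, child.text.strip()) for child in item]
    rows.append((item_id, asks_for, sentences))
    print(item_id, asks_for, sentences)
```

Each entry in rows pairs an item's id and asks-for value with its (tag, text) sentences, which makes it easy to confirm that every item has the three expected children.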

Python Code

#This setting is only necessary if you get an error about 'encoding utf-8'
import sys
reload(sys)
sys.setdefaultencoding("utf-8")


import xml.etree.cElementTree as ET #library for XML processing
from nltk.tokenize import word_tokenize #library for word tokenize
from nltk.stem import WordNetLemmatizer #library for word lemmatize

wordnet_lemmatizer = WordNetLemmatizer()
tree = ET.parse('input.xml') #parse the XML tree from input.xml

root = tree.getroot() #get root element of the tree
for item_of_root in root: #for each item
    for sentence in item_of_root: #for each sentence in the item
        words = word_tokenize(sentence.text) #split the sentence into words
        sentenceNew = "" #container for the new lemmatized sentence
        for word in words: #for each word in the sentence
            lamWord = wordnet_lemmatizer.lemmatize(word, pos='v') #lemmatize the word as a verb
            sentenceNew += lamWord + ' ' #append the lemmatized word to the container
        sentence.text = sentenceNew #store the new sentence in the tree

tree.write('output.xml') #output the lemmatized tree to a file
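
The same traversal can be tried without NLTK or its WordNet data by plugging in a stand-in lemmatizer. In the sketch below, toy_lemmatize, TOY_LEMMAS, and lemmatize_tree are hypothetical names made up for illustration and are not part of NLTK. str.split replaces word_tokenize, which is only reasonable here because the input sentences are already space-separated tokens.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table standing in for
# WordNetLemmatizer.lemmatize(word, pos='v'); not real WordNet data.
TOY_LEMMAS = {"felt": "feel", "knew": "know", "took": "take"}

def toy_lemmatize(word):
    """Return the lemma from the toy table, or the word unchanged."""
    return TOY_LEMMAS.get(word, word)

def lemmatize_tree(root, lemmatize):
    """Rewrite the text of every sentence element in place."""
    for item in root:
        for sentence in item:
            # Guard against empty elements, whose .text is None.
            words = (sentence.text or "").split()
            # join avoids the trailing space left by += word + ' '.
            sentence.text = " ".join(lemmatize(w) for w in words)

root = ET.fromstring(
    "<sentences><item id='2'>"
    "<a2>the woman felt that the woman friend take advantage . </a2>"
    "</item></sentences>"
)
lemmatize_tree(root, toy_lemmatize)
result = root.find("item/a2").text
print(result)
```

To run it against the real lemmatizer, pass `lambda w: wordnet_lemmatizer.lemmatize(w, pos='v')` as the second argument. Keep in mind that pos='v' treats every token as a verb, so the result depends on how each word is listed as a verb in WordNet.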