Dive Into Python/Introducing dialect.py: Differences between revisions

Piotr (talk | contribs)
m No edit summary
Line 13:
== Introducing dialect.py ==
 
Dialectizer is a simple (and silly) descendant of BaseHTMLProcessor. It runs blocks of text through a series of substitutions, but it makes sure that anything within a <nowiki><pre>...</pre></nowiki> block passes through unaltered.
 
To handle the <nowiki><pre></nowiki> blocks, you define two methods in Dialectizer: start_pre and end_pre.
 
'''Example 8.17. Handling specific tags'''
Line 27:
self.verbatim -= 1 #(6)
 
# start_pre is called every time SGMLParser finds a <nowiki><pre></nowiki> tag in the HTML source. (In a minute, you'll see exactly how this happens.) The method takes a single parameter, attrs, which contains the attributes of the tag (if any). attrs is a list of key/value tuples, the same format that unknown_starttag takes.
# In the reset method, you initialize a data attribute that serves as a counter for <nowiki><pre></nowiki> tags. Every time you hit a <nowiki><pre></nowiki> tag, you increment the counter; every time you hit a <nowiki></pre></nowiki> tag, you decrement the counter. (You could just use this as a flag and set it to 1 and reset it to 0, but it's just as easy to do it this way, and this handles the odd (but possible) case of nested <nowiki><pre></nowiki> tags.) In a minute, you'll see how this counter is put to good use.
# That's it; that's the only special processing you do for <nowiki><pre></nowiki> tags. Now you pass the list of attributes along to unknown_starttag so it can do the default processing.
# end_pre is called every time SGMLParser finds a <nowiki></pre></nowiki> tag. Since end tags cannot contain attributes, the method takes no parameters.
# First, you want to do the default processing, just like any other end tag.
# Second, you decrement your counter to signal that this <nowiki><pre></nowiki> block has been closed.
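
The counter logic described in the notes above can be tried out without SGMLParser. The following is a minimal sketch, not the code from dialect.py: MiniProcessor is a hypothetical stand-in for BaseHTMLProcessor that only records tags, and PreCounter is an invented name for the part of Dialectizer that tracks <nowiki><pre></nowiki> nesting.

```python
class MiniProcessor:
    """Hypothetical stand-in for BaseHTMLProcessor: it only records tags."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.pieces = []

    def unknown_starttag(self, tag, attrs):
        # Default processing: rebuild the start tag from its attributes.
        strattrs = "".join(' %s="%s"' % (key, value) for key, value in attrs)
        self.pieces.append("<%s%s>" % (tag, strattrs))

    def unknown_endtag(self, tag):
        # Default processing: rebuild the end tag.
        self.pieces.append("</%s>" % tag)


class PreCounter(MiniProcessor):
    """Invented name; mirrors the reset/start_pre/end_pre trio described above."""

    def reset(self):
        self.verbatim = 0  # number of <pre> blocks currently open
        MiniProcessor.reset(self)

    def start_pre(self, attrs):
        self.verbatim += 1
        self.unknown_starttag("pre", attrs)

    def end_pre(self):
        self.unknown_endtag("pre")
        self.verbatim -= 1


counter = PreCounter()
counter.start_pre([("class", "code")])
counter.start_pre([])  # a nested <pre>, the odd but possible case
depth_when_nested = counter.verbatim
counter.end_pre()
counter.end_pre()
```

Because verbatim is a counter rather than a flag, closing the inner block leaves it at 1, so text in the rest of the outer block would still be treated as verbatim.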
 
At this point, it's worth digging a little further into SGMLParser. I've claimed repeatedly (and you've taken it on faith so far) that SGMLParser looks for and calls specific methods for each tag, if they exist. For instance, you just saw the definition of start_pre and end_pre to handle <nowiki><pre></nowiki> and <nowiki></pre></nowiki>. But how does this happen? Well, it's not magic, it's just good Python coding.
'''Example 8.18. SGMLParser'''
 
Line 66:
# start_xxx and do_xxx methods are not called directly; the tag, method, and attributes are passed to this function, handle_starttag, so that descendants can override it and change the way all start tags are dispatched. You don't need that level of control, so you just let this method do its thing, which is to call the method (start_xxx or do_xxx) with the list of attributes. Remember, method is a function, returned from getattr, and functions are objects. (I know you're getting tired of hearing it, and I promise I'll stop saying it as soon as I run out of ways to use it to my advantage.) Here, the function object is passed into this dispatch method as an argument, and this method turns around and calls the function. At this point, you don't need to know what the function is, what it's named, or where it's defined; the only thing you need to know about the function is that it is called with one argument, attrs.
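
The getattr-based dispatch described above can be modeled in a few lines. This is a heavily simplified sketch under stated assumptions, not SGMLParser's actual source: MiniDispatcher and its log attribute are invented for illustration, and finish_starttag is assumed here as the name of the method that performs the lookup.

```python
class MiniDispatcher:
    """Hypothetical, heavily simplified model of start-tag dispatch."""

    def __init__(self):
        self.log = []

    def finish_starttag(self, tag, attrs):
        # Look for start_<tag>, then do_<tag>; fall back to unknown_starttag.
        method = getattr(self, "start_" + tag, None)
        if method is None:
            method = getattr(self, "do_" + tag, None)
        if method is None:
            self.unknown_starttag(tag, attrs)
            return
        self.handle_starttag(tag, method, attrs)

    def handle_starttag(self, tag, method, attrs):
        # method is a bound method object returned by getattr;
        # all this function needs to know is that it takes one argument.
        method(attrs)

    def unknown_starttag(self, tag, attrs):
        self.log.append(("unknown", tag, attrs))

    def start_pre(self, attrs):
        self.log.append(("start_pre", attrs))

    def do_br(self, attrs):
        self.log.append(("do_br", attrs))


dispatcher = MiniDispatcher()
dispatcher.finish_starttag("pre", [("class", "code")])  # found as start_pre
dispatcher.finish_starttag("br", [])                    # found as do_br
dispatcher.finish_starttag("table", [])                 # no handler defined
```

A descendant that wanted to change how every start tag is dispatched would override handle_starttag alone and leave the lookup untouched.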
 
Now back to our regularly scheduled program: Dialectizer. When you left, you were in the process of defining specific handler methods for <nowiki><pre></nowiki> and <nowiki></pre></nowiki> tags. There's only one thing left to do, and that is to process text blocks with the pre-defined substitutions. For that, you need to override the handle_data method.
'''Example 8.19. Overriding the handle_data method'''
 
Line 73:
 
# handle_data is called with only one argument, the text to process.
# In the ancestor BaseHTMLProcessor, the handle_data method simply appended the text to the output buffer, self.pieces. Here the logic is only slightly more complicated. If you're in the middle of a <nowiki><pre>...</pre></nowiki> block, self.verbatim will be some value greater than 0, and you want to put the text in the output buffer unaltered. Otherwise, you will call a separate method to process the substitutions, then put the result of that into the output buffer. In Python, this is a one-liner, using the and-or trick.
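
The verbatim check described above can be sketched as follows. MiniDialectizer is a hypothetical stand-in, and its process method is an invented placeholder (it just upper-cases the text) rather than one of dialect.py's real substitutions. The second method shows the same decision written as a conditional expression, which is available in Python 2.5 and later.

```python
class MiniDialectizer:
    """Hypothetical stand-in; process() is an invented placeholder."""

    def __init__(self):
        self.pieces = []
        self.verbatim = 0

    def process(self, text):
        # Placeholder substitution, not one of dialect.py's real rules.
        return text.upper()

    def handle_data(self, text):
        # The and-or trick: keep the text as is when the counter is
        # positive, otherwise run it through process().
        self.pieces.append(self.verbatim and text or self.process(text))

    def handle_data_conditional(self, text):
        # The same decision written as a conditional expression.
        self.pieces.append(text if self.verbatim else self.process(text))


dialectizer = MiniDialectizer()
dialectizer.handle_data("hello ")                  # outside <pre>: processed
dialectizer.verbatim = 1                           # pretend a <pre> was opened
dialectizer.handle_data("keep me")                 # inside <pre>: unaltered
dialectizer.handle_data_conditional("as is")       # inside <pre>: unaltered
```

One caveat of the and-or form: when text is an empty string, the expression falls through to process(text) even inside a verbatim block. That is harmless as long as processing an empty string yields an empty string, and the conditional expression avoids the question entirely.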
 
You're close to completely understanding Dialectizer. The only missing link is the nature of the text substitutions themselves. If you know any Perl, you know that when complex text substitutions are required, the only real solution is regular expressions. The classes later in dialect.py define a series of regular expressions that operate on the text between the HTML tags. But you just had a whole chapter on regular expressions. You don't really want to slog through regular expressions again, do you? God knows I don't. I think you've learned enough for one chapter.
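
For readers who want the general shape without the slog, here is a minimal sketch of applying a series of regular-expression substitutions to a block of text. The rules in SUBS are hypothetical and chosen only for illustration; they are not taken from dialect.py, and process is an assumed name for the method that applies them.

```python
import re

# Hypothetical rules for illustration; they are not taken from dialect.py.
SUBS = (
    (r"\bhello\b", "ahoy"),
    (r"ing\b", "in'"),
)


def process(text, subs=SUBS):
    # Apply each (pattern, replacement) pair in order.
    for pattern, replacement in subs:
        text = re.sub(pattern, replacement, text)
    return text


result = process("hello, nothing doing")  # "ahoy, nothin' doin'"
```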