User:Snowolf/Help:How Pywikipdiabot works

From Botwiki

Python Wikipedia Bot, also known as Pywikipediabot (or sometimes Pywikipedia), is thought to be the most widely used bot engine for MediaWiki-based wikis. The bot is written in Python, an easy-to-learn, cross-platform, object-oriented programming language.

The internal structure of Pywikipediabot is more or less like an engine with arms attached to it. Core scripts (like wikipedia.py) define a set of classes, which the other scripts use to carry out repetitive tasks on the target wiki. Each of these scripts (the arms) consists of a module that imports some of the classes defined in the core scripts; this gives all of them a similar way of interacting with the user and an easy way to access the wiki and make changes to it.

To understand how a typical Pywikipediabot script works, we start with a simple module: get.py.

You may need a working copy of Pywikipediabot on your system to follow the rest of this page. You also need a basic knowledge of Python.

Running module

As mentioned above, Pywikipediabot scripts are written as modules. Each module imports some of the other modules in order to use the classes and methods defined in them. Each module also defines some functions of its own, and possibly some classes and methods.

get.py starts with some comments about the script, including the licensing information. After that, you will see a line which looks like this:

__version__='$Id: get.py 3327 2007-02-28 05:00:11Z wikipedian $'

This line holds versioning information; as it doesn't affect the way get.py works, we skip it for now. You can read more about it on Help:Subversion.

The next line of code imports the wikipedia module. That module is described in more detail elsewhere; we will review some of its major parts later on.

The next line of code defines a function named main(), like this:

def main():
    singlePageTitleParts = []
    for arg in wikipedia.handleArgs():
        singlePageTitleParts.append(arg)
 
    pageTitle = " ".join(singlePageTitleParts)
    page = wikipedia.Page(wikipedia.getSite(), pageTitle)
 
    # TODO: catch exceptions
    wikipedia.output(page.get(), toStdout = True)

As you can see, a list named singlePageTitleParts is defined first. Then every argument passed on the command line is appended to the list. Finally, a string named pageTitle is created by joining the elements of the list with a space between them. This is needed because, when the user calls the script like this:

python get.py Main Page

Main and Page are interpreted as two separate arguments; joining them gives pageTitle the value Main Page, which is exactly the page name entered by the user.
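The joining step can be tried on its own, without a wiki connection. The sketch below is only an illustration: build_page_title is a hypothetical helper (it is not part of Pywikipediabot) that receives the argument list directly instead of calling wikipedia.handleArgs().

```python
def build_page_title(args):
    """Join separate command-line words into one page title."""
    single_page_title_parts = []
    for arg in args:
        single_page_title_parts.append(arg)
    # A single space is placed between consecutive words.
    return " ".join(single_page_title_parts)


# The shell splits "python get.py Main Page" into two arguments.
title = build_page_title(["Main", "Page"])
print(title)  # prints: Main Page
```

Quoting the title on the command line ("Main Page") would deliver it as a single argument, and the same code would still produce the same result.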

On the next line of main(), an object named page is created. Like the call to wikipedia.handleArgs() before it, this line relies on the imported wikipedia module: page is an instance of wikipedia.Page. Let's have a look at the wikipedia.py file.

Page object

As you will notice, the wikipedia.py file is well documented. Near the beginning of the file, you find documentation for the Page class, which is used in the get.py module. The __init__ function is the constructor of the Page class. Put simply, this line of code in the get.py module:

page = wikipedia.Page(wikipedia.getSite(), pageTitle)

calls the __init__ function defined in the Page class of the wikipedia module, and passes the two parameters to it. As __init__ is defined like this:

def __init__(self, site, title, insite = None, defaultNamespace = 0) ...

the first parameter is a self-reference, the second and third parameters are mandatory (they must be passed when a new instance of Page is created), and the last two are optional.

In the rest of its code, __init__ processes the given title and stores the resulting information in the properties of the instantiated Page object, whose methods then become available. The page text itself is fetched from the wiki later, when a method such as get() is called.
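To see how the mandatory and optional parameters of such a constructor behave, here is a minimal sketch. The Page class below is a hypothetical stand-in that copies only the signature shown above; it does not contact any wiki, and the attribute names and the "example-site" value are made up for illustration.

```python
class Page:
    """Hypothetical stand-in for wikipedia.Page; it only records its arguments."""

    def __init__(self, site, title, insite=None, defaultNamespace=0):
        # self is supplied by Python; site and title must come from the caller.
        self.site = site
        self.title = title
        # insite and defaultNamespace fall back to their defaults when omitted.
        self.insite = insite
        self.defaultNamespace = defaultNamespace


# Only the two mandatory arguments, as in get.py.
page = Page("example-site", "Main Page")

# An optional parameter can be set by name.
other = Page("example-site", "Main Page", defaultNamespace=10)

try:
    Page("example-site")  # title is missing
except TypeError as error:
    print("TypeError:", error)
```

Leaving out a mandatory argument fails immediately with a TypeError, while omitted optional arguments silently take their default values.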

Making an output

The next line of code (after the TODO comment) calls another function defined in the wikipedia module to produce output. The output function is defined like this:

def output(text, decoder = None, newline = True, toStdout = False) ...

In get.py, page.get() is called to supply the value of the mandatory text parameter, and one of the optional parameters (toStdout) is set to True. According to the documentation of the output function, this ensures the output is sent directly to the user's standard output, so that it can be piped to another process.
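The effect of toStdout can be illustrated with a small stand-in. The function below is a hypothetical simplification, not the real wikipedia.output: the decoder parameter is left out, and it assumes that text which is not sent to standard output goes to standard error instead.

```python
import sys


def output(text, newline=True, toStdout=False):
    """Hypothetical stand-in for wikipedia.output (decoder is omitted)."""
    # Assumption: anything not sent to standard output goes to standard error.
    stream = sys.stdout if toStdout else sys.stderr
    stream.write(text + ("\n" if newline else ""))


output("Fetching page...")                # status message, kept out of the pipe
output("Page text here", toStdout=True)   # this is what a pipe would receive
```

With this split, a command such as python get.py Main Page > page.txt would save only the page text, while status messages stay visible in the terminal.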

try .. except .. else

The rest of the code is the part that runs first when you call the script. It starts with a conditional statement:

if __name__ == "__main__" ...

This check ensures that the code below it runs only when the script is called directly by the user, and not when it is imported by another module. The try .. except .. else clause first tries to run the main() function. If main() runs successfully, execution continues at the else branch, where the stopme() function of the wikipedia module is called. This removes the bot from the list of running processes, so that it no longer slows down other bot threads. If the try fails (i.e. an error occurs while running main()), the except part is run instead, which likewise removes the bot from the list of running processes, silently.
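The control flow described above can be sketched without Pywikipediabot installed. Everything below is hypothetical scaffolding: stopme only records that it was called, run wraps the try .. except .. else clause so that it can be exercised twice, and "silently" is read here as "without reporting the error", which may differ from what the real script does.

```python
calls = []


def stopme():
    """Hypothetical stand-in for wikipedia.stopme(); it only records the call."""
    calls.append("stopme")


def run(main):
    """Run main() and deregister the bot whether or not it succeeds."""
    try:
        main()
    except Exception:
        # main() failed: deregister the bot without printing anything.
        stopme()
    else:
        # main() finished normally: deregister the bot.
        stopme()


def failing_main():
    raise ValueError("simulated failure")


if __name__ == "__main__":
    run(lambda: None)   # success path: the else branch runs
    run(failing_main)   # failure path: the except branch runs
    print(calls)
```

In both cases stopme() is called exactly once, which is the point of the construct: the bot is always removed from the list of running processes.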

Where to go from here?

Now that you have been introduced to the basic workflow of a Pywikipediabot module, the next step is to learn more about the core scripts. It is also a good idea to read the code of other available modules (such as extract_wikilinks.py, catall.py, templatecount.py, etc.) and learn from them.
