Python:Page list (en)

From Botwiki

Most bots accept a -file parameter, which reads a text file and acts on the pages listed in it. On a small wiki, it can be useful to dump a text list of all pages, edit it by hand (or with a text editor's search and replace), and then pass that file to your bot.
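Because the list printed by the script on this page has one [[Title]] link per line, reading it back amounts to pulling out the link targets. A minimal Python 3 sketch, assuming that format; read_page_list is a hypothetical helper for illustration, not pywikipedia's own -file reader:

```python
import re

# Matches [[Title]] or [[Title|label]] and captures Title.
# Hypothetical helper for illustration, not pywikipedia's own -file parser.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def read_page_list(lines):
    """Return the page titles linked in an iterable of text lines."""
    titles = []
    for line in lines:
        for match in LINK_RE.finditer(line):
            title = match.group(1).strip()
            if title:  # ignore empty links such as [[ ]]
                titles.append(title)
    return titles


if __name__ == "__main__":
    sample = ["[[Main Page]]", "[[Sandbox]]", "", "[[Help:Contents|help]]"]
    print(read_page_list(sample))
```

Blank lines and any text outside the brackets are ignored, so notes added while editing the list by hand do not turn into page titles.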

This script prints the title of every page in the main namespace, one wiki link per line.

#!/usr/bin/python
# -*- coding: utf-8  -*-
"""
Print a flat text list of page titles as wiki links,
for use with other bots' -file option.
"""
import wikipedia
import urllib
import re
 
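def listpages(self, start = '!', namespace = 0, throttle = True):
        """Yield the title of every page in `namespace`, starting at `start`.
            `self` is a wikipedia.Site. This is a modified copy of the function
            in wikipedia.py that returns plain text instead of Page objects;
            `throttle` is accepted but ignored."""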
        while True:
            # encode Non-ASCII characters in hexadecimal format (e.g. %F6)
            start = start.encode(self.encoding())
            start = urllib.quote(start)
            # load a list which contains a series of article names (always 480)
            path = self.allpages_address(start, namespace)
            print 'Retrieving Allpages special page for %s from %s, namespace %i' % (repr(self), start, namespace)
            returned_html = self.getUrl(path)
            # In 1.4, another table was added above the navigational links.
            # Compare the minor version as a number: as strings, "1.10"
            # would sort before "1.4".
            minor = int(re.match(r'1\.(\d+)', self.version()).group(1))
            # Try to find begin and end markers
            try:
                if minor < 4:
                    begin_s = '<table'
                    end_s = '</table'
                else:
                    begin_s = '</table><hr /><table'
                    end_s = '</table'
                ibegin = returned_html.index(begin_s)
                iend = returned_html.index(end_s,ibegin + 3)
            except ValueError:
                raise wikipedia.ServerError('Couldn\'t extract allpages special page. Make sure you\'re using the MonoBook skin.')
            # remove the irrelevant sections
            returned_html = returned_html[ibegin:iend]
            if self.version()=="1.2":
                R = re.compile('/wiki/(.*?)" *class=[\'\"]printable')
            else:
                R = re.compile('title ?="(.*?)"')
            # Count the number of useful links on this page
            n = 0
            for hit in R.findall(returned_html):
                # count how many articles we found on the current page
                n = n + 1
 
                yield hit
 
                # Save the last hit so that we know where to continue once we
                # have finished all articles on the current page. Append a '!'
                # so that we don't yield a page twice.
                start = wikipedia.Page(self,hit).titleWithoutNamespace() + '!'
            # A small shortcut: if fewer than 100 pages are listed on this
            # page, there is certainly no next page. 480 would probably do as
            # well, but better safe than sorry.
            if n < 100:
                break
 
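try:
    # Arguments not handled by wikipedia.handleArgs() are joined into the
    # title to start listing from; "-test" is accepted but has no effect.
    start = []
    test = False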
    for arg in wikipedia.handleArgs():
        if arg.startswith("-test"):
            test = True
        else:
            start.append(arg)
    if start:
        start = " ".join(start)
    else:
        start = "!"
    mysite = wikipedia.getSite()
 
    # Print each title as a wiki link, starting from the requested title
    for page in listpages(mysite, start):
        print "[[%s]]" % page
 
finally:
    wikipedia.stopme()
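The paging logic of listpages() can be reproduced without pywikipedia. The following Python 3 sketch is illustrative only: fetch_html, demo_fetch and DEMO_TITLES are hypothetical stand-ins for the wiki, the sketch stops as soon as a listing is shorter than page_size (the script instead uses a fixed threshold of 100), and it decodes HTML entities in the title attributes with html.unescape. It keeps the script's trick of resuming from the last title plus '!', which assumes no title character sorts below '!' (MediaWiki compares titles with underscores in place of spaces).

```python
import html
import re

# The pattern the script uses for MediaWiki versions after 1.2.
TITLE_RE = re.compile(r'title ?="(.*?)"')


def extract_titles(page_html, begin_marker="<table", end_marker="</table"):
    """Return the decoded title="..." values between the two markers."""
    try:
        ibegin = page_html.index(begin_marker)
        iend = page_html.index(end_marker, ibegin + len(begin_marker))
    except ValueError:
        raise ValueError("could not find the page listing in the HTML") from None
    return [html.unescape(t) for t in TITLE_RE.findall(page_html[ibegin:iend])]


def list_titles(fetch_html, page_size, start="!"):
    """Yield every title, requesting one listing page at a time.

    fetch_html(start) is a hypothetical callable returning the HTML of the
    listing that begins at `start`; it stands in for getUrl() in the script.
    """
    while True:
        titles = extract_titles(fetch_html(start))
        for title in titles:
            yield title
        # A listing shorter than a full page means we have reached the end.
        if len(titles) < page_size:
            break
        # Resume just after the last title: listings include `start` itself,
        # so appending "!" keeps the last title from being yielded twice.
        start = titles[-1] + "!"


# A stand-in for the wiki, for demonstration only: it serves `page_size`
# titles at a time from a sorted list, preceded by navigation HTML that the
# marker slicing is meant to skip.
DEMO_TITLES = sorted(["Alpha", "Beta", "A&B", "Gamma", "Delta_Force", "Epsilon"])


def demo_fetch(start, page_size=2):
    batch = [t for t in DEMO_TITLES if t >= start][:page_size]
    rows = "".join(
        '<tr><td><a href="/wiki/x" title="%s">x</a></td></tr>' % html.escape(t)
        for t in batch
    )
    return '<div title="navigation">...</div><table>' + rows + "</table>"


if __name__ == "__main__":
    for title in list_titles(demo_fetch, page_size=2):
        print("[[%s]]" % title)
```

Slicing between the markers before applying the regex is what keeps the title attribute of the navigation div out of the results, just as the script discards everything outside the Allpages table.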