Rewrite/site.py

Proposals for site.py module

  1. Change the factory function to Site(); change the returned object name to _Site()
  2. Change the immutable attributes of the site (language, family, hostname, protocol) to read-only properties; use methods only when access to the wiki or user-specific data is needed to generate a return value.
  3. Get rid of all the *_address() methods for information that can be obtained via API
  4. Replace path, api_address, get_address and apipath with a single script_path attribute, equal to the MediaWiki {{SCRIPTPATH}} variable ("/w" on this wiki). Its value has to be known before connecting, because it is needed to work out where requests are sent. Once it is known, the index.php and api.php addresses can both be built by appending to it.
  5. Replace getUrl, postForm and postData with a single http_request() method that calls the httplib2 module to send an HTTP request.
  6. Add corresponding foo_request() methods for any other interfaces added to the data/ package (maybe sql_request() for retrieving information directly from a database).
  7. Note that the proposals below are very tentative; since writing some of them, I've been thinking that it would be desirable to minimize changes to the API so that bots from the old framework can be reused with less effort.

Current API of the Site object

def getSite(code=None, fam=None, user=None, persistent_http=None):

  • Factory function; returns a cached Site object if possible, otherwise inits a new one.
    • Proposal: rename to Site

def setSite(site):

  • Sets the default language and wiki-family to those of site.
    • Proposal: delete (not actually used anywhere in the framework)

class Site(object):

  • A MediaWiki site. Do not instantiate directly; use getSite() function.
    • Proposal: rename to _Site to reinforce the above instruction
  • Constructor takes four arguments; only code is mandatory:
    • code: language code for Site
    • fam: Wikimedia family (optional: defaults to configured). Can either be a string or a Family object.
    • user: User to use (optional: defaults to configured)
    • persistent_http: Use a persistent http connection. An http connection has to be established only once, making stuff a whole lot faster. Do NOT EVER use this if you share Site objects across threads without proper locking.
      • Proposal: delete, because httplib2 handles persistent connections automatically
  • Methods:
    • language: This Site's language code.
      • Convert to read-only property (initialized from family file)
    • family: This Site's Family object.
      • Convert to read-only property (initialized from family file)
    • sitename: A string representing this Site.
      • Convert to read-only property (initialized from family file)
    • languages: A list of all languages contained in this site's Family.
      • Can be computed from sitematrix or meta=siteinfo, but with some effort.
    • validLanguageLinks: A list of language codes that can be used in interwiki links.
      • Is this really any different from "languages"?
    • loggedInAs: return current username, or None if not logged in.
    • forceLogin: require the user to log in to the site
      • Rename to login()
    • messages: return True if there are new messages on the site
      • Rename to user_has_messages() to avoid confusion with MediaWiki messages
    • cookies: return user's cookies as a string
      • Anyone know why cookie caching is disabled? The current implementation fetches this from the disk on every http transaction.
    • getUrl: retrieve a URL from the site
      • Covered by new http_request() method
    • urlEncode: Encode a query to be sent using an http POST request.
      • Covered by new http_request() method
    • postForm: Post form data to an address at this site.
      • Covered by new http_request() method
    • postData: Post encoded form data to an http address at this site.
      • Covered by new http_request() method
    • namespace(num): Return local name of namespace 'num'.
      • "Canonical" name can be obtained from API; but is there any way to find all recognized optional spellings (like "Project:" for "Wikipedia:", etc.)?
    • normalizeNamespace(value): Return preferred name for namespace 'value' in this Site's language.
      • Easily obtained from API
    • namespaces: Return list of canonical namespace names for this Site.
      • Easily obtained from API
    • getNamespaceIndex(name): Return the int index of namespace 'name', or None if invalid.
    • redirect: Return the localized redirect tag for the site.
    • redirectRegex: Return compiled regular expression matching on redirect pages.
    • mediawiki_message: Retrieve the text of a specified MediaWiki message
    • has_mediawiki_message: True if this site defines specified MediaWiki message
    • shared_image_repository: Return tuple of image repositories used by this site.
    • category_on_one_line: Return True if this site wants all category links on one line.
    • interwiki_putfirst: Return list of language codes for ordering of interwiki links.
    • linkto(title): Return string in the form of a wikilink to 'title'
    • isInterwikiLink(s): Return True if 's' is in the form of an interwiki link.
    • getSite(lang): Return Site object for wiki in same family, language 'lang'.
    • version: Return MediaWiki version string from Family file.
    • versionnumber: Return int identifying the MediaWiki version.
    • live_version: Return version number read from Special:Version.
    • checkCharset(charset): Warn if charset doesn't match family file.
    • linktrail: Return regex for trailing chars displayed as part of a link.
    • disambcategory: Category in which disambiguation pages are listed.
    • Methods that yield Page objects derived from a wiki's Special: pages (note, some methods yield other information in a tuple along with the Pages; see method docs for details) --
      • search(query): query results from Special:Search
      • allpages(): Special:Allpages
      • newpages(): Special:Newpages
      • newimages(): Special:Log&type=upload
      • longpages(): Special:Longpages
      • shortpages(): Special:Shortpages
      • categories(): Special:Categories (yields Category objects)
      • deadendpages(): Special:Deadendpages
      • ancientpages(): Special:Ancientpages
      • lonelypages(): Special:Lonelypages
      • unwatchedpages(): Special:Unwatchedpages (sysop accounts only)
      • uncategorizedcategories(): Special:Uncategorizedcategories (yields Category objects)
      • uncategorizedpages(): Special:Uncategorizedpages
      • uncategorizedimages(): Special:Uncategorizedimages (yields ImagePage objects)
      • unusedcategories(): Special:Unusedcategories (yields Category objects)
      • unusedfiles(): Special:Unusedimages (yields ImagePage objects)
      • withoutinterwiki: Special:Withoutinterwiki
      • linksearch: Special:Linksearch
    • Convenience methods that provide access to properties of the wiki Family object; all of these are read-only and return a unicode string unless noted --
      • encoding: The current encoding for this site.
      • encodings: List of all historical encodings for this site.
      • category_namespace: Canonical name of the Category namespace on this site.
      • category_namespaces: List of all valid names for the Category namespace.
      • image_namespace: Canonical name of the Image namespace on this site.
      • template_namespace: Canonical name of the Template namespace on this site.
      • protocol: Protocol ('http' or 'https') for access to this site.
      • hostname: Host portion of site URL.
      • path: URL path for index.php on this Site.
      • dbName: MySQL database name.
    • Methods that return addresses to pages on this site (usually in Special: namespace); these methods only return URL paths, they do not interact with the wiki --
      • export_address: Special:Export.
      • query_address: URL path + '?' for query.php
      • api_address: URL path + '?' for api.php
      • apipath: URL path for api.php
      • move_address: Special:Movepage.
      • delete_address(s): Delete title 's'.
      • undelete_view_address(s): Special:Undelete for title 's'
      • undelete_address: Special:Undelete.
      • protect_address(s): Protect title 's'.
      • unprotect_address(s): Unprotect title 's'.
      • put_address(s): Submit revision to page titled 's'.
      • get_address(s): Retrieve page titled 's'.
      • nice_get_address(s): Short URL path to retrieve page titled 's'.
      • edit_address(s): Edit form for page titled 's'.
      • purge_address(s): Purge cache and retrieve page 's'.
      • block_address: Block an IP address.
      • unblock_address: Unblock an IP address.
      • blocksearch_address(s): Search for blocks on IP address 's'.
      • linksearch_address(s): Special:Linksearch for target 's'.
      • search_address(q): Special:Search for query 'q'.
      • allpages_address(s): Special:Allpages.
      • newpages_address: Special:Newpages.
      • longpages_address: Special:Longpages.
      • shortpages_address: Special:Shortpages.
      • unusedfiles_address: Special:Unusedimages.
      • categories_address: Special:Categories.
      • deadendpages_address: Special:Deadendpages.
      • ancientpages_address: Special:Ancientpages.
      • lonelypages_address: Special:Lonelypages.
      • unwatchedpages_address: Special:Unwatchedpages.
      • uncategorizedcategories_address: Special:Uncategorizedcategories.
      • uncategorizedimages_address: Special:Uncategorizedimages.
      • uncategorizedpages_address: Special:Uncategorizedpages.
      • unusedcategories_address: Special:Unusedcategories.
      • withoutinterwiki_address: Special:Withoutinterwiki.
      • references_address(s): Special:Whatlinkshere for page 's'.
      • allmessages_address: Special:Allmessages.
      • upload_address: Special:Upload.
      • double_redirects_address: Special:Doubleredirects.
      • broken_redirects_address: Special:Brokenredirects.
      • login_address: Special:Userlogin.
      • captcha_image_address(id): Special:Captcha for image 'id'.
      • watchlist_address: Special:Watchlist editor.
      • contribs_address(target): Special:Contributions for user 'target'.
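
The single http_request() method proposed as the replacement for getUrl, urlEncode, postForm and postData might be sketched as follows. It is written as a plain function for brevity, and the parameter names (base_url, path, params, transport) are assumptions rather than an agreed interface. The only httplib2 call used is Http.request(uri, method, body, headers), which returns a (response, content) pair; the transport parameter exists purely so the sketch can be exercised without network access.

```python
from urllib.parse import urlencode


def http_request(base_url, path, method="GET", params=None, transport=None):
    """Send one HTTP request and return whatever the transport returns.

    transport is any callable taking (uri, method, body, headers).
    If it is omitted, httplib2 is used and the result is a
    (response, content) pair.
    """
    method = method.upper()
    uri = base_url + path
    body = None
    headers = {}
    encoded = urlencode(params or {})
    if method == "POST":
        # Form data travels in the request body (the old postForm/postData).
        body = encoded
        headers["Content-Type"] = "application/x-www-form-urlencoded"
    elif encoded:
        # Other methods carry the parameters in the query string (the old getUrl).
        uri += ("&" if "?" in uri else "?") + encoded
    if transport is None:
        # Third-party module, imported lazily so the sketch loads without it.
        # A real implementation would keep one Http object per site so that
        # connections are reused between requests.
        import httplib2
        transport = httplib2.Http().request
    return transport(uri, method, body, headers)
```

A call such as http_request("http://en.wikipedia.org/w", "/api.php", params={"action": "query"}) would then take over from the separate api_address and getUrl steps, with the base address built from protocol, hostname and script_path.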
