Botwiki:Threading


When creating a bot that interacts with users or other systems over a network connection, one often finds that network lag greatly reduces performance. This can be avoided with multitasking, a technique that became widely used during the 1990s.

Multitasking on a UNIX system can be done in two ways: forking and threading. The main difference between the two is how they use memory. Two threads within one process share the same address space and thus have access to the same variables. When forking is used, on the other hand, the process is split into two, and the two processes do not share an address space. This means that some kind of interprocess communication must be used for them to exchange data. For more information on forking, see the Wikipedia article on forking.
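
The difference can be seen in a small experiment. The following is only a sketch, assuming Python 3 on a UNIX system (os.fork is not available elsewhere); the names counter, bump, run_in_thread and run_in_fork are made up for this illustration. A thread changes the variable the caller sees, while a forked child changes only its own copy and has to report back through a pipe.

```python
import os
import threading

# Hypothetical shared state, used only for this illustration.
counter = {"value": 0}

def bump():
    counter["value"] += 1

def run_in_thread():
    """Run bump() in a second thread; the change is visible to the caller."""
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter["value"]

def run_in_fork():
    """Run bump() in a forked child; only the child's copy changes.

    Returns (value seen by the child, value seen by the parent).
    """
    read_fd, write_fd = os.pipe()  # a pipe is one simple form of IPC
    pid = os.fork()
    if pid == 0:
        # Child process: works on its own copy of the address space.
        os.close(read_fd)
        bump()
        os.write(write_fd, str(counter["value"]).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent process: read what the child reports through the pipe.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_value, counter["value"]
```

Calling run_in_thread() raises the value the caller sees by one. Calling run_in_fork() afterwards returns a child value that is one higher than the parent's, because the parent's copy was never touched.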

Locking

Unlike separate processes, multiple threads share the same address space. This can cause problems when one or more threads write to a variable while another tries to read from it. Data corruption and unexpected crashes may follow.

To solve this, one needs to make sure that only one thread has access to a variable at any given time. This is done using locks. In Python, locks are available in the threading module. If a thread needs to access a shared variable, it should first acquire the lock, read or write the variable, and release the lock as soon as possible.
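
As a minimal sketch of this pattern, assuming Python 3: several threads increment one shared counter, and each increment happens while holding the lock. The names edit_count, record_edits and run are hypothetical. The with statement acquires the lock on entry and releases it on exit, so the lock is held only for the single statement that touches the shared variable.

```python
import threading

lock = threading.Lock()
edit_count = 0  # hypothetical shared variable

def record_edits(n):
    """Increment the shared counter n times, one locked step at a time."""
    global edit_count
    for _ in range(n):
        with lock:           # acquire, touch the variable, release
            edit_count += 1

def run(num_threads=4, per_thread=10000):
    """Start the worker threads, wait for them, and return the total."""
    global edit_count
    edit_count = 0
    threads = [threading.Thread(target=record_edits, args=(per_thread,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with lock:
        return edit_count
```

With the lock in place, run(4, 10000) always returns 40000; without it, updates from different threads could overwrite each other.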

Deadlocks

A common programming error is a deadlock. There are two typical causes: a thread that has acquired a lock may have terminated unexpectedly without releasing it, or there may be a design error, which generally manifests itself only under rare conditions.
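
A typical design error of the second kind is two threads taking the same two locks in opposite order: each may end up holding one lock while waiting for the other. One common way to avoid it, sketched here under the assumption of Python 3 and with hypothetical names (Tally, move, demo), is to always acquire the locks in one fixed order.

```python
import threading

class Tally:
    """Hypothetical shared object guarded by its own lock."""
    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()

def move(source, target, amount):
    """Move amount from source to target without risking a deadlock."""
    if source is target:
        return  # nothing to do, and locking the same lock twice would hang
    # Take both locks in one global order (here: by id()), regardless of
    # the direction of the move. Taking source.lock first and target.lock
    # second could deadlock when two threads move in opposite directions.
    first, second = sorted((source, target), key=id)
    with first.lock:
        with second.lock:
            source.value -= amount
            target.value += amount

def demo(rounds=1000):
    """Move values back and forth from two threads at once."""
    a, b = Tally(100), Tally(100)

    def a_to_b():
        for _ in range(rounds):
            move(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            move(b, a, 1)

    threads = [threading.Thread(target=a_to_b),
               threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.value, b.value
```

Because both threads agree on the order, neither can hold one lock while waiting for the other, and demo() finishes with both values back at 100.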

Example of a deadlock caused by an exception:

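import threading
import time

lock = threading.Lock()
shared_variable = "some string"

def read():
    # Give write() time to run (and fail) first
    time.sleep(1)
    lock.acquire()  # blocks forever: write() never released the lock
    print('Our variable is', shared_variable)
    lock.release()

def write():
    # Without this declaration the assignment below would make the name
    # local and raise an UnboundLocalError instead
    global shared_variable
    lock.acquire()
    # Casting to an int will raise a ValueError
    shared_variable = int(shared_variable)
    lock.release()  # never reached

threading.Thread(target=write).start()
threading.Thread(target=read).start()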

In this example, the write() function exits prematurely with an exception, without ever releasing the lock. The read() function then waits forever to acquire it. The solution is a try ... finally clause, which releases the lock no matter how the block is left:

def write():
    global shared_variable
    lock.acquire()
    try:
        # Casting to an int will raise a ValueError
        shared_variable = int(shared_variable)
    finally:
        # Runs whether or not an exception was raised
        lock.release()
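
In current Python versions the same guarantee is usually written with a with statement, which acquires the lock on entry and releases it when the block is left, even by an exception. A short sketch, assuming Python 3; demo is a hypothetical helper that only checks the state of the lock afterwards:

```python
import threading

lock = threading.Lock()
shared_variable = "some string"

def write():
    global shared_variable
    with lock:  # released automatically, even if the body raises
        # Casting to an int will raise a ValueError
        shared_variable = int(shared_variable)

def demo():
    """Call write(), swallow the expected error, report the lock state."""
    try:
        write()
    except ValueError:
        pass
    return lock.locked()
```

Here demo() returns False: the ValueError still propagates out of write(), but the lock is free again and other threads can acquire it.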

When not to use threads

Don't use threads just to be able to edit a site very fast. Not only will the server admins get angry, but the pywikipedia framework also has a built-in, thread-safe edit throttle, although it does not always seem to work.
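
To illustrate what a thread-safe throttle does, here is a minimal sketch assuming Python 3. It is not pywikipedia's implementation; the Throttle class and its min_interval parameter are hypothetical. Each caller reserves the next free time slot while holding a lock and then sleeps outside the lock, so that however many threads call wait(), the calls are released at least min_interval seconds apart.

```python
import threading
import time

class Throttle:
    """Hypothetical thread-safe throttle; not pywikipedia's own code."""

    def __init__(self, min_interval):
        self.min_interval = min_interval  # seconds between two edits
        self._lock = threading.Lock()
        self._next_allowed = 0.0          # earliest time of the next edit

    def wait(self):
        """Block until the calling thread is allowed to make its edit."""
        # Reserve a slot while holding the lock ...
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        # ... but sleep outside it, so other threads can queue up.
        delay = slot - now
        if delay > 0:
            time.sleep(delay)
```

A bot thread would call throttle.wait() immediately before each edit. Adding more threads then does not increase the edit rate; it only lets the other work, such as fetching pages, overlap.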

Thread-safety and pywikipedia

The author has checked the following methods and classes for thread-safety: Site, Page.get, Page.put.

Using the threadpool module

Sometimes it is necessary to have jobs performed by another thread, so that the current thread can continue. Starting a new thread for each job is a bad idea, so the threadpool module for Python has been developed. This module implements a thread pool, which gives scripts that need to perform concurrent jobs an efficient and thread-safe way to do so.

The advantage of this is that the overhead of starting a thread for every job is gone. The number of threads can also be limited very easily. Thanks to the use of events, no thread runs into a so-called busy-wait loop. The module also implements a method to make all threads exit as soon as they have finished their jobs.
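
The ideas behind such a pool can be sketched with the standard library alone. The following is not the API of the threadpool module; SimplePool, add_job and close are hypothetical names, and Python 3 is assumed. A fixed number of worker threads block on a queue (so they sleep instead of spinning in a loop), and a sentinel object tells each worker to exit once the jobs queued before it are done.

```python
import queue
import threading
import traceback

class SimplePool:
    """Hypothetical minimal thread pool; not the threadpool module's API."""

    _STOP = object()  # sentinel that tells a worker to exit

    def __init__(self, num_threads):
        self._jobs = queue.Queue()
        self._threads = [threading.Thread(target=self._worker)
                         for _ in range(num_threads)]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            job = self._jobs.get()  # blocks until a job arrives, no busy loop
            if job is self._STOP:
                break
            func, args = job
            try:
                func(*args)
            except Exception:
                # A failing job must not kill the worker thread.
                traceback.print_exc()

    def add_job(self, func, *args):
        """Queue func(*args) for execution by one of the workers."""
        self._jobs.put((func, args))

    def close(self):
        """Let the workers finish all queued jobs, then wait for them."""
        for _ in self._threads:
            self._jobs.put(self._STOP)
        for t in self._threads:
            t.join()
```

A script would create the pool once, for example pool = SimplePool(3), hand it work with pool.add_job(some_function, argument), and call pool.close() before exiting. Python 3's standard library also ships a ready-made pool, concurrent.futures.ThreadPoolExecutor, which covers the same use case.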
