Most RPG programmers will never need to use multiple threads. Even those RPG programmers who can perceive a need for multiple threads may decide that using them is too complex. At this point, you might expect me to say, "I don't want to scare you." The truth is, however, that I do want to scare you. It is easy to do the coding for multiple threads, but it's extremely difficult, some say impossible, to use multiple threads correctly. What's more, multiple threads are difficult to test because they may behave slightly differently each time they run. With that said, we'll explore 6.1's new support for multithreaded programming in RPG. So read on, but beware! Don't get too eager to try out these new techniques. If you do decide to introduce multithreading into your applications, you should read every word in the general Threads section of the Information Center, and read every book you can find on multithreaded programming. Then, read them again.
Thread is short for the term thread of execution. A thread of execution is the sequence of machine instructions that the system performs as it runs the programs and procedures in your job. When you have two threads of execution in the same job, the system is performing two sequences of machine instructions simultaneously. The two threads could be running different programs, different procedures in the same module, the same procedure, or even the same instruction. When an application uses more than one thread, it's called a multithreaded application.
RPG has been able to run in threads since 4.4, with the introduction of THREAD(*SERIALIZE). Coding this keyword serializes access to the module. In other words, only one thread at a time can run in the module. The 4.4 support allowed RPG programs to participate in a multithreaded application, but doing so often hindered the application's performance.
Starting in 6.1, RPG can fully participate in a multithreaded application and possibly gain the performance benefits of running multithreaded. The THREAD(*CONCURRENT) keyword allows multiple threads to run in a module at the same time. However, THREAD(*CONCURRENT) doesn't replace THREAD(*SERIALIZE). In fact, both modes can be useful. (To learn more, refer to the information about thread-related keywords in the ILE RPG Reference.)
Multiple Threads vs. Multiple Jobs
We are all familiar with multiple jobs running at the same time. The difference is that threads are tightly coupled, while jobs are usually isolated from each other. Jobs can communicate with each other only through external objects such as data queues or message queues. Threads communicate with each other by default because they are running in the same job, and sometimes even running in the same program with the same program variables. The challenge with threads is limiting their communication and only allowing them to communicate with each other in a thread-safe way.
The issue of thread safety becomes critical when a job has multiple threads. To ensure your application is thread-safe, have it call only programs and procedures that are themselves thread-safe, and have it use only thread-safe commands. If your application uses shared resources, it must use them in a thread-safe way. Some examples of resources that can be shared are user spaces, static storage, storage accessed through a basing pointer, and files overridden to be shared.
When you call a program or service program for the first time, the system allocates some storage for the global program variables and any variables in subprocedures defined with the STATIC keyword. This storage is allocated to the program for the lifetime of the activation group. Every time you call the program, it uses the same storage for those variables. This storage is called static storage.
The other two types of storage are automatic storage and heap storage. Automatic storage is used for most of the local variables in your subprocedures; it is allocated to your procedure when it is called and deallocated when your procedure returns. Heap storage is allocated by %alloc and %realloc and deallocated by the DEALLOC opcode.
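The three storage types can be seen side by side in a small module. This is only a sketch: the module, procedure, and variable names are invented for illustration, and it is written in the fully free-form syntax of later releases (on 6.1, the same definitions would go on fixed-form D and P specifications).

```rpgle
**FREE
ctl-opt nomain;

// Global variable: static storage
dcl-s callCount int(10) inz(0);

dcl-proc demoStorage export;
   dcl-pi *n end-pi;
   dcl-s runningTotal int(10) static;  // STATIC keyword: static storage
   dcl-s scratch      int(10);         // ordinary local: automatic storage
   dcl-s pBuffer      pointer;         // will address heap storage

   callCount += 1;                     // value survives from call to call
   runningTotal += callCount;          // value survives from call to call
   scratch = runningTotal;             // allocated fresh on every call

   pBuffer = %alloc(100);              // heap: allocate 100 bytes
   pBuffer = %realloc(pBuffer : 200);  // heap: resize to 200 bytes
   dealloc pBuffer;                    // heap: release explicitly
end-proc;
```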
All RPG modules use static storage, even a module that has no global variables and whose subprocedures define only automatic storage, because the RPG compiler uses static storage for many of its internal variables. If a job has two threads running in the same module, both threads can access the static storage in the module. This is an inherently dangerous situation. Consider the following statement, which runs inside a loop that uses index as its counter:
arr(index) = arr(index) + 1;
If these variables are in static storage, and only one thread is running in the code, the variables would progress through the sequence of values in Figure 1 as control flows through the statements (the changing variables are in red). Notice that only one variable changes at a time. If these variables are in static storage, and two threads are allowed to run this code at once, the variables might progress through the sequence of values that Figure 2 shows. I use T1 and T2 to indicate which of the two threads changed the value.
You can see how the threads have interfered with each other. The second thread started the loop when the first thread was on the second iteration. This caused the first array element to be incremented twice. Then, the threads both incremented the loop counter—index—before either thread could increment the second array element. This example is only one of the many ways that threads can interfere with one another. The interaction is always unpredictable, so even if the result is correct occasionally, odds are it usually isn't.
The problem is not limited to two threads running the same code. One thread might be running the above code snippet, while another thread is running the code that initialized the array by reading records from a file. You cannot allow this kind of interaction in a thread-safe application. These static variables are a shared resource, and the access to shared resources by multiple threads must be controlled.
RPG has two ways to control access to a module's static storage. When you specify THREAD(*SERIALIZE), it is impossible for two threads to run the same code at the same time. In fact, it is impossible for two threads to run in any procedure in the module at the same time. If thread T1 is running procedure PROC1, and thread T2 attempts to call procedure PROC2 in the same module, thread T2 will not be able to start running PROC2 until thread T1 has returned and is no longer running in the module.
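As a rough illustration, a serialized module might look like the sketch below. The procedure and variable names are hypothetical, and the free-form syntax comes from later releases (on 6.1, THREAD(*SERIALIZE) is coded on the H specification).

```rpgle
**FREE
ctl-opt nomain thread(*serialize);

// One copy of this variable exists for the whole job. It is safe to use
// here only because one thread at a time can run anywhere in this module.
dcl-s hits int(10) inz(0);

dcl-proc proc1 export;
   dcl-pi *n end-pi;
   hits += 1;
end-proc;

dcl-proc proc2 export;
   dcl-pi *n int(10) end-pi;
   // A thread calling proc2 waits while another thread is still in proc1.
   return hits;
end-proc;
```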
When you specify THREAD(*CONCURRENT), each thread has its own copy of the module's static storage. If thread T2 begins the loop and sets index to 1, the index that thread T1 is working with is not affected. With THREAD(*CONCURRENT), static storage is not a shared resource, so there is no thread-safety issue associated with it.
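A sketch of the earlier loop in a concurrent module follows. The array size, the loop bounds, and the procedure name are assumptions made for the example, and the free-form syntax comes from later releases (on 6.1, THREAD(*CONCURRENT) is coded on the H specification).

```rpgle
**FREE
ctl-opt nomain thread(*concurrent);

// Global variables are thread-local by default in a concurrent module:
// every thread works with its own arr and its own index.
dcl-s arr   int(10) dim(2);
dcl-s index int(10);

dcl-proc incrementAll export;
   dcl-pi *n end-pi;
   for index = 1 to %elem(arr);
      arr(index) = arr(index) + 1;
   endfor;
end-proc;
```

Two threads can run incrementAll at the same time without interfering, but each thread sees only the increments it made itself.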
In a THREAD(*CONCURRENT) module, you can specify the SERIALIZE keyword on any procedure-begin specification, which makes it impossible for more than one thread to simultaneously run that procedure. If thread T1 is running serialized procedure SERIALPROC, and thread T2 attempts to call the procedure, thread T2 will not be able to start running SERIALPROC until thread T1 has returned from the procedure. If two procedures in a module are serialized, they are serialized separately. If thread T1 is running serialized procedure SERIALPROC1, thread T2 can be running serialized procedure SERIALPROC2 in the same module at the same time.
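The following sketch shows two separately serialized procedures next to an unserialized one. The procedure names are hypothetical and the bodies are placeholders. The free-form syntax comes from later releases (on 6.1, SERIALIZE is coded on the procedure-begin P specification).

```rpgle
**FREE
ctl-opt nomain thread(*concurrent);

// At most one thread at a time can be running serialProc1.
dcl-proc serialProc1 serialize export;
   dcl-pi *n end-pi;
   return;
end-proc;

// Serialized independently: a thread can run serialProc2
// while another thread is running serialProc1.
dcl-proc serialProc2 serialize export;
   dcl-pi *n end-pi;
   return;
end-proc;

// Not serialized: any number of threads can run this at once.
dcl-proc concurrentProc export;
   dcl-pi *n end-pi;
   return;
end-proc;
```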
In a THREAD(*CONCURRENT) module, the default is for static storage to be thread-local static storage, a new storage type for the IBM i in 6.1. If you want a particular variable to be shared among your threads, you can specify STATIC(*ALLTHREAD) for the variable. When you use this keyword, you must be aware that there are thread-safety issues with the variable. You must ensure that when one thread is initializing or changing the all-thread-static variable, no other thread is using the variable at that time.
Sometimes this is easy. If all the changes to the variable are done before the application has started secondary threads, there is no thread-safety issue. It is safe for multiple threads to "read" a variable at the same time. Another easy way to ensure thread safety is to code the variable as a local variable in a serialized procedure. Because only one thread can run in the serialized procedure at a time, it is impossible for more than one thread to simultaneously use the all-thread-static variable.
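The serialized-procedure technique might be sketched as follows. The procedure and variable names are invented for the example, and the free-form syntax comes from later releases (on 6.1, the same keywords go on the D and P specifications).

```rpgle
**FREE
ctl-opt nomain thread(*concurrent);

// Hands out a new number on every call, across all threads in the job.
dcl-proc nextOrderId serialize export;
   dcl-pi *n int(10) end-pi;

   // One copy shared by every thread. Because it is local to a
   // serialized procedure, only one thread at a time can reach it.
   dcl-s lastId int(10) static(*allthread) inz(0);

   lastId += 1;
   return lastId;
end-proc;
```

Because the variable cannot be reached from outside nextOrderId, no other code in the module can use it without going through the serialization.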
Accessing Shared Resources
When you need to use an all-thread-static variable in less rigid ways, you must control access to the variable using a synchronization mechanism such as a mutex (short for "mutual exclusion") or a semaphore. Your application associates a mutex or semaphore with the variable, and every time the application needs to use the variable, it first locks the mutex or acquires the semaphore. When it's finished using the variable, the application unlocks the mutex or releases the semaphore, freeing it for any waiting threads. For the code to be thread-safe, every point of access to the variable must use the same synchronization mechanism. If one part of the code uses the variable without the synchronization, or with a different synchronization, the code is not thread-safe.
An all-thread-static variable is the simplest type of shared resource. Usually, access to the shared variable is limited to a single module. If a multithreaded application needs to use an external shared resource, such as the data in a user space, then every module and program in the application must use the same synchronization mechanism. This is much more complicated than managing an all-thread-static variable (which is already complicated enough). It is beyond the scope of this article to discuss the details of mutexes and semaphores, but you can read about them in the Multithreaded Programming topic of the Information Center. The 6.1 version of the ILE RPG Programmer’s Guide provides some simple examples as well.
*SERIALIZE or *CONCURRENT?
If an RPG module might be used in a multithreaded application, you must code the THREAD keyword as either *SERIALIZE or *CONCURRENT. You can mix and match serialized and concurrent modules in a program. Both modes have advantages and disadvantages. With THREAD(*SERIALIZE), the RPG module might act as a bottleneck in the application. With THREAD(*CONCURRENT), the static storage requirements of the application could increase beyond the capacity of the job, because each thread has its own copy of the static storage required by the module.
Ideally, you could reduce the amount of static storage required by an RPG module enough to remove any barrier to coding THREAD(*CONCURRENT). Other RPG enhancements in 6.1, such as local files and OPTION(*NOUNREF), are aimed at reducing static storage. Unfortunately, many of the internal variables used by the RPG compiler are in static storage, so you can't eliminate all static storage from your module. As a rule of thumb, choose THREAD(*SERIALIZE) for a module if the module uses a lot of static storage, especially if you expect a high number of threads for your application. Select THREAD(*CONCURRENT) if serializing access to your module would cause the module to act as a noticeable bottleneck.
Why Use Multithreading?
As I said earlier, many RPG programmers never see a need to use multiple threads. For IBM i programmers, the alternative to multiple threads is multiple jobs, and RPG programmers have always been familiar with using multiple jobs. A significant difference between multiple jobs and multiple threads is the level of communication available between the jobs or threads. Although little support is specifically aimed at interjob communication, almost all the thread APIs are targeted at interthread communication. There are APIs to start threads and wait for threads to end; APIs to tell threads to stop, to wait, to wake up; and APIs to synchronize access to shared resources. If you're contemplating using multiple jobs in an application, but you want the jobs to communicate closely, you might consider using multiple threads instead.
Another difference between jobs and threads is that threads are lighter weight than jobs. A job takes longer to start than a thread and consumes more system resources. If your application needs to create hundreds or thousands of jobs, then you might consider using multiple threads. For example, say you have a web application in which each client starts a separate job on your IBM i server. In this case, consider using multiple threads.
Read More on Threads
Before you take on the challenge of using multiple threads in your application, be sure to read everything you can on the topic. The IBM Information Center and the ILE RPG Programmer's Guide are good places to start. Most of the information you find on threads is discussed in terms of some other programming language, such as C or Java, or in terms of some other operating system like Linux or Windows. Don't let that deter you from reading it. The concepts are the same.
Barbara Morris works at the IBM Toronto Lab, where she is the lead developer of the RPG compilers.