| .\" This manpage is copyright (C) 2001 Paul Sheer. |
| .\" |
| .\" %%%LICENSE_START(VERBATIM) |
| .\" Permission is granted to make and distribute verbatim copies of this |
| .\" manual provided the copyright notice and this permission notice are |
| .\" preserved on all copies. |
| .\" |
| .\" Permission is granted to copy and distribute modified versions of this |
| .\" manual under the conditions for verbatim copying, provided that the |
| .\" entire resulting derived work is distributed under the terms of a |
| .\" permission notice identical to this one. |
| .\" |
| .\" Since the Linux kernel and libraries are constantly changing, this |
| .\" manual page may be incorrect or out-of-date. The author(s) assume no |
| .\" responsibility for errors or omissions, or for damages resulting from |
| .\" the use of the information contained herein. The author(s) may not |
| .\" have taken the same level of care in the production of this manual, |
| .\" which is licensed free of charge, as they might when working |
| .\" professionally. |
| .\" |
| .\" Formatted or processed versions of this manual, if unaccompanied by |
| .\" the source, must acknowledge the copyright and authors of this work. |
| .\" %%%LICENSE_END |
| .\" |
| .\" very minor changes, aeb |
| .\" |
| .\" Modified 5 June 2002, Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" 2006-05-13, mtk, removed much material that is redundant with select.2 |
| .\" various other changes |
| .\" 2008-01-26, mtk, substantial changes and rewrites |
| .\" |
| .TH SELECT_TUT 2 2021-03-22 "Linux" "Linux Programmer's Manual" |
| .SH NAME |
| select, pselect \- synchronous I/O multiplexing |
| .SH SYNOPSIS |
| See |
| .BR select (2) |
| .SH DESCRIPTION |
| The |
| .BR select () |
| and |
| .BR pselect () |
| system calls are used to efficiently monitor multiple file descriptors, |
| to see if any of them is, or becomes, "ready"; |
| that is, to see whether I/O becomes possible, |
| or an "exceptional condition" has occurred on any of the file descriptors. |
| .PP |
| This page provides background and tutorial information |
| on the use of these system calls. |
| For details of the arguments and semantics of |
| .BR select () |
| and |
| .BR pselect (), |
| see |
| .BR select (2). |
| .\" |
| .SS Combining signal and data events |
| .BR pselect () |
| is useful if you are waiting for a signal as well as |
| for file descriptor(s) to become ready for I/O. |
| Programs that receive signals |
| normally use the signal handler only to raise a global flag. |
| The global flag will indicate that the event must be processed |
| in the main loop of the program. |
| A signal will cause the |
| .BR select () |
| (or |
| .BR pselect ()) |
| call to return with \fIerrno\fP set to \fBEINTR\fP. |
| This behavior is essential so that signals can be processed |
| in the main loop of the program; otherwise, |
| .BR select () |
| would block indefinitely. |
| .PP |
| Now, somewhere |
| in the main loop will be a conditional to check the global flag. |
| So we must ask: |
| what if a signal arrives after the conditional, but before the |
| .BR select () |
| call? |
| The answer is that |
| .BR select () |
| would block indefinitely, even though an event is actually pending. |
| This race condition is solved by the |
| .BR pselect () |
| call. |
| This call can be used to set the signal mask to a set of signals |
| that are to be received only within the |
| .BR pselect () |
| call. |
| For instance, let us say that the event in question |
| was the exit of a child process. |
| Before the start of the main loop, we |
| would block \fBSIGCHLD\fP using |
| .BR sigprocmask (2). |
| Our |
| .BR pselect () |
| call would enable |
| .B SIGCHLD |
| by using an empty signal mask. |
| Our program would look like: |
| .PP |
| .EX |
| static volatile sig_atomic_t got_SIGCHLD = 0; |
| |
| static void |
| child_sig_handler(int sig) |
| { |
| got_SIGCHLD = 1; |
| } |
| |
| int |
| main(int argc, char *argv[]) |
| { |
| sigset_t sigmask, empty_mask; |
| struct sigaction sa; |
| fd_set readfds, writefds, exceptfds; |
| int nfds, r; |
| |
| sigemptyset(&sigmask); |
| sigaddset(&sigmask, SIGCHLD); |
| if (sigprocmask(SIG_BLOCK, &sigmask, NULL) == \-1) { |
| perror("sigprocmask"); |
| exit(EXIT_FAILURE); |
| } |
| |
| sa.sa_flags = 0; |
| sa.sa_handler = child_sig_handler; |
| sigemptyset(&sa.sa_mask); |
| if (sigaction(SIGCHLD, &sa, NULL) == \-1) { |
| perror("sigaction"); |
| exit(EXIT_FAILURE); |
| } |
| |
| sigemptyset(&empty_mask); |
| |
| for (;;) { /* main loop */ |
| /* Initialize nfds, readfds, writefds, and exceptfds |
| before the pselect() call. (Code omitted.) */ |
| |
| r = pselect(nfds, &readfds, &writefds, &exceptfds, |
| NULL, &empty_mask); |
| if (r == \-1 && errno != EINTR) { |
| /* Handle error */ |
| } |
| |
| if (got_SIGCHLD) { |
| got_SIGCHLD = 0; |
| |
| /* Handle signalled event here; e.g., wait() for all |
| terminated children. (Code omitted.) */ |
| } |
| |
| /* main body of program */ |
| } |
| } |
| .EE |
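The fragment above leaves out the creation of the child and the set-up of the descriptor sets. As a minimal, self-contained sketch of the same technique, the helper below forks a child that exits at once and then waits in pselect() with no descriptors at all. The name wait_for_child_exit(), the 5-second timeout, and the return convention are assumptions made for illustration; they are not part of the original example. Because SIGCHLD stays blocked until pselect() installs the empty mask, the signal is not lost even if the child exits before the call is reached.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_SIGCHLD = 0;

static void
child_sig_handler(int sig)
{
    (void) sig;
    got_SIGCHLD = 1;
}

/* Hypothetical helper: returns 1 if SIGCHLD was delivered while the
   process was inside pselect(), 0 if the (assumed) 5-second timeout
   expired first, and -1 on error. */
static int
wait_for_child_exit(void)
{
    sigset_t sigmask, empty_mask;
    struct sigaction sa;
    struct timespec timeout = { 5, 0 };
    pid_t pid;
    int r;

    /* Block SIGCHLD so that it can only arrive inside pselect(). */
    sigemptyset(&sigmask);
    sigaddset(&sigmask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &sigmask, NULL) == -1)
        return -1;

    sa.sa_flags = 0;
    sa.sa_handler = child_sig_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    got_SIGCHLD = 0;
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: terminate immediately */

    /* The empty mask unblocks SIGCHLD only for the duration of the
       call; a signal that is already pending interrupts it at once. */
    sigemptyset(&empty_mask);
    r = pselect(0, NULL, NULL, NULL, &timeout, &empty_mask);
    if (r == -1 && errno != EINTR)
        return -1;

    if (got_SIGCHLD) {
        int status;

        got_SIGCHLD = 0;
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        assert(WIFEXITED(status));  /* the child called _exit(0) */
        return 1;
    }
    return 0;                   /* timed out without a signal */
}
```

A real program would pass its descriptor sets to pselect() instead of NULL and would usually omit the timeout.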
| .SS Practical |
| So what is the point of |
| .BR select ()? |
| Can't I just read and write to my file descriptors whenever I want? |
| The point of |
| .BR select () |
| is that it watches |
| multiple descriptors at the same time and properly puts the process to |
| sleep if there is no activity. |
| UNIX programmers often find |
| themselves in a position where they have to handle I/O from more than one |
| file descriptor where the data flow may be intermittent. |
| If you were to merely create a sequence of |
| .BR read (2) |
| and |
| .BR write (2) |
| calls, you would |
| find that one of your calls may block waiting for data from/to a file |
| descriptor, while another file descriptor is unused though ready for I/O. |
| .BR select () |
| efficiently copes with this situation. |
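As a rough illustration of the point made above, the following sketch watches two descriptors at once and reads from whichever becomes ready first, instead of blocking in read(2) on one of them while the other has data waiting. The helper name read_from_either() and its parameters are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sleep until fd_a or fd_b is readable, then read
   from whichever is ready (fd_a is checked first). Stores the chosen
   descriptor in *which and returns the result of read(), or -1 if
   select() fails. */
static ssize_t
read_from_either(int fd_a, int fd_b, char *buf, size_t len, int *which)
{
    fd_set readfds;
    int nfds = (fd_a > fd_b ? fd_a : fd_b) + 1;
    int ready;

    /* FD_SET() is only defined for descriptors below FD_SETSIZE. */
    assert(fd_a >= 0 && fd_a < FD_SETSIZE);
    assert(fd_b >= 0 && fd_b < FD_SETSIZE);

    for (;;) {
        /* select() overwrites the set, so rebuild it every time. */
        FD_ZERO(&readfds);
        FD_SET(fd_a, &readfds);
        FD_SET(fd_b, &readfds);

        /* No timeout: the process sleeps until there is activity. */
        ready = select(nfds, &readfds, NULL, NULL, NULL);
        if (ready == -1 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (ready == -1)
            return -1;

        *which = FD_ISSET(fd_a, &readfds) ? fd_a : fd_b;
        return read(*which, buf, len);
    }
}
```

With two pipes, for instance, data written to the second pipe is returned immediately even though the first pipe stays silent.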
| .SS Select law |
| Many people who try to use |
| .BR select () |
| come across behavior that is |
| difficult to understand and produces nonportable or borderline results. |
| For instance, the program in EXAMPLES below is carefully written not to |
| block at any point, even though it does not set its file descriptors to |
| nonblocking mode. |
| It is easy to introduce |
| subtle errors that will remove the advantage of using |
| .BR select (), |
| so here is a list of essentials to watch for when using |
| .BR select (). |
| .TP 4 |
| 1. |
| You should always try to use |
| .BR select () |
| without a timeout. |
| Your program |
| should have nothing to do if there is no data available. |
| Code that |
| depends on timeouts is not usually portable and is difficult to debug. |
| .TP |
| 2. |
| The value \fInfds\fP must be properly calculated for efficiency, as |
| explained in |
| .BR select (2). |
| .TP |
| 3. |
| No file descriptor must be added to any set if you do not intend |
| to check its result after the |
| .BR select () |
| call, and respond appropriately. |
| See next rule. |
| .TP |
| 4. |
| After |
| .BR select () |
| returns, all file descriptors in all sets |
| should be checked to see if they are ready. |
| .TP |
| 5. |
| The functions |
| .BR read (2), |
| .BR recv (2), |
| .BR write (2), |
| and |
| .BR send (2) |
| do \fInot\fP necessarily read/write the full amount of data |
| that you have requested. |
| If they do read/write the full amount, it's |
| because you have a low traffic load and a fast stream. |
| This is not always going to be the case. |
| You should cope with the case of your |
| functions managing to send or receive only a single byte. |
| .TP |
| 6. |
| Never read/write only a single byte at a time unless you are really |
| sure that you have a small amount of data to process. |
| It is extremely |
| inefficient not to read/write as much data as you can buffer each time. |
| The buffers in the example below are 1024 bytes although they could |
| easily be made larger. |
| .TP |
| 7. |
| Calls to |
| .BR read (2), |
| .BR recv (2), |
| .BR write (2), |
| .BR send (2), |
| and |
| .BR select () |
| can fail with the error |
| \fBEINTR\fP, |
| and calls to |
| .BR read (2), |
| .BR recv (2), |
| .BR write (2), |
| and |
| .BR send (2) |
| can fail with |
| .I errno |
| set to \fBEAGAIN\fP (\fBEWOULDBLOCK\fP). |
| These results must be properly managed (the example below does not do so fully). |
| If your program is not going to receive any signals, then |
| it is unlikely you will get \fBEINTR\fP. |
| If your program does not set nonblocking I/O, |
| you will not get \fBEAGAIN\fP. |
| .\" Nonetheless, you should still cope with these errors for completeness. |
| .TP |
| 8. |
| Never call |
| .BR read (2), |
| .BR recv (2), |
| .BR write (2), |
| or |
| .BR send (2) |
| with a buffer length of zero. |
| .TP |
| 9. |
| If the functions |
| .BR read (2), |
| .BR recv (2), |
| .BR write (2), |
| and |
| .BR send (2) |
| fail with errors other than those listed in \fB7.\fP, |
| or one of the input functions returns 0, indicating end of file, |
| then you should \fInot\fP pass that file descriptor to |
| .BR select () |
| again. |
| In the example below, |
| I close the file descriptor immediately, and then set it to \-1 |
| to prevent it being included in a set. |
| .TP |
| 10. |
| The timeout value must be initialized with each new call to |
| .BR select (), |
| since some operating systems modify the structure. |
| .BR pselect (), |
| however, does not modify its timeout structure. |
| .TP |
| 11. |
| Since |
| .BR select () |
| modifies its file descriptor sets, |
| if the call is being used in a loop, |
| then the sets must be reinitialized before each call. |
| .\" "I have heard" does not fill me with confidence, and doesn't |
| .\" belong in a man page, so I've commented this point out. |
| .\" .TP |
| .\" 11. |
| .\" I have heard that the Windows socket layer does not cope with OOB data |
| .\" properly. |
| .\" It also does not cope with |
| .\" .BR select () |
| .\" calls when no file descriptors are set at all. |
| .\" Having no file descriptors set is a useful |
| .\" way to sleep the process with subsecond precision by using the timeout. |
| .\" (See further on.) |
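Rules 5, 7, 8, and 9 above can be combined in one small write helper. The sketch below is only one possible arrangement: the names struct out_buf, write_some(), and the IO_* result codes are hypothetical, and the buffer layout merely mirrors the avail/written counters used in the example program. It performs a single write(2) per call, so the caller returns to select() rather than looping on a descriptor that may block.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE 1024

/* Hypothetical result codes for this sketch. */
enum io_status { IO_OK, IO_RETRY, IO_CLOSED };

/* Hypothetical buffer: bytes [written, avail) are still to be sent. */
struct out_buf {
    char   data[BUF_SIZE];
    size_t avail;
    size_t written;
};

static enum io_status
write_some(int fd, struct out_buf *b)
{
    ssize_t n;

    assert(b->written <= b->avail && b->avail <= BUF_SIZE);

    /* Rule 8: never call write() with a length of zero. */
    if (b->written == b->avail)
        return IO_OK;

    n = write(fd, b->data + b->written, b->avail - b->written);

    /* Rule 7: neither error is fatal; try again after select(). */
    if (n == -1 && (errno == EINTR || errno == EAGAIN ||
                    errno == EWOULDBLOCK))
        return IO_RETRY;

    /* Rule 9: any other failure means fd must leave the sets. */
    if (n < 1)
        return IO_CLOSED;

    /* Rule 5: possibly a partial write; remember how far we got. */
    b->written += (size_t) n;
    if (b->written == b->avail)
        b->written = b->avail = 0;
    return IO_OK;
}
```

On IO_CLOSED the caller would close the descriptor and set it to \-1, as the example program does; on IO_RETRY it simply leaves the descriptor in the write set for the next call to select().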
| .SH RETURN VALUE |
| See |
| .BR select (2). |
| .SH NOTES |
| Generally speaking, |
| all operating systems that support sockets also support |
| .BR select (). |
| .BR select () |
| can be used to solve |
| many problems in a portable and efficient way that naive programmers try |
| to solve in a more complicated manner using |
| threads, forking, IPCs, signals, memory sharing, and so on. |
| .PP |
| The |
| .BR poll (2) |
| system call has the same functionality as |
| .BR select (), |
| and is somewhat more efficient when monitoring sparse |
| file descriptor sets. |
| It is nowadays widely available, but historically was less portable than |
| .BR select (). |
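For comparison, here is a sketch of how waiting on two descriptors might look with poll(2). The helper name read_from_either_poll() and its parameters are hypothetical. Note that poll() reports results in the revents field and leaves fd and events untouched, so the array does not have to be rebuilt before each call.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sleep until fd_a or fd_b has an event, then
   read from whichever is ready (fd_a is checked first). Stores the
   chosen descriptor in *which and returns the result of read(), or
   -1 if poll() fails or a descriptor is not open. */
static ssize_t
read_from_either_poll(int fd_a, int fd_b, char *buf, size_t len,
                      int *which)
{
    struct pollfd fds[2];
    int ready;

    assert(fd_a >= 0 && fd_b >= 0);

    fds[0].fd = fd_a;
    fds[0].events = POLLIN;
    fds[1].fd = fd_b;
    fds[1].events = POLLIN;

    for (;;) {
        ready = poll(fds, 2, -1);       /* -1: no timeout */
        if (ready == -1 && errno == EINTR)
            continue;                   /* interrupted: retry */
        if (ready == -1)
            return -1;

        /* POLLNVAL means the descriptor is not open. */
        if ((fds[0].revents | fds[1].revents) & POLLNVAL) {
            errno = EBADF;
            return -1;
        }

        /* POLLHUP and POLLERR are reported even if not requested;
           read() then returns 0 or fails, as it would after select(). */
        *which = fds[0].revents != 0 ? fd_a : fd_b;
        return read(*which, buf, len);
    }
}
```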
| .PP |
| The Linux-specific |
| .BR epoll (7) |
| API provides an interface that is more efficient than |
| .BR select (2) |
| and |
| .BR poll (2) |
| when monitoring large numbers of file descriptors. |
| .SH EXAMPLES |
| Here is an example that better demonstrates the true utility of |
| .BR select (). |
| The listing below is a TCP forwarding program that forwards |
| from one TCP port to another. |
| .PP |
| .EX |
| #include <stdlib.h> |
| #include <stdio.h> |
| #include <unistd.h> |
| #include <sys/select.h> |
| #include <string.h> |
| #include <signal.h> |
| #include <sys/socket.h> |
| #include <netinet/in.h> |
| #include <arpa/inet.h> |
| #include <errno.h> |
| |
| static int forward_port; |
| |
| #undef max |
| #define max(x,y) ((x) > (y) ? (x) : (y)) |
| |
| static int |
| listen_socket(int listen_port) |
| { |
| struct sockaddr_in addr; |
| int lfd; |
| int yes; |
| |
| lfd = socket(AF_INET, SOCK_STREAM, 0); |
| if (lfd == \-1) { |
| perror("socket"); |
| return \-1; |
| } |
| |
| yes = 1; |
| if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, |
| &yes, sizeof(yes)) == \-1) { |
| perror("setsockopt"); |
| close(lfd); |
| return \-1; |
| } |
| |
| memset(&addr, 0, sizeof(addr)); |
| addr.sin_port = htons(listen_port); |
| addr.sin_family = AF_INET; |
| if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == \-1) { |
| perror("bind"); |
| close(lfd); |
| return \-1; |
| } |
| |
| if (listen(lfd, 10) == \-1) { |
| perror("listen"); |
| close(lfd); |
| return \-1; |
| } |
| printf("accepting connections on port %d\en", listen_port); |
| return lfd; |
| } |
| |
| static int |
| connect_socket(int connect_port, char *address) |
| { |
| struct sockaddr_in addr; |
| int cfd; |
| |
| cfd = socket(AF_INET, SOCK_STREAM, 0); |
| if (cfd == \-1) { |
| perror("socket"); |
| return \-1; |
| } |
| |
| memset(&addr, 0, sizeof(addr)); |
| addr.sin_port = htons(connect_port); |
| addr.sin_family = AF_INET; |
| |
| if (!inet_aton(address, (struct in_addr *) &addr.sin_addr.s_addr)) { |
| fprintf(stderr, "inet_aton(): bad IP address format\en"); |
| close(cfd); |
| return \-1; |
| } |
| |
| if (connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) == \-1) { |
| perror("connect()"); |
| shutdown(cfd, SHUT_RDWR); |
| close(cfd); |
| return \-1; |
| } |
| return cfd; |
| } |
| |
| #define SHUT_FD1 do { \e |
| if (fd1 >= 0) { \e |
| shutdown(fd1, SHUT_RDWR); \e |
| close(fd1); \e |
| fd1 = \-1; \e |
| } \e |
| } while (0) |
| |
| #define SHUT_FD2 do { \e |
| if (fd2 >= 0) { \e |
| shutdown(fd2, SHUT_RDWR); \e |
| close(fd2); \e |
| fd2 = \-1; \e |
| } \e |
| } while (0) |
| |
| #define BUF_SIZE 1024 |
| |
| int |
| main(int argc, char *argv[]) |
| { |
| int h; |
| int fd1 = \-1, fd2 = \-1; |
| char buf1[BUF_SIZE], buf2[BUF_SIZE]; |
| int buf1_avail = 0, buf1_written = 0; |
| int buf2_avail = 0, buf2_written = 0; |
| |
| if (argc != 4) { |
| fprintf(stderr, "Usage\en\etfwd <listen\-port> " |
| "<forward\-to\-port> <forward\-to\-ip\-address>\en"); |
| exit(EXIT_FAILURE); |
| } |
| |
| signal(SIGPIPE, SIG_IGN); |
| |
| forward_port = atoi(argv[2]); |
| |
| h = listen_socket(atoi(argv[1])); |
| if (h == \-1) |
| exit(EXIT_FAILURE); |
| |
| for (;;) { |
| int ready, nfds = 0; |
| ssize_t nbytes; |
| fd_set readfds, writefds, exceptfds; |
| |
| FD_ZERO(&readfds); |
| FD_ZERO(&writefds); |
| FD_ZERO(&exceptfds); |
| FD_SET(h, &readfds); |
| nfds = max(nfds, h); |
| |
| if (fd1 > 0 && buf1_avail < BUF_SIZE) |
| FD_SET(fd1, &readfds); |
| /* Note: nfds is updated below, when fd1 is added to |
| exceptfds. */ |
| if (fd2 > 0 && buf2_avail < BUF_SIZE) |
| FD_SET(fd2, &readfds); |
| |
| if (fd1 > 0 && buf2_avail \- buf2_written > 0) |
| FD_SET(fd1, &writefds); |
| if (fd2 > 0 && buf1_avail \- buf1_written > 0) |
| FD_SET(fd2, &writefds); |
| |
| if (fd1 > 0) { |
| FD_SET(fd1, &exceptfds); |
| nfds = max(nfds, fd1); |
| } |
| if (fd2 > 0) { |
| FD_SET(fd2, &exceptfds); |
| nfds = max(nfds, fd2); |
| } |
| |
| ready = select(nfds + 1, &readfds, &writefds, &exceptfds, NULL); |
| |
| if (ready == \-1 && errno == EINTR) |
| continue; |
| |
| if (ready == \-1) { |
| perror("select()"); |
| exit(EXIT_FAILURE); |
| } |
| |
| if (FD_ISSET(h, &readfds)) { |
| socklen_t addrlen; |
| struct sockaddr_in client_addr; |
| int fd; |
| |
| addrlen = sizeof(client_addr); |
| memset(&client_addr, 0, addrlen); |
| fd = accept(h, (struct sockaddr *) &client_addr, &addrlen); |
| if (fd == \-1) { |
| perror("accept()"); |
| } else { |
| SHUT_FD1; |
| SHUT_FD2; |
| buf1_avail = buf1_written = 0; |
| buf2_avail = buf2_written = 0; |
| fd1 = fd; |
| fd2 = connect_socket(forward_port, argv[3]); |
| if (fd2 == \-1) |
| SHUT_FD1; |
| else |
| printf("connect from %s\en", |
| inet_ntoa(client_addr.sin_addr)); |
| |
| /* Skip any events on the old, closed file |
| descriptors. */ |
| |
| continue; |
| } |
| } |
| |
| /* NB: read OOB data before normal reads. */ |
| |
| if (fd1 > 0 && FD_ISSET(fd1, &exceptfds)) { |
| char c; |
| |
| nbytes = recv(fd1, &c, 1, MSG_OOB); |
| if (nbytes < 1) |
| SHUT_FD1; |
| else |
| send(fd2, &c, 1, MSG_OOB); |
| } |
| if (fd2 > 0 && FD_ISSET(fd2, &exceptfds)) { |
| char c; |
| |
| nbytes = recv(fd2, &c, 1, MSG_OOB); |
| if (nbytes < 1) |
| SHUT_FD2; |
| else |
| send(fd1, &c, 1, MSG_OOB); |
| } |
| if (fd1 > 0 && FD_ISSET(fd1, &readfds)) { |
| nbytes = read(fd1, buf1 + buf1_avail, |
| BUF_SIZE \- buf1_avail); |
| if (nbytes < 1) |
| SHUT_FD1; |
| else |
| buf1_avail += nbytes; |
| } |
| if (fd2 > 0 && FD_ISSET(fd2, &readfds)) { |
| nbytes = read(fd2, buf2 + buf2_avail, |
| BUF_SIZE \- buf2_avail); |
| if (nbytes < 1) |
| SHUT_FD2; |
| else |
| buf2_avail += nbytes; |
| } |
| if (fd1 > 0 && FD_ISSET(fd1, &writefds) && buf2_avail > 0) { |
| nbytes = write(fd1, buf2 + buf2_written, |
| buf2_avail \- buf2_written); |
| if (nbytes < 1) |
| SHUT_FD1; |
| else |
| buf2_written += nbytes; |
| } |
| if (fd2 > 0 && FD_ISSET(fd2, &writefds) && buf1_avail > 0) { |
| nbytes = write(fd2, buf1 + buf1_written, |
| buf1_avail \- buf1_written); |
| if (nbytes < 1) |
| SHUT_FD2; |
| else |
| buf1_written += nbytes; |
| } |
| |
| /* Check if write data has caught read data. */ |
| |
| if (buf1_written == buf1_avail) |
| buf1_written = buf1_avail = 0; |
| if (buf2_written == buf2_avail) |
| buf2_written = buf2_avail = 0; |
| |
| /* One side has closed the connection, keep |
| writing to the other side until empty. */ |
| |
| if (fd1 < 0 && buf1_avail \- buf1_written == 0) |
| SHUT_FD2; |
| if (fd2 < 0 && buf2_avail \- buf2_written == 0) |
| SHUT_FD1; |
| } |
| exit(EXIT_SUCCESS); |
| } |
| .EE |
| .PP |
| The above program properly forwards most kinds of TCP connections |
| including OOB signal data transmitted by \fBtelnet\fP servers. |
| It handles the tricky problem of having data flow in both directions |
| simultaneously. |
| You might think it more efficient to use a |
| .BR fork (2) |
| call and devote a process to each stream. |
| This becomes more tricky than you might suspect. |
| Another idea is to set nonblocking I/O using |
| .BR fcntl (2). |
| This also has its problems because you end up using |
| inefficient timeouts. |
| .PP |
| The program handles only one connection at a time, |
| although it could easily be extended to handle several with a linked list |
| of buffers\(emone for each connection. |
| At the moment, new |
| connections cause the current connection to be dropped. |
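The extension suggested above could start from a list node such as the one sketched here. Everything in it is an assumption made for illustration: struct conn, conn_add(), conn_remove(), and conn_fill_readfds() are hypothetical names, and only the read set is shown. Each node carries the same descriptors and buffer counters that main() keeps in local variables.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/select.h>

#define BUF_SIZE 1024

/* Hypothetical per-connection state, one node per forwarded stream. */
struct conn {
    int fd1, fd2;
    char buf1[BUF_SIZE], buf2[BUF_SIZE];
    int buf1_avail, buf1_written;
    int buf2_avail, buf2_written;
    struct conn *next;
};

/* Push a new connection onto the list; returns NULL on failure. */
static struct conn *
conn_add(struct conn **head, int fd1, int fd2)
{
    struct conn *c;

    /* Descriptors at or above FD_SETSIZE cannot be used with select(). */
    assert(fd1 < FD_SETSIZE && fd2 < FD_SETSIZE);

    c = calloc(1, sizeof(*c));  /* counters start at zero */
    if (c == NULL)
        return NULL;
    c->fd1 = fd1;
    c->fd2 = fd2;
    c->next = *head;
    *head = c;
    return c;
}

/* Unlink and free one connection (the caller closes its descriptors). */
static void
conn_remove(struct conn **head, struct conn *victim)
{
    struct conn **pp;

    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            free(victim);
            return;
        }
    }
}

/* Rebuild readfds from the list; returns the highest descriptor added,
   or -1 if none was added. */
static int
conn_fill_readfds(const struct conn *head, fd_set *readfds)
{
    const struct conn *c;
    int maxfd = -1;

    FD_ZERO(readfds);
    for (c = head; c != NULL; c = c->next) {
        if (c->fd1 >= 0 && c->buf1_avail < BUF_SIZE) {
            FD_SET(c->fd1, readfds);
            if (c->fd1 > maxfd)
                maxfd = c->fd1;
        }
        if (c->fd2 >= 0 && c->buf2_avail < BUF_SIZE) {
            FD_SET(c->fd2, readfds);
            if (c->fd2 > maxfd)
                maxfd = c->fd2;
        }
    }
    return maxfd;
}
```

The main loop would add the listening socket to the set as before, and pass the larger of the two descriptor values plus one as nfds.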
| .SH SEE ALSO |
| .BR accept (2), |
| .BR connect (2), |
| .BR poll (2), |
| .BR read (2), |
| .BR recv (2), |
| .BR select (2), |
| .BR send (2), |
| .BR sigprocmask (2), |
| .BR write (2), |
| .BR epoll (7) |
| .\" .SH AUTHORS |
| .\" This man page was written by Paul Sheer. |