| .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>. |
| .\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" Note also that many pieces are drawn from the kernel source file |
| .\" Documentation/networking/ip-sysctl.txt. |
| .\" |
| .\" %%%LICENSE_START(VERBATIM_ONE_PARA) |
| .\" Permission is granted to distribute possibly modified copies |
| .\" of this page provided the header is included verbatim, |
| .\" and in case of nontrivial modification author and date |
| .\" of the modification is added to the header. |
| .\" %%%LICENSE_END |
| .\" |
| .\" 2.4 Updates by Nivedita Singhvi 4/20/02 <nivedita@us.ibm.com>. |
| .\" Modified, 2004-11-11, Michael Kerrisk and Andries Brouwer |
| .\" Updated details of interaction of TCP_CORK and TCP_NODELAY. |
| .\" |
| .\" 2008-11-21, mtk, many, many updates. |
| .\" The descriptions of /proc files and socket options should now |
| .\" be more or less up to date and complete as at Linux 2.6.27 |
| .\" (other than the remaining FIXMEs in the page source below). |
| .\" |
| .\" FIXME The following need to be documented |
| .\" TCP_MD5SIG (2.6.20) |
| .\" commit cfb6eeb4c860592edd123fdea908d23c6ad1c7dc |
| .\" Author was yoshfuji@linux-ipv6.org |
| .\" Needs CONFIG_TCP_MD5SIG |
| .\" From net/inet/Kconfig: |
| .\" bool "TCP: MD5 Signature Option support (RFC2385) (EXPERIMENTAL)" |
| .\" RFC2385 specifies a method of giving MD5 protection to TCP sessions. |
| .\" Its main (only?) use is to protect BGP sessions between core routers |
| .\" on the Internet. |
| .\" |
| .\" There is a TCP_MD5SIG option documented in FreeBSD's tcp(4), |
| .\" but probably many details are different on Linux |
| .\" http://thread.gmane.org/gmane.linux.network/47490 |
| .\" http://www.daemon-systems.org/man/tcp.4.html |
| .\" http://article.gmane.org/gmane.os.netbsd.devel.network/3767/match=tcp_md5sig+freebsd |
| .\" |
| .\" TCP_COOKIE_TRANSACTIONS (2.6.33) |
| .\" commit 519855c508b9a17878c0977a3cdefc09b59b30df |
| .\" Author: William Allen Simpson <william.allen.simpson@gmail.com> |
| .\" commit e56fb50f2b7958b931c8a2fc0966061b3f3c8f3a |
| .\" Author: William Allen Simpson <william.allen.simpson@gmail.com> |
| .\" |
| .\" REMOVED in Linux 3.10 |
| .\" commit 1a2c6181c4a1922021b4d7df373bba612c3e5f04 |
| .\" Author: Christoph Paasch <christoph.paasch@uclouvain.be> |
| .\" |
| .\" TCP_THIN_LINEAR_TIMEOUTS (2.6.34) |
| .\" commit 36e31b0af58728071e8023cf8e20c5166b700717 |
| .\" Author: Andreas Petlund <apetlund@simula.no> |
| .\" |
| .\" TCP_THIN_DUPACK (2.6.34) |
| .\" commit 7e38017557bc0b87434d184f8804cadb102bb903 |
| .\" Author: Andreas Petlund <apetlund@simula.no> |
| .\" |
| .\" TCP_REPAIR (3.5) |
| .\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37 |
| .\" Author: Pavel Emelyanov <xemul@parallels.com> |
| .\" See also |
| .\" http://criu.org/TCP_connection |
| .\" https://lwn.net/Articles/495304/ |
| .\" |
| .\" TCP_REPAIR_QUEUE (3.5) |
| .\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37 |
| .\" Author: Pavel Emelyanov <xemul@parallels.com> |
| .\" |
| .\" TCP_QUEUE_SEQ (3.5) |
| .\" commit ee9952831cfd0bbe834f4a26489d7dce74582e37 |
| .\" Author: Pavel Emelyanov <xemul@parallels.com> |
| .\" |
| .\" TCP_REPAIR_OPTIONS (3.5) |
| .\" commit b139ba4e90dccbf4cd4efb112af96a5c9e0b098c |
| .\" Author: Pavel Emelyanov <xemul@parallels.com> |
| .\" |
| .\" TCP_FASTOPEN (3.6) |
| .\" (Fast Open server side implementation completed in 3.7) |
| .\" http://lwn.net/Articles/508865/ |
| .\" |
| .\" TCP_TIMESTAMP (3.9) |
| .\" commit 93be6ce0e91b6a94783e012b1857a347a5e6e9f2 |
| .\" Author: Andrey Vagin <avagin@openvz.org> |
| .\" |
| .\" TCP_NOTSENT_LOWAT (3.12) |
| .\" commit c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36 |
| .\" Author: Eric Dumazet <edumazet@google.com> |
| .\" |
| .\" TCP_CC_INFO (4.1) |
| .\" commit 6e9250f59ef9efb932c84850cd221f22c2a03c4a |
| .\" Author: Eric Dumazet <edumazet@google.com> |
| .\" |
| .\" TCP_SAVE_SYN, TCP_SAVED_SYN (4.2) |
| .\" commit cd8ae85299d54155702a56811b2e035e63064d3d |
| .\" Author: Eric Dumazet <edumazet@google.com> |
| .\" |
| .TH TCP 7 2021-03-22 "Linux" "Linux Programmer's Manual" |
| .SH NAME |
| tcp \- TCP protocol |
| .SH SYNOPSIS |
| .nf |
| .B #include <sys/socket.h> |
| .B #include <netinet/in.h> |
| .B #include <netinet/tcp.h> |
| .PP |
| .B tcp_socket = socket(AF_INET, SOCK_STREAM, 0); |
| .fi |
| .SH DESCRIPTION |
| This is an implementation of the TCP protocol defined in |
| RFC\ 793, RFC\ 1122 and RFC\ 2001 with the NewReno and SACK |
| extensions. |
| It provides a reliable, stream-oriented, |
| full-duplex connection between two sockets on top of |
| .BR ip (7), |
| for both v4 and v6 versions. |
| TCP guarantees that the data arrives in order and |
| retransmits lost packets. |
| It generates and checks a per-packet checksum to catch |
| transmission errors. |
| TCP does not preserve record boundaries. |
| .PP |
| A newly created TCP socket has no remote or local address and is not |
| fully specified. |
| To create an outgoing TCP connection use |
| .BR connect (2) |
| to establish a connection to another TCP socket. |
| To receive new incoming connections, first |
| .BR bind (2) |
| the socket to a local address and port and then call |
| .BR listen (2) |
| to put the socket into the listening state. |
| After that a new socket for each incoming connection can be accepted using |
| .BR accept (2). |
| A socket which has had |
| .BR accept (2) |
| or |
| .BR connect (2) |
| successfully called on it is fully specified and may transmit data. |
| Data cannot be transmitted on listening or not yet connected sockets. |
| .PP |
| Linux supports RFC\ 1323 TCP high performance |
| extensions. |
| These include Protection Against Wrapped |
| Sequence Numbers (PAWS), Window Scaling and Timestamps. |
| Window scaling allows the use |
| of large (> 64\ kB) TCP windows in order to support links with high |
| latency or bandwidth. |
| To make use of them, the send and receive buffer sizes must be increased. |
| They can be set globally with the |
| .I /proc/sys/net/ipv4/tcp_wmem |
| and |
| .I /proc/sys/net/ipv4/tcp_rmem |
| files, or on individual sockets by using the |
| .B SO_SNDBUF |
| and |
| .B SO_RCVBUF |
| socket options with the |
| .BR setsockopt (2) |
| call. |
| .PP |
| The maximum sizes for socket buffers declared via the |
| .B SO_SNDBUF |
| and |
| .B SO_RCVBUF |
| mechanisms are limited by the values in the |
| .I /proc/sys/net/core/rmem_max |
| and |
| .I /proc/sys/net/core/wmem_max |
| files. |
| Note that TCP actually allocates twice the size of |
| the buffer requested in the |
| .BR setsockopt (2) |
| call, and so a succeeding |
| .BR getsockopt (2) |
| call will not return the same size of buffer as requested in the |
| .BR setsockopt (2) |
| call. |
| TCP uses the extra space for administrative purposes and internal |
| kernel structures, and the |
| .I /proc |
| file values reflect the |
| larger sizes compared to the actual TCP windows. |
| On individual connections, the socket buffer size must be set prior to the |
| .BR listen (2) |
| or |
| .BR connect (2) |
| calls in order to have it take effect. |
| See |
| .BR socket (7) |
| for more information. |
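.PP
The doubling described above can be observed directly.
The following is a minimal sketch, not a definitive recipe;
the helper name request_rcvbuf() is hypothetical,
and the value reported back depends on the limit in
/proc/sys/net/core/rmem_max on the running system.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Ask for a receive buffer of `bytes` on socket `fd`, then return the
 * size that getsockopt(2) reports afterwards, or -1 on error.
 * Call this before listen(2) or connect(2) so the size takes effect.
 */
int request_rcvbuf(int fd, int bytes)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) == -1)
        return -1;
    return effective;   /* on Linux, normally larger than `bytes` */
}
```

A caller would create the socket with socket(AF_INET, SOCK_STREAM, 0),
call request_rcvbuf(fd, 16384), and compare the result with the request.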
| .PP |
| TCP supports urgent data. |
| Urgent data is used to signal the |
| receiver that some important message is part of the data |
| stream and that it should be processed as soon as possible. |
| To send urgent data specify the |
| .B MSG_OOB |
| option to |
| .BR send (2). |
| When urgent data is received, the kernel sends a |
| .B SIGURG |
| signal to the process or process group that has been set as the |
| socket "owner" using the |
| .B SIOCSPGRP |
| or |
| .B FIOSETOWN |
| ioctls (or the POSIX.1-specified |
| .BR fcntl (2) |
| .B F_SETOWN |
| operation). |
| When the |
| .B SO_OOBINLINE |
| socket option is enabled, urgent data is put into the normal |
| data stream (a program can test for its location using the |
| .B SIOCATMARK |
| ioctl described below), |
| otherwise it can be received only when the |
| .B MSG_OOB |
| flag is set for |
| .BR recv (2) |
| or |
| .BR recvmsg (2). |
| .PP |
| When out-of-band data is present, |
| .BR select (2) |
| indicates the file descriptor as having an exceptional condition and |
| .BR poll (2) |
| indicates a |
| .B POLLPRI |
| event. |
| .PP |
| Linux 2.4 introduced a number of changes for improved |
| throughput and scaling, as well as enhanced functionality. |
| Some of these features include support for zero-copy |
| .BR sendfile (2), |
| Explicit Congestion Notification, new |
| management of TIME_WAIT sockets, keep-alive socket options |
| and support for Duplicate SACK extensions. |
| .SS Address formats |
| TCP is built on top of IP (see |
| .BR ip (7)). |
| The address formats defined by |
| .BR ip (7) |
| apply to TCP. |
| TCP supports point-to-point communication only; |
| broadcasting and multicasting are not |
| supported. |
| .SS /proc interfaces |
| System-wide TCP parameter settings can be accessed by files in the directory |
| .IR /proc/sys/net/ipv4/ . |
| In addition, most IP |
| .I /proc |
| interfaces also apply to TCP; see |
| .BR ip (7). |
| Variables described as |
| .I Boolean |
| take an integer value, with a nonzero value ("true") meaning that |
| the corresponding option is enabled, and a zero value ("false") |
| meaning that the option is disabled. |
| .TP |
| .IR tcp_abc " (Integer; default: 0; Linux 2.6.15 to Linux 3.8)" |
| .\" Since 2.6.15; removed in 3.9 |
| .\" commit ca2eb5679f8ddffff60156af42595df44a315ef0 |
| .\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt |
| Control the Appropriate Byte Count (ABC), defined in RFC 3465. |
| ABC is a way of increasing the congestion window |
| .RI ( cwnd ) |
| more slowly in response to partial acknowledgements. |
| Possible values are: |
| .RS |
| .IP 0 3 |
| increase |
| .I cwnd |
| once per acknowledgement (no ABC) |
| .IP 1 |
| increase |
| .I cwnd |
| once per acknowledgement of a full-sized segment |
| .IP 2 |
| allow |
| .I cwnd |
| to increase by two if the acknowledgement covers |
| two segments, to compensate for delayed acknowledgements. |
| .RE |
| .TP |
| .IR tcp_abort_on_overflow " (Boolean; default: disabled; since Linux 2.4)" |
| .\" Since 2.3.41 |
| Enable resetting connections if the listening service is too |
| slow and unable to keep up and accept them. |
| When this option is disabled, a connection whose overflow was caused |
| by a burst is able to recover. |
| Enable this option |
| .I only |
| if you are really sure that the listening daemon |
| cannot be tuned to accept connections faster. |
| Enabling this option can harm the clients of your server. |
| .TP |
| .IR tcp_adv_win_scale " (integer; default: 2; since Linux 2.4)" |
| .\" Since 2.4.0-test7 |
| Count buffering overhead as |
| .IR "bytes/2^tcp_adv_win_scale" , |
| if |
| .I tcp_adv_win_scale |
| is greater than 0; or |
| .IR "bytes\-bytes/2^(\-tcp_adv_win_scale)" , |
| if |
| .I tcp_adv_win_scale |
| is less than or equal to zero. |
| .IP |
| The socket receive buffer space is shared between the |
| application and kernel. |
| TCP maintains part of the buffer as |
| the TCP window; this is the size of the receive window |
| advertised to the other end. |
| The rest of the space is used |
| as the "application" buffer, used to isolate the network |
| from scheduling and application latencies. |
| The |
| .I tcp_adv_win_scale |
| default value of 2 implies that the space |
| used for the application buffer is one fourth that of the total. |
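.IP
The two formulas above can be written out as a short sketch.
The function names are hypothetical and the code only mirrors the
arithmetic stated in this entry; it is not taken from the kernel source.

```c
#include <assert.h>

/*
 * Bytes of a receive buffer counted as buffering overhead for a given
 * tcp_adv_win_scale value, following the two formulas in the text.
 */
long adv_win_overhead(long bytes, int scale)
{
    assert(bytes >= 0 && scale > -31 && scale < 31);
    if (scale > 0)
        return bytes >> scale;              /* bytes / 2^scale */
    return bytes - (bytes >> -scale);       /* bytes - bytes / 2^(-scale) */
}

/* The remainder of the buffer, available as the advertised window. */
long adv_window(long bytes, int scale)
{
    return bytes - adv_win_overhead(bytes, scale);
}
```

With the default scale of 2 and an 87380-byte buffer,
adv_win_overhead() yields 21845 bytes (one fourth)
and adv_window() yields the remaining 65535 bytes.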
| .TP |
| .IR tcp_allowed_congestion_control " (String; default: see text; since Linux 2.4.20)" |
| .\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt |
| Show/set the congestion control algorithm choices available to unprivileged |
| processes (see the description of the |
| .B TCP_CONGESTION |
| socket option). |
| The items in the list are separated by white space and |
| terminated by a newline character. |
| The list is a subset of those listed in |
| .IR tcp_available_congestion_control . |
| The default value for this list is "reno" plus the default setting of |
| .IR tcp_congestion_control . |
| .TP |
| .IR tcp_autocorking " (Boolean; default: enabled; since Linux 3.14)" |
| .\" commit f54b311142a92ea2e42598e347b84e1655caf8e3 |
| .\" Text heavily based on Documentation/networking/ip-sysctl.txt |
| If this option is enabled, the kernel tries to coalesce small writes |
| (from consecutive |
| .BR write (2) |
| and |
| .BR sendmsg (2) |
| calls) as much as possible, |
| in order to decrease the total number of sent packets. |
| Coalescing is done if at least one prior packet for the flow |
| is waiting in Qdisc queues or the device transmit queue. |
| Applications can still use the |
| .B TCP_CORK |
| socket option to obtain optimal behavior |
| when they know how/when to uncork their sockets. |
| .TP |
| .IR tcp_available_congestion_control " (String; read-only; since Linux 2.4.20)" |
| .\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt |
| Show a list of the congestion-control algorithms |
| that are registered. |
| The items in the list are separated by white space and |
| terminated by a newline character. |
| This list is a limiting set for the list in |
| .IR tcp_allowed_congestion_control . |
| More congestion-control algorithms may be available as modules, |
| but not loaded. |
| .TP |
| .IR tcp_app_win " (integer; default: 31; since Linux 2.4)" |
| .\" Since 2.4.0-test7 |
| This variable defines how many |
| bytes of the TCP window are reserved for buffering overhead. |
| .IP |
| A maximum of (\fIwindow/2^tcp_app_win\fP, mss) bytes in the window |
| are reserved for the application buffer. |
| A value of 0 implies that no amount is reserved. |
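.IP
Read literally, the rule above is a maximum of two quantities,
with 0 as a special case.
The sketch below assumes that literal reading;
app_win_reserved() is a hypothetical name, and the kernel's actual
window-clamping logic may differ in detail.

```c
#include <assert.h>

/*
 * Bytes of the window reserved for the application buffer, taking the
 * text at face value: max(window / 2^tcp_app_win, mss), or nothing
 * at all when tcp_app_win is 0.
 */
long long app_win_reserved(long long window, long long mss, int tcp_app_win)
{
    long long share;

    assert(window >= 0 && mss >= 0);
    assert(tcp_app_win >= 0 && tcp_app_win < 63);
    if (tcp_app_win == 0)
        return 0;                       /* no amount is reserved */
    share = window >> tcp_app_win;      /* window / 2^tcp_app_win */
    return share > mss ? share : mss;
}
```

Under this reading, the default of 31 makes the shifted term zero for
any realistic window, so the reservation is simply one MSS.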
| .\" |
| .\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt |
| .TP |
| .IR tcp_base_mss " (Integer; default: 512; since Linux 2.6.17)" |
| The initial value of |
| .I search_low |
| to be used by the packetization layer Path MTU discovery (MTU probing). |
| If MTU probing is enabled, |
| this is the initial MSS used by the connection. |
| .\" |
| .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt |
| .TP |
| .IR tcp_bic " (Boolean; default: disabled; Linux 2.4.27/2.6.6 to 2.6.13)" |
| Enable BIC TCP congestion control algorithm. |
| BIC-TCP is a sender-side-only change that ensures a linear RTT |
| fairness under large windows while offering both scalability and |
| bounded TCP-friendliness. |
| The protocol combines two schemes |
| called additive increase and binary search increase. |
| When the congestion window is large, additive increase with a large |
| increment ensures linear RTT fairness as well as good scalability. |
| Under small congestion windows, binary search |
| increase provides TCP friendliness. |
| .\" |
| .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt |
| .TP |
| .IR tcp_bic_low_window " (integer; default: 14; Linux 2.4.27/2.6.6 to 2.6.13)" |
| Set the threshold window (in packets) where BIC TCP starts to |
| adjust the congestion window. |
| Below this threshold BIC TCP behaves the same as the default TCP Reno. |
| .\" |
| .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt |
| .TP |
| .IR tcp_bic_fast_convergence " (Boolean; default: enabled; Linux 2.4.27/2.6.6 to 2.6.13)" |
| Force BIC TCP to more quickly respond to changes in congestion window. |
| Allows two flows sharing the same connection to converge more rapidly. |
| .TP |
| .IR tcp_congestion_control " (String; default: see text; since Linux 2.4.13)" |
| .\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt |
| Set the default congestion-control algorithm to be used for new connections. |
| The algorithm "reno" is always available, |
| but additional choices may be available depending on kernel configuration. |
| The default value for this file is set as part of kernel configuration. |
| .TP |
| .IR tcp_dma_copybreak " (integer; default: 4096; since Linux 2.6.24)" |
| Lower limit, in bytes, of the size of socket reads that will be |
| offloaded to a DMA copy engine, if one is present in the system |
| and the kernel was configured with the |
| .B CONFIG_NET_DMA |
| option. |
| .TP |
| .IR tcp_dsack " (Boolean; default: enabled; since Linux 2.4)" |
| .\" Since 2.4.0-test7 |
| Enable RFC\ 2883 TCP Duplicate SACK support. |
| .TP |
| .IR tcp_ecn " (Integer; default: see below; since Linux 2.4)" |
| .\" Since 2.4.0-test7 |
| Enable RFC\ 3168 Explicit Congestion Notification. |
| .IP |
| This file can have one of the following values: |
| .RS |
| .IP 0 |
| Disable ECN. |
| Neither initiate nor accept ECN. |
| This was the default up to and including Linux 2.6.30. |
| .IP 1 |
| Enable ECN when requested by incoming connections and also |
| request ECN on outgoing connection attempts. |
| .IP 2 |
| .\" commit 255cac91c3c9ce7dca7713b93ab03c75b7902e0e |
| Enable ECN when requested by incoming connections, |
| but do not request ECN on outgoing connections. |
| This value is supported, and is the default, since Linux 2.6.31. |
| .RE |
| .IP |
| When enabled, connectivity to some destinations could be affected |
| due to older, misbehaving middle boxes along the path, causing |
| connections to be dropped. |
| However, to facilitate and encourage deployment with option 1, and |
| to work around such buggy equipment, the |
| .I tcp_ecn_fallback |
| option has been introduced. |
| .TP |
| .IR tcp_ecn_fallback " (Boolean; default: enabled; since Linux 4.1)" |
| .\" commit 492135557dc090a1abb2cfbe1a412757e3ed68ab |
| Enable RFC\ 3168, Section 6.1.1.1. fallback. |
| When enabled, outgoing ECN-setup SYNs that time out within the |
| normal SYN retransmission timeout will be resent with CWR and |
| ECE cleared. |
| .TP |
| .IR tcp_fack " (Boolean; default: enabled; since Linux 2.2)" |
| .\" Since 2.1.92 |
| Enable TCP Forward Acknowledgement support. |
| .TP |
| .IR tcp_fin_timeout " (integer; default: 60; since Linux 2.2)" |
| .\" Since 2.1.53 |
| This specifies how many seconds to wait for a final FIN packet before the |
| socket is forcibly closed. |
| This is strictly a violation of the TCP specification, |
| but required to prevent denial-of-service attacks. |
| In Linux 2.2, the default value was 180. |
| .\" |
| .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt |
| .TP |
| .IR tcp_frto " (integer; default: see below; since Linux 2.4.21/2.6)" |
| .\" Since 2.4.21/2.5.43 |
| Enable F-RTO, an enhanced recovery algorithm for TCP retransmission |
| timeouts (RTOs). |
| It is particularly beneficial in wireless environments |
| where packet loss is typically due to random radio interference |
| rather than intermediate router congestion. |
| See RFC 4138 for more details. |
| .IP |
| This file can have one of the following values: |
| .RS |
| .IP 0 3 |
| Disabled. |
| This was the default up to and including Linux 2.6.23. |
| .IP 1 |
| The basic version of the F-RTO algorithm is enabled. |
| .IP 2 |
| .\" commit c96fd3d461fa495400df24be3b3b66f0e0b152f9 |
| Enable SACK-enhanced F-RTO if flow uses SACK. |
| The basic version can also be used when |
| SACK is in use, although in that case there are scenarios in which F-RTO |
| interacts badly with the packet counting of the SACK-enabled TCP flow. |
| This value is the default since Linux 2.6.24. |
| .RE |
| .IP |
| Before Linux 2.6.22, this parameter was a Boolean value, |
| supporting just values 0 and 1 above. |
| .TP |
| .IR tcp_frto_response " (integer; default: 0; since Linux 2.6.22)" |
| When F-RTO has detected that a TCP retransmission timeout was spurious |
| (i.e., the timeout would have been avoided had TCP set a |
| longer retransmission timeout), |
| TCP has several options concerning what to do next. |
| Possible values are: |
| .RS |
| .IP 0 3 |
| Rate halving based; a smooth and conservative response, |
| results in halved congestion window |
| .RI ( cwnd ) |
| and slow-start threshold |
| .RI ( ssthresh ) |
| after one RTT. |
| .IP 1 |
| Very conservative response; not recommended because, |
| although valid, it interacts poorly with the rest of Linux TCP; halves |
| .I cwnd |
| and |
| .I ssthresh |
| immediately. |
| .IP 2 |
| Aggressive response; undoes congestion-control measures |
| that are now known to be unnecessary |
| (ignoring the possibility of a lost retransmission that would require |
| TCP to be more cautious); |
| .I cwnd |
| and |
| .I ssthresh |
| are restored to the values prior to timeout. |
| .RE |
| .TP |
| .IR tcp_keepalive_intvl " (integer; default: 75; since Linux 2.4)" |
| .\" Since 2.3.18 |
| The number of seconds between TCP keep-alive probes. |
| .TP |
| .IR tcp_keepalive_probes " (integer; default: 9; since Linux 2.2)" |
| .\" Since 2.1.43 |
| The maximum number of TCP keep-alive probes to send |
| before giving up and killing the connection if |
| no response is obtained from the other end. |
| .TP |
| .IR tcp_keepalive_time " (integer; default: 7200; since Linux 2.2)" |
| .\" Since 2.1.43 |
| The number of seconds a connection needs to be idle |
| before TCP begins sending out keep-alive probes. |
| Keep-alives are sent only when the |
| .B SO_KEEPALIVE |
| socket option is enabled. |
| The default value is 7200 seconds (2 hours). |
| An idle connection is terminated after |
| approximately an additional 11 minutes (9 probes at intervals |
| of 75 seconds) when keep-alive is enabled. |
| .IP |
| Note that underlying connection tracking mechanisms and |
| application timeouts may be much shorter. |
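.IP
The arithmetic behind the "additional 11 minutes" figure, together with
the step that turns keep-alives on for one socket, can be sketched as
follows.
The function names are hypothetical;
only the SO_KEEPALIVE option itself comes from the text above.

```c
#include <assert.h>
#include <sys/socket.h>

/* Seconds spent probing before the connection is given up on. */
long keepalive_probe_phase(long probes, long intvl)
{
    assert(probes >= 0 && intvl >= 0);
    return probes * intvl;
}

/* Seconds from the last activity until an unresponsive peer is dropped. */
long keepalive_total(long idle_time, long probes, long intvl)
{
    assert(idle_time >= 0);
    return idle_time + keepalive_probe_phase(probes, intvl);
}

/* Enable keep-alive probes on `fd`; returns 0 on success, -1 on error. */
int enable_keepalive(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```

With the defaults, keepalive_probe_phase(9, 75) is 675 seconds
(a little over 11 minutes), and keepalive_total(7200, 9, 75)
is 7875 seconds.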
| .\" |
| .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt |
| .TP |
| .IR tcp_low_latency " (Boolean; default: disabled; since Linux 2.4.21/2.6; \ |
| obsolete since Linux 4.14)" |
| .\" Since 2.4.21/2.5.60 |
| If enabled, the TCP stack makes decisions that prefer lower |
| latency as opposed to higher throughput. |
| If this option is disabled, then higher throughput is preferred. |
| An example of an application where this default should be |
| changed would be a Beowulf compute cluster. |
| Since Linux 4.14, |
| .\" commit b6690b14386698ce2c19309abad3f17656bdfaea |
| this file still exists, but its value is ignored. |
| .TP |
| .IR tcp_max_orphans " (integer; default: see below; since Linux 2.4)" |
| .\" Since 2.3.41 |
| The maximum number of orphaned (not attached to any user file |
| handle) TCP sockets allowed in the system. |
| When this number is exceeded, |
| the orphaned connection is reset and a warning is printed. |
| This limit exists only to prevent simple denial-of-service attacks. |
| Lowering this limit is not recommended. |
| Network conditions might require you to increase the number of |
| orphans allowed, but note that each orphan can eat up to \(ti64\ kB |
| of unswappable memory. |
| The default initial value is set equal to the kernel parameter NR_FILE. |
| This initial default is adjusted depending on the memory in the system. |
| .TP |
| .IR tcp_max_syn_backlog " (integer; default: see below; since Linux 2.2)" |
| .\" Since 2.1.53 |
| The maximum number of queued connection requests which have |
| still not received an acknowledgement from the connecting client. |
| If this number is exceeded, the kernel will begin |
| dropping requests. |
| The default value of 256 is increased to |
| 1024 when the memory present in the system is adequate or |
| greater (>= 128\ MB), and reduced to 128 for those systems with |
| very low memory (<= 32\ MB). |
| .IP |
| Prior to Linux 2.6.20, |
| .\" commit 72a3effaf633bcae9034b7e176bdbd78d64a71db |
| it was recommended that if this needed to be increased above 1024, |
| the size of the SYNACK hash table |
| .RB ( TCP_SYNQ_HSIZE ) |
| in |
| .I include/net/tcp.h |
| should be modified to keep |
| .IP |
| TCP_SYNQ_HSIZE * 16 <= tcp_max_syn_backlog |
| .IP |
| and the kernel should be |
| recompiled. |
| In Linux 2.6.20, the fixed-size |
| .B TCP_SYNQ_HSIZE |
| was removed in favor of dynamic sizing. |
| .TP |
| .IR tcp_max_tw_buckets " (integer; default: see below; since Linux 2.4)" |
| .\" Since 2.3.41 |
| The maximum number of sockets in TIME_WAIT state allowed in |
| the system. |
| This limit exists only to prevent simple denial-of-service attacks. |
| The default value of NR_FILE*2 is adjusted |
| depending on the memory in the system. |
| If this number is |
| exceeded, the socket is closed and a warning is printed. |
| .TP |
| .IR tcp_moderate_rcvbuf " (Boolean; default: enabled; since Linux 2.4.17/2.6.7)" |
| .\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt |
| If enabled, TCP performs receive buffer auto-tuning, |
| attempting to automatically size the buffer (no greater than |
| .IR tcp_rmem[2] ) |
| to match the size required by the path for full throughput. |
| .TP |
| .IR tcp_mem " (since Linux 2.4)" |
| .\" Since 2.4.0-test7 |
| This is a vector of 3 integers: [low, pressure, high]. |
| These bounds, measured in units of the system page size, |
| are used by TCP to track its memory usage. |
| The defaults are calculated at boot time from the amount of |
| available memory. |
| (TCP can only use |
| .I "low memory" |
| for this, which is limited to around 900 megabytes on 32-bit systems. |
| 64-bit systems do not suffer this limitation.) |
| .RS |
| .TP |
| .I low |
| TCP doesn't regulate its memory allocation when the number |
| of pages it has allocated globally is below this number. |
| .TP |
| .I pressure |
| When the amount of memory allocated by TCP |
| exceeds this number of pages, TCP moderates its memory consumption. |
| This memory pressure state is exited |
| once the number of pages allocated falls below |
| the |
| .I low |
| mark. |
| .TP |
| .I high |
| The maximum number of pages, globally, that TCP will allocate. |
| This value overrides any other limits imposed by the kernel. |
| .RE |
| .TP |
| .IR tcp_mtu_probing " (integer; default: 0; since Linux 2.6.17)" |
| .\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt |
| This parameter controls TCP Packetization-Layer Path MTU Discovery. |
| The following values may be assigned to the file: |
| .RS |
| .IP 0 3 |
| Disabled |
| .IP 1 |
| Disabled by default, enabled when an ICMP black hole is detected |
| .IP 2 |
| Always enabled, use initial MSS of |
| .IR tcp_base_mss . |
| .RE |
| .TP |
| .IR tcp_no_metrics_save " (Boolean; default: disabled; since Linux 2.6.6)" |
| .\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt |
| By default, TCP saves various connection metrics in the route cache |
| when the connection closes, so that connections established in the |
| near future can use these to set initial conditions. |
| Usually, this increases overall performance, |
| but it may sometimes cause performance degradation. |
| If |
| .I tcp_no_metrics_save |
| is enabled, TCP will not cache metrics on closing connections. |
| .TP |
| .IR tcp_orphan_retries " (integer; default: 8; since Linux 2.4)" |
| .\" Since 2.3.41 |
| The maximum number of attempts made to probe the other |
| end of a connection which has been closed by our end. |
| .TP |
| .IR tcp_reordering " (integer; default: 3; since Linux 2.4)" |
| .\" Since 2.4.0-test7 |
| The maximum a packet can be reordered in a TCP packet stream |
| without TCP assuming packet loss and going into slow start. |
| It is not advisable to change this number. |
| This is a packet reordering detection metric designed to |
| minimize unnecessary back off and retransmits provoked by |
| reordering of packets on a connection. |
| .TP |
| .IR tcp_retrans_collapse " (Boolean; default: enabled; since Linux 2.2)" |
| .\" Since 2.1.96 |
| Try to send full-sized packets during retransmit. |
| .TP |
| .IR tcp_retries1 " (integer; default: 3; since Linux 2.2)" |
| .\" Since 2.1.43 |
| The number of times TCP will attempt to retransmit a |
| packet on an established connection normally, |
| without the extra effort of getting the network layers involved. |
| Once we exceed this number of |
| retransmits, we first have the network layer |
| update the route if possible before each new retransmit. |
| The default is the RFC specified minimum of 3. |
| .TP |
| .IR tcp_retries2 " (integer; default: 15; since Linux 2.2)" |
| .\" Since 2.1.43 |
| The maximum number of times a TCP packet is retransmitted |
| in established state before giving up. |
| The default value is 15, which corresponds to a duration of |
| approximately 13 to 30 minutes, depending |
| on the retransmission timeout. |
| The RFC\ 1122 specified |
| minimum limit of 100 seconds is typically deemed too short. |
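.IP
To see how a retry count turns into a duration, the sketch below sums
the waits of an exponentially backed-off timer.
It rests on assumptions that are not stated in this page:
the timeout doubles after every retransmission,
and it is bounded below by 200 ms and above by 120 s.
The function name is hypothetical.

```c
#include <assert.h>

/*
 * Total time, in milliseconds, spent waiting across `retries`
 * retransmissions: one wait before each retransmission plus the final
 * wait after the last one.  The timeout starts at `rto_ms`, doubles
 * each time, and is capped at `rto_max_ms`.
 */
long retrans_total_ms(int retries, long rto_ms, long rto_max_ms)
{
    long total = 0;
    int i;

    assert(retries >= 0 && rto_ms > 0 && rto_max_ms >= rto_ms);
    for (i = 0; i <= retries; i++) {
        total += rto_ms;
        rto_ms = (rto_ms * 2 > rto_max_ms) ? rto_max_ms : rto_ms * 2;
    }
    return total;
}
```

Under these assumptions, retrans_total_ms(15, 200, 120000) gives
924600 ms, about 15 minutes; a connection whose timeout starts higher
takes correspondingly longer to give up.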
| .TP |
| .IR tcp_rfc1337 " (Boolean; default: disabled; since Linux 2.2)" |
| .\" Since 2.1.90 |
| Enable TCP behavior conformant with RFC\ 1337. |
| When disabled, |
| if a RST is received in TIME_WAIT state, we close |
| the socket immediately without waiting for the end |
| of the TIME_WAIT period. |
| .TP |
| .IR tcp_rmem " (since Linux 2.4)" |
| .\" Since 2.4.0-test7 |
| This is a vector of 3 integers: [min, default, max]. |
| These parameters are used by TCP to regulate receive buffer sizes. |
| TCP dynamically adjusts the size of the |
| receive buffer from the defaults listed below, in the range |
| of these values, depending on memory available in the system. |
| .RS |
| .TP |
| .I min |
| minimum size of the receive buffer used by each TCP socket. |
| The default value is the system page size. |
| (On Linux 2.4, the default value is 4\ kB, lowered to |
| .B PAGE_SIZE |
| bytes in low-memory systems.) |
| This value |
| is used to ensure that in memory pressure mode, |
| allocations below this size will still succeed. |
| This is not |
| used to bound the size of the receive buffer declared |
| using |
| .B SO_RCVBUF |
| on a socket. |
| .TP |
| .I default |
| the default size of the receive buffer for a TCP socket. |
| This value overwrites the initial default buffer size from |
| the generic global |
| .I net.core.rmem_default |
| defined for all protocols. |
| The default value is 87380 bytes. |
| (On Linux 2.4, this will be lowered to 43689 in low-memory systems.) |
| If larger receive buffer sizes are desired, this value should |
| be increased (to affect all sockets). |
| To employ large TCP windows, the |
| .I net.ipv4.tcp_window_scaling |
| must be enabled (default). |
| .TP |
| .I max |
| the maximum size of the receive buffer used by each TCP socket. |
| This value does not override the global |
| .IR net.core.rmem_max . |
| This is not used to limit the size of the receive buffer declared using |
| .B SO_RCVBUF |
| on a socket. |
| The default value is calculated using the formula |
| .IP |
| max(87380, min(4\ MB, \fItcp_mem\fP[1]*PAGE_SIZE/128)) |
| .IP |
| (On Linux 2.4, the default is 87380*2 bytes, |
| lowered to 87380 in low-memory systems). |
| .RE |
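.IP
The formula for the default
.I max
value can be evaluated as in the sketch below.
The function name is hypothetical, the page size is passed in
explicitly, and "4 MB" is assumed to mean 4*1024*1024 bytes.

```c
#include <assert.h>

/*
 * Default maximum receive buffer size in bytes:
 * max(87380, min(4 MB, tcp_mem[1] * PAGE_SIZE / 128)),
 * where tcp_mem[1] is the "pressure" threshold in pages.
 */
long long tcp_rmem_max_default(long long pressure_pages, long long page_size)
{
    const long long four_mb = 4LL * 1024 * 1024;
    long long scaled, capped;

    assert(pressure_pages >= 0 && page_size > 0);
    scaled = pressure_pages * page_size / 128;
    capped = scaled < four_mb ? scaled : four_mb;   /* min(4 MB, ...) */
    return capped > 87380 ? capped : 87380;         /* max(87380, ...) */
}
```

For example, with 4096-byte pages a pressure threshold of 10000 pages
gives 320000 bytes, while very small thresholds fall back to 87380 and
very large ones are capped at 4194304.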
| .TP |
| .IR tcp_sack " (Boolean; default: enabled; since Linux 2.2)" |
| .\" Since 2.1.36 |
| Enable RFC\ 2018 TCP Selective Acknowledgements. |
| .TP |
| .IR tcp_slow_start_after_idle " (Boolean; default: enabled; since Linux 2.6.18)" |
| .\" The following is from 2.6.28-rc4: Documentation/networking/ip-sysctl.txt |
| If enabled, provide RFC 2861 behavior and time out the congestion |
| window after an idle period. |
| An idle period is defined as the current RTO (retransmission timeout). |
| If disabled, the congestion window will not |
| be timed out after an idle period. |
| .TP |
| .IR tcp_stdurg " (Boolean; default: disabled; since Linux 2.2)" |
| .\" Since 2.1.44 |
| If this option is enabled, then use the RFC\ 1122 interpretation |
| of the TCP urgent-pointer field. |
| .\" RFC 793 was ambiguous in its specification of the meaning of the |
| .\" urgent pointer. RFC 1122 (and RFC 961) fixed on a particular |
| .\" resolution of this ambiguity (unfortunately the "wrong" one). |
| According to this interpretation, the urgent pointer points |
| to the last byte of urgent data. |
| If this option is disabled, then use the BSD-compatible interpretation of |
| the urgent pointer: |
| the urgent pointer points to the first byte after the urgent data. |
| Enabling this option may lead to interoperability problems. |
| .TP |
| .IR tcp_syn_retries " (integer; default: 6; since Linux 2.2)" |
| .\" Since 2.1.38 |
| The maximum number of times initial SYNs for an active TCP |
| connection attempt will be retransmitted. |
| This value should not be higher than 255. |
| The default value is 6, which corresponds to retrying for up to |
| approximately 127 seconds. |
| Before Linux 3.7, |
| .\" commit 6c9ff979d1921e9fd05d89e1383121c2503759b9 |
| the default value was 5, which |
| (in conjunction with calculation based on other kernel parameters) |
| corresponded to approximately 180 seconds. |
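| .IP |
| The 127-second figure can be reproduced with a short calculation. |
| The sketch below assumes an initial retransmission timeout of 1 second that |
| doubles after every attempt, and it ignores the kernel's upper bound on the |
| timeout; the function name is hypothetical. |

```c
/* Total time, in seconds, before a connection attempt is abandoned:
 * the wait after the initial SYN plus the wait after each of the
 * `retries` retransmissions, where every wait is twice the previous one. */
static unsigned int
syn_attempt_seconds(unsigned int retries, unsigned int initial_rto)
{
    unsigned int total = 0;
    unsigned int wait = initial_rto;

    for (unsigned int i = 0; i <= retries; i++) {
        total += wait;
        wait *= 2;
    }
    return total;
}
```

| .IP |
| With 6 retries and a 1-second initial timeout the waits are |
| 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds. |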
| .TP |
| .IR tcp_synack_retries " (integer; default: 5; since Linux 2.2)" |
| .\" Since 2.1.38 |
| The maximum number of times a SYN/ACK segment |
| for a passive TCP connection will be retransmitted. |
| This number should not be higher than 255. |
| .TP |
| .IR tcp_syncookies " (integer; default: 1; since Linux 2.2)" |
| .\" Since 2.1.43 |
| Enable TCP syncookies. |
| The kernel must be compiled with |
| .BR CONFIG_SYN_COOKIES . |
| The syncookies feature attempts to protect a |
| socket from a SYN flood attack. |
| This should be used as a last resort, if at all. |
| This is a violation of the TCP protocol, |
| and conflicts with other areas of TCP such as TCP extensions. |
| It can cause problems for clients and relays. |
| It is not recommended as a tuning mechanism for heavily |
| loaded servers to help with overloaded or misconfigured conditions. |
| For recommended alternatives see |
| .IR tcp_max_syn_backlog , |
| .IR tcp_synack_retries , |
| and |
| .IR tcp_abort_on_overflow . |
| Set to one of the following values: |
| .RS |
| .IP 0 3 |
| Disable TCP syncookies. |
| .IP 1 |
| Send out syncookies when the syn backlog queue of a socket overflows. |
| .IP 2 |
| (since Linux 3.12) |
| .\" commit 5ad37d5deee1ff7150a2d0602370101de158ad86 |
| Send out syncookies unconditionally. |
| This can be useful for network testing. |
| .RE |
| .TP |
| .IR tcp_timestamps " (integer; default: 1; since Linux 2.2)" |
| .\" Since 2.1.36 |
| Set to one of the following values to enable or disable RFC\ 1323 |
| TCP timestamps: |
| .RS |
| .IP 0 3 |
| Disable timestamps. |
| .IP 1 |
| Enable timestamps as defined in RFC\ 1323 and use a random offset for |
| each connection rather than only using the current time. |
| .IP 2 |
| As for the value 1, but without random offsets. |
| .\" commit 25429d7b7dca01dc4f17205de023a30ca09390d0 |
| Setting |
| .I tcp_timestamps |
| to this value is meaningful since Linux 4.10. |
| .RE |
| .TP |
| .IR tcp_tso_win_divisor " (integer; default: 3; since Linux 2.6.9)" |
| This parameter controls what percentage of the congestion window |
| can be consumed by a single TCP Segmentation Offload (TSO) frame. |
| The setting of this parameter is a tradeoff between burstiness and |
| building larger TSO frames. |
| .TP |
| .IR tcp_tw_recycle " (Boolean; default: disabled; Linux 2.4 to 4.11)" |
| .\" Since 2.3.15 |
| .\" removed in 4.12; commit 4396e46187ca5070219b81773c4e65088dac50cc |
| Enable fast recycling of TIME_WAIT sockets. |
| Enabling this option is |
| not recommended as the remote IP may not use monotonically increasing |
| timestamps (devices behind NAT, devices with per-connection timestamp |
| offsets). |
| See RFC 1323 (PAWS) and RFC 6191. |
| .\" |
| .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt |
| .TP |
| .IR tcp_tw_reuse " (Boolean; default: disabled; since Linux 2.4.19/2.6)" |
| .\" Since 2.4.19/2.5.43 |
| Allow reuse of TIME_WAIT sockets for new connections when it is |
| safe from the protocol viewpoint. |
| It should not be changed without the advice or request of technical experts. |
| .\" |
| .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt |
| .TP |
| .IR tcp_vegas_cong_avoid " (Boolean; default: disabled; Linux 2.2 to 2.6.13)" |
| .\" Since 2.1.8; removed in 2.6.13 |
| Enable TCP Vegas congestion avoidance algorithm. |
| TCP Vegas is a sender-side-only change to TCP that anticipates |
| the onset of congestion by estimating the bandwidth. |
| TCP Vegas adjusts the sending rate by modifying the congestion window. |
| TCP Vegas should provide less packet loss, but it is |
| not as aggressive as TCP Reno. |
| .\" |
| .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt |
| .TP |
| .IR tcp_westwood " (Boolean; default: disabled; Linux 2.4.26/2.6.3 to 2.6.13)" |
| Enable TCP Westwood+ congestion control algorithm. |
| TCP Westwood+ is a sender-side-only modification of the TCP Reno |
| protocol stack that optimizes the performance of TCP congestion control. |
| It is based on end-to-end bandwidth estimation to set |
| congestion window and slow start threshold after a congestion episode. |
| Using this estimation, TCP Westwood+ adaptively sets a |
| slow start threshold and a congestion window which takes into |
| account the bandwidth used at the time congestion is experienced. |
| TCP Westwood+ significantly increases fairness with respect to |
| TCP Reno in wired networks and throughput over wireless links. |
| .TP |
| .IR tcp_window_scaling " (Boolean; default: enabled; since Linux 2.2)" |
| .\" Since 2.1.36 |
| Enable RFC\ 1323 TCP window scaling. |
| This feature allows the use of a large window |
| (> 64\ kB) on a TCP connection, should the other end support it. |
| Normally, the 16-bit window length field in the TCP header |
| limits the window size to less than 64\ kB. |
| If larger windows are desired, applications can increase the size of |
| their socket buffers and the window scaling option will be employed. |
| If |
| .I tcp_window_scaling |
| is disabled, TCP will not negotiate the use of window |
| scaling with the other end during connection setup. |
| .TP |
| .IR tcp_wmem " (since Linux 2.4)" |
| .\" Since 2.4.0-test7 |
| This is a vector of 3 integers: [min, default, max]. |
| These parameters are used by TCP to regulate send buffer sizes. |
| TCP dynamically adjusts the size of the send buffer from the |
| default values listed below, in the range of these values, |
| depending on memory available. |
| .RS |
| .TP |
| .I min |
| Minimum size of the send buffer used by each TCP socket. |
| The default value is the system page size. |
| (On Linux 2.4, the default value is 4\ kB.) |
| This value is used to ensure that in memory pressure mode, |
| allocations below this size will still succeed. |
| This is not used to bound the size of the send buffer declared using |
| .B SO_SNDBUF |
| on a socket. |
| .TP |
| .I default |
| The default size of the send buffer for a TCP socket. |
| This value overwrites the initial default buffer size from |
| the generic global |
| .I /proc/sys/net/core/wmem_default |
| defined for all protocols. |
| The default value is 16\ kB. |
| .\" True in Linux 2.4 and 2.6 |
| If larger send buffer sizes are desired, this value |
| should be increased (to affect all sockets). |
| To employ large TCP windows, the |
| .I /proc/sys/net/ipv4/tcp_window_scaling |
| must be set to a nonzero value (default). |
| .TP |
| .I max |
| The maximum size of the send buffer used by each TCP socket. |
| This value does not override the value in |
| .IR /proc/sys/net/core/wmem_max . |
| This is not used to limit the size of the send buffer declared using |
| .B SO_SNDBUF |
| on a socket. |
| The default value is calculated using the formula |
| .IP |
| max(65536, min(4\ MB, \fItcp_mem\fP[1]*PAGE_SIZE/128)) |
| .IP |
| (On Linux 2.4, the default value is 128\ kB, |
| lowered to 64\ kB on low-memory systems.) |
| .RE |
| .TP |
| .IR tcp_workaround_signed_windows " (Boolean; default: disabled; since Linux 2.6.26)" |
| If enabled, assume that no receipt of a window-scaling option means that the |
| remote TCP is broken and treats the window as a signed quantity. |
| If disabled, assume that the remote TCP is not broken even if we do |
| not receive a window scaling option from it. |
| .SS Socket options |
| To set or get a TCP socket option, call |
| .BR getsockopt (2) |
| to read or |
| .BR setsockopt (2) |
| to write the option with the option level argument set to |
| .BR IPPROTO_TCP . |
| Unless otherwise noted, |
| .I optval |
| is a pointer to an |
| .IR int . |
| .\" or SOL_TCP on Linux |
| In addition, |
| most |
| .B IPPROTO_IP |
| socket options are valid on TCP sockets. |
| For more information see |
| .BR ip (7). |
| .PP |
| Following is a list of TCP-specific socket options. |
| For details of some other socket options that are also applicable |
| for TCP sockets, see |
| .BR socket (7). |
| .TP |
| .BR TCP_CONGESTION " (since Linux 2.6.13)" |
| .\" commit 5f8ef48d240963093451bcf83df89f1a1364f51d |
| .\" Author: Stephen Hemminger <shemminger@osdl.org> |
| The argument for this option is a string. |
| This option allows the caller to set the TCP congestion control |
| algorithm to be used, on a per-socket basis. |
| Unprivileged processes are restricted to choosing one of the algorithms in |
| .IR tcp_allowed_congestion_control |
| (described above). |
| Privileged processes |
| .RB ( CAP_NET_ADMIN ) |
| can choose from any of the available congestion-control algorithms |
| (see the description of |
| .IR tcp_available_congestion_control |
| above). |
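| .IP |
| A minimal sketch of reading and setting the algorithm name follows. |
| The helper names and the buffer-size constant are hypothetical, and error |
| handling is reduced to passing back the return value of the system call. |

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical buffer size, assumed large enough for an algorithm name. */
#define CA_NAME_LEN 16

/* Copy the socket's congestion-control algorithm name into `name`.
 * Returns 0 on success, -1 on error. */
static int
get_congestion(int fd, char *name, socklen_t len)
{
    if (len == 0)
        return -1;
    memset(name, 0, len);
    len -= 1;                   /* keep the last byte as a terminating NUL */
    return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, &len);
}

/* Select a congestion-control algorithm by name for this socket. */
static int
set_congestion(int fd, const char *name)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
                      (socklen_t)strlen(name));
}
```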
| .TP |
| .BR TCP_CORK " (since Linux 2.2)" |
| .\" precisely: since 2.1.127 |
| If set, don't send out partial frames. |
| All queued partial frames are sent when the option is cleared again. |
| This is useful for prepending headers before calling |
| .BR sendfile (2), |
| or for throughput optimization. |
| As currently implemented, there is a 200 millisecond ceiling on the time |
| for which output is corked by |
| .BR TCP_CORK . |
| If this ceiling is reached, then queued data is automatically transmitted. |
| This option can be combined with |
| .B TCP_NODELAY |
| only since Linux 2.5.71. |
| This option should not be used in code intended to be portable. |
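| .IP |
| The header-then-body pattern might look like the sketch below. |
| The helper names are hypothetical, and plain |
| .BR write (2) |
| stands in for |
| .BR sendfile (2) |
| to keep the example short. |

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write exactly len bytes, retrying after short writes. */
static int
write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Cork the socket, queue a header and a body, then uncork.
 * Returns 0 on success, -1 if any step failed. */
static int
send_corked(int fd, const char *hdr, size_t hdr_len,
            const char *body, size_t body_len)
{
    int on = 1, off = 0;
    int rc = 0;

    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) == -1)
        return -1;
    if (write_all(fd, hdr, hdr_len) == -1 ||
        write_all(fd, body, body_len) == -1)
        rc = -1;
    /* Clearing the option sends any queued partial frame. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)) == -1)
        rc = -1;
    return rc;
}
```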
| .TP |
| .BR TCP_DEFER_ACCEPT " (since Linux 2.4)" |
| .\" Precisely: since 2.3.38 |
| .\" Useful references: |
| .\" http://www.techrepublic.com/article/take-advantage-of-tcp-ip-options-to-optimize-data-transmission/ |
| .\" http://unix.stackexchange.com/questions/94104/real-world-use-of-tcp-defer-accept |
| Allow a listener to be awakened only when data arrives on the socket. |
| Takes an integer value (seconds); this can |
| bound the maximum number of attempts TCP will make to |
| complete the connection. |
| This option should not be used in code intended to be portable. |
| .TP |
| .BR TCP_INFO " (since Linux 2.4)" |
| Used to collect information about this socket. |
| The kernel returns a \fIstruct tcp_info\fP as defined in the file |
| .IR /usr/include/linux/tcp.h . |
| This option should not be used in code intended to be portable. |
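| .IP |
| As a small, hedged example, the connection state can be read from the |
| returned structure. |
| The helper name is hypothetical, and the sketch assumes that glibc's |
| .I <netinet/tcp.h> |
| supplies the same |
| .I struct tcp_info |
| declaration. |

```c
/* Assumption: glibc; _GNU_SOURCE makes sure struct tcp_info is declared. */
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Return the TCP state of the socket (the tcpi_state field),
 * or -1 if the option could not be read. */
static int
tcp_state(int fd)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == -1)
        return -1;
    return info.tcpi_state;
}
```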
| .TP |
| .BR TCP_KEEPCNT " (since Linux 2.4)" |
| .\" Precisely: since 2.3.18 |
| The maximum number of keepalive probes TCP should send |
| before dropping the connection. |
| This option should not be |
| used in code intended to be portable. |
| .TP |
| .BR TCP_KEEPIDLE " (since Linux 2.4)" |
| .\" Precisely: since 2.3.18 |
| The time (in seconds) the connection needs to remain idle |
| before TCP starts sending keepalive probes, if the socket |
| option |
| .B SO_KEEPALIVE |
| has been set on this socket. |
| This option should not be used in code intended to be portable. |
| .TP |
| .BR TCP_KEEPINTVL " (since Linux 2.4)" |
| .\" Precisely: since 2.3.18 |
| The time (in seconds) between individual keepalive probes. |
| This option should not be used in code intended to be portable. |
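| .IP |
| The three keepalive options are typically set together with |
| .BR SO_KEEPALIVE . |
| The following sketch shows one way to do so; the helper name and its |
| parameters are hypothetical. |

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn on keepalive for one socket and override the system-wide timing:
 * idle_s   seconds of idleness before the first probe (TCP_KEEPIDLE)
 * intvl_s  seconds between probes (TCP_KEEPINTVL)
 * count    probes sent before the connection is dropped (TCP_KEEPCNT)
 * Returns 0 on success, -1 on the first failing setsockopt() call. */
static int
enable_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) == -1)
        return -1;
    return 0;
}
```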
| .TP |
| .BR TCP_LINGER2 " (since Linux 2.4)" |
| .\" Precisely: since 2.3.41 |
| The lifetime of orphaned FIN_WAIT2 state sockets. |
| This option can be used to override the system-wide setting in the file |
| .I /proc/sys/net/ipv4/tcp_fin_timeout |
| for this socket. |
| This is not to be confused with the |
| .BR socket (7) |
| level option |
| .BR SO_LINGER . |
| This option should not be used in code intended to be portable. |
| .TP |
| .B TCP_MAXSEG |
| .\" Present in Linux 1.0 |
| The maximum segment size for outgoing TCP packets. |
| In Linux 2.2 and earlier, and in Linux 2.6.28 and later, |
| if this option is set before connection establishment, it also |
| changes the MSS value announced to the other end in the initial packet. |
| Values greater than the (eventual) interface MTU have no effect. |
| TCP will also impose |
| its minimum and maximum bounds over the value provided. |
| .TP |
| .B TCP_NODELAY |
| .\" Present in Linux 1.0 |
| If set, disable the Nagle algorithm. |
| This means that segments |
| are always sent as soon as possible, even if there is only a |
| small amount of data. |
| When not set, data is buffered until there |
| is a sufficient amount to send out, thereby avoiding the |
| frequent sending of small packets, which results in poor |
| utilization of the network. |
| This option is overridden by |
| .BR TCP_CORK ; |
| however, setting this option forces an explicit flush of |
| pending output, even if |
| .B TCP_CORK |
| is currently set. |
| .TP |
| .BR TCP_QUICKACK " (since Linux 2.4.4)" |
| Enable quickack mode if set or disable quickack |
| mode if cleared. |
| In quickack mode, acks are sent |
| immediately, rather than delayed if needed in accordance |
| with normal TCP operation. |
| This flag is not permanent; |
| it only enables a switch to or from quickack mode. |
| Subsequent operation of the TCP protocol will |
| once again enter/leave quickack mode depending on |
| internal protocol processing and factors such as |
| delayed ack timeouts occurring and data transfer. |
| This option should not be used in code intended to be |
| portable. |
| .TP |
| .BR TCP_SYNCNT " (since Linux 2.4)" |
| .\" Precisely: since 2.3.18 |
| Set the number of SYN retransmits that TCP should send before |
| aborting the attempt to connect. |
| It cannot exceed 255. |
| This option should not be used in code intended to be portable. |
| .TP |
| .BR TCP_USER_TIMEOUT " (since Linux 2.6.37)" |
| .\" commit dca43c75e7e545694a9dd6288553f55c53e2a3a3 |
| .\" Author: Jerry Chu <hkchu@google.com> |
| .\" The following text taken nearly verbatim from Jerry Chu's (excellent) |
| .\" commit message. |
| .\" |
| This option takes an |
| .IR "unsigned int" |
| as an argument. |
| When the value is greater than 0, |
| it specifies the maximum amount of time in milliseconds that transmitted |
| data may remain unacknowledged, or buffered data may remain untransmitted |
| (due to zero window size) before TCP will forcibly close the |
| corresponding connection and return |
| .B ETIMEDOUT |
| to the application. |
| If the option value is specified as 0, |
| TCP will use the system default. |
| .IP |
| Increasing user timeouts allows a TCP connection to survive extended |
| periods without end-to-end connectivity. |
| Decreasing user timeouts |
| allows applications to "fail fast", if so desired. |
| Otherwise, failure may take up to 20 minutes with |
| the current system defaults in a normal WAN environment. |
| .IP |
| This option can be set during any state of a TCP connection, |
| but is effective only during the synchronized states of a connection |
| (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, and LAST-ACK). |
| Moreover, when used with the TCP keepalive |
| .RB ( SO_KEEPALIVE ) |
| option, |
| .B TCP_USER_TIMEOUT |
| will override keepalive to determine when to close a |
| connection due to keepalive failure. |
| .IP |
| The option has no effect on when TCP retransmits a packet, |
| nor when a keepalive probe is sent. |
| .IP |
| This option, like many others, will be inherited by the socket returned by |
| .BR accept (2), |
| if it was set on the listening socket. |
| .IP |
| Further details on the user timeout feature can be found in |
| RFC\ 793 and RFC\ 5482 ("TCP User Timeout Option"). |
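| .IP |
| A minimal sketch of setting the option is shown below; the helper name is |
| hypothetical. |
| Note that the argument is an |
| .I unsigned int |
| holding milliseconds. |

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Set the user timeout of a TCP socket, in milliseconds.
 * A value of 0 restores the system default behavior. */
static int
set_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```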
| .TP |
| .BR TCP_WINDOW_CLAMP " (since Linux 2.4)" |
| .\" Precisely: since 2.3.41 |
| Bound the size of the advertised window to this value. |
| The kernel imposes a minimum size of SOCK_MIN_RCVBUF/2. |
| This option should not be used in code intended to be |
| portable. |
| .SS Sockets API |
| TCP provides limited support for out-of-band data, |
| in the form of (a single byte of) urgent data. |
| In Linux this means that if the other end sends newer out-of-band |
| data, the older urgent data is inserted as normal data into |
| the stream (even when |
| .B SO_OOBINLINE |
| is not set). |
| This differs from BSD-based stacks. |
| .PP |
| Linux uses the BSD-compatible interpretation of the urgent |
| pointer field by default. |
| This violates RFC\ 1122, but is |
| required for interoperability with other stacks. |
| It can be changed via |
| .IR /proc/sys/net/ipv4/tcp_stdurg . |
| .PP |
| It is possible to peek at out-of-band data using the |
| .BR recv (2) |
| .B MSG_PEEK |
| flag. |
| .PP |
| Since version 2.4, Linux supports the use of |
| .B MSG_TRUNC |
| in the |
| .I flags |
| argument of |
| .BR recv (2) |
| (and |
| .BR recvmsg (2)). |
| This flag causes the received bytes of data to be discarded, |
| rather than passed back in a caller-supplied buffer. |
| Since Linux 2.4.4, |
| .BR MSG_TRUNC |
| also has this effect when used in conjunction with |
| .BR MSG_OOB |
| to receive out-of-band data. |
| .SS Ioctls |
| The following |
| .BR ioctl (2) |
| calls return information in |
| .IR value . |
| The correct syntax is: |
| .PP |
| .RS |
| .nf |
| .BI int " value"; |
| .IB error " = ioctl(" tcp_socket ", " ioctl_type ", &" value ");" |
| .fi |
| .RE |
| .PP |
| .I ioctl_type |
| is one of the following: |
| .TP |
| .B SIOCINQ |
| Returns the amount of queued unread data in the receive buffer. |
| The socket must not be in LISTEN state, otherwise an error |
| .RB ( EINVAL ) |
| is returned. |
| .B SIOCINQ |
| is defined in |
| .IR <linux/sockios.h> . |
| .\" FIXME http://sources.redhat.com/bugzilla/show_bug.cgi?id=12002, |
| .\" filed 2010-09-10, may cause SIOCINQ to be defined in glibc headers |
| Alternatively, |
| you can use the synonymous |
| .BR FIONREAD , |
| defined in |
| .IR <sys/ioctl.h> . |
| .TP |
| .B SIOCATMARK |
| Returns true (i.e., |
| .I value |
| is nonzero) if the inbound data stream is at the urgent mark. |
| .IP |
| If the |
| .B SO_OOBINLINE |
| socket option is set, and |
| .B SIOCATMARK |
| returns true, then the |
| next read from the socket will return the urgent data. |
| If the |
| .B SO_OOBINLINE |
| socket option is not set, and |
| .B SIOCATMARK |
| returns true, then the |
| next read from the socket will return the bytes following |
| the urgent data (to actually read the urgent data requires the |
| .B recv(MSG_OOB) |
| flag). |
| .IP |
| Note that a read never reads across the urgent mark. |
| If an application is informed of the presence of urgent data via |
| .BR select (2) |
| (using the |
| .I exceptfds |
| argument) or through delivery of a |
| .B SIGURG |
| signal, |
| then it can advance up to the mark using a loop which repeatedly tests |
| .B SIOCATMARK |
| and performs a read (requesting any number of bytes) as long as |
| .B SIOCATMARK |
| returns false. |
| .TP |
| .B SIOCOUTQ |
| Returns the amount of unsent data in the socket send queue. |
| The socket must not be in LISTEN state, otherwise an error |
| .RB ( EINVAL ) |
| is returned. |
| .B SIOCOUTQ |
| is defined in |
| .IR <linux/sockios.h> . |
| .\" FIXME . http://sources.redhat.com/bugzilla/show_bug.cgi?id=12002, |
| .\" filed 2010-09-10, may cause SIOCOUTQ to be defined in glibc headers |
| Alternatively, |
| you can use the synonymous |
| .BR TIOCOUTQ , |
| defined in |
| .IR <sys/ioctl.h> . |
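| .PP |
| As an illustration, the two queue-length requests can be wrapped in small |
| helpers; the function names are hypothetical. |

```c
#include <linux/sockios.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Bytes received but not yet read by the application.
 * Returns -1 on error (for example EINVAL on a listening socket). */
static int
unread_bytes(int fd)
{
    int value;

    if (ioctl(fd, SIOCINQ, &value) == -1)
        return -1;
    return value;
}

/* Bytes still sitting in the socket send queue; -1 on error. */
static int
unsent_bytes(int fd)
{
    int value;

    if (ioctl(fd, SIOCOUTQ, &value) == -1)
        return -1;
    return value;
}
```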
| .SS Error handling |
| When a network error occurs, TCP tries to resend the packet. |
| If it doesn't succeed after some time, either |
| .B ETIMEDOUT |
| or the last received error on this connection is reported. |
| .PP |
| Some applications require a quicker error notification. |
| This can be enabled with the |
| .B IPPROTO_IP |
| level |
| .B IP_RECVERR |
| socket option. |
| When this option is enabled, all incoming |
| errors are immediately passed to the user program. |
| Use this option with care \(em it makes TCP less tolerant to routing |
| changes and other normal network conditions. |
| .SH ERRORS |
| .TP |
| .B EAFNOTSUPPORT |
| Passed socket address type in |
| .I sin_family |
| was not |
| .BR AF_INET . |
| .TP |
| .B EPIPE |
| The other end closed the socket unexpectedly or a read is |
| executed on a shut down socket. |
| .TP |
| .B ETIMEDOUT |
| The other end didn't acknowledge retransmitted data after some time. |
| .PP |
| Any errors defined for |
| .BR ip (7) |
| or the generic socket layer may also be returned for TCP. |
| .SH VERSIONS |
| Support for Explicit Congestion Notification, zero-copy |
| .BR sendfile (2), |
| reordering, and some SACK extensions |
| (DSACK) was introduced in Linux 2.4. |
| Support for forward acknowledgement (FACK), TIME_WAIT recycling, |
| and per-connection keepalive socket options was introduced in Linux 2.3. |
| .SH BUGS |
| Not all errors are documented. |
| .PP |
| IPv6 is not described. |
| .\" Only a single Linux kernel version is described |
| .\" Info for 2.2 was lost. Should be added again, |
| .\" or put into a separate page. |
| .\" .SH AUTHORS |
| .\" This man page was originally written by Andi Kleen. |
| .\" It was updated for 2.4 by Nivedita Singhvi with input from |
| .\" Alexey Kuznetsov's Documentation/networking/ip-sysctl.txt |
| .\" document. |
| .SH SEE ALSO |
| .BR accept (2), |
| .BR bind (2), |
| .BR connect (2), |
| .BR getsockopt (2), |
| .BR listen (2), |
| .BR recvmsg (2), |
| .BR sendfile (2), |
| .BR sendmsg (2), |
| .BR socket (2), |
| .BR ip (7), |
| .BR socket (7) |
| .PP |
| The kernel source file |
| .IR Documentation/networking/ip\-sysctl.txt . |
| .PP |
| RFC\ 793 for the TCP specification. |
| .br |
| RFC\ 1122 for the TCP requirements and a description of the Nagle algorithm. |
| .br |
| RFC\ 1323 for TCP timestamp and window scaling options. |
| .br |
| RFC\ 1337 for a description of TIME_WAIT assassination hazards. |
| .br |
| RFC\ 3168 for a description of Explicit Congestion Notification. |
| .br |
| RFC\ 2581 for TCP congestion control algorithms. |
| .br |
| RFC\ 2018 and RFC\ 2883 for SACK and extensions to SACK. |