Guide to IP Layer Network Administration with Linux

Version 0.4.5

Author: Martin A. Brown

Mar 2007

Revision History
Revision 0.4.5  2007-03-31  MAB
corrected DocBook build environment; new mail address
Revision 0.4.4  2003-04-26  MAB
added index, began packet filtering chapter
Revision 0.4.3  2003-04-14  MAB
ongoing editing, ARP/NAT fixes, routing content
Revision 0.4.2  2003-03-16  MAB
ongoing editing; unreleased version
Revision 0.4.1  2003-02-19  MAB
major routing revision; better use of callouts
Revision 0.4.0  2003-02-11  MAB
major NAT revs; add inline scripts; outline FIB
Revision 0.3.9  2003-02-05  MAB
fleshed out bonding; added bridging chapter
Revision 0.3.8  2003-02-03  MAB
move to linux-ip.net; use TLDP XSL stylesheets
Revision 0.3.7  2003-02-02  MAB
major editing on ARP; minor editing on routing
Revision 0.3.6  2003-01-30  MAB
switch to XSLT processing; minor revs; CVS
Revision 0.3.5  2003-01-08  MAB
ARP flux complete; ARP filtering touched
Revision 0.3.4  2003-01-06  MAB
ARP complete; bridging added; ip neigh complete
Revision 0.3.3  2003-01-05  MAB
split into 3 parts; ARP chapter begun
Revision 0.3.2  2002-12-29  MAB
links updated; minor editing
Revision 0.3.1  2002-11-26  MAB
edited: intro, snat, nat; split advanced in two
Revision 0.3.0  2002-11-14  MAB
chapters finally have good HTML names
Revision 0.2.9  2002-11-11  MAB
routing chapter heavily edited
Revision 0.2.8  2002-11-07  MAB
basic chapter heavily edited
Revision 0.2.7  2002-11-04  MAB
routing chapter finished; links rearranged
Revision 0.2.6  2002-10-29  MAB
routing chapter continued
Revision 0.2.5  2002-10-28  MAB
routing chapter partly complete
Revision 0.2.4  2002-10-08  MAB
advanced routing additions and overview
Revision 0.2.3  2002-09-30  MAB
minor editing; worked on tools/netstat; advanced routing
Revision 0.2.2  2002-09-24  MAB
formalized revisioning; finished basic networking; started netstat
Revision 0.2.1  2002-09-21  MAB
added network map to incomplete rough draft
Revision 0.2  2002-09-20  MAB
incomplete rough draft released on LARTC list
Revision 0.1  2002-08-04  MAB
rough draft begun

Abstract

This guide provides an overview of many of the tools available for IP network administration under the linux operating system, specifically kernels in the 2.2 and 2.4 series. It covers Ethernet, ARP, IP routing, NAT, and other topics central to the management of IP networks.


Table of Contents

Introduction
1. Target Audience, Assumptions, and Recommendations
2. Conventions
3. Bugs and Roadmap
4. Technical Note and Summary of Approach
5. Acknowledgements and Request for Remarks
1. Concepts
1. Basic IP Connectivity
1.1. IP Networking Control Files
1.2. Reading Routes and IP Information
1.2.1. Sending Packets to the Local Network
1.2.2. Sending Packets to Unknown Networks Through the Default Gateway
1.2.3. Static Routes to Networks
1.3. Changing IP Addresses and Routes
1.3.1. Changing the IP on a machine
1.3.2. Setting the Default Route
1.3.3. Adding and removing a static route
1.4. Conclusion
2. Ethernet
2.1. Address Resolution Protocol (ARP)
2.1.1. Overview of Address Resolution Protocol
2.1.2. The ARP cache
2.1.3. ARP Suppression
2.1.4. The ARP Flux Problem
2.2. Proxy ARP
2.3. ARP filtering
2.4. Connecting to an Ethernet 802.1q VLAN
2.5. Link Aggregation and High Availability with Bonding
2.5.1. Link Aggregation
2.5.2. High Availability
3. Bridging
3.1. Concepts of Bridging
3.2. Bridging and Spanning Tree Protocol
3.3. Bridging and Packet Filtering
3.4. Traffic Control with a Bridge
3.5. ebtables
4. IP Routing
4.1. Introduction to Linux Routing
4.2. Routing to Locally Connected Networks
4.3. Sending Packets Through a Gateway
4.4. Operating as a Router
4.5. Route Selection
4.5.1. The Common Case
4.5.2. The Whole Story
4.5.3. Summary
4.6. Source Address Selection
4.7. Routing Cache
4.8. Routing Tables
4.8.1. Routing Table Entries (Routes)
4.8.2. The Local Routing Table
4.8.3. The Main Routing Table
4.9. Routing Policy Database (RPDB)
4.10. ICMP and Routing
4.10.1. MTU, MSS, and ICMP
4.10.2. ICMP Redirects and Routing
5. Network Address Translation (NAT)
5.1. Rationale for and Introduction to NAT
5.2. Application Layer Protocols with Embedded Network Information
5.3. Stateless NAT with iproute2
5.3.1. Stateless NAT Packet Capture and Introduction
5.3.2. Stateless NAT Practicum
5.3.3. Conditional Stateless NAT
5.4. Stateless NAT and Packet Filtering
5.5. Destination NAT with netfilter (DNAT)
5.5.1. Port Address Translation with DNAT
5.6. Port Address Translation (PAT) from Userspace
5.7. Transparent PAT from Userspace
6. Masquerading and Source Network Address Translation
6.1. Concepts of Source NAT
6.1.1. Differences Between SNAT and Masquerading
6.1.2. Double SNAT/Masquerading
6.2. Issues with SNAT/Masquerading and Inbound Traffic
6.3. Where Masquerading and SNAT Break
7. Packet Filtering
7.1. Rationale for and Introduction to Packet Filtering
7.1.1. History of Linux Packet Filter Support
7.2. Limits and Weaknesses of Packet Filtering
7.2.1. Limits of the Usefulness of Packet Filtering
7.2.2. Weaknesses of Packet Filtering
7.2.3. Complex Network Layer Stateless Packet Filters
7.3. General Packet Filter Requirements
7.4. The Netfilter Architecture
7.4.1. Packet Filtering with iptables
7.5. Packet Filtering with ipchains
7.5.1. Packet Mangling with ipchains
7.6. Protecting a Host
7.7. Protecting a Network
7.8. Further Resources
8. Statefulness and Statelessness
8.1.
8.2. Statelessness of IP Routing
8.3. Netfilter Connection Tracking
8.3.1.
8.3.2.
2. Cookbook
9. Advanced IP Management
9.1. Multiple IPs and the ARP Problem
9.2. Multiple IP Networks on one Ethernet Segment
9.3. Breaking a network in two with proxy ARP
9.4. Multiple IPs on an Interface
9.5. Multiple connections to the same Ethernet
9.6. Multihomed Hosts
9.7. Binding to Non-local Addresses
10. Advanced IP Routing
10.1. Introduction to Policy Routing
10.2. Overview of Routing and Packet Filter Interactions
10.3. Using the Routing Policy Database and Multiple Routing Tables
10.3.1. Using Type of Service Policy Routing
10.3.2. Using fwmark for Policy Routing
10.3.3. Policy Routing and NAT
10.4. Multiple Connections to the Internet
10.4.1. Outbound traffic Using Multiple Connections to the Internet
10.4.2. Inbound traffic Using Multiple Connections to the Internet
10.4.3. Using Multiple Connections to the Internet for Inbound and Outbound Connections
11. Scripts for Managing IP
11.1. Proxy ARP Scripts
11.2. NAT Scripts
12. Troubleshooting
12.1. Introduction to Troubleshooting
12.2. Troubleshooting at the Ethernet Layer
12.3. Troubleshooting at the IP Layer
12.4. Handling and Diagnosing Routing Problems
12.5. Identifying Problems with TCP Sessions
12.6. DNS Troubleshooting
3. Appendices and Reference
A. An Example Network and Description
A.1. Example Network Map and General Notes
A.2. Example Network Addressing Charts
B. Ethernet Layer Tools
B.1. arp
B.2. arping
B.3. ip link
B.3.1. Displaying link layer characteristics with ip link show
B.3.2. Changing link layer characteristics with ip link set
B.3.3. Deactivating a device with ip link set
B.3.4. Activating a device with ip link set
B.3.5. Using ip link set to change the MTU
B.3.6. Changing the device name with ip link set
B.3.7. Changing hardware or Ethernet broadcast address with ip link set
B.4. ip neighbor
B.5. mii-tool
C. IP Address Management
C.1. ifconfig
C.1.1. Displaying interface information with ifconfig
C.1.2. Bringing down an interface with ifconfig
C.1.3. Bringing up an interface with ifconfig
C.1.4. Reading ifconfig output
C.1.5. Changing MTU with ifconfig
C.1.6. Changing device flags with ifconfig
C.1.7. General remarks about ifconfig
C.2. ip address
C.2.1. Displaying interface information with ip address show
C.2.2. Using ip address add to configure IP address information
C.2.3. Using ip address del to remove IP addresses from an interface
C.2.4. Removing all IP address information from an interface with ip address flush
C.2.5. Conclusion
D. IP Route Management
D.1. route
D.1.1. Displaying the routing table with route
D.1.2. Reading route's output
D.1.3. Using route to display the routing cache
D.1.4. Creating a static route with route add
D.1.5. Creating a default route with route add default
D.1.6. Removing routes with route del
D.2. ip route
D.2.1. Displaying a routing table with ip route show
D.2.2. Displaying the routing cache with ip route show cache
D.2.3. Using ip route add to populate a routing table
D.2.4. Adding a default route with ip route add default
D.2.5. Setting up NAT with ip route add nat
D.2.6. Removing routes with ip route del
D.2.7. Altering existing routes with ip route change
D.2.8. Programmatically fetching route information with ip route get
D.2.9. Clearing routing tables with ip route flush
D.2.10. ip route flush cache
D.2.11. Summary of the use of ip route
D.3. ip rule
D.3.1. ip rule show
D.3.2. Displaying the RPDB with ip rule show
D.3.3. Adding a rule to the RPDB with ip rule add
D.3.4. ip rule add nat
D.3.5. ip rule del
E. Tunnels and VPNs
E.1. Lightweight encrypted tunnel with CIPE
E.2. GRE tunnels with ip tunnel
E.3. All manner of tunnels with ssh
E.4. IPSec implementation via FreeS/WAN
E.5. IPSec implementation in the kernel
E.6. PPTP
F. Sockets; Servers and Clients
F.1. telnet
F.2. nc
F.3. socat
F.4. tcpclient
F.5. xinetd
F.6. tcpserver
F.7. redir
G. Diagnostic Tools
G.1. ping
G.1.1. Using ping to test reachability
G.1.2. Using ping to stress a network
G.1.3. Recording a network route with ping
G.1.4. Setting the TTL on a ping packet
G.1.5. Setting ToS for a diagnostic ping
G.1.6. Specifying a source address for ping
G.1.7. Summary on the use of ping
G.2. traceroute
G.2.1. Using traceroute
G.2.2. Telling traceroute to use ICMP echo request instead of UDP
G.2.3. Setting ToS with traceroute
G.2.4. Summary on the use of traceroute
G.3. mtr
G.4. netstat
G.4.1. Displaying socket status with netstat
G.4.2. Displaying the main routing table with netstat
G.4.3. Displaying network interface statistics with netstat command
G.4.4. Displaying network stack statistics with netstat
G.4.5. Displaying the masquerading table with netstat
G.5. tcpdump
G.5.1. Using tcpdump to view ARP messages
G.5.2. Using tcpdump to see ICMP unreachable messages
G.5.3. Using tcpdump to watch TCP sessions
G.5.4. Reading and writing tcpdump data
G.5.5. Understanding fragmentation as reported by tcpdump
G.5.6. Other options to the tcpdump command
G.6. tcpflow
G.7. tcpreplay
H. Miscellany
H.1. ipcalc and other IP addressing calculators
H.2. Some general remarks about iproute2 tools
H.3. Brief introduction to sysctl
I. Links to other Resources
I.1. Links to Documentation
I.1.1. Linux Networking Introduction and Overview Material
I.1.2. Linux Security and Network Security
I.1.3. General IP Networking Resources
I.1.4. Masquerading topics
I.1.5. Network Address Translation
I.1.6. iproute2 documentation
I.1.7. Netfilter Resources
I.1.8. ipchains Resources
I.1.9. ipfwadm Resources
I.1.10. General Systems References
I.1.11. Bridging
I.1.12. Traffic Control
I.1.13. IPv4 Multicast
I.1.14. Miscellaneous Linux IP Resources
I.2. Links to Software
I.2.1. Basic Utilities
I.2.2. Virtual Private Networking software
I.2.3. Traffic Control queueing disciplines and command line tools
I.2.4. Interfaces to lower layer tools
I.2.5. Packet sniffing and diagnostic tools
J. GNU Free Documentation License
J.1. PREAMBLE
J.2. APPLICABILITY AND DEFINITIONS
J.3. VERBATIM COPYING
J.4. COPYING IN QUANTITY
J.5. MODIFICATIONS
J.6. COMBINING DOCUMENTS
J.7. COLLECTIONS OF DOCUMENTS
J.8. AGGREGATION WITH INDEPENDENT WORKS
J.9. TRANSLATION
J.10. TERMINATION
J.11. FUTURE REVISIONS OF THIS LICENSE
J.12. ADDENDUM: How to use this License for your documents
Reference Bibliography and Recommended Reading
Index

List of Tables

2.1. Active ARP cache entry states
4.1. Keys used for hash table lookups during route selection
5.1. Filtering an iproute2 NAT packet with ipchains
A.1. Example Network; Network Addressing
A.2. Example Network; Host Addressing
B.1. ip link link layer device states
B.2. Ethernet Port Speed Abbreviations
C.1. Interface Flags
C.2. IP Scope under ip address
G.1. Possible Session States in netstat output
H.1. iproute2 Synonyms

List of Examples

1.1. Sample ifconfig output
1.2. Testing reachability of a locally connected host with ping
1.3. Testing reachability of non-local hosts
1.4. Sample routing table with a static route
1.5. ifconfig and route output before the change
1.6. Bringing down a network interface with ifconfig
1.7. Bringing up an Ethernet interface with ifconfig
1.8. Adding a default route with route
1.9. Adding a static route with route
1.10. Removing a static network route and adding a static host route
2.1. ARP conversation captured with tcpdump
2.2. Gratuitous ARP reply frames
2.3. Unsolicited ARP request frames
2.4. Duplicate Address Detection with ARP
2.5. ARP cache listings with arp and ip neighbor
2.6. ARP cache timeout
2.7. ARP flux
2.8. Correction of ARP flux with conf/$DEV/arp_filter
2.9. Correction of ARP flux with net/$DEV/hidden
2.10. Proxy ARP Network Diagram
2.11. Bringing up a VLAN interface
2.12. Link aggregation bonding
2.13. High availability bonding
4.1. Classes of IP addresses
4.2. Using ipcalc to display IP information
4.3. Identifying the locally connected networks with route
4.4. Routing Selection Algorithm in Pseudo-code
4.5. Listing the Routing Policy Database (RPDB)
4.6. Typical content of /etc/iproute2/rt_tables
4.7. unicast route types
4.8. broadcast route types
4.9. local route types
4.10. nat route types
4.11. unreachable route types
4.12. prohibit route types
4.13. blackhole route types
4.14. throw route types
4.15. Kernel maintenance of the local routing table
4.16. unicast rule type
4.17. nat rule type
4.18. unreachable rule type
4.19. prohibit rule type
4.20. blackhole rule type
4.21. ICMP Redirect on the Wire
5.1. Stateless NAT Packet Capture
5.2. Basic commands to create a stateless NAT
5.3. Conditional Stateless NAT (not performing NAT for a specified destination network)
5.4. Using an ipchains packet filter with stateless NAT
5.5. Using DNAT for all protocols (and ports) on one IP
5.6. Using DNAT for a single port
5.7. Simulating full NAT with SNAT and DNAT
7.1. Blocking a destination and using the REJECT target, cf. Example D.17, “Adding a prohibit route with route add”
10.1. Multiple Outbound Internet links, part I; ip route
10.2. Multiple Outbound Internet links, part II; iptables
10.3. Multiple Outbound Internet links, part III; ip rule
10.4. Multiple Internet links, inbound traffic; using iproute2 only
11.1. Proxy ARP SysV initialization script
11.2. Proxy ARP configuration file
11.3. Static NAT SysV initialization script
11.4. Static NAT configuration file
B.1. Displaying the arp table with arp
B.2. Adding arp table entries with arp
B.3. Deleting arp table entries with arp
B.4. Displaying reachability of an IP on the local Ethernet with arping
B.5. Duplicate Address Detection with arping
B.6. Using ip link show
B.7. Using ip link set to change device flags
B.8. Deactivating a link layer device with ip link set
B.9. Activating a link layer device with ip link set
B.10. Using ip link set to change device flags
B.11. Changing the device name with ip link set
B.12. Changing broadcast and hardware addresses with ip link set
B.13. Displaying the ARP cache with ip neighbor show
B.14. Displaying the ARP cache on an interface with ip neighbor show
B.15. Displaying the ARP cache for a particular network with ip neighbor show
B.16. Entering a permanent entry into the ARP cache with ip neighbor add
B.17. Entering a proxy ARP entry with ip neighbor add proxy
B.18. Altering an entry in the ARP cache with ip neighbor change
B.19. Removing an entry from the ARP cache with ip neighbor del
B.20. Removing learned entries from the ARP cache with ip neighbor flush
B.21. Detecting link layer status with mii-tool
B.22. Specifying Ethernet port speeds with mii-tool --advertise
B.23. Forcing Ethernet port speed with mii-tool --force
C.1. Viewing interface information with ifconfig
C.2. Bringing down an interface with ifconfig
C.3. Bringing up an interface with ifconfig
C.4. Changing MTU with ifconfig
C.5. Setting interface flags with ifconfig
C.6. Displaying IP information with ip address
C.7. Adding IP addresses to an interface with ip address
C.8. Removing IP addresses from interfaces with ip address
C.9. Removing all IPs on an interface with ip address flush
D.1. Viewing a simple routing table with route
D.2. Viewing a complex routing table with route
D.3. Viewing the routing cache with route
D.4. Adding a static route to a network route add
D.5. Adding a static route to a host with route add
D.6. Adding a static route to a host on the same media with route add
D.7. Setting the default route with route
D.8. An alternate method of setting the default route with route
D.9. Removing a static host route with route del
D.10. Removing the default route with route del
D.11. Viewing the main routing table with ip route show
D.12. Viewing the local routing table with ip route show table local
D.13. Viewing a routing table with ip route show table
D.14. Displaying the routing cache with ip route show cache
D.15. Displaying statistics from the routing cache with ip -s route show cache
D.16. Adding a static route to a network with route add, cf. Example D.4, “Adding a static route to a network route add”
D.17. Adding a prohibit route with route add
D.18. Using from in a routing command with route add
D.19. Using src in a routing command with route add
D.20. Setting the default route with ip route add default
D.21. Creating a NAT route for a single IP with ip route add nat
D.22. Creating a NAT route for an entire network with ip route add nat
D.23. Removing routes with ip route del
D.24. Altering existing routes with ip route change
D.25. Testing routing tables with ip route get
D.26. Removing a specific route and emptying a routing table with ip route flush
D.27. Emptying the routing cache with ip route flush cache
D.28. Displaying the RPDB with ip rule show
D.29. Creating a simple entry in the RPDB with ip rule add
D.30. Creating a complex entry in the RPDB with ip rule add
D.31. Creating a NAT rule with ip rule add nat
D.32. Creating a NAT rule for an entire network with ip rule add nat
D.33. Removing a NAT rule for an entire network with ip rule del nat
F.1. Simple use of nc
F.2. Specifying timeout with nc
F.3. Specifying source address with nc
F.4. Using nc as a server
F.5. Delaying a stream with nc
F.6. Using nc with UDP
F.7. Simple use of socat
F.8. Using socat with proxy connect
F.9. Using socat perform SSL
F.10. Connecting one end of socat to a file descriptor
F.11. Connecting socat to a serial line
F.12. Using a PTY with socat
F.13. Executing a command with socat
F.14. Connecting one socat to another one
F.15. Simple use of tcpclient
F.16. Specifying the local port which tcpclient should request
F.17. Specifying the local IP to which tcpclient should bind
F.18. IP redirection with xinetd
F.19. Publishing a service with xinetd
F.20. Simple use of tcpserver
F.21. Specifying a CDB for tcpserver
F.22. Limiting the number of concurrently accept TCP sessions under tcpserver
F.23. Specifying a UID for tcpserver's spawned processes
F.24. Redirecting a TCP port with redir
F.25. Running redir in transparent mode
F.26. Running redir from another TCP server
F.27. Specifying a source address for redir's client side
G.1. Using ping to test reachability
G.2. Using ping to specify number of packets to send
G.3. Using ping to specify number of packets to send
G.4. Using ping to stress a network
G.5. Using ping to stress a network with large packets
G.6. Recording a network route with ping
G.7. Setting the TTL on a ping packet
G.8. Setting ToS for a diagnostic ping
G.9. Specifying a source address for ping
G.10. Simple usage of traceroute
G.11. Displaying IP socket status with netstat
G.12. Displaying IP socket status details with netstat
G.13. Displaying the main routing table with netstat
G.14. Displaying the routing cache with netstat
G.15. Displaying the masquerading table with netstat
G.16. Viewing an ARP broadcast request and reply with tcpdump
G.17. Viewing a gratuitous ARP packet with tcpdump
G.18. Viewing unicast ARP packets with tcpdump
G.19. tcpdump reporting port unreachable
G.20. tcpdump reporting host unreachable
G.21. tcpdump reporting net unreachable
G.22. Monitoring TCP window sizes with tcpdump
G.23. Examining TCP flags with tcpdump
G.24. Examining TCP acknowledgement numbers with tcpdump
G.25. Writing tcpdump data to a file
G.26. Reading tcpdump data from a file
G.27. Causing tcpdump to use a line buffer
G.28. Understanding fragmentation as reported by tcpdump
G.29. Specifying interface with tcpdump
G.30. Timestamp related options to tcpdump

Introduction

This guide is an overview of the IP networking capabilities of linux kernels 2.2 and 2.4. The target audience is any beginning to advanced network administrator who wants practical examples and explanations of rumoured features of linux. As the Internet is lousy with documentation on the nooks and crannies of linux networking support, I have tried to provide links to existing documentation on IP networking with linux.

The documentation you'll find here covers kernels 2.2 and 2.4, although a good number of the examples and concepts may also apply to older kernels. In the event that I cover a feature that is only present or supported under a particular kernel, I'll identify which kernel supports that feature.

1. Target Audience, Assumptions, and Recommendations

I assume a few things about the reader. First, the reader has a basic understanding (at least) of IP addressing and networking. If this is not the case, or the reader has some trouble following my networking examples, I have provided a section of links to IP layer tutorials and general introductory documentation in the appendix. Second, I assume the reader is comfortable with command line tools and the Linux, Unix, or BSD environments. Finally, I assume the reader has working network cards and a Linux OS. For assistance with Ethernet cards, there exists a good Ethernet HOWTO.

The examples I give are intended as tutorial examples only. The user should understand and accept the ramifications of using these examples on his/her own machines. I recommend that before running any example on a production machine, the user test in a controlled environment. I accept no responsibility for damage, misconfiguration or loss of any kind as a result of referring to this documentation. Proceed with caution at your own risk.

This guide has been written primarily as a companion reference to IP networking on Ethernets. Although I do allude to other link layer types occasionally in this book, the focus has been IP as used in Ethernet. Ethernet is one of the most common networking devices supported under linux, and is practically ubiquitous.

2. Conventions

This text was written in DocBook with vim. All formatting has been applied by xsltproc based on DocBook and LDP XSL stylesheets. Typeface formatting and display conventions are similar to most printed and electronically distributed technical documentation. A brief summary of these conventions follows below.

The interactive shell prompt will look like

[root@hostname]#

for the root user and

[user@hostname]$

for non-root users, although most of the operations we will be discussing will require root privileges.

Any commands to be entered by the user will always appear like

{ echo "Hi, I am exiting with a non-zero exit code."; exit 1; }

Output by any program will look something like this:

Hi, I am exiting with a non-zero exit code.

Where possible, an additional convention I have used is the suppression of all hostname lookup. DNS and other naming based schemes often confuse the novice and expert alike, particularly when the name resolver is slow or unreachable. Since the focus of this guide is IP layer networking, DNS names will be used only where absolutely unambiguous.
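
As a rough illustration of this convention, the sketch below runs a few of the traditional tools with the -n switch, which asks each of them to print numeric addresses instead of resolving names. It assumes a bash shell. The helper name show_numeric is hypothetical, and the address 192.168.99.35 and the interface eth0 in the comments are placeholder values. Tools which are not installed are simply skipped.

```shell
# Sketch: numeric (no name lookup) views of routing and ARP state.
# show_numeric is a hypothetical helper name, not a standard tool.
show_numeric() {
    local cmd
    for cmd in "route -n" "netstat -rn" "arp -n"; do
        # Run a tool only if it is installed on this machine.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            echo "### $cmd"
            $cmd || true
        else
            echo "### $cmd (not installed, skipped)"
        fi
    done
}

# The same switch suppresses name lookups in several diagnostic tools
# (placeholder address and interface; not run here):
#   ping -n 192.168.99.35
#   traceroute -n 192.168.99.35
#   tcpdump -n -i eth0
show_numeric
```

Each section of the output begins with a line naming the command that produced it, so the numeric listings are easy to tell apart.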

3. Bugs and Roadmap

Perhaps this should be called things that are wrong with this document, or things which should be improved. See the src/ROADMAP for notes on what is likely to be forthcoming in subsequent releases.

The internal document linking, while good, could be better. Especially lame is the lack of an index. External links should be used more often where appropriate instead of sending users to the links page.

If you are looking for LARTC topics, you may find some LAR topics here, but you should try the LARTC page itself if you have questions that are more TC than LAR. Consult Appendix I, Links to other Resources for further references to available documentation.

4. Technical Note and Summary of Approach

There are many tools available under linux which are also available under other unix-like operating systems, but there are additional tools which are available only to users of linux. This guide represents an effort to identify some of these tools. The most concrete example of the difference between linux-only tools and generally available unix-like tools is the difference between the traditional ifconfig and route commands, available under most variants of unix, and the iproute2 command suite, written specifically for linux.

Because this guide concerns itself with the features, strengths, and peculiarities of IP networking with linux, the iproute2 command suite assumes a prominent role. The iproute2 tools expose the strength, flexibility and potential of the linux networking stack.
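
To make the contrast between the two families of tools a little more concrete, here is a minimal sketch, assuming bash, which prints rough iproute2 counterparts for three traditional commands. The helper name iproute2_equivalent is hypothetical, and the pairings are approximate: the output formats differ, and the iproute2 commands generally expose more detail.

```shell
# Sketch: rough iproute2 counterparts of three traditional commands.
# iproute2_equivalent is a hypothetical helper, not part of either suite.
iproute2_equivalent() {
    case "$1" in
        "ifconfig -a") echo "ip address show" ;;
        "route -n")    echo "ip route show" ;;
        "arp -n")      echo "ip neighbor show" ;;
        *)             return 1 ;;   # no pairing recorded in this sketch
    esac
}

# Print the pairs side by side.
for old in "ifconfig -a" "route -n" "arp -n"; do
    printf '%-12s ->  %s\n' "$old" "$(iproute2_equivalent "$old")"
done
```

Running the commands from both columns on the same machine is a quick way to compare what each suite reports.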

Many of the tools and concepts introduced here are also detailed in other HOWTOs and guides available at The Linux Documentation Project, in addition to many other places on the Internet and in printed books.

5. Acknowledgements and Request for Remarks

As with many human endeavours, this work is made possible by the efforts of others. For me, this effort represents almost four years of learning and network administration. The knowledge collected here is in large measure a repackaging of disparate resources and my own experiences over time. Without the greater linux community, I would not be able to provide this resource.

I would like to take this opportunity to make a plug for my employer, SecurePipe, Inc. which has provided me stable and challenging employment for these (almost) four years. SecurePipe is a managed security services provider specializing in managed firewall, VPN, and IDS services to small and medium sized companies. They offer me the opportunity to hone my networking skills and explore areas of linux networking unknown to me. Thanks also to SecurePipe, Inc. for hosting this cost-free on their servers.

Over the course of the project, many people have contributed suggestions, modifications, corrections and additions. I'll acknowledge them briefly here. For full acknowledgements, see src/ACKNOWLEDGEMENTS in the DocBook source tree.

  • Russ Herrold, 2002-09-22

  • Yann Hirou, 2002-09-26

  • Julian Anastasov, 2002-10-29

  • Bert Hubert, 2002-11-14

  • Tony Kapela, 2002-11-30

  • George Georgalis, 2003-01-11

  • Alex Russell, 2003-02-02

  • giovanni, 2003-02-06

  • Gilles Douillet, 2003-02-28

Please feel free to point out any irregularities, factual errors, typographical errors, or logical gaps in this documentation. If you have rants or raves about this documentation, please mail me directly at .

Now, let's begin! Let me welcome you to the pleasure and reliability of IP networking with linux.

Part 1. Concepts

Table of Contents

1. Basic IP Connectivity
1.1. IP Networking Control Files
1.2. Reading Routes and IP Information
1.2.1. Sending Packets to the Local Network
1.2.2. Sending Packets to Unknown Networks Through the Default Gateway
1.2.3. Static Routes to Networks
1.3. Changing IP Addresses and Routes
1.3.1. Changing the IP on a machine
1.3.2. Setting the Default Route
1.3.3. Adding and removing a static route
1.4. Conclusion
2. Ethernet
2.1. Address Resolution Protocol (ARP)
2.1.1. Overview of Address Resolution Protocol
2.1.2. The ARP cache
2.1.3. ARP Suppression
2.1.4. The ARP Flux Problem
2.2. Proxy ARP
2.3. ARP filtering
2.4. Connecting to an Ethernet 802.1q VLAN
2.5. Link Aggregation and High Availability with Bonding
2.5.1. Link Aggregation
2.5.2. High Availability
3. Bridging
3.1. Concepts of Bridging
3.2. Bridging and Spanning Tree Protocol
3.3. Bridging and Packet Filtering
3.4. Traffic Control with a Bridge
3.5. ebtables
4. IP Routing
4.1. Introduction to Linux Routing
4.2. Routing to Locally Connected Networks
4.3. Sending Packets Through a Gateway
4.4. Operating as a Router
4.5. Route Selection
4.5.1. The Common Case
4.5.2. The Whole Story
4.5.3. Summary
4.6. Source Address Selection
4.7. Routing Cache
4.8. Routing Tables
4.8.1. Routing Table Entries (Routes)
4.8.2. The Local Routing Table
4.8.3. The Main Routing Table
4.9. Routing Policy Database (RPDB)
4.10. ICMP and Routing
4.10.1. MTU, MSS, and ICMP
4.10.2. ICMP Redirects and Routing
5. Network Address Translation (NAT)
5.1. Rationale for and Introduction to NAT
5.2. Application Layer Protocols with Embedded Network Information
5.3. Stateless NAT with iproute2
5.3.1. Stateless NAT Packet Capture and Introduction
5.3.2. Stateless NAT Practicum
5.3.3. Conditional Stateless NAT
5.4. Stateless NAT and Packet Filtering
5.5. Destination NAT with netfilter (DNAT)
5.5.1. Port Address Translation with DNAT
5.6. Port Address Translation (PAT) from Userspace
5.7. Transparent PAT from Userspace
6. Masquerading and Source Network Address Translation
6.1. Concepts of Source NAT
6.1.1. Differences Between SNAT and Masquerading
6.1.2. Double SNAT/Masquerading
6.2. Issues with SNAT/Masquerading and Inbound Traffic
6.3. Where Masquerading and SNAT Break
7. Packet Filtering
7.1. Rationale for and Introduction to Packet Filtering
7.1.1. History of Linux Packet Filter Support
7.2. Limits and Weaknesses of Packet Filtering
7.2.1. Limits of the Usefulness of Packet Filtering
7.2.2. Weaknesses of Packet Filtering
7.2.3. Complex Network Layer Stateless Packet Filters
7.3. General Packet Filter Requirements
7.4. The Netfilter Architecture
7.4.1. Packet Filtering with iptables
7.5. Packet Filtering with ipchains
7.5.1. Packet Mangling with ipchains
7.6. Protecting a Host
7.7. Protecting a Network
7.8. Further Resources
8. Statefulness and Statelessness
8.1.
8.2. Statelessness of IP Routing
8.3. Netfilter Connection Tracking
8.3.1.
8.3.2.

Chapter 1. Basic IP Connectivity

Internet Protocol (IP) networking is among the most common networking technologies in use today. The IP stack under linux is mature, robust and reliable. This chapter covers the basics of configuring a linux machine or multiple linux machines to join an IP network.

This chapter begins with a quick overview of the locations of the networking control files on different distributions of linux. The remainder of the chapter is devoted to outlining the basics of IP networking with linux.

These basics are written in a more tutorial style than the remainder of the first part of the book. Reading and understanding IP addressing and routing information is a key skill to master when beginning with linux. Naturally, the next step is to alter the IP configuration of a machine. This chapter will introduce these two key skills in a tutorial style. Subsequent chapters will engage specific subtopics of linux networking in a more thorough and less tutorial manner.

1.1. IP Networking Control Files

Different linux distribution vendors put their networking configuration files in different places in the filesystem. Here is a brief summary of the locations of the IP networking configuration information under a few common linux distributions along with links to further documentation.

Location of networking configuration files

The format of the networking configuration files differs significantly from distribution to distribution, yet the tools used by these scripts are the same. This documentation will focus on these tools and how they instruct the kernel to alter interface and route information. Consult the distribution's documentation for questions of file format and order of operation.

For the remainder of this document, many examples refer to machines in a hypothetical network. Refer to the example network description for the network map and addressing scheme.

1.2. Reading Routes and IP Information

Assuming an already configured machine named tristan, let's look at the IP addressing and routing table. Next we'll examine how the machine communicates with computers (hosts) on the locally reachable network. We'll then send packets through our default gateway to other networks. After learning what a default route is, we'll look at a static route.

One of the first things to learn about a machine attached to an IP network is its IP address. We'll begin by looking at a machine named tristan on the main desktop network (192.168.99.0/24).

The machine tristan is alive on IP 192.168.99.35 and has been properly configured by the system administrator. By examining the route and ifconfig output we can learn a good deal about the network to which tristan is connected [1].

Example 1.1. Sample ifconfig output

[root@tristan]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:80:C8:F8:4A:51  
          inet addr:192.168.99.35  Bcast:192.168.99.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:27849718 errors:1 dropped:0 overruns:0 frame:0
          TX packets:29968044 errors:5 dropped:0 overruns:2 carrier:3
          collisions:0 txqueuelen:100 
          RX bytes:943447653 (899.7 Mb)  TX bytes:2599122310 (2478.7 Mb)
          Interrupt:9 Base address:0x1000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:7028982 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7028982 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1206918001 (1151.0 Mb)  TX bytes:1206918001 (1151.0 Mb)

[root@tristan]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.99.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.99.254  0.0.0.0         UG    0      0        0 eth0
      

For the moment, ignore the loopback interface (lo) and concentrate on the Ethernet interface. Examine the output of the ifconfig command. We can learn a great deal about the IP network to which we are connected simply by reading the ifconfig output. For a thorough discussion of ifconfig, see Section C.1, “ifconfig”.

The IP address active on tristan is 192.168.99.35. This means that any IP packets created by tristan will have a source address of 192.168.99.35. Similarly, any packet received by tristan will have a destination address of 192.168.99.35. When creating an outbound packet, for example a request to a server, tristan will set the destination address to the server's IP. This gives the remote host and the networking devices in between these hosts enough information to carry packets between the two devices.

Because tristan will advertise that it accepts packets with a destination address of 192.168.99.35, any frames (packets) appearing on the Ethernet bound for 192.168.99.35 will reach tristan. The process of communicating the ownership of an IP address is called ARP. Read Section 2.1.1, “Overview of Address Resolution Protocol” for a complete discussion of this process.

This is fundamental to IP networking: a host must be able to generate and receive packets on an IP address assigned to it. This IP address is a unique identifier for the machine on the network to which it is connected.

Most traffic to and from machines today is unicast IP traffic. Unicast traffic is essentially a conversation between two hosts. Though there may be routers between them, the two hosts are carrying on a private conversation. Examples of common unicast traffic are protocols such as HTTP (web), SMTP (sending mail), POP3 (fetching mail), IRC (chat), SSH (secure shell), and LDAP (directory access). To participate in any of these kinds of traffic, tristan will send and receive packets on 192.168.99.35.

In contrast to unicast traffic, there is another common IP networking technique called broadcasting. Broadcast traffic is a way of addressing all hosts in a given network range with a single destination IP address. To continue the analogy of the unicast conversation, a broadcast is more like shouting in a room. Occasionally, network administrators will refer to broadcast techniques and broadcasting as "chatty network traffic".

Broadcast techniques are used at the Ethernet layer and the IP layer, so the cautious person talks about Ethernet broadcasts or IP broadcasts. Refer to Section 2.1.1, “Overview of Address Resolution Protocol”, for more information on a common use of broadcast Ethernet frames.

IP Broadcast techniques can be used to share information with all partners on a network or to discover characteristics of other members of a network. SMB (Server Message Block) as implemented by Microsoft products and the samba package makes extensive use of broadcasting techniques for discovery and information sharing. Dynamic Host Configuration Protocol (DHCP) also makes use of broadcasting techniques to manage IP addressing.

The IP broadcast address is usually derived correctly from the IP address and network mask, although it can easily be set explicitly to a different address. Because the broadcast address is used for autodiscovery (e.g., SMB under some protocols), an incorrect broadcast address can inhibit a machine's ability to participate in networked communication [2].
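The derivation mentioned above can be sketched in a few lines of Python. This is an illustration only, not what ifconfig or the kernel actually runs, and the helper name broadcast_for is hypothetical. The idea is that the broadcast address is the interface address with every host bit (every bit that is zero in the netmask) set to one.

```python
import ipaddress

def broadcast_for(address: str, netmask: str) -> str:
    """Return the broadcast address implied by an IPv4 address and netmask."""
    addr = int(ipaddress.IPv4Address(address))
    mask = int(ipaddress.IPv4Address(netmask))
    # Host bits are the bits that are 0 in the netmask; set all of them to 1.
    host_bits = ~mask & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(addr | host_bits))

# tristan's address and netmask from Example 1.1
print(broadcast_for("192.168.99.35", "255.255.255.0"))
```

For tristan this yields the same value that ifconfig reports in the Bcast field. If the netmask were wrong, the derived broadcast address would be wrong with it, which is the mismatch footnote [2] warns about.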

The netmask on the interface should match the netmask in the routing table for the locally connected network. Typically, the route and the IP interface definition are calculated from the same configuration data so they should match perfectly.

If you are at all confused about how to address a network or how to read either the traditional notation or the CIDR notation for network addressing, see one of the CIDR/netmask references in Section I.1.3, “General IP Networking Resources”.
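As a quick aid, the two notations can be compared with Python's standard ipaddress module. This is only a sketch for checking your own reading of a netmask; the variable names are arbitrary and nothing here is specific to linux.

```python
import ipaddress

# The same network written in traditional (dotted netmask) and CIDR notation.
traditional = ipaddress.ip_network("192.168.99.0/255.255.255.0")
cidr = ipaddress.ip_network("192.168.99.0/24")

print(traditional == cidr)     # both spellings describe one network
print(cidr.netmask)            # dotted-quad netmask for the /24
print(traditional.prefixlen)   # prefix length for 255.255.255.0
print(cidr.num_addresses)      # addresses in the range, network and broadcast included
```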

1.2.1. Sending Packets to the Local Network

We can see from the output above that the IP address 192.168.99.35 falls inside the address space 192.168.99.0/24. We also note that the machine tristan will route packets bound for 192.168.99.0/24 directly onto the Ethernet attached to eth0. This line in the routing table identifies a network available on the Ethernet attached to eth0 ("Iface") by its network address ("Destination") and size ("Genmask").

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.99.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
      

Every host on the 192.168.99.0/24 network should share the network address and netmask specified above. No two hosts should share the same IP address.
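These two rules can be expressed as a small consistency check. The sketch below is an assumption-laden illustration: the hosts dictionary simply restates the addresses of tristan and masq-gw from the example network, and check_lan is a hypothetical helper, not an existing tool.

```python
import ipaddress

# Interface configurations on the example desktop network (address/netmask).
hosts = {
    "tristan": "192.168.99.35/255.255.255.0",
    "masq-gw": "192.168.99.254/255.255.255.0",
}

def check_lan(hosts):
    """Return (networks, duplicates) found among the configured interfaces."""
    interfaces = [ipaddress.ip_interface(cfg) for cfg in hosts.values()]
    # Every host should compute the same network from its address and netmask.
    networks = {iface.network for iface in interfaces}
    seen, duplicates = set(), set()
    for iface in interfaces:
        if iface.ip in seen:
            duplicates.add(iface.ip)
        seen.add(iface.ip)
    return networks, duplicates

networks, duplicates = check_lan(hosts)
print(networks)     # a single entry means all hosts agree on network and netmask
print(duplicates)   # an empty set means no IP address is used twice
```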

Currently, there are two hosts connected to the example desktop network. Both tristan and masq-gw are connected to 192.168.99.0/24. Thus, 192.168.99.254 (masq-gw) should be reachable from tristan, and a successful reachability test provides evidence that tristan is configured properly. (N.B. Assume that the network administrator has properly configured masq-gw.) Since the default gateway in any network is an important host, testing reachability of the default gateway also helps determine whether the local network is operating properly.

The ping tool, designed to take advantage of Internet Control Message Protocol (ICMP), can be used to test reachability of IP addresses. For a command summary and examples of the use of ping, see Section G.1, “ping”.

Example 1.2. Testing reachability of a locally connected host with ping

[root@tristan]# ping -c 1 -n 192.168.99.254
PING 192.168.99.254 (192.168.99.254) from 192.168.99.35 : 56(84) bytes of data.
64 bytes from 192.168.99.254: icmp_seq=0 ttl=255 time=238 usec

--- 192.168.99.254 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/mdev = 0.238/0.238/0.238/0.000 ms
        

1.2.2. Sending Packets to Unknown Networks Through the Default Gateway

In Section 1.2.1, “Sending Packets to the Local Network”, we verified that hosts connected to the same local network can reach each other and, importantly, the default gateway. Now, let's see what happens to packets which have a destination address outside the locally connected network.

Assuming that the network administrator allows ping packets from the desktop network into the public network, ping can be invoked with the record route option to show the path the packet travels from tristan to wan-gw and back.

Example 1.3. Testing reachability of non-local hosts

[root@tristan]# ping -R -c 1 -n 205.254.211.254
PING 205.254.211.254 (205.254.211.254) from 192.168.99.35 : 56(84) bytes of data.
64 bytes from 205.254.211.254: icmp_seq=0 ttl=255 time=238 usec
RR:     192.168.99.35        1
        205.254.211.179      2
        205.254.211.254      3
        205.254.211.254
        192.168.99.254       4
        192.168.99.35        5

--- 205.254.211.254 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/mdev = 0.238/0.238/0.238/0.000 ms
          
1 As the packet passes through the IP stack on tristan, before hitting the Ethernet, tristan adds its IP to the list of IPs in the option field in the header.
2 This is masq-gw's public IP address.
3 Our intended destination! (Anybody know why there are two entries in the record route output?)
4 This is masq-gw's private IP address.
5 And finally, tristan will add its IP to the option field in the header of the IP packet just before the packet reaches the calling ping program.

By testing reachability of the local network 192.168.99.0/24 and an IP address outside our local network, we have verified the basic elements of IP connectivity.

To summarize this section, we have:

  • identified the IP address, network address and netmask in use on tristan using the tools ifconfig and route

  • verified that tristan can reach its default gateway

  • tested that packets bound for destinations outside our local network reach the intended destination and return

1.2.3. Static Routes to Networks

Static routes instruct the kernel to route packets for a known destination host or network to a router or gateway different from the default gateway. In the example network, the desktop machine tristan would need a static route to reach hosts in the 192.168.98.0/24 network. Note that the branch office network is reachable over an ISDN line. The ISDN router's IP in tristan's network is 192.168.99.1. This means that there are two gateways in the example desktop network, one connected to a small branch office network, and the other connected to the Internet.

Without a static route to the branch office network, tristan would use masq-gw as the gateway, which is not the most efficient path for packets bound for morgan. Let's examine why a static route would be better here.

If tristan generates a packet bound for morgan and sends the packet to the default gateway, masq-gw will forward the packet to isdn-router as well as generate an ICMP redirect message to tristan. This ICMP redirect message tells tristan to send future packets with a destination address of 192.168.98.82 (morgan) directly to isdn-router. For a fuller discussion of ICMP redirect, see Section 4.10.2, “ICMP Redirects and Routing”.

The absence of a static route has caused two extra packets to be generated on the Ethernet for no benefit. Not only that, but tristan will eventually expire the temporary route entry [3] for 192.168.98.82, which means that subsequent packets bound for morgan will repeat this process [4].

To solve this problem, add a static route to tristan's routing table. Below is a modified routing table (see Section 1.3, “Changing IP Addresses and Routes” to learn how to change the routing table).

Example 1.4. Sample routing table with a static route

[root@tristan]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.99.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.98.0    192.168.99.1    255.255.255.0   UG    0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.99.254  0.0.0.0         UG    0      0        0 eth0
        

According to this routing table, any packets with a destination address in the 192.168.98.0/24 network will be routed to the gateway 192.168.99.1 instead of the default gateway. This will prevent unnecessary ICMP redirect messages.
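The selection described here follows the longest-prefix-match rule: among the routes whose network contains the destination, the most specific one wins. The Python sketch below applies that rule to the table in Example 1.4. It is a simplified model for illustration, not the kernel's lookup code; the ROUTES list and the lookup function are hypothetical names.

```python
import ipaddress

# Routing table from Example 1.4: (destination, genmask, gateway, interface).
# A gateway of 0.0.0.0 means the destination network is directly connected.
ROUTES = [
    ("192.168.99.0", "255.255.255.0", "0.0.0.0", "eth0"),
    ("192.168.98.0", "255.255.255.0", "192.168.99.1", "eth0"),
    ("127.0.0.0", "255.0.0.0", "0.0.0.0", "lo"),
    ("0.0.0.0", "0.0.0.0", "192.168.99.254", "eth0"),
]

def lookup(destination: str):
    """Return (gateway, interface) of the most specific matching route."""
    dest = ipaddress.ip_address(destination)
    best = None
    for network, genmask, gateway, iface in ROUTES:
        net = ipaddress.ip_network(f"{network}/{genmask}")
        # Keep the matching route with the longest prefix seen so far.
        if dest in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, gateway, iface)
    if best is None:
        raise LookupError(f"no route to {destination}")
    return best[1], best[2]

print(lookup("192.168.98.82"))    # morgan: handed to the ISDN router
print(lookup("192.168.99.254"))   # masq-gw: directly connected on eth0
print(lookup("205.254.211.254"))  # everything else: the default gateway
```

The default route (0.0.0.0 with a genmask of 0.0.0.0) matches every destination but has the shortest possible prefix, so it is chosen only when nothing more specific applies.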

These are the basic tools for inspecting the IP address and the routes on a linux machine. Understanding the output of these tools will help you understand how machines fit into simple networks, and will be a base on which you can build an understanding of more complex networks.



[1] For BSD and UNIX users, the idiom netstat -rn may be more familiar than the common route -n on a linux machine. Both of these commands provide the same basic information although the formatting is a bit different. For a fuller discussion of these, see either Section G.4, “netstat” or Section D.1, “route”. For access to all of the routing features of the linux kernel, use ip route instead.

[2] An incorrect broadcast address often highlights a mismatch of the configured IP address and netmask on an interface. If in doubt, be sure to use an IP calculator to set the correct netmask and broadcast addresses.

[3] If the machine is a linux machine, then the temporary route entry is stored in the routing cache. Consult Section 4.7, “Routing Cache” for more information on the routing cache.

[4] It is quite reasonable to ignore ICMP redirect messages from unknown hosts on the Internet, but ICMP redirect messages on a LAN indicate that a host has mismatched netmasks or missing static routes.

1.3. Changing IP Addresses and Routes

This section introduces changing the IP address on an interface, changing the default gateway, and adding and removing a static route. With the knowledge of ifconfig and route output it's a small step to learn how to change IP configuration with these same tools.

1.3.1. Changing the IP on a machine

For a practical example, let's say that the branch office server, morgan, needs to visit the main office for some hardware maintenance. Since the services on the machine are not in use, it's a convenient time to fetch some software updates, after configuring the machine to join the LAN.

Once the machine is booted and connected to the Ethernet, it's ready for IP reconfiguration. In order to join an IP network, the following information is required. Refer to the network map and appendix to gather the required information below.

  • An unused IP address (Use 192.168.99.14.)

  • netmask (What's your guess?)

  • IP address of the default gateway (What's your guess?)

  • network address [5] (What's your guess?)

  • The IP address of a name resolver. (Use the IP of the default gateway here [6]. )

Example 1.5. ifconfig and route output before the change

[root@morgan]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:80:C8:F8:4A:53  
          inet addr:192.168.98.82  Bcast:192.168.98.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:9 Base address:0x5000 

[root@morgan]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.98.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.98.254  0.0.0.0         UG    0      0        0 eth0
        

It is clear in Example 1.5, “ifconfig and route output before the change”, that morgan is configured for a different network than the main office desktop network. The process of readdressing for the new network involves three steps. First, the active interface must be brought down. Next, a new address must be configured on the interface and the interface brought up. Finally, a new default route must be added. If the networking configuration is correct and the process is successful, the machine should be able to connect to local and non-local destinations.

Example 1.6. Bringing down a network interface with ifconfig

[root@morgan]# ifconfig eth0 down
        

This is a fast way to stop networking on a single-homed machine such as a server or workstation. On multi-homed hosts, other interfaces on the machine would be unaffected by this command. This method of bringing down an interface has some serious side effects, which should be understood. Here is a summary of the side effects of bringing down an interface.

Side effects of bringing down an interface with ifconfig

  • all IP addresses on the specified interface are deactivated and removed

  • any connections established to or from IPs on the specified interface are broken [7]

  • all routes to any destinations through the specified interface are removed from the routing tables

  • the link layer device is deactivated

The next step, bringing up the interface, requires the new networking configuration information. It's a good habit to check the interface after configuration to verify settings.

Example 1.7. Bringing up an Ethernet interface with ifconfig

[root@morgan]# ifconfig eth0 192.168.99.14 netmask 255.255.255.0 up
[root@morgan]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:80:C8:F8:4A:53  
          inet addr:192.168.99.14  Bcast:192.168.99.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:9 Base address:0x5000 

        

The second call to ifconfig allows verification of the IP addressing information. The currently configured IP address on eth0 is 192.168.99.14. Bringing up an interface also has a small set of side effects.

Side effects of bringing up an interface

  • the link layer device is activated

  • the requested IP address is assigned to the specified interface

  • all local, network, and broadcast routes implied by the IP configuration are added to the routing tables
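The combined effect of the two lists of side effects on the routing table can be modeled with a short Python sketch. This is a deliberately simplified illustration under stated assumptions: only the routes visible in route -n output are modeled (local and broadcast routes are left out), and the function names bring_down and bring_up are hypothetical.

```python
import ipaddress

def bring_down(routes, iface):
    """Drop every route that leaves through iface, gateway routes included."""
    return [r for r in routes if r["iface"] != iface]

def bring_up(routes, iface, address, netmask):
    """Add only the connected network route implied by the new address."""
    net = ipaddress.ip_interface(f"{address}/{netmask}").network
    return routes + [{"dest": str(net), "gateway": None, "iface": iface}]

# morgan's routing table from Example 1.5
routes = [
    {"dest": "192.168.98.0/24", "gateway": None, "iface": "eth0"},
    {"dest": "127.0.0.0/8", "gateway": None, "iface": "lo"},
    {"dest": "0.0.0.0/0", "gateway": "192.168.98.254", "iface": "eth0"},
]

routes = bring_down(routes, "eth0")
routes = bring_up(routes, "eth0", "192.168.99.14", "255.255.255.0")
for route in routes:
    print(route)
```

In this model the loopback route survives, the new network route for 192.168.99.0/24 appears, and no route through a gateway remains, because nothing in the interface configuration implies one.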

Use ping to verify the reachability of other locally connected hosts or skip directly to setting the default gateway.

1.3.2. Setting the Default Route

It should come as no surprise to a close reader (hint) that the default route was removed when ifconfig eth0 down was executed. The crucial final step is configuring the default route.

Example 1.8. Adding a default route with route

[root@morgan]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.99.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
[root@morgan]# route add default gw 192.168.99.254
[root@morgan]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.99.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         192.168.99.254  0.0.0.0         UG    0      0        0 eth0
        

The routing table on morgan should look exactly like the initial routing table on tristan. Compare the routing tables in