SSH, SOCKS, and cURL

Port forwarding using SSH tunnels is a convenient way to circumvent well-intentioned firewall rules, or to access resources on otherwise unaddressable networks, particularly those behind NAT (with addresses such as 192.168.0.1).

However, it has a shortcoming in that it only allows us to address a specific host and port on the remote end of the connection; if we forward a local port to machine A on the remote subnet, we can’t also reach machine B unless we forward another port. Fetching documents from a single server therefore works just fine, but browsing multiple resources over the endpoint is a hassle.
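
For example, with a classic local port forward (hypothetical hostnames here), each target host needs its own local port:

$ ssh -L 8080:machine-a.example:80 remote.example.com
$ ssh -L 8081:machine-b.example:80 remote.example.com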

The proper way to do this, if possible, is to have a VPN connection into the appropriate network, whether via a virtual interface or a network route through an IPsec tunnel. In cases where this isn’t possible or practicable, we can use a SOCKS proxy set up via an SSH connection to delegate all kinds of network connections through a remote machine, using its exact network stack, provided our client application supports it.

Being command-line junkies, we’ll show how to set the tunnel up with ssh and retrieve resources through it with curl, but of course graphical browsers are able to use SOCKS proxies as well.

As an added benefit, using this for browsing implicitly encrypts all of the traffic up to the remote endpoint of the SSH connection, including the addresses of the machines you’re contacting; it’s thus a useful way to protect unencrypted traffic from snoopers on your local network, or to circumvent firewall policies.

Establishing the tunnel

First of all we’ll make an SSH connection to the machine we’d like to act as a SOCKS proxy, which has access to the network services that we don’t. Perhaps it’s the only publicly addressable machine in the network.

$ ssh -fN -D localhost:8001 remote.example.com

In this example, we’re backgrounding the connection immediately with -f, and explicitly saying we don’t intend to run a command or shell with -N. We’re only interested in establishing the tunnel.
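
Since -f detaches the session from the terminal, there’s no foreground process to interrupt when you’re done; one rough way to find and stop the backgrounded tunnel is to match its command line with pkill, assuming this is the only ssh invocation of this form:

$ pkill -f 'ssh -fN -D localhost:8001'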

Of course, if you do want a shell as well, you can leave these options out:

$ ssh -D localhost:8001 remote.example.com

If the tunnel setup fails, check that AllowTcpForwarding is set to yes in /etc/ssh/sshd_config on the remote machine:

AllowTcpForwarding yes
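
After changing this, the SSH daemon needs to re-read its configuration; on systemd-based systems, that’s typically one of the following, depending on how the distribution names the unit:

$ sudo systemctl reload sshd
$ sudo systemctl reload ssh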

Note that in both cases we use localhost rather than 127.0.0.1, in order to establish both IPv4 and IPv6 sockets if appropriate.

Once clients start using the proxy, we can observe their connections to the tunnel’s local endpoint with ss on GNU/Linux:

# ss dst :8001
State      Recv-Q Send-Q   Local Address:Port       Peer Address:Port
ESTAB      0      0            127.0.0.1:45666         127.0.0.1:8001
ESTAB      0      0            127.0.0.1:45656         127.0.0.1:8001
ESTAB      0      0            127.0.0.1:45654         127.0.0.1:8001
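
To check the listening sockets for the proxy itself, including the IPv6 one mentioned above, we can filter on the source port instead; assuming both address families were bound, the output should show LISTEN entries for both 127.0.0.1:8001 and [::1]:8001:

$ ss -tln sport = :8001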

Requesting documents

Now that we have a SOCKS proxy running on the far end of the tunnel, we can use it to retrieve documents from some of the servers that are otherwise inaccessible. For example, when we were trying to run this from the client side, we found it wouldn’t work:

$ curl http://private.example/contacts.html
curl: (6) Couldn't resolve host 'private.example'

This is because the example subnet is on a remote and unroutable LAN. If its name comes from a private DNS server, we may not even be able to resolve its address, let alone retrieve the document.

We can fix both problems with our local SOCKS proxy, by pointing curl to it with its --proxy option:

$ curl --proxy socks5h://localhost:8001 http://private.example/contacts.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
    <head>
        <title>Contacts</title>
...

Older versions of curl may need to use the --socks5-hostname option:

$ curl --socks5-hostname localhost:8001 http://private.example/contacts.html

This tunnels our HTTP request through to remote.example.com and returns any response, and it also performs the DNS lookup on the far end. Even if our client side can’t contact the appropriate DNS server on its own, we can still resolve hostnames private to the remote network; that remote resolution is what the h suffix in the socks5h:// URI scheme above requests.
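
Note that the h suffix matters; with the plain socks5:// scheme and everything else the same, curl tries to resolve the hostname locally before contacting the proxy, and so fails exactly as it did without the proxy:

$ curl --proxy socks5://localhost:8001 http://private.example/contacts.html
curl: (6) Couldn't resolve host 'private.example'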

We can configure graphical web browsers to use the SOCKS proxy in the same way; Firefox, for example, has a checkbox in its connection settings for proxying DNS requests through SOCKS as well.

Browsers are not the only applications that can use SOCKS proxies; many IM clients such as Pidgin and Bitlbee can use them too, for example.

Making things more permanent

If this all works for you and you’d like to set up the SOCKS proxy on the far end each time you connect, you can add it to your ssh_config file in $HOME/.ssh/config:

Host remote.example.com
    DynamicForward localhost:8001

With this done, you should only need to type the hostname of the machine to get both a shell and the dynamic forward:

$ ssh remote.example.com
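
Options from the configuration file combine with those given on the command line, so the tunnel-only setup from before reduces to:

$ ssh -fN remote.example.com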

Testing HTTP/1.1 responses

Before changing DNS A records for a website, it’s prudent to check that the webserver at the IP address to which you’re about to point the records will actually serve a website for the relevant hostname; that is, if it’s an Apache HTTPD webserver, that it has a valid VirtualHost definition for the site.

If you don’t actually have administrative access to the webserver to check this directly, there are several basic ways to test it from the command line; three of the most useful are curl, wget, and plain old telnet. For each, the method amounts to manipulating the HTTP/1.1 request to the target webserver so that the website you want to test appears as the hostname in the Host header.

Using curl

Perhaps the quickest and tidiest way to check this from a Unix command line is using curl, the binary frontend to the libcurl library. You do this by making an HTTP/1.1 request to the target server’s IP address, while including an explicitly specified value for the Host header. This is done using the -H option:

$ curl -H "Host: sanctum.geek.nz" 120.138.30.239

Either way, this spits out quite a lot of information, including some on stderr, so you may choose to filter it and just check for the <title> tag, with a little bit of context, to make sure the site you expected really is being returned. Here !! is the shell’s history expansion, repeating the previous curl command:

$ !! 2>/dev/null | grep -C3 '<title>'
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width" />
<title>Arabesque | Systems, Tools, and Terminal Science</title>
<link rel="profile" href="http://gmpg.org/xfn/11" />
<link rel="stylesheet" type="text/css" media="all" href="https://sanctum...
<link rel="pingback" href="https://blog.sanctum.geek.nz/xmlrpc.php" />

Using wget

An equivalent to the curl method can be achieved using the --header option for the commonly available wget:

$ wget --header="Host: sanctum.geek.nz" -q -O - 69.163.229.57

Using telnet

If you have neither curl nor wget available, telnet works just as well on both Windows and Unix-like systems, though it’s a little more awkward to work with, as you have to type the request and its headers straight into the TCP session:

$ telnet 69.163.229.57 80
Trying 69.163.229.57...
Connected to 69.163.229.57.
Escape character is '^]'.
GET / HTTP/1.1
Host: sanctum.geek.nz
Connection: close

Note that you need to press Enter twice after the final header to complete the HTTP request, and that the Connection: close header asks the server to close the connection once it’s done, rather than holding it open for further requests. If this spits the HTML of your expected page back at you and then closes the connection, you’ve got some indication that things are configured correctly.
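
To script the same check rather than typing it interactively, you can pipe the request through nc (netcat) where it’s available; printf supplies the CRLF line endings the protocol expects:

$ printf 'GET / HTTP/1.1\r\nHost: sanctum.geek.nz\r\nConnection: close\r\n\r\n' |
      nc 69.163.229.57 80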

Yet another option, particularly if you want to actually view the site in a browser, is to change your system’s hosts file to override DNS resolution for the relevant hostname on your local machine.
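
For example, a line like the following in /etc/hosts, using the candidate IP address from above, forces the hostname to resolve to the new server for all local applications; just remember to remove it once the real DNS records change:

69.163.229.57    sanctum.geek.nz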

Thanks to commenter Jaime Herazo for suggesting the wget method in the comments, and commenter sam for suggesting the -C option for grep.