University of Michigan Innovation Partnerships

Censys, Mapping the Internet One Device at a Time

3/16/2018

Let’s go back in history for a moment, back to 2015: Security researchers from Austrian security company SEC Consult found that various connected devices, such as home routers, network storage devices, IP cameras, and mobile- and Internet-connected phones, were re-using the same hard-coded cryptographic keys. The researchers extracted more than 580 unique keys from more than 4,000 devices from 70 manufacturers, and found almost half of them were actively being used across different devices.

In the case of SSH keys, anyone who extracted the host key from one device could use it to impersonate any other device sharing that key, even one from a different manufacturer, and intercept connections to it. In the SEC Consult analysis, 80 SSH host keys were used by at least 900,000 devices. Websites using HTTPS rely on certificates and their matching private keys to prove the server’s identity and encrypt traffic sent between users and the server. If someone had the private key behind one server’s certificate, that person could impersonate other servers using the same certificate and decrypt their traffic to extract usernames, passwords, and other sensitive information. The researchers found that 150 HTTPS server certificates were being used on over 3.2 million hosts.

While the research highlighted some security weaknesses in Internet-connected devices, it also showed how researchers use specialized Internet search engines to find vulnerable devices on the Internet. SEC Consult used Censys, a project launched by the University of Michigan that scans the Internet every day to build an inventory of every publicly reachable device, to find the hosts using the hard-coded keys.
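The search SEC Consult ran against Censys boils down to a grouping problem: index every host by the key it presents, then look for keys that show up on many unrelated hosts. What follows is a minimal Python sketch that assumes scan records arrive as hypothetical `(ip, key_blob_b64)` pairs. It is not Censys’s or SEC Consult’s actual tooling. The fingerprint format mirrors the `SHA256:` form printed by OpenSSH’s `ssh-keygen -l`.

```python
import base64
import hashlib
from collections import defaultdict


def ssh_fingerprint(key_blob_b64: str) -> str:
    """OpenSSH-style fingerprint: "SHA256:" + unpadded base64 of SHA-256(key blob)."""
    digest = hashlib.sha256(base64.b64decode(key_blob_b64)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def shared_keys(records, min_hosts=2):
    """Group (ip, key_blob_b64) scan records by key fingerprint.

    Returns fingerprint -> sorted IPs, keeping only keys seen on at least
    `min_hosts` distinct hosts, i.e. candidate shared or hard-coded keys.
    """
    groups = defaultdict(set)
    for ip, key_blob_b64 in records:
        groups[ssh_fingerprint(key_blob_b64)].add(ip)
    return {fp: sorted(ips) for fp, ips in groups.items() if len(ips) >= min_hosts}


# Hypothetical scan records: placeholder bytes stand in for real public-key blobs.
key_a = base64.b64encode(b"placeholder-key-A").decode("ascii")
key_b = base64.b64encode(b"placeholder-key-B").decode("ascii")
records = [
    ("192.0.2.10", key_a),
    ("198.51.100.7", key_a),
    ("203.0.113.5", key_b),
]
duplicates = shared_keys(records)
```

At Internet scale, the same grouping would turn up fingerprints like the 80 SSH host keys shared by hundreds of thousands of devices.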

“The Censys data lets users take advantage of Internet-wide scanning without having to do their own scans,” says University of Michigan computer science and engineering professor J. Alex Halderman. “It is like a map, or a scale model, of the Internet in a box that you can interact with in realtime to learn about the things inside.”

Just as people use search engines such as Google, Bing, and DuckDuckGo to find specific information on the Internet, there are special search engines that can find information about Internet-connected devices and networks. Think webcams, printers, smart light bulbs, industrial control systems, monitoring systems. Information about these devices is just a special search query away.

Different tools scan the Internet differently, but the general idea is the same: scan all IP addresses to find the devices, and then collect available information about the machine’s software and hardware. Think of the tools like a census-taker going door-to-door and asking questions to whoever opens the door. The search engines won’t have much to say about devices that don’t answer questions, but it turns out devices are very chatty and share all kinds of details such as the device type, operating system version, and what kind of software is installed. These search tools aren’t going past the doors or launching any attacks to collect information; everything is visible from the “street.”

“We never log into devices. We never try to change their configuration,” says Zakir Durumeric, an assistant professor of computer science at Stanford University and the CTO and co-founder of Censys.

Mapping the Internet is a little different from mapping the physical world. In the physical world, two people can walk down the same street and take pictures, and their photographs will look nearly the same, with some variation depending on the time of day or the date the pictures were taken. Internet maps have more room for variation, because the tools focus on different things.

Imagine one census-taker noting the color of the doors, another noticing that certain neighborhoods use the same building materials, and a third tracking what kind of roof is on each building. The resulting maps will be similar, with the same intersections, locations of buildings, and number of people, but they will show different details.

Censys is just one of many such tools. Shodan and Project Sonar are two other well-known search engines that scan the Internet for device information.

How Internet Scanning Works

Every device on the Internet needs an address so that other machines can find it. Internet-wide scanners focus on the IPv4 address space, with its 4.3 billion addresses. Censys uses the open-source network scanner ZMap to probe every single IP address and the application-layer scanner ZGrab to talk to the services that answer, finding out what kind of device each address has been assigned to. The device responds with different types of information, such as software details, what kind of encryption it uses, or how it is configured.
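There is a further detail behind probing every single address. The original ZMap paper describes visiting addresses in a pseudorandom order, so that no single destination network receives a burst of probes. ZMap does this by walking a cyclic multiplicative group of integers modulo a prime just above 2^32 (p = 2^32 + 15). The sketch below demonstrates the idea on a toy address space. The function names and toy parameters are illustrative. Shifting group elements down by one so that index 0 is included is a simplification, not how ZMap maps elements to addresses.

```python
def prime_factors(n: int) -> set:
    """Distinct prime factors of n, by trial division (fine for small n)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_generator(g: int, p: int) -> bool:
    """For prime p, g generates (Z/pZ)* iff g**((p-1)/q) != 1 (mod p) for every prime q | p-1."""
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


def scan_order(space_size: int, p: int, g: int, start: int):
    """Yield every index in range(space_size) exactly once, in pseudorandom order.

    p must be a prime larger than space_size (primality is not checked here);
    g must generate the multiplicative group mod p; start is any element in 1..p-1.
    """
    assert space_size < p and 1 <= start < p and is_generator(g, p)
    x = start
    for _ in range(p - 1):        # the group has p - 1 elements; visit each one once
        index = x - 1             # simplification: shift elements 1..p-1 to indices 0..p-2
        if index < space_size:    # elements that fall outside the address space are skipped
            yield index
        x = (x * g) % p


# Toy run: 14 "addresses", p = 17, generator 3. The ZMap paper uses p = 2**32 + 15.
order = list(scan_order(14, 17, 3, start=5))
```

The scanner needs to store only the generator, the starting point, and the current element, so it can cover the whole space without keeping a list of 4.3 billion addresses in memory.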

A device may be configured to be chattier than the default. Some industrial control systems, when probed, will say, “I’m a water pump, I’m located at this intersection,” to make it easier for the device owners to do their jobs. Network protocols give away a lot of information even before the user has logged in. For example, connecting to a device’s SSH port will show you the version of SSH it is running, and often the server operating system, before you enter a password. Not all devices are that explicit, but what they don’t say can be just as revealing.

“The older version that was unpatched would send back a slightly different response than the version that was patched,” says Durumeric.
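The SSH case is easy to see concretely. RFC 4253 requires each side to send an identification string of the form `SSH-protoversion-softwareversion comments` as soon as the connection is established, before any authentication. A minimal parsing sketch follows. `grab_ssh_banner` is a simplified helper: it reads one chunk and skips any lines sent before the banner. The example banner is illustrative rather than taken from a real scan. As Durumeric describes, versions can be told apart, and one plausible way to do so is to compare the software and comments fields against known patched releases.

```python
import socket


def parse_ssh_ident(line: str) -> dict:
    """Split an SSH identification string (RFC 4253, section 4.2) into its fields.

    Format: SSH-protoversion-softwareversion[ SP comments]
    """
    line = line.rstrip("\r\n")
    if not line.startswith("SSH-"):
        raise ValueError(f"not an SSH identification string: {line!r}")
    ident, _, comments = line.partition(" ")
    _, proto, software = ident.split("-", 2)
    return {"proto": proto, "software": software, "comments": comments}


def grab_ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Connect and return the server's identification line; no login is attempted.

    Simplified: reads a single chunk and skips any other lines the server sends
    first (the RFC allows that). Returns "" if no SSH- line arrives in that chunk.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = sock.recv(1024)
    for line in data.decode("ascii", errors="replace").splitlines():
        if line.startswith("SSH-"):
            return line
    return ""


# Illustrative banner (not taken from a real scan).
info = parse_ssh_ident("SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u2\r\n")
```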

Put all this information together, and you wind up with a map of almost everything on the Internet.

Almost everything, because if a device sits behind a router or some other Internet gateway, the scanner won’t be able to see it. For many home users, the scanner will see only the router or modem provided by the ISP, not the Samsung smart TV, the Dell laptop, or the Kindle Fire inside the home. But plenty of other devices are connected directly to the Internet. Over the years, people have used these search engines to find ATMs, bank safes, water park facility controls, IP cameras used in Las Vegas casinos, and even industrial control systems for a nuclear plant.
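The invisibility of home devices usually comes down to addressing. Behind a typical home router, devices get private addresses (such as 192.168.x.x from RFC 1918) that are not routed on the public Internet, and the router translates them to its own public address. Python’s standard `ipaddress` module can classify addresses this way. The sketch below flags which of a set of hypothetical device addresses an Internet-wide scanner could reach directly. A globally routable address can still be firewalled, so `True` means reachable in principle, not necessarily open.

```python
import ipaddress


def publicly_scannable(addr: str) -> bool:
    """True if addr is globally routable, i.e. an Internet-wide scan could reach it directly."""
    return ipaddress.ip_address(addr).is_global


# Hypothetical home network: every device holds a private address behind the router.
home_network = {
    "smart TV": "192.168.1.20",
    "laptop": "192.168.1.21",
    "tablet": "192.168.1.22",
}
visible = {name: publicly_scannable(ip) for name, ip in home_network.items()}

# 8.8.8.8 (Google Public DNS) is globally routable; 100.64.0.1 is carrier-grade NAT
# shared address space (RFC 6598), which is not globally routable either.
public_example = publicly_scannable("8.8.8.8")
cgnat_example = publicly_scannable("100.64.0.1")
```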

This seems like a treasure trove of information for attackers, but the people behind these scanning projects say these maps help defenders protect the systems under their control. The scanners help organizations see what parts of their infrastructure are visible when someone connects to a specific IP address and network port. Understanding that part of the attack surface, the vulnerabilities visible from the outside, is the most crucial part of defense. Researchers can use these tools to look up software or configuration details associated with a vulnerability to discover how widespread the problem is, what kind of devices have the issue, who owns the devices, and sometimes their approximate location.
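Gauging how widespread a problem is comes down to filtering scan records on the software and version associated with a vulnerability. This is a minimal sketch. It assumes records are hypothetical `(ip, product, version)` tuples with dotted numeric versions, and that the fix shipped in a known release. Real banners need far messier parsing.

```python
from collections import Counter


def parse_version(version: str) -> tuple:
    """'2.4.10' -> (2, 4, 10), so versions compare numerically rather than as strings."""
    return tuple(int(part) for part in version.split("."))


def exposure_report(records, product: str, fixed_in: str):
    """Return (Counter of vulnerable versions, total hosts running `product`).

    A host counts as vulnerable if it runs `product` at a version older than `fixed_in`.
    """
    fixed = parse_version(fixed_in)
    running = [version for _ip, name, version in records if name == product]
    vulnerable = Counter(v for v in running if parse_version(v) < fixed)
    return vulnerable, len(running)


# Hypothetical scan records: (ip, product, version).
records = [
    ("192.0.2.1", "ExampleHTTPd", "2.4.1"),
    ("192.0.2.2", "ExampleHTTPd", "2.4.10"),
    ("198.51.100.3", "ExampleHTTPd", "2.3.0"),
    ("203.0.113.4", "ExampleSSHd", "7.4"),
]
vulnerable, total = exposure_report(records, "ExampleHTTPd", fixed_in="2.4.5")
share = sum(vulnerable.values()) / total
```

Comparing versions as tuples matters here. As plain strings, "2.4.10" would sort before "2.4.5" and be wrongly counted as vulnerable.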

“We can see what is publicly visible from the outside of your network, and basically what kind of door you have, what’s in the window, things like that,” Halderman says. “It’s a Google Street View of the Internet. You’re driving around. What do you see?”

While there is a lot of overlap in the collected data, each search engine has its own focus and is used in different ways. For example, Shadowserver, run by the volunteer watchdog group Shadowserver Foundation, collects information about malicious activity such as malware, spam, botnets, and fraud and shares the resulting reports with law enforcement and industry partners. The scanning results can be used to map a botnet’s server structure and identify associated domains.

“A lot of people are just, ‘Oh yeah, just scan the Internet,'” says Bob Rudis, chief security data scientist at Rapid7. “It’s not as easy as anyone really thinks it is.”

Shodan, founded in 2009 by John Matherly, scans continuously and provides all the information in one big searchable data set. It grabs service banners from devices, which contain information such as server software, device type, brand name, and even software version numbers. It can be used to find devices on the Internet that haven’t been correctly configured to prevent unauthorized access.
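One way to picture a big searchable data set of banners is as an inverted index that maps each word in a banner to the hosts that returned it. The toy `BannerIndex` class below is a hypothetical illustration of that idea with AND-style queries. It does not reflect Shodan’s actual query language or storage.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set:
    """Lowercase a banner and split it into word-like tokens (letters, digits, '_', '.')."""
    return set(re.findall(r"[a-z0-9_.]+", text.lower()))


class BannerIndex:
    """Toy inverted index: token -> set of hosts whose banner contains that token."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, host: str, banner: str) -> None:
        for token in tokenize(banner):
            self.postings[token].add(host)

    def search(self, query: str) -> set:
        """Return hosts whose banners contain every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        results = None
        for token in tokens:
            hosts = self.postings.get(token, set())  # .get avoids creating empty entries
            results = set(hosts) if results is None else results & hosts
        return results


# Hypothetical banners.
idx = BannerIndex()
idx.add("192.0.2.10", "HTTP/1.1 200 OK\r\nServer: ExampleCam/1.2 (Linux)")
idx.add("198.51.100.4", "SSH-2.0-OpenSSH_7.4")
idx.add("203.0.113.8", "Server: ExampleCam/2.0")
```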

Project Sonar was launched by security researcher HD Moore in 2014 and is currently managed by security consultancy Rapid7. Sonar supplements the data from its IP address scans with other types of scanning, such as VHost scans and DNS-based lookups. VHost scans look at all the domains associated with an IP address, revealing how many websites are hosted on a single address and how that server uses content delivery networks such as Cloudflare and Akamai. Expanding the types of scans means there is more to count, but it yields a broader, more diverse picture of the Internet.
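DNS-based lookups tell you which names point at an address. A virtual-host probe then asks that single address for each name in turn by changing the HTTP `Host` header, because one web server can answer differently for every site it hosts. The sketch below groups hypothetical DNS results by IP and builds those per-name requests without sending them. The article doesn’t describe how Sonar actually implements its VHost scans.

```python
from collections import defaultdict


def domains_per_ip(resolutions):
    """Group (domain, ip) DNS results into ip -> sorted list of domains."""
    by_ip = defaultdict(set)
    for domain, ip in resolutions:
        by_ip[ip].add(domain)
    return {ip: sorted(domains) for ip, domains in by_ip.items()}


def vhost_requests(by_ip):
    """Yield (ip, raw HTTP/1.1 request) pairs: one request per hosted name,
    identical except for the Host header. Nothing is sent over the network here."""
    for ip, domains in by_ip.items():
        for domain in domains:
            request = f"GET / HTTP/1.1\r\nHost: {domain}\r\nConnection: close\r\n\r\n"
            yield ip, request.encode("ascii")


# Hypothetical DNS results using documentation names and addresses.
resolutions = [
    ("www.example.net", "192.0.2.10"),
    ("shop.example.org", "192.0.2.10"),
    ("example.com", "192.0.2.10"),
    ("mail.example.com", "198.51.100.20"),
]
by_ip = domains_per_ip(resolutions)
probes = list(vhost_requests(by_ip))
```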

“Going just off of IP address isn’t super useful unless you know exactly who owns it, who owns that particular IP address, especially since IP addresses are changing pretty quickly,” says Moore. “You really need to have a really solid attribution engine that can match domains to companies, and companies to IP addresses and IP ranges.”
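A basic building block of the attribution Moore describes is mapping an IP address to whoever owns the most specific network range that contains it, a longest-prefix match. The ownership table below is entirely hypothetical and uses documentation address ranges. A real table would presumably be built from registry and routing data and refreshed often, since, as Moore notes, addresses change hands quickly.

```python
import ipaddress

# Hypothetical ownership table (documentation ranges, invented organization names).
OWNERS = {
    "198.51.100.0/24": "Example Hosting Co.",
    "198.51.100.128/25": "Example Retail Inc.",
    "203.0.113.0/24": "Example University",
}
TABLE = [(ipaddress.ip_network(prefix), org) for prefix, org in OWNERS.items()]


def attribute(ip: str, table=TABLE):
    """Return the owner of the most specific prefix containing `ip`, or None.

    A customer's /25 carved out of a hosting provider's /24 wins over the /24.
    """
    addr = ipaddress.ip_address(ip)
    matches = [(net.prefixlen, org) for net, org in table if addr in net]
    return max(matches)[1] if matches else None
```

The linear scan keeps the sketch short. With millions of prefixes, a radix tree or similar structure would be the usual choice.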

Back in 2014, when Sonar first launched, it took over a day to identify the targets, scan the Internet for the machines, collect the results, and make the data available in a format researchers could use. Now, the same process takes a handful of hours. Censys relies on fast servers and multiple network connections to be able to scan the whole Internet, every day, in four hours.
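The four-hour figure implies a demanding probe rate. Here is a back-of-the-envelope calculation. It assumes every one of the 2^32 IPv4 addresses is probed once per port. In practice, reserved and excluded ranges are skipped, so the real count is somewhat lower. The article doesn’t say how many ports Censys covers in that window.

```python
def required_probe_rate(hours: float, ports: int = 1, addresses: int = 2 ** 32) -> float:
    """Probes per second needed to reach `addresses` hosts on `ports` ports within `hours`."""
    return addresses * ports / (hours * 3600)


rate_4h = required_probe_rate(4)    # roughly 298,000 probes per second
rate_24h = required_probe_rate(24)  # roughly 49,700 probes per second
```

A full pass on a single port in four hours means about 298,000 probes per second, and the rate scales linearly with every additional port.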

“[IP address scan] just gives you one view of the Internet. If you were just looking at a map and all you had was roads, sure you could get from point A to point B, and you would know where things are, but you wouldn’t know what else is around,” says Rudis. “The other types of scans give the Google Maps view, like what buildings are here, and where the coffee shops are. Each different type of scan gives us another layer on this map of the Internet that lets us understand things from a different perspective.”

The search engines give both researchers and enterprise defenders information about what is plugged in and vulnerable. For many defenders, the challenge with patching vulnerabilities stems from not always knowing what devices they have. A broad view of all their systems can help them find things they didn’t know they had.

“You can’t patch what you don’t know you have but the attacker might well know you have it. Might find it. Might exploit it,” says Halderman.
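Halderman’s point can be made mechanical. Take what a scan saw inside your own address ranges and subtract what your asset inventory already lists. In this minimal sketch, the scan results, inventory, and prefixes are all hypothetical inputs. Anything left over is exposed and unaccounted for.

```python
import ipaddress
from collections import defaultdict


def unknown_exposed_hosts(scan_results, inventory, our_prefixes):
    """Return {ip: sorted open ports} for hosts seen in our ranges but absent from inventory."""
    nets = [ipaddress.ip_network(p) for p in our_prefixes]
    known = {ipaddress.ip_address(ip) for ip in inventory}
    unknown = defaultdict(set)
    for ip, port in scan_results:
        addr = ipaddress.ip_address(ip)
        if addr not in known and any(addr in net for net in nets):
            unknown[str(addr)].add(port)
    return {ip: sorted(ports) for ip, ports in unknown.items()}


# Hypothetical inputs: external scan results as (ip, open_port) pairs.
scan_results = [
    ("203.0.113.10", 443),
    ("203.0.113.10", 22),
    ("203.0.113.99", 3389),
    ("198.51.100.5", 80),   # outside our ranges, so ignored
]
inventory = ["203.0.113.10"]
surprises = unknown_exposed_hosts(scan_results, inventory, ["203.0.113.0/24"])
```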

There’s also value in knowing what the infrastructure used to look like and being able to compare it to what it looks like now. Most people readily see the value of saving backups and keeping track of historical content on the Internet, but until recently there hasn’t been much interest in preserving a record of infrastructure. A history of what was running when and where is important because it shows whether particular issues are getting better or worse over time. For example, Censys data can show the percentage of the Internet that uses HTTPS, and Sonar data can show how many devices exposed SMB on the Internet before last year’s WannaCry ransomware attack and how many still do.
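Historical comparisons like the SMB example reduce to measuring the same property in each snapshot. This sketch assumes hypothetical snapshots shaped as `{ip: set_of_open_ports}`. Port 445 is the standard TCP port for SMB. Because IP addresses get reassigned, the same address in two snapshots may not be the same device, so per-host comparisons should be read with care.

```python
def exposure_trend(snapshots, port):
    """Map each snapshot label to the percentage of responding hosts with `port` open.

    snapshots: {label: {ip: set_of_open_ports}}
    """
    trend = {}
    for label, hosts in snapshots.items():
        exposed = sum(1 for ports in hosts.values() if port in ports)
        trend[label] = 100.0 * exposed / len(hosts) if hosts else 0.0
    return trend


def still_exposed(before, after, port):
    """IPs with `port` open in both snapshots (same address, possibly not the same device)."""
    return sorted(
        ip for ip, ports in before.items()
        if port in ports and port in after.get(ip, set())
    )


SMB_PORT = 445

# Hypothetical snapshots.
snapshots = {
    "before": {"192.0.2.1": {445, 80}, "192.0.2.2": {443}, "192.0.2.3": {445}, "192.0.2.4": {22}},
    "after": {"192.0.2.1": {80}, "192.0.2.2": {443}, "192.0.2.3": {445}, "192.0.2.4": {22}},
}
trend = exposure_trend(snapshots, SMB_PORT)
lingering = still_exposed(snapshots["before"], snapshots["after"], SMB_PORT)
```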

While this doesn’t have direct security implications, having this kind of historical data with Internet-wide scanning was helpful during the furor over the email server used by former Secretary of State Hillary Clinton, says Moore. The Internet scan data helped identify where that mail server was, what software it was running, and what kind of services were open on that server.

“Having a recorded history of what was actually on it, what services it runs, … keeps everybody a little more honest,” Moore says.

How else can enterprise defenders use the Internet maps to secure their users? Google announced last year that Chrome would stop trusting Symantec as a certificate authority and would reject all certificates issued by Symantec (including Symantec-owned brands Thawte, VeriSign, Equifax, GeoTrust, and RapidSSL). The distrust is being phased in, with the first set of certificates affected starting with Chrome 66. If sites using an SSL/TLS certificate from Symantec that was issued before June 1, 2016, don’t replace their certificates by March 15, Chrome Beta users will start seeing errors when navigating to those sites.

“Hey right now, all things being equal, with our last certificate scan at [port] 443, is this gonna be a really, really bad, terrible thing?”

Well, the available scan data says the mass distrust won’t necessarily break the Internet, but the number of affected sites is still pretty high. If those sites don’t replace their certificates, a large swath of the Internet will be unreachable for Chrome users. Enterprise security teams can use the scan results to go back to their web teams and paint a picture of what may happen.

“You can paint the picture, ‘Hey, you really should take care of this because Google is gonna definitely do this to you,'” says Rudis.
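To size the Chrome 66 problem from a port 443 certificate scan, count the certificates from a Symantec-family brand that were issued before June 1, 2016. This rough sketch assumes hypothetical `(host, issuer_org, not_before)` tuples and simple substring matching on the issuer’s organization name. Chrome’s actual check works on the certificate chain and includes exceptions, so this is only an approximation for estimating impact.

```python
from datetime import date

# Brands the article lists as Symantec-owned; matching on issuer names is an approximation.
SYMANTEC_BRANDS = ("symantec", "thawte", "verisign", "equifax", "geotrust", "rapidssl")
CHROME_66_CUTOFF = date(2016, 6, 1)


def affected_by_chrome_66(issuer_org: str, not_before: date) -> bool:
    """Rough check: Symantec-family issuer name and issued before June 1, 2016."""
    issuer = issuer_org.lower()
    return any(brand in issuer for brand in SYMANTEC_BRANDS) and not_before < CHROME_66_CUTOFF


def estimate_impact(certs):
    """certs: iterable of (host, issuer_org, not_before). Returns (affected hosts, total)."""
    certs = list(certs)
    affected = [host for host, org, issued in certs if affected_by_chrome_66(org, issued)]
    return affected, len(certs)


# Hypothetical results from a port 443 certificate scan.
certs = [
    ("a.example.com", "GeoTrust Inc.", date(2015, 3, 1)),
    ("b.example.com", "Symantec Corporation", date(2017, 1, 10)),
    ("c.example.com", "Example CA Ltd.", date(2015, 1, 1)),
]
affected, total = estimate_impact(certs)
```

This first-phase check does not flag `b.example.com`. However, the phased plan to reject all Symantec-issued certificates means later Chrome releases would affect it too.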

[by Fahmida Y. Rashid for DUO Security’s Decipher]