Guides: Network Guide ( lu.vernalex.com )
The purpose of this guide is to explain various network and Internet topics, and to provide help for my host lookup utility. The topics covered here are explained in a way that, I hope, most people can understand. I have simplified a few of the subjects, so the descriptions here do not cover everything about each topic. Hopefully the information here can give you a good basic understanding of how the Internet works, and what the information returned by my utility means. If you have any questions or if you can think of a topic I should add or expand then please contact me through the menu option.
IP Address
Every computer on a standard (IP version 4) network is identified by an Internet Protocol (IP) address, which is made up of four (4) numbers separated by periods. Each of these numbers ranges from 0 to 255. In computer terms each number represents one byte of information: there are 8 bits in a byte, so converting from binary to decimal gives 2^8 or 256 possible values (remember that 0 is counted). Addresses appear in the form www.xxx.yyy.zzz, where each group of repeated letters stands for one of those numbers from 0 to 255.
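To make the byte arithmetic concrete, here is a minimal Python sketch showing that a dotted address is really one 32-bit number. The address 192.0.2.1 is only an illustrative value and the helper name dotted_to_int is made up for this example; neither comes from the lookup utility.

```python
import ipaddress

def dotted_to_int(address: str) -> int:
    """Combine the four 0-255 numbers of an IPv4 address into one 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for part in parts:
        # Shift what we have so far left by one byte, then add the next byte.
        value = value * 256 + part
    return value

example = "192.0.2.1"  # illustrative address only
as_int = dotted_to_int(example)

# The standard library's ipaddress module agrees with the hand-rolled arithmetic.
assert as_int == int(ipaddress.IPv4Address(example))

print(example, "=", as_int)
print("possible IPv4 addresses:", 256 ** 4)
```

The loop is the same calculation as www * 256^3 + xxx * 256^2 + yyy * 256 + zzz, which is why four bytes give 256^4 possible addresses.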
The purpose of these addresses is to provide Internet routing. Much like a home address allows the post office to route mail from one address to another address, Internet routers pass data along a path of IP enabled machines from a source address to a destination address. In computer terms the machines between the two endpoints, and including the endpoints, are called hops. It takes a number of Internet hops, usually around 15-20, to go from a typical user to a typical Internet site such as google.com.
Whenever your computer connects to another computer through the TCP standard (a method used to deliver data reliably), it sends the destination its IP address so that the destination can return data to it. This is much like a required return address. The return address is required because, even if the destination doesn't superficially need to return any data, TCP requires a delivery receipt so the source knows not to resend the data. If no receipt is received after a determined amount of time then the data is said to have timed out.
As you can see the number of IP addresses is not unlimited, and in fact there are only 4,294,967,296 total possible addresses (because 256^4 = 4,294,967,296). That doesn't even allow every person in the world to have their own IP address since there are over 6 billion people, and that doesn't even consider computer labs with multiple computers, IP enabled household equipment, network routing equipment, other forms of servers, etc. And since the United States created the Internet, about half of these addresses were reserved for the US. This means that India and China, which together account for more than a third of the world's population, do not have nearly enough addresses to accommodate their populations.
These IP addresses are assigned to you by your Internet Service Provider (ISP). Normally, through a service called DHCP, your computer automatically receives its IP address when it first connects to your ISP. This assignment usually entails a lease period that allows you to use that IP address for a set time, and when the lease expires someone else could receive that address. However, when the lease is near expiration your computer will normally attempt to renew it for another set period of time. The governing body of all Internet addresses, though, is the Internet Corporation for Assigned Names and Numbers (ICANN), a non-profit corporation that designates specific Internet ranges for particular organizations. Since its task is so large it often delegates Internet addressing tasks to other organizations, but ICANN maintains the global Internet catalogs.
For you this means that every site you connect to on the Internet knows your computer's IP address, and often the site will write your IP into a log for statistical analysis. This is the only way standard Internet servers can specifically identify you, and the address is fairly private because external sites can at most identify your ISP. However, your ISP may (and most likely does) keep logs of which IPs are assigned to which accounts, so you could be tracked if enough time is spent doing so.
Since your IP is typically dynamic rather than static, this means you could often have a different Internet address. This is especially common with dial-up connections as they almost never retain your IP from connection session to connection session, but broadband often has extended leases that let you keep your IPs unless you disconnect from their service for an extended period of time.
On a network if one user wishes to create a data tunnel to another computer the source must have a routable path to the destination. It is important to realize that only the computer receiving an initial data tunnel has to be routable (this concept will be clearer when I cover NAT). The destination in this case would be called a server and the source would be called the client.
Going back to the issue of the IP pool, there is a way around the limit: a system called Network Address Translation (NAT). Many businesses use this for their internal network, and so do most homes, because it's cheaper than leasing addresses from the shrinking pool of available IPs. NAT was introduced in the 1990s, and it is the main thing keeping version 4 of IP viable. NAT allows you to distinguish a local area network (LAN) from a wide area network (WAN), where the wide area network in this situation would be the Internet.
Particular IP ranges have been reserved for exactly this purpose, and they are non-Internet routable (RFC 1597, since replaced by RFC 1918). These ranges are 10.0.0.0 to 10.255.255.255 (16,777,216 addresses), 172.16.0.0 to 172.31.255.255 (1,048,576 addresses), and 192.168.0.0 to 192.168.255.255 (65,536 addresses). This means there are fewer addresses available to the Internet itself, but in the long run it means more computers can be connected. Through the use of a NAT enabled router you can use these internal network IPs for your private LAN without having to register them with anyone. The real benefit though is that computers with these addresses can still access the Internet through the NAT router. From the Internet's perspective only the NAT enabled router consumes an IP, and all the traffic from the computers behind it appears to come from the router itself.
The problem though is that since the computers exist on non-Internet routable addresses, they can reach the Internet and request information, but they cannot receive unsolicited connections. This means these internal computers cannot run servers, because when the NAT router receives an unsolicited connection it does not know which computer behind the router it was meant for, so it discards the data. This flaw is often sold as a firewall feature by many companies such as Netgear, SMC, Linksys, etc.
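The reserved LAN ranges can be checked with Python's standard ipaddress module. This is only a sketch: the names PRIVATE_BLOCKS and is_lan_address are made up for the example, and the blocks are written in CIDR notation (/8, /12, /16), which is just another way of writing the same three ranges.

```python
import ipaddress

# The three private IPv4 blocks, written in CIDR notation.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_lan_address(address: str) -> bool:
    """Return True if the address falls in one of the non-Internet routable blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)

# Print the first address, last address and size of each block.
for block in PRIVATE_BLOCKS:
    print(block[0], "to", block[-1], "->", block.num_addresses, "addresses")

print(is_lan_address("192.168.1.10"))  # a typical home LAN address
print(is_lan_address("8.8.8.8"))       # an Internet routable address
```

The sizes are computed by the module rather than typed in, so the printout is a handy way to double-check the address counts for each range.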
The way to get around this is to use either port forwarding or a DMZ (demilitarized zone). A DMZ forwards all unsolicited requests the router receives to a specific internal LAN IP (although some routers support broadcasting the message to multiple computers on the internal network), whereas port forwarding sends specific kinds of traffic to specific computers, such as all web traffic (port 80) to one particular internal computer. To better understand this concept you may want to read about ports (later on this page).
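The decision a NAT router makes for an unsolicited connection can be sketched in a few lines of Python. Everything here is hypothetical: the LAN addresses, the PORT_FORWARDS table and the rule that a port forward takes priority over the DMZ are assumptions for illustration, and real routers differ in the details.

```python
from typing import Dict, Optional

# Hypothetical router configuration: external port -> internal LAN address.
PORT_FORWARDS: Dict[int, str] = {
    80: "192.168.0.10",  # all web traffic goes to this computer
    21: "192.168.0.11",  # all FTP traffic goes to this computer
}

def route_unsolicited(dest_port: int,
                      forwards: Optional[Dict[int, str]] = None,
                      dmz_host: Optional[str] = None) -> Optional[str]:
    """Pick the LAN address that receives an unsolicited connection.

    Returns None when the router has no rule, meaning the data is discarded.
    """
    table = PORT_FORWARDS if forwards is None else forwards
    if dest_port in table:
        return table[dest_port]   # a specific port forward matched
    return dmz_host               # otherwise fall back to the DMZ host, if any

print(route_unsolicited(80))                             # forwarded to the web computer
print(route_unsolicited(6000))                           # no rule, so discarded (None)
print(route_unsolicited(6000, dmz_host="192.168.0.50"))  # caught by the DMZ host
```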
DNS Hostname
The concept of a Domain Name System (DNS) hostname is much simpler than that of an IP, but the two are closely related. Basically a DNS name is a description for an IP. The idea is that an IP is hard to remember and a name is easy to remember. A DNS name also gives Internet sites the appearance of a hierarchy, whereas an IP is just a number amidst many others. The DNS name, like the IP, is unique. You are certain to recognize these as they're what you should be used to typing into your web browser. Things such as microsoft.com or www.microsoft.com or ftp.microsoft.com, etc. are all DNS names.
The hierarchy of these sites begins at what are called top level domains. These are the extensions for a DNS name such as edu, com, org, info, biz, etc. They are maintained by the Internet Corporation for Assigned Names and Numbers (ICANN), just like IPs. Businesses then lease names for domains under the top level domains. So Microsoft reserved microsoft.com under the com top level domain. The business that owns a domain name can then create subdomains off that domain, and can even sell them to others. So Microsoft owns windowsupdate.microsoft.com, for example. They can also create further subdomains under those subdomains for other tasks. For example, Microsoft has different versions of the Windows Update site, and an example would be v4.windowsupdate.microsoft.com.
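The hierarchy can be seen by reading a DNS name from right to left. The short Python sketch below splits a name into its chain of domains, starting at the top level domain; the function name domain_chain is made up for this example.

```python
from typing import List

def domain_chain(hostname: str) -> List[str]:
    """List each level of a DNS name, from the top level domain down to the full name."""
    labels = hostname.strip(".").lower().split(".")
    # Build the chain from the rightmost label outward: com, microsoft.com, ...
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

for level in domain_chain("v4.windowsupdate.microsoft.com"):
    print(level)
```

Each line printed is a subdomain of the line before it, which mirrors how a domain owner hands out names underneath its own domain.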
The way a DNS name works is through resolution (or DNS lookup). When you attempt to connect to a DNS name, whether it be for an HTTP website, an FTP site, a game server, etc., your computer connects to the top level domain server (or a cache of a top level domain that is provided by your ISP) and asks what the IP for that site is. The top level domain then either tells you the IP (if the site is a domain only), or forwards the request to the domain if it doesn't have the address. This forwarding will happen if your address is a subdomain off a domain, because only the domain will have the IPs for its subdomains. The request could then be forwarded again and again until it reaches the computer with the IP information for that particular site.
There is another process called reverse lookup (or reverse resolution) that allows your computer to find a DNS name for a particular IP address. The same process is followed for a reverse lookup, except it is done with the IP. Since all DNS servers know which IP range is registered for what Internet site, the request can be forwarded until it reaches a computer with the DNS for a particular IP. This step is often slower and more computer intensive than normal DNS resolution.
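One detail the description above leaves out is how an IP gets turned into something the DNS can look up. In practice the numbers of the address are reversed and placed under a special domain called in-addr.arpa, and that name is then resolved much like any other. Python's standard ipaddress module can build this name; the address used below is only an illustrative value and the function name is made up for the example.

```python
import ipaddress

def reverse_lookup_name(address: str) -> str:
    """Build the special DNS name that a reverse lookup for this IP would ask about."""
    return ipaddress.ip_address(address).reverse_pointer

print(reverse_lookup_name("192.0.2.1"))  # prints 1.2.0.192.in-addr.arpa
```

This sketch only builds the name and does not contact any server. An actual reverse lookup can be attempted with socket.gethostbyaddr from the standard library, which needs a working network connection and fails for addresses that have no name registered.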
Additional Information
HTTP Referer: This is the page that your web browser claims it arrived at the lookup page from. This string is only as valid as your web browser represents it. If you arrived at the lookup page from a search engine then it would most likely contain the search string that located the page. If you arrived at the lookup page from another page on vernalex.com then it is most likely the previous page. If you arrived at the lookup page from a link someone gave you by email or chat client, or from one of your favorites, then this line would not show up (since it is blank).
Network Class: Realistically this is an outdated term that has lost most of its meaning, but because the transition away from it was so large, the concept is sometimes still used. Originally the Internet was divided into network classes. There were five in total: A, B, C, D and E. A was meant for large computer networks and consisted of all IP addresses between 1.0.0.0 and 127.0.0.0. B was designed for normal networks and ranged from 128.0.0.0 to 191.255.0.0. C was only useful for small networks, 192.0.0.0 to 223.255.255.0. D was meant to be used for multicasting, 224.0.0.0 to 239.255.255.255. And E was reserved for later use, 240.0.0.0 to 255.255.255.255. Counting by the first number of the address, this meant class A covered 127 values, class B 64, class C 32, class D 16, and class E 16. As stated before, this concept has been replaced by classless inter-domain routing (CIDR), which allows the Internet to be divided in a more dynamic manner.
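Since the class of an address depends only on its first number, the lookup can be sketched in a few lines of Python. This is an illustration of the ranges listed above and not necessarily how the lookup page works; the function name network_class is made up, and treating a first number of 0 as class A is an assumption of the sketch.

```python
from collections import Counter

def network_class(address: str) -> str:
    """Return the historical network class (A-E) based on the first number of an IPv4 address."""
    first = int(address.split(".")[0])
    if not 0 <= first <= 255:
        raise ValueError(f"first number out of range: {first}")
    if first <= 127:
        return "A"   # 1-127 in the guide; 0 is lumped in here as an assumption
    if first <= 191:
        return "B"
    if first <= 223:
        return "C"
    if first <= 239:
        return "D"
    return "E"

print(network_class("10.1.2.3"))     # A
print(network_class("172.16.0.1"))   # B
print(network_class("192.168.0.1"))  # C

# Count how many first numbers (1 through 255) fall into each class.
counts = Counter(network_class(f"{n}.0.0.0") for n in range(1, 256))
print(dict(counts))
```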
Operating System: The lookup page attempts to identify the operating system your web browser reports. The detection of the operating system may be incorrect, or it may not be determinable if your web browser misrepresents itself or if it is an uncommon browser, operating system or a combination of both. Your operating system is the fundamental software that runs your programs on your computer, and for the majority of people this would be some version of Microsoft Windows.
Remote Port: This is the connection point on your computer from which your connection to the web server was made. Normally the port used is in the thousands and changes every time you reconnect to the page (reloading the page may not be enough to alter it). Read about ports in the other networking topics below.
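The remote port can be observed with a small Python sketch that connects a program to itself over the loopback address 127.0.0.1, so no Internet connection is involved. The function name loopback_demo is made up, and the sketch assumes the operating system permits local TCP connections.

```python
import socket

def loopback_demo():
    """Open a TCP connection to ourselves.

    Returns (address the client used, address the server saw the connection come from).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
        server.listen(1)
        with socket.create_connection(server.getsockname()) as client:
            conn, seen_by_server = server.accept()
            with conn:
                return client.getsockname(), seen_by_server

client_side, server_view = loopback_demo()
print("client connected from:", client_side)
print("server saw connection from:", server_view)
```

The port number in the second line is what a web server would report as your remote port. Running the sketch several times will usually show a different port each time, because the operating system picks a fresh one for every new connection.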
Reported Agent: The reported agent is the information your web browser willingly passes to a web server. This is done so the remote website can better serve your needs: if it knows you're using an old web browser the website can use legacy HTML, or if you're on a Mac it could give preferential treatment to Mac downloads. The problem with the agent is that it is fairly cryptic, the syntax of the statement varies from browser to browser, and it doesn't always include the necessary data. Furthermore, in some browsers the agent can appear in different forms depending on the user, which makes deciphering the correct information an even bigger headache.
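A rough idea of how an operating system can be guessed from the reported agent is sketched below in Python. This is not the method the lookup page uses; the OS_HINTS table and the sample agent string are assumptions for illustration, and real agent strings vary far more than a short table can cover.

```python
# Hypothetical substring rules, checked in order.
OS_HINTS = [
    ("windows", "Windows"),
    ("mac os x", "Mac OS"),
    ("macintosh", "Mac OS"),
    ("linux", "Linux"),
]

def guess_os(agent: str) -> str:
    """Guess the operating system from a reported agent string, or return 'Unknown'."""
    lowered = agent.lower()
    for needle, name in OS_HINTS:
        if needle in lowered:
            return name
    return "Unknown"

sample = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"  # illustrative agent string
print(guess_os(sample))
print(guess_os("SomeUncommonBrowser/1.0"))
```

Because the browser supplies this string itself, a guess like this is only as trustworthy as the browser's report, which is why detection can be wrong or impossible.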
Web Browser: A web browser is a program that uses the Hypertext Transfer Protocol (HTTP) to download Hypertext Markup Language (HTML) so it can be parsed on the local user's computer. Hypertext is just a fancy way of saying advanced text and represents the concept of scripted languages such as HTML, JavaScript, ActiveX, etc. that allow your browser to represent data, whether it be through text, images, movies, sounds, tabulated data, etc. There have been many HTML standards written, and these standards are handled through the World Wide Web Consortium (W3C). However, many HTML features were never standardized and simply evolved in a fight between Netscape and Microsoft. Also, no web browser currently supports 100% of the web standards, and Internet Explorer is often regarded as the worst for standards compliance even though it is used exclusively by over 90% of Internet users. Because of this every website will appear differently, sometimes slightly and sometimes totally, in other browsers. This creates a problem because many people only consider Internet Explorer when developing websites, even though IE is no longer actively developed and its page rendering rules are so skewed.
Other Networking Topics
Hardware Address / Heart Address / MAC Address: There are many names for this term, but they all mean the same thing. The media access control (MAC) address is a twelve (12) character hexadecimal number, usually presented with either colons or dashes between every two (2) characters, such as 00-50-8D-B0-3A-3F. If you are not familiar with hex, it is simply a base 16 counting system (0-9, A, B, C, D, E, F) that computers use because it allows for shorter representations than long binary values, but converts easily from the base 2 of binary. Every normal network device, from Internet core routers to the network card in your computer, is hardwired with a unique MAC address. This value is normally not transmitted over the Internet, but rather is used for local routing and computer identification. The first six digits of the MAC represent the vendor of the device, and the vendor then assigns the rest of the unique number. You can even determine the vendor of a device through an online search database, and in the case above the network ID is a made-up (although possibly assigned) Abit network card.
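Splitting a MAC address into its vendor part and device part is simple enough to sketch in Python. The function name split_mac is made up for this example, and the sketch does not look the vendor up in any database.

```python
import re

def split_mac(mac: str):
    """Split a MAC address into (vendor digits, device digits), each six hex characters."""
    digits = re.sub(r"[-:]", "", mac).upper()  # drop the colons or dashes
    if not re.fullmatch(r"[0-9A-F]{12}", digits):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return digits[:6], digits[6:]

vendor, device = split_mac("00-50-8D-B0-3A-3F")
print("vendor part:", vendor)
print("device part:", device)
print("device part in decimal:", int(device, 16))  # hex converts directly to a number
```

The vendor part is what an online vendor database would be searched for, while the device part is the number the vendor assigned to that individual device.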
Firewall: A network device or software application that secures a network against inbound traffic, outbound traffic, or both, by using a set of rules to determine whether particular data is allowed through. Most corporations and large businesses make use of a firewall to secure their corporate network from Internet troubles such as worms. The rules that firewalls use to secure a network are based mainly on allowing or denying specific IPs or IP ranges, allowing or denying specific ports, and allowing or denying specific forms of traffic. Home users will often make use of a personal firewall, either software or hardware, to help protect their computer from unwanted intrusion. Many home routers even come equipped with a firewall of sorts that will accomplish this task without having to fiddle with error-prone software.
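The rule checking described above can be sketched in Python. This is a toy model under stated assumptions, not how any particular firewall product works: the Rule class, the sample rules, the first-match-wins order and the default of denying unmatched traffic are all choices made for the example.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Rule:
    action: str                 # "allow" or "deny"
    network: str                # source IP range in CIDR form, e.g. "203.0.113.0/24"
    port: Optional[int] = None  # destination port, or None to match any port

def decide(rules: List[Rule], source_ip: str, dest_port: int, default: str = "deny") -> str:
    """Return the action of the first rule that matches, or the default if none match."""
    ip = ipaddress.ip_address(source_ip)
    for rule in rules:
        in_range = ip in ipaddress.ip_network(rule.network)
        port_ok = rule.port is None or rule.port == dest_port
        if in_range and port_ok:
            return rule.action
    return default

# Hypothetical rule set: block one troublesome range, allow everyone else to reach the web port.
rules = [
    Rule("deny", "203.0.113.0/24"),
    Rule("allow", "0.0.0.0/0", 80),
]

print(decide(rules, "203.0.113.7", 80))   # deny: the source is in the blocked range
print(decide(rules, "198.51.100.4", 80))  # allow: web traffic from elsewhere
print(decide(rules, "198.51.100.4", 23))  # deny: no rule matched, so the default applies
```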
Router: A network device that passes network traffic from one network pipeline to another in an attempt to forward data from a source IP to a destination IP. Often a router must decide which of several pipelines it should pass data onto, and this is governed by a complex set of rules based on routing tables, quality of service and pipeline capacities. Many home users have severely limited home routers that act as a NAT bridge between the Internet and their home network, but they are still routers. Most of the high-end routers used for Internet routing are made by Cisco.
Packet: All data on the Internet is transmitted in bursts called packets. Each packet is stamped with information such as the source IP address, destination IP address, time to live, source port, and destination port. Depending on the protocol used to transport the data, the header (top or start) of the packet could contain less or more information.
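The information stamped on a packet can be pictured as a small record. The Python sketch below is a simplified model, not a real packet format: the field names, the default time to live of 64 and the reply helper are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    source_ip: str
    dest_ip: str
    source_port: int
    dest_port: int
    ttl: int = 64          # time to live; 64 is an assumed starting value
    payload: bytes = b""   # the data being carried after the header

    def __post_init__(self):
        # Ports are 16-bit numbers, so they must fall between 0 and 65535.
        for port in (self.source_port, self.dest_port):
            if not 0 <= port <= 65535:
                raise ValueError(f"port out of range: {port}")

    def reply(self, payload: bytes = b"") -> "Packet":
        """Build a packet going back the other way by swapping source and destination."""
        return Packet(self.dest_ip, self.source_ip, self.dest_port, self.source_port,
                      payload=payload)

request = Packet("192.0.2.1", "198.51.100.4", 51234, 80, payload=b"GET /")
response = request.reply(b"hello")
print(request)
print(response)
```

The reply method shows why the source address and port matter: they are the only information the destination has for sending anything back.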
Protocol: A standardized form of communication used to transport data over a network. There are lots of protocols, but there are only three that are typically Internet routable (so those are the kinds you will most likely care about). These three are Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Internet Control Message Protocol (ICMP). These all fall under the category of TCP/IP. The differences are pretty clear. TCP is a robust protocol that the TCP layer of your operating system handles. It requires return receipts on all data, so it has lots of crosstalk, but it's easy to use from a program. UDP is a simple protocol that your operating system has little involvement with, and as such most of the work is left to the program using it. UDP does not require return receipts, and is often used by voice communication, games, and other time critical programs because it does not have lots of crosstalk that could eat up bandwidth answering all received packets. ICMP is another simple protocol, but it is mainly used for ping. This is a response time checker that sends a timestamped request to a server, which the server then returns so you can tell the time it takes to get there and back.
TCP Ports (Sockets): A port is a number that represents an entry point into a computer in the TCP standard and ranges from 0 to 65535 (16 bit, 2^16). When you attempt to connect to a server your data, in the form of a bundle called a packet, is stamped with two ports: the destination port and the source port. The source port is the beginning of the data stream, and the destination port is the end of the data stream. The server must be listening on the designated port, meaning that the system is waiting for a packet stamped with that destination port, in order to receive the data. If the server isn't listening on the port, the source computer is either told so or receives no response. A list of standard ports exists, such as 80 for web and 21 for FTP. Your web browser understands that when you type in http://www.microsoft.com you actually mean http://www.microsoft.com:80 (and you can test and find out that both of those addresses work, because they're the same). Servers can be set for any port, so a website could exist on port 79, but it would be annoying because everyone would have to specify that port after the site.
Time to Live: The maximum number of router hops a data packet will survive before it expires and an attempt is made to notify the source. This exists because it is possible a packet could be handed off endlessly to other routers and end up going around in circles if no router can find the destination for the packet. So once the packet expires the source is notified, if possible, and the packet is discarded. This alleviates the potential of clogging the Internet with lost everliving data packets. This should make it clear that there is no guaranteed best path from a source to a destination, and the Internet routers only make best guesses based on specified routing tables (rules that say which data should go where under a specific condition).
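The time to live countdown can be sketched as a simple loop in Python. This is a simplified model that assumes each router hop lowers the counter by one; the function name send and the starting value of 64 are made up for the example, while the 18 hops is just a figure within the typical 15-20 range mentioned earlier in this guide.

```python
def send(ttl: int, hops_needed: int):
    """Simulate a packet crossing routers.

    Returns (delivered, hops_used). Each hop costs one unit of time to live,
    and a packet whose time to live runs out is discarded where it stands.
    """
    hops = 0
    while hops < hops_needed:
        if ttl <= 0:
            return False, hops  # expired before reaching the destination
        ttl -= 1
        hops += 1
    return True, hops

print(send(ttl=64, hops_needed=18))    # enough time to live: delivered
print(send(ttl=5, hops_needed=18))     # too little: expires after 5 hops
print(send(ttl=64, hops_needed=1000))  # a packet going in circles eventually dies
```

The last call is the case time to live exists for: however long the loop of routers, the packet is discarded after a bounded number of hops.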