Website Footprinting - Part 2 | CyberWiki - Encyclopedia of Cybersecurity

Website footprinting is the technique of monitoring and analysing a target organization's website for information. A website can reveal sensitive information, such as the names and contact details of the organization's leaders and details of forthcoming projects.

This topic is divided into two articles. This article is Part 2; the rest of the topic is covered in Part 1.

Web Spiders

Given the URL of a target website, a web spider (crawler) can map the files and web pages reachable from it. The spider sends requests to the target website, often hundreds of them, and analyses the HTML of each response for additional links.

When it detects new links, the spider adds them to its target list and crawls and analyses those pages in turn. This lets attackers locate not only exploitable web-attack surfaces but also the directories, web pages, and files that make up the target website.

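Spidering can be restricted if the target website has a robots.txt file in its root directory listing directories that should not be crawled. Compliant crawlers honor these rules, but robots.txt is only advisory: it cannot stop a crawler that ignores it.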
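The crawl loop described above can be sketched in Python as a breadth-first search. This is a minimal illustration, not how Burp Suite or the other tools below work internally. The `fetch` callable (any function that returns a page's HTML, or `None` for missing or non-HTML resources), the `example-spider` user-agent string, and the `max_pages` cap are assumptions made for the example. It follows links only on the starting host and, like a compliant crawler, skips paths that robots.txt disallows.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkParser(HTMLParser):
    """Collects every href/src attribute value found in the HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch, robots_lines=(), max_pages=100,
          user_agent="example-spider"):  # hypothetical user-agent name
    """Breadth-first crawl of start_url's host; returns the URLs requested.

    fetch(url) is an assumed callable returning HTML text, or None when the
    resource is missing or not HTML.
    """
    robots = RobotFileParser()
    robots.parse(list(robots_lines))  # robots.txt content, one line per item
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    requested = []
    while queue and len(requested) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # honor Disallow rules, as a compliant crawler would
        html = fetch(url)
        requested.append(url)
        if html is None:
            continue  # nothing to parse for missing or non-HTML resources
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for raw in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link, _ = urldefrag(urljoin(url, raw))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return requested
```

Passing a dictionary's `get` method as `fetch` is enough to exercise the sketch offline; a real run would supply a function built on an HTTP client such as `urllib.request.urlopen`. Injecting `fetch` keeps the crawl logic separate from networking.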

Attackers can use tools such as Burp Suite, WebScarab, Web Data Extractor, ParseHub, and SpiderFoot to collect sensitive information from the target website.

Extracting Website Links

Extracting website links is a critical component of website footprinting, in which an attacker examines a target website to identify its internal and external links.

An attacker can use the information acquired to discover the applications, web technologies, and other websites that are linked to the target website. Dumping the acquired links can also reveal significant connections and the URLs of other resources, such as JavaScript and CSS files.

This data helps attackers identify vulnerabilities in the target website and determine how to exploit the web application.
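A minimal link-extraction sketch using only Python's standard-library `html.parser` sorts the links found on a single page into internal pages, external sites, JavaScript files, and CSS stylesheets. The base URL and sample markup are hypothetical, and dedicated tools such as Link Extractor apply their own, more thorough logic.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Sorts links on one page into internal, external, script and CSS URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.internal, self.external = set(), set()
        self.scripts, self.stylesheets = set(), set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "script" and a.get("src"):
            self.scripts.add(urljoin(self.base_url, a["src"]))
        elif (tag == "link" and a.get("href")
              and "stylesheet" in (a.get("rel") or "").lower().split()):
            self.stylesheets.add(urljoin(self.base_url, a["href"]))
        elif tag == "a" and a.get("href"):
            url = urljoin(self.base_url, a["href"])
            parts = urlparse(url)
            if parts.scheme not in ("http", "https"):
                return  # skip mailto:, javascript:, tel: and similar links
            target = self.internal if parts.netloc == self.host else self.external
            target.add(url)


def extract_links(html, base_url):
    """Return a dict of sorted URL lists, keyed by link category."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {key: sorted(getattr(parser, key))
            for key in ("internal", "external", "scripts", "stylesheets")}
```

The external list is where connected partner sites and third-party services tend to show up, while the script and stylesheet lists point to resources worth inspecting for web technologies and versions.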

Attackers can use various online tools or services such as Octoparse, Netpeak Spider, and Link Extractor.

Extracting Website Information from https://archive.org

The Wayback Machine, run by the Internet Archive at https://archive.org, lets users browse archived versions of websites. Using it, an attacker can gather information about an organization's web pages as far back as the earliest archived copies.

Because https://archive.org keeps snapshots of web pages captured over time, an attacker can obtain information that has since been removed from the target website, including web pages, audio files, video files, photos, text, and software programs. Attackers use this information to conduct phishing and other web application attacks on the target organisation.
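The Internet Archive also exposes its capture index through the Wayback CDX server API. The sketch below assumes the endpoint `https://web.archive.org/cdx/search/cdx` and its `url`, `output=json`, `limit`, `filter`, and `collapse` parameters as described in the CDX server documentation; with `output=json`, the reply is a JSON array whose first row names the fields. It builds a query listing captured URLs under a domain and turns each row into a browsable snapshot link of the form `https://web.archive.org/web/<timestamp>/<original>`. The `limit` default of 50 is an arbitrary choice for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint assumed from the public Wayback CDX server documentation.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain, limit=50):
    """Build a CDX query listing successful captures under `domain`."""
    params = {
        "url": f"{domain}/*",        # prefix match: every path on the host
        "output": "json",            # first row of the reply is a header row
        "limit": limit,
        "filter": "statuscode:200",  # keep only captures that returned 200
        "collapse": "urlkey",        # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(body):
    """Turn a JSON CDX reply into dicts, each with a browsable snapshot URL."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *records = rows
    snapshots = []
    for record in records:
        row = dict(zip(header, record))  # map by header, not by fixed order
        row["snapshot_url"] = (
            f"https://web.archive.org/web/{row['timestamp']}/{row['original']}"
        )
        snapshots.append(row)
    return snapshots


def fetch_snapshots(domain, limit=50, timeout=30):
    """Query the live CDX server (requires network access)."""
    with urlopen(build_cdx_query(domain, limit), timeout=timeout) as resp:
        return parse_cdx_rows(resp.read().decode("utf-8"))
```

Reading field names from the header row, rather than assuming a fixed column order, keeps the parser working if the server's default field list differs from the one expected here.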

Gathering a Wordlist from the Target Website

The words used on the target website may expose important information that helps attackers in further exploitation. Attackers also compile a list of email addresses associated with the target website; together, the words and addresses enable brute-force attacks against the target organisation. For example, an attacker can use CeWL (Custom Word List generator) to collect a list of words from the target website and then try them as passwords in a brute-force attack against the previously acquired email addresses.
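CeWL itself is a Ruby tool; the Python sketch below only illustrates the same idea under simple assumptions of its own. Visible text is taken from already-downloaded HTML pages (script and style content is skipped), a "word" is a run of ASCII letters and digits, everything is lowercased, and the result is ordered by frequency, most common first. The `min_length` default of 3 is an arbitrary choice for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

WORD_RE = re.compile(r"[A-Za-z0-9]+")  # simplified notion of a "word"


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def build_wordlist(pages, min_length=3):
    """Return lowercased words from the pages, most frequent first."""
    counts = Counter()
    for html in pages:
        parser = TextExtractor()
        parser.feed(html)
        parser.close()
        for word in WORD_RE.findall(" ".join(parser.chunks)):
            if len(word) >= min_length:
                counts[word.lower()] += 1
    # Sort by descending count, then alphabetically for a stable order.
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
```

Frequency ordering puts the organisation's most characteristic terms (product, project, and brand names) at the top of the list.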

Extracting Metadata of Public Documents

The target organization's website may contain useful material in the form of PDF documents, Microsoft Word files, and files in various other formats. These publicly available documents often carry hidden metadata that can be examined to extract information about the target organisation.

An attacker can use this information to execute malicious operations against the target organisation, such as brute-forcing authentication using staff usernames and e-mail addresses, or social engineering to distribute malware that can infect the target system.

Metadata extraction tools such as Metagoofil, ExifTool, and Web Data Extractor automatically extract critical information such as client usernames, operating systems (useful for selecting OS-specific exploits), email addresses (possibly for social engineering), lists of software (version and type), lists of servers, document creation and modification dates, and website authors.
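Tools such as ExifTool read metadata from many file formats. As a narrower illustration, the standard-library sketch below reads the author, last editor, timestamps, and authoring application from an Office Open XML document (.docx, .xlsx, .pptx), which stores them in `docProps/core.xml` and `docProps/app.xml` inside a ZIP archive. PDFs and other formats are out of scope for this sketch, and the keys of the returned dictionary are names chosen for the example.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Output key (our own naming) -> (part inside the ZIP, element path).
FIELDS = {
    "creator": ("docProps/core.xml", "dc:creator"),
    "last_modified_by": ("docProps/core.xml", "cp:lastModifiedBy"),
    "created": ("docProps/core.xml", "dcterms:created"),
    "modified": ("docProps/core.xml", "dcterms:modified"),
    "application": ("docProps/app.xml", "ep:Application"),
    "app_version": ("docProps/app.xml", "ep:AppVersion"),
}


def extract_ooxml_metadata(source):
    """Read author, editor, timestamp and application fields from an OOXML file.

    `source` may be a file path or the document's raw bytes.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    metadata = {}
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        roots = {
            part: ET.fromstring(archive.read(part))
            for part in ("docProps/core.xml", "docProps/app.xml")
            if part in names
        }
        for field, (part, path) in FIELDS.items():
            root = roots.get(part)
            element = root.find(path, NS) if root is not None else None
            if element is not None and element.text:
                metadata[field] = element.text.strip()
    return metadata
```

Fields that are absent from a document are simply left out of the result, so the output reflects only what the file actually discloses.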

Searching for Contact Details on the Company Website

Attackers can search the target company's website to acquire vital information about the company. Organisations generally use their websites to tell the public what they do, what services or products they offer, how to contact them, who their partners are, and where they and their branches are located. Attackers can use this information to launch further attacks against the target company.
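Collecting contact details from downloaded pages can be sketched with a regular expression and `html.parser`. The code below gathers email addresses from `mailto:` links and from visible text, and phone numbers from `tel:` links. The email pattern is a deliberately simple assumption that will miss some valid addresses, and free-text phone numbers are not parsed because their formats vary too widely.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Simplified email pattern; an assumption, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class ContactParser(HTMLParser):
    """Collects mailto:/tel: link targets and the page's visible text."""

    def __init__(self):
        super().__init__()
        self.emails, self.phones = set(), set()
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            address = unquote(href[7:]).split("?", 1)[0]  # drop ?subject=...
            if address:
                self.emails.add(address.lower())
        elif href.lower().startswith("tel:"):
            self.phones.add(unquote(href[4:]).strip())

    def handle_data(self, data):
        self.text_chunks.append(data)


def extract_contacts(html):
    """Return sorted email addresses and phone numbers found in the page."""
    parser = ContactParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_chunks)
    parser.emails.update(match.lower() for match in EMAIL_RE.findall(text))
    return {"emails": sorted(parser.emails), "phones": sorted(parser.phones)}
```

Lowercasing addresses before storing them in a set removes duplicates that differ only in case.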
