How to find all pages on a website


In the SEO world, data reigns supreme. Access to it is crucial for making informed decisions and crafting a successful strategy for your project. To win this game, you should focus on two things: staying up-to-**** with the market and having a thorough understanding of your site.

In this article, we’ll focus on getting to know your site, as the greatest opportunities and potential pitfalls sometimes lurk in the most unexpected of places. Without a complete list of the pages on your website, you run the risk of missing out on valuable opportunities for improvement, which also means losing an edge over your competitors. 

Let’s look at some possible reasons why you might need to find all the pages of a website, which tools will help you do this effectively, and exactly how.

Why do I need to find every single page?

Search engines are constantly introducing new algorithms and applying manual penalties to pages and sites. So if you don’t have a thorough understanding of all your website’s pages — you’re tiptoeing through an SEO minefield.

SEO minefield

In order to avert a serious setback, you must keep a close eye on all the pages that make up your website. Doing so will not only enable you to discover pages you already knew about, but will also help you find forgotten pages, pages you had no idea existed and would otherwise not be able to view.

There are several possible scenarios when you have to know how to find all the web pages of a site, such as:

  • Changing website architecture
  • Moving to a new domain or changing the URL structure
  • Finding orphan pages
  • Finding duplicate pages
  • Finding 404 pages
  • Creating redirects
  • Creating a website hreflang file
  • Checking for canonical and noindex tags
  • Setting up internal linking
  • Creating an XML sitemap or robots.txt file

How to find all pages of a website

Now, when it comes down to finding all the web pages that belong to a single website, the following options are available:

  • Use SE Ranking’s Website Audit to find all crawlable web pages.
  • Use Google Search Console to discover pages that are only visible to Google. 
  • Use Google Analytics to detect all pages that have ever been visited.
  • Use Bing Webmaster tools to catch pages that are only visible to Bing. 
  • Download the website pages list using command line commands.
  • Extract website URLs from Sitemap.xml.
  • View all pages of a website via WordPress and other plugins.

Let’s explore how to use these tools to scan a website for all pages.

Finding crawlable pages via SE Ranking’s Website Audit

Let’s start by collecting all the URLs that both people and search engine crawlers can visit by following your site’s internal links. Analyzing such pages should be your top priority as they get the most amount of attention.

To do this, we’re going to first need to access SE Ranking, add a website or select an existing one, and open the Overview tab in the Website Audit Report.

Accessing Website Audit in SE Ranking

Note: The 14-day free trial gives you access to all of SE Ranking’s available tools and features, including Website Audit.

Next, let’s configure the settings to make sure we’re telling the crawler to go through the right pages. To access Website Audit settings, click on the Gear icon in the top right-hand corner:

Accessing Website Audit settings

Under settings, go to the Source of pages for website audit tab, and enable the system to scan Site pages, Subdomains, XML sitemap to verify that we’re only scanning what’s been clearly specified, and are including the site’s subdomains along with all their pages:

Accessing Source of pages for website audit

Then, go to Rules for scanning pages, and enable the Take into account the robots.txt directives option to tell the system to follow the instructions specified in the robots.txt file. Click ‘Apply Changes’ when you’re done:

Accessing Rules for scanning pages

Now, go back to Overview tab and launch the audit with the new settings applied by hitting ‘Restart audit’:

Restarting Website Audit

Once the audit is complete, go to Crawled Pages to view the full list of all crawlable pages:

List of Crawled Pages in Website Audit

Together with a list of all URLs found, you’ll see all of the site’s essential SEO metrics, including:

  • Referring pages: the number of pages on your website that link to the specific URL. 
  • Number of errors on each page.
  • Number of characters in the page’s URL.
  • Type and version of URL protocol.
  • Whether the page is in the sitemap.
  • Server response code on the page.
  • Whether the page is blocked by robots.txt.

The above metrics will appear as default columns in the Crawled Pages tab. However, you can customize the report to meet your specific requirements by adding other parameters. Simply go to the Columns section above the table on the right and enable or disable any metric as desired.

Crawled pages columns

Crawled pages can be sorted by the errors, warnings, and notices that they contain. This will enable you to prioritize pages when fixing issues. You can also compare reports once you have audited your website at least twice. This helps you observe the changes in your crawled pages over time.

Comparing and sorting crawled pages

It’s also possible to use filters to sort out pages by different parameters. For instance, if you only want to see 200-status-code pages, as in those that are working correctly, add a filter like so:

Now it’s time to export the results. Go to Export and choose to save the data either in .xls or .csv format.

Exporting data in Website Audit

RUN A WEBSITE AUDIT

Score your website in 2 minutes.

Enter any website URL to get a detailed report on tech issues and suggested solutions.

Finding all pages via Google Search Console

Another tool that you can use to find all pages on a website is Google Search Console. Keep in mind, however, that GSC will only show you pages that Google can access.

Start by opening up your account and going to the Indexing tab. Choose the Pages report and select ‘All known pages’ instead of ‘All submitted pages’.

Pages in GSC

The green box labeled “Indexed” will give you the number of URLs indexed by Google. Click on ‘View data about indexed pages’ below the graph. From here, you can find all URLs on a domain that Google indexed, as well as the **** of their last crawl. You can export them in Google Sheets or download them as .xls or .csv files.

Indexed pages in GSC

Now, let’s go back to check the Not Indexed pages (pages that were not indexed and won’t appear in Google). Get ready to roll up your sleeves and do a lot of manual work.

As you scroll down, you’ll see a list of reasons why some of the pages on your website aren’t indexed.

Why Google doesn't index pages

You can view different categories, such as redirect errors, pages excluded by ‘noindex’ tag, those blocked by robots.txt, and so on.

By clicking on the error category, you’ll see a list of affected pages. Going through each one of them will give you unfiltered access to every single page that Google has discovered on your site.

Pages blocked from indexing in GSC

Another useful report in GSC for finding all website pages is the Search Results report. This report displays all pages that have received at least one impression in search results. To access it, go to Search Results in the Performance tab, set up the largest possible period, and choose Pages. Finally, export your report.

Pages that get impressions

You can also use the Search Analytics for Sheets add-on to retrieve data from GSC on demand and create automatic backups in Google Sheets.

Finding all pages with pageviews via Google Analytics

You can also find all website pages by carefully studying the data in your Google Analytics account. There is only one condition: your website must be linked to your Google Analytics account from the get-go, so that it can collect data behind the scenes.

The logic here is simple: if someone has ever visited any page of your website, Google Analytics will have the data to prove it. And since these visits are made by people, we should ensure such pages serve a distinct SEO or marketing purpose.

Start by going to Reports → Engagement → Pages and Screens and click on Page path and screen class.

Pages and screens report in GA

You can also click on ‘Views’ to get the arrow pointing up and sort the page URLs from least to most pageviews. Ultimately, the least visited pages will be seen at the top of the list.

Then, share or export the data into a .csv file.

Finding all website pages by using Bing Webmaster Tools

Google is considered a search giant, but Bing is still one of the most popular search engines to ****. As of January 2023, its global search market is 8.85%, making properly optimizing your website for Bing a smart strategy. 

You can also make use of Bing Webmaster Tools to find all pages on a website that are indexed by Bing. The process is straightforward, but before getting started, make sure that you have added and verified your website.

Once your site is set up, go to Site Explorer in the left-hand navigation bar and choose to filter by ‘All URLs’. The report will display all the pages that Bing sees on your site. Click on the Download button to export the data.

Pages in Bing

Finding all pages on a website using the command line

A more techie approach to finding the list of all pages on a website is to use the command line. This method involves interacting with a computer through text commands. 

To get all URLs from a website via the command line, you’ll first need to install Wget. This command line tool allows you to retrieve files from the web. The installation process varies depending on the OS you’re using. For instance, macOS requires that you install the Homebrew package manager first before installing command line tools like Wget.

To install Wget, open a terminal window and type in the following command:

brew install wget

Next, download your website. Enter:

wget -r www.examplesite.com

Wget will download your website recursively. It will start with the main page of the website and all its linked pages, images, and other files. From there, it will follow the links on those pages and download the pages they link to, and will continue this process until it has downloaded the entire website.

Once the website is downloaded, you can ask Wget to list out the URLs by typing:

find www.examplesite.com

Finding all website pages via Sitemap.xml

XML sitemap is also a great way to find all pages on a website because it provides a list of all the pages on it that the owner wants search engines to index. It works like a roadmap, guiding search engine bots through the site’s structure and making it easier for them to understand how everything is organized.

The get the list, follow these steps:

The sitemap can usually be found in the website’s root directory or by typing “/sitemap.xml” at the end of the website’s URL, like in http://sitename.com/sitemap.xml. If the website has multiple sitemaps, they will be listed in the main sitemap file, which is typically named sitemap-index.xml. Here, you can find a list of sitemaps and choose the most relevant one to get a list of the website’s pages.

  • Extract URLs from your sitemap into Google Sheets. 

Though this is another tech-heavy method, it will save you tons of time and effort by importing all URLs into Google Sheets in just a few seconds. However, you will need to do some preparatory work first. Begin by creating a new sheet and then go to Extensions. Next, choose ‘Apps Script’.

Google Sheets extensions

Now you’ll need to copy and paste the following custom JavaScript code into script editor to create a new function:

function sitemap(sitemapUrl,namespace) {

  

  try {

    var xml = UrlFetchApp.fetch(sitemapUrl).getContentText();

    var document = XmlService.parse(xml);

    var root = document.getRootElement()

    var sitemapNameSpace = XmlService.getNamespace(namespace);

    

    var urls = root.getChildren('url', sitemapNameSpace)

    var locs = []

    

    for (var i=0;i <urls.length;i++) {

      locs.push(urls[i].getChild('loc', sitemapNameSpace).getText()) 

    }

    

    return locs  

  } catch (e) {

    return e 

  }

}
Script editor in Google Sheets

From here, save and run the test. The editor will ask for access permission. If the script is implemented successfully, no error messages will appear Instead, you’ll see that the execution has started and completed.

Execute new script

Now you can get back to your Google Sheet and enter the following formula:

=sitemap("Sitemap Url","Namespace Url")

Once you have located your sitemap, you should have the sitemap URL at hand, from which you can directly copy the Namespace URL. In the screenshot below, you can see how this works with SE Ranking’s sitemap:

  • Sitemap URL is in the browser address bar.
  • Namespace URL is in the first line of sitemap content.
Sitemap and Namespace URLs

Copy these addresses and paste them as links into the formula. This is what you should get in the end:

=sitemap("https://seranking.com/sitemap.xml","http://www.google.com/schemas/sitemap/0.9")

Paste this formula into your Google Sheet, and press Enter to get all the URLs extracted from your sitemap.

Site URLs extracted from sitemap

Finding all website pages via WordPress plugins

If your website runs on WordPress, there are plugins that can help you find all of its pages. The two most commonly used options are:

To use these plugins, download and install them to your ‘/wp-content/plugins/’ directory. Once installed, activate them from your Plugins page in WordPress. Once you’ve completed the activation, you can list or export all URLs of your website.

If you use a different CMS, check whether it has built-in plugins for this purpose or has a separate one in the extension directory. For instance, Joomla has many plugins for generating sitemaps to help you get a list of website pages. Similarly, Drupal has a sitemap module that can come in handy when collecting the pages on your site.

How to see all pages of a website on Google

We have discussed various tools that can help you find all the pages on your website, but we’ve saved the simplest option for last—and for good reason. While this method is the easiest, you must be careful when using it. 

Google your site, but do it right. This means using Google search operators, which are special commands that refine the search based on set criteria.

Simply go to Google Search and type “site:website name” like in “site:seranking.com”. You won’t need to add http:// or www, but make sure there are no spaces between the operator and query; otherwise, the results will be incorrect.

Website pages on Google

Still, you must consider that this command wasn’t created to display all indexed pages on your site, to which Google’s John Mueller confirmed.

John Mueller

Search Advocate at Google

The short answer is that a site: query is not meant to be complete, nor used for diagnostics purposes …This query limits the results to a specific website. It’s not meant to be a comprehensive collection of all the pages from that website.

One last tip is to use Google Search Console, but in a slightly different way than was explained in previous sections. Focus on the page’s impressions (the number of times the page was shown in search) but for a short period of time; like seven days, for instance.

Closing thoughts

Locating all of your website’s pages is crucial to improving its SEO and serves as the initial step towards further optimization activities. With this data, you can identify pages requiring improvement, update outdated content, find all website links and fix broken ones, and optimize your overall internal linking structure.

By using the tools and methods described in this article, you can quickly uncover all of the pages on your website and gain valuable insights into its performance. So, take a brief moment to compile a page list to stay on top of your SEO and marketing efforts. Take it from the SE Ranking team: it won’t take much time at all when you’re using our tools.



Source link

Social media & sharing icons powered by UltimatelySocial
error

Enjoy Our Website? Please share :) Thank you!