Selenium | Extract All Links In A Webpage

Extracting Links from a Webpage

We often need to collect every link on a web page, or only the links in a specific part of it, and Selenium WebDriver makes this straightforward. To see how it works, let’s open our application under test (AUT): https://www.wikipedia.org/

[Figure: Wikipedia home page, with the links marked by blue arrows]

In the figure above, everything marked with a blue arrow is a link. On the Wikipedia home page, links are scattered all over the page.

Now, if you are asked to display all the links visible on the web page, how would you do that?

We know that any WebElement on a page can be identified using one of the locator techniques (see locators in Selenium). A link is just another kind of WebElement. To solve this problem, we will follow these steps:

  • Identify the WebElements (in this case, links) to work on.
  • Print the text of each WebElement (using a for-each loop).

Identify the WebElement

We are tasked with displaying all the links visible on a web page. Since our AUT contains more than one link, we will use the findElements(By) method to extract them. findElements(By) always returns a List of WebElement, so we will store the output in a List<WebElement> variable.

Inspect any link on the website; take, for example, the very first link on the left-hand side of the page, English. Below is the result of Inspect Element:

[Figure: Inspect Element output for the English link, showing an <a> tag]

In the figure above, we can see that the tag is <a>. Inspect other links and you will find the same tag on all of them. In HTML, links are written with the <a> tag, also known as the anchor tag.

If you remember from our locators in Selenium chapter, WebDriver provides a method called

driver.findElements(By.tagName("tName"));

Here “tName” is the actual tag name of the element; in our case it is “a”. We will extract all the elements with the tag name “a” and store them in a List<WebElement>.

Create a method getLinks() that will get all the links from the Wikipedia page.

public void getLinks() {
    List<WebElement> links = driver.findElements(By.tagName("a"));
}

Iterate over the “links” list one by one using a Java for-each loop. On each pass, the current WebElement is held in the loop variable “link”.

public void getLinks() {
    List<WebElement> links = driver.findElements(By.tagName("a"));
    for (WebElement link : links) {
        // the loop body is filled in during the next step
    }
}

Print all the WebElements

“link” refers to the current link WebElement in the for-each loop. Inside the loop we extract the text of each WebElement using Selenium's getText() method, which returns the element's visible text, and then simply print it.

public void getLinks() {
    List<WebElement> links = driver.findElements(By.tagName("a"));
    for (WebElement link : links) {
        System.out.println(link.getText());
    }
}
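
A small follow-up to the loop above: getText() returns only the text a user can see, so links that are hidden, or that wrap only an image, come back as empty strings and print as blank lines. Below is a minimal sketch of filtering those out. It uses plain Java over a list of strings so that it runs without a browser. The class name LinkTextFilter, the method nonBlank, and the sample values are assumptions made for illustration; the strings stand in for values collected from link.getText().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LinkTextFilter {

    // Keeps only texts with at least one non-whitespace character, trimmed.
    static List<String> nonBlank(List<String> texts) {
        List<String> result = new ArrayList<>();
        for (String text : texts) {
            if (text != null && !text.trim().isEmpty()) {
                result.add(text.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for values returned by link.getText()
        List<String> texts = Arrays.asList("English", "", "  ", "Commons");
        System.out.println(nonBlank(texts)); // prints [English, Commons]
    }
}
```

In the getLinks() method, the same check could be applied to link.getText() before printing, if blank lines in the output are not wanted.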

For an explanation of the main method, refer to the full code at the end of the chapter Selenium WebDriver: Interacting with WebElements – Dropdown.

Full code is provided below:

package test;

import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class FindLinks {
    public WebDriver driver;

    public static void main(String[] args) {
        FindLinks obj = new FindLinks();
        obj.getUrl();
        obj.getLinks();
        obj.driver.quit(); // close the browser and end the WebDriver session
    }

    public void getUrl() {
        System.setProperty("webdriver.chrome.driver", "/home/dj/Downloads/chromedriver/chromedriver");
        ChromeOptions options = new ChromeOptions();
        options.addArguments("start-maximized"); // open Browser in maximized mode
        options.addArguments("disable-infobars"); // disabling infobars
        options.addArguments("--disable-extensions"); // disabling extensions
        options.addArguments("--disable-gpu"); // applicable to windows os only
        options.addArguments("--disable-dev-shm-usage"); // overcome limited resource problems
        options.addArguments("--no-sandbox"); // Bypass OS security model
        driver = new ChromeDriver(options);
        driver.get("https://www.wikipedia.org/");

    }
    
    public void getLinks() {
        // Store every WebElement with the tag name "a" in the links variable.
        List<WebElement> links = driver.findElements(By.tagName("a"));

        // Iterate over the "links" WebElements using a Java for-each loop.
        for (WebElement link : links) {
            // Print the text of each link using Selenium's getText() method.
            System.out.println(link.getText());
        }
    }
}

Using the method above, we saw how to extract all the links from a web page, but real requirements are rarely this simple. On the same Wikipedia page there are a few more links at the bottom (see the first screenshot), and the requirement could be to verify whether a particular link is present there.

In that scenario our logic will still work, but it is not efficient. It traverses every link starting from the top of the page, so the code does unnecessary work on links we are not interested in. The situation gets worse if the same link text appears more than once on the page, because a match no longer tells us which region it came from.
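
To make the problem concrete, here is a rough sketch of the whole-page check described above, written in plain Java so it runs without a browser. The class name LinkPresenceCheck, the method countMatches, and the sample texts (including the deliberate duplicate) are hypothetical; the list stands in for the texts collected from every link on the page.

```java
import java.util.Arrays;
import java.util.List;

class LinkPresenceCheck {

    // Counts how many collected link texts equal the expected text exactly.
    static long countMatches(List<String> linkTexts, String expected) {
        return linkTexts.stream().filter(expected::equals).count();
    }

    public static void main(String[] args) {
        // Hypothetical whole-page scan; "Commons" appears twice on purpose.
        List<String> pageTexts = Arrays.asList(
                "English", "Commons", "Wikivoyage", "Commons", "Meta-Wiki");
        long matches = countMatches(pageTexts, "Commons");
        System.out.println("present: " + (matches > 0)); // prints present: true
        System.out.println("occurrences: " + matches);   // prints occurrences: 2
    }
}
```

A count above one is exactly the ambiguous case: the check passes, but it does not say whether the link was found in the region we care about.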

To avoid this, we will focus only on the region where the desired link is located. Inspect the bottom area where these links sit (from Commons to Meta-Wiki).

[Figure: Inspect Element output for the bottom links, grouped under the class “other-projects”]

In the figure above, it is evident that the bottom links are grouped under the class “other-projects”, and all of them reside inside this group. To get the text of all the links in a particular area, we will follow this approach:

  • Identify the group or block that contains the desired links.
  • Use that group to traverse the WebElements inside it; in our case we are interested only in links.

Modify the getLinks() method above as follows (only the body of getLinks() changes):

package test;

import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class FindLinks {
    public WebDriver driver;

    public static void main(String[] args) {
        FindLinks obj = new FindLinks();
        obj.getUrl();
        obj.getLinks();
        obj.driver.quit(); // close the browser and end the WebDriver session
    }

    public void getUrl() {
        System.setProperty("webdriver.chrome.driver", "/home/dj/Downloads/chromedriver/chromedriver");
        ChromeOptions options = new ChromeOptions();
        options.addArguments("start-maximized"); // open Browser in maximized mode
        options.addArguments("disable-infobars"); // disabling infobars
        options.addArguments("--disable-extensions"); // disabling extensions
        options.addArguments("--disable-gpu"); // applicable to windows os only
        options.addArguments("--disable-dev-shm-usage"); // overcome limited resource problems
        options.addArguments("--no-sandbox"); // Bypass OS security model
        driver = new ChromeDriver(options);
        driver.get("https://www.wikipedia.org/");

    }
    
    public void getLinks() {
        // Identify the group and store it in a WebElement reference.
        WebElement linkGroup = driver.findElement(By.className("other-projects"));

        // Search inside that reference only, so we get just the links within this group.
        List<WebElement> links = linkGroup.findElements(By.tagName("a"));

        // Print the text of each link.
        for (WebElement link : links) {
            System.out.println(link.getText());
        }
    }
}
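
The code above only prints the scoped link texts; the original requirement was to verify that a particular link is present. One possible way to finish that step is sketched below in plain Java, so it can be run without a browser. The class name ScopedLinkCheck, the method hasLink, and the sample texts are hypothetical. Because an anchor may wrap more text than its title, the sketch uses a loose prefix match; that choice is an assumption, and a stricter comparison may suit other pages better.

```java
import java.util.Arrays;
import java.util.List;

class ScopedLinkCheck {

    // Returns true when any scoped link text starts with the expected title.
    static boolean hasLink(List<String> scopedTexts, String expected) {
        for (String text : scopedTexts) {
            if (text != null && text.trim().startsWith(expected.trim())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical texts standing in for getText() on links in the group
        List<String> footerTexts = Arrays.asList(
                "Commons\n(hypothetical description)", "Wikivoyage", "Meta-Wiki");
        System.out.println(hasLink(footerTexts, "Meta-Wiki")); // prints true
        System.out.println(hasLink(footerTexts, "English"));   // prints false
    }
}
```

Since the list comes only from the “other-projects” group, a match here cannot be confused with a link of the same name elsewhere on the page.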

Practice this on any website and, as an exercise, try to extract the URL of each link. Comment below if you need any clarification or get stuck while solving the exercise.

Thank you for reading.

Author: Dhawal Joshi

A post-graduate in MCA, ISTQB & ITIL certified QA with more than 8 years of experience in QA working with a CMMI Level 5 organization as System Analyst. I started my automation journey with HP UFT(formerly known as QTP) and for the past few years, I am using Selenium for automation. I also have experience in Android Application Development, Java, HTML, and VBScript. When I am not working, I like to spend time with my family, cooking and learning new developments in IT.
