Boost Your Web Scraping Game with Axios and Proxies


Axios is a JavaScript champ when it comes to fetching data from the web. But sometimes, websites aren't so welcoming—they block your scraper. No worries! You can dodge those blockers using proxies.

In this guide, we'll get into how you can do this with some cool, hands-on examples. From free to premium proxies, we've got it all covered. Let's dive in!

Setting Up Your Project

Step 1: Check If NodeJS and npm Are Installed

First off, you need NodeJS and npm (Node Package Manager) on your computer. Open your terminal (Command Prompt on Windows, Terminal on Mac) and type the following:

node -v
npm -v

If you see version numbers for both, you're good to go! If not, you'll need to install NodeJS and npm first.

Step 2: Create a New Folder

Next, let's make a special folder where all the magic will happen. Type in:

mkdir myAxiosScraper

This makes a new folder named "myAxiosScraper."

Step 3: Enter the Folder

You need to "go into" this folder in the terminal. Simple, just type:

cd myAxiosScraper

Step 4: Initialize Your NodeJS Project

Now, let's set up a new NodeJS project within this folder. Type this:

npm init -y

This will create a file called package.json in your folder. Think of it as the recipe book for your project.

Step 5: Install Axios

Axios is the tool that'll help you get data from websites. To put it in your toolkit, type:

npm install axios

And there you have it! You've successfully set up a NodeJS project and installed Axios. You're all ready to start scraping websites like a pro!


Simple Proxy Setup with Axios

Let's kick things off with a basic example. We'll use httpbin.org as our target website and a fictional proxy IP.

IP: '203.42.142.32', Port: '8080'

const axios = require('axios');
axios.get('https://httpbin.org/ip', {
  proxy: {
    protocol: 'http',
    host: '203.42.142.32',
    port: 8080
  }
})
.then(res => console.log(res.data))
.catch(err => console.log('Oops!', err));

Run it. If you see the IP 203.42.142.32 in the output, you've just sent a request via a proxy. High five!


Handling JSON

Axios is pretty smart; it parses JSON responses automatically. But if you want to guard against non-JSON responses, here's how to deal with them:

axios.get('https://httpbin.org/ip', {
  proxy: {
    protocol: 'http',
    host: '203.42.142.32',
    port: 8080
  }
})
.then(res => {
  // Axios has already parsed JSON responses, so only parse raw strings
  let data = res.data;
  if (typeof data === 'string') {
    try {
      data = JSON.parse(data);
    } catch (e) {
      // Not valid JSON; keep the raw string
    }
  }
  console.log(data);
})
.catch(err => console.log('Uh-oh!', err));

Premium Proxies for Smooth Sailing

Free proxies are often slow, short-lived, and sometimes run by bad actors, so we don't recommend relying on them. For something more reliable, consider a paid option. Here's how to set one up (replace the fields with your actual proxy data):

axios.get('https://httpbin.org/ip', {
  proxy: {
    protocol: 'http',
    host: 'premium.proxy.com',
    port: 8080,
    auth: {
      username: 'yourUsername',
      password: 'yourPassword'
    }
  }
})
.then(res => console.log(res.data))
.catch(err => console.log('Yikes!', err));

Going Auto-Pilot with Environment Variables

You can automate proxy settings by storing them as environment variables. Do this in your terminal (on Windows Command Prompt, use set instead of export):

export HTTP_PROXY=http://203.42.142.32:8080

Then, your Axios request becomes:

axios.get('https://httpbin.org/ip')
.then(res => console.log(res.data))
.catch(err => console.log('Oops!', err));
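One caveat: once HTTP_PROXY is set, every Axios request in that shell goes through the proxy. If you want a single request to skip it, Axios lets you pass proxy: false:

```javascript
// proxy: false tells Axios to ignore HTTP_PROXY/HTTPS_PROXY for this request
axios.get('https://httpbin.org/ip', { proxy: false })
.then(res => console.log(res.data))
.catch(err => console.log('Oops!', err));
```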

Rotate Proxies Like a DJ Spins Tracks

To level up your proxy rotation, you can add features like priority-based selection and error handling.

Priority-Based Proxy Selection

Keep track of each proxy's performance and choose the best one for your next request.

let performanceMetrics = {
  '203.42.142.32:8080': { successCount: 10, errorCount: 2 },
  '150.24.126.73:8080': { successCount: 5, errorCount: 5 },
};
const rate = (s) => s.successCount / (s.successCount + s.errorCount);
const getBestProxy = () => {
  // Pick the proxy key with the highest success ratio so far
  const [bestKey] = Object.entries(performanceMetrics)
    .sort(([, a], [, b]) => rate(b) - rate(a))[0];
  const [host, port] = bestKey.split(':');
  return { protocol: 'http', host, port: Number(port) };
};
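For those metrics to stay useful, you also have to update them after every request. A minimal recorder might look like this (recordResult is our own helper name, not an Axios feature):

```javascript
// Record the outcome of a request made through the proxy identified by
// proxyKey (e.g. '203.42.142.32:8080'), creating its stats entry if needed
const recordResult = (metrics, proxyKey, ok) => {
  const stats = metrics[proxyKey] ||
    (metrics[proxyKey] = { successCount: 0, errorCount: 0 });
  if (ok) stats.successCount += 1;
  else stats.errorCount += 1;
};
```

Call recordResult(performanceMetrics, key, true) in your .then handler and recordResult(performanceMetrics, key, false) in your .catch, and getBestProxy will keep adapting.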

Mixing Proxy Types

You can use different kinds of proxies for better success rates.

const proxies = [
  { ip: '203.42.142.32', port: '8080', type: 'residential' },
  { ip: '150.24.126.73', port: '8080', type: 'datacenter' },
];
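Axios expects its proxy option in a host/port shape, so you'll want a small adapter from your pool entries. Here's a sketch (toAxiosProxy and pickByType are our own helper names; the entries are the fictional proxies from above):

```javascript
const proxies = [
  { ip: '203.42.142.32', port: '8080', type: 'residential' },
  { ip: '150.24.126.73', port: '8080', type: 'datacenter' },
];

// Convert a pool entry into the shape Axios expects in its `proxy` option
const toAxiosProxy = ({ ip, port }) => ({
  protocol: 'http',
  host: ip,
  port: Number(port),
});

// Pick a random proxy of a given type, e.g. residential for stricter sites
const pickByType = (type) => {
  const candidates = proxies.filter(p => p.type === type);
  return toAxiosProxy(candidates[Math.floor(Math.random() * candidates.length)]);
};
```

Then a request through a residential proxy is just axios.get(url, { proxy: pickByType('residential') }).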

Parallel Requests

Use Promise.all to send multiple requests at once, each through a different proxy (here url1 and url2 are your target URLs, and getNextProxy() hands out the next proxy from your pool):

const requests = [
  axios.get(url1, { proxy: getNextProxy() }),
  axios.get(url2, { proxy: getNextProxy() })
];
Promise.all(requests)
  .then(responses => {
    // Handle responses
  })
  .catch(err => {
    // Handle error
  });
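getNextProxy() is left for you to define; a minimal round-robin version might look like this (the pool entries are the fictional IPs from earlier):

```javascript
// A small proxy pool; swap in your own proxies here
const proxyPool = [
  { protocol: 'http', host: '203.42.142.32', port: 8080 },
  { protocol: 'http', host: '150.24.126.73', port: 8080 },
];

let nextIndex = 0;
const getNextProxy = () => {
  // Hand out proxies in order, wrapping back to the start of the pool
  const proxy = proxyPool[nextIndex];
  nextIndex = (nextIndex + 1) % proxyPool.length;
  return proxy;
};
```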

Error Handling

Add retries and timeouts to your axios requests.

axios.get('https://httpbin.org/ip', { proxy: getNextProxy(), timeout: 5000 })
  .then(res => console.log(res.data))
  .catch(err => {
    if (err.code === 'ECONNABORTED') {
      // Timed out: retry here, ideally with the next proxy
    } else {
      console.log('Error!', err);
    }
  });
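To avoid repeating that retry branch everywhere, you can wrap the request in a small helper. Here's a sketch (retryRequest and its parameters are our own names, not part of Axios):

```javascript
// Retry an async request up to `retries` times before giving up
const retryRequest = async (makeRequest, retries = 3) => {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await makeRequest(attempt);
    } catch (err) {
      lastError = err;
      console.log(`Attempt ${attempt} failed: ${err.message}`);
    }
  }
  throw lastError;
};
```

You'd call it as retryRequest(() => axios.get(url, { proxy: getNextProxy(), timeout: 5000 })), so each attempt can go out through a fresh proxy.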

There you go. With these snippets, you're not just rotating proxies; you're optimizing your entire scraping process.


Acing with Premium Proxy Services

Consider going for high-quality, optimized residential proxies. These proxies allow you to easily bypass website blocks and run your Axios project smoothly! Here's an example:

axios.get('https://restrictedwebsite.com', {
  proxy: {
    protocol: 'http',
    host: 'residential.proxy.com',
    port: 8080,
    auth: {
      username: 'yourProviderAPIKey',
      password: 'topSecret'
    }
  }
})
.then(res => console.log(res.data))
.catch(err => console.log('Ouch!', err));

Wrapping It Up

Using Axios with a well-picked proxy can make your web scraping unstoppable. We've gone from setting up a basic proxy to smoothly rotating between multiple proxies. And if you want reliability and efficiency, premium proxies are worth the investment. Now, go out there and scrape the web like a pro!