Wednesday, February 2, 2022

[SOLVED] How to block automated requests

Issue

I want to block requests made with 'file_get_contents', automated 'curl' requests, and other automated requests that scrape data from my project.

I don't want to do this with a WAF or an external system.

How can I parse incoming requests using PHP only?


Solution

You can write your own robot check with reCAPTCHA. Show the check to any visitor who does not have a verified session, and allow access to the content only after the visitor passes the test.

For example:

<?php

session_start();

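// The reCAPTCHA widget normally submits its token as "g-recaptcha-response";
// "recaptcha_response" is still accepted in case the form sets it explicitly.
$recaptcha_response = $_POST['g-recaptcha-response'] ?? $_POST['recaptcha_response'] ?? null;

if ($_SERVER['REQUEST_METHOD'] === 'POST' && is_string($recaptcha_response)) {
    $recaptcha_url = 'https://www.google.com/recaptcha/api/siteverify';
    $recaptcha_secret = 'YOUR_RECAPTCHA_SECRET_KEY';

    // Send the secret and token to Google as a form-encoded POST request:
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => 'Content-Type: application/x-www-form-urlencoded',
        'content' => http_build_query(['secret' => $recaptcha_secret, 'response' => $recaptcha_response]),
    ]]);

    // A failed request or invalid JSON decodes to null and is treated as "not human":
    $recaptcha = json_decode((string) file_get_contents($recaptcha_url, false, $context));

    // Mark the session as human only if verification succeeded with a high enough score:
    $_SESSION['human'] = isset($recaptcha->success, $recaptcha->score)
        && $recaptcha->success === true
        && $recaptcha->score >= 0.8;
}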

if (!isset($_SESSION['human']) || !$_SESSION['human']) {

    echo <<<HTML
<html>
<head>
<title>Human Test</title>
<script src="https://www.google.com/recaptcha/api.js"></script>
<script>
function onSubmit(token) {
  document.getElementById("demo-form").submit();
}
</script>
</head>
<body>
<form method="POST" id="demo-form">
  <button class="g-recaptcha"
    data-sitekey="YOUR_RECAPTCHA_SITE_KEY"
    data-callback='onSubmit'
    data-action='submit'>I am a Human :-)</button>
</form>
</body>
</html>
HTML;
    exit;
}


// Whatever content you want to show originally
// ...
// ...

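This blocks most simple scrapers, such as plain 'curl' or 'file_get_contents' calls, because they do not run JavaScript and so cannot obtain a reCAPTCHA token. A scraper that drives a real browser may still get through.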
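
The decision step in the example (is this response good enough to mark the session as human?) can be pulled out into a small function so it can be tested without calling Google. What follows is only a minimal sketch and not part of the original answer: the function name is_human_response and its $min_score and $expected_action parameters are hypothetical, while success, score and action are fields returned by the reCAPTCHA v3 siteverify endpoint. The default action 'submit' is assumed to match the data-action attribute on the button.

```php
<?php

/**
 * Decide whether a decoded siteverify response looks like a human visitor.
 * Hypothetical helper: the name and parameters are illustrative assumptions.
 *
 * @param mixed $result Result of json_decode($body, true); may be null on failure.
 */
function is_human_response($result, float $min_score = 0.8, string $expected_action = 'submit'): bool
{
    // A failed HTTP request or invalid JSON does not decode to an array: reject it.
    if (!is_array($result) || ($result['success'] ?? false) !== true) {
        return false;
    }

    // The action should match the data-action attribute used in the form.
    if (($result['action'] ?? null) !== $expected_action) {
        return false;
    }

    // Missing or non-numeric scores are rejected instead of raising a warning.
    $score = $result['score'] ?? null;
    return (is_int($score) || is_float($score)) && $score >= $min_score;
}

// Usage with a JSON string shaped like a siteverify response:
$json = '{"success": true, "score": 0.9, "action": "submit"}';
var_dump(is_human_response(json_decode($json, true)));
```

With such a helper, the session flag could be set as $_SESSION['human'] = is_human_response($decoded); where $decoded is the array form of the siteverify response. Checking the action as well as the score guards against a token that was issued for a different form being replayed here.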



Answered By - Koala Yeung
Answer Checked By - Cary Denson (WPSolving Admin)