• Use PHP to get the HTML and metadata of a website

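    The following script reads the first part of a remote file so that its HTML elements can be parsed locally. Regular expressions extract the meta keywords, the meta description and the title element. Note that eregi() was deprecated in PHP 5.3 and removed in PHP 7.0; on current PHP versions preg_match() with the i modifier does the same job.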

    Further patterns can be added to capture other meta elements. The script is useful, for example, when adding new links to a link list.

    The PHP script is as follows:

    <?php

    // preg_match() with the "i" modifier replaces eregi(), which was removed in PHP 7.0.

    // Defaults used when an element is not found.
    $page_title = "NA";
    $meta_descr = "NA";
    $meta_keywd = "NA";

    // Opening an http:// URL with fopen() requires allow_url_fopen to be enabled.
    if ($handle = @fopen("http://www.your_url.com", "r")) {
        $content = "";

        // Read up to 1 KB at a time and stop once the end of <head> has arrived.
        while (!feof($handle)) {
            $part = fread($handle, 1024);
            if ($part === false) break;
            $content .= $part;
            // Search $content, not $part, in case the tag straddles two chunks.
            if (stripos($content, "</head>") !== false) break;
        }
        fclose($handle);

        $lines = preg_split("/\r?\n|\r/", $content); // split the content into lines

        $is_title = false;
        $is_descr = false;
        $is_keywd = false;

        foreach ($lines as $val) {
            if (preg_match('/<title>(.*?)<\/title>/i', $val, $title)) {
                $page_title = $title[1];
                $is_title = true;
            }
            // The meta patterns accept both the HTML (">") and XHTML (" />") tag endings.
            if (preg_match('/<meta name="keywords" content="([^"]*)"(\s?\/)?>/i', $val, $keywd)) {
                $meta_keywd = $keywd[1];
                $is_keywd = true;
            }
            if (preg_match('/<meta name="description" content="([^"]*)"(\s?\/)?>/i', $val, $descr)) {
                $meta_descr = $descr[1];
                $is_descr = true;
            }
            // Stop scanning once all three values have been found.
            if ($is_title && $is_descr && $is_keywd) break;
        }
    }

    ?>
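
As mentioned above, further patterns can be added for other meta elements. One way to do that is a small helper that builds the pattern from the meta name. The sketch below is only an illustration: the function name get_meta_content(), the sample markup and the layout of the $entry array are assumptions and not part of the original script. It also assumes that the name attribute comes before the content attribute and that both use double quotes.

```php
<?php

// Hypothetical helper (not part of the original script): returns the content
// of <meta name="$name" content="..."> or null when no such tag is found.
function get_meta_content(string $html, string $name): ?string
{
    // preg_quote() escapes regex metacharacters in the name; its second
    // argument also escapes the "/" delimiter.
    $pattern = '/<meta\s+name="' . preg_quote($name, '/')
             . '"\s+content="([^"]*)"\s*\/?>/i';

    return preg_match($pattern, $html, $match) ? $match[1] : null;
}

// Sample markup standing in for the fetched $content.
$sample = '<head><title>Example</title>'
        . '<meta name="author" content="Jane Doe" />'
        . '<meta name="robots" content="noindex">'
        . '</head>';

// A link-list entry could then be assembled like this
// (the array layout is an assumption made for this example).
$entry = [
    'url'    => 'http://www.your_url.com',
    'author' => get_meta_content($sample, 'author') ?? 'NA',
    'robots' => get_meta_content($sample, 'robots') ?? 'NA',
    'rating' => get_meta_content($sample, 'rating') ?? 'NA', // not in the sample
];

print_r($entry);
```

Elements that are missing from the page fall back to "NA", the same default the main script uses for the title, description and keywords.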
