A Simple Twitter Search Parser with PHP


May 18, 2010/by Armand Niculescu

I was recently asked to aggregate tweets based on their hashtags using PHP (no Ajax),
so I decided to turn this into a small tutorial that will hopefully enable you to build all sorts
of XML parsers in PHP.

If you read my posts, you can see that I love simplicity. In all of my solutions, tips and
tutorials, I strive for the simplest code that gets the job done and for the most
straightforward explanation. This tutorial is no exception. It is my sincere hope that
you'll not just copy and paste the code into your project, but that you will actually understand it
and be able to modify and extend it for your own purposes.

Twitter has a search service at search.twitter.com. The search results are available as an
Atom feed, and this is how we're going to use them. If you're wondering why Atom instead of
RSS, one can argue that despite the popularity of RSS 2.0, Atom is a superior format.

The anatomy of an Atom search result

Here's what the XML Atom feed for a Twitter search looks like:

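<?xml version="1.0" encoding="UTF-8"?>
<feed ...>
  ...
  <updated>2010-05-16T14:08:54Z</updated>
  <openSearch:itemsPerPage>15</openSearch:itemsPerPage>
  ...
  <entry>
    <id>tag:search.twitter.com,2005:14100024699</id>
    <published>2010-05-16T14:08:54Z</published>
    <link ... />
    <title>...</title>
    <content type="html">RT &lt;a href=&quot;http://twitter.com/ivanristic&quot;&gt;@ivanristic&lt;/a&gt;: LibH ...
      href=&quot;http://bit.ly/b3uFkK&quot;&gt;http://bit.ly/b3uFkK&lt;/a&gt;</content>
    <updated>2010-05-16T14:08:54Z</updated>
    <link ... />
    <twitter:metadata>
      <twitter:result_type>recent</twitter:result_type>
    </twitter:metadata>
    <twitter:source>&lt;a href=&quot;http://www.tweetdeck.com&quot; rel=&quot;nofollow&quot;&gt;TweetDeck ...</twitter:source>
    <twitter:lang>en</twitter:lang>
    <author>
      <name>opexxx (alex knorr)</name>
      <uri>http://twitter.com/opexxx</uri>
    </author>
  </entry>
</feed>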

Building the parser


My goals for this little parser were as follows:

Show the tweets in the format "Full Name: text time"

Show the sender's avatar

Show relative time, e.g. "5 minutes ago"

Open links in a new window

Limit the number of results (and process just the first page of results)

Filter tweets containing profanity

Style everything with CSS

Work with PHP 5.

First, I should stress that this code is written for PHP 5; it was not tested
with PHP versions prior to 5.2.0.
I made this into a class so that you can easily use it in your project:
<?php
class twitter_class
{
    function __construct()
    {
        $this->searchURL = 'http://search.twitter.com/search.atom?lang=en&q=';
    }

    function getTweets($q, $limit=15)
    {
    }
}
?>

To load and parse an XML file, the easiest method is simplexml_load_file(). However,
Twitter is rather picky about request headers and rejects requests whose user agent is not
set the way it expects, so we'll use cURL instead.
$ch= curl_init($this->searchURL . urlencode($q));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
$response = curl_exec($ch);


Pretty simple. The search term is encoded and appended to the Twitter search URL, and
the result is loaded into the $response variable as a string. Also note that we're making
the request using the visitor's browser user agent.
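To make the encoding step concrete, here is a minimal sketch of how the request URL is assembled. The helper name buildSearchUrl() is made up for this illustration and is not part of the class; the base URL is the one used above.

```php
<?php
// Hypothetical helper (not part of the tutorial's class): builds the request URL.
function buildSearchUrl($q, $base = 'http://search.twitter.com/search.atom?lang=en&q=')
{
    // urlencode() turns '#' into %23 and a space into '+',
    // so a hashtag query survives inside the query string
    return $base . urlencode($q);
}

echo buildSearchUrl('#php xml'), "\n";
```

Without the encoding, everything after the # would be treated as a URL fragment and never reach the server.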
Parsing the resulting string could not be easier:

$xml = simplexml_load_string($response);
$output = '';
$tweets = 0;
for($i=0; $i<count($xml->entry); $i++)
{
    $crtEntry = $xml->entry[$i];
    $account  = $crtEntry->author->uri;
    $image    = $crtEntry->link[1]->attributes()->href;
    $tweet    = $crtEntry->content;
}
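If you want to try the parsing step without calling Twitter, the sketch below runs the same property accesses against a small hand-written feed. The sample XML, its example.com URLs and the parseEntries() helper are all made up for illustration; only the element names follow the feed shown earlier.

```php
<?php
// Made-up sample data shaped like a single search result entry.
$response = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <published>2010-05-16T14:08:54Z</published>
    <link type="text/html" href="http://example.com/status/1" rel="alternate"/>
    <link type="image/png" href="http://example.com/avatar.png" rel="image"/>
    <content type="html">Hello &lt;b&gt;world&lt;/b&gt;</content>
    <author>
      <name>someuser (Some User)</name>
      <uri>http://example.com/someuser</uri>
    </author>
  </entry>
</feed>
XML;

// Hypothetical helper: returns one array per <entry> in the feed.
function parseEntries($response)
{
    $xml = simplexml_load_string($response);
    if ($xml === false) {
        return array(); // not well-formed XML
    }
    $entries = array();
    foreach ($xml->entry as $crtEntry) {
        $entries[] = array(
            // cast to string, otherwise the values stay SimpleXMLElement objects
            'account' => (string) $crtEntry->author->uri,
            'image'   => (string) $crtEntry->link[1]->attributes()->href,
            'tweet'   => (string) $crtEntry->content,
        );
    }
    return $entries;
}

print_r(parseEntries($response));
```

Note that the escaped markup inside content comes back unescaped, which is why the tweet text can be printed as HTML later on.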

So we can get the link to the poster's account, the image and the tweet itself right away.
To get the name, we need a little parsing. The name is sent this way: "username (Full
Name)". I prefer to show just the full name, so I'm using a simple regexp:

$this->realNamePattern = '/\((.*?)\)/';
preg_match($this->realNamePattern, $crtEntry->author->name, $matches);
$name = $matches[1];
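As a small standalone sketch of the same idea, the function below wraps the pattern and falls back to the raw value when there are no parentheses. The function name extractFullName() and the fallback are assumptions added for this example; the class itself reads $matches[1] directly.

```php
<?php
// Hypothetical helper: pulls "Full Name" out of "username (Full Name)".
function extractFullName($authorName)
{
    // non-greedy match of whatever sits between the first pair of parentheses
    if (preg_match('/\((.*?)\)/', $authorName, $matches)) {
        return $matches[1];
    }
    // assumption: with no parentheses, show the value unchanged
    return $authorName;
}

echo extractFullName('opexxx (alex knorr)'), "\n";
```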

Making the links open in a new window is also simple:


$tweet = str_replace('<a href=', '<a target="_blank" href=', $crtEntry->content);
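Here is a minimal sketch of that replacement on a sample string, assuming every anchor in the tweet starts with exactly `<a href=`, as in the feed shown earlier. The function name openLinksInNewWindow() is made up for this example.

```php
<?php
// Hypothetical helper: adds target="_blank" to every anchor that starts with '<a href='.
function openLinksInNewWindow($html)
{
    return str_replace('<a href=', '<a target="_blank" href=', $html);
}

$sample = 'See <a href="http://bit.ly/b3uFkK">http://bit.ly/b3uFkK</a>';
echo openLinksInNewWindow($sample), "\n";
```

A plain string replacement is enough here because the markup comes from a single source in a predictable shape; anchors written differently would be left untouched.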

Next is using relative time instead of absolute time. This is a matter of personal taste, but
considering how quickly new tweets are added, it's worth doing.
For this we'll use two arrays, one with various interval names, the other with the
number of seconds in that interval, e.g. an hour has 3600 seconds and so on.

$this->intervalNames = array('second', 'minute', 'hour', 'day', 'week', 'month', 'year');
$this->intervalSeconds = array(1, 60, 3600, 86400, 604800, 2630880, 31570560);

The idea is this: we calculate the difference in seconds between the current time and
the tweet time, and then we walk the interval array from the largest value to the
smallest until we find a value that is no larger than our difference. For
example, if our calculated difference is 173000 seconds, we start with the last value in
the array, that is 31570560, and look until we find the value 86400, which corresponds to
the day interval. Now we know our difference is at least one day but less than one
week. By dividing the difference by the interval length, that is 173000/86400, we get
2.002, which is just a little over two days, and we round it down to 2. If the rounded
result is exactly 1, we must use the singular form, i.e. "day", otherwise the plural, "days".
So here's the code that does all that:

$time = 'just now';
$secondsPassed = time() - strtotime($crtEntry->published);
if ($secondsPassed>0)
{
    // see what interval are we in
    for($j = count($this->intervalSeconds)-1; ($j >= 0); $j--)
    {
        $crtIntervalName = $this->intervalNames[$j];
        $crtInterval = $this->intervalSeconds[$j];
        if ($secondsPassed >= $crtInterval)
        {
            $value = floor($secondsPassed / $crtInterval);
            if ($value > 1)
                $crtIntervalName .= 's';
            $time = $value . ' ' . $crtIntervalName . ' ago';
            // stop at the largest interval that fits
            break;
        }
    }
}
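To check the arithmetic from the worked example, the same logic can be sketched as a standalone function that takes the number of seconds directly. The function name relativeTime() is an assumption for this illustration; the arrays are the ones defined above.

```php
<?php
// Hypothetical standalone version of the relative time logic.
function relativeTime($secondsPassed)
{
    $intervalNames   = array('second', 'minute', 'hour', 'day', 'week', 'month', 'year');
    $intervalSeconds = array(1, 60, 3600, 86400, 604800, 2630880, 31570560);

    // zero or negative differences (e.g. a wrong server clock) count as "just now"
    if ($secondsPassed <= 0) {
        return 'just now';
    }

    // walk from the largest interval down to the smallest
    for ($j = count($intervalSeconds) - 1; $j >= 0; $j--) {
        if ($secondsPassed >= $intervalSeconds[$j]) {
            $value = (int) floor($secondsPassed / $intervalSeconds[$j]);
            $name  = $intervalNames[$j] . ($value > 1 ? 's' : '');
            return $value . ' ' . $name . ' ago';
        }
    }
    return 'just now';
}

echo relativeTime(173000), "\n"; // the worked example: 173000 / 86400 rounds down to 2
```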

Finally, there's the filtering. Depending on your site's audience you may or may not need
such a filter; I'm including it just in case.
You'd have a list of banned words in an array, like this:

$this->badWords = array('bannedword', 'anotherbannedword');

and the code:


$foundBadWord = false;
foreach ($this->badWords as $badWord)
{
    if(stristr($tweet, $badWord) !== FALSE)
    {
        $foundBadWord = true;
        break;
    }
}
// skip this tweet containing a banned word
if ($foundBadWord)
    continue;
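The check can also be sketched as a small reusable function. The name containsBannedWord() is made up for this example, and it uses stripos(), which performs the same case-insensitive substring test as the stristr() call above.

```php
<?php
// Hypothetical helper: true if any banned word occurs anywhere in the text.
function containsBannedWord($tweet, $badWords)
{
    foreach ($badWords as $badWord) {
        // case-insensitive substring search; position 0 is a valid hit, hence !== false
        if (stripos($tweet, $badWord) !== false) {
            return true;
        }
    }
    return false;
}

$badWords = array('bannedword', 'anotherbannedword');
var_dump(containsBannedWord('This has a BannedWord in it', $badWords));
```

Keep in mind that this is a substring match, so a banned word that happens to appear inside a longer, harmless word will also cause the tweet to be skipped.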

Now let's put everything together:

The complete class

<?php
class twitter_class
{
    function __construct()
    {
        $this->realNamePattern = '/\((.*?)\)/';
        $this->searchURL = 'http://search.twitter.com/search.atom?lang=en&q=';

        $this->intervalNames = array('second', 'minute', 'hour', 'day', 'week', 'month', 'year');
        $this->intervalSeconds = array(1, 60, 3600, 86400, 604800, 2630880, 31570560);

        $this->badWords = array('bannedword', 'anotherbannedword');
    }

    function getTweets($q, $limit=15)
    {
        $output = '';

        // get the search result
        $ch = curl_init($this->searchURL . urlencode($q));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
        $response = curl_exec($ch);

        if ($response !== FALSE)
        {
            $xml = simplexml_load_string($response);
            $tweets = 0;

            for($i=0; $i<count($xml->entry); $i++)
            {
                $crtEntry = $xml->entry[$i];
                $account  = $crtEntry->author->uri;
                $image    = $crtEntry->link[1]->attributes()->href;
                $tweet    = str_replace('<a href=', '<a target="_blank" href=', $crtEntry->content);

                // skip this tweet if it contains a banned word
                $foundBadWord = false;
                foreach ($this->badWords as $badWord)
                {
                    if(stristr($tweet, $badWord) !== FALSE)
                    {
                        $foundBadWord = true;
                        break;
                    }
                }
                if ($foundBadWord)
                    continue;

                // stop once the requested number of tweets has been collected
                if ($tweets >= $limit)
                    break;
                $tweets++;

                // the name arrives as "username (Full Name)"
                preg_match($this->realNamePattern, $crtEntry->author->name, $matches);
                $name = $matches[1];

                // get the time passed between now and the time of tweet, don't allow for negative
                // (future) values that may have occurred if server time is wrong
                $time = 'just now';
                $secondsPassed = time() - strtotime($crtEntry->published);
                if ($secondsPassed>0)
                {
                    // see what interval are we in
                    for($j = count($this->intervalSeconds)-1; ($j >= 0); $j--)
                    {
                        $crtIntervalName = $this->intervalNames[$j];
                        $crtInterval = $this->intervalSeconds[$j];
                        if ($secondsPassed >= $crtInterval)
                        {
                            $value = floor($secondsPassed / $crtInterval);
                            if ($value > 1)
                                $crtIntervalName .= 's';
                            $time = $value . ' ' . $crtIntervalName . ' ago';
                            break;
                        }
                    }
                }

                // The HTML tags of the original listing did not survive; the markup
                // and class names below are assumed placeholders, adjust them to taste.
                $output .= '<div class="tweet">'
                    . '<a href="' . $account . '"><img src="' . $image . '" alt="" /></a> '
                    . '<a href="' . $account . '">' . $name . '</a>: '
                    . $tweet
                    . ' <span class="time">' . $time . '</span>'
                    . '</div>';
            }
        }
        else
            $output = '<div class="tweet">' . curl_error($ch) . '</div>'; // placeholder markup, see above
        curl_close($ch);
        return $output;
    }
}
?>
To use the class in another PHP file, you'd do it like this:

<?php
require('twitter.class.php');
$twitter = new twitter_class();
$q = 'your_query';
$output = $twitter->getTweets($q, 5);
echo $output;
?>

This will show the latest 5 tweets for your query.


You can style the results any way you want. Styling is outside the scope of this tutorial,
but you can look at the end of the class to see the HTML tags and classes that are
generated.

Further improvement
Given the quasi-real-time nature of Twitter (depending on the topic, tweets get
published every moment), you may want to use Ajax to load new tweets. You can give
an id to each tweet (usually the timestamp) and modify the PHP to return only tweets
newer than that timestamp. You can use either an Ajax library like jQuery or Flash to load
and show the new tweets, and then make a new request a few seconds later, specifying the
latest id.
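On the PHP side, the "only newer tweets" part could look roughly like the sketch below. The function name filterNewerThan(), the array layout and the sample data are assumptions for this illustration; the class above builds HTML directly and does not keep tweets in such an array.

```php
<?php
// Hypothetical helper: keeps only the tweets published after a given Unix timestamp.
function filterNewerThan($tweets, $since)
{
    $result = array();
    foreach ($tweets as $tweet) {
        // 'published' holds the ISO 8601 date from the feed, e.g. 2010-05-16T14:08:54Z
        if (strtotime($tweet['published']) > $since) {
            $result[] = $tweet;
        }
    }
    return $result;
}

// Made-up sample data
$tweets = array(
    array('published' => '2010-05-16T14:08:54Z', 'text' => 'older'),
    array('published' => '2010-05-16T14:10:00Z', 'text' => 'newer'),
);

// timestamp of the latest tweet the page already shows
$since = strtotime('2010-05-16T14:09:00Z');
print_r(filterNewerThan($tweets, $since));
```

The client would send the timestamp of the newest tweet it has, and the server would answer with whatever this filter lets through.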

Download the code


Download PHP Twitter Reader
