# twitterscraper-ruby
A gem to scrape https://twitter.com/search. This gem is inspired by taspinar/twitterscraper.
Please feel free to ask @ts_3156 if you have any questions.
## Twitter Search API vs. twitterscraper-ruby
#### Twitter Search API
- The number of tweets: 180 - 450 requests/15 minutes (18,000 - 45,000 tweets/15 minutes)
- The time window: the past 7 days
#### twitterscraper-ruby
- The number of tweets: Unlimited
- The time window: from 2006-03-21 to today
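The tweet counts quoted for the Search API follow from its page size. The sketch below shows the arithmetic, assuming the standard search endpoint returns at most 100 tweets per request and allows 180 requests per 15 minutes with user auth and 450 with app-only auth. These limits apply to the official API only, not to twitterscraper-ruby.

```ruby
# Assumption: the standard search endpoint returns at most 100 tweets per request.
TWEETS_PER_REQUEST = 100

# Assumption: 180 requests/15 min with user auth, 450 with app-only auth.
REQUEST_LIMITS = { user_auth: 180, app_auth: 450 }.freeze

# Upper bound on tweets retrievable in one 15-minute rate-limit window.
def tweets_per_window(requests)
  requests * TWEETS_PER_REQUEST
end

REQUEST_LIMITS.each do |auth, requests|
  puts "#{auth}: #{tweets_per_window(requests)} tweets/15 minutes"
end
```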
## Installation
First, install the library:
```shell script
$ gem install twitterscraper-ruby
```
## Usage
#### Command-line interface:
Returns a collection of relevant tweets matching a specified query.
```shell script
$ twitterscraper --type search --query KEYWORD --start_date 2020-06-01 --end_date 2020-06-30 --lang ja \
--limit 100 --threads 10 --output tweets.json
```
Returns a collection of the most recent tweets posted by the user indicated by the screen_name.
```shell script
$ twitterscraper --type user --query SCREEN_NAME --limit 100 --output tweets.json
```
#### From Within Ruby:
```ruby
require 'twitterscraper'
client = Twitterscraper::Client.new(cache: true, proxy: true)
```
Returns a collection of relevant tweets matching a specified query.
```ruby
tweets = client.search('KEYWORD', start_date: '2020-06-01', end_date: '2020-06-30', lang: 'ja', limit: 100, threads: 10)
```
Returns a collection of the most recent tweets posted by the user indicated by the screen_name.
```ruby
tweets = client.user_timeline('SCREEN_NAME', limit: 100)
```
## Examples
```shell script
$ twitterscraper --query twitter --limit 1000
$ cat tweets.json | jq . | less
```
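If `jq` is not available, the output file can be inspected from Ruby with the standard `json` library. This is a minimal sketch assuming `tweets.json` holds a JSON array of tweet objects, as in the sample under Attributes; `top_liked` is a hypothetical helper, not part of the gem.

```ruby
require 'json'

# Hypothetical helper (not part of the gem): load the JSON array written by
# `twitterscraper --output tweets.json` and return the n most-liked tweets.
def top_liked(path, n = 5)
  tweets = JSON.parse(File.read(path))
  # max_by(n) returns up to n elements, highest `likes` first; missing counts become 0.
  tweets.max_by(n) { |tweet| tweet['likes'].to_i }
end
```

For example, `top_liked('tweets.json').each { |t| puts "#{t['likes']}\t#{t['tweet_url']}" }` prints a like count and URL per line.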
## Attributes
### Tweet
```ruby
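tweets.each do |tweet|
  puts tweet.tweet_id
  puts tweet.text
  puts tweet.tweet_url
  puts tweet.created_at

  hash = tweet.attrs      # all attributes of the tweet
  attr_names = hash.keys  # read the keys only after `hash` is assigned
  json = tweet.to_json    # the same attributes serialized as JSON
end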
```
```json
[
{
"screen_name": "@name",
"name": "Name",
"user_id": 12340000,
"profile_image_url": "https://pbs.twimg.com/profile_images/1826000000/0000.png",
"tweet_id": 1234000000000000,
"text": "Thanks Twitter!",
"links": [],
"hashtags": [],
"image_urls": [],
"video_url": null,
"has_media": null,
"likes": 10,
"retweets": 20,
"replies": 0,
"is_replied": false,
"is_reply_to": false,
"parent_tweet_id": null,
"reply_to_users": [],
"tweet_url": "https://twitter.com/name/status/1234000000000000",
"timestamp": 1594793000,
"created_at": "2020-07-15 00:00:00 +0000"
}
]
```
- screen_name
- name
- user_id
- profile_image_url
- tweet_id
- text
- links
- hashtags
- image_urls
- video_url
- has_media
- likes
- retweets
- replies
- is_replied
- is_reply_to
- parent_tweet_id
- reply_to_users
- tweet_url
- timestamp
- created_at
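Because each tweet serializes to a flat object with the keys listed above, post-processing needs only the standard library. The sketch below parses a trimmed copy of the sample record and totals `likes`, `retweets` and `replies`; `engagement` is a hypothetical helper, and treating `is_reply_to: true` as "this tweet is a reply" is an assumption.

```ruby
require 'json'

# A trimmed copy of the sample record shown above.
SAMPLE = <<~JSON
  [{"screen_name": "@name", "tweet_id": 1234000000000000, "text": "Thanks Twitter!",
    "hashtags": [], "likes": 10, "retweets": 20, "replies": 0, "is_reply_to": false}]
JSON

# Hypothetical helper: total interactions for one tweet hash.
def engagement(tweet)
  tweet.values_at('likes', 'retweets', 'replies').sum(&:to_i)
end

tweets = JSON.parse(SAMPLE)
# Assumption: is_reply_to == true marks a reply, so this keeps top-level tweets only.
originals = tweets.reject { |t| t['is_reply_to'] }
originals.each { |t| puts "#{t['tweet_id']}: #{engagement(t)}" }
```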
## Search operators
| Operator | Finds Tweets... |
| ------------- | ------------- |
| watching now | containing both "watching" and "now". This is the default operator. |
| "happy hour" | containing the exact phrase "happy hour". |
| love OR hate | containing either "love" or "hate" (or both). |
| beer -root | containing "beer" but not "root". |
| #haiku | containing the hashtag "haiku". |
| from:interior | sent from Twitter account "interior". |
| to:NASA | sent in reply to Twitter account "NASA". |
| @NASA | mentioning Twitter account "NASA". |
| puppy filter:media | containing "puppy" and an image or video. |
| puppy -filter:retweets | containing "puppy", excluding retweets. |
| superhero since:2015-12-21 | containing "superhero" and sent since the date "2015-12-21" (year-month-day). |
| puppy until:2015-12-21 | containing "puppy" and sent before the date "2015-12-21". |
The operators are documented in [Standard search operators](https://developer.twitter.com/en/docs/tweets/rules-and-filtering/overview/standard-operators).
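Operators combine by joining them with spaces, so a query string can be assembled programmatically. The helper below is a hypothetical sketch, not part of twitterscraper-ruby; it only builds the string that you would then pass to `--query` or to `client.search`. Dates can also be supplied through `--start_date`/`--end_date` instead of `since:`/`until:`.

```ruby
# Hypothetical helper (not part of twitterscraper-ruby): assemble a query string
# from the operators in the table above. `until` is a Ruby keyword, hence `until_date`.
def build_query(words, from: nil, to: nil, since: nil, until_date: nil,
                media_only: false, exclude_retweets: false)
  parts = [words]
  parts << "from:#{from}" if from
  parts << "to:#{to}" if to
  parts << "since:#{since}" if since
  parts << "until:#{until_date}" if until_date
  parts << 'filter:media' if media_only
  parts << '-filter:retweets' if exclude_retweets
  parts.join(' ')
end

puts build_query('puppy', since: '2015-12-21', exclude_retweets: true)
```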
## CLI Options
| Option | Type | Description | Value |
| ------------- | ------------- | ------------- | ------------- |
| `--help` | | Display a summary of twitterscraper's options. | |
| `--type` | string | Specify a search type. | search(default) or user |
| `--query` | string | Specify a keyword used during the search. | |
| `--start_date` | string | Added to your query as "since:yyyy-mm-dd", meaning "since the date". | |
| `--end_date` | string | Added to your query as "until:yyyy-mm-dd", meaning "before the date". | |
| `--lang` | string | Retrieve tweets written in a specific language. | |
| `--limit` | integer | Stop scraping when *at least* the number of tweets indicated with --limit is scraped. | 100 |
| `--order` | string | Sort order of the results. | desc(default) or asc |
| `--threads` | integer | Set the number of threads twitterscraper-ruby should initiate while scraping for your query. | 2 |
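| `--threads_granularity` | string | Split the date range into per-day or per-hour queries for the threads. | day, hour or auto(default) |
| `--chart_grouping` | string | Time unit for grouping tweets in the chart. | day, hour, minute or auto(default) |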
| `--proxy` | boolean | Scrape https://twitter.com/search via proxies. | true(default) or false |
| `--cache` | boolean | Enable caching. | true(default) or false |
| `--format` | string | The format of the output. | json(default) or html |
| `--output` | string | The name of the output file. | tweets.json |
| `--verbose` | | Print debug messages. | |
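To illustrate how `--start_date`, `--end_date` and `--threads` can work together, here is a hypothetical sketch, not the gem's implementation: the end-exclusive date range is split into one `since:`/`until:` query per day (roughly what a day granularity implies), and the queries are shared among a fixed number of threads through a `Queue`. The block passed to `run_in_threads` stands in for the actual scraping.

```ruby
require 'date'

# Hypothetical: one query per day in [start_date, end_date); end is exclusive like until:.
def daily_queries(keyword, start_date, end_date)
  first = Date.parse(start_date)
  last = Date.parse(end_date)
  (first...last).map do |day|
    "#{keyword} since:#{day.iso8601} until:#{day.next_day.iso8601}"
  end
end

# Hypothetical: run the block once per query on `threads` worker threads.
def run_in_threads(queries, threads: 2)
  jobs = Queue.new
  queries.each { |q| jobs << q }
  jobs.close # pop returns nil once the closed queue is drained
  results = Queue.new
  workers = Array.new(threads) do
    Thread.new do
      while (q = jobs.pop)
        results << [q, yield(q)]
      end
    end
  end
  workers.each(&:join)
  # Threads finish in any order, so sort by query for stable output.
  Array.new(results.size) { results.pop }.sort_by(&:first)
end

queries = daily_queries('KEYWORD', '2020-06-01', '2020-06-04')
run_in_threads(queries, threads: 2) { |q| q.length }.each do |q, value|
  puts "#{q} -> #{value}"
end
```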
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/ts-3156/twitterscraper-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/ts-3156/twitterscraper-ruby/blob/master/CODE_OF_CONDUCT.md).
## License
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
## Code of Conduct
Everyone interacting in the twitterscraper-ruby project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/ts-3156/twitterscraper-ruby/blob/master/CODE_OF_CONDUCT.md).