Module: Tushare::Stock::NewsEvent
- Extended by:
- Util
- Defined in:
- lib/tushare/stock/news_event.rb
Overview
News and event data interface.
Class Method Summary
- ._guba_content(url) ⇒ Object
- ._random(n = 16) ⇒ Object
- .guba_sina(show_content = false) ⇒ Object
  Returns a DataFrame with columns title (message title), content (message body, only when show_content is true), ptime (publication time), and rcounts (read count).
- .latest_content(url) ⇒ Object
  Returns a String containing the text of the news article.
- .latest_news(top = PAGE_NUM[2], show_content = false) ⇒ Object
  Returns a DataFrame with columns classify (news category), title (news title), time (publication time), url (news link), and content (news body, only when show_content is true).
- .notices(code, date = nil) ⇒ Object
  Returns a DataFrame with columns title (notice title), type (notice type), date (announcement date), and url (URL of the notice content).
Instance Method Summary
- #notice_content(url) ⇒ Object
  Returns a String containing the notice content.
Methods included from Util
_code_to_symbol, _write_console, _write_head, check_quarter, check_year, fetch_ftp_file, holiday?, trade_cal
Class Method Details
._guba_content(url) ⇒ Object
# File 'lib/tushare/stock/news_event.rb', line 141
def _guba_content(url)
  doc = Nokogiri::HTML(open(url), nil, 'gbk')
  content = doc.css('div.ilt_p p').text
  ptime = doc.css('div.fl_left.iltp_time span').text
  rcounts_text = doc.css('div.fl_right.iltp_span span').text
  rcounts = rcounts_text.gsub(/\D/, '')
  { content: content, ptime: ptime, rcounts: rcounts }
end
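In the listing above, rcounts is obtained by deleting every non-digit character from the text of the read-count element. A minimal sketch of that single step follows. The helper name digits_only and the sample string are assumptions for illustration; the actual page text is not shown in this documentation.

```ruby
# Mirrors the digit-extraction step in _guba_content.
# `digits_only` is a hypothetical helper name, not part of Tushare.
def digits_only(text)
  text.gsub(/\D/, '') # remove every character that is not a digit
end

sample_text = 'Reads: 1,234' # assumed format, for illustration only
puts digits_only(sample_text)
```

The result is a String, so callers that need a number would have to convert it, for example with to_i.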
._random(n = 16) ⇒ Object
# File 'lib/tushare/stock/news_event.rb', line 163
def _random(n = 16)
  start_point = 10 ** (n - 1)
  end_point = (10 ** n) - 1
  rand(start_point..end_point).to_s
end
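_random returns a String of exactly n decimal digits: it draws an integer between 10**(n - 1) and 10**n - 1 inclusive, so the leading digit is never zero. In latest_news the value is passed as the final argument when formatting LATEST_URL. The sketch below repeats the same arithmetic under a hypothetical name (random_digits) so it can run on its own.

```ruby
# Same arithmetic as _random, under a hypothetical standalone name.
def random_digits(n = 16)
  start_point = 10**(n - 1) # smallest n-digit integer
  end_point = (10**n) - 1   # largest n-digit integer
  rand(start_point..end_point).to_s
end

token = random_digits
puts token.length
```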
.guba_sina(show_content = false) ⇒ Object
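Returns
DataFrame
title: message title
content: message body (only when show_content is true)
ptime: publication time
rcounts: read count
Bug fix, 2017-05-04: fixed an error raised while fetching Guba (stock forum) data.
1. When collecting the latest message links, the links in heads point to live-stream rooms, so their content cannot be read.
2. The li elements in res (the latest articles) can be empty tags; a check was added in the code to skip these empty links.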
# File 'lib/tushare/stock/news_event.rb', line 109
def guba_sina(show_content = false)
  url = format(GUBA_SINA_URL, P_TYPE['http'], DOMAINS['sina'])
  doc = Nokogiri::HTML(open(url), nil, 'gbk')
  res = doc.css('ul.list_05 li')
  heads = doc.css('div.tit_04')
  result = []
  heads.each do |head|
    object = {}
    link = head.css('a').first
    object[:title] = link.content
    object[:url] = link.attr('href')
    #object.merge!(_guba_content(object[:url]))
    object.delete(:content) unless show_content
    result << object
  end
  res.each do |row|
    object = {}
    unless row.css('a').length<2
      link = row.css('a')[1]
      object[:title] = link.text
      object[:url] = link.attr('href')
      object.merge!(_guba_content(object[:url]))
      object.delete(:content) unless show_content
      result << object
    end
  end
  result
end
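As the listing shows, the method builds an Array of Hash rows. Rows taken from heads carry only :title and :url, because the call to _guba_content is commented out for them. Rows taken from res are merged with the detail Hash and then lose :content unless show_content is true. The sketch below reproduces the second case without any network access; fake_detail and build_row are hypothetical names, and the detail values and URL are made up.

```ruby
# Hypothetical stand-in for _guba_content: returns fixed data instead of
# fetching and parsing a page.
def fake_detail(_url)
  { content: 'body text', ptime: '05-04 10:00', rcounts: '42' }
end

# Builds one result row the way the second loop in guba_sina does.
def build_row(title, url, show_content)
  row = { title: title, url: url }
  row.merge!(fake_detail(url))
  row.delete(:content) unless show_content
  row
end

p build_row('Sample post', 'http://example.invalid/post/1', false).keys
p build_row('Sample post', 'http://example.invalid/post/1', true).keys
```

Note that the content is still fetched for every row in res even when show_content is false; the flag only controls whether the :content key is kept.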
.latest_content(url) ⇒ Object
Returns
String: the text content of the news article
# File 'lib/tushare/stock/news_event.rb', line 158
def latest_content(url)
  doc = Nokogiri::HTML(open(url))
  doc.css('div#artibody p').text
end
.latest_news(top = PAGE_NUM[2], show_content = false) ⇒ Object
Returns
DataFrame
classify: news category
title: news title
time: publication time
url: news link
content: news body (only when show_content is true)
# File 'lib/tushare/stock/news_event.rb', line 24
def latest_news(top = PAGE_NUM[2], show_content = false)
  url = format(LATEST_URL, P_TYPE['http'], DOMAINS['sina'], PAGES['lnews'],
               top, _random)
  resp = HTTParty.get(url)
  events = eval(resp.body.encode('utf-8', 'gbk').sub('var ', '')
                    .delete(' '))[:list]
  result = []
  events.each do |event|
    object = {}
    object['classify'] = event[:channel][:title]
    object['title'] = event[:title]
    object['url'] = event[:url]
    object['time'] = Time.at(event[:time]).strftime('%m-%d %H:%M')
    object['content'] = latest_content(object['url']) if show_content
    result << object
  end
  result
end
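Each decoded event is a Hash with symbol keys, and the method copies four fields into a row with String keys, formatting the epoch timestamp as month-day hour:minute. The sketch below shows that mapping for one event. event_to_row and the sample data are assumptions for illustration, and UTC is used only so the output does not depend on the local time zone (the source formats in local time).

```ruby
# Converts one decoded event into a result row, following latest_news.
# `event_to_row` is a hypothetical helper, not part of Tushare.
def event_to_row(event)
  {
    'classify' => event[:channel][:title],
    'title' => event[:title],
    'url' => event[:url],
    # .utc is an addition for reproducible output; the source omits it.
    'time' => Time.at(event[:time]).utc.strftime('%m-%d %H:%M')
  }
end

# Made-up event; 1493856000 is 2017-05-04 00:00:00 UTC.
sample_event = {
  channel: { title: 'Finance' },
  title: 'Sample headline',
  url: 'http://example.invalid/news/1',
  time: 1_493_856_000
}
p event_to_row(sample_event)
```

Because the format string omits the year, rows from different years cannot be told apart by the time field alone. Separately, the source passes the response body to eval, which executes whatever the server returns, so the method is only as trustworthy as the endpoint it calls.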
.notices(code, date = nil) ⇒ Object
Returns
DataFrame with the following columns:
title: notice title
type: notice type
date: announcement date
url: URL of the notice content
# File 'lib/tushare/stock/news_event.rb', line 56
def notices(code, date = nil)
  return nil if code.nil?
  symbol = code[0] == '6' ? "sh#{code}" : "sz#{code}"
  url = format(NOTICE_INFO_URL, P_TYPE['http'], DOMAINS['vsf'],
               PAGES['ntinfo'], symbol)
  url = "#{url}&gg_date=#{date}" if date
  doc = Nokogiri::HTML(open(url), nil, 'gbk')
  rows = doc.css('table.body_table tbody tr')
  result = []
  rows.each do |row|
    object = {}
    a = row.css('th a')[0]
    tds = row.css('td')
    object['title'] = a.content
    object['type'] = tds[0].content
    object['date'] = tds[1].content
    object['url'] = "#{P_TYPE['http']}#{DOMAINS['vsf']}#{a.attr('href')}"
    result << object
  end
  result
end
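Two steps in the listing determine which page is requested: codes whose first character is '6' get the sh prefix and all others get sz, and a gg_date query parameter is appended only when date is given. The sketch below isolates both steps. The helper names, the base URL, and the date format are assumptions; NOTICE_INFO_URL and the format the server expects are not shown in this documentation.

```ruby
# Hypothetical helpers mirroring two steps of notices.
def notice_symbol(code)
  code[0] == '6' ? "sh#{code}" : "sz#{code}"
end

def with_notice_date(url, date = nil)
  date ? "#{url}&gg_date=#{date}" : url
end

# Placeholder base URL; the real template is NOTICE_INFO_URL.
base = "http://example.invalid/notices?symbol=#{notice_symbol('600848')}"
puts with_notice_date(base, '2015-01-01') # date format is assumed
puts with_notice_date(base)
```

The code should be passed as a String. With an Integer, code[0] returns the lowest bit rather than the first digit, so the comparison with '6' never matches and every code receives the sz prefix.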
Instance Method Details
#notice_content(url) ⇒ Object
Returns
String: the notice content
# File 'lib/tushare/stock/news_event.rb', line 86
def notice_content(url)
  doc = Nokogiri::HTML(open(url))
  doc.css('div#content pre')[0].content.strip
end