963 commits

Author SHA1 Message Date
Dominik Sander
9d8d371fc8 Order TwitterStreamAgents in setup_workers
Enforcing the order ensures the config hash stays identical when the
Agent configuration does not change.

 #1191
2017-01-31 23:43:08 +01:00
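
A minimal sketch of why a fixed order keeps the config hash stable. This is not the project's `setup_workers` code: the function name `config_hash`, the dict layout, and sorting by `id` are assumptions made for illustration.

```python
import hashlib
import json


def config_hash(agents):
    """Return a digest of the worker configuration.

    `agents` is a list of dicts with an "id" key (a hypothetical layout).
    Sorting first means the digest depends only on the configuration,
    not on the order in which the agents happened to be loaded.
    """
    ordered = sorted(agents, key=lambda agent: agent["id"])
    payload = json.dumps(ordered, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first_load = [{"id": 2, "filter": ["ruby"]}, {"id": 1, "filter": ["huginn"]}]
second_load = list(reversed(first_load))

# The same agents in a different load order produce the same digest.
same = config_hash(first_load) == config_hash(second_load)
```

Without the `sorted` call the two lists would serialize differently, and a restart would be triggered even though nothing in the configuration changed.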
Thiago Talma
51b9bbca17 Allow Redirect Requests (#1881)
* Allow Redirect Requests

* specs

* type cast
2017-01-28 16:58:41 -05:00
Dominik Sander
410ece8682 Merge pull request #1877 from dsander/fix-scenario-import
Fix scenario import when merges are required
2017-01-20 13:29:26 +01:00
Akinori MUSHA
7d1c87ddbd Slightly fix the regexps
The quantifier `+?` does not make sense here, so I guess you actually
meant this.
2017-01-19 17:38:27 +09:00
Dominik Sander
e2917e8854 Fix scenario import when merges are required
Fixes #1875
2017-01-17 22:00:38 +01:00
Andrew Cantino
af5c5b5f62 Odds and ends (#1866)
* Fix brittle spec

* remove unused module

* Fix another flaky test
2017-01-08 13:55:05 -05:00
Akinori MUSHA
a5874da0ae Expose Agent#id to Liquid (and to JavaScriptAgent) 2017-01-05 23:56:42 +09:00
Jin Liu
ed85ad5ed3 Allow weibo publish agent tweet with a picture (#1336)
* can publish with pictures

* refactor and add specs

* Take care of the case where most_recent_event is nil.

* Sleep when necessary
2017-01-03 07:54:42 -05:00
Dominik Sander
51f2d095df Merge pull request #1847 from dsander/tweet-mode-extended
Ensure Twitter REST API calls always get extended tweets
2017-01-03 09:51:39 +01:00
Akinori MUSHA
f8cf6bf9c6 Add a new option include_sort_info
If enabled, all events created by the Agent will have a `sort_info` key
whose value is a hash containing the keys `position` and `count`.

This overrides #1768.
2017-01-01 03:27:24 +09:00
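
A rough illustration of the payload shape the message describes. The helper name `add_sort_info` is hypothetical, and counting `position` from 1 is an assumption; the commit message only names the keys.

```python
def add_sort_info(payloads):
    """Attach a `sort_info` hash to each payload created in one batch."""
    count = len(payloads)
    return [
        # Assumption: positions are 1-based.
        dict(payload, sort_info={"position": position, "count": count})
        for position, payload in enumerate(payloads, start=1)
    ]


events = add_sort_info([{"title": "first"}, {"title": "second"}])
```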
Akinori MUSHA
fe3b43ba56 Merge branch 'master' into fix_http_status_agent 2017-01-01 02:54:59 +09:00
Akinori MUSHA
21c166f455 Do not let values in a received event override option values
There's no need to, because Agent options can interpolate values in an
event payload.
2017-01-01 02:48:33 +09:00
Dominik Sander
d700afa9b7 Ensure Twitter REST API calls always get extended tweets
Tweets that include embeds are truncated by default (https://dev.twitter.com/overview/api/upcoming-changes-to-tweets).
Passing tweet_mode=extended to the REST API calls ensures we get the full response, including embedded images and
videos.
2016-12-30 13:22:37 +01:00
Andrew Cantino
65cea03062 Fix spec failures 2016-12-20 13:32:48 -05:00
Irfan Charania
4e2d1775a6 PhantomJs Cloud Agent (#1503)
* Initial draft of PhantomJsCloudAgent

Generates event with url for fetching html/plainText content

* Add options

* Pass in event instead of url
Fix hash syntax
Remove whitespace
Add mode merge

* Add some tests

* Style changes

- Add link to wiki entry for manually creating agent with full set of
options
2016-12-20 12:44:54 -05:00
Akinori MUSHA
e26c07e75b Fix a merge conflict 2016-11-30 01:24:36 +09:00
Akinori MUSHA
3612ba7333 Add podcast tags to events emitted by RssAgent
The keys are only added when the feed is a podcast.
2016-11-29 23:59:32 +09:00
Andrew Cantino
8a14a57e00 Beeper.io is no more (#1808)
* Beeper.io is no more

* Avoid event propagation or scheduling of missing agents

* Update undefined_agents.html.erb with a link to the wiki
2016-11-27 15:30:50 -05:00
Akinori MUSHA
5f5f3cd38f Do not err if headers is a valid headers hash 2016-11-27 13:38:23 +09:00
Akinori MUSHA
9074f3115e Make sure status is an integer when set 2016-11-27 13:38:23 +09:00
Akinori MUSHA
a94cd7fd6d Allow an empty or null base URI 2016-11-27 13:38:23 +09:00
Akinori MUSHA
3a0c9e6274 Disable automatic URL normalization and absolutization on url
This was discussed in #1766.

For backward compatibility, existing WebsiteAgents with a key named
`url` will be given a `template` to resolve `url`.
2016-11-27 13:38:23 +09:00
Akinori MUSHA
3d91469733 Make WebsiteAgent merge template with the results of extract (#1816)
A new extraction option `hidden` is added so that keys marked with it are
excluded from the final payloads while still being usable in `template`.
2016-11-27 13:33:24 +09:00
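
A sketch of the merge behaviour described above, under stated assumptions: Python's `str.format` stands in for Liquid rendering, and the function name `apply_template` and its arguments are hypothetical.

```python
def apply_template(extracted, template, hidden=()):
    """Merge rendered template values into the extraction results.

    Hidden keys are available while rendering the template but are
    dropped from the final payload.
    """
    # Stand-in for Liquid: "{path}" plays the role of "{{ path }}".
    rendered = {key: value.format(**extracted) for key, value in template.items()}
    visible = {key: value for key, value in extracted.items() if key not in hidden}
    return {**visible, **rendered}


payload = apply_template(
    extracted={"title": "Hello", "path": "/a"},
    template={"url": "https://example.com{path}"},
    hidden=("path",),
)
```

Here `path` is used to build `url` but does not appear in `payload`, while `title` passes through unchanged.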
bobbysteel
f530305edc Add class of service chooser for Google Flights Agent (#1778)
* Add class of service chooser

* Add cabin chooser test

* Fix preferredCabin

* Per @cantino feedback taking out check
2016-11-26 15:13:00 -05:00
Akinori MUSHA
7ac691652b Spec that force_encoding works with encoding declaration in RssAgent 2016-11-26 13:26:09 +09:00
Akinori MUSHA
8d2ebe8fad Add a spec to test case-insensitivity with headers_to_save 2016-11-23 19:14:59 +09:00
Akinori MUSHA
29fd8aca28 Move the spec file for HttpStatusAgent where it belongs 2016-11-23 10:06:37 +09:00
Akinori MUSHA
c9e567edb3 Fix the specs for HttpStatusAgent
- Replace hand-made mocking for web requests with Webmock

- Stop overriding internal methods of Agent like `interpolated`, because
  that made the specs not reflect actual behavior
2016-11-23 10:06:37 +09:00
Akinori MUSHA
0b3700999b Fix a double-decoding problem in RssAgent
The SAX parser Feedjira uses (Nokogiri::XML::SAX) tries to detect the
encoding of a document from the content even if it is already known
and given.  This results in content being decoded twice, once by
WebRequestConcern and once by the SAX parser, if its encoding is
declared in both the Content-Type header and the XML declaration.

This commit makes RssAgent remove the `encoding` attribute from the
XML declaration of a document if the encoding is already known by the
Content-Type header.

Fixes #1797.
2016-11-22 12:14:28 +09:00
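
A minimal sketch of the idea in this fix: once the body has been decoded using the charset from the Content-Type header, drop the `encoding` attribute from the XML declaration so a downstream parser does not decode again. The regular expression and the function name `strip_declared_encoding` are illustrative assumptions, not the code from the commit.

```python
import re

# Matches the XML declaration up to and including its encoding attribute.
XML_DECL_ENCODING = re.compile(
    r'\A(\s*<\?xml\b[^>]*?)\s+encoding\s*=\s*(["\']).*?\2',
    re.DOTALL,
)


def strip_declared_encoding(xml_text, header_charset):
    """Remove `encoding="..."` from the declaration when the charset is already known."""
    if not header_charset:
        # Nothing was decoded from the header, so leave the hint for the parser.
        return xml_text
    return XML_DECL_ENCODING.sub(r"\1", xml_text, count=1)


feed = '<?xml version="1.0" encoding="Shift_JIS"?><rss/>'
cleaned = strip_declared_encoding(feed, "Shift_JIS")
```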
Akinori MUSHA
bd9455d5d0 Add a repeat option to extractors
This allows users to include a value that appears only once in the
content in every event created from that content.
2016-11-18 18:47:32 +09:00
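
One way to picture the `repeat` option, as a sketch only. The function `build_events`, its arguments, and the choice to repeat the first extracted value are assumptions, not the agent's actual implementation.

```python
def build_events(extracted, repeat=()):
    """Zip per-key value lists into events, repeating keys listed in `repeat`."""
    lengths = {len(values) for key, values in extracted.items() if key not in repeat}
    if len(lengths) != 1:
        raise ValueError("non-repeated extractors must yield the same number of values")
    count = lengths.pop()
    return [
        {
            # Assumption: a repeated key contributes its first value to every event.
            key: values[0] if key in repeat else values[index]
            for key, values in extracted.items()
        }
        for index in range(count)
    ]


page = {"site_title": ["My Blog"], "headline": ["Post A", "Post B"]}
events = build_events(page, repeat=("site_title",))
```

Without `repeat`, the mismatch between one `site_title` and two `headline` values raises an error in this sketch.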
Akinori MUSHA
50123dca53 Fix event_description broken in full JSON mode or without a template 2016-11-02 21:47:01 +09:00
Akinori MUSHA
4fe35b2a1f Reproduce #1765 2016-11-01 22:29:56 +09:00
Akinori MUSHA
91f096b16f Merge pull request #1743 from cantino/website_agent_can_interpolate_after_extraction
WebsiteAgent can interpolate after extraction

Incorporating feedback from @cantino and @dsander.
2016-11-01 20:20:37 +09:00
Alex Jordan
651eb50729 Fix another Stubhub HTTP URL 2016-10-31 20:58:06 -07:00
Alex Jordan
77da54ea0c Convert a bunch of HTTP links to HTTPS (#1757) 2016-10-31 19:21:03 -04:00
Akinori MUSHA
58fabb885c Add a new Liquid filter rebase_hrefs 2016-10-29 20:40:52 +09:00
Akinori MUSHA
8b897f5da3 Add Liquid variables _response_.url and _url_ to WebsiteAgent 2016-10-29 20:40:51 +09:00
Akinori MUSHA
fe35df8752 Add a new option template to WebsiteAgent
If given, it is used as a Liquid template for each event created by the
Agent, instead of directly emitting the results of extraction as events.

An existing spec needs to be fixed because WebsiteAgent now has the
`template` option, which may not be a hash of hashes.
2016-10-29 20:40:51 +09:00
Akinori MUSHA
faa2789a0c Fix the order of receivers in the DotHelper specs
This should fix occasional build failure on CI.
2016-10-27 16:31:24 +09:00
Akinori MUSHA
4f93db60e7 Merge pull request #1754 from cantino/ignore_empty_author
Ignore empty author and link entries in RssAgent.

Fixes #1753.
2016-10-27 13:07:21 +09:00
Akinori MUSHA
50b5833a3f Improve encoding detection in WebsiteAgent
Previously, WebsiteAgent always assumed that content with no charset
specified in the Content-Type header was encoded in UTF-8.  This
enhancement makes use of the encoding detector implemented in
Nokogiri for HTML/XML documents, instead of blindly falling back to
UTF-8.

When the document `type` is `html` or `xml`, WebsiteAgent tries to
detect the encoding of a fetched document from the presence of a BOM,
XML declaration, or HTML `meta` tag.

This fixes #1742.
2016-10-27 13:00:37 +09:00
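
A simplified sketch of the detection order named above (BOM, then XML declaration, then HTML `meta` tag). It is not Nokogiri's detector; the patterns, the 1024-byte window, and the function name `detect_encoding` are assumptions for illustration.

```python
import codecs
import re

BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]
XML_ENCODING = re.compile(rb'\A\s*<\?xml[^>]*\bencoding\s*=\s*["\']([\w.:-]+)["\']')
META_CHARSET = re.compile(rb'<meta[^>]*\bcharset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def detect_encoding(body, header_charset=None, default="utf-8"):
    """Guess the encoding of a fetched document given as bytes."""
    if header_charset:
        # A charset in the Content-Type header always wins.
        return header_charset.lower()
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    head = body[:1024]  # assumption: hints appear near the start of the document
    for pattern in (XML_ENCODING, META_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return default
```

Only when none of the hints is present does the sketch fall back to the default.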
Akinori MUSHA
2bb97b53bc Add failing specs for empty <link> elements 2016-10-27 09:17:40 +09:00
Akinori MUSHA
445665ee3a Add a failing test for #1753 2016-10-27 08:09:57 +09:00
Akinori MUSHA
cb0e8f68f9 Rename onethingwell.atom to .rss because it is actually an RSS file 2016-10-27 07:31:19 +09:00
Akinori MUSHA
7ed40a6901 Use the XPath expression string(.) instead of .//text()
That is the correct way to extract a raw string with all text nodes
concatenated without entity escaping.
2016-10-21 00:23:00 +09:00
Akinori MUSHA
0fcd8e285e Normalize URL in to_uri and uri_expand liquid filters 2016-10-17 15:02:57 +09:00
Dominik Sander
005f01a4ad Merge pull request #1716 from dsander/liquid-as-object
Add as_object Liquid filter
2016-10-14 12:53:24 +02:00
Dominik Sander
d2cbd04ac8 Add as_object Liquid filter
The `as_object` filter returns the received data/object as is, without casting it to a string as Liquid normally does. It
can be used as a JSONPath replacement or to emit the result of a Liquid filter chain as an array.

`catch` and `throw` need to be used to break out of the Liquid render chain. Liquid aggregates the output of every
expression into an array and [joins](https://github.com/Shopify/liquid/blob/v3.0.6/lib/liquid/block.rb#L147) it together;
that join makes it impossible to get anything other than a string out of a Liquid template.
2016-10-14 12:33:31 +02:00
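
A toy model of the problem and the workaround, with a Python exception standing in for Ruby's `catch`/`throw`. The names `render`, `as_object` and `ReturnObject` here are hypothetical and only mimic the shape of the approach.

```python
class ReturnObject(Exception):
    """Carries a raw value out of the render loop."""

    def __init__(self, value):
        super().__init__("as_object")
        self.value = value


def as_object(value):
    # Abort rendering and hand the untouched value to the caller.
    raise ReturnObject(value)


def render(expressions):
    """Evaluate each expression and join the outputs, as a template renderer would."""
    try:
        return "".join(str(expression()) for expression in expressions)
    except ReturnObject as escaped:
        return escaped.value


joined = render([lambda: 1, lambda: [2, 3]])
raw = render([lambda: as_object([2, 3])])
```

The normal path always ends in a string because of the join; only the escape hatch can return the list itself.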
Akinori MUSHA
12cecb8392 Merge branch 'data_output_agent_limits_events_after_ordering' 2016-10-07 19:22:42 +09:00
Akinori MUSHA
654da6a4e6 Add a failing test
More than 2 * events_to_show events are needed to check whether selection
before limiting actually works.
2016-10-07 19:21:57 +09:00