StorehausSourceTap #213

Open
rubanm wants to merge 1 commit into twitter:develop from rubanm:feature/storehaus_cascading

rubanm commented Feb 7, 2014

Addresses #208

This is an untested first draft that adds some basic cascading wiring.
Sending it out early for any design feedback as well as any missing big picture items.

I am thinking of having the source tap completely done before moving to the sink tap.

  • add source tap
  • add an example MapStore based source
  • add test
  • integration test with scalding + summingbird

I think ks could be weakened to Iterable[K], which can be lazy and does not need to answer .contains calls.

Actually, I think we are going to want a Spool[K] here, or even () => Spool[K] or the equivalent.

In fact, on the source side, maybe what we want is a Spool source and a conversion of the form
(Spool[K], ReadableStore[K, V]) => Spool[(K, Option[V])]

Then we can use this for even more cases.

That said, the sink side really looks like a Writable[K, V] I think.
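
To make the suggested shape concrete, here is a minimal sketch of the conversion. It is self-contained and uses stand-ins rather than the real APIs: `LazyList` (Scala 2.13+) plays the role of `Spool[K]`, and the hypothetical `SimpleReadableStore` plays the role of `ReadableStore[K, V]` with a synchronous `get`, whereas the real store returns a Future. The names `SpoolJoinSketch`, `lookupAll`, and `mapStore` are illustrative only.

```scala
// Stand-in, NOT the real storehaus API: a synchronous, single-key lookup.
trait SimpleReadableStore[K, V] {
  def get(k: K): Option[V]
}

object SpoolJoinSketch {
  // Lazily pair every key with the result of looking it up in the store.
  // LazyList stands in for Spool[K]; no lookup runs until the result is consumed.
  def lookupAll[K, V](
      keys: LazyList[K],
      store: SimpleReadableStore[K, V]
  ): LazyList[(K, Option[V])] =
    keys.map(k => (k, store.get(k)))

  // Hypothetical in-memory store, used only for illustration.
  def mapStore[K, V](m: Map[K, V]): SimpleReadableStore[K, V] =
    new SimpleReadableStore[K, V] {
      def get(k: K): Option[V] = m.get(k)
    }
}
```

Under these assumptions, a key that is missing from the store still appears in the output, paired with `None`, which is what the `Option[V]` in the proposed signature allows for.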

@rubanm @johnynek Thanks for starting this. I'm new to writing Cascading taps, but since I'm working on this now, I wondered how one can pass constructor params to an InputFormat. Can I use JobConf for this, or some other external serialization mechanism, or is it even done automatically?

@AndreasPetter I had a brief chat about this with @johnynek some time back.

I think config parameters can be passed via the JobConf. Here is an example of how cascading-jdbc does it for JDBCTap:
https://github.com/Cascading/cascading-jdbc/blob/2.5/cascading-jdbc-core/src/main/java/cascading/jdbc/db/DBConfiguration.java

For store creation on the mappers, it looks like we'll need to pass a StoreProvider of some kind which knows how to generate a ReadableStore.
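
As a rough sketch of that idea (not code from this PR): the job-submission side records only the provider's class name under a string key, and the mapper side reads it back and instantiates the provider reflectively. To stay self-contained, a plain `Map[String, String]` stands in for the JobConf, which offers comparable string-keyed `set`/`get` calls. The `StoreProvider` trait, the key name, and `ExampleMapStoreProvider` are all hypothetical, and a plain function type stands in for `ReadableStore`.

```scala
// Hypothetical interface, not part of storehaus or cascading.
// The function type stands in for a ReadableStore[String, String].
trait StoreProvider {
  def getStore(): String => Option[String]
}

object StoreProviderSketch {
  // Hypothetical config key; the name is illustrative only.
  val ProviderClassKey = "storehaus.store.provider.class"

  // Example provider with the no-argument constructor that reflection needs.
  class ExampleMapStoreProvider extends StoreProvider {
    private val data = Map("a" -> "1")
    def getStore(): String => Option[String] = k => data.get(k)
  }

  // Job-submission side: store only the provider's class name as a string.
  def configure(
      conf: Map[String, String],
      cls: Class[_ <: StoreProvider]
  ): Map[String, String] =
    conf + (ProviderClassKey -> cls.getName)

  // Mapper side: read the class name back and instantiate it reflectively.
  def loadProvider(conf: Map[String, String]): StoreProvider =
    Class
      .forName(conf(ProviderClassKey))
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[StoreProvider]
}
```

The appeal of this shape is that only strings cross the job boundary, so the store itself never has to be serializable; it is built fresh on each mapper.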

@AndreasPetter

@rubanm As I would be a thankful user of a Summingbird-batching Storehaus tap (I will need a SinkTap pretty soon), do you already have plans for a final pull request? If I can be of any help with the SinkTap, I would be happy to join you in working on it.

rubanm commented Mar 21, 2014

@AndreasPetter Sorry, I haven't been able to work on this for a while. You are more than welcome to start on the SinkTap and pull in any code from this PR if that's useful. I can help, and we have @johnynek of course :)

Thanks.

pkallos commented May 30, 2014

👍 love it

CLAassistant commented Nov 16, 2019

CLA assistant check
All committers have signed the CLA.
