IRC log for #dataverse, 2019-05-06

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.

All times shown according to UTC.

Time	Nick	Message
01:07		jri joined #dataverse
01:14		sba-usable-sec joined #dataverse
01:59	sba-usable-sec	Hello, I am researching data security and privacy. Please contribute to science by taking my short 3-minute survey. More information at https://de.surveymonkey.com/r/ZHDF96S
05:07		jri joined #dataverse
07:04		jri joined #dataverse
07:38		jri joined #dataverse
08:03		juancorr joined #dataverse
08:36		stefankasberger joined #dataverse
08:40		stefankasberger3 joined #dataverse
10:31		stefankasberger joined #dataverse
10:39		pdurbin joined #dataverse
10:40	pdurbin	I hope everyone had a great weekend.
11:35	stefankasberger	yes, it was a relaxed one. :)
11:36	stefankasberger	@pdurbin: i have a question regarding the download metrics: how are the downloads stored in the database? is it possible to see which country (IP) a download came from?
11:38	pdurbin	stefankasberger: there's "sessionid" at http://phoenix.dataverse.org/schemaspy/latest/tables/guestbookresponse.html
11:42	stefankasberger	what is the session id exactly?
11:43	pdurbin	Huh. I thought maybe I'd find the IP address in there. I just ran "select sessionid from guestbookresponse;" and I'm seeing stuff like "edu.harvard.iq.dataverse.DataverseSession58087755".
11:43	pdurbin	Check out this screenshot with time, ip, and country: https://github.com/IQSS/dataverse/issues/2729#issuecomment-154773635
11:46	stefankasberger	so the session id is a FK to another table with information like IP, time etc? Or are the IP and time stored in the session id field?
11:46	stefankasberger	FK: foreign key
11:49	pdurbin	That's what I was hoping but I'm having trouble finding it. Maybe I have a misunderstanding of how it works? Or it changed? This is what I wrote recently: https://github.com/IQSS/dataverse/blob/v4.13/src/main/java/edu/harvard/iq/dataverse/makedatacount/DatasetMetrics.java#L85
11:49	pdurbin	Are you aware of the recent support for Make Data Count? That's another option for you.
11:50	pdurbin	I'd love to have someone try it out. :)
11:52	pdurbin	But going back to guestbook for a bit, have you tried downloading guestbook data?
11:53	pdurbin	"Guestbooks allow you to collect data about who is downloading the files from your datasets... You are also able to download the data collected from the enabled guestbooks as Excel files to store and use outside of Dataverse." http://guides.dataverse.org/en/4.13/user/dataverse-management.html#dataset-guestbooks
11:58	pdurbin	I don't understand how there's any value in sessionid, strings like "edu.harvard.iq.dataverse.DataverseSession58087755". They're meaningless.
12:03	pdurbin	I just tried downloading guestbook responses as a csv file and there is no IP address in there. I guess I've been mistaken for a long time about how guestbook works. :/
12:04	pdurbin	stefankasberger: but! Again, now there's Make Data Count support (if you set it up) and "countrycode" is stored in the new datasetmetrics table: http://phoenix.dataverse.org/schemaspy/latest/tables/datasetmetrics.html . Does that help?
12:07		xarthisius joined #dataverse
12:07		xarthisius joined #dataverse
12:09	pdurbin	A good starting point for Make Data Count is http://guides.dataverse.org/en/4.13/admin/make-data-count.html
12:15	stefankasberger	yeah, that's helpful, thanks.
12:16	pdurbin	sure
12:53		donsizemore joined #dataverse
13:38	donsizemore	@pdurbin so, you wanted the API test suite run in Jenkins
13:40	donsizemore	@pdurbin i'm looking at run-test-suite.sh and think i could include that pretty easily in ansible.
13:41		pdurbin_m joined #dataverse
13:42	pdurbin_m	donsizemore: that's fantastic. Please let me know if I can help.
13:44	donsizemore	@pdurbin_m also, i picked up certbot in EC2 this morning... certbot won't generate for *.amazonaws.com hostnames. so no vagrant unless I do some port-forwarding, no EC2 hostnames... i may pick it back up at some point.
14:02	pdurbin	donsizemore: you're saying it won't work on EC2 either, right? That's fine.
14:13	donsizemore	@pdurbin so, the test suite can come from your script, that's fine, but it looks like it needs the dataverse source. so ansible will only call it when the branch != release?
14:13	donsizemore	@pdurbin doesn't make sense to deploy a release warfile then test against develop or whatever, and dataverse doesn't maintain versioned branches
14:17	pdurbin	donsizemore: sorry, I'm not following. Let me read that again. :)
14:18	pdurbin	Yes, the api tests require the source code.
14:18	pdurbin	Are you saying you don't have the source code when you use dataverse-ansible to deploy a released version of Dataverse?
14:19	donsizemore	@pdurbin right, by default it just grabs the newest release war
14:19	pdurbin	Ok. Makes sense. Why clone the repo if you don't need it.
14:20	donsizemore	@pdurbin but if you set dataverse_branch to develop or whatever, the test suite would run against that branch
14:20	donsizemore	even if i cloned the repo, by default you'd be deploying a release warfile but running tests against a branch
14:20	pdurbin	That's perfect. That's what we want. We want to run the API test suite on the develop branch, the master branch, feature branches (before they are merged and deleted).
14:20		pameyer joined #dataverse
14:21	pameyer	bjonnh: thanks
14:21	donsizemore	or i could grab the src .zip and run against that
14:21	pdurbin	Do you know who doesn't like regression? pameyer
14:21	pameyer	I fail to fully embrace the brokenness sometimes....
14:21	pdurbin	donsizemore: the direction I'm attempting to steer us right now is a replacement of phoenix, which would mean the develop branch.
14:22	donsizemore	@pdurbin i forgot about the release .zip — so test suite against a release can happen as well
14:22	pdurbin	I'd like to get the new Jenkins to have parity with phoenix, then quickly eclipse it. :)
14:22	donsizemore	and @pameyer why are you breaking things?!?
14:23	pameyer	@donsizemore - it's what I do ;)
14:24	pameyer	bjonnh: Merce already made the comment to your google doc that I'd been about to make (based on number of files)
14:24	pameyer	if jenkins has docker, it _should_ be pretty straightforward to get it to parity with phoenix
14:26	pdurbin	stefankasberger: I just confirmed that "sessionid" is basically junk and that IP addresses for downloads are not stored in the database. As I was saying, your best bet is to set up Make Data Count support and pull the data out of the "countrycode" column. There's an API for this.
14:27	pdurbin	pameyer: that reminds me, thanks for confirming that docker-aio is still working for you. I guess I'll try again when I have a minute. It troubles me when the tests fail. :(
14:29	donsizemore	jenkins has docker.
14:31	pdurbin	everyone's asking :)
14:32	pameyer	I vaguely recall getting ~80% through a jenkins / docker-aio setup for running ITs
14:55	pdurbin	nice
15:09	donsizemore	@pdurbin i like the docker-aio solution for jenkins. capture all output, trash the container
15:10	pdurbin	Sounds cheaper than EC2. :)
15:10	donsizemore	@pdurbin hey, i kill my containers at the end of each day!
15:11	pdurbin	heh, I know I know
15:11	pdurbin	thank you for that
15:11	pdurbin	I don't even look at the bill.
15:11	pdurbin	But I'm trying to stay aware of it. :)
15:23	pameyer	I'll see if I can dig it up.
15:40	pdurbin	donsizemore: someday we will probably still want to spin up from ec2 for the "sample data" use case. The ec2-create script is tricky and some day I'm hoping we can use Jenkins as a gui to wrap it. For demos or whatever. Does that make sense?
15:41	donsizemore	@pdurbin we could do that now with "build now"
15:54	pdurbin	right, that's what I do on old jenkins. clicky clicky. I don't have access to ssh in to old jenkins
16:29		jri joined #dataverse
17:22		donsizemore joined #dataverse
17:43	bjonnh	pameyer: cool, thx. We just acknowledged you all at our meeting with NIH
18:08	pdurbin	bjonnh: I finally took a look. I assume you still want feedback so I guess I'll go leave comments on the doc.
18:09	bjonnh	sure
18:09	bjonnh	we just published it as a draft
18:10	bjonnh	put your name in the authors if you add/correct something
18:10	bjonnh	(unless you don't want to be associated)
18:11	pdurbin	Maybe I'll just clarify a couple things here quick.
18:12	pdurbin	When you say "Use your ORCID only, avoid the others." I thought maybe you meant, "Don't enter the ORCIDs for your co-authors." You don't mean that, do you?
18:12	pdurbin	You probably mean, only use ORCID, don't use other author identifiers.
18:12	bjonnh	yep second one
18:12	pdurbin	Like ISNI or whatever.
18:12	pdurbin	ok
18:23	pdurbin	bjonnh: do you like having your "jdf" files in a zip? These days file hierarchy is supported: https://groups.google.com/d/msg/dataverse-community/8gn5pq0cVc0/MCMQAQHRAQAJ
18:24	pdurbin	So you could create a "jdf" folder if you want. And a "jdx" folder.
18:25	bjonnh	is it available on the harvard instance already?
18:25	pdurbin	yep
18:25	bjonnh	cool
18:25	bjonnh	I have to discuss that with my colleagues, but that would be great
18:26	pdurbin	Merce left a comment about this where you wrote "double zip".
18:27	bjonnh	https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/F34GVS
18:27	bjonnh	I haven't gone through Merce's comments yet, planned for this week
18:27	pdurbin	cool
18:29	bjonnh	Grouping by type (1H, 13C…) would make more sense
18:30	pdurbin	You could use file tags for that.
18:30	bjonnh	I proposed that to my colleagues, will see if they agree
18:31	pdurbin	I mean, what you've written up is excellent. These are just suggestions. :)
18:33	bjonnh	yeah they make sense
18:34	pdurbin	Oh, I left a comment about a custom metadata block too.
18:40	pdurbin	which would be a lot of work
18:41	pdurbin	pameyer knows :)
18:47	bjonnh	ok let me check that
18:47	bjonnh	(added you in authors)
18:48	bjonnh	yes yes yes yes and yes for the custom metadata
18:48	bjonnh	could use an IRI for the subject
18:48	pdurbin	donsizemore knows too
18:48	bjonnh	the advantage of the IRI approach is that you can normalize
18:49	bjonnh	instead of having users doing PMID, PubMedID, Pubmed, …
18:49	pdurbin	you could facet on the values
18:49	pdurbin	(faceted browse/search)
18:53	pdurbin	like "PDB ID: 1V9Z" or whatever
19:06	bjonnh	yeah mostly about being able to grab all the pubchem ids
19:06	bjonnh	etc
19:06	pdurbin	Sure. Oh, that reminds me, there's an issue you might like.
19:07	pdurbin	It has a weird title in my opinion but if you squint and read and focus on "Widespread vocabulary sources" https://github.com/IQSS/dataverse/issues/4772
19:09	bjonnh	yep
19:10	bjonnh	how is the docker integration going? I didn't look recently
19:11	pdurbin	Well, we have a new server at https://jenkins.dataverse.org
19:12	pdurbin	and this morning we started talking about spinning up docker images from Jenkins to run API tests: http://irclog.iq.harvard.edu/dataverse/2019-05-06#i_92469
19:13	pdurbin	bjonnh: is that the kind of integration you mean? There are other efforts to run Dataverse on Docker or Kubernetes in production.
19:19	bjonnh	that whole thing
19:19	bjonnh	glad to see it is going on
19:20	pdurbin	:)
19:20	pdurbin	bjonnh: did you mean for your NMR guide to be specific to Harvard Dataverse? I assume the draw is the free hosting.
19:29	bjonnh	we decided on using harvard because of the pledge
19:29	bjonnh	to keep the data available
19:29	bjonnh	the last thing we want is getting people to put data somewhere and then the instance is taken down, destroyed…
19:32	pdurbin	nice, is there a url for the pledge? I bet I can find it.
19:37		jri joined #dataverse
19:39	pdurbin	I found it. Someday we'll put it on a harvard.edu domain rather than a dataverse.org domain.
21:30		jri joined #dataverse
21:38		donsizemore joined #dataverse
21:40	donsizemore	@pdurbin for the API test suite, do I need a burrito?
21:42	donsizemore	@pdurbin with toasted coconut and pecan, if i had my preference
22:10		pdurbin_m joined #dataverse
22:31		jri joined #dataverse
23:32		jri joined #dataverse