Time | Nick | Message
07:36
jri joined #dataverse
12:11
donsizemore joined #dataverse
13:20
jri joined #dataverse
13:49
jri joined #dataverse
14:02
jri joined #dataverse
14:07
pdurbin joined #dataverse
14:08
jri_ joined #dataverse
14:08
jri joined #dataverse
14:21
pameyer joined #dataverse
14:25
jri joined #dataverse
14:52
pdurbin
bjonnh bricas donsizemore pameyer poikilotherm the community call starts in 10 minutes: https://dataverse.org/community-calls
14:53
poikilotherm
Sry not for me today... Outta here in 5
14:54
pdurbin
These calls used to be an hour earlier. Then UTC daylight savings happened or something.
14:54
jri joined #dataverse
16:39
pameyer
I got interrupted by other stuff; had been thinking to jump on
17:09
pdurbin
It was a pretty quick call. You didn't miss anything.
17:13
pameyer
I could've rambled about impedance mismatches between deaccessioning datasets and obsoleting software to fill time :)
17:45
pdurbin
And while you rambled, maybe I could have played some blues chords.
17:46
pameyer
:)
18:39
pdurbin
pameyer: somehow I think you'll appreciate this: "I get the sense that there are built-in tools for generating metadata for social science data that perhaps do not apply to, say, atmospheric chemistry data."
18:53
pameyer
that does sound a little familiar
18:54
pdurbin
I invited my friend who wrote that to join us here. We'll see what happens. :)
19:30
ntallen joined #dataverse
19:31
ntallen
Good afternoon!
19:32
ntallen
I saw a quote go by in the recent log
19:34
pdurbin
ntallen: hi! Remember IRC ? :)
19:34
ntallen
barely!
19:34
pdurbin
heh
19:35
ntallen
So I'm trying to bring my group up to speed on better data management -- in my spare time
19:35
pdurbin
😁 now with emojis
19:35
ntallen
Cool!
19:36
pdurbin
glad to hear it
19:36
ntallen
pdurbin: How far did you travel?
19:37
pdurbin
Not too far. Please tell us your data management woes.
19:38
ntallen
I'm sure you get this all the time with new folks: Dataverse is awesome for data you are ready to publish and keep available in perpetuity...
19:39
pdurbin
yeah
19:39
ntallen
Is there anything like it to facilitate management of new data that isn't ready for prime time?
19:39
ntallen
Data that might reasonably expire, does not need a DOI, but needs to be stored safely, backed up, etc.
19:39
pdurbin
Well, sometimes we point people to integrations. Put your data in OSF, for example, and when you're ready, click the "publish in Dataverse" button.
19:40
donsizemore
@ntallen a group here at UNC hosts I-forget-how-many TB of data in Google Drive, then publishes the metadata in UNC Dataverse
19:40
pdurbin
OSF is listed here: http://guides.dataverse.org/en/5.0/admin/integrations.html#getting-data-in
19:41
ntallen
OK, interesting.
19:41
pdurbin
donsizemore: do you have an example dataset you can link to?
19:48
pdurbin
ntallen: another thought I have is that Dataverse has APIs for deposit. So you could keep your not-sure-I-want-to-publish-this data wherever and then selectively call into APIs to publish some of it. These APIs are what make integrations possible, of course.
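(As a concrete sketch of the deposit APIs mentioned above: the Dataverse native API can create, upload to, and publish a dataset roughly as follows. The server URL, API token, collection alias, persistent ID, and file name are placeholders; the exact dataset JSON format is documented in the API guide.)

# create a draft dataset in a collection from a local metadata JSON file
curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/dataverses/$COLLECTION_ALIAS/datasets" --upload-file dataset.json

# add a data file to the draft dataset, addressed by its persistent identifier (DOI)
curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F "file=@run-2020-09-15.tar.gz" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_ID"

# publish only when the data is actually ready for prime time
curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/datasets/:persistentId/actions/:publish?persistentId=$PERSISTENT_ID&type=major"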
19:50
ntallen
Right, that still leaves open the question of how to securely manage the data you just collected last night, and presumably every lab has to work out their own solutions independently.
19:50
pameyer
ntallen: one factor is how much data you're talking about
19:52
pameyer
it seems like the various electronic lab notebooks try to solve that (sometimes with institutional storage)
19:52
ntallen
pameyer: It varies from a few KB per run up to a couple GB per run, with runs on the order of hours.
19:53
ntallen
Yeah, I should take a closer look at those.
19:53
pdurbin
Where do they store the data after a run now? Dropbox? S3? Somebody's laptop?
19:54
ntallen
Lab computers for starters. When we're in the field, we back up to multiple external drives and transfer data back to Harvard
19:54
ntallen
We have used Dropbox for transfer, but have not really embraced it for storage
19:54
pameyer
we've got a setup where a data collection facility puts metadata (including storage info) into a dataverse instance automatically. doesn't solve the storage/backup issue directly
19:58
ntallen
We currently have a home-grown method of validating a data set so we can verify copies separately. I would like to have a tool for tracking multiple copies and facilitating periodic verification, for example.
20:01
pdurbin
Dataverse verifies checksums (MD5, SHA, etc.) but that's about it.
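(A minimal sketch of the kind of copy verification ntallen describes, using only standard shell tools; the run and backup directory names here are hypothetical.)

# on the primary copy: record a SHA-256 checksum for every file in a run directory
cd /data/runs/2020-09-15 && find . -type f -print0 | xargs -0 sha256sum > ~/manifests/2020-09-15.sha256

# later, on any other copy of the same run: verify every file against the manifest
cd /backup/runs/2020-09-15 && sha256sum -c ~/manifests/2020-09-15.sha256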
20:02
pdurbin
Speaking of electronic lab notebooks, RSpace also integrates with Dataverse, but I'm not very familiar with its features or if it's a good fit.
20:04
ntallen
Our checks are not much more than that, but our datasets generally consist of lots of files. Probably repackaging the data into something like netcdf would be a good start.
20:06
pdurbin
You can upload any type of file to Dataverse so that should be fine.
20:06
pameyer
ntallen: if you've got checksums and datasets on >1 storage media, you're probably ahead of a lot of researchers (and it makes me feel like I wasn't the only paranoid one)
20:08
ntallen
Well we take data on NASA aircraft along with a large group of other researchers, and each flight is a one-off, so we cannot afford to lose it!
20:09
pameyer
:) I completely agree!
20:10
pameyer
sometimes you don't get a chance to re-collect a dataset, and it's good when folks treat their data that way
20:14
pdurbin
sounds like valuable data
20:16
ntallen
Thanks for the info-- I've got some reading to do, and also a call I've got to jump into!
20:16
ntallen left #dataverse
20:16
donsizemore joined #dataverse
20:16
pameyer
every so often at meetings, a speaker talks about the replacement cost or total cost of a dataset
20:16
donsizemore
@pdurbin dumb question?
20:18
pdurbin
donsizemore: hit me 😁
20:19
donsizemore
thu-mai wants me to kill that ingest job. stopping dataverse didn't clear the job status, as it resumed once i restarted glassfish
20:19
pdurbin
Yeah, there's a queue.
20:19
pdurbin
We should probably document it.
20:20
donsizemore
in the DB, or admin console, or?
20:21
pdurbin
donsizemore: this might help: http://wiki.greptilian.com/java/glassfish/howto/purge-jms-queue/
20:22
donsizemore
is the default password assumed to be as it is in your example?
20:23
pdurbin
I think so.
20:23
donsizemore
Error [A3161]: Failed to read password in passfile: java.io.FileNotFoundException: /dev/fd/63 (No such file or directory)
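(That A3161 error suggests the <(...) process substitution from the linked example is not reaching imqcmd as a readable file, which can happen under sudo or a non-bash shell; one workaround, assuming the same default admin credentials and queue name, is a real temporary passfile.)

# write the broker password to an actual file instead of a /dev/fd/NN substitution
PASSFILE=$(mktemp)
echo "imq.imqcmd.password=admin" > "$PASSFILE"
glassfish4/mq/bin/imqcmd -u admin -passfile "$PASSFILE" query dst -t q -n DataverseIngest
rm -f "$PASSFILE"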
20:28
donsizemore
but if i query it directly it responds, just doesn't return the information in your example
20:28
pdurbin
Huh. Is that my bash trick not working? The <() thing?
20:28
donsizemore
glassfish4/mq/bin/imqcmd -u admin -passfile <(echo imq.imqcmd.password=admin) query dst -t q -n DataverseIngest
20:29
donsizemore
reports the queue and the broker, then hangs
20:29
donsizemore
(that's on the query command, i haven't tried to purge just yet)
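(For reference, the purge counterpart of that query would presumably be the command below; imqcmd asks for confirmation unless -f is passed.)

glassfish4/mq/bin/imqcmd -u admin -passfile <(echo imq.imqcmd.password=admin) purge dst -t q -n DataverseIngest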
20:29
pdurbin
do it, do it
20:30
donsizemore
you're telling me it won't wipe out my entire production dataverse?
20:30
pdurbin
!
20:30
pdurbin
try it on a test system
20:31
donsizemore
idle test system returns the query zippily
20:31
pdurbin
that's good
20:31
donsizemore
but the commands and output look good
20:33
donsizemore
so this will purge the queue, but will it stop running jobs?
20:33
pdurbin
I don't think so. You'll probably need to restart Glassfish again.
20:33
donsizemore
i had to kill -9 the child processes to get it to exit before =(
20:33
pdurbin
boo
20:34
donsizemore
the purge command hasn't returned
20:34
pdurbin
Maybe we should add a feature to Dataverse to kill a running ingest job.
20:34
donsizemore
I told them several times Harvard imposed an ingest filesize limit of 150MB
20:34
pdurbin
but did they listen?
20:34
donsizemore
it returned! and it stopped the jobs!
20:35
donsizemore
and we're back!
20:35
pdurbin
Nice. Take the rest of the day off.
20:35
donsizemore
I will gratefully submit a doc issue and PR based off your commands (tomorrow)
20:35
pdurbin
you da man
20:36
donsizemore
hee hee Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
20:36
donsizemore
(the VM has 64GB of RAM and a heap of 36GB)
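(For what it's worth, "GC overhead limit exceeded" means the JVM was spending nearly all of its time in garbage collection, not necessarily that the 36GB heap was exhausted outright. A quick way to confirm what the running domain is actually configured with, assuming a default GlassFish layout, is:)

glassfish4/bin/asadmin list-jvm-options | grep -i xmx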
20:37
pdurbin
makes sense, sadly
20:39
pameyer
now the fact that I generated 30GB of error logs seems less significant
20:41
pdurbin
I'll read them when I can't sleep.
21:05
pdurbin left #dataverse
22:20
Bala joined #dataverse
22:21
Bala
Hi all, I have a quick question. Subscribing to a DOI service or running a handle.net service, which is more cost-effective?