Time
S
Nick
Message
03:33
jri joined #dataverse
05:10
candy` joined #dataverse
06:33
jri joined #dataverse
06:35
jri joined #dataverse
06:52
poikilotherm joined #dataverse
07:27
jri joined #dataverse
07:30
jri joined #dataverse
07:56
stefankasberger joined #dataverse
09:41
pdurbin
good morning
10:31
poikilotherm
Good morning :-)
10:35
pdurbin
This tweet makes me happy: https://twitter.com/JonathanBohan/status/1126216887254364161
10:36
pdurbin
It's a response to "there should be more library software that is developed by libraries" and Dataverse is given as a good example of this. :)
10:39
pdurbin
poikilotherm: oh and speaking of Twitter, you may or may not find this interesting since it's in German: https://twitter.com/katarinabarley/status/1125886048117112833 via https://op-co.de/tmp/messenger-regulation.html (which has a translation).
10:51
poikilotherm
One could debate about the motivation behind this and the lack of knowledge in using proper technical terms. I doubt her argument with data privacy. But in general, the idea behind a common standard and interfaces between those services is not that bad.
10:52
poikilotherm
Her comparison with cell phone providers is not very elegant, but kinda fits
10:52
poikilotherm
What makes me sad is all of those comments and tweets below... :-(
10:52
poikilotherm
Trolls all over
10:53
pdurbin
ok, focus on the library tweet instead then :)
10:53
poikilotherm
No debate but a lot of aggression and goofs
11:00
poikilotherm
Do you have a link to the original tweet?
11:00
pdurbin
https://twitter.com/librarythingtim/status/1126212346718953473
11:05
poikilotherm
Thx
11:08
pdurbin
poikilotherm: did you see Don talking about your k8s stuff here: http://irclog.iq.harvard.edu/dataverse/2019-05-08#i_92757 ?
11:08
poikilotherm
Nope, sr
11:08
poikilotherm
y
11:08
poikilotherm
Why do you think dv-k8s is complicated?
11:09
poikilotherm
It would be great to have some feedback
11:09
poikilotherm
I am deep into these things, so I may be blind to problems
11:09
pdurbin
Heh. I just mean that there are more moving parts.
11:10
poikilotherm
More than in ...?
11:11
pdurbin
docker-aio
11:12
pdurbin
That said, I still think we should add your stuff to https://jenkins.dataverse.org some day. Whenever you're ready. :)
11:13
poikilotherm
I am not that sure about the docker-aio image having fewer moving parts than using k8s
11:14
poikilotherm
Scripting is almost the same
11:14
pdurbin
Scripts everywhere.
11:14
pdurbin
:)
11:14
pdurbin
lots of scripts
11:14
pdurbin
maybe you're right
11:15
poikilotherm
I would be glad to remove the scripts
11:15
poikilotherm
Lots of stuff could be done in other places
11:15
poikilotherm
But that would involve moving things to Payara 5
11:15
pdurbin
It's amazing how much Don is doing with Ansible.
11:16
poikilotherm
But he is also reusing existing setup scripts from the core
11:16
pdurbin
Some scripts, yes.
11:16
pdurbin
What's blocking us from moving to Payara 5?
11:17
poikilotherm
There seem to be non-trivial bugs
11:17
poikilotherm
MrK ran into those, too
11:17
poikilotherm
We chatted about those a few days ago
11:18
pdurbin
Right. Are the bugs too big for the community to work on?
11:19
poikilotherm
Dunno. Hard to tell. Lately I have been busy with other stuff and we need to get our service running. Time is running out on the project.
11:19
pdurbin
!
11:19
stefankasberger joined #dataverse
11:20
skasberger joined #dataverse
11:20
pdurbin
dataverse ansible now supports this... zipurl: http://dlc-cdn.sun.com/glassfish/4.1/release/glassfish-4.1.zip
11:20
pdurbin
I wonder if the first small chunk is the swap in payara 4
11:20
poikilotherm
http://irclog.iq.harvard.edu/dataverse/2019-04-30#i_92081
11:21
pdurbin
yes, I remember :)
11:21
skasberger
so, now i am inside an irc client, not the web interface anymore. had continuous issues with disconnections.
11:21
pdurbin
skasberger: you are probably suffering from https://github.com/IQSS/chat.dataverse.org/issues/3 :(
11:23
poikilotherm
pdurbin: IMHO we should go for 5. IMHO it does not make sense to bet on a horse that is already on its way to the knackers
11:24
poikilotherm
In the near future, one needs to prepare for 11
11:24
poikilotherm
RHEL 8 was released just yesterday I think
11:25
pdurbin
with dataverse-ansible, you can now swap in the Java version too... and yeah, I'm looking forward to centos 8 coming out :)
11:25
poikilotherm
It offers switching between JDK 8 and 11
11:25
poikilotherm
https://developers.redhat.com/blog/2018/12/10/install-java-rhel8/
11:25
poikilotherm
You get it for free when using in containers :-)
11:26
pdurbin
Ok, so we'll continue to use `alternatives` to switch. As we document in our guides. Thanks. Nice post.
11:26
poikilotherm
https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image
11:29
pdurbin
huh, UBI sounds interesting
11:29
pdurbin
skasberger: what do you think we should do? Did you take a look at that issue 3? Should we install TheLounge instead?
12:22
donsizemore joined #dataverse
12:33
donsizemore
@poikilotherm i am pleased to see that UBI still offers perl =)
12:37
pdurbin
I had to hack on some Perl the other day to make the most recent messages appear at http://chat.dataverse.org :) Well, didn't have to but I like the preview of messages there.
12:41
pdurbin
donsizemore: oh, I tried using sshkeys and it worked perfectly! Thank you! It pulled in https://github.com/pdurbin.keys just fine, I mean. Great stuff.
13:08
Richard_Valdivia joined #dataverse
13:13
Richard_Valdivia
pdurbin: Hi!! I talked with some people here and we are trying to contact the Oceanography staff. Next week I'll be at a conference about Archivematica. When we get back, I think we'll have the person's contact here to start the metadata conversation.
13:16
Richard_Valdivia
We like your idea of working with metadata from marine biology, and our expert is from the Oceanography area. After that, maybe I can work on metadata for the "bones" project :)
13:20
sivoais joined #dataverse
13:25
pdurbin
Richard_Valdivia: great! So you are willing to work on a custom metadata block that is useful for multiple installations of Dataverse first, such as oceanography? Before moving on to your specific "bones" project?
13:28
pdurbin
donsizemore: anything you need from me with regard to jenkins and the api test suite?
13:29
donsizemore
@pdurbin you got the same errors yesterday?
13:29
donsizemore
brb, coffee run
13:29
pdurbin
we're getting a whole variety of errors. :) please see https://github.com/IQSS/dataverse/issues/5827
13:30
pdurbin
and https://github.com/IQSS/dataverse/issues/5826
13:49
donsizemore joined #dataverse
13:50
donsizemore
@pdurbin yes i saw those, but i get past them (so kevin's supposition on timing sounds accurate)
14:11
pdurbin
ok, thanks
14:39
donsizemore
@pdurbin on 4.14 develop this morning i still make it to [ERROR] Tests run: 14, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 433.601 s <<< FAILURE! - in DatasetsIT [ERROR] testPrivateUrl Time elapsed: 315.83 s <<< FAILURE! junit.framework.AssertionFailedError: expected:<200> but was:<500> at edu.harvard.iq.dataverse.api.DatasetsIT.testPrivateUrl(DatasetsIT.java:943)
14:40
pdurbin
donsizemore: yuck. Can you please leave a comment on https://github.com/IQSS/dataverse/issues/5826 ?
14:40
donsizemore
by posting here i was half-asking if it was worth submitting my feedback ;)
14:48
pdurbin
absolutely worth submitting, thank you!
16:24
pdurbin
donsizemore: still there. We added some sleep statements. Can you please pull the latest and try again. :)
17:18
jri joined #dataverse
18:41
donsizemore joined #dataverse
18:41
donsizemore
@pdurbin i saw your sleep statements, and i ran again =)
18:42
donsizemore
@pdurbin though my terminal session timed out over lunch
18:43
pdurbin
I always forget to run stuff like that in screen.
18:43
pdurbin
And I've never tried tmux.
18:43
* pdurbin
hides
18:43
donsizemore
@pdurbin i've never had it run long enough that my laptop went to sleep ;)
18:44
pdurbin
maybe that's a good sign :)
18:44
pdurbin
getting farther
18:44
donsizemore
@pdurbin that was my next question. does the test suite merely log to terminal, or does maven write out something useful for jenkins?
18:44
pdurbin
maven writes out something useful for jenkins
18:45
pdurbin
but I don't understand how it works
18:45
pdurbin
early on I was saying stuff like "we could write the api test suite in python"
18:45
donsizemore
for the trends page, it reads the surefire subdirectory xml
18:45
pdurbin
but I couldn't figure out the magic of how the test results make it into jenkins
18:46
pdurbin
so we used java instead, rest assured
18:46
pdurbin
FooIT.java
18:47
donsizemore
it's writing them to target/surefire
18:47
pdurbin
some xml you say?
18:47
donsizemore
well, for the trends page. but so far there's a jar in surefire. i'll see what's there when the run completes
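The XML Don mentions can be summarized with a few lines of Python. This is only a minimal sketch, assuming the report follows the common JUnit-style layout (a `testsuite` element with `tests`, `failures`, `errors` and `skipped` attributes, and `testcase` children). The sample document is made up from the DatasetsIT numbers quoted at 14:39, and the passing test name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a JUnit-style XML report. The counts echo
# the DatasetsIT result quoted earlier in the log; this is not a real file.
SAMPLE_REPORT = """<testsuite name="edu.harvard.iq.dataverse.api.DatasetsIT" tests="14" failures="1" errors="0" skipped="2" time="433.601">
  <testcase name="testPrivateUrl" classname="edu.harvard.iq.dataverse.api.DatasetsIT" time="315.83">
    <failure type="junit.framework.AssertionFailedError">expected:&lt;200&gt; but was:&lt;500&gt;</failure>
  </testcase>
  <testcase name="hypotheticalPassingTest" classname="edu.harvard.iq.dataverse.api.DatasetsIT" time="1.2"/>
</testsuite>
"""


def summarize_report(xml_text):
    """Return the suite name, its counters, and the names of failed test cases."""
    suite = ET.fromstring(xml_text)
    # Counters live as attributes on the root element; default to 0 if absent.
    counts = {
        key: int(suite.get(key, "0"))
        for key in ("tests", "failures", "errors", "skipped")
    }
    # A test case failed if it carries a <failure> or <error> child.
    failed_cases = [
        case.get("name")
        for case in suite.iter("testcase")
        if case.find("failure") is not None or case.find("error") is not None
    ]
    return {"suite": suite.get("name"), **counts, "failed_cases": failed_cases}


summary = summarize_report(SAMPLE_REPORT)
print(summary)
```

A CI server can build trend charts from files like this because the counters are machine-readable, which is harder to do reliably from terminal output.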
18:48
pdurbin
awesome
18:48
pdurbin
Meanwhile, I'm working on file type detection.
18:49
pdurbin
Unknown (1,383) in UNC Dataverse
18:49
pdurbin
Application (491) (probably application/octet-stream, which is the same as unknown)
18:50
pdurbin
What file types that are not being detected in your installation of Dataverse would you like to be detected?
18:50
pdurbin
andrewSC bjonnh bricas_ donsizemore juancorr pmauduit ^^
18:57
donsizemore
@pdurbin as of this morning i'm dying at [ERROR] Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 332.964 s <<< FAILURE! - in ConfirmEmailIT
18:57
donsizemore
@pdurbin (and by dying, i mean that everything after that fails)
18:58
pdurbin
sure, which commit please?
19:01
donsizemore
07c05b4 — errors start with "createRandomUser"
19:01
pdurbin
ok, so the very latest. hmm.
19:03
pdurbin
This is in a docker container running on https://jenkins.dataverse.org ?
19:03
donsizemore
on a test box of mine. 8 cores, 16GB RAM
19:05
pdurbin
ok, hmm
19:06
pdurbin
in a docker container on the test box?
19:06
donsizemore
i'm still poking. not much in server.log jumps out at me. in a docker container, yes. prep_it, run-test-suite
19:06
pdurbin
do you have a way to make the output of `mvn test` and server.log public?
19:07
donsizemore
sure thing. running mvn test now
19:12
donsizemore
@pdurbin http://tactus.irss.unc.edu/mvntest.zip
19:12
donsizemore
@pdurbin (though i don't mean to drag you away from file types)
19:12
pdurbin
well, do you know what all those "unknown" files are? :)
19:13
donsizemore
@pdurbin i posed your question to thu-mai and mandy, who responded: "all of them"
19:13
donsizemore
@pdurbin thu-mai suggests: "probably doc and docx --so that there would be the possibility of prompting users to upload a pdf alternative."
19:14
pdurbin
doc and docx?
19:14
pdurbin
those aren't detected as word files?
19:14
donsizemore
i think that had more to do with preferred format than recognition
19:16
pdurbin
oh
19:16
pdurbin
I'm not talking about preferred formats
19:16
donsizemore
so... i only see three labelled "unknown" in datafile, and 0 with null contenttype
19:16
donsizemore
how do i grab a list of these problem files, and i could cook up say a histogram?
19:16
pdurbin
I'm talking about `file foo.bar` to find out what kind of file it is (but with java)
19:17
donsizemore
assuming the unix "file" utility is decently accurate with them
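The idea behind `file foo.bar` can be sketched in Python: look at the leading bytes first, then fall back to the extension. This is an illustration only, not how Dataverse detects file types. The signature table is deliberately tiny, and the extension fallback and its MIME types are assumptions chosen for the example.

```python
import os

UNKNOWN = "application/octet-stream"

# Leading-byte signatures for a few well-known formats (illustrative subset).
MAGIC_SIGNATURES = [
    (b"\xfd7zXZ\x00", "application/x-xz"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]

# Hypothetical fallback for plain-text formats that have no magic number.
EXTENSION_FALLBACK = {
    ".py": "text/x-python",
    ".md": "text/markdown",
    ".yml": "application/yaml",
}


def detect_content_type(filename, head):
    """Guess a content type from the first bytes of a file, then its name."""
    for signature, content_type in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return content_type
    # No signature matched: fall back to the (lower-cased) file extension.
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_FALLBACK.get(extension, UNKNOWN)


print(detect_content_type("archive.tar.xz", b"\xfd7zXZ\x00\x00\x04"))
print(detect_content_type("fig1_happiness_of_individuals.py", b"import os\n"))
print(detect_content_type("mystery.bin", b"\x00\x01\x02"))
```

Byte signatures alone are not enough for container formats: a .docx file is a ZIP archive, so this sketch would report it as `application/zip` unless the detector also looks inside the container.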
19:17
pdurbin
Unknown (1,383) https://dataverse.unc.edu/dataverse/unc?q=&fq1=fileTypeGroupFacet%3A%22Unknown%22&fq0=metadataSource%3A%22UNC+Dataverse%22&types=files&sort=dateSort&order=desc
19:19
pdurbin
donsizemore: I can shoot you an SQL query if you want.
19:20
pdurbin
Also, the mvntest.out you sent me ends with BUILD SUCCESS. I thought you were seeing failures.
19:20
pdurbin
and these seem to be the unit tests, not the rest assured tests
19:24
donsizemore
whoops, gimme another sec
19:24
donsizemore
and an SQL query would be great
19:27
donsizemore
p.s. now they're making their list. .yml, .slurm, .gwt, .gal, shape files
19:27
pdurbin
perfect
19:28
pdurbin
do you want the untested (by me) query? I found it in our Slack. I was thinking I'd test it quick.
19:29
donsizemore
@pdurbin .sas(?) and any code files. untested query is just fine, it's on a copy of our db =)
19:29
pdurbin
select regexp_matches(label,'\.[0-9a-z]{1,5}$') as ext, count(*) from datafile df, filemetadata fmd where df.id=fmd.datafile_id and label ~ '\.' and contenttype='application/octet-stream' group by ext having count(*) > 9 order by count(*) desc;
19:30
* pdurbin
drinks a slurm
19:30
pdurbin
I get ext | count (0 rows)
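The grouping that query performs can be mirrored in Python, which makes it easier to experiment with. A minimal sketch, assuming the file labels have already been fetched into a list; the sample labels are made up. It uses the same pattern as the SQL and the same cutoff (`having count(*) > 9`, so at least 10 files per extension). One plausible reason for an empty result, which the log does not confirm, is that cutoff: a small database may have no extension with ten or more unknown files.

```python
import re
from collections import Counter

# Same pattern as the SQL: a dot plus 1-5 lower-case letters or digits at the end.
EXTENSION_PATTERN = re.compile(r"\.[0-9a-z]{1,5}$")


def extension_histogram(labels, min_count=10):
    """Count file extensions and keep those seen at least min_count times."""
    counts = Counter()
    for label in labels:
        match = EXTENSION_PATTERN.search(label)
        if match:
            counts[match.group(0)] += 1
    # most_common() sorts by count, descending, like "order by count(*) desc".
    return [(ext, n) for ext, n in counts.most_common() if n >= min_count]


# Made-up labels: twelve .xz files, three .py files, one file with no extension.
labels = ["data%d.tar.xz" % i for i in range(12)]
labels += ["a.py", "b.py", "c.py", "README"]

print(extension_histogram(labels))               # [('.xz', 12)]
print(extension_histogram(labels, min_count=1))  # [('.xz', 12), ('.py', 3)]
```

Like the SQL, the pattern is case-sensitive, so a label such as `README.MD` would not be counted.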
19:33
pdurbin
here's a simpler one that works better for my purposes:
19:33
pdurbin
select label,contenttype from datafile df, filemetadata fmd where df.id=fmd.datafile_id and contenttype='application/octet-stream';
19:34
pdurbin
README.md is unknown, for example
19:34
pdurbin
candy_trade.ipynb
19:34
pdurbin
fig1_happiness_of_individuals.py
19:34
pdurbin
etc
19:34
pdurbin
huh, Dataverse doesn't even detect python files? python is older than java
19:36
donsizemore
ooh, that one is nice.
19:37
donsizemore
i had made it to select dvobject.storageidentifier from dvobject inner join datafile on datafile.id=dvobject.id where datafile.contenttype='application/octet-stream'; but yours is more user-friendly
19:37
pdurbin
I'm not smart enough for the other one.
19:37
donsizemore
gimme a sec
19:38
donsizemore
also, mandy's official list: .yml .slurm .ado .shp .dbf .prj .sbn .sbx .shx .sas .gwt .gal
19:40
pdurbin
donsizemore: Mandy's list is gold. Thank you!
19:40
donsizemore
@pdurbin i got my histogram from our database. want it in e-mail?
19:41
pdurbin
sure, or you can attach it to https://github.com/IQSS/dataverse/issues/2202
19:42
donsizemore
our most common unknown filetype, by far, is .xz
19:43
pdurbin
but it's not on Mandy's list
19:44
* pdurbin
starts a "Don's list"
19:45
donsizemore
heh. i'll allow you to score mandy's list against the list from our DB
19:46
donsizemore
don't forget that UNC's holdings date back pre-DVN3 so there's lots of legacy data
19:46
pdurbin
right but what if we add an api to attempt to re-detect the file type? would you use it?
19:47
donsizemore
if our archivists got a useful list for editing they'd go nuts with it
19:47
donsizemore
auto-detection would be a home run but i know they'll want to correct
19:48
pdurbin
sounds like you want the api endpoint
19:49
donsizemore
for searching if nothing else.
19:53
pdurbin
so two api endpoints? one for finding the files marked as unknown? and another for attempting to re-detect the file type?
19:54
donsizemore
i'm imagining you can't automatically detect them all? so another to turn up 'application/octet-stream' could be handy
19:56
pdurbin
so three endpoints?
19:56
donsizemore
just two
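The two operations discussed here can be sketched as plain Python functions over in-memory records. Nothing below is a Dataverse API: the record shape, the function names and the toy detector are all hypothetical, chosen only to show how "find the unknowns" and "try to re-detect" fit together.

```python
UNKNOWN = "application/octet-stream"


def find_unknown(datafiles):
    """First operation: list the files whose content type is still unknown."""
    return [f for f in datafiles if f["contenttype"] == UNKNOWN]


def redetect(datafile, detector):
    """Second operation: try to re-detect one file; report whether it changed."""
    guess = detector(datafile["label"])
    if guess is None or guess == UNKNOWN:
        return False  # detection failed, leave the record for manual correction
    datafile["contenttype"] = guess
    return True


def toy_detector(label):
    """Hypothetical detector that only recognizes Python source files."""
    return "text/x-python" if label.endswith(".py") else None


datafiles = [
    {"id": 1, "label": "fig1_happiness_of_individuals.py", "contenttype": UNKNOWN},
    {"id": 2, "label": "README.md", "contenttype": UNKNOWN},
    {"id": 3, "label": "paper.pdf", "contenttype": "application/pdf"},
]

for datafile in find_unknown(datafiles):
    redetect(datafile, toy_detector)

still_unknown = [f["label"] for f in find_unknown(datafiles)]
print(still_unknown)  # ['README.md']
```

Whatever the detector cannot identify stays in the "unknown" list, which is why a way to search for those files remains useful even with automatic re-detection.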
20:00
donsizemore
@pdurbin updated http://tactus.irss.unc.edu/mvntest.zip
20:01
pdurbin
[ERROR] Tests run: 37, Failures: 19, Errors: 14, Skipped: 2
20:02
pdurbin
that's a lot more errors than we're seeing
20:02
pdurbin
or failures or whatever
20:02
donsizemore
i think this was a bad test — i re-ran the test suite without starting from scratch
20:02
donsizemore
something's making dataverse fall over at the createRandomUser step, then zilch.
20:02
donsizemore
tomorrow i'll nuke the entire source tree and start from scratch
20:03
pdurbin
ok
20:03
pdurbin
I was gonna say, are you following the readme in the docker-aio directory?
20:04
pdurbin
I run the two commands in the quickstart: https://github.com/IQSS/dataverse/blob/v4.13/conf/docker-aio/readme.md#quickstart
20:04
pdurbin
prep it bash
20:04
pdurbin
run test suite
20:04
pdurbin
I guess I should go run it again.
20:16
pdurbin
docker-aio takes a while
20:24
pdurbin
[INFO] BUILD SUCCESS
20:24
pdurbin
on 268a87cad
20:24
pdurbin
good
21:13
donsizemore joined #dataverse
21:15
donsizemore
@pdurbin that's exactly what i was doing, but... eh, i just nuked and cloned the source... will try again
23:06
pdurbin
never be afraid to nuke Dataverse