
IRC log for #dataverse, 2019-08-23

Connect via chat.dataverse.org to discuss Dataverse (dataverse.org, an open source web application for sharing, citing, analyzing, and preserving research data) with users and developers.


All times shown according to UTC.

Time S Nick Message
06:48 jri joined #dataverse
08:18 jri joined #dataverse
08:49 jri joined #dataverse
09:24 Kamil84 joined #dataverse
11:53 pdurbin pmauduit: hi! Did you get a chance to see if Prometheus installed ok in Vagrant? :)
12:04 pmauduit I launched it, but my computer had an issue (unrelated to the process) and I had to reboot
12:04 pmauduit I did not check the state of the vagrant machine afterwards
12:14 donsizemore joined #dataverse
12:18 pdurbin pmauduit: ok, please keep me posted. :)
12:30 pdurbin donsizemore: morning. Do you think we should just merge that prometheus branch? You already verified it works, right? pmauduit and I are talking about adding Grafana next (in a subsequent pull request).
12:50 donsizemore @pdurbin i mean, it worked for me, i just wanted you to approve it
12:50 donsizemore @pdurbin and i welcome grafana
12:51 donsizemore @pdurbin (which i'll do if you want, but i've got a Dataverse UNC upgrade on Monday and a bunch of tests to verify ;) )
12:59 pdurbin donsizemore: I'll spin it up quick on EC2 and have a look around before merging.
12:59 donsizemore @pdurbin too late, i just merged it
12:59 pdurbin even better :)
12:59 donsizemore @pdurbin it's just a true/false in the group_vars, anyway.
13:00 pdurbin perfect
13:02 pdurbin donsizemore: do you know how you have blessed releases? Like 4.11 or whatever? I have bless group_vars that I'm maintaining as ec2config.yaml in https://github.com/IQSS/dataverse-sample-data :) . I'll try adding prometheus there. :)
13:04 jri joined #dataverse
13:08 pdurbin blessed*
13:08 pdurbin pmauduit: maybe once I have this EC2 instance spun up, you can help me get Grafana installed and configured with a basic dashboard. :)
13:12 donsizemore @pdurbin oh, i made a boo-boo yesterday. you have 147 tests beneath src/test/java/edu/harvard/iq/dataverse/
13:12 donsizemore @pdurbin so, i'll still iterate through them, but...
13:13 pdurbin Before you do that, would you be able to run just InReviewWorkflowIT to see if it passes?
13:15 donsizemore yeah, i was just making a 'short list'
13:15 pdurbin gotcha
13:15 pdurbin I'm psyched that you got smaller subsets of tests to pass. Great progress.
13:15 donsizemore i'll also archive each server.log to a separate web space for y'all's (isn't that a great contraction?) review
13:16 donsizemore i suppose it should be y'all's'
13:16 pdurbin heh
13:16 pdurbin Great! Would you be able to indicate the name of the job and the job run number? In the name of a folder or in the filename?
13:17 donsizemore i was going to create a Google Sheet to share
13:18 pdurbin Oh, sorry I meant "IQSS-Dataverse-Develop-testSubset/6/server.log" or whatever.
13:21 donsizemore yes i'm going to pull those over to a directory outside jenkins lest they get overwritten
13:21 pdurbin perfect
13:21 pdurbin the new spreadsheet is for your short list? or something else?
13:22 donsizemore um, i was going to start with my short list.
13:25 pdurbin That's fine. I'm just confused about what the spreadsheet would be for. I'm sure all will be revealed in time. :)
13:26 pdurbin TASK [dataverse : run sampledata] **** ... the server with prometheus (I hope) is almost up. :)
13:38 pdurbin Ok, the EC2 instance is up. I enabled both Munin and Prometheus. Munin is working fine and you can see the Munin graphs at http://ec2-3-81-53-52.compute-1.amazonaws.com/munin/localhost/localhost/index.html
13:42 pdurbin donsizemore: I'm seeing "punch prometheus firewall holes" in tasks/prometheus.yml. Do I need to poke some holes?
13:46 pdurbin http://localhost:9090/metrics shows me some prometheus stuff :)
13:46 pdurbin https://knowyourmeme.com/memes/i-have-no-idea-what-im-doing :)
13:46 pdurbin pmauduit: help! :)
13:48 donsizemore ^^ this was me once i stood it up
13:49 donsizemore there are JVM modules for Prometheus, but... I stopped at the initial install
13:53 pdurbin donsizemore: right. Hopefully pmauduit will explain all. :)
13:58 pmauduit I've not provisioned my vm yet !
13:58 pmauduit but now you should be able to add a prometheus datasource in grafana
13:59 pmauduit or first use the prometheus interface (I don't use it that often, but it should let you select a metrics from a dropdown list from what I remember)
14:00 pdurbin pmauduit: do you want me to give you a shell on this EC2 server?
14:01 pmauduit how long have you planned to keep it up ?
14:01 pdurbin I don't know. A day? A week? :)
14:02 pmauduit (i retried a vagrant provision, but the playbook does not converge and finish with an error)
14:02 pmauduit anyway I also got the prometheus interface
14:03 pmauduit from what I can see in my vagrant, there are no very interesting metrics gathered yet related to memory usage, so maybe next step will be to setup collectd (that should give "top-like" metrics)
14:05 pdurbin Not even free disk space?
14:05 pmauduit do you have access to your prometheus web ui ?
14:06 pdurbin well, if I do curl http://localhost:9090 I get <a href="/graph">Found</a>. Does that count? :)
14:06 pmauduit near the "execute" button, you can see all metrics known to prometheus, by default it seems focused on prometheus / go ones
14:07 pmauduit pdurbin: if you can ssh onto the EC2 machine, I'll suggest you to mount a socks tunnel (-D8123) and configure your browser accordingly
14:07 pmauduit using ssh
14:07 pmauduit this will allow you to use a regular web browser instead of curl ;)
14:07 pdurbin pmauduit: do you want to ssh into this EC2 machine? I could add your public keys from GitHub.
14:08 donsizemore @pmauduit i've never used prometheus before but i'm happy to implement whatever you suggest (preferably by github issue and possibly with light hand-holding)
14:09 donsizemore @pdurbin you can allow :9090 in the AWS settings for your VM; i thought i mapped ports in Vagrant
14:10 pmauduit pdurbin: ok I can have a look
14:12 pdurbin pmauduit: done. Please try centos@ec2-3-81-53-52.compute-1.amazonaws.com
14:13 pdurbin donsizemore: do you want in here too? If so, please shoot me your public ssh key (or upload it to GitHub). Instead of messing with the firewall I've been trying to ProxyPass to localhost:9090 but I haven't been able to get it working. :(
14:13 pmauduit pdurbin: I'm in
14:14 pdurbin nice, two lines from `who` now ;)
14:14 pmauduit with a french ip address with no reverse dns ;)
14:14 pdurbin heh, I trust you
14:14 donsizemore @pdurbin i'm retracing my steps for Monday's upgrade =) also logging our InReview failure
14:14 pdurbin pmauduit: can you please take a look at /etc/httpd/conf.d/http.proxy.conf ?
14:15 pdurbin donsizemore: no worries, if you free up you can just ping me :)
14:15 pdurbin pmauduit: I'm asking because I was hoping we could just proxy prometheus
14:16 pdurbin But I'm fine with whatever it takes to move forward. :)
14:16 pmauduit pdurbin: once configured, I don't know if having a hand on prometheus web ui is necessary
14:16 pmauduit but I can access it from firefox with a socks proxy
14:16 pdurbin already?
14:16 pdurbin with your tunnel thing?
14:16 donsizemore @pdurbin i'm interested, but... three things at a time =)
14:17 pmauduit yes, that's pretty simple: ssh -D8123 centos@ec2-3-81-53-52.compute-1.amazonaws.com
14:17 pdurbin donsizemore: we need to limit your Work In Progress :)
14:17 pmauduit then configure your firefox to use a socks proxy
14:18 pmauduit pdurbin: can I try to install collectd and configure the prometheus writer ?
14:18 pdurbin I've never configured firefox to use a socks proxy but I have the setting up.
14:18 pmauduit pdurbin: preferences > network settings
14:18 pdurbin pmauduit: sure! How about if I create an issue under dataverse-ansible called collectd/grafana and you can copy and paste whatever commands you ran into comments. Sound ok?
14:19 pmauduit then manual proxy configuration / socks host: localhost port : 8123 / socks v5
14:19 pmauduit pdurbin: ok I'll try
14:19 pmauduit once you have your socks configuration in firefox, you can try to load ifconfig.io to make sure that you're going outside via the aws instance
14:20 pmauduit Remote Host
14:20 pmauduit ec2-3-81-53-52.compute-1.amazonaws.com.
14:22 pdurbin pmauduit: I just created https://github.com/IQSS/dataverse-ansible/issues/99 . How does it look? :)
14:22 pmauduit I've no idea where goes the logs from collectd (the debian equivalent of /var/log/syslog)
14:26 pdurbin Maybe there are some clues in /etc/collectd.conf ?
14:27 pmauduit there is something about syslog
14:28 donsizemore if it's an RPM, try $ rpm -ql collectd
14:29 pdurbin "If no log plugin is loaded, collectd will write to STDERR." https://collectd.org/faq.shtml
14:31 pmauduit if my config worked, we should have a new server on port 9103
14:33 pmauduit does not seem to be the case, and I cannot find any module related to prometheus in the yum collectd setup
14:34 pdurbin pmauduit: I see "#<Plugin write_prometheus>" in /etc/collectd.conf . Does that help?
14:34 pmauduit yes, it means that it should be supported
14:34 pmauduit I put a file in the /etc/collectd.d/
14:34 pmauduit which does basically what is commented out
14:34 pdurbin ah, ok
14:34 pmauduit but, this should open a web interface to be scraped by prometheus afterwards
14:34 pmauduit on port 9103
14:35 pdurbin Right, you already mentioned the file you created: https://github.com/IQSS/dataverse-ansible/issues/99#issuecomment-524335633
14:35 pmauduit but I cannot see the port coming up
14:35 pdurbin curl: (7) Failed connect to localhost:9103; Connection refused
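The "Connection refused" check above can also be scripted. This is a minimal sketch, assuming Python 3 is available on the machine; the helper name `port_is_open` is hypothetical, and 9103 is simply the collectd write_prometheus port mentioned in the log.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    A refused or timed-out connection (what curl reported as
    "Connection refused") returns False instead of raising.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("localhost", 9103)` would be expected to return False while collectd is stopped or the plugin is not loaded, and True once the exporter is listening.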
14:36 pmauduit yes, I stopped the collectd service to be able to launch it by hand (keeping it in the foreground)
14:36 pdurbin ah, ok
14:36 pmauduit but no luck either, it does not produce more logs
14:37 pdurbin :(
14:39 pmauduit ok, journalctl -u collectd
14:40 pdurbin good
14:42 pdurbin donsizemore: judging from the server.log files linked from the new spreadsheet ( https://docs.google.com/spreadsheets/d/1geJXE1Gv4iuoDtDUBItulVb3s145TSuHnWLuDVJms8g/edit?usp=sharing ), the 404 is because Dataverse isn't being deployed due to Flyway errors.
14:42 pdurbin pmauduit: sounds like you're still making progress. Go go go!
14:43 pmauduit pdurbin: the collectd.conf does mention the write_prometheus plugin, but it is actually not provided by the yum package
14:44 pdurbin oh
14:44 pmauduit it's in a separate yum package it seems
14:44 pmauduit [2019-08-23 14:44:43] plugin_load: plugin "write_prometheus" successfully loaded.
14:45 donsizemore @pdurbin Caused by: org.flywaydb.core.internal.command.DbMigrate$FlywayMigrateException: Migration V4.14.0.2__2043-split-gbr-table.sql failed
14:46 pdurbin pmauduit: it looks like you installed the collectd-write_prometheus-5.8.1-1.el7.x86_64 RPM. Cool.
14:46 pmauduit yup
14:47 pmauduit and the 9103 interface is working (I'm still with collectd in foreground though)
14:47 pmauduit no we have to reconfigure prometheus so that it scrapes this interface
14:47 pmauduit s/no/now/
14:47 pdurbin Ok, my convention is to make a copy of the file like this: cp -a foo.config foo.config.orig
14:48 pdurbin so I can diff it later
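The copy-then-diff convention described above could be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and `shutil.copy2` keeps timestamps and permission bits but not ownership, so it is only an approximation of `cp -a`.

```python
import difflib
import shutil
from pathlib import Path


def backup_original(path: Path) -> Path:
    """Copy path to '<name>.orig' unless a backup already exists."""
    backup = path.with_name(path.name + ".orig")
    if not backup.exists():
        # copy2 preserves timestamps and mode bits (not ownership).
        shutil.copy2(path, backup)
    return backup


def diff_against_original(path: Path) -> str:
    """Return a unified diff between '<name>.orig' and the current file."""
    backup = path.with_name(path.name + ".orig")
    old = backup.read_text().splitlines(keepends=True)
    new = path.read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old, new, fromfile=str(backup), tofile=str(path))
    )
```

Because the backup is only written once, repeated calls to `backup_original` do not overwrite the pristine copy after the file has been edited.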
14:48 pdurbin donsizemore: yes, that flyway error :)
14:48 pmauduit ok
14:48 donsizemore @pdurbin bah. my fork wasn't current. starting over
14:49 pdurbin donsizemore: ok, did you catch that the entire api test suite completed ok on my Mac yesterday? I ran them against Dataverse running in docker-aio.
14:50 pmauduit pdurbin: hmmm ... having collectd in the foreground is ok in regards to the prometheus interface, but it fails to setup the plugin if launched via systemd
14:50 donsizemore @pdurbin yes
14:51 pdurbin pmauduit: yuck
14:51 pmauduit might be related to selinux or so
14:53 pdurbin Hmm, getenforce shows Enforcing. You are very welcome to turn off selinux if you want.
15:01 pmauduit pdurbin: any idea on how to do this ?
15:01 pdurbin setenforce
15:02 pdurbin `setenforce Permissive` should do it
15:02 pdurbin then you can check it with `getenforce`
15:02 pmauduit ok done
15:02 pmauduit and collectd correctly launched via systemd now
15:02 pdurbin nice!!
15:03 pdurbin I wrote about selinux at https://github.com/IQSS/dataverse/blob/v4.15.1/doc/sphinx-guides/source/developers/selinux.rst and in the future we can try to figure out how to get it working. Let's not worry about it now. :)
15:03 jri joined #dataverse
15:04 pmauduit pdurbin: do you know where is configured prometheus ?
15:04 pmauduit :)
15:04 pdurbin well, there should be some clues in https://github.com/IQSS/dataverse-ansible/pull/96/files
15:05 pdurbin I see --config.file=/usr/local/prometheus/prometheus.yml in /usr/lib/systemd/system/prometheus.service
15:06 pmauduit yup I found it also
15:06 donsizemore it's currently https://github.com/IQSS/dataverse-ansible/blob/master/files/prometheus.yml but it can be what you want it to be
15:07 pmauduit I'll try to find a config on our setups
15:07 pdurbin Not to distract anyone but I just noticed that this is already on the file system, so maybe we'll be able to monitor Solr too: /usr/local/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml
15:10 pmauduit pdurbin: found a config sample
15:10 pdurbin great!
15:10 pmauduit I just have to kill -HUP now
15:10 pmauduit and we should be good
15:13 jri_ joined #dataverse
15:17 pdurbin awesome
15:17 pmauduit ... it does not want to honour the scrape_config for collectd
15:20 pdurbin "scrape_config" is in a config file?
15:20 pmauduit yes
15:20 pmauduit there is a one by default (job_name: 'dataverse'), and I just added a one named collectd
15:22 * pdurbin runs diff /usr/local/prometheus/prometheus.yml.orig /usr/local/prometheus/prometheus.yml
15:22 pdurbin I might be a little turned around. This is all new to me.
15:23 pdurbin What is collectd collecting right now? :)
15:24 pmauduit pdurbin: you can have a look by curl'ing  http://localhost:9103/
15:25 pdurbin ok so collectd_cpu_total, collectd_memory, etc. thanks
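To list metric names such as collectd_cpu_total and collectd_memory without reading the raw output, the text served on port 9103 could be summarized with a small parser. This is a rough sketch, assuming the output follows the Prometheus text exposition format; the function name `metric_names` is hypothetical and the parser ignores everything except the metric name.

```python
from typing import List


def metric_names(exposition_text: str) -> List[str]:
    """Return the sorted, de-duplicated metric names in exposition text."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' (labels) or at whitespace (value).
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return sorted(names)
```

The body returned by the `curl http://localhost:9103/` command from the log could be passed to this function as a string.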
15:25 pmauduit yup
15:25 pdurbin I guess I expected the data to go in /var/lib/collectd
15:26 pdurbin but it's empty. Is there a database or something?
15:26 pmauduit by default (under debian at least) it should generate rrd time series database in /var/lib/collectd IIRC
15:26 pmauduit but we don't need if we can plug it to prometheus
15:27 pmauduit (as prometheus will be our tsdb)
15:27 pdurbin oh, ok
15:27 pdurbin no need to store the data twice
15:27 pdurbin we'll just store it once in prometheus
15:27 pmauduit yup
15:27 pmauduit localhost:9090/config
15:27 pmauduit :(
15:28 pmauduit only one scrape_configs, where I defined 2
15:29 pdurbin really? I see 2
15:29 pdurbin - job_name: dataverse
15:29 pdurbin - job_name: collectd
15:30 pmauduit in the configuration file ? or via the web ui
15:31 pdurbin I mean curl http://localhost:9090/config | grep job_name
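The `grep job_name` check above can be reproduced on the configuration text itself. The sketch below is an assumption-laden illustration: the `SAMPLE` configuration is hypothetical (only the job names `dataverse` and `collectd` and port 9103 come from the log; the other target is a guess), and a regular expression stands in for a real YAML parser.

```python
import re
from typing import List

# Hypothetical prometheus.yml content with two scrape jobs.
SAMPLE = """
scrape_configs:
  - job_name: 'dataverse'
    static_configs:
      - targets: ['localhost:9090']  # assumed target, not from the log
  - job_name: 'collectd'
    static_configs:
      - targets: ['localhost:9103']  # collectd write_prometheus port
"""

# Matches "job_name: name", optionally quoted and optionally a list item.
JOB_NAME = re.compile(r"""^\s*-?\s*job_name:\s*['"]?([^'"\s#]+)""", re.MULTILINE)


def job_names(config_text: str) -> List[str]:
    """Return job names in the order they appear in the config text."""
    return JOB_NAME.findall(config_text)
```

If the file on disk lists two jobs but the running server reports one, the server has probably not reloaded the file, or, as turns out later in the log, a different Prometheus instance is being queried.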
15:31 pmauduit if I tcpdump on the 9103  port I can see prometheus doing requests
15:32 pdurbin can we try a request against collectd with curl? or whatever? :)
15:34 pmauduit you can curl http://localhost:9103/ which is basically what collectd provides
15:34 pmauduit but even if it seems to be scraped by prometheus, if I lookup for example the "collectd_memory" in the prometheus UI, it cannot find any infos
15:35 pdurbin ok, I guess it's like looking at /proc :)
15:36 pmauduit maybe I'm missing some parameters
15:36 jri joined #dataverse
15:41 pdurbin in /etc/collectd.d/prometheus.conf or /usr/local/prometheus/prometheus.yml ?
15:41 pmauduit in the prometheus.yml
15:41 pmauduit I tried to stop it but it's still alive, even if there are no processes Oo
15:42 pmauduit oh I know ...
15:42 pmauduit I'm still with my vagrant ...
15:42 pdurbin There's a conf.good.yml file linked from https://prometheus.io/docs/prometheus/latest/configuration/configuration/ if that helps :)
15:43 pmauduit i was not hitting the right service
15:44 pmauduit if you've got the ssh socks proxy running
15:44 pdurbin I don't. :(
15:44 pmauduit you should be able to load this page: http://ip-172-31-36-60:9090/graph?g0.range_input=1h&g0.expr=collectd_load_shortterm&g0.tab=0
15:44 pmauduit ok
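The graph link pmauduit pasted follows a simple query-string pattern, so links for other metrics can be built the same way. This is a small sketch; the helper name `graph_url` is hypothetical, and the parameter names (`g0.range_input`, `g0.expr`, `g0.tab`) are taken from the URL in the log rather than from Prometheus documentation.

```python
from urllib.parse import urlencode


def graph_url(base: str, expr: str, range_input: str = "1h") -> str:
    """Build a Prometheus web UI graph link for one expression."""
    query = urlencode(
        {"g0.range_input": range_input, "g0.expr": expr, "g0.tab": 0}
    )
    return f"{base}/graph?{query}"
```

For example, `graph_url("http://ip-172-31-36-60:9090", "collectd_load_shortterm")` reproduces the link above, and expressions containing characters such as brackets or spaces are percent-encoded by `urlencode`.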
15:44 pdurbin Can we expose that for everyone without socks? :)
15:45 pdurbin If I curl that URL I see "Prometheus Time Series Collection and Processing Server" :)
15:45 pmauduit then yes, the proxy file you mentioned
15:45 jri_ joined #dataverse
15:46 pdurbin I already made http.proxy.conf.orig. :) Do you want to try hacking on the original? :)
15:46 pdurbin er
15:46 pdurbin I mean hacking on the real file.
15:51 pmauduit I'm mismatching configuration for apache vs nginx configuration
15:55 pdurbin :)
15:56 pmauduit https://serverfault.com/questions/924238/prometheus-1-5-2-behind-apache-2-4-reverse-proxy
15:57 pmauduit pdurbin: I fear that some endpoints could be used by dataverse
15:57 pdurbin Oh? What are you worried about?
15:58 pmauduit it would be cleaner to tell prometheus that it should be reachable under /prometheus/
15:58 pdurbin I'm not very worried about clean right now. :)
15:59 pmauduit doesn't work either anyway
15:59 pdurbin T-T
16:00 pdurbin I don't know if this helps, but I was looking at this earlier: https://stackoverflow.com/questions/45914235/configure-apache-with-multiple-proxypass/45916572#45916572
16:01 pmauduit makes sense, but I think prometheus is sending redirects onto /
16:02 pmauduit ...
16:02 pmauduit prometheus is currently listening on all interfaces
16:02 pmauduit which means http://ec2-3-81-53-52.compute-1.amazonaws.com:9090/graph works already
16:03 pmauduit or it works because I'm reaching via my socks proxy
16:03 pmauduit yes, that's why :(
16:05 pdurbin :(
16:05 pmauduit anyway, can we reach the grafana instance now ?
16:05 pmauduit (we don't need to be able to browse prometheus directly, grafana can do it for us)
16:05 pdurbin Good point. So the next step is to install grafana?
16:07 pmauduit yes
16:08 pdurbin Isn't it getting late for you? :)
16:08 pmauduit it's 6PM here already, you're right
16:08 pdurbin Should we pick this up next week?
16:09 pmauduit If I find some time, sure :)
16:10 pdurbin great! thanks!
17:36 donsizemore @pdurbin DatasetsIT broketh by its lonesome https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-testSubset/16/console
17:38 pdurbin any flyway errors?
17:39 donsizemore no, those were a 'me' problem
17:39 donsizemore i started over
17:39 pdurbin phew
17:41 pdurbin I need to drive my kids to the airport to visit my parents. I'll be back online in an hour or two.
18:26 jcain joined #dataverse
18:26 jcain just wanted to check in and see how much space an institution using Dataverse on Harvard's service was permitted
19:33 jcain joined #dataverse
19:56 pdurbin whoops, missed them
19:57 pdurbin bjonnh might know :)
21:51 pdurbin ok, folks, I'm out, have a good weekend!
21:51 pdurbin left #dataverse
