15:40 * axfelix joined #dataverse
16:57 * donsizemore joined #dataverse
18:11 <donsizemore> Hello! just curious if anyone is around and interested in troubleshooting a pegged CPU?
18:16 <pdurbin> donsizemore: well, I can try a bit. Is this for DVN 3.x?
18:17 <donsizemore> correct - a Dataverse 4 upgrade is in our near future. and thank you for your help!
18:19 <donsizemore> I've been poring through the domain server, access, and export logs but don't see a particular culprit. Akio and I went down a few rabbit holes earlier this afternoon.
18:22 <pdurbin> donsizemore: I'm going to guess you don't want to simply restart glassfish
18:23 <donsizemore> already done, happy to do it again
18:23 <pdurbin> huh. and still pegged
18:24 <pdurbin> what's the load from `uptime`?
18:24 <donsizemore> we were getting a number of harvesting errors, but they had me disable the schedule for that
18:24 <donsizemore> it's been hovering around 5 all day. I found one IP in the access log that had sent an enormous number of requests earlier, but that stopped around 12:30 EDT
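A quick way to spot a single abusive client like the one mentioned above is a per-IP tally of the access log. A minimal sketch; the log path is an assumption and depends on where your Glassfish (or fronting httpd) writes access logs:

```sh
# Count requests per client IP, busiest first (adjust the path to your setup).
awk '{print $1}' /usr/local/glassfish3/glassfish/domains/domain1/logs/access/server_access_log.txt \
  | sort | uniq -c | sort -rn | head
```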
18:25 <pdurbin> 5 isn't crazy high at least
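For context: the load average reported by `uptime` is roughly the average number of runnable (or uninterruptibly waiting) tasks, so whether 5 is high depends on the core count. A quick comparison on a Linux box:

```sh
uptime   # e.g. "load average: 5.02, 4.87, 4.90"
nproc    # number of cores; load ~5 on an 8-core VM is busy but not saturated
```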
18:29 <pdurbin> donsizemore: I assume you have symptoms as well. site is slower than usual or whatever
18:29 <donsizemore> this particular VM has a ton of resources thrown at it, so the machine (and Glassfish) are perfectly responsive. but the load is usually closer to 0
18:30 <pdurbin> ok. responsive is good :)
18:31 <donsizemore> we spent part of the morning trying to correlate export errors / NullPointerExceptions to bad data of any sort, but can't pin anything down
18:36 <pdurbin> I wonder if that's related or not.
18:41 <donsizemore> on launch the glassfish log gets a string of javax.enterprise.system.container.ejb.com.sun.ejb.containers|_ThreadID=17;_ThreadName=Thread-2;|EJB5184:A system exception occurred during an invocation on EJB StudyServiceBean, method: public void edu.harvard.iq.dvn.core.study.StudyServiceBean.exportStudy(java.lang.Long) errors, but once exporting finishes the CPU doesn't die down
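One way to see how often those EJB5184 export failures recur after a restart is to count them in the server log. A minimal sketch; the domain path is an assumption:

```sh
# Tally EJB5184 failures and peek at the most recent exportStudy errors
# (adjust the domain path to your installation).
grep -c 'EJB5184' /usr/local/glassfish3/glassfish/domains/domain1/logs/server.log
grep 'exportStudy' /usr/local/glassfish3/glassfish/domains/domain1/logs/server.log | tail -5
```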
18:42 <pdurbin> donsizemore: so if you restart glassfish and *don't* try to export, everything is fine? it's only when you restart glassfish and try an export that the CPU gets pegged?
18:43 <donsizemore> it seems to be launching export on its own
18:44 <pdurbin> huh
18:45 <pdurbin> I assume you don't want that.
18:46 <pdurbin> donsizemore: this *might* help: http://wiki.greptilian.com/java/glassfish/howto/purge-jms-queue/
18:47 <pdurbin> if export is done via a JMS queue, which I'm not sure of
18:55 <pdurbin> donsizemore: I'd start by trying to list anything in any JMS queues
18:58 <donsizemore> DSBIngest Queue RUNNING 0 - 1 - 0 0 0 0.0
18:58 <donsizemore> IndexMessage Queue RUNNING 0 - 1 - 5 0 5 366.0
18:58 <donsizemore> mq.sys.dmq Queue RUNNING 0 - 0 - 0 0 0 0.0
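Those pasted lines look like `imqcmd list dst` output with the header row lost; the columns appear to be destination name, type, state, then producer/consumer counts and message counts (so IndexMessage is showing 1 consumer and 5 pending messages). A sketch of listing the destinations, assuming the default embedded Open MQ broker on localhost:7676:

```sh
# List JMS destinations on the Glassfish (Open MQ) broker.
# Prompts for the broker admin password (default user/password is admin/admin).
imqcmd list dst -b localhost:7676 -u admin
```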
19:00 <pdurbin> so a count of 5 in the IndexMessage queue?
19:00 <donsizemore> correct
19:00 <pdurbin> is that expected?
19:11 <donsizemore> Akio is checking documentation. if I try to view the jms/DSBIngest Connection Factory in the Glassfish console, I get SEVERE|glassfish3.1.2|org.glassfish.admingui|_ThreadID=37;_ThreadName=Thread-2;|RestResponse.getResponse() gives FAILURE. endpoint = 'https://localhost:4848/management/domain/resources/connector-connection-pool/jms%2FIndexMessage'; attrs = '{}'
19:12 <pdurbin> ok
19:18 <pdurbin> donsizemore: my only point is that when unexpected stuff starts happening when you start glassfish, it may be because stuff is in the JMS queue.
19:21 <donsizemore> point absolutely well taken - just to be sure, the JMS queue is safe to purge?
19:22 <pdurbin> well, I would think you'd be able to restart indexing anyway
19:23 <donsizemore> 10-4 -- please forgive me for being squeamish; I'm new =)
19:24 <pdurbin> probably good to be a bit squeamish
19:24 <pdurbin> :)
19:24 <pdurbin> donsizemore: please feel free to open a ticket for all this. I'm just trying to figure out why your CPU is pegged. support@dataverse.org
19:26 <donsizemore> #224689 from this morning
19:26 <donsizemore> the purge returned successful, but I still have a consumer count of 1 in DSBIngest and 5 in IndexMessage - should I purge IndexMessage specifically?
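Purging targets one destination at a time, so purging DSBIngest alone would leave IndexMessage untouched. A sketch of purging the IndexMessage queue specifically, under the same broker assumptions as above; note that purging drops pending messages but would not change the consumer count, which just reflects attached listeners:

```sh
# Purge only the IndexMessage queue; -t q means destination type "queue".
imqcmd purge dst -t q -n IndexMessage -b localhost:7676 -u admin
```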
19:27 <pdurbin> huh. sounds like it didn't do anything
19:34 <donsizemore> Lucene seems to be the CPU hog
19:37 <pdurbin> donsizemore: yeah? how can you tell?
19:38 <donsizemore> turned on threading in "top" (which I should've done in the first place, but I'm learning)
19:38 <donsizemore> then stracing those processes shows Lucene contending over write locks
19:39 <donsizemore> stat("/usr/DVN/lucene/index-dir/write.lock", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
19:39 <pdurbin> how do you turn on threading in top?
19:39 <donsizemore> on Linux, with a capital H
19:39 <donsizemore> I meant to look that up earlier and got sidetracked messing with jstat etc.
19:40 <donsizemore> I had also seen Lucene complaining about write locks timing out earlier, but thought that was a symptom (the timeout) rather than a cause
19:40 <pdurbin> in the ticket you opened I see this: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/usr/DVN/lucene/index-dir/write.lock
19:41 <donsizemore> (actually, it may still be a symptom)
19:41 <donsizemore> will stopping Glassfish and manually removing the lock cause things to go any wonkier than at present?
19:42 <pdurbin> not sure
19:43 <pdurbin> sorry, I've spent more time on Dataverse 4 by now
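The fix that worked, reconstructed as a sketch: stop Glassfish so no JVM still holds the index, remove the stale Lucene lock file (the path comes from the exception above), and start it back up. The domain name `domain1` is an assumption:

```sh
asadmin stop-domain domain1                # ensure nothing is holding the index
rm /usr/DVN/lucene/index-dir/write.lock    # stale NativeFSLock from the exception
asadmin start-domain domain1
```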
19:46 <donsizemore> it was Lucene's write lock. right in front of my (blushing) face
19:47 <donsizemore> and we, too, hope to be on Dataverse 4 soon
19:48 <pdurbin> donsizemore: you removed the lock and you're all set now?
19:51 <donsizemore> things look wonderfully normal. I apologize for my many dumb questions, but I'm learning new-to-me technologies, the Dataverse among them
19:51 <pdurbin> no no, it's all good. good reminder of how useful strace is :)
20:13 * pdurbin replies on https://help.hmdc.harvard.edu/Ticket/Display.html?id=224689
22:00 * garnett joined #dataverse