Machine captioning SRT testing

Re: Re: Re: Re: Machine captioning SRT testing
javabeanies
2021-07-06 13:21:14
A decent SRT with sync and punctuation. Got it. Let me work something out over the weekend. I'll test it on the first 5 minutes of Nima-007. Mio Kimishima deserves more recognition lol
Re: Re: Re: Re: Re: Machine captioning SRT testing
truc1979
2021-07-06 13:52:00
Hi,

OK, I reviewed the first minute, and it took me 30 minutes...
- I formatted the dialogue,
- added [[...]] around obvious mistakes (though I don't know what is really said),
- added missing words (in English).

As you can see, it's a lot of work... For someone at my very low level, it would take at least 2 weeks of full-time work to reach 80% accuracy...

[[ひ ひ ふ ん ええ うん ござい ます うん]] (no dialogue here, false positive)
Mother: - I'm back. は あ, 暑い わ ね.
Girl: - 帰り [[に うん]] (no dialogue, false positive)
Mother: - ああ, また そんな なんか 食べ て
Girl: - まだ [[日本]] two pieces は ある
Mother: - [[私 は]] だ から ね. 全く, 夏休み で 帰省し た と 思え ば, 家 で ごろごろ し て ばかり な ん だ から けど 良かっ た, あんた が 家 に い て くれ て
Girl: - うん?
Mother: - [[まあ]] Mama これから 祭 の 打ち合わせ が あっ て. あんた, 裏 の お 爺 ちゃん の ご飯 作り に 行っ て き な よ
Girl: - なんで 私 が?
Mother: - いいじゃ ない , どうせ [[ヒマ]] right now でしょ あの お 爺 ちゃん も 年 だ し [[え き]] (no dialogue, false positive) たまに [[様子見]] が
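For anyone who wants to tally how much of a reviewed transcript was flagged, here is a minimal Python sketch that pulls out the [[...]] spans used in the review above. The function name `flagged_spans` and the sample line are only illustrative; this is not part of any captioning tool.

```python
import re

# Matches the reviewer's [[...]] markers, capturing the text inside (non-greedy).
FLAG = re.compile(r"\[\[(.+?)\]\]")

def flagged_spans(line):
    """Return the text of every [[...]] span in one reviewed line."""
    return [span.strip() for span in FLAG.findall(line)]

# Sample lines copied from the reviewed transcript.
one_flag = flagged_spans("Girl: - まだ [[日本]] two pieces は ある")
two_flags = flagged_spans("どうせ [[ヒマ]] right now でしょ [[え き]]")
no_flags = flagged_spans("Girl: - なんで 私 が?")
```

Counting the flagged spans per minute of video would give a rough idea of how much proofreading a full file needs.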
Re: Re: Re: Re: Re: Re: Machine captioning SRT testing
javabeanies
2021-07-06 14:39:37
Hmm, how do you find the quality of the transcription? Was it accurate against the video? E.g., if 3 syllables were spoken, did the transcription capture those 3 syllables? If not, the audio file may need cleaning or boosting for translation purposes, or the confidence threshold may need lowering. Will try that out. If the transcription isn't good, the translation is unlikely to be worth the time.
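To illustrate what lowering the confidence threshold does, here is a minimal Python sketch. It assumes the engine returns one confidence score per recognized word; the `Word` class, the scores, and the timings are all made up for the example and do not come from any specific captioning service.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # 0.0 to 1.0, hypothetical engine score

def filter_by_confidence(words, threshold):
    """Keep only the words whose confidence is at or above the threshold."""
    return [w for w in words if w.confidence >= threshold]

# Made-up engine output for a short stretch of audio.
words = [
    Word("暑い", 3.2, 3.6, 0.91),
    Word("わ", 3.6, 3.7, 0.55),
    Word("ね", 3.7, 3.9, 0.78),
    Word("うん", 5.0, 5.2, 0.22),
]

strict = [w.text for w in filter_by_confidence(words, 0.8)]
loose = [w.text for w in filter_by_confidence(words, 0.5)]
```

A lower threshold keeps more of the spoken syllables but also lets through more false positives, so it trades missing words for extra cleanup.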
Re: Re: Re: Re: Re: Re: Re: Machine captioning SRT testing
javabeanies
2021-07-06 17:06:32
Hey, I've created 3 transcriptions. Could you help me take a look? I'm looking mainly at the quality of the transcription, and at which translation was most "workable" in your opinion. No need to translate anything yourself; I'm still at the transcription-quality stage. https://bit.ly/3jPrJ78
Re: Re: Re: Re: Re: Re: Re: Re: Machine captioning SRT testing
swierszczyk69
2021-07-07 00:20:42
Wow, and I thought that creating English subtitles and then translating them into Polish was not a very easy thing. Good luck, though. Japanese movies don't turn me on.
: Re: Machine captioning SRT testing
javabeanies
2021-07-07 01:39:08
Yeah, translation is way harder if it's not at least your 2nd language. What are you into? Mio Kimishima was what got me started looking for subtitles, and the breadcrumbs led here.
Re: : Re: Machine captioning SRT testing
truc1979
2021-07-07 09:48:56
@javabeanies: Actually, it's not bad at all. The main problem is my Japanese skills. Not many words are missing. What I noticed: the missing "tadaima" at the beginning, one syllable missing before "帰り" (to make "okaeri"), a "まあ" in place of "まま"... I'd say 50% of the errors I noticed are on words that have the same pronunciation, or a very close one. For instance, 日本 and 二本 can have the same pronunciation, but the former means "Japan" while the latter means "two pieces". If the whole sentence is well formatted and has no other mistakes, and if one has the context and knows both words, it's easy to fix. But that's a lot of "ifs"... I could correct 日本 to 二本, but I have no idea what to replace 様子見 with. In fact, I think you need a native Japanese speaker for proofreading, or someone at quite a high level. I'll have a look at your files.

@swierszczyk69: Haha :) I like any kind of adult movie as long as it's hot and comedy-style!
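Since same-sounding words account for so many of the errors, one cheap aid is to flag every token that has a known homophone so the proofreader checks it against the context. Below is a minimal Python sketch of that idea. The `HOMOPHONES` table is hand-written and contains only the 日本 / 二本 pair from this thread; a real list would have to be built by someone who knows Japanese well.

```python
# Reading -> words that share it. Only the pair discussed in this thread.
HOMOPHONES = {
    "にほん": ["日本", "二本"],
}

def flag_homophones(tokens):
    """Return (index, token, alternatives) for each token with a known homophone."""
    alternatives = {}
    for words in HOMOPHONES.values():
        for word in words:
            alternatives[word] = [other for other in words if other != word]
    return [
        (i, tok, alternatives[tok])
        for i, tok in enumerate(tokens)
        if tok in alternatives
    ]

# The machine output is already space-separated, so a plain split is enough here.
tokens = "まだ 日本 は ある".split()
flags = flag_homophones(tokens)
```

This does not decide which word is right; it only points a human at the places where the engine could plausibly have picked the wrong spelling.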
Re: Re: : Re: Machine captioning SRT testing
javabeanies
2021-07-07 12:31:05
Hey, any recommendations for good comedic titles? The contextual side of audio transcription can be improved through a custom vocabulary. Take the "2 pieces" example: we can force the engine to use "Japan" if that is more common in the JAV context. So JAV transcription would carry its own vocab set, distinct from, say, one for sports. Are there any top-10 phrases that are commonly mistranscribed?
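Roughly, the custom vocabulary idea can be pictured as a per-domain boost added to the engine's own score for each candidate word. The Python sketch below is only an illustration under that assumption: the scores, the boost values, and the names `pick_candidate`, `VOCAB_A`, and `VOCAB_B` are invented, and real engines expose custom vocabularies through their own settings rather than through code like this.

```python
def pick_candidate(candidates, vocab_boost):
    """Return the candidate text with the highest score after the vocab boost."""
    best = max(candidates, key=lambda c: c[1] + vocab_boost.get(c[0], 0.0))
    return best[0]

# Two same-sounding candidates with made-up, nearly equal engine scores.
candidates = [("日本", 0.50), ("二本", 0.48)]

NO_VOCAB = {}              # engine left alone
VOCAB_A = {"日本": 0.10}   # hypothetical domain set that prefers "Japan"
VOCAB_B = {"二本": 0.10}   # hypothetical domain set that prefers "two pieces"

plain = pick_candidate(candidates, NO_VOCAB)
with_a = pick_candidate(candidates, VOCAB_A)
with_b = pick_candidate(candidates, VOCAB_B)
```

The boost only matters when the engine's scores are close, which is the homophone case; which word deserves the boost in a given domain is a judgment call for someone who knows the material.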
Re: Re: Re: : Re: Machine captioning SRT testing
truc1979
2021-07-08 08:58:57
That's a very interesting feature. Unfortunately, I lack the experience to tell you what the "usual mistakes" are... As for "Japan" and "2 pieces", I think both are equally likely to appear.

Are you looking for recommendations in JAV or in other categories? I don't easily remember JAV IDs, but I had fun watching IPX-412 ( https://www.avsubtitles.com/movie.php?movid=818 ). It's more of a "funny situation" than actual comedy, because I find "real" Japanese comedy a little weird. Same kind of situation: PPPD-488 https://www.avsubtitles.com/movie.php?movid=678 MIMK-082 was fun too, if I remember correctly: https://www.avsubtitles.com/movie.php?movid=2279

In US/Euro porn, off the top of my head, I would say:
- Haunted Nights (1993) https://www.avsubtitles.com/movie.php?movid=127
- Scooby Doo (2011) https://www.avsubtitles.com/movie.php?movid=136
- Operation Desert Stormy (2007) https://www.avsubtitles.com/movie.php?movid=1362

In softcore: Bikini Jones (2010) https://www.avsubtitles.com/movie.php?movid=111

Sorry, I still haven't looked at your files...
Re: Re: Re: Re: : Re: Machine captioning SRT testing
javabeanies
2021-07-08 10:24:58
Ooh, will check those titties out. I mean titles. Nah, no rush. It was simply that I wanted to hear the Nima-007 conversation, and that led to this. Just a fun side project, since the tech is more accessible these days.