UTAU Tutorial Part 2: Create your own UTAUloid

Previous part: fav.me/d78fbld

Why am I writing a tutorial on how to make an UTAUloid before explaining how to use the program? Because:
- other people's UTAUloids don't have all the phonemes we would like to use
- many UTAUloids, even "official" ones, are not well tuned
- using your own voice will help you gain confidence with the program
- by tuning your own voice you will learn to tune all the other ones



INDEX


What is an UTAUloid?
Name and design
The concept art
The avatar
CV phonemes
Creating the voicebank's folder
Recording the voicebank
Making an empty oto.ini file and a test ust
Tuning the oto.ini: the values
Tuning the oto.ini: the alias
Tuning the oto.ini: the offset
Tuning the oto.ini: the consonant
Tuning the oto.ini: the cutoff
Tuning the oto.ini: the preutterance
Tuning the oto.ini: the overlap
Tuning the oto.ini: checking the tunes
Tuning the oto.ini: breath sounds
Tuning the oto.ini: unplayed phonemes
The character file
The readme file





What is an UTAUloid?


An UTAUloid is a character and a voicebank for the program Vocal Synthesis Tool UTAU. Anyone can make their own. To create a full UTAUloid, you have to record the voicebank and design the character.
A voicebank is a set of phonemes and sounds which are used to build the words of a song. There are different kinds of voicebanks; in this tutorial, I will only cover CV voicebanks.
CV means "consonant-vowel" and refers to phonemes which start with a consonant and end with a vowel. It is the easiest way of recording a voicebank. Some people think it does not sound very realistic, but it is possible to get great results with this kind of recording too.

Name and design


You can choose the name and the design you prefer for your UTAUloid.
Just two suggestions:
- Don't use a name from a language you don't know or, if you do, check what it means first. Also, don't build names from parts of words in a language you don't know: you may not get the meaning you want.
- Some people don't appreciate the so-called "Miku formula", that is, a design very similar to Miku Hatsune's. It is always better to be original.

The concept art


The concept art is a full body drawing of your UTAUloid. It is often added to the UTAUloid's voicebank so that people who download it know what the character looks like.

The avatar


You can make an avatar picture for your UTAUloid, which will be shown in UTAU while its voicebank is in use. The avatar is usually a picture of the character's head. It must be a bitmap picture of 100x100 pixels.
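If you want to double-check the size of the avatar before adding it to the voicebank, a small script can read it from the file header. This is only a sketch: it assumes the common Windows BMP layout (width and height stored as 32-bit numbers at byte 18), and the function names are made up for this example.

```python
import struct


def bmp_size(header: bytes) -> tuple[int, int]:
    """Return (width, height) read from the first 26 bytes of a BMP file."""
    if len(header) < 26 or header[:2] != b"BM":
        raise ValueError("not a BMP file")
    # Assumption: width and height are little-endian signed 32-bit
    # integers starting at byte 18 (the usual Windows BMP header).
    width, height = struct.unpack_from("<ii", header, 18)
    # A negative height only means the rows are stored top-down.
    return width, abs(height)


def is_valid_avatar(path: str) -> bool:
    """True if the file at 'path' is a 100x100 bitmap."""
    with open(path, "rb") as f:
        return bmp_size(f.read(26)) == (100, 100)
```

For example, is_valid_avatar("Suishou.bmp") would tell you whether that picture has the size UTAU expects.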

CV phonemes


UTAU can support almost any language, but it needs the proper phonemes to do so. Since the most used language in UTAU is Japanese, it is better to record at least the standard Japanese phonemes. They are:
a, e, i, o, u
ba, be, bi, bo, bu
bya, byo, byu
cha, chi, cho, chu
da, de, do
fu
ga, ge, gi, go, gu
gya, gyo, gyu
ha, he, hi, ho
hya, hyo, hyu
ja, ji, jo, ju
ka, ke, ki, ko, ku
kya, kyo, kyu
ma, me, mi, mo, mu
mya, myo, myu
n
na, ne, ni, no, nu
nya, nyo, nyu
pa, pe, pi, po, pu
pya, pyo, pyu
ra, re, ri, ro, ru
rya, ryo, ryu
sa, se, so, su
sha, shi, sho, shu
ta, te, to
tsu
wa, we, wi, wo
ya, yo, yu
za, ze, zo, zu
After these, you can record any other phonemes you want to include in your voicebank (for example, syllables starting with "L"). Make a list of all the ones you want, so that you don't forget any of them (you can always record anything again later).
You can find a complete reclist here: fav.me/d7qgqde

NOTES: In UTAU, "r" is treated as a rest, so you cannot name any phoneme "r" because the program won't read it.
Using Latin letters or hiragana: if the phonemes of your voicebank are named in Japanese characters, many computers will not read them even if the voicebank has romaji aliases, and it will usually need a manual conversion. If you want your voicebank to be easy for anyone to use, it is much better to name the phonemes using Latin letters and then add hiragana aliases.
Also, don't use capital letters in the names of the phonemes: it will be impossible to tune them.
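The naming notes above can be turned into a quick check to run on your list of phoneme names before recording. This is a minimal sketch; the function name and the wording of the messages are my own.

```python
def name_problems(name: str) -> list[str]:
    """List the naming rules from the notes that a phoneme name breaks."""
    problems = []
    if name == "r":
        # UTAU reads "r" as a rest, so it cannot be a phoneme name.
        problems.append('"r" is read as a rest')
    if name != name.lower():
        problems.append("contains capital letters")
    if not name.isascii():
        # Names in hiragana may not be read on many computers.
        problems.append("contains non-Latin characters")
    return problems
```

An empty list means the name follows all three notes. For example, name_problems("Ka") reports the capital letter, while name_problems("ka") reports nothing.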

Creating the voicebank's folder


Before starting to record your voicebank, you have to make the folder in which you will save it.
Open the folder in which you have installed UTAU, then the one named "voice". Inside it, create a new folder and name it with the name of your UTAUloid.

Recording the voicebank


To record the voicebank, you need a computer microphone and a recording program. An excellent, free recording program is Audacity; you can download it here: audacity.sourceforge.net/?lang… (from this point of the tutorial, I will write the recording instructions for Audacity).

After you have installed and opened the program, you can start to record. When you record your voice, make sure there is no background noise.
To start recording in Audacity, click on the red circle button. Then pronounce or sing the phoneme you need. There are no fixed limits, but the recording should not be too long or too short: try to make it about 1-2 seconds long. While recording, keep your voice steady and stay on the same note.
You should get something like this (I recorded the phoneme "ka"): oi62.tinypic.com/vr9h51.jpg
Now select the phoneme, leaving some silence before and after the syllable (picture: oi62.tinypic.com/1e1c2p.jpg ). To save it, click on "File", then on "Export selection". Save the recording as a WAV file inside the folder of your UTAUloid, and give it the name of the phoneme you have recorded (in my case "ka"). Picture: oi59.tinypic.com/16nixj.jpg

Now that you have recorded the first phoneme, you have to record all the others, always using this method.
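Once the folder is full of recordings, it can be handy to find the ones that came out much shorter or longer than intended. The sketch below uses Python's standard wave module to measure each file. The default limits are an assumption of mine (1 to 3 seconds, leaving room for the silence around a 1-2 second phoneme), so adjust them to your own recordings.

```python
import wave
from pathlib import Path


def wav_seconds(path) -> float:
    """Length of a WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def odd_lengths(folder, shortest=1.0, longest=3.0) -> dict[str, float]:
    """Map file name -> seconds for every WAV outside the given limits."""
    result = {}
    for wav in sorted(Path(folder).glob("*.wav")):
        seconds = wav_seconds(wav)
        if not shortest <= seconds <= longest:
            result[wav.name] = round(seconds, 2)
    return result
```

Calling odd_lengths on your voicebank's folder returns only the files worth listening to again; an empty result means every recording is within the limits.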

Making an empty oto.ini file and a test ust


A ust is an "UTAU Script File", a file which contains an UTAU song.
To tune your UTAUloid, you need a test ust which contains all the phonemes you have recorded. To make one, you'll need the Essential UTAU Toolkit ( fav.me/d78fbld ).
First of all, start UTAU and select your voicebank (go to "Project" ---> "Project properties"). Then go to "Tools" ---> "Voicebank settings" and click on the "Set" button.
This step generates an oto.ini file, which contains the tuning information of your voicebank.
Now close UTAU and go to the "TestUst" folder of the Essential UTAU Toolkit. Follow the instructions in the readme file inside that folder.
After this, you'll have a test ust for your UTAUloid.

Tuning the oto.ini: the values


Now, you're ready to tune your voicebank. Open the test ust and choose your voicebank as singer. Open the "Voicebank settings".

In the "Voicebank settings" there is a grid. It contains 7 main columns:
NAME: the name of the phoneme
ALIAS: you have to set this manually. It is an alternative name for the phoneme: when you write the alias in a region, UTAU reads it as the main phoneme
OFFSET: where playback of the phoneme starts
CONSONANT: selects the consonant part of the phoneme so that it won't be stretched
CUTOFF: where the phoneme ends
PREUTTERANCE: how much of the phoneme is anticipated and played before the actual start of its region
OVERLAP: how much of the previous phoneme is played under the beginning of the current one

You can set some values graphically by clicking on "launch editor", or set some of them manually by changing the values on the right and clicking on "Set" (this last option is used only for the alias and for small changes).
In the next steps, I will go through each column of the oto.ini.

Once you know the following rules, you will be able to tune all the phonemes you'll need, not only the ones written here.
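If you open the oto.ini in a text editor, each phoneme normally appears as one line of the form "file name = alias, offset, consonant, cutoff, preutterance, overlap", with the values in milliseconds. The sketch below reads such a line into named fields. The class name, the function name and the numbers in the example are made up for illustration, and empty values are assumed to mean 0.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    name: str            # file name of the recording, e.g. "ka.wav"
    alias: str
    offset: float
    consonant: float
    cutoff: float
    preutterance: float
    overlap: float


def parse_oto_line(line: str) -> OtoEntry:
    """Split one oto.ini line into its columns."""
    name, _, values = line.strip().partition("=")
    parts = values.split(",")
    if len(parts) != 6:
        raise ValueError(f"expected 6 values after '=', got {len(parts)}")
    # Assumption: an untuned phoneme may have empty fields; read them as 0.
    numbers = [float(p) if p.strip() else 0.0 for p in parts[1:]]
    return OtoEntry(name, parts[0], *numbers)
```

For example, parse_oto_line("ka.wav=か,50,100,-300,80,30") gives an entry whose alias is "か" and whose preutterance is 80.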

Tuning the oto.ini: the alias


Adding aliases to your voicebank is very useful, especially when you want it to be usable in both hiragana and Latin letters (romaji). You can add aliases as you prefer, but remember that every phoneme can have only one alias and that, if you use the name of an actual phoneme as an alias, neither of the two will be played.
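The second rule is easy to break by accident in a large voicebank. A minimal sketch of a check, assuming you have collected your aliases in a dictionary that maps each phoneme name to its alias (the function name is hypothetical):

```python
def alias_clashes(aliases: dict[str, str]) -> list[str]:
    """Return the aliases that are also the name of an actual phoneme."""
    names = set(aliases)
    # Empty aliases are skipped: the phoneme simply has no alias yet.
    return sorted(a for a in set(aliases.values()) if a and a in names)
```

For example, alias_clashes({"ka": "か", "ki": "ka"}) reports "ka", because the alias given to "ki" is the name of another phoneme.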

Tuning the oto.ini: the offset


The offset value is the "Top of Data" in the editor. You have to select, in blue, all the silence before the beginning of the phoneme.

Tuning the oto.ini: the consonant


The consonant value is used to select the whole consonant in the phoneme and begins right after the "Top of Data". It is the pink section in the editor.
VOWEL SOUNDS: a, e, i, o, u and n (alone)
You have to select the beginning part of the vowel.
CONSONANT SOUNDS: all the other ones
You have to select the whole consonant, the "y" part in phonemes like "kya", "rya", etc., and part of the vowel.

Tuning the oto.ini: the cutoff


The cutoff is the value of the "End of Data" in the editor. You have to select in blue the whole silence after the phoneme and the part of the recording in which the phoneme is fading.

Tuning the oto.ini: the preutterance


The preutterance is the red line in the editor.
There are different kinds of consonants, and they have to be tuned in different ways.
VOWEL SOUNDS: a, e, i, o, u and n (alone)
Set the preutterance at about 50 ms from the Top of Data.
LONG CONSONANTS: f, h, m, n (before a vowel), s, sh
If they are too long, the phoneme will usually sound unnatural. You can cut part of them by setting the "Top of Data" in the middle or near the end of the consonant. They must be no longer than 0.15 seconds.
For these consonants, the preutterance must be set between the consonant and the vowel or, if they are long enough, in the middle of the consonant.
SHORT CONSONANTS: b, d, g, k, p and t
The preutterance must be set between the consonant and the vowel or, if the consonant is short enough, near the beginning of the vowel.
MEDIUM LENGTH CONSONANTS: ch, j, r and z
They can be pronounced long or short. Tune them following the rules that match their length.
"Y" AND "W" CONSONANTS: they are considered medium length consonants if they are not preceded by another consonant. When there is another consonant before them (as in "kya" or "rya"), set the preutterance treating "y" and "w" as part of the vowel.
"TS" CONSONANT: if it is too long, you have to record the phoneme again. You cannot cut the "s" part without cutting the "t" part, so you need phonemes in which the "ts" sound lasts no more than 0.2 seconds.

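The consonant groups above can be written down as a small lookup, which helps when you want to go through a long list of phonemes and see which rule applies to each. This is only a sketch of the rules in this section: the function names are hypothetical, and the consonant length in milliseconds is something you would read off the editor yourself.

```python
# Consonant groups as listed in this section.
CLASSES = {
    "f": "long", "h": "long", "m": "long", "n": "long", "s": "long", "sh": "long",
    "b": "short", "d": "short", "g": "short", "k": "short", "p": "short", "t": "short",
    "ch": "medium", "j": "medium", "r": "medium", "z": "medium",
    "y": "medium", "w": "medium",  # only when no other consonant comes before
    "ts": "ts",
}

# Maximum consonant length in ms: 0.15 s for long consonants, 0.2 s for "ts".
MAX_CONSONANT_MS = {"long": 150, "ts": 200}


def consonant_class(name: str) -> str:
    """Return the tuning group of a romaji phoneme name such as "kya"."""
    if name in {"a", "e", "i", "o", "u", "n"}:
        return "vowel"
    head = name.rstrip("aeiou")
    # After another consonant, "y" and "w" count as part of the vowel.
    if len(head) > 1 and head[-1] in "yw":
        head = head[:-1]
    return CLASSES.get(head, "unknown")


def too_long(name: str, consonant_ms: float) -> bool:
    """True if the consonant exceeds the limit given for its group."""
    limit = MAX_CONSONANT_MS.get(consonant_class(name))
    return limit is not None and consonant_ms > limit
```

When too_long is true for a long consonant, you can trim it by moving the "Top of Data"; when it is true for "tsu", the phoneme has to be recorded again.
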
Tuning the oto.ini: the overlap


The overlap is the green line in the editor. It is usually set before the preutterance, not far from it.
For long consonants, if the preutterance has been set between the consonant and the vowel, you can also set the overlap in the middle of the consonant.
For short consonants, if the preutterance has been set after the beginning of the vowel, you can also set the overlap between the consonant and the vowel.
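As a rough sanity check of the first rule, you can compare the two values for each phoneme. In this sketch both values are in milliseconds and measured from the same point; the 60 ms limit for "not far" is purely my assumption, since the right distance depends on the phoneme.

```python
def overlap_warnings(preutterance_ms: float, overlap_ms: float,
                     max_gap_ms: float = 60.0) -> list[str]:
    """Warn when the overlap is after, or far before, the preutterance."""
    warnings = []
    if overlap_ms > preutterance_ms:
        warnings.append("overlap is after the preutterance")
    elif preutterance_ms - overlap_ms > max_gap_ms:
        # max_gap_ms is an illustrative threshold, not an UTAU rule.
        warnings.append("overlap is far from the preutterance")
    return warnings
```

A warning does not always mean the phoneme is badly tuned, but it points to the ones worth checking by ear first.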

Tuning the oto.ini: checking the tunes


After tuning all the phonemes, check them. Close the "Voicebank settings" and, in the ust, select the region you want to play, then click on "start". You can delete the well-tuned phonemes from the sequence; the others have to be re-tuned.

Tuning the oto.ini: breath sounds


If you want to add breath sounds to your voicebank, you'll have to tune them too. Select the offset and the cutoff as normal. You can select the consonant as you prefer. Set small values of preutterance and overlap.

Tuning the oto.ini: unplayed phonemes


UTAU may fail to play some phonemes. In that case, open the Voicebank Settings, select the phoneme that is not played and click on "Edit freq. map.". In the top left, write a number, usually between 150 and 300 (check the values of other phonemes which are played). Then click on the second button from the top (the one before the last button) and click on "OK".
Now your voicebank should be completely tuned.

The character file


The "character" file is a text file which contains the basic information about an UTAUloid. It is better to write it in English so that everybody can understand it.
Open your voicebank's folder and create a new text file called "character".
Copy the following lines into it:
name=Suishou Suine
image=Suishou.bmp
author=
sample=
Then write
- the name of your UTAUloid after "name="
- the name of the avatar file after "image=", followed by .bmp
- your nickname, if you want, after "author="
- if you want, the name of the phoneme you want as sample after "sample=", followed by .wav
You can also write other information about the character, like its height, weight, age, personality, etc.
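If you prefer to generate the four lines instead of typing them, a minimal sketch could look like this. The function name is hypothetical; the field names are the ones shown above.

```python
def character_text(name: str, image: str = "", author: str = "",
                   sample: str = "") -> str:
    """Build the contents of the "character" file, one field per line."""
    lines = [
        f"name={name}",
        f"image={image}",    # avatar file name, ending in .bmp
        f"author={author}",  # optional nickname
        f"sample={sample}",  # optional sample phoneme, ending in .wav
    ]
    return "\n".join(lines) + "\n"
```

For example, character_text("Suishou Suine", "Suishou.bmp") produces the same four lines as the example above, ready to be pasted into the "character" file.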

The readme file


The "readme" file is a text file which contains the technical information and the terms of use of an UTAUloid. It is better to write it in English so that everybody can understand it.
Open your voicebank's folder and create a new text file called "readme".
You can write in it:
- some information about your UTAUloid, if you haven't written it in the "character" file
- credits for people who have helped you to make your UTAUloid (who have designed it, tuned it, etc.)
- the terms of use of your UTAUloid
- anything you want



Now your UTAUloid should be finished. If you want to share it, just add your voicebank to an archive file and upload it to a sharing website!
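Any archive program will do for this step. As one possible way, the sketch below uses Python's standard shutil module to make a zip file next to the voicebank's folder; the function name is my own, and zip is just one of the formats you could choose.

```python
import shutil
from pathlib import Path


def pack_voicebank(folder) -> str:
    """Zip a voicebank folder and return the path of the new archive."""
    folder = Path(folder).resolve()
    # The archive is created beside the folder and keeps the folder
    # itself as the top-level entry, e.g. "Suishou/ka.wav".
    return shutil.make_archive(str(folder), "zip",
                               root_dir=folder.parent,
                               base_dir=folder.name)
```

For a folder named "Suishou", this creates "Suishou.zip" in the same place, and anyone who extracts it gets the whole folder ready to copy into their own "voice" folder.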
© 2014 - 2024 World-of-Synthesis