arrow_back목록으로

생성형 AI AI 코딩 성능 테스트 (Google Bard, Open ai ChatGPT)

박종영 2023년 6월 15일

생성형 AI

AI 코딩 성능 테스트

(Google Bard, Open ai ChatGPT)

테스트 프롬프트-1)time series data "timeseries.csv" 에서는 DateTime column과 Value column이 있다. 이 csv파일을 읽고 Anomaly detection 하는 코드로. 알고리즘은 ISOLATION FOREST사용하고 시각화 하는 코드

1) 구글 바드

물론 시각화 코드와 함께 CSV 파일을 읽고 격리 포리스트를 사용하여 변칙 검색을 수행하는 코드는 다음과 같습니다.

[ ]에서:

import pandas as pd

import matplotlib.pyplot as plt

# Read the CSV file

df = pd.read_csv("D:\\DataSet\\Aiot\\machine_temperature.csv")

# Create the date and value columns

df["date"] = pd.to_datetime(df["Datetime"])

df["value"] = df["Value"].astype(float)

# Create the isolation forest model

iforest = IsolationForest(n_estimators=100, max_features=2, contamination=0.1)

# Fit the model to the data

iforest.fit(df["value"])

# Predict the anomaly scores

y_pred = iforest.predict(df["value"])

# Identify the anomalies using the isolation forest model

anomalies = y_pred == -1

# Visualize the predictions

plt.plot(df["value"].values)

plt.plot(anomalies, "r")

plt.show()

2) 챗GPT

Isolation Forest는 트리 기반의 머신러닝 모델로, 데이터의 이상치를 찾는데 특화된 알고리즘입니다. 다음은 Isolation Forest를 사용하여 이상치를 검출하고 이를 시각화하는 파이썬 코드입니다. 이 코드는 pandas, matplotlib, 그리고 scikit-learn 라이브러리를 사용합니다.

코드는 Isolation Forest를 사용하여 'Value' 컬럼의 이상치를 찾습니다. contamination='auto'는 이상치의 비율이 자동으로 설정되도록 하는 옵션입니다. 이상치와 정상치는 그래프에 각각 빨간색과 파란색으로 표시됩니다. 이상치는 데이터에서 유독 동떨어진 부분을 잘 찾아낼 수 있습니다. 그러나 이상치 탐지는 매우 복잡하고 도메인에 따라 다르므로, 이 코드가 모든 시나리오에 적합하다고 할 수는 없습니다. 이 코드는 간단한 예시일 뿐이며, 실제 사용시에는 도메인 지식과 추가적인 데이터 전처리가 필요할 수 있습니다.

[ ]에서:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from sklearn.ensemble import IsolationForest

# 데이터 로드

df = pd.read_csv("D:\\DataSet\\Aiot\\machine_temperature.csv")

# DateTime 컬럼을 datetime 객체로 변환

df['Datetime'] = pd.to_datetime(df['Datetime'])

df = df.set_index('Datetime')

# Isolation Forest 모델 생성과 학습

model = IsolationForest(contamination='auto')

df['anomaly'] = model.fit_predict(df[['Value']])

# 이상치(-1로 표시)와 정상치(1로 표시) 분리

anomalies = df[df['anomaly'] == -1]

normal = df[df['anomaly'] == 1]

# 결과 시각화

plt.figure(figsize=(18,10))

plt.plot(normal.index, normal['Value'], color='blue', label='Normal')

plt.scatter(anomalies.index,anomalies['Value'], color='red', label='Anomaly')

plt.legend()

plt.title('Anomaly Detection using Isolation Forest')

plt.xlabel('DateTime')

plt.ylabel('Value')

plt.show()

테스트 프롬프트-2)

"time series data인 'timeseries_data.csv' 에서는 DateTime column과 Value column이 있다. csv파일 을 읽고 Anomaly detection 하는 RNN과 LSTM을 사용하는 코드와 시각화 코드""

1) 구글 바드

물론 시각화 코드와 함께 CSV 파일을 읽고 RNN 및 LSTM을 사용하여 이상 검색을 수행하는 코드는 다음과 같습니다. 이 코드는 CSV 파일을 읽고, 날짜 및 값 열을 만들고, 데이터를 학습 및 테스트 집합으로 분할하고, RNN 모델을 만들고, 모델을 학습 데이터에 맞추고, 테스트 데이터의 변칙을 예측하고, 변칙을 식별하고, 예측을 시각화합니다.

[20]에서:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

# Read the CSV file

df = pd.read_csv("D:\\DataSet\\Aiot\\machine_temperature.csv")

# Create the date and value columns

df["date"] = pd.to_datetime(df["Datetime"])

df["value"] = df["Value"].astype(float)

# Split the data into train and test sets

train_data = df.loc[:int(len(df) * 0.8)]

test_data = df.loc[int(len(df) * 0.8):]

# Create the RNN model

model = Sequential()

model.add(LSTM(128, input_shape=(1, train_data.shape[1])))

model.add(Dense(1, activation="sigmoid"))

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Fit the model to the train data

model.fit(train_data, epochs=10, batch_size=32)

# Predict the anomaly scores for the test data

y_pred = model.predict(test_data)

# Identify the anomalies

anomalies = np.where(y_pred > 0.5)[0]

# Plot the data

plt.plot(train_data["value"], label="Train data")

plt.plot(test_data["value"], label="Test data")

plt.plot(anomalies, test_data.iloc[anomalies]["value"], "o", label="Anomalies")

plt.legend()

plt.show()

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

Cell In[20], line 23

20 model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

22 # Fit the model to the train data

---> 23 model.fit(train_data, epochs=10, batch_size=32)

25 # Predict the anomaly scores for the test data

26 y_pred = model.predict(test_data)

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\utils\traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)

65 except Exception as e: # pylint: disable=broad-except

66 filtered_tb = _process_traceback_frames(e.__traceback__)

---> 67 raise e.with_traceback(filtered_tb) from None

68 finally:

69 del filtered_tb

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\tensorflow\python\framework\constant_op.py:102, in convert_to_eager_tensor(value, ctx, dtype)

100 dtype = dtypes.as_dtype(dtype).as_datatype_enum

101 ctx.ensure_initialized()

--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

ChatGPT)

먼저, Python의 pandas 라이브러리를 사용하여 csv 파일을 읽고 이를 이용하여 LSTM과 RNN 모델을 구현합니다. TensorFlow와 Keras를 사용하여 LSTM과 RNN을 생성하고 학습합니다. matplotlib를 사용하여 결과를 시각화합니다.

In [27]:

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, LSTM, Dropout, SimpleRNN

from sklearn.preprocessing import MinMaxScaler

from sklearn.metrics import mean_squared_error

# 데이터 로드

df = pd.read_csv("D:\\DataSet\\Aiot\\machine_temperature.csv")

# "DateTime" column을 datetime 객체로 변환

df['Datetime'] = pd.to_datetime(df['Datetime'])

df = df.set_index('Datetime')

# 데이터 스케일링

scaler = MinMaxScaler(feature_range=(0, 1))

df['Value'] = scaler.fit_transform(np.array(df['Value']).reshape(-1,1))

# train-test split

train_size = int(len(df) * 0.7)

test_size = len(df) - train_size

train, test = df.iloc[0:train_size,:], df.iloc[train_size:len(df),:]

# 데이터셋 생성

def create_dataset(dataset, time_step=1):

dataX, dataY = [], []

for i in range(len(dataset)-time_step-1):

a = dataset[i:(i+time_step), 0]

dataX.append(a)

dataY.append(dataset[i + time_step, 0])

return np.array(dataX), np.array(dataY)

time_step = 100

X_train, y_train = create_dataset(train.values, time_step)

X_test, y_test = create_dataset(test.values, time_step)

# reshape input to be [samples, time steps, features] which is required for LSTM

X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)

X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# LSTM model

model_lstm = Sequential()

model_lstm.add(LSTM(50, return_sequences=True, input_shape=(time_step, 1)))

model_lstm.add(LSTM(50, return_sequences=True))

model_lstm.add(LSTM(50))

model_lstm.add(Dense(1))

model_lstm.compile(loss='mean_squared_error', optimizer='adam')

# training the model

model_lstm.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, batch_size=64, verbose=1) # was epochs=100

# 예측 수행

train_predict = model_lstm.predict(X_train)

test_predict = model_lstm.predict(X_test)

# 예측 결과 역스케일링

train_predict = scaler.inverse_transform(train_predict)

test_predict = scaler.inverse_transform(test_predict)

# plot 생성

plt.figure(figsize=(8,4))

plt.plot(df['Value'], label='True')

plt.plot(pd.Series(np.ravel(train_predict), index = df.index[:len(train_predict)]), label='Train Predict')

plt.plot(pd.Series(np.ravel(test_predict), index = df.index[len(train_predict)+time_step+1:len(df)-1]), label='Test Predict')

plt.legend()

plt.show()

Epoch 1/50

71/247 [=======>......................] - ETA: 4s - loss: 0.0299

---------------------------------------------------------------------------

KeyboardInterrupt Traceback (most recent call last)

Cell In[27], line 51

48 model_lstm.compile(loss='mean_squared_error', optimizer='adam')

50 # training the model

---> 51 model_lstm.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50, batch_size=64, verbose=1) # was epochs=100

53 # 예측 수행

54 train_predict = model_lstm.predict(X_train)

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\utils\traceback_utils.py:64, in filter_traceback.<locals>.error_handler(*args, **kwargs)

62 filtered_tb = None

63 try:

---> 64 return fn(*args, **kwargs)

65 except Exception as e: # pylint: disable=broad-except

66 filtered_tb = _process_traceback_frames(e.__traceback__)

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\engine\training.py:1414, in Model.fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)

1412 logs = tmp_logs # No error, now safe to assign to logs.

1413 end_step = step + data_handler.step_increment

-> 1414 callbacks.on_train_batch_end(end_step, logs)

1415 if self.stop_training:

1416 break

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\callbacks.py:438, in CallbackList.on_train_batch_end(self, batch, logs)

431 """Calls the `on_train_batch_end` methods of its callbacks.

432

433 Args:

434 batch: Integer, index of batch within the current epoch.

435 logs: Dict. Aggregated metric results up until this batch.

436 """

437 if self._should_call_train_batch_hooks:

--> 438 self._call_batch_hook(ModeKeys.TRAIN, 'end', batch, logs=logs)

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\callbacks.py:297, in CallbackList._call_batch_hook(self, mode, hook, batch, logs)

295 self._call_batch_begin_hook(mode, batch, logs)

296 elif hook == 'end':

--> 297 self._call_batch_end_hook(mode, batch, logs)

298 else:

299 raise ValueError(

300 f'Unrecognized hook: {hook}. Expected values are ["begin", "end"]')

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\callbacks.py:318, in CallbackList._call_batch_end_hook(self, mode, batch, logs)

315 batch_time = time.time() - self._batch_start_time

316 self._batch_times.append(batch_time)

--> 318 self._call_batch_hook_helper(hook_name, batch, logs)

320 if len(self._batch_times) >= self._num_batches_for_timing_check:

321 end_hook_name = hook_name

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\callbacks.py:356, in CallbackList._call_batch_hook_helper(self, hook_name, batch, logs)

354 for callback in self.callbacks:

355 hook = getattr(callback, hook_name)

--> 356 hook(batch, logs)

358 if self._check_timing:

359 if hook_name not in self._hook_times:

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\callbacks.py:1034, in ProgbarLogger.on_train_batch_end(self, batch, logs)

1033 def on_train_batch_end(self, batch, logs=None):

-> 1034 self._batch_update_progbar(batch, logs)

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\callbacks.py:1107, in ProgbarLogger._batch_update_progbar(self, batch, logs)

1104 if self.verbose == 1:

1105 # Only block async when verbose = 1.

1106 logs = tf_utils.sync_to_numpy_or_python_type(logs)

-> 1107 self.progbar.update(self.seen, list(logs.items()), finalize=False)

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\utils\generic_utils.py:976, in Progbar.update(self, current, values, finalize)

973 info += '\n'

975 message += info

--> 976 io_utils.print_msg(message, line_break=False)

977 message = ''

979 elif self.verbose == 2:

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\keras\utils\io_utils.py:78, in print_msg(message, line_break)

76 else:

77 sys.stdout.write(message)

---> 78 sys.stdout.flush()

79 else:

80 logging.info(message)

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\ipykernel\iostream.py:480, in OutStream.flush(self)

478 self.pub_thread.schedule(evt.set)

479 # and give a timeout to avoid

--> 480 if not evt.wait(self.flush_timeout):

481 # write directly to __stderr__ instead of warning because

482 # if this is happening sys.stderr may be the problem.

483 print("IOStream.flush timed out", file=sys.__stderr__)

484 else:

File D:\ProgramData\Anaconda3\envs\edu\lib\threading.py:581, in Event.wait(self, timeout)

579 signaled = self._flag

580 if not signaled:

--> 581 signaled = self._cond.wait(timeout)

582 return signaled

File D:\ProgramData\Anaconda3\envs\edu\lib\threading.py:316, in Condition.wait(self, timeout)

314 else:

315 if timeout > 0:

--> 316 gotit = waiter.acquire(True, timeout)

317 else:

318 gotit = waiter.acquire(False)

KeyboardInterrupt:

[Error] Plot에서 Error가 난 것같아서 Error 코드를 주고 새로운 코드를 받음

Prompt) ValueError: Length of values (36281) does not match length of index (36381)

그래도 계속 Error가 남

In [28]:

# plot 생성

plt.figure(figsize=(8,4))

plt.plot(df['Value'], label='True')

plt.plot(df.index[time_step:len(train_predict)+time_step], train_predict.flatten(), label='Train Predict')

plt.plot(df.index[len(train_predict)+(2*time_step)+1:], test_predict.flatten(), label='Test Predict')

plt.legend()

plt.show()

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

Cell In[28], line 5

3 plt.plot(df['Value'], label='True')

4 plt.plot(df.index[time_step:len(train_predict)+time_step], train_predict.flatten(), label='Train Predict')

----> 5 plt.plot(df.index[len(train_predict)+(2*time_step)+1:], test_predict.flatten(), label='Test Predict')

6 plt.legend()

7 plt.show()

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\matplotlib\pyplot.py:2812, in plot(scalex, scaley, data, *args, **kwargs)

2810 @_copy_docstring_and_deprecators(Axes.plot)

2811 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):

-> 2812 return gca().plot(

2813 *args, scalex=scalex, scaley=scaley,

2814 **({"data": data} if data is not None else {}), **kwargs)

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\matplotlib\axes\_axes.py:1688, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)

1445 """

1446 Plot y versus x as lines and/or markers.

1447

(...)

1685 (``'green'``) or hex strings (``'#008000'``).

1686 """

1687 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)

-> 1688 lines = [*self._get_lines(*args, data=data, **kwargs)]

1689 for line in lines:

1690 self.add_line(line)

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\matplotlib\axes\_base.py:311, in _process_plot_var_args.__call__(self, data, *args, **kwargs)

309 this += args[0],

310 args = args[1:]

--> 311 yield from self._plot_args(

312 this, kwargs, ambiguous_fmt_datakey=ambiguous_fmt_datakey)

File D:\ProgramData\Anaconda3\envs\edu\lib\site-packages\matplotlib\axes\_base.py:504, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs, ambiguous_fmt_datakey)

501 self.axes.yaxis.update_units(y)

503 if x.shape[0] != y.shape[0]:

--> 504 raise ValueError(f"x and y must have same first dimension, but "

505 f"have shapes {x.shape} and {y.shape}")

506 if x.ndim > 2 or y.ndim > 2:

507 raise ValueError(f"x and y can be no greater than 2D, but have "

508 f"shapes {x.shape} and {y.shape}")

ValueError: x and y must have same first dimension, but have shapes (6709,) and (6708,)

[Error] 전체 Code를 주고 수정해 달라고 함

Prompt) 아래 코드에서 ValueError: Length of values (36281) does not match length of index (36381) 가 있음. 코드 수정해주세요

ChatGPT) 문제는 create_dataset 함수에서 데이터를 window로 잘라낼 때, 끝부분의 데이터가 잘려나가면서 발생합니다. create_dataset 함수는 항상 time_step+1 만큼의 데이터를 잃게 됩니다. 따라서, LSTM 모델로부터 예측된 값의 수는 원래의 시계열 데이터 수보다 time_step+1 만큼 적습니다. 이를 고려하여 plot 생성 부분의 index 설정을 다음과 같이 수정하면 문제를 해결할 수 있습니다.

위 코드에서 df.index[time_step:len(train_predict)+time_step]는 train 데이터 예측 결과의 인덱스를 조정하고, df.index[len(train_predict)+(2*time_step)+1:]는 test 데이터 예측 결과의 인덱스를 조정합니다. 이렇게 수정하면, 원래의 시계열 데이터와 예측 결과 데이터의 인덱스가 일치하게 되어 그래프를 정확하게 그릴 수 있습니다.

In [33]:

# plot 생성

plt.figure(figsize=(16,10))

#plt.plot(df['Value'], label='True')

# 변경: 예측 결과의 크기를 맞춰주기 위해, time_step+1 만큼의 인덱스를 건너뛰고 시작

plt.plot(df.index[time_step+1:len(train_predict)+time_step+1], train_predict.flatten(), label='Train Predict')

plt.plot(df.index[len(df)-len(test_predict):], test_predict.flatten(), label='Test Predict')

plt.legend()

plt.show()

텍스트, 스크린샷, 그래프, 라인이(가) 표시된 사진

자동 생성된 설명

# 최종 완성된 Code # Anomaly Detection Code using ChatGPT

텍스트, 스크린샷, 폰트, 노랑이(가) 표시된 사진

자동 생성된 설명

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, LSTM, Dropout, SimpleRNN

from sklearn.preprocessing import MinMaxScaler

from sklearn.metrics import mean_squared_error

# 데이터 로드

df = pd.read_csv("D:\\DataSet\\Aiot\\machine_temperature.csv")

# "DateTime" column을 datetime 객체로 변환

df['Datetime'] = pd.to_datetime(df['Datetime'])

df = df.set_index('Datetime')

# 데이터 스케일링

scaler = MinMaxScaler(feature_range=(0, 1))

df['Value'] = scaler.fit_transform(np.array(df['Value']).reshape(-1,1))

# train-test split

train_size = int(len(df) * 0.7)

test_size = len(df) - train_size

train, test = df.iloc[0:train_size,:], df.iloc[train_size:len(df),:]

# 데이터셋 생성

def create_dataset(dataset, time_step=1):

dataX, dataY = [], []

for i in range(len(dataset)-time_step-1):

a = dataset[i:(i+time_step), 0]

dataX.append(a)

dataY.append(dataset[i + time_step, 0])

return np.array(dataX), np.array(dataY)

time_step = 100

X_train, y_train = create_dataset(train.values, time_step)

X_test, y_test = create_dataset(test.values, time_step)

# reshape input to be [samples, time steps, features] which is required for LSTM

X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)

X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# LSTM model

model_lstm = Sequential()

model_lstm.add(LSTM(50, return_sequences=True, input_shape=(time_step, 1)))

model_lstm.add(LSTM(50, return_sequences=True))

model_lstm.add(LSTM(50))

model_lstm.add(Dense(1))

model_lstm.compile(loss='mean_squared_error', optimizer='adam')

# training the model

model_lstm.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=150, batch_size=64, verbose=1) # was epochs=100

# 예측 수행

train_predict = model_lstm.predict(X_train)

test_predict = model_lstm.predict(X_test)

# 예측 결과 역스케일링

train_predict = scaler.inverse_transform(train_predict)

test_predict = scaler.inverse_transform(test_predict)

# plot 생성

plt.figure(figsize=(18,10))

#plt.plot(df['Value'], label='True')

# 변경: 예측 결과의 크기를 맞춰주기 위해, 각각의 예측 결과 시작 지점을 time_step 지점으로 옮김

plt.plot(df.index[time_step:time_step+len(train_predict)], train_predict.flatten(), label='Train Predict')

plt.plot(df.index[len(train)+time_step+1:len(train)+time_step+1+len(test_predict)], test_predict.flatten(), label='Test Predict')

plt.legend()

plt.show()

Epoch 1/150

247/247 [==============================] - 13s 36ms/step - loss: 0.0118 - val_loss: 0.0010

Epoch 2/150

247/247 [==============================] - 8s 31ms/step - loss: 7.1250e-04 - val_loss: 7.8339e-04

Epoch 3/150

247/247 [==============================] - 8s 31ms/step - loss: 5.2907e-04 - val_loss: 6.6401e-04

Epoch 5/150

247/247 [==============================] - 8s 30ms/step - loss: 3.5331e-04 - val_loss: 4.3508e-04

Epoch 6/150

Epoch 148/150

247/247 [==============================] - 8s 31ms/step - loss: 8.4254e-05 - val_loss: 9.9607e-05

Epoch 149/150

247/247 [==============================] - 8s 31ms/step - loss: 8.5116e-05 - val_loss: 9.2239e-05

Epoch 150/150

247/247 [==============================] - 8s 31ms/step - loss: 8.2236e-05 - val_loss: 9.8458e-05

494/494 [==============================] - 6s 11ms/step

210/210 [==============================] - 2s 11ms/step

박종영 전문위원(AI연구회 회장) 명

데이터링크 주식회사 / 제조 AI

트랜스포머 이해하기

실시간 용접 검사 및 예방 딥러닝 시스템 (Welding AI)

0개의 댓글

로그인

로그인 이후 댓글 쓰기가 가능합니다.